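String comparison in .NET does not work the way it does in C++, where strings are compared from start to end by the numeric codes of their characters. Coming from C/C++, I had always assumed that .NET compared strings by the same rules as C++, until one day my program failed on an English-language operating system.

In .NET, string ordering is affected by the current culture, System.Threading.Thread.CurrentThread.CurrentCulture. Different cultures can produce completely different sort results.

For example, the default current culture on a Simplified Chinese operating system is zh-CN, while on an English operating system (as sold in the United States) it is en-US. Let's see how these two cultures differ when sorting Chinese strings.

First, zh-CN: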

 

    string[] stringList = { "不", "啊", "从", "的", "一" };

    // Sort under the Simplified Chinese culture.
    System.Threading.Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("zh-CN");

    Array.Sort(stringList);

    foreach (string str in stringList)
    {
        Console.WriteLine(str);
    }

 

 

Output (pinyin order: a, bu, cong, de, yi):

    啊
    不
    从
    的
    一


Now let's look at en-US:

 

    string[] stringList = { "不", "啊", "从", "的", "一" };

    // Sort under the US English culture.
    System.Threading.Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("en-US");

    Array.Sort(stringList);

    foreach (string str in stringList)
    {
        Console.WriteLine(str);
    }

Output (Unicode code point order: U+4E00, U+4E0D, U+4ECE, U+554A, U+7684):

    一
    不
    从
    啊
    的


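As we can see, different cultures give completely different sort orders for these strings: under Simplified Chinese, Chinese characters are sorted by pinyin, whereas under en-US they are sorted by Unicode code point.

In fact, even Simplified Chinese has two sort orders: pinyin order and stroke-count order. Let's look at the result of sorting by stroke count.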

 

    string[] stringList = { "不", "啊", "从", "的", "一" };

    // "zh-CN_stroke" selects the stroke-count sort order for Simplified Chinese.
    System.Threading.Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("zh-CN_stroke");

    Array.Sort(stringList);

    foreach (string str in stringList)
    {
        Console.WriteLine(str);
    }

Output:





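Impact of sort order on program portability

Clearly, if you overlook this, the sort order of Chinese strings changes completely when a program moves from a Chinese operating system to an English one. If the sorted result is only displayed, the display will merely differ. But if the sorted result is stored in a file and relied on like a primary key, a document sorted on a Chinese operating system becomes an unsorted document on an English one, and the program's logic breaks.

To prevent this, we must specify a fixed culture when sorting instead of relying on the operating system's default culture.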
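One way to pin the ordering down is to pass an explicit comparer to Array.Sort rather than letting it read CurrentCulture. The sketch below is illustrative only: the class and method names (FixedCultureSortDemo, SortOrdinal, SortWithCulture) are made up for this example. It shows two options, an ordinal comparer and a comparer built from one named culture.

```csharp
using System;
using System.Globalization;

public static class FixedCultureSortDemo
{
    // Sorts a copy of the input by UTF-16 code unit values.
    // The result does not depend on Thread.CurrentCulture.
    public static string[] SortOrdinal(string[] items)
    {
        string[] copy = (string[])items.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }

    // Sorts a copy of the input with the rules of one explicitly named culture,
    // regardless of what the current thread's culture happens to be.
    public static string[] SortWithCulture(string[] items, string cultureName)
    {
        string[] copy = (string[])items.Clone();
        CultureInfo culture = new CultureInfo(cultureName);
        Array.Sort(copy, StringComparer.Create(culture, false)); // false = case-sensitive
        return copy;
    }

    public static void Main()
    {
        string[] stringList = { "不", "啊", "从", "的", "一" };

        // Ordinal order follows the code points: 一 不 从 啊 的
        Console.WriteLine(string.Join(" ", SortOrdinal(stringList)));

        // Culture-specific order; the exact result depends on the culture data
        // shipped with the runtime or operating system.
        Console.WriteLine(string.Join(" ", SortWithCulture(stringList, "zh-CN")));
    }
}
```

For data that is persisted and later searched, the ordinal comparer is probably the safer of the two, since a named culture's collation data can still differ between runtime and operating system versions.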

 

2. Impact on string searching

    List<string> list = new List<string>(stringList);

    Console.WriteLine(list.BinarySearch("啊"));

If stringList in the code above is read from a file that was generated on a Chinese operating system, and the program is now running on an English operating system, the result of this binary search is undefined, because under the English culture the input list is not considered sorted.
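A binary search is only meaningful if it uses the same ordering that produced the sorted list. As a minimal sketch (OrdinalSearchDemo, BuildSortedList and Find are hypothetical names for this example), both the sort and the search can be given the same culture-independent comparer:

```csharp
using System;
using System.Collections.Generic;

public static class OrdinalSearchDemo
{
    // Builds a list sorted with the ordinal comparer.
    public static List<string> BuildSortedList(IEnumerable<string> items)
    {
        List<string> list = new List<string>(items);
        list.Sort(StringComparer.Ordinal);
        return list;
    }

    // Searches with the same comparer that was used for sorting.
    // Returns the index of the match, or a negative number if not found.
    public static int Find(List<string> sortedList, string value)
    {
        return sortedList.BinarySearch(value, StringComparer.Ordinal);
    }

    public static void Main()
    {
        List<string> list = BuildSortedList(new[] { "不", "啊", "从", "的", "一" });

        Console.WriteLine(Find(list, "啊"));      // 3 (ordinal order: 一 不 从 啊 的)
        Console.WriteLine(Find(list, "我") < 0);  // True: "我" is not in the list
    }
}
```

If the list comes from a file, the file would need to have been written in this same ordinal order; a file sorted under zh-CN would have to be re-sorted after loading.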

 

3. Impact on IndexOf

This section is reproduced directly from MSDN: http://msdn.microsoft.com/zh-cn/library/a7zyyk0c%28v=VS.80%29.aspx

 

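You can use the overloaded CompareInfo.IndexOf method to return the zero-based index of a character or substring in a specified string. If the character or substring is not found in the specified string, the method returns a negative integer. When using CompareInfo.IndexOf to search for a specified character, note that the overloads that accept a CompareOptions parameter perform the comparison differently from the overloads that do not. The CompareInfo.IndexOf overloads that search for a char (Char in Visual Basic) and do not take a CompareOptions parameter perform a culture-sensitive search. That is, if the char is a Unicode value that represents a precomposed character, such as the ligature "Æ" (\u00C6), it might be considered equivalent to any occurrence of its components in the correct sequence, such as "AE" (\u0041 \u0045), depending on the culture. To perform an ordinal (culture-insensitive) search, in which two chars are considered equal only if their Unicode values are the same, use one of the CompareInfo.IndexOf overloads that take a CompareOptions parameter, and set that parameter to CompareOptions.Ordinal.

You can also use the String.IndexOf overloads that search for a char to perform an ordinal search. Note that the String.IndexOf overloads that search for a string perform a culture-sensitive search.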
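The difference between the char and string overloads of String.IndexOf can be illustrated with a short sketch. IndexOfDemo and OrdinalIndexOf are hypothetical names used only here. Passing StringComparison.Ordinal to the string overload is one way to opt out of the culture-sensitive behavior described above.

```csharp
using System;

public static class IndexOfDemo
{
    // Ordinal substring search: matches only identical UTF-16 code units.
    public static int OrdinalIndexOf(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal);
    }

    public static void Main()
    {
        // The char overload performs an ordinal search.
        Console.WriteLine("Æble".IndexOf('Æ'));           // 0
        Console.WriteLine("aeble".IndexOf('Æ'));          // -1

        // The string overload made ordinal explicitly.
        Console.WriteLine(OrdinalIndexOf("Æble", "Æ"));   // 0
        Console.WriteLine(OrdinalIndexOf("aeble", "Æ"));  // -1

        // Culture-sensitive search: the result depends on the current culture
        // and on the collation data of the runtime, so none is stated here.
        Console.WriteLine("aeble".IndexOf("Æ", StringComparison.CurrentCulture));
    }
}
```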

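The following code example illustrates how the results returned by the CompareInfo.IndexOf(string, char) method can differ by culture. A CultureInfo is created for "da-DK" (Danish in Denmark). Next, overloads of the CompareInfo.IndexOf method are used to search for the character "Æ" in the strings "Æble" and "aeble". Note that, for the "da-DK" culture, the CompareInfo.IndexOf method that takes a CompareOptions.Ordinal parameter and the CompareInfo.IndexOf method that does not take it return the same results. The character "Æ" is considered equivalent only to the Unicode code value \u00C6.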

    using System;
    using System.Globalization;
    using System.Threading;

    public class CompareClass
    {
       public static void Main()
       {
          string str1 = "Æble";
          string str2 = "aeble";
          char find = 'Æ';

          // Creates a CultureInfo for Danish in Denmark.
          CultureInfo ci = new CultureInfo("da-DK");

          // Culture-sensitive searches (no CompareOptions argument).
          int result1 = ci.CompareInfo.IndexOf(str1, find);
          int result2 = ci.CompareInfo.IndexOf(str2, find);
          // Ordinal searches.
          int result3 = ci.CompareInfo.IndexOf(str1, find,
             CompareOptions.Ordinal);
          int result4 = ci.CompareInfo.IndexOf(str2, find,
             CompareOptions.Ordinal);

          // The format strings are concatenated because a regular C# string
          // literal cannot span several source lines.
          Console.WriteLine("\nCultureInfo is set to {0} ", ci.DisplayName);
          Console.WriteLine("\nUsing CompareInfo.IndexOf(string, char) " +
             "method\nthe result of searching for {0} in the string {1} is: " +
             "{2}", find, str1, result1);
          Console.WriteLine("\nUsing CompareInfo.IndexOf(string, char) " +
             "method\nthe result of searching for {0} in the string {1} is: " +
             "{2}", find, str2, result2);
          Console.WriteLine("\nUsing CompareInfo.IndexOf(string, char, " +
             "CompareOptions) method\nthe result of searching for {0} in the " +
             "string {1} is: {2}", find, str1, result3);
          Console.WriteLine("\nUsing CompareInfo.IndexOf(string, char, " +
             "CompareOptions) method\nthe result of searching for {0} in the " +
             "string {1} is: {2}", find, str2, result4);
       }
    }