C# subreddit question time again! This time, “Which string comparison method is faster?”
I took a bit of a deep dive to see what each code path does. I decided to compare string.Equals(a, b), string.Equals(b), ==, and !=. Which one is faster? Which one runs less code?
Aside: I have a new YouTube channel called Elias Explains. I’ll be posting my videos up there from now on.
Here’s my sample program.
using System;

class Program
{
    private static string string1 = "Hello world";
    private static string string2 = "HELLO WORLD";

    static void Main(string[] args)
    {
        if (string1 != string2)
        {
            Console.WriteLine("Not Equal.");
        }
        if (!string1.Equals(string2))
        {
            Console.WriteLine("Not Equal.");
        }
    }
}
Which of these two produced less code? First, here’s the != operator. This generated 9 lines of IL.
// IL_0001: ldsfld string ConsoleApp5.Program::string1
// IL_0006: ldsfld string ConsoleApp5.Program::string2
// IL_000b: call bool [System.Runtime]System.String::op_Inequality(string, string)
// IL_0010: stloc.0 // V_0
// IL_0011: ldloc.0 // V_0
// IL_0012: brfalse.s IL_0021
// IL_0014: nop
// IL_0015: ldstr "Not Equal."
// IL_001a: call void [System.Console]System.Console::WriteLine(string)
Next, here’s the string.Equals(string) version. This produced 11 lines of IL. Notice the addition.
// IL_0021: ldsfld string ConsoleApp5.Program::string1
// IL_0026: ldsfld string ConsoleApp5.Program::string2
// IL_002b: callvirt instance bool [System.Runtime]System.String::Equals(string)
//
// ceq pops the top two values off the stack and pushes 1 if they are equal, 0 otherwise.
// Here it compares the result of string.Equals against 0 (false), which is how the ! is implemented.
//
// IL_0030: ldc.i4.0
// IL_0031: ceq
// IL_0033: stloc.1 // V_1
// IL_0034: ldloc.1 // V_1
// IL_0035: brfalse.s IL_0044
// IL_0037: nop
// IL_0038: ldstr "Not Equal."
// IL_003d: call void [System.Console]System.Console::WriteLine(string)
There are two more IL instructions generated for the string.Equals call: ldc.i4.0 and ceq, which together compare the result of string.Equals against 0 (false). Could this be slower? It depends on your use case, how often you’re calling this code, and what the Just-in-Time or Ahead-of-Time compiler does to your code. The stloc/ldloc pairs and nop instructions also suggest this IL came from a Debug build; an optimized build will typically fold the negation into the branch itself, so the gap may disappear entirely. I would bet money that 99.9% of us would never notice the performance difference.
But what about the System.String class? Is there a difference between using ==, !=, and string.Equals?
From dotPeek, string operator ==:
public static bool operator ==(string a, string b)
{
    return string.Equals(a, b);
}
From dotPeek, string operator !=:
public static bool operator !=(string a, string b)
{
    return !string.Equals(a, b);
}
The IL for both is nearly identical, save for the same ceq instruction. What does string.Equals look like, though?
From dotPeek, string.Equals(b):
public bool Equals(string value)
{
    if (this == null)
        throw new NullReferenceException();
    if (value == null)
        return false;
    if ((object) this == (object) value)
        return true;
    if (this.Length != value.Length)
        return false;
    return string.EqualsHelper(this, value);
}
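First there’s a guard that throws a NullReferenceException if the instance itself is null. Next, we check to see if the string we’re comparing against is null; if it is, return false. Otherwise, check to see if we’re comparing a string to itself, in which case return true. Then check to see if the lengths are different, in which case return false. Finally, call string.EqualsHelper.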
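To make those branches concrete, here’s a small sketch that exercises each early exit of the instance method. The class name InstanceEqualsDemo and the helper ThrowsOnNullReceiver are made up for this illustration; only string.Equals itself comes from the framework.

```csharp
using System;

public static class InstanceEqualsDemo
{
    // Hypothetical helper: reports whether calling Equals on `receiver` throws.
    public static bool ThrowsOnNullReceiver(string receiver, string value)
    {
        try
        {
            receiver.Equals(value);
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        string a = "Hello world";
        string sameReference = a;
        string nothing = null;

        Console.WriteLine(a.Equals(nothing));        // False: the argument is null
        Console.WriteLine(a.Equals(sameReference));  // True: same object reference
        Console.WriteLine(a.Equals("Hello"));        // False: lengths differ
        Console.WriteLine(ThrowsOnNullReceiver(nothing, a)); // True: null receiver throws
    }
}
```

The last line is the one to watch out for in real code: the instance method can’t protect you from a null receiver.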
What about the static method, string.Equals(a, b)? Would it be faster to call that?
From dotPeek, string.Equals(a, b):
public static bool Equals(string a, string b)
{
    if ((object) a == (object) b)
        return true;
    if (a == null || b == null || a.Length != b.Length)
        return false;
    return string.EqualsHelper(a, b);
}
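A quick way to see how this static overload (and the operators that forward to it) treats null is to just try it. This is only a sketch; the class name StaticEqualsDemo is invented for the example.

```csharp
using System;

public static class StaticEqualsDemo
{
    public static void Main()
    {
        string a = "Hello world";
        string nothing = null;

        // Two nulls are the same reference, so the first check returns true.
        Console.WriteLine(string.Equals(nothing, nothing)); // True

        // A null on either side falls into the second check and returns false.
        Console.WriteLine(string.Equals(a, nothing));       // False
        Console.WriteLine(string.Equals(nothing, a));       // False

        // The operators forward to the static method, so they never throw on null.
        Console.WriteLine(a == nothing);                     // False
        Console.WriteLine(nothing != a);                     // True
    }
}
```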
First we check to see if the two strings are the same object reference. If they are, then return true: you are always equal to you. Otherwise, check if one of the strings is null or the lengths differ. If that’s the case, return false. Finally, if all that fails, call EqualsHelper. Looks almost identical to the instance method; the main difference is that the static version tolerates a null first argument instead of throwing.
What about this string.EqualsHelper method? Get ready for some pointer fun. This is an unsafe method.
From dotPeek, string.EqualsHelper:
private static unsafe bool EqualsHelper(string strA, string strB)
{
    int length = strA.Length;
    fixed (char* chPtr1 = &strA.m_firstChar)
    fixed (char* chPtr2 = &strB.m_firstChar)
    {
        char* chPtr3 = chPtr1;
        char* chPtr4 = chPtr2;
        // Compare 12 characters per pass as three 8-byte reads (4 chars each).
        for (; length >= 12; length -= 12)
        {
            if (*(long*) chPtr3 != *(long*) chPtr4
                || *(long*) (chPtr3 + 4) != *(long*) (chPtr4 + 4)
                || *(long*) (chPtr3 + 8) != *(long*) (chPtr4 + 8))
                return false;
            chPtr3 += 12;
            chPtr4 += 12;
        }
        // Compare the remainder 2 characters (one 4-byte int) at a time.
        for (; length > 0 && *(int*) chPtr3 == *(int*) chPtr4; length -= 2)
        {
            chPtr3 += 2;
            chPtr4 += 2;
        }
        // A mismatch exits the loop early with length still above zero.
        return length <= 0;
    }
}
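If the pointer casts are hard to read, the same idea (compare wide chunks first, then finish off the tail) can be sketched in safe code. To be clear, this is not the framework’s implementation: ChunkedCompare and ChunkedEquals are hypothetical names, the sketch assumes both arguments are non-null, and it assumes a runtime where MemoryMarshal.Cast and string.AsSpan are available (.NET Core 2.1 or later).

```csharp
using System;
using System.Runtime.InteropServices;

public static class ChunkedCompare
{
    // Hypothetical helper, assuming a and b are both non-null.
    public static bool ChunkedEquals(string a, string b)
    {
        if (a.Length != b.Length)
            return false;

        ReadOnlySpan<char> charsA = a.AsSpan();
        ReadOnlySpan<char> charsB = b.AsSpan();

        // Reinterpret the characters as 8-byte values: 4 chars per ulong.
        // Any leftover chars that don't fill a whole ulong are dropped here.
        ReadOnlySpan<ulong> wideA = MemoryMarshal.Cast<char, ulong>(charsA);
        ReadOnlySpan<ulong> wideB = MemoryMarshal.Cast<char, ulong>(charsB);

        for (int i = 0; i < wideA.Length; i++)
        {
            if (wideA[i] != wideB[i])
                return false;
        }

        // Finish the 0 to 3 trailing chars one at a time.
        for (int i = wideA.Length * 4; i < charsA.Length; i++)
        {
            if (charsA[i] != charsB[i])
                return false;
        }

        return true;
    }

    public static void Main()
    {
        Console.WriteLine(ChunkedEquals("Hello world", "Hello world")); // True
        Console.WriteLine(ChunkedEquals("Hello world", "HELLO WORLD")); // False
    }
}
```

The decompiled version unrolls three of these wide reads per pass and handles the tail two characters at a time, but the shape is the same.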
Woof. That unsafe method is a bunch of code. The first loop handles strings of 12 or more characters: each pass reads three 8-byte long values (four characters apiece) from both strings and returns false on the first mismatch. This code is clever though. The second loop mops up whatever is left, two characters (one 4-byte int) at a time, decrementing length as it goes. As soon as a pair differs, the loop stops early with length still above zero, so return length <= 0 yields false; if the loop runs to completion, it yields true. So it’s basically a walk through memory, consuming more of the string until we’re done.
So what? Which one is faster?
All of the above end up in string.EqualsHelper. Checking for equality directly, rather than negating the result, is slightly faster (maybe, debatable depending on how the code gets turned into machine code) because you skip a comparison to zero. The == and != operators also introduce a call to string.Equals, so you could say they’re slightly slower due to the extra method call, although a wrapper that small is a likely candidate for inlining by the JIT.
Again, it’s an extra jump. For a definitive answer, you need to measure whether the equality check you’re using makes a difference in your code. If you’re checking equality once in your program, either one works. If you’re checking it thousands of times a second, it might matter.
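If you want a rough first look before reaching for a proper benchmarking tool, a Stopwatch loop is one way to do it. Treat this as a crude sketch only: EqualityTiming and TimeMs are names invented here, the iteration count is arbitrary, and the delegate call adds overhead that may well be larger than the difference you’re trying to see. A dedicated harness such as BenchmarkDotNet will give far more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;

public static class EqualityTiming
{
    // Hypothetical helper: runs `compare` on (a, b) `iterations` times
    // and returns the elapsed wall-clock time in milliseconds.
    public static double TimeMs(Func<string, string, bool> compare, string a, string b, int iterations)
    {
        bool sink = false;
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Fold the result into a variable so the call has an observable effect.
            sink ^= compare(a, b);
        }
        stopwatch.Stop();
        GC.KeepAlive(sink);
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        string a = "Hello world";
        string b = "HELLO WORLD";
        const int iterations = 10_000_000; // arbitrary, chosen for illustration

        // Warm-up pass so JIT compilation isn't included in the timings.
        TimeMs((x, y) => x == y, a, b, 1000);
        TimeMs((x, y) => x.Equals(y), a, b, 1000);
        TimeMs((x, y) => string.Equals(x, y), a, b, 1000);

        Console.WriteLine("==                  : " + TimeMs((x, y) => x == y, a, b, iterations) + " ms");
        Console.WriteLine("a.Equals(b)         : " + TimeMs((x, y) => x.Equals(y), a, b, iterations) + " ms");
        Console.WriteLine("string.Equals(a, b) : " + TimeMs((x, y) => string.Equals(x, y), a, b, iterations) + " ms");
    }
}
```

Run it in a Release build and more than once; if the three numbers land within noise of each other, that’s your answer for this workload.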