[llvm-commits] [llvm] r61297 - in /llvm/trunk: lib/Transforms/Scalar/SimplifyLibCalls.cpp test/Transforms/SimplifyLibCalls/2008-12-20-StrcmpMemcmp.ll

Török Edwin edwintorok at gmail.com
Mon Dec 22 07:15:18 PST 2008


On 2008-12-21 10:41, Török Edwin wrote:
> On 2008-12-21 03:03, Nick Lewycky wrote:
>   
>> Eli Friedman wrote:
>>   
>>     
>>> On Sat, Dec 20, 2008 at 4:38 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>     
>>>       
>>>> On Sat, Dec 20, 2008 at 4:19 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
>>>>       
>>>>         
>>>>> Author: nicholas
>>>>> Date: Sat Dec 20 18:19:21 2008
>>>>> New Revision: 61297
>>>>>
>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=61297&view=rev
>>>>> Log:
>>>>> Turn strcmp into memcmp, such as strcmp(P, "x") --> memcmp(P, "x", 2).
>>>>>         
>>>>>           
>>>> I'm pretty sure this isn't safe; take the following testcase:
>>>>
>>>> int foo(char* x) {return strcmp(x, "x", 2) == 0;}
>>>>       
>>>>         
>>> Oops, messed that up slightly; try the following:
>>> int foo(char* x) {return strcmp(x, "x") == 0;}
>>>     
>>>       
>> The pathological case is:
>>
>>    char x[1] = "\0";  // pretend we don't know its length
>>    char y[2] = "x\0"; // we know its length
>>    return strcmp(x, y);
>>
>> If we transform it into
>>
>>    return memcmp(x, y, 2)
>>
>> then the existing MemCmpOpt will exercise these cases:
>>
>>    // memcmp(S1,S2,2) != 0 -> (*(short*)LHS ^ *(short*)RHS)  != 0
>>    // memcmp(S1,S2,4) != 0 -> (*(int*)LHS ^ *(int*)RHS)  != 0
>>
>> resulting in a 2-byte load of x[1], which is illegal.
>>
>> Could someone comment on what part of this transform is wrong? I recall 
>> having a discussion about it on IRC, but can't recall the details...
>>   
>>     
>
>
> You need to check the alignment.
> I think something like:
> MinAlignStr = Minimum Alignment of Str1P, Str2P (max 16)
> MinLen = Minimum length of Str1P and Str2P
> AlignMinLen = AlignOf(MinLen) (max 16)
> If (MinAlignStr >= AlingMinLen) -> transform is safe
>   

I found the IRC discussion, we assumed that memcmp won't do a 2 byte
load if unaligned:
dwin: memcmp may do a 2 byte read
(11:12:15 PM) _sabre_: memcmp can't do a 2 byte read
(11:12:19 PM) danchr: sdt: oh, I didn't know memcmp stopped at first
difference
(11:12:19 PM) _sabre_: unless the pointers are aligned
(11:12:38 PM) _sabre_: or it knows it is safe some other way
(11:12:55 PM) baldrick: is that guaranteed?
(11:13:33 PM) edwin: for sure it can't transform strcmp(P,"1234\0") into
memcmp(P,"1234\0",5)
(11:13:39 PM) baldrick: or just what is expected by a quality
implementation :)
(11:13:58 PM) baldrick: edwin: why not?
(11:14:06 PM) edwin: because P may be 4 byte aligned

But try this with llvm-gcc:
#include <string.h>
int foo(const char *s)
{
    return !memcmp(s,"x",2);
}

It produces:
foo:
.Leh_func_begin1:
.Llabel1:
    cmpw    $120, (%rdi)
    sete    %al
    movzbl    %al, %eax
    ret

So is it a bug in MemCmp optimization?

Best regards,
--Edwin



More information about the llvm-commits mailing list