[LLVMdev] fptoui calling a function that modifies ECX
Peter Newman
peter at uformia.com
Sat Jul 20 00:44:47 PDT 2013
I've applied this and the test cases I have here continue to work, so it
looks good to me.
I've run into another (seemingly unrelated) issue, which I'll describe in
a separate email to the dev list.
--
Peter N
On 20/07/2013 5:30 AM, Craig Topper wrote:
> Here's my attempt at a fix. Adding Jakob to make sure I did this right.
>
>
> On Fri, Jul 19, 2013 at 2:34 AM, Peter Newman <peter at uformia.com
> <mailto:peter at uformia.com>> wrote:
>
> That does appear to have worked. All my tests are passing now.
>
> I'll hand this out to our other devs & testers and make sure it's
> working for them as well (not just on my machine).
>
> Thank you, again.
>
> --
> Peter N
>
>
> On 19/07/2013 5:45 PM, Craig Topper wrote:
>> I don't think that's going to work.
>>
>>
>> On Fri, Jul 19, 2013 at 12:24 AM, Peter Newman <peter at uformia.com
>> <mailto:peter at uformia.com>> wrote:
>>
>> Thank you, I'm trying this now.
>>
>>
>> On 19/07/2013 5:23 PM, Craig Topper wrote:
>>> Try adding ECX to the Defs of this part of
>>> lib/Target/X86/X86InstrCompiler.td like I've done below. I
>>> don't have a Windows machine to test myself.
>>>
>>> let Defs = [EAX, EDX, ECX, EFLAGS], FPForm = SpecialFP in {
>>>   def WIN_FTOL_32 : I<0, Pseudo, (outs), (ins RFP32:$src),
>>>                       "# win32 fptoui",
>>>                       [(X86WinFTOL RFP32:$src)]>,
>>>                     Requires<[In32BitMode]>;
>>>
>>>   def WIN_FTOL_64 : I<0, Pseudo, (outs), (ins RFP64:$src),
>>>                       "# win32 fptoui",
>>>                       [(X86WinFTOL RFP64:$src)]>,
>>>                     Requires<[In32BitMode]>;
>>> }
>>>
>>>
>>> On Thu, Jul 18, 2013 at 11:59 PM, Peter Newman
>>> <peter at uformia.com <mailto:peter at uformia.com>> wrote:
>>>
>>> Oh, excellent point, I agree. My bad. Now that I'm not
>>> assuming those are the sqrt, I see the sqrtpd's in the
>>> output. Also there are three fptoui's and there are 3
>>> call instances.
>>>
>>> (Changing subject line again.)
>>>
>>> Now it looks like it's bug #13862
>>>
>>> On 19/07/2013 4:51 PM, Craig Topper wrote:
>>>> I think those calls correspond to this
>>>>
>>>> %110 = fptoui double %109 to i32
>>>>
>>>> The calls are followed by an imul with 12 which matches
>>>> up with what occurs right after the fptoui in the IR.
>>>>
>>>>
>>>> On Thu, Jul 18, 2013 at 11:48 PM, Peter Newman
>>>> <peter at uformia.com <mailto:peter at uformia.com>> wrote:
>>>>
>>>> Yes, that is the result of module-dump.ll
>>>>
>>>>
>>>> On 19/07/2013 4:46 PM, Craig Topper wrote:
>>>>> Does this correspond to one of the .ll files you
>>>>> sent earlier?
>>>>>
>>>>>
>>>>> On Thu, Jul 18, 2013 at 11:34 PM, Peter Newman
>>>>> <peter at uformia.com <mailto:peter at uformia.com>> wrote:
>>>>>
>>>>> (Changing subject line as diagnosis has changed)
>>>>>
>>>>> I'm attaching the compiled code that I've been
>>>>> getting, both with CodeGenOpt::Default and
>>>>> CodeGenOpt::None . The crash isn't occurring
>>>>> with CodeGenOpt::None, but that seems to be
>>>>> because ECX isn't being used - it still gets
>>>>> set to 0x7fffffff by one of the calls to 76719BA1
>>>>>
>>>>> I notice that X86::SQRTPD[m|r] appear in
>>>>> X86InstrInfo::isHighLatencyDef. I was thinking
>>>>> an optimization might be removing it, but I
>>>>> don't get the sqrtpd instruction even with the
>>>>> createJIT optimization level turned off.
>>>>>
>>>>> I am trying this with the Release 3.3 code -
>>>>> I'll try it with trunk and see if I get a
>>>>> different result there. Maybe there was a
>>>>> recent commit for this.
>>>>>
>>>>> --
>>>>> Peter N
>>>>>
>>>>> On 19/07/2013 4:00 PM, Craig Topper wrote:
>>>>>> Hmm, I'm not able to get those .ll files to
>>>>>> compile if I disable SSE and I end up with
>>>>>> SSE instructions(including sqrtpd) if I don't
>>>>>> disable it.
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 18, 2013 at 10:53 PM, Peter
>>>>>> Newman <peter at uformia.com
>>>>>> <mailto:peter at uformia.com>> wrote:
>>>>>>
>>>>>> Is there something specifically required
>>>>>> to enable SSE? If it's not detected as
>>>>>> available (based on the target triple?)
>>>>>> then I don't think we enable it specifically.
>>>>>>
>>>>>> Also it seems that it should handle
>>>>>> converting to/from the vector types,
>>>>>> although I can see it getting confused
>>>>>> about needing to do that if it thinks SSE
>>>>>> isn't available at all.
>>>>>>
>>>>>>
>>>>>> On 19/07/2013 3:47 PM, Craig Topper wrote:
>>>>>>> Hmm, maybe SSE isn't being enabled so
>>>>>>> it's falling back to emulating sqrt?
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 18, 2013 at 10:45 PM, Peter
>>>>>>> Newman <peter at uformia.com
>>>>>>> <mailto:peter at uformia.com>> wrote:
>>>>>>>
>>>>>>> In the disassembly, I'm seeing three
>>>>>>> cases of
>>>>>>> call 76719BA1
>>>>>>>
>>>>>>> I am assuming this is the sqrt
>>>>>>> function as this is the only
>>>>>>> function called in the LLVM IR.
>>>>>>>
>>>>>>> The code at 76719BA1 is:
>>>>>>>
>>>>>>> 76719BA1 push ebp
>>>>>>> 76719BA2 mov ebp,esp
>>>>>>> 76719BA4 sub esp,20h
>>>>>>> 76719BA7 and esp,0FFFFFFF0h
>>>>>>> 76719BAA fld st(0)
>>>>>>> 76719BAC fst dword ptr [esp+18h]
>>>>>>> 76719BB0 fistp qword ptr [esp+10h]
>>>>>>> 76719BB4 fild qword ptr [esp+10h]
>>>>>>> 76719BB8 mov edx,dword ptr [esp+18h]
>>>>>>> 76719BBC mov eax,dword ptr [esp+10h]
>>>>>>> 76719BC0 test eax,eax
>>>>>>> 76719BC2 je 76719DCF
>>>>>>> 76719BC8 fsubp st(1),st
>>>>>>> 76719BCA test edx,edx
>>>>>>> 76719BCC js 7671F9DB
>>>>>>> 76719BD2 fstp dword ptr [esp]
>>>>>>> 76719BD5 mov ecx,dword ptr [esp]
>>>>>>> 76719BD8 add ecx,7FFFFFFFh
>>>>>>> 76719BDE sbb eax,0
>>>>>>> 76719BE1 mov edx,dword ptr [esp+14h]
>>>>>>> 76719BE5 sbb edx,0
>>>>>>> 76719BE8 leave
>>>>>>> 76719BE9 ret
>>>>>>>
>>>>>>>
>>>>>>> As you can see at 76719BD5, it
>>>>>>> modifies ECX .
>>>>>>>
>>>>>>> I don't know that this is the sqrtpd
>>>>>>> function (for example, I'm not
>>>>>>> seeing any SSE instructions here?)
>>>>>>> but whatever it is, it's being
>>>>>>> called from the IR I attached
>>>>>>> earlier, and is modifying ECX under
>>>>>>> some circumstances.
>>>>>>>
>>>>>>>
>>>>>>> On 19/07/2013 3:29 PM, Craig Topper
>>>>>>> wrote:
>>>>>>>> That should map directly to sqrtpd
>>>>>>>> which can't modify ecx.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 18, 2013 at 10:27 PM,
>>>>>>>> Peter Newman <peter at uformia.com
>>>>>>>> <mailto:peter at uformia.com>> wrote:
>>>>>>>>
>>>>>>>> Sorry, that should have been
>>>>>>>> llvm.x86.sse2.sqrt.pd
>>>>>>>>
>>>>>>>>
>>>>>>>> On 19/07/2013 3:25 PM, Craig
>>>>>>>> Topper wrote:
>>>>>>>>> What is "frep.x86.sse2.sqrt.pd"? I'm only familiar with
>>>>>>>>> things prefixed with "llvm.x86".
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jul 18, 2013 at 10:12
>>>>>>>>> PM, Peter Newman
>>>>>>>>> <peter at uformia.com
>>>>>>>>> <mailto:peter at uformia.com>> wrote:
>>>>>>>>>
>>>>>>>>> After stepping through the produced assembly, I believe I
>>>>>>>>> have a culprit.
>>>>>>>>>
>>>>>>>>> One of the calls to @frep.x86.sse2.sqrt.pd is modifying the
>>>>>>>>> value of ECX - while the produced code is expecting it to
>>>>>>>>> still contain its previous value.
>>>>>>>>>
>>>>>>>>> Peter N
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 19/07/2013 2:09 PM,
>>>>>>>>> Peter Newman wrote:
>>>>>>>>>> I've attached the module->dump() that our code is producing.
>>>>>>>>>> Unfortunately this is the smallest test case I have available.
>>>>>>>>>>
>>>>>>>>>> This is before any optimization passes are applied. There are
>>>>>>>>>> two separate modules in existence at the time, and there are
>>>>>>>>>> no guarantees about the order the surrounding code calls those
>>>>>>>>>> functions, so there may be some interaction between them?
>>>>>>>>>> There shouldn't be, they don't refer to any common memory etc.
>>>>>>>>>> There is no multi-threading occurring.
>>>>>>>>>>
>>>>>>>>>> The function in module-dump.ll (called crashfunc in this file)
>>>>>>>>>> is called with
>>>>>>>>>> - func_params 0x0018f3b0 double [3]
>>>>>>>>>> [0x0] -11.339976634695301 double
>>>>>>>>>> [0x1] -9.7504239056205506 double
>>>>>>>>>> [0x2] -5.2900856817382804 double
>>>>>>>>>> at the time of the exception.
>>>>>>>>>>
>>>>>>>>>> This is compiled on a "i686-pc-win32" triple. All of the
>>>>>>>>>> non-intrinsic functions referred to in these modules are the
>>>>>>>>>> standard equivalents from the MSVC library (e.g. @asin is the
>>>>>>>>>> standard C lib double asin( double ) ).
>>>>>>>>>>
>>>>>>>>>> Hopefully this is reproducible for you.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> PeterN
>>>>>>>>>>
>>>>>>>>>> On 18/07/2013 4:37 PM,
>>>>>>>>>> Craig Topper wrote:
>>>>>>>>>>> Are you able to send any
>>>>>>>>>>> IR for others to
>>>>>>>>>>> reproduce this issue?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 17, 2013 at
>>>>>>>>>>> 11:23 PM, Peter Newman
>>>>>>>>>>> <peter at uformia.com
>>>>>>>>>>> <mailto:peter at uformia.com>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately, this doesn't appear to be the bug I'm hitting.
>>>>>>>>>>> I applied the fix to my source and it didn't make a difference.
>>>>>>>>>>>
>>>>>>>>>>> Also further testing found me getting the same behavior with
>>>>>>>>>>> other SIMD instructions. The common factor is in each case,
>>>>>>>>>>> ECX is set to 0x7fffffff, and it's an operation using xmm ptr
>>>>>>>>>>> ecx+offset .
>>>>>>>>>>>
>>>>>>>>>>> Additionally, turning the optimization level passed to
>>>>>>>>>>> createJIT down appears to avoid it, so I'm now leaning towards
>>>>>>>>>>> a bug in one of the optimization passes.
>>>>>>>>>>>
>>>>>>>>>>> I'm going to dig through the passes controlled by that
>>>>>>>>>>> parameter and see if I can narrow down which optimization is
>>>>>>>>>>> causing it.
>>>>>>>>>>>
>>>>>>>>>>> Peter N
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 17/07/2013 1:58
>>>>>>>>>>> PM, Solomon Boulos
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> As someone off list just told me, perhaps my new bug is the
>>>>>>>>>>> same issue:
>>>>>>>>>>>
>>>>>>>>>>> http://llvm.org/bugs/show_bug.cgi?id=16640
>>>>>>>>>>>
>>>>>>>>>>> Do you happen to be using FastISel?
>>>>>>>>>>>
>>>>>>>>>>> Solomon
>>>>>>>>>>>
>>>>>>>>>>> On Jul 16, 2013,
>>>>>>>>>>> at 6:39 PM,
>>>>>>>>>>> Peter Newman
>>>>>>>>>>> <peter at uformia.com
>>>>>>>>>>> <mailto:peter at uformia.com>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello all,
>>>>>>>>>>>
>>>>>>>>>>> I'm currently in the process of debugging a crash occurring
>>>>>>>>>>> in our program. In LLVM 3.2 and 3.3 it appears that JIT
>>>>>>>>>>> generated code is attempting to access unaligned memory with
>>>>>>>>>>> an SSE2 instruction. However this only happens under certain
>>>>>>>>>>> conditions that seem (but may not be) related to the stack's
>>>>>>>>>>> state on calling the function.
>>>>>>>>>>>
>>>>>>>>>>> Our program acts as a front-end, using the LLVM C++ API to
>>>>>>>>>>> generate a JIT generated function. This function is primarily
>>>>>>>>>>> mathematical, so we use the Vector types to take advantage of
>>>>>>>>>>> SIMD instructions (as well as a few SSE2 intrinsics).
>>>>>>>>>>>
>>>>>>>>>>> This worked in LLVM 2.8 but started failing in 3.2 and has
>>>>>>>>>>> continued to fail in 3.3. It fails with no optimizations
>>>>>>>>>>> applied to the LLVM Function/Module. It crashes with what is
>>>>>>>>>>> reported as a memory access error (accessing 0xffffffff),
>>>>>>>>>>> however it's suggested that this is how the SSE fault raising
>>>>>>>>>>> mechanism appears.
>>>>>>>>>>>
>>>>>>>>>>> The generated instruction varies, but it seems to often be
>>>>>>>>>>> similar to (I don't have it in front of me, sorry):
>>>>>>>>>>> movapd xmm0, xmm[ecx+0x???????]
>>>>>>>>>>> Where the xmm register changes, and the second parameter is a
>>>>>>>>>>> memory access. ECX is always set to 0x7fffffff - however I
>>>>>>>>>>> don't know if this is part of the SSE error reporting process
>>>>>>>>>>> or is part of the situation causing the error.
>>>>>>>>>>>
>>>>>>>>>>> I haven't worked out exactly what code path etc is causing
>>>>>>>>>>> this crash. I'm hoping that someone can tell me if there were
>>>>>>>>>>> any changed requirements for working with SIMD in LLVM 3.2
>>>>>>>>>>> (or earlier, we haven't tried 3.0 or 3.1). I currently
>>>>>>>>>>> suspect the use of GlobalVariable (we first discovered the
>>>>>>>>>>> crash when using a feature that uses them), however I have
>>>>>>>>>>> attempted using setAlignment on the GlobalVariables without
>>>>>>>>>>> any change.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Peter N
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> LLVM
>>>>>>>>>>> Developers
>>>>>>>>>>> mailing list
>>>>>>>>>>> LLVMdev at cs.uiuc.edu
>>>>>>>>>>> <mailto:LLVMdev at cs.uiuc.edu>
>>>>>>>>>>> http://llvm.cs.uiuc.edu
>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> ~Craig
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ~Craig
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ~Craig
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ~Craig
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ~Craig
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ~Craig
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ~Craig
>>>
>>>
>>>
>>>
>>> --
>>> ~Craig
>>
>>
>>
>>
>> --
>> ~Craig
>
>
>
>
> --
> ~Craig
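
The fix settled on in this thread is to list ECX among the registers that
the WIN_FTOL_32 / WIN_FTOL_64 pseudo-instructions are declared to overwrite.
The Python sketch below is a toy model of why that declaration matters; it is
not LLVM's register allocator. The names ToyInstr, undeclared_clobbers and
corrupted_values are hypothetical, the pre-fix Defs list is inferred from the
request to "add ECX", and the value kept in ECX is an assumed scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyInstr:
    """Hypothetical stand-in for a pseudo-instruction; not an LLVM class."""
    name: str
    declared_defs: frozenset    # registers the allocator is told are overwritten
    actual_clobbers: frozenset  # registers the expanded code really overwrites


def undeclared_clobbers(instr):
    """Registers that get overwritten without the allocator being told."""
    return sorted(instr.actual_clobbers - instr.declared_defs)


def corrupted_values(live_across, instr):
    """Values kept in a register that is silently overwritten by instr."""
    hidden = set(undeclared_clobbers(instr))
    return {value: reg for value, reg in live_across.items() if reg in hidden}


# Per the disassembly quoted in the thread, the helper at 76719BA1 writes
# EAX, EDX and ECX, and its test/add/sbb instructions write the flags.
ACTUAL = frozenset({"EAX", "EDX", "ECX", "EFLAGS"})

# Pre-fix Defs list: inferred from the thread, not copied from the source tree.
before_fix = ToyInstr("WIN_FTOL_64", frozenset({"EAX", "EDX", "EFLAGS"}), ACTUAL)
after_fix = ToyInstr("WIN_FTOL_64", frozenset({"EAX", "EDX", "ECX", "EFLAGS"}), ACTUAL)

# Assumed scenario: a base pointer is kept in ECX across the fptoui.
live = {"base_ptr": "ECX"}

print(corrupted_values(live, before_fix))  # {'base_ptr': 'ECX'}
print(corrupted_values(live, after_fix))   # {}
```

With the pre-fix list, the model reports that a value held in ECX across the
conversion is lost, which matches the reported symptom of ECX holding
0x7fffffff at a later xmm load. With ECX declared, nothing is reported,
because an allocator that knows about the clobber would not keep a live value
there.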