[LLVMdev] extractelement causes memory access violation - what to do?

Pete Cooper peter_cooper at apple.com
Wed Jul 1 17:03:48 PDT 2015


> On Jul 1, 2015, at 4:48 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> ----- Original Message -----
>> From: "Pete Cooper" <peter_cooper at apple.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "LLVMdev" <llvmdev at cs.uiuc.edu>, "Paweł Bylica" <chfast at gmail.com>
>> Sent: Wednesday, July 1, 2015 6:42:41 PM
>> Subject: Re: [LLVMdev] extractelement causes memory access violation - what	to do?
>> 
>> 
>>> On Jul 1, 2015, at 3:45 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>> 
>>> ----- Original Message -----
>>>> From: "Pete Cooper" <peter_cooper at apple.com>
>>>> To: "Paweł Bylica" <chfast at gmail.com>
>>>> Cc: "Hal Finkel" <hfinkel at anl.gov>, "LLVMdev"
>>>> <llvmdev at cs.uiuc.edu>
>>>> Sent: Wednesday, July 1, 2015 12:08:37 PM
>>>> Subject: Re: [LLVMdev] extractelement causes memory access
>>>> violation - what	to do?
>>>> 
>>>> Sorry for chiming in so late in this.
>>>> 
>>>> So I agree that negative indices are UB, I don’t think thats
>>>> contentious.
>>>> 
>>>> However, I think the issue here is the DAG expansion. That is the
>>>> point at which we go from UB which would just be hidden in the
>>>> instruction to something which can crash. I think its at that
>>>> point
>>>> where we should mask to ensure that the in memory expansion
>>>> doesn’t
>>>> read out of bounds. On architectures which do the variable extract
>>>> in an instruction, they won’t be penalized by a mask,
>>> 
>>> Why do you feel that they won't be penalized by the mask? Or are
>>> you assuming will adjust the patterns to match the index plus the
>>> mask?
>> Ah, should have explained better.  What I meant was that if i can do
>> the variable extract in a register without going to memory at all
>> (so have suitable legal instructions to do so), then we won’t
>> generate a mask at all.  The mask would only be generated when the
>> legalizer moves the data to memory which we don’t do if its legal.
> 
> Ah, alright.
> 
>>> 
>>>> only the
>>>> memory expansion will be which should be rarer,
>>> 
>>> On some architectures expansion in memory is not particularly
>>> expensive because they have very good store-to-load forwarding.
>>> Adding additional masking instructions into the critical path of
>>> correct code will not be a welcome change.
>> Thats true, so i guess it depends how many architectures need to do
>> variable extracts in memory.  I have no idea if any architectures we
>> support are able to do a variable extract in a register, or if all
>> use memory.
> 
> At least on PowerPC, when using QPX, we can do this using instructions.
> 
>> If most use a register, then penalizing the few who do
>> use memory by inserting a mask seems reasonable.
>>> 
>>>> 
>>>> The point about speculation at the IR level is interesting.
>>>> Personally i’m ok with constant indices being speculated and
>>>> variable not. If we later want to find a good way to ask TTI
>>>> whether
>>>> a variable extract is cheap then we can find a way to do so.
>>> 
>>> It is not about expense, it is about not introducing UB when
>>> speculating the instruction.
>> Yeah, I see what you mean here.  So the user could have written
>> 
>> if (i >= 0) x = extract v[i]
>> 
>> but if we speculate then we aren’t guarded and have UB.  Having the
>> backend insert the mask would fix this, but I agree that someone,
>> somewhere needs to put in the mask if we want to allow speculation
>> here, and the target can’t do the variable extract in a register.
> 
> I'd rather the frontend do this if the language wants it. We can use ComputeKnownBits when we do the speculation check, and so if there is a mask, we'll be fine.
Sounds good.

I assume that if we taught SROA to promote vectors from the stack to registers, and if the vector has a variable extract index, then we’d have to insert the mask there too?

Of course, for SROA to do this, it would have to assume that the vector being in a register is cheaper than memory, which may not be true if the backend has to push/pop the vector to memory multiple times for multiple variable extracts.

Pete
> 
>> 
>> Pete
>>> 
>>> -Hal
>>> 
>>>> 
>>>> Anyway, just my 2c.
>>>> 
>>>> 
>>>> Cheers,
>>>> Pete
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Jun 30, 2015, at 2:07 PM, Paweł Bylica < chfast at gmail.com >
>>>> wrote:
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Tue, Jun 30, 2015 at 11:03 PM Hal Finkel < hfinkel at anl.gov >
>>>> wrote:
>>>> 
>>>> 
>>>> ----- Original Message -----
>>>>> From: "Paweł Bylica" < chfast at gmail.com >
>>>>> To: "David Majnemer" < david.majnemer at gmail.com >
>>>>> Cc: "LLVMdev" < llvmdev at cs.uiuc.edu >
>>>>> Sent: Tuesday, June 30, 2015 5:42:24 AM
>>>>> Subject: Re: [LLVMdev] extractelement causes memory access
>>>>> violation - what to do?
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Jun 26, 2015 at 5:42 PM David Majnemer <
>>>>> david.majnemer at gmail.com > wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Jun 26, 2015 at 7:00 AM, Paweł Bylica < chfast at gmail.com
>>>>>> 
>>>>> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> 
>>>>> Let's have a simple program:
>>>>> 
>>>>> define i32 @main(i32 %n, i64 %idx) {
>>>>> %idxSafe = trunc i64 %idx to i5
>>>>> %r = extractelement <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>,
>>>>> i64
>>>>> %idx
>>>>> ret i32 %r
>>>>> }
>>>>> 
>>>>> 
>>>>> The assembly of that would be:
>>>>> 
>>>>> pcmpeqd %xmm0, %xmm0
>>>>> movdqa %xmm0, -24(%rsp)
>>>>> movl -24(%rsp,%rsi,4), %eax
>>>>> retq
>>>>> 
>>>>> 
>>>>> The language reference states that the extractelement instruction
>>>>> produces undefined value in case the index argument is invalid
>>>>> (our
>>>>> case). But the implementation simply dumps the vector to the
>>>>> stack
>>>>> memory, calculates the memory offset out of the index value and
>>>>> tries to access the memory. That causes the crash.
>>>>> 
>>>>> 
>>>>> The workaround is to trunc the index value before extractelement
>>>>> (see
>>>>> %idxSafe). But what should be the ultimate solution?
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> We could fix this by specifying that out of bounds access on an
>>>>> extractelement leads to full-on undefined behavior, no need to
>>>>> force
>>>>> everyone to eat the cost of a mask.
>>>>> 
>>>>> 
>>>>> I don't have preference for any of the solutions.
>>>>> 
>>>>> 
>>>>> I have a side question. It is not stated explicitly in the
>>>>> reference
>>>>> but I would assume the index of extractelement is processed as an
>>>>> unsigned value. However, the DAG Builder extends the index with
>>>>> sext. Is it correct?
>>>> 
>>>> Hrmm. Given that only (small) positive numbers are valid, it
>>>> shouldn't matter. Unless we can find a reason that it works better
>>>> to be sext, it seems conceptually cleaner to make it zext.
>>>> 
>>>> 
>>>> 
>>>> I have tried to change it to zext. 2 Mips test have failed. I
>>>> haven't
>>>> checked the details though.
>>>> sext looks pretty wrong to me because i5 -1 does not mean 31 any
>>>> more.
>>>> 
>>>> 
>>>> - PB
>>>> 
>>>> 
>>>> 
>>>> -Hal
>>>> 
>>>>> 
>>>>> 
>>>>> - PB
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>> 
>>>> 
>>>> --
>>>> Hal Finkel
>>>> Assistant Computational Scientist
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>> 
>>>> 
>>> 
>>> --
>>> Hal Finkel
>>> Assistant Computational Scientist
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>> 
>> 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory





More information about the llvm-dev mailing list