[PATCH] Switch from i32i8imm to i8 for immediate operands on x86 vector element shift instructions.
Lang Hames
lhames at gmail.com
Mon Oct 21 10:55:38 PDT 2013
Great. Thanks very much for the review Craig.
Patch committed (with minor alterations to fix merge conflicts) in r193096.
On Thu, Oct 17, 2013 at 10:58 PM, Craig Topper <craig.topper at gmail.com>wrote:
> The patch LGTM.
>
>
> On Thu, Oct 17, 2013 at 10:47 PM, Craig Topper <craig.topper at gmail.com>wrote:
>
>> Oops. I wasn't paying attention that you create a constant 0 node
>> directly and it was only arithmetic shifts that you were doing the width-1
>> on.
>>
>>
>> On Thu, Oct 17, 2013 at 9:05 PM, Craig Topper <craig.topper at gmail.com>wrote:
>>
>>> I believe the spec for the intrinsics is to return 0 if the count is too
>>> large. So capping to the width-1 doesn't work.
>>>
>>> I checked gcc and it looks like it will use the non-immediate form if
>>> the count is > 255.
>>>
>>>
>>> On Wed, Oct 16, 2013 at 11:17 AM, Lang Hames <lhames at gmail.com> wrote:
>>>
>>>> Hi Craig,
>>>>
>>>> Thanks very much for taking a look at this. You're right that, for the
>>>> IR shift instructions, shifting by more than the width of the source is
>>>> undefined. However, the test case I'm looking at uses the
>>>> llvm.x86.sse2.ps*i intrinsics. I think the semantics of these
>>>> intrinsics should be the same as their corresponding x86 instructions,
>>>> which do define the behavior for larger shifts. What do you think?
>>>>
>>>> Cheers,
>>>> Lang.
>>>>
>>>>
>>>>
>>>> On Wed, Oct 16, 2013 at 12:08 AM, Craig Topper <craig.topper at gmail.com>wrote:
>>>>
>>>>> Isn't this considered undefined behavior? For non-vectors, a constant
>>>>> shift that's too big is converted to undef.
>>>>>
>>>>>
>>>>> On Tue, Oct 15, 2013 at 12:48 PM, Lang Hames <lhames at gmail.com> wrote:
>>>>>
>>>>>> The immediate versions of current x86 vector element shift
>>>>>> instructions use i32i8imm as the type of the immediate. The i32i8imm type
>>>>>> is for 32bit wide immediates with only 8 significant bits of information.
>>>>>>
>>>>>> I suspect this is done to make the size of the shift operand the same
>>>>>> (always 32 bits) regardless of whether a register or immediate is used. It
>>>>>> is responsible for a serious bug though: The instruction only encodes 8
>>>>>> bits for immediate shifts, so when using i32i8imm the higher bits (>=bit 8)
>>>>>> are simply dropped. This leads, for example, to a left-shift by 0x101 of a
>>>>>> 4xi32 value becoming a left shift by 0x01 when encoded. That's obviously
>>>>>> not correct.
>>>>>>
>>>>>> The attached patch switches the type of the immediate to i8. Logical
>>>>>> shifts by sizes more than the element width are turned into constant zeros.
>>>>>> Arithmetic shifts are capped at element width - 1 bits, E.g. A shift by
>>>>>> >=32 becomes a shift by 31 for an N x i32 vector. Alternatively, the
>>>>>> arithmetic shifts could be capped at the highest encodable shift amount
>>>>>> (255). My preference is for the tighter bound though - in theory it could
>>>>>> cause more shifts to be CSE'd (though in practice I don't expect it to come
>>>>>> up often).
>>>>>>
>>>>>> The patch passes the regression tests (with minor modifications) and
>>>>>> a nightly test suite run.
>>>>>>
>>>>>> Intel experts - please let me know if you see any issues with this
>>>>>> patch, otherwise I'll go ahead and apply it to fix the bug.
>>>>>>
>>>>>> Cheers,
>>>>>> Lang.
>>>>>>
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ~Craig
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> ~Craig
>>>
>>
>>
>>
>> --
>> ~Craig
>>
>
>
>
> --
> ~Craig
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131021/df00bc15/attachment.html>
More information about the llvm-commits
mailing list