[PATCH] Switch from i32i8imm to i8 for immediate operands on x86 vector element shift instructions.

Thu Oct 17 22:58:53 PDT 2013

The patch LGTM.

On Thu, Oct 17, 2013 at 10:47 PM, Craig Topper <craig.topper at gmail.com>wrote:

> Oops. I wasn't paying attention that you create a constant 0 node directly
> and it was only arithmetic shifts that you were doing the width-1 on.
>
>
> On Thu, Oct 17, 2013 at 9:05 PM, Craig Topper <craig.topper at gmail.com>wrote:
>
>> I believe the spec for the intrinsics is to return 0 if the count is too
>> large. So capping to the width-1 doesn't work.
>>
>> I checked gcc and it looks like it will use the non-immediate form if the
>> count is > 255.
>>
>>
>> On Wed, Oct 16, 2013 at 11:17 AM, Lang Hames <lhames at gmail.com> wrote:
>>
>>> Hi Craig,
>>>
>>> Thanks very much for taking a look at this. You're right that, for the
>>> IR shift instructions, shifting by more than the width of the source is
>>> undefined. However, the test case I'm looking at uses the
>>> llvm.x86.sse2.ps*i intrinsics. I think the semantics of these
>>> intrinsics should be the same as their corresponding x86 instructions,
>>> which do define the behavior for larger shifts. What do you think?
>>>
>>> Cheers,
>>> Lang.
>>>
>>>
>>>
>>> On Wed, Oct 16, 2013 at 12:08 AM, Craig Topper <craig.topper at gmail.com>wrote:
>>>
>>>> Isn't this considered undefined behavior? For non-vectors, a constant
>>>> shift that's too big is converted to undef.
>>>>
>>>>
>>>> On Tue, Oct 15, 2013 at 12:48 PM, Lang Hames <lhames at gmail.com> wrote:
>>>>
>>>>> The immediate versions of current x86 vector element shift
>>>>> instructions use i32i8imm as the type of the immediate. The i32i8imm type
>>>>> is for 32bit wide immediates with only 8 significant bits of information.
>>>>>
>>>>> I suspect this is done to make the size of the shift operand the same
>>>>> (always 32 bits) regardless of whether a register or immediate is used. It
>>>>> is responsible for a serious bug though: The instruction only encodes 8
>>>>> bits for immediate shifts, so when using i32i8imm the higher bits (>=bit 8)
>>>>> are simply dropped. This leads, for example, to a left-shift by 0x101 of a
>>>>> 4xi32 value becoming a left shift by 0x01 when encoded. That's obviously
>>>>> not correct.
>>>>>
>>>>> The attached patch switches the type of the immediate to i8. Logical
>>>>> shifts by sizes more than the element width are turned into constant zeros.
>>>>> Arithmetic shifts are capped at element width - 1 bits, E.g. A shift by
>>>>> >=32 becomes a shift by 31 for an N x i32 vector. Alternatively, the
>>>>> arithmetic shifts could be capped at the highest encodable shift amount
>>>>> (255). My preference is for the tighter bound though - in theory it could
>>>>> cause more shifts to be CSE'd (though in practice I don't expect it to come
>>>>> up often).
>>>>>
>>>>> The patch passes the regression tests (with minor modifications) and a
>>>>> nightly test suite run.
>>>>>
>>>>> Intel experts - please let me know if you see any issues with this
>>>>> patch, otherwise I'll go ahead and apply it to fix the bug.
>>>>>
>>>>> Cheers,
>>>>> Lang.
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> ~Craig
>>>>
>>>
>>>
>>
>>
>> --
>> ~Craig
>>
>
>
>
> --
> ~Craig
>

-- 
~Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131017/1d6b07df/attachment.html>