[PATCH] Switch from i32i8imm to i8 for immediate operands on x86 vector element shift instructions.

Thu Oct 17 21:05:23 PDT 2013

I believe the spec for the intrinsics is to return 0 if the count is too
large. So capping to the width-1 doesn't work.

I checked gcc and it looks like it will use the non-immediate form if the
count is > 255.

On Wed, Oct 16, 2013 at 11:17 AM, Lang Hames <lhames at gmail.com> wrote:

> Hi Craig,
>
> Thanks very much for taking a look at this. You're right that, for the IR
> shift instructions, shifting by more than the width of the source is
> undefined. However, the test case I'm looking at uses the llvm.x86.sse2.ps*i
> intrinsics. I think the semantics of these intrinsics should be the same as
> their corresponding x86 instructions, which do define the behavior for
> larger shifts. What do you think?
>
> Cheers,
> Lang.
>
>
>
> On Wed, Oct 16, 2013 at 12:08 AM, Craig Topper <craig.topper at gmail.com>wrote:
>
>> Isn't this considered undefined behavior? For non-vectors, a constant
>> shift that's too big is converted to undef.
>>
>>
>> On Tue, Oct 15, 2013 at 12:48 PM, Lang Hames <lhames at gmail.com> wrote:
>>
>>> The immediate versions of current x86 vector element shift instructions
>>> use i32i8imm as the type of the immediate. The i32i8imm type is for 32bit
>>> wide immediates with only 8 significant bits of information.
>>>
>>> I suspect this is done to make the size of the shift operand the same
>>> (always 32 bits) regardless of whether a register or immediate is used. It
>>> is responsible for a serious bug though: The instruction only encodes 8
>>> bits for immediate shifts, so when using i32i8imm the higher bits (>=bit 8)
>>> are simply dropped. This leads, for example, to a left-shift by 0x101 of a
>>> 4xi32 value becoming a left shift by 0x01 when encoded. That's obviously
>>> not correct.
>>>
>>> The attached patch switches the type of the immediate to i8. Logical
>>> shifts by sizes more than the element width are turned into constant zeros.
>>> Arithmetic shifts are capped at element width - 1 bits, E.g. A shift by
>>> >=32 becomes a shift by 31 for an N x i32 vector. Alternatively, the
>>> arithmetic shifts could be capped at the highest encodable shift amount
>>> (255). My preference is for the tighter bound though - in theory it could
>>> cause more shifts to be CSE'd (though in practice I don't expect it to come
>>> up often).
>>>
>>> The patch passes the regression tests (with minor modifications) and a
>>> nightly test suite run.
>>>
>>> Intel experts - please let me know if you see any issues with this
>>> patch, otherwise I'll go ahead and apply it to fix the bug.
>>>
>>> Cheers,
>>> Lang.
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>
>>
>> --
>> ~Craig
>>
>
>

-- 
~Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131017/a98067ec/attachment.html>