[PATCH] Switch from i32i8imm to i8 for immediate operands on x86 vector element shift instructions.

Thu Oct 17 22:47:40 PDT 2013

Oops. I wasn't paying attention that you create a constant 0 node directly
and it was only arithmetic shifts that you were doing the width-1 on.

On Thu, Oct 17, 2013 at 9:05 PM, Craig Topper <craig.topper at gmail.com>wrote:

> I believe the spec for the intrinsics is to return 0 if the count is too
> large. So capping to the width-1 doesn't work.
>
> I checked gcc and it looks like it will use the non-immediate form if the
> count is > 255.
>
>
> On Wed, Oct 16, 2013 at 11:17 AM, Lang Hames <lhames at gmail.com> wrote:
>
>> Hi Craig,
>>
>> Thanks very much for taking a look at this. You're right that, for the IR
>> shift instructions, shifting by more than the width of the source is
>> undefined. However, the test case I'm looking at uses the
>> llvm.x86.sse2.ps*i intrinsics. I think the semantics of these intrinsics
>> should be the same as their corresponding x86 instructions, which do define
>> the behavior for larger shifts. What do you think?
>>
>> Cheers,
>> Lang.
>>
>>
>>
>> On Wed, Oct 16, 2013 at 12:08 AM, Craig Topper <craig.topper at gmail.com>wrote:
>>
>>> Isn't this considered undefined behavior? For non-vectors, a constant
>>> shift that's too big is converted to undef.
>>>
>>>
>>> On Tue, Oct 15, 2013 at 12:48 PM, Lang Hames <lhames at gmail.com> wrote:
>>>
>>>> The immediate versions of current x86 vector element shift instructions
>>>> use i32i8imm as the type of the immediate. The i32i8imm type is for 32bit
>>>> wide immediates with only 8 significant bits of information.
>>>>
>>>> I suspect this is done to make the size of the shift operand the same
>>>> (always 32 bits) regardless of whether a register or immediate is used. It
>>>> is responsible for a serious bug though: The instruction only encodes 8
>>>> bits for immediate shifts, so when using i32i8imm the higher bits (>=bit 8)
>>>> are simply dropped. This leads, for example, to a left-shift by 0x101 of a
>>>> 4xi32 value becoming a left shift by 0x01 when encoded. That's obviously
>>>> not correct.
>>>>
>>>> The attached patch switches the type of the immediate to i8. Logical
>>>> shifts by sizes more than the element width are turned into constant zeros.
>>>> Arithmetic shifts are capped at element width - 1 bits, E.g. A shift by
>>>> >=32 becomes a shift by 31 for an N x i32 vector. Alternatively, the
>>>> arithmetic shifts could be capped at the highest encodable shift amount
>>>> (255). My preference is for the tighter bound though - in theory it could
>>>> cause more shifts to be CSE'd (though in practice I don't expect it to come
>>>> up often).
>>>>
>>>> The patch passes the regression tests (with minor modifications) and a
>>>> nightly test suite run.
>>>>
>>>> Intel experts - please let me know if you see any issues with this
>>>> patch, otherwise I'll go ahead and apply it to fix the bug.
>>>>
>>>> Cheers,
>>>> Lang.
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>>
>>>
>>>
>>> --
>>> ~Craig
>>>
>>
>>
>
>
> --
> ~Craig
>

-- 
~Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131017/201e5b5e/attachment.html>