<div dir="ltr">Great. Thanks very much for the review Craig.<div><br></div><div>Patch committed (with minor alterations to fix merge conflicts) in r193096.</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">

On Thu, Oct 17, 2013 at 10:58 PM, Craig Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div dir="ltr">The patch LGTM.</div><div class="gmail_extra"><div><div class="h5"><br><br><div class="gmail_quote">On Thu, Oct 17, 2013 at 10:47 PM, Craig Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Oops. I wasn't paying attention that you create a constant 0 node directly and it was only arithmetic shifts that you were doing the width-1 on.</div>


<div class="gmail_extra"><div><div><br><br><div class="gmail_quote">

On Thu, Oct 17, 2013 at 9:05 PM, Craig Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div dir="ltr">I believe the spec for the intrinsics is to return 0 if the count is too large. So capping to the width-1 doesn't work. <div><br></div><div>I checked gcc and it looks like it will use the non-immediate form if the count is > 255.</div>


</div><div class="gmail_extra"><div><div><br><br><div class="gmail_quote">On Wed, Oct 16, 2013 at 11:17 AM, Lang Hames <span dir="ltr"><<a href="mailto:lhames@gmail.com" target="_blank">lhames@gmail.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Craig,<div><br></div><div>Thanks very much for taking a look at this. You're right that, for the IR shift instructions, shifting by more than the width of the source is undefined. However, the test case I'm looking at uses the <a href="http://llvm.x86.sse2.ps" target="_blank">llvm.x86.sse2.ps</a>*i intrinsics. I think the semantics of these intrinsics should be the same as their corresponding x86 instructions, which do define the behavior for larger shifts. What do you think?</div>


<div><br></div><div>Cheers,</div><div>Lang.<br></div><div><br></div></div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Oct 16, 2013 at 12:08 AM, Craig Topper <span dir="ltr"><<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Isn't this considered undefined behavior? For non-vectors, a constant shift that's too big is converted to undef.</div>


<div class="gmail_extra"><br><br><div class="gmail_quote"><div><div>On Tue, Oct 15, 2013 at 12:48 PM, Lang Hames <span dir="ltr"><<a href="mailto:lhames@gmail.com" target="_blank">lhames@gmail.com</a>></span> wrote:<br>


</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div><div dir="ltr">The immediate versions of current x86 vector element shift instructions use i32i8imm as the type of the immediate. The i32i8imm type is for 32bit wide immediates with only 8 significant bits of information.<div>


<br></div><div>I suspect this is done to make the size of the shift operand the same (always 32 bits) regardless of whether a register or immediate is used. It is responsible for a serious bug though: The instruction only encodes 8 bits for immediate shifts, so when using i32i8imm the higher bits (>=bit 8) are simply dropped. This leads, for example, to a left-shift by 0x101 of a 4xi32 value becoming a left shift by 0x01 when encoded. That's obviously not correct.</div>


<div><br></div><div>The attached patch switches the type of the immediate to i8. Logical shifts by sizes more than the element width are turned into constant zeros. Arithmetic shifts are capped at element width - 1 bits, E.g. A shift by >=32 becomes a shift by 31 for an N x i32 vector. Alternatively, the arithmetic shifts could be capped at the highest encodable shift amount (255). My preference is for the tighter bound though - in theory it could cause more shifts to be CSE'd (though in practice I don't expect it to come up often).</div>


<div><br></div><div>The patch passes the regression tests (with minor modifications) and a nightly test suite run.</div><div><br></div><div>Intel experts - please let me know if you see any issues with this patch, otherwise I'll go ahead and apply it to fix the bug.</div>


<div><br></div><div>Cheers,</div><div>Lang.</div></div>

<br></div></div>_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

<br></blockquote></div><span><font color="#888888"><br><br clear="all"><div><br></div>-- <br>~Craig

</font></span></div>

</blockquote></div><br></div>

</div></div></blockquote></div><br><br clear="all"><div><br></div></div></div><span><font color="#888888">-- <br>~Craig

</font></span></div>

</blockquote></div><br><br clear="all"><div><br></div></div></div><span><font color="#888888">-- <br>~Craig

</font></span></div>

</blockquote></div><br><br clear="all"><div><br></div></div></div><span class="HOEnZb"><font color="#888888">-- <br>~Craig

</font></span></div>

</blockquote></div><br></div>