[PATCH] [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.

Andrea Di Biagio Andrea_DiBiagio at sn.scee.net
Tue Dec 9 11:14:40 PST 2014


Hi qcolombet, nadav, majnemer,

Hi David, Quentin (and all),

This patch teaches the Instruction Combiner how to fold a call to 'Intrinsic::x86_sse4a_insertqi' in two cases: when the 'length' field (3rd operand) is zero, and when the sum of the 'length' and 'bit index' (4th operand) fields is greater than 64.

From the AMD64 Architecture Programmer’s Manual:
1. "If the sum of the bit index + length field is greater than 64, the results are undefined."
2. "A value of zero in the field length is defined as a length of 64."

As a consequence of 1. and 2., "If the length field is 0 and the bit index is 0, bits 63:0 of the source operand are inserted. For any other value of the bit index, the results are undefined."
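To make rules 1. and 2. concrete, below is a small C++ reference model of the immediate form of INSERTQ, restricted to the low 64 bits of the destination. It is only a sketch under stated assumptions, not code from the patch: the function name `insertqModel` is hypothetical, and masking each field to its low 6 bits is an assumption based on the instruction encoding rather than something stated in this mail.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical reference model of INSERTQ (immediate form), low 64 bits only.
// Returns std::nullopt when the manual says the result is undefined.
std::optional<std::uint64_t> insertqModel(std::uint64_t dst, std::uint64_t src,
                                          unsigned lengthField,
                                          unsigned indexField) {
  // Assumption: only bits [5:0] of each immediate field are significant.
  unsigned length = lengthField & 0x3F;
  unsigned index = indexField & 0x3F;

  // Rule 2: a value of zero in the length field means a length of 64.
  if (length == 0)
    length = 64;

  // Rule 1: if bit index + length is greater than 64, results are undefined.
  if (index + length > 64)
    return std::nullopt;

  // Mask with the low 'length' bits set; shifting 1 by 64 is UB in C++,
  // so the full-width case is handled separately.
  std::uint64_t mask = (length == 64) ? ~0ULL : ((1ULL << length) - 1);

  // Clear the target field in dst, then insert the low 'length' bits of src.
  return (dst & ~(mask << index)) | ((src & mask) << index);
}
```

For example, `insertqModel(0, src, 0, 0)` returns `src` unchanged (length 0 is read as 64, index 0), while `insertqModel(0, src, 0, 1)` returns `std::nullopt`, matching the consequence quoted above.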

This patch improves the existing combining logic for Intrinsic::x86_sse4a_insertqi by adding extra checks that address both point 1. and point 2.
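The decision the improved combine has to make, once both fields are known constants, can be sketched roughly as follows. This is not the code from D6583: the enum `InsertqFold` and the function `classifyInsertqi` are hypothetical names, and the 6-bit masking of each field is again an assumption.

```cpp
#include <cstdint>

// Hypothetical outcomes for a call with constant length and index operands.
enum class InsertqFold {
  ToUndef,    // index + length > 64: results are undefined (point 1.)
  CopyLow64,  // length field 0 (i.e. 64) with index 0: low 64 bits = source
  NoFold      // well-defined partial insert; left alone in this sketch
};

InsertqFold classifyInsertqi(std::uint8_t lengthField, std::uint8_t indexField) {
  // Assumption: only bits [5:0] of each field are significant.
  unsigned length = lengthField & 0x3F;
  unsigned index = indexField & 0x3F;

  // Point 2.: a zero length field is defined as a length of 64.
  if (length == 0)
    length = 64;

  // Point 1.: an out-of-range bit field makes the result undefined.
  if (index + length > 64)
    return InsertqFold::ToUndef;

  // If the check above passed with length 64, index is necessarily 0,
  // so the whole low quadword of the source is inserted.
  if (length == 64)
    return InsertqFold::CopyLow64;

  return InsertqFold::NoFold;
}
```

Only the first two outcomes correspond to the cases this mail describes; a well-defined partial insert is simply not folded here.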

I added extra test cases to the existing test 'vec_demanded_elts.ll'.

Please let me know if it is ok to submit.
Thanks!
Andrea

http://reviews.llvm.org/D6583

Files:
  lib/Transforms/InstCombine/InstCombineCalls.cpp
  test/Transforms/InstCombine/vec_demanded_elts.ll

