[PATCH] D13161: [PATCH, PR24373] Combine shifts for x86

Fri Oct 23 03:17:26 PDT 2015

RKSimon added a comment.

My concern is the massive increase in register pressure in load_sext_16i1_to_16i16.

We can certainly improve vXi1 -> vXiY sign extension lowering: it should be vectorizable using a broadcast + variable shl/mul + immediate sra. But I'm worried that there will be other similar cases that we just don't see in the tests.
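
For reference, a rough scalar sketch in C of the broadcast + shl/mul + sra idea for the 16i1 -> 16i16 case. This is only a model of the per-lane arithmetic, not the actual lowering. The helper names (sra15, sext_16i1_to_16i16) are hypothetical, and it assumes lane i of the result comes from bit i of the packed mask.

```c
#include <stdint.h>

/* Portable model of an arithmetic shift right by 15 on one 16-bit lane:
   the sign bit is replicated across the whole lane. */
static int16_t sra15(uint16_t lane) {
    return (lane >> 15) ? (int16_t)-1 : (int16_t)0;
}

/* Hypothetical helper: sign-extend a packed 16 x i1 mask to 16 x i16.
   Assumption: lane i is taken from bit i of the mask. */
static void sext_16i1_to_16i16(uint16_t mask, int16_t out[16]) {
    for (int i = 0; i < 16; ++i) {
        /* Broadcast: every lane starts with a copy of the full mask. */
        uint16_t lane = mask;
        /* Per-lane constant 1 << (15 - i); multiplying by it is the
           "variable shl" expressed as a mul. */
        uint16_t mul = (uint16_t)(1u << (15 - i));
        /* Bit i of the mask lands in the sign bit; higher bits are
           truncated away by the 16-bit lane width. */
        lane = (uint16_t)((uint32_t)lane * mul);
        /* Immediate sra by 15 smears the sign bit: 0 or -1 (all ones). */
        out[i] = sra15(lane);
    }
}
```

In a vector version the loop body would be the same three operations applied to all lanes at once, with the multipliers held in a constant vector.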


================
Comment at: test/CodeGen/X86/sar_fold.ll:1
@@ +1,2 @@
+; RUN: llc < %s -O2 -mtriple=i686-unknown-unknown | FileCheck %s
+
----------------
Remove the -O2

================
Comment at: test/CodeGen/X86/sar_fold64.ll:1
@@ +1,2 @@
+; RUN: llc < %s -O2 -mtriple=x86_64-unknown-unknown | FileCheck %s
+
----------------
Remove the -O2

================
Comment at: test/CodeGen/X86/vector-sext.ll:1615
@@ -1621,3 +1614,3 @@
 
-define <16 x i16> @load_sext_16i1_to_16i16(<16 x i1> *%ptr) {
+define <16 x i16> @load_sext_16i1_to_16i16(<16 x i1> *%ptr) nounwind readnone {
 ; SSE2-LABEL: load_sext_16i1_to_16i16:
----------------
Turns out it didn't help ;-(


Repository:
  rL LLVM

http://reviews.llvm.org/D13161