[PATCH] D28777: [InstCombine][SSE] Add DemandedElts support for PACKSS/PACKUS instructions

Michael Kuperstein via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 17 19:14:45 PST 2017


mkuper added inline comments.


================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:1831
+    APInt DemandedElts = APInt::getAllOnesValue(VWidth);
+    if (Value *V = SimplifyDemandedVectorElts(II, DemandedElts, UndefElts)) {
+      if (V != II)
----------------
This isn't a question about this patch (and maybe I'm not the person who should be reviewing it :-) ), but I'm now confused about how this works at all.

What's the benefit of calling SimplifyDemandedVectorElts() on an instruction when you're demanding all elements, the instruction you're calling this on will also demand all the elements of its operands, and throwing UndefElts away? Is this just so that things have a chance to happen in the middle of the tree?


================
Comment at: lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp:1475
 
+    case Intrinsic::x86_sse2_packssdw_128:
+    case Intrinsic::x86_sse2_packsswb_128:
----------------
Add a TODO for the AVX512 versions?


================
Comment at: lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp:1495
+
+      // v8i16 PACK(v4i32 X, v4i32 Y) - (X[0..3],Y[0..3])
+      // v32i8 PACK(v16i16 X, v16i16 Y) - (X[0..7],Y[0..7]),(X[8..15],Y[8..15])
----------------
Can you make this more verbose? It's a bit opaque unless you already know what the layout looks like.


================
Comment at: lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp:1497
+      // v32i8 PACK(v16i16 X, v16i16 Y) - (X[0..7],Y[0..7]),(X[8..15],Y[8..15])
+      for (unsigned i = 0; i != NumLanes; ++i) {
+        for (unsigned j = 0; j != InnerVWidthPerLane; ++j) {
----------------
i -> Lane, j -> Elt (or something similar)?


================
Comment at: lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp:1527
+      for (unsigned i = 0; i != NumLanes; ++i) {
+        APInt LaneElts2 = UndefElts2.lshr(InnerVWidthPerLane * i);
+        APInt LaneElts3 = UndefElts3.lshr(InnerVWidthPerLane * i);
----------------
Maybe make UndefElts an array of 2 APInts? This will make this code slightly prettier.


Repository:
  rL LLVM

https://reviews.llvm.org/D28777





More information about the llvm-commits mailing list