[PATCH] D133850: [AArch64] Improve codegen for "trunc <4 x i64> to <4 x i8>" for all cases

Wed Sep 28 00:08:45 PDT 2022

mingmingl added a comment.

Thanks for the work. This LGTM overall. However, I don't consider myself a qualified reviewer as activity history shows, but willing to share and help by reviewing :-)

Following https://llvm.org/docs/CodeReview.html#lgtm-how-a-patch-is-accepted,  I think we could wait a little bit more for feedback (or approval :)) from other reviewers.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17843-17844
+  bool IsLittleEndian = DAG.getDataLayout().isLittleEndian();
+  if (IsLittleEndian && VT.isSimple() && VT.is64BitVector() &&
+      VT.getVectorNumElements() >= 2) {
+
----------------
nit pick about style:

It's more idiomatic to bail out early, see https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17876-17881
+    SDValue SourceOp0 = getSourceOp(Op0);
+    SDValue SourceOp1 = getSourceOp(Op1);
+    EVT DestVT = VT;
+
+    if (!SourceOp0 || !SourceOp1)
+      return SDValue();
----------------
nittest nit:

I'd probably move these before `auto HalfElementSize = lambda function`, and move 'HalfElementSize' closer to where they are used. 

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17883-17887
+    SDValue UzpOp0, UzpOp1, UzpResult;
+    if (HalfElementSize(SourceOp0, UzpOp0) &&
+        HalfElementSize(SourceOp1, UzpOp1)) {
+      if (UzpOp0.getValueType() != UzpOp1.getValueType())
+        return SDValue();
----------------
nit pick:

This is effectively require Uzp1 Op0 and Op1 have the same result type for correctness.

To avoid nested if and reduce indentation, probably move this out of `if (HalfElementSize(SourceOp0, UzpOp0) &&HalfElementSize(SourceOp1, UzpOp1))`, something like:

```
if (SourceOp0.getSimpleValueType() != SourceOp1.getSimpleValueType())
   return SDValue();

if (HalfElementSize(Op0..) && HalfElementSize(Op1..)) {
  assert (UzpOp0.getValueType() == UzpOp1.getValueType());
  ...
}
```

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17896
+
+    switch (DestVT.getSimpleVT().SimpleTy) {
+    case MVT::v2i32:
----------------
I wonder if it makes code more readable and succinct by combining this switch with the switch inside `HalfElementSize` above, given that `BitcastResultTy` and `Uzp1ResultTy` are co-related (i.e. 'BitcastResultTy' == 'TruncOperandTy', and 'TruncOperandTy' == bitcast 'Uzp1ResultTy' to double-element-size)

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17906-17907
+      break;
+    default:
+      return SDValue();
+    }
----------------
nit pick: this default should be 'llvm_unreachable' based on the context (since `HalfElementSize` switch rules out invalid cases).

(See the other comment) I wonder if merging two switches makes the code shorter without harming readability.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17630-17633
+    if (HalfElementSize(SourceOp0, UzpOp0) &&
+        HalfElementSize(SourceOp1, UzpOp1))
+      UzpResult = DAG.getNode(AArch64ISD::UZP1, DL, UzpOp0.getValueType(),
+                              UzpOp0, UzpOp1);
----------------
0x59616e wrote:
> mingmingl wrote:
> > If we see through `BITCAST` for Op0 and Op1 respectively but UzpOp0 and UzpOp1 have different return type (that could be casted to the same type), the current approach will feed `AArch64ISD::UZP1` with two SDValue of different type(see test case in https://godbolt.org/z/TT4ErT5Mf)
> > 
> > On the other hand, `AArch64ISD::UZP1` expects two operands have the same value type ([SDTypeProfile](https://github.com/llvm/llvm-project/blob/4ba1f04465859fc56514bd4dca21066eea595294/llvm/lib/Target/AArch64/AArch64InstrInfo.td#L284))
> > 
> > For little-endian, BITCAST both operands to the type of return value here should work.
> My algorithm requires the two trunc operates on the same type. If not, it won't work correctly. Take this for example:
> 
> x0 x1 x2 x3 x4 x5 x6 x7        y0 y1 y2 y3 y4 y5 y6 y7
> 
> Assume we trunc the left one from v2i32 to v2i16, the right one from v4i16 to v4i18, we get these:
> 
> 
> x0 x1 x2 x3 x4 x5 x6 x7 ______ y0 y1 y2 y3 y4 y5 y6 y7
> _______x _x________x_x___________ x_____x_____x____x
> 
> These two are asymmetric, we cannot reproduce the same result with `uzp1` since it operates on the two operands with the same action, i.e. it can only produce this:
> 
> x0 x1 x2 x3 x4 x5 x6 x7______y0 y1 y2 y3 y4 y5 y6 y7
> ______ x _x________x_x_____________x  _x________x__x
> 
> or this:
> 
> x0 x1 x2 x3 x4 x5 x6 x7______y0 y1 y2 y3 y4 y5 y6 y7
> ___x_____x_____x_____x__________x_____x____x_____x
> 
> But not both at the same time.
> 
> I bail out if the type of UzpOp0 and UzpOp1 has discrepancy.
> 
> 
Thanks for the example. The update to require the same operand type sounds reasonable.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133850/new/

https://reviews.llvm.org/D133850