[PATCH] D133850: [AArch64] Improve codegen for "trunc <4 x i64> to <4 x i8>" for all cases

Sheng via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 27 20:12:28 PDT 2022


0x59616e added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17589-17591
+    assert((SimpleVT == MVT::v4i16 || SimpleVT == MVT::v2i32 ||
+            SimpleVT == MVT::v8i8) &&
+           "Expect SimpleVT to be one of v4i16 or v2i32 or v8i8");
----------------
mingmingl wrote:
> (I wrote `assert` in the previous work as well)
> 
> On a second thought, it's more future proof to bail out if type is not one of {v4i116, v2i32, v8i8} in this context, given that UZP1 SDNode definition doesn't require vector element type to be integer (i.e. v4f16 is ok for compilation)
> 
> Something like
> 
> ```
> Type val;
> switch (SimpleVT) {
>   case valid-case1:
>     val = ...;
>     break;
>   case valid-case2;
>     val = ...
>     break;
>   default:
>     break;
> }
> 
> if val is not set
>   bail out
> ```
> 
> 
I use switch to implement this logic at line 17892 in the latest diff.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17630-17633
+    if (HalfElementSize(SourceOp0, UzpOp0) &&
+        HalfElementSize(SourceOp1, UzpOp1))
+      UzpResult = DAG.getNode(AArch64ISD::UZP1, DL, UzpOp0.getValueType(),
+                              UzpOp0, UzpOp1);
----------------
mingmingl wrote:
> If we see through `BITCAST` for Op0 and Op1 respectively but UzpOp0 and UzpOp1 have different return type (that could be casted to the same type), the current approach will feed `AArch64ISD::UZP1` with two SDValue of different type(see test case in https://godbolt.org/z/TT4ErT5Mf)
> 
> On the other hand, `AArch64ISD::UZP1` expects two operands have the same value type ([SDTypeProfile](https://github.com/llvm/llvm-project/blob/4ba1f04465859fc56514bd4dca21066eea595294/llvm/lib/Target/AArch64/AArch64InstrInfo.td#L284))
> 
> For little-endian, BITCAST both operands to the type of return value here should work.
My algorithm requires the two trunc operates on the same type. If not, it won't work correctly. Take this for example:

x0 x1 x2 x3 x4 x5 x6 x7        y0 y1 y2 y3 y4 y5 y6 y7

Assume we trunc the left one from v2i32 to v2i16, the right one from v4i16 to v4i18, we get these:


x0 x1 x2 x3 x4 x5 x6 x7 ______ y0 y1 y2 y3 y4 y5 y6 y7
_______x _x________x_x___________ x_____x_____x____x

These two are asymmetric, we cannot reproduce the same result with `uzp1` since it operates on the two operands with the same action, i.e. it can only produce this:

x0 x1 x2 x3 x4 x5 x6 x7______y0 y1 y2 y3 y4 y5 y6 y7
______ x _x________x_x_____________x  _x________x__x

or this:

x0 x1 x2 x3 x4 x5 x6 x7______y0 y1 y2 y3 y4 y5 y6 y7
___x_____x_____x_____x__________x_____x____x_____x

But not both at the same time.

I bail out if the type of UzpOp0 and UzpOp1 has discrepancy.




CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133850/new/

https://reviews.llvm.org/D133850



More information about the llvm-commits mailing list