[PATCH] D133850: [AArch64] Improve codegen for "trunc <4 x i64> to <4 x i8>" for all cases
Mingming Liu via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 28 00:08:45 PDT 2022
mingmingl added a comment.
Thanks for the work. This LGTM overall. However, I don't consider myself a qualified reviewer as activity history shows, but willing to share and help by reviewing :-)
Following https://llvm.org/docs/CodeReview.html#lgtm-how-a-patch-is-accepted, I think we could wait a little bit more for feedback (or approval :)) from other reviewers.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17843-17844
+ bool IsLittleEndian = DAG.getDataLayout().isLittleEndian();
+ if (IsLittleEndian && VT.isSimple() && VT.is64BitVector() &&
+ VT.getVectorNumElements() >= 2) {
+
----------------
nit pick about style:
It's more idiomatic to bail out early, see https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17876-17881
+ SDValue SourceOp0 = getSourceOp(Op0);
+ SDValue SourceOp1 = getSourceOp(Op1);
+ EVT DestVT = VT;
+
+ if (!SourceOp0 || !SourceOp1)
+ return SDValue();
----------------
nittest nit:
I'd probably move these before `auto HalfElementSize = lambda function`, and move 'HalfElementSize' closer to where they are used.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17883-17887
+ SDValue UzpOp0, UzpOp1, UzpResult;
+ if (HalfElementSize(SourceOp0, UzpOp0) &&
+ HalfElementSize(SourceOp1, UzpOp1)) {
+ if (UzpOp0.getValueType() != UzpOp1.getValueType())
+ return SDValue();
----------------
nit pick:
This is effectively require Uzp1 Op0 and Op1 have the same result type for correctness.
To avoid nested if and reduce indentation, probably move this out of `if (HalfElementSize(SourceOp0, UzpOp0) &&HalfElementSize(SourceOp1, UzpOp1))`, something like:
```
if (SourceOp0.getSimpleValueType() != SourceOp1.getSimpleValueType())
return SDValue();
if (HalfElementSize(Op0..) && HalfElementSize(Op1..)) {
assert (UzpOp0.getValueType() == UzpOp1.getValueType());
...
}
```
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17896
+
+ switch (DestVT.getSimpleVT().SimpleTy) {
+ case MVT::v2i32:
----------------
I wonder if it makes code more readable and succinct by combining this switch with the switch inside `HalfElementSize` above, given that `BitcastResultTy` and `Uzp1ResultTy` are co-related (i.e. 'BitcastResultTy' == 'TruncOperandTy', and 'TruncOperandTy' == bitcast 'Uzp1ResultTy' to double-element-size)
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17906-17907
+ break;
+ default:
+ return SDValue();
+ }
----------------
nit pick: this default should be 'llvm_unreachable' based on the context (since `HalfElementSize` switch rules out invalid cases).
(See the other comment) I wonder if merging two switches makes the code shorter without harming readability.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17630-17633
+ if (HalfElementSize(SourceOp0, UzpOp0) &&
+ HalfElementSize(SourceOp1, UzpOp1))
+ UzpResult = DAG.getNode(AArch64ISD::UZP1, DL, UzpOp0.getValueType(),
+ UzpOp0, UzpOp1);
----------------
0x59616e wrote:
> mingmingl wrote:
> > If we see through `BITCAST` for Op0 and Op1 respectively but UzpOp0 and UzpOp1 have different return type (that could be casted to the same type), the current approach will feed `AArch64ISD::UZP1` with two SDValue of different type(see test case in https://godbolt.org/z/TT4ErT5Mf)
> >
> > On the other hand, `AArch64ISD::UZP1` expects two operands have the same value type ([SDTypeProfile](https://github.com/llvm/llvm-project/blob/4ba1f04465859fc56514bd4dca21066eea595294/llvm/lib/Target/AArch64/AArch64InstrInfo.td#L284))
> >
> > For little-endian, BITCAST both operands to the type of return value here should work.
> My algorithm requires the two trunc operates on the same type. If not, it won't work correctly. Take this for example:
>
> x0 x1 x2 x3 x4 x5 x6 x7 y0 y1 y2 y3 y4 y5 y6 y7
>
> Assume we trunc the left one from v2i32 to v2i16, the right one from v4i16 to v4i18, we get these:
>
>
> x0 x1 x2 x3 x4 x5 x6 x7 ______ y0 y1 y2 y3 y4 y5 y6 y7
> _______x _x________x_x___________ x_____x_____x____x
>
> These two are asymmetric, we cannot reproduce the same result with `uzp1` since it operates on the two operands with the same action, i.e. it can only produce this:
>
> x0 x1 x2 x3 x4 x5 x6 x7______y0 y1 y2 y3 y4 y5 y6 y7
> ______ x _x________x_x_____________x _x________x__x
>
> or this:
>
> x0 x1 x2 x3 x4 x5 x6 x7______y0 y1 y2 y3 y4 y5 y6 y7
> ___x_____x_____x_____x__________x_____x____x_____x
>
> But not both at the same time.
>
> I bail out if the type of UzpOp0 and UzpOp1 has discrepancy.
>
>
Thanks for the example. The update to require the same operand type sounds reasonable.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D133850/new/
https://reviews.llvm.org/D133850
More information about the llvm-commits
mailing list