[PATCH] D130487: [PowerPC] Fix vector_shuffle combines when inputs are scalar_to_vector of differing types.

Mon Oct 31 07:31:53 PDT 2022

nemanjai added a comment.

Can you also comment on whether this was thoroughly tested on both little endian and big endian systems (bootstrap, test-suite, SPEC, additional internal tests)?

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:14917-14918
+    // to the size of the scalar input to the SCALAR_TO_VECTOR later on.
+    int LHSScalarSize = 128;
+    int RHSScalarSize = 128;

----------------
I don't follow why we need these here. Each one seems to be needed only inside its respective condition (i.e. depending on whether the LHS/RHS is a `scalar_to_vector` node). And within those conditional blocks, they are reset before they're used.

So why do we need to define them here and initialize them to the width of a vector?
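
To illustrate the scoping I have in mind, here is a standalone sketch. It is not the code from PPCISelLowering.cpp: `Operand`, its fields, and `totalScalarBits` are hypothetical stand-ins made up for this example, and only the names `LHSScalarSize`/`RHSScalarSize` come from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for a shuffle operand; this is not an LLVM type.
struct Operand {
  bool IsScalarToVector; // true if the operand is a scalar_to_vector node
  int ScalarSizeInBits;  // width of the scalar input when the flag is set
};

// Toy function (assumption, not from the patch): sums the scalar widths of
// the operands that are scalar_to_vector nodes.
int totalScalarBits(const Operand &LHS, const Operand &RHS) {
  int Total = 0;
  if (LHS.IsScalarToVector) {
    // Declared where it is first needed, so no placeholder value (such as
    // 128) is required at function scope.
    int LHSScalarSize = LHS.ScalarSizeInBits;
    Total += LHSScalarSize;
  }
  if (RHS.IsScalarToVector) {
    int RHSScalarSize = RHS.ScalarSizeInBits;
    Total += RHSScalarSize;
  }
  return Total;
}
```

If the sizes really are needed after the conditional blocks, then the function-scope definitions make sense, but the comment should say what the initial value of 128 means in that case.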

================
Comment at: llvm/test/CodeGen/PowerPC/v16i8_scalar_to_vector_shuffle.ll:267

 define <16 x i8> @test_v16i8_v8i16(i16 %arg, i8 %arg1) {
 ; CHECK-LE-P8-LABEL: test_v16i8_v8i16:
----------------
The code for this one gets worse on big endian. Do we know why?

================
Comment at: llvm/test/CodeGen/PowerPC/v16i8_scalar_to_vector_shuffle.ll:347

 define <16 x i8> @test_v8i16_v16i8(i16 %arg, i8 %arg1) {
 ; CHECK-LE-P8-LABEL: test_v8i16_v16i8:
----------------
The code for this one gets worse on big endian. Do we know why?

================
Comment at: llvm/test/CodeGen/PowerPC/v16i8_scalar_to_vector_shuffle.ll:578

 define <16 x i8> @test_v16i8_v4i32(i8 %arg, i32 %arg1, <16 x i8> %a, <4 x i32> %b) {
 ; CHECK-LE-P8-LABEL: test_v16i8_v4i32:
----------------
The code for this one gets worse on big endian. Do we know why?

================
Comment at: llvm/test/CodeGen/PowerPC/v16i8_scalar_to_vector_shuffle.ll:659

 define <16 x i8> @test_v4i32_v16i8(i32 %arg, i8 %arg1) {
 ; CHECK-LE-P8-LABEL: test_v4i32_v16i8:
----------------
The code for this one gets worse on big endian. Do we know why?

================
Comment at: llvm/test/CodeGen/PowerPC/v16i8_scalar_to_vector_shuffle.ll:1431

 define <16 x i8> @test_v8i16_v4i32(<8 x i16> %a, <4 x i32> %b, i16 %arg, i32 %arg1) {
 ; CHECK-LE-P8-LABEL: test_v8i16_v4i32:
----------------
The code for this one gets worse on big endian. Do we know why?

================
Comment at: llvm/test/CodeGen/PowerPC/v16i8_scalar_to_vector_shuffle.ll:1658

 define <16 x i8> @test_v4i32_v8i16(i32 %arg, i16 %arg1) {
 ; CHECK-LE-P8-LABEL: test_v4i32_v8i16:
----------------
The code for this one gets worse on big endian. Do we know why?

There are probably a bunch of other tests where the big endian output regresses in the same way. Can you please review what is happening there? I'll stop adding further similar comments.

================
Comment at: llvm/test/CodeGen/PowerPC/v4i32_scalar_to_vector_shuffle.ll:123

 define void @test_v8i16_none(ptr %a) {
 ; CHECK-LE-P8-LABEL: test_v8i16_none:
----------------
The code generated for this one gets worse on all subtargets. Do we know why?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130487/new/

https://reviews.llvm.org/D130487