[PATCH] D153972: [AArch64] Fold tree of offset loads combine

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 29 00:45:24 PDT 2023


dmgreen added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:18406
+        B.getOperand(1).getOpcode() != ISD::CONCAT_VECTORS ||
+        B.getOperand(1).getNumOperands() != 4)
+      return false;
----------------
SjoerdMeijer wrote:
> Not sure if it deserves a comment, but why the 4 here?
This code is just trying to match the instructions in the comment above, i.e. the concat of 4 loads through the shuffles, hence the hard-coded 4. This code is a bit of a shame, as we don't reliably visit operands before the root node and will not revisit the root after optimizing the leaf operands. See D152928, which should fix that so we can rely on the BUILD_VECTOR code above alone. Working through all the regressions from that looks like it will take a long time though, and in the meantime I didn't think it was best to make something very complex just for it to be removed later.
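
For reference, here's a minimal sketch of the shape that guard is checking, written against the generic SelectionDAG API. The helper name and factoring are hypothetical, not the patch's actual code:
```
#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Hypothetical helper: does the second operand of B feed from a
// CONCAT_VECTORS of exactly four scalar loads, matching the tree in
// the comment above?
static bool isConcatOfFourLoads(SDValue B) {
  SDValue Concat = B.getOperand(1);
  // The matched tree is specifically a concat of 4 loads through the
  // shuffles, hence the hard-coded operand count of 4.
  if (Concat.getOpcode() != ISD::CONCAT_VECTORS ||
      Concat.getNumOperands() != 4)
    return false;
  for (unsigned I = 0; I != 4; ++I)
    if (Concat.getOperand(I).getOpcode() != ISD::LOAD)
      return false;
  return true;
}
```
Once D152928 lands and operands are reliably simplified before the root, the BUILD_VECTOR path alone should cover this shape and a guard like this can be removed.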


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:18462
+
+  if (Op0.getOpcode() != Op1.getOpcode() || !Op0.hasOneUse() ||
+      !Op1.hasOneUse())
----------------
SjoerdMeijer wrote:
> Nit: can this be checked first in the function, as an early exit before doing more work in `isLoadOrMultipleLoads`?
The `Op0.getOpcode() != Op1.getOpcode()` comparison can trigger for loads even when the fold applies, due to the same issue of nodes not being reliably simplified before their operands, so checking it first would bail too early. The full tree we are folding is below, where the LHS and RHS are not equally simplified (a sketch of the resulting check ordering follows the dump):
```
                t24: i64 = add nuw t22, Constant:i64<4>
              t160: f32,ch = load<(load (s32) from %ir.19, align 1, !tbaa !5)> t0, t24, undef:i64
                t20: i64 = add nuw t18, Constant:i64<4>
              t161: f32,ch = load<(load (s32) from %ir.15, align 1, !tbaa !5)> t0, t20, undef:i64
                t16: i64 = add nuw t14, Constant:i64<4>
              t162: f32,ch = load<(load (s32) from %ir.11, align 1, !tbaa !5)> t0, t16, undef:i64
                t12: i64 = add nuw t2, Constant:i64<4>
              t151: f32,ch = load<(load (s32) from %ir.7, align 1, !tbaa !5)> t0, t12, undef:i64 
            t168: v4f32 = BUILD_VECTOR t160, t161, t162, t151
          t167: v16i8 = bitcast t168
        t122: v16i16 = zero_extend t167
                t25: i64 = add nuw t23, Constant:i64<4>
              t139: f32,ch = load<(load (s32) from %ir.20, align 1, !tbaa !5)> t0, t25, undef:i64
                t21: i64 = add nuw t19, Constant:i64<4>                                          
              t140: f32,ch = load<(load (s32) from %ir.16, align 1, !tbaa !5)> t0, t21, undef:i64
                t17: i64 = add nuw t15, Constant:i64<4>                                          
              t141: f32,ch = load<(load (s32) from %ir.12, align 1, !tbaa !5)> t0, t17, undef:i64
                t13: i64 = add nuw t6, Constant:i64<4>                                           
              t130: f32,ch = load<(load (s32) from %ir.8, align 1, !tbaa !5)> t0, t13, undef:i64 
            t147: v4f32 = BUILD_VECTOR t139, t140, t141, t130                                    
          t146: v16i8 = bitcast t147
        t123: v16i16 = zero_extend t146
      t124: v16i16 = sub t122, t123
    t126: v16i32 = any_extend t124
    t72: v16i32 = BUILD_VECTOR Constant:i32<16>, Constant:i32<16>, ...                           
  t73: v16i32 = shl nsw t126, t72
            t206: f32,ch = load<(load (s32) from %ir.17, align 1, !tbaa !5)> t0, t22, undef:i64
            t207: f32,ch = load<(load (s32) from %ir.13, align 1, !tbaa !5)> t0, t18, undef:i64
            t208: f32,ch = load<(load (s32) from %ir.9, align 1, !tbaa !5)> t0, t14, undef:i64
            t197: f32,ch = load<(load (s32) from %ir.0, align 1, !tbaa !5)> t0, t2, undef:i64
          t214: v4f32 = BUILD_VECTOR t206, t207, t208, t197
        t213: v16i8 = bitcast t214
      t169: v16i16 = zero_extend t213
            t185: f32,ch = load<(load (s32) from %ir.18, align 1, !tbaa !5)> t0, t23, undef:i64  
            t186: f32,ch = load<(load (s32) from %ir.14, align 1, !tbaa !5)> t0, t19, undef:i64  
            t187: f32,ch = load<(load (s32) from %ir.10, align 1, !tbaa !5)> t0, t15, undef:i64  
            t176: f32,ch = load<(load (s32) from %ir.2, align 1, !tbaa !5)> t0, t6, undef:i64
          t193: v4f32 = BUILD_VECTOR t185, t186, t187, t176
        t192: v16i8 = bitcast t193
      t170: v16i16 = zero_extend t192
    t171: v16i16 = sub t169, t170
  t172: v16i32 = sign_extend t171
t74: v16i32 = add nsw t73, t172
```
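
To make that ordering concrete, here's a minimal sketch (hypothetical names, not the patch itself) of why the loads get gathered, via something like `isLoadOrMultipleLoads`, before any opcode comparison: each side is matched independently, so one side can already be a BUILD_VECTOR of loads while the other is still unsimplified, and comparing opcodes up front would wrongly bail on trees like the one above.
```
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Hypothetical stand-in for isLoadOrMultipleLoads: accept either a
// bare load or a BUILD_VECTOR whose operands are all loads, and
// collect the leaf loads either way.
static bool gatherLeafLoads(SDValue Op, SmallVectorImpl<SDValue> &Loads) {
  if (Op.getOpcode() == ISD::LOAD) {
    Loads.push_back(Op);
    return true;
  }
  if (Op.getOpcode() == ISD::BUILD_VECTOR) {
    for (unsigned I = 0, E = Op.getNumOperands(); I != E; ++I) {
      if (Op.getOperand(I).getOpcode() != ISD::LOAD)
        return false;
      Loads.push_back(Op.getOperand(I));
    }
    return true;
  }
  return false;
}

static bool matchBothSides(SDValue Op0, SDValue Op1) {
  SmallVector<SDValue, 4> Loads0, Loads1;
  // Gather first: Op0 and Op1 may be in different forms (e.g. one
  // side already folded to a BUILD_VECTOR of loads), so checking
  // Op0.getOpcode() != Op1.getOpcode() before this point would
  // reject trees like the dump above.
  if (!gatherLeafLoads(Op0, Loads0) || !gatherLeafLoads(Op1, Loads1))
    return false;
  // Only once both sides resolve to leaf loads does a shape
  // comparison make sense.
  return Loads0.size() == Loads1.size();
}
```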


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153972/new/

https://reviews.llvm.org/D153972


