[PATCH] D140069: [DAGCombiner] Scalarize vectorized loads that are splatted

Thu Dec 15 21:33:03 PST 2022

pengfei added inline comments.

================
Comment at: llvm/test/CodeGen/X86/half.ll:1342-1344
+; BWON-F16C-NEXT:    vpinsrw $0, 8(%rdi), %xmm0, %xmm0
+; BWON-F16C-NEXT:    vpshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
+; BWON-F16C-NEXT:    vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
----------------
luke wrote:
> @pengfei This looks like a regression: the scalarized load t18 gets selected as `VPINSRWrm`.
> 
> ```
>   t0: ch,glue = EntryToken
>                     t2: i64,ch = CopyFromReg t0, Register:i64 %0
>                   t17: i64 = add t2, Constant:i64<8>
>                 t18: f16,ch = load<(load (s16) from %ir.p + 8, align 8)> t0, t17, undef:i64
>               t21: v8f16 = scalar_to_vector t18
>             t23: v8i16 = bitcast t21
>           t28: v8i16 = X86ISD::PSHUFLW t23, TargetConstant:i8<0>
>         t29: v4i32 = bitcast t28
>       t30: v4i32 = X86ISD::PSHUFD t29, TargetConstant:i8<0>
>     t36: v8f16 = bitcast t30
>   t10: ch,glue = CopyToReg t0, Register:v8f16 $xmm0, t36
>   t11: ch = X86ISD::RET_FLAG t10, TargetConstant:i32<0>, Register:v8f16 $xmm0, t10:1
> ```
Right. I think this is a special case. Older targets have no native scalar instructions to load or store `half` or `bfloat`, so we have to emulate them with the more expensive pinsrw/pextrw. That makes scalar load/store operations suboptimal compared to vector ones.
Have you noticed whether other targets have a similar problem? It would be better to find a way to avoid the regression; otherwise, I think we can add a FIXME for the moment.
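
For anyone reading the CHECK lines above, here is a rough, portable C++ model of what the three-instruction sequence computes. It is only a sketch for illustration: the names `V8i16`, `pinsrw0`, `pshuflw0`, `pshufd0` and `splatHalfAt` are made up here and are not LLVM or intrinsic APIs, the undefined input register is modelled as zeros, and the code says nothing about instruction cost.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of a 128-bit register viewed as eight 16-bit lanes.
using V8i16 = std::array<std::uint16_t, 8>;

// Models `vpinsrw $0, mem, src, dst`: lane 0 is replaced by a 16-bit value
// read from memory, the other lanes are passed through.
V8i16 pinsrw0(V8i16 v, const void *mem) {
  std::uint16_t x;
  std::memcpy(&x, mem, sizeof(x));
  v[0] = x;
  return v;
}

// Models `vpshuflw` with result mask [0,0,0,0,4,5,6,7]: the low four words
// all take lane 0, the high four words are unchanged.
V8i16 pshuflw0(const V8i16 &v) {
  return {v[0], v[0], v[0], v[0], v[4], v[5], v[6], v[7]};
}

// Models `vpshufd` with result mask [0,0,0,0]: every 32-bit element takes
// dword 0, which is words 0 and 1.
V8i16 pshufd0(const V8i16 &v) {
  return {v[0], v[1], v[0], v[1], v[0], v[1], v[0], v[1]};
}

// Hypothetical helper: load one 16-bit element at a byte offset and splat it,
// mirroring the pinsrw + pshuflw + pshufd sequence from the test output.
V8i16 splatHalfAt(const unsigned char *p, std::size_t byteOffset) {
  V8i16 undef{}; // stands in for the undefined xmm0 input
  return pshufd0(pshuflw0(pinsrw0(undef, p + byteOffset)));
}
```

With a byte offset of 8, as in the `8(%rdi)` operand, every lane of the result holds the 16-bit element at index 4 of the source buffer.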

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140069/new/

https://reviews.llvm.org/D140069