[llvm] 9fcd212 - [X86] Remove isel patterns from broadcast of loadi32.
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 2 12:27:52 PST 2020
Craig,
I might not be understanding you correctly, but on the surface, this
seems like a fairly common case. Wouldn't something like the following
trigger this?
struct T {
  uint64_t j;
  uint8_t k;
};

void foo(uint64_t *a, struct T &t) {
  for (int i = 0; i < N; i++) {
    a[i] += (uint64_t)t.k;
  }
}
Given an 8-byte alignment of objects and a packed layout, the 8-bit
field would have an 8-byte starting alignment. After vectorization, I'd
expect to see a load of the field outside the loop, followed by an
extend and broadcast to VF x i64. Wouldn't that create exactly the
pattern you removed?
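To make that concrete, here is a rough sketch (mine, not from the commit; the function and value names are made up) of the IR I'd expect the vectorizer to produce for a VF of 4:

```llvm
; Hypothetical vectorizer output: the 8-byte-aligned i8 field is loaded
; once outside the loop, zero-extended, and splatted to <4 x i64>.
define void @foo(i64* %a, i8* %t.k) {
entry:
  %k = load i8, i8* %t.k, align 8     ; aligned load of the i8 field
  %k.ext = zext i8 %k to i64
  %ins = insertelement <4 x i64> undef, i64 %k.ext, i32 0
  %splat = shufflevector <4 x i64> %ins, <4 x i64> undef,
                         <4 x i32> zeroinitializer
  ; ... vector loop body adding %splat into a[i..i+3] ...
  ret void
}
```

If legalization turns that align-8 i8 load into an aligned i32 extload, it would seem to match the (X86VBroadcast (loadi32 ...)) patterns being deleted.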
Philip
On 2/28/20 4:39 PM, Craig Topper via llvm-commits wrote:
> Author: Craig Topper
> Date: 2020-02-28T16:39:27-08:00
> New Revision: 9fcd212e2f678fdbdf304399a1e58ca490dc54d1
>
> URL: https://github.com/llvm/llvm-project/commit/9fcd212e2f678fdbdf304399a1e58ca490dc54d1
> DIFF: https://github.com/llvm/llvm-project/commit/9fcd212e2f678fdbdf304399a1e58ca490dc54d1.diff
>
> LOG: [X86] Remove isel patterns from broadcast of loadi32.
>
> We already combine non-extending loads with broadcasts in DAG
> combine. All these patterns are picking up is the aligned extload
> special case. But the only lit test we have that exercises it is
> using a v8i1 load that the datalayout reports as align 8. That
> seems generous. So without a realistic test case, I don't think
> there is much value in these patterns.
>
> Added:
>
>
> Modified:
> llvm/lib/Target/X86/X86InstrAVX512.td
> llvm/lib/Target/X86/X86InstrSSE.td
> llvm/test/CodeGen/X86/vector-sext.ll
>
> Removed:
>
>
>
> ################################################################################
> diff --git a/llvm/lib/Target/X86/X86InstrAVX512.td b/llvm/lib/Target/X86/X86InstrAVX512.td
> index a2bd6a2853a0..1d3ef67c9d3d 100644
> --- a/llvm/lib/Target/X86/X86InstrAVX512.td
> +++ b/llvm/lib/Target/X86/X86InstrAVX512.td
> @@ -1427,10 +1427,6 @@ let Predicates = [HasAVX512] in {
> // 32-bit targets will fail to load a i64 directly but can use ZEXT_LOAD.
> def : Pat<(v8i64 (X86VBroadcast (v2i64 (X86vzload64 addr:$src)))),
> (VPBROADCASTQZrm addr:$src)>;
> -
> - // FIXME this is to handle aligned extloads from i8.
> - def : Pat<(v16i32 (X86VBroadcast (loadi32 addr:$src))),
> - (VPBROADCASTDZrm addr:$src)>;
> }
>
> let Predicates = [HasVLX] in {
> @@ -1439,12 +1435,6 @@ let Predicates = [HasVLX] in {
> (VPBROADCASTQZ128rm addr:$src)>;
> def : Pat<(v4i64 (X86VBroadcast (v2i64 (X86vzload64 addr:$src)))),
> (VPBROADCASTQZ256rm addr:$src)>;
> -
> - // FIXME this is to handle aligned extloads from i8.
> - def : Pat<(v4i32 (X86VBroadcast (loadi32 addr:$src))),
> - (VPBROADCASTDZ128rm addr:$src)>;
> - def : Pat<(v8i32 (X86VBroadcast (loadi32 addr:$src))),
> - (VPBROADCASTDZ256rm addr:$src)>;
> }
> let Predicates = [HasVLX, HasBWI] in {
> // loadi16 is tricky to fold, because !isTypeDesirableForOp, justifiably.
>
> diff --git a/llvm/lib/Target/X86/X86InstrSSE.td b/llvm/lib/Target/X86/X86InstrSSE.td
> index e66f15747787..73bba723ab96 100644
> --- a/llvm/lib/Target/X86/X86InstrSSE.td
> +++ b/llvm/lib/Target/X86/X86InstrSSE.td
> @@ -7529,12 +7529,6 @@ let Predicates = [HasAVX2, NoVLX] in {
> (VPBROADCASTQrm addr:$src)>;
> def : Pat<(v4i64 (X86VBroadcast (v2i64 (X86vzload64 addr:$src)))),
> (VPBROADCASTQYrm addr:$src)>;
> -
> - // FIXME this is to handle aligned extloads from i8/i16.
> - def : Pat<(v4i32 (X86VBroadcast (loadi32 addr:$src))),
> - (VPBROADCASTDrm addr:$src)>;
> - def : Pat<(v8i32 (X86VBroadcast (loadi32 addr:$src))),
> - (VPBROADCASTDYrm addr:$src)>;
> }
> let Predicates = [HasAVX2, NoVLX_Or_NoBWI] in {
> // loadi16 is tricky to fold, because !isTypeDesirableForOp, justifiably.
>
> diff --git a/llvm/test/CodeGen/X86/vector-sext.ll b/llvm/test/CodeGen/X86/vector-sext.ll
> index 44ba29d978e2..0b35db5cadb2 100644
> --- a/llvm/test/CodeGen/X86/vector-sext.ll
> +++ b/llvm/test/CodeGen/X86/vector-sext.ll
> @@ -2259,7 +2259,8 @@ define <8 x i32> @load_sext_8i1_to_8i32(<8 x i1> *%ptr) {
> ;
> ; AVX2-LABEL: load_sext_8i1_to_8i32:
> ; AVX2: # %bb.0: # %entry
> -; AVX2-NEXT: vpbroadcastd (%rdi), %ymm0
> +; AVX2-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero
> +; AVX2-NEXT: vpbroadcastd %xmm0, %ymm0
> ; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [1,2,4,8,16,32,64,128]
> ; AVX2-NEXT: vpand %ymm1, %ymm0, %ymm0
> ; AVX2-NEXT: vpcmpeqd %ymm1, %ymm0, %ymm0
>
>
>