[PATCH] D109348: [X86][AVX] Prohibit creating X86ISD::VBROADCAST(128->256) when it is AVX in combineConcatVectorOps

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 8 04:57:08 PDT 2021


RKSimon added inline comments.


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:50997
+    // Before AVX2, we don't have vbroadcast instruction(128->256), we prohibit
+    // creating such vbroadcast.
+    if (Op0.getOpcode() == X86ISD::VBROADCAST &&
----------------
yubing wrote:
> RKSimon wrote:
> > yubing wrote:
> > > RKSimon wrote:
> > > > Why not add isel patterns to handle 128 ->256 like we do for scalar broadcasts?
> > > I didn't got your point. Would you show the code?
> > > BTW, X86ISD::VBROADCAST(A:v2f64->B:v4f64) is broadcasting A's low f64 to B's four elements, the corresponding instruction only exist in avx2(or after avx2). Do you mean lowering into broadcast_load or something else in isel pattern?
> > This is what I had in mind - expand what is already in X86InstrSSE.td:
> > ```
> > let Predicates = [HasAVX1Only] in {
> >   def : Pat<(v4f32 (X86VBroadcast FR32:$src)),
> >             (VPERMILPSri (v4f32 (COPY_TO_REGCLASS FR32:$src, VR128)), 0)>;
> >   def : Pat<(v8f32 (X86VBroadcast FR32:$src)),
> >             (VINSERTF128rr (INSERT_SUBREG (v8f32 (IMPLICIT_DEF)),
> >               (v4f32 (VPERMILPSri (v4f32 (COPY_TO_REGCLASS FR32:$src, VR128)), 0)), sub_xmm),
> >               (v4f32 (VPERMILPSri (v4f32 (COPY_TO_REGCLASS FR32:$src, VR128)), 0)), 1)>;
> >   def : Pat<(v8f32 (X86VBroadcast v4f32:$src)),
> >             (VINSERTF128rr (INSERT_SUBREG (v8f32 (IMPLICIT_DEF)),
> >               (v4f32 (VPERMILPSri VR128:$src, 0)), sub_xmm),
> >               (v4f32 (VPERMILPSri VR128:$src, 0)), 1)>;
> >   def : Pat<(v4f64 (X86VBroadcast FR64:$src)),
> >             (VINSERTF128rr (INSERT_SUBREG (v4f64 (IMPLICIT_DEF)),
> >               (v2f64 (VMOVDDUPrr (v2f64 (COPY_TO_REGCLASS FR64:$src, VR128)))), sub_xmm),
> >               (v2f64 (VMOVDDUPrr (v2f64 (COPY_TO_REGCLASS FR64:$src, VR128)))), 1)>;
> >   def : Pat<(v4f64 (X86VBroadcast v2f64:$src)),
> >             (VINSERTF128rr (INSERT_SUBREG (v4f64 (IMPLICIT_DEF)),
> >               (v2f64 (VMOVDDUPrr VR128:$src)), sub_xmm),
> >               (v2f64 (VMOVDDUPrr VR128:$src)), 1)>;
> > 
> >   def : Pat<(v4i32 (X86VBroadcast GR32:$src)),
> >             (VPSHUFDri (VMOVDI2PDIrr GR32:$src), 0)>;
> >   def : Pat<(v8i32 (X86VBroadcast GR32:$src)),
> >             (VINSERTF128rr (INSERT_SUBREG (v8i32 (IMPLICIT_DEF)),
> >               (v4i32 (VPSHUFDri (VMOVDI2PDIrr GR32:$src), 0)), sub_xmm),
> >               (v4i32 (VPSHUFDri (VMOVDI2PDIrr GR32:$src), 0)), 1)>;
> >   def : Pat<(v8i32 (X86VBroadcast v4i32:$src)),
> >             (VINSERTF128rr (INSERT_SUBREG (v8i32 (IMPLICIT_DEF)),
> >               (v4i32 (VPSHUFDri VR128:$src, 0)), sub_xmm),
> >               (v4i32 (VPSHUFDri VR128:$src, 0)), 1)>;
> >   def : Pat<(v4i64 (X86VBroadcast GR64:$src)),
> >             (VINSERTF128rr (INSERT_SUBREG (v4i64 (IMPLICIT_DEF)),
> >               (v4i32 (VPSHUFDri (VMOV64toPQIrr GR64:$src), 0x44)), sub_xmm),
> >               (v4i32 (VPSHUFDri (VMOV64toPQIrr GR64:$src), 0x44)), 1)>;
> >   def : Pat<(v4i64 (X86VBroadcast v2i64:$src)),
> >             (VINSERTF128rr (INSERT_SUBREG (v4i64 (IMPLICIT_DEF)),
> >               (v4i32 (VPSHUFDri VR128:$src, 0x44)), sub_xmm),
> >               (v4i32 (VPSHUFDri VR128:$src, 0x44)), 1)>;
> > 
> >   def : Pat<(v2i64 (X86VBroadcast i64:$src)),
> >             (VPSHUFDri (VMOV64toPQIrr GR64:$src), 0x44)>;
> >   def : Pat<(v2i64 (X86VBroadcastld64 addr:$src)),
> >             (VMOVDDUPrm addr:$src)>;
> > }
> > ```
> It confused me now because what you said in comments is expanding vbroadcast into several instructions while what we do in lowering is combine several nodes into vbroadcast. I think there is redundant work here. Shouldn't we prohibit redundant combine at the beginning?
Creating the broadcast node in DAG (even if it expands later on) has the nice effect that the input will often then have hasOneUse, which permits further load folding, optimizations, SimplifyDemandedVectorElts etc.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109348/new/

https://reviews.llvm.org/D109348



More information about the llvm-commits mailing list