[PATCH] D92645: [X86] Add X86ISD::SUBV_BROADCAST_LOAD and begin removing X86ISD::SUBV_BROADCAST (PR38969)

Sat Dec 12 04:36:40 PST 2020

RKSimon added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:44573
+        Extract = DAG.getBitcast(RegVT, Extract);
+        return DCI.CombineTo(N, Extract, SDValue(User, 1));
+      }
----------------
yubing wrote:
> RKSimon wrote:
> > yubing wrote:
> > > Hi, Simon. If we have a load and subv_broadcast_load which has the same SRC, the load will be combined into a extract_subv from subv_broadcast_load. But If some optimization delete subv_broadcast_load's other users and make it has only one user i.e. extract_subv, will extract_subv(subv_broadcast_load) roll back to a simple load?
> > Yes the new SimplifyDemandedVectorElts code at line 38024 should handle that.
> Besides, if there are two subv_broadcast_load(e.g. one is 4xi32->8xi32, another is 4xi32->16xi32) which has the same SRC, the subv_broadcast_load(4xi32->8xi32) is not combined into extract_subv(subv_broadcast_load(4xi32->16xi32)) with your current patch. Should we create a combinesubv_broadcast_load() procedure to handle with such case?
> 
> 
> ```
> @ga4 = global <4 x i32> zeroinitializer, align 8
> @gb4 = global <8 x i32> zeroinitializer, align 8
> @gc4 = global <16 x i32> zeroinitializer, align 8
> define void @main(<4 x i32> %a, <8 x i32> %b, <16 x i32> %c) {
> entry:
>   %0 = add <4 x i32> %a, <i32 1, i32 2, i32 3, i32 4>
>   %1 = add <8 x i32> %b, <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
>   %2 = and <8 x i32> %1, <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
>   %3 = add <16 x i32> %c, <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
>   %4 = and <16 x i32> %3, <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
>   store <4 x i32> %0, <4 x i32>* @ga4, align 8
>   store <8 x i32> %2, <8 x i32>* @gb4, align 8
>   store <16 x i32> %4, <16 x i32>* @gc4, align 8
>   ret void
> }
> ```
> 
Sure - I can look at adding that as part of this patch, although I don't think the existing X86ISD::SUBV_BROADCAST code does a good job of this either.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92645/new/

https://reviews.llvm.org/D92645