[PATCH] D146494: [X86] Combine constant vector inputs for FMA

Evgenii Kudriashov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 28 19:02:23 PDT 2023


e-kud added inline comments.


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:54037
+    if (Subtarget.hasAVX2() && V.hasOneUse() &&
+        ISD::isConstantSplatVector(V.getNode(), SplatValue)) {
+      SmallVector<SDValue, 8> Ops;
----------------
goldstein.w.n wrote:
> goldstein.w.n wrote:
> > goldstein.w.n wrote:
> > > RKSimon wrote:
> > > > RKSimon wrote:
> > > > > pengfei wrote:
> > > > > > Why just limit to splat constant?
> > > > > Also - why the AVX2 and hasOneUse limits?
> > > > Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants
> > > > Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants
> > > 
> > > That should be commented explicitly as these subtle checks to prevent infinite loops can easily be removed accidentally.
> > > 
> > > Also why AVX2 still?
> > > Also - why the AVX2 and hasOneUse limits?
> > 
> > 
> > Why just limit to splat constant?
> 
> Should be match on ImmConstant no?
> Also - why the AVX2 and hasOneUse limits?

> Why just limit to splat constant?

This is to limit to broadcasts, probably it is better to work with broadcast explicitly.

I tried more general approach but it seems not much profitable. E.g. 
```
@@ -1,7 +1,7 @@
-; FMA3-LABEL: negated_constant_v4f64:
 ; FMA3:       # %bb.0:
 ; FMA3-NEXT:    vmovapd {{.*#+}} ymm2 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
-; FMA3-NEXT:    vfmadd213pd {{.*#+}} ymm2 = (ymm0 * ymm2) + ymm1
-; FMA3-NEXT:    vfmadd231pd {{.*#+}} ymm1 = (ymm0 * mem) + ymm1
-; FMA3-NEXT:    vaddpd %ymm1, %ymm2, %ymm0
+; FMA3-NEXT:    vmovapd %ymm2, %ymm3
+; FMA3-NEXT:    vfmadd213pd {{.*#+}} ymm3 = (ymm0 * ymm3) + ymm1
+; FMA3-NEXT:    vfnmadd213pd {{.*#+}} ymm2 = -(ymm0 * ymm2) + ymm1
+; FMA3-NEXT:    vaddpd %ymm2, %ymm3, %ymm0
 ; FMA3-NEXT:    retq
```

What do you think?


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:54037
+    if (Subtarget.hasAVX2() && V.hasOneUse() &&
+        ISD::isConstantSplatVector(V.getNode(), SplatValue)) {
+      SmallVector<SDValue, 8> Ops;
----------------
e-kud wrote:
> goldstein.w.n wrote:
> > goldstein.w.n wrote:
> > > goldstein.w.n wrote:
> > > > RKSimon wrote:
> > > > > RKSimon wrote:
> > > > > > pengfei wrote:
> > > > > > > Why just limit to splat constant?
> > > > > > Also - why the AVX2 and hasOneUse limits?
> > > > > Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants
> > > > > Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants
> > > > 
> > > > That should be commented explicitly as these subtle checks to prevent infinite loops can easily be removed accidentally.
> > > > 
> > > > Also why AVX2 still?
> > > > Also - why the AVX2 and hasOneUse limits?
> > > 
> > > 
> > > Why just limit to splat constant?
> > 
> > Should be match on ImmConstant no?
> > Also - why the AVX2 and hasOneUse limits?
> 
> > Why just limit to splat constant?
> 
> This is to limit to broadcasts, probably it is better to work with broadcast explicitly.
> 
> I tried more general approach but it seems not much profitable. E.g. 
> ```
> @@ -1,7 +1,7 @@
> -; FMA3-LABEL: negated_constant_v4f64:
>  ; FMA3:       # %bb.0:
>  ; FMA3-NEXT:    vmovapd {{.*#+}} ymm2 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
> -; FMA3-NEXT:    vfmadd213pd {{.*#+}} ymm2 = (ymm0 * ymm2) + ymm1
> -; FMA3-NEXT:    vfmadd231pd {{.*#+}} ymm1 = (ymm0 * mem) + ymm1
> -; FMA3-NEXT:    vaddpd %ymm1, %ymm2, %ymm0
> +; FMA3-NEXT:    vmovapd %ymm2, %ymm3
> +; FMA3-NEXT:    vfmadd213pd {{.*#+}} ymm3 = (ymm0 * ymm3) + ymm1
> +; FMA3-NEXT:    vfnmadd213pd {{.*#+}} ymm2 = -(ymm0 * ymm2) + ymm1
> +; FMA3-NEXT:    vaddpd %ymm2, %ymm3, %ymm0
>  ; FMA3-NEXT:    retq
> ```
> 
> What do you think?
> Should be match on ImmConstant no?

AFAIK, `m_*` matching works with IR. Here we work with DAG. Am I missing something?


================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:54037
+    if (Subtarget.hasAVX2() && V.hasOneUse() &&
+        ISD::isConstantSplatVector(V.getNode(), SplatValue)) {
+      SmallVector<SDValue, 8> Ops;
----------------
e-kud wrote:
> e-kud wrote:
> > goldstein.w.n wrote:
> > > goldstein.w.n wrote:
> > > > goldstein.w.n wrote:
> > > > > RKSimon wrote:
> > > > > > RKSimon wrote:
> > > > > > > pengfei wrote:
> > > > > > > > Why just limit to splat constant?
> > > > > > > Also - why the AVX2 and hasOneUse limits?
> > > > > > Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants
> > > > > > Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants
> > > > > 
> > > > > That should be commented explicitly as these subtle checks to prevent infinite loops can easily be removed accidentally.
> > > > > 
> > > > > Also why AVX2 still?
> > > > > Also - why the AVX2 and hasOneUse limits?
> > > > 
> > > > 
> > > > Why just limit to splat constant?
> > > 
> > > Should be match on ImmConstant no?
> > > Also - why the AVX2 and hasOneUse limits?
> > 
> > > Why just limit to splat constant?
> > 
> > This is to limit to broadcasts, probably it is better to work with broadcast explicitly.
> > 
> > I tried more general approach but it seems not much profitable. E.g. 
> > ```
> > @@ -1,7 +1,7 @@
> > -; FMA3-LABEL: negated_constant_v4f64:
> >  ; FMA3:       # %bb.0:
> >  ; FMA3-NEXT:    vmovapd {{.*#+}} ymm2 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
> > -; FMA3-NEXT:    vfmadd213pd {{.*#+}} ymm2 = (ymm0 * ymm2) + ymm1
> > -; FMA3-NEXT:    vfmadd231pd {{.*#+}} ymm1 = (ymm0 * mem) + ymm1
> > -; FMA3-NEXT:    vaddpd %ymm1, %ymm2, %ymm0
> > +; FMA3-NEXT:    vmovapd %ymm2, %ymm3
> > +; FMA3-NEXT:    vfmadd213pd {{.*#+}} ymm3 = (ymm0 * ymm3) + ymm1
> > +; FMA3-NEXT:    vfnmadd213pd {{.*#+}} ymm2 = -(ymm0 * ymm2) + ymm1
> > +; FMA3-NEXT:    vaddpd %ymm2, %ymm3, %ymm0
> >  ; FMA3-NEXT:    retq
> > ```
> > 
> > What do you think?
> > Should be match on ImmConstant no?
> 
> AFAIK, `m_*` matching works with IR. Here we work with DAG. Am I missing something?
> Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants

Yes. The problem appears when each constant initially has several users, then we start ping-ponging between them. To solve it thoroughly we need to check whether a constant can be eliminated completely: take two constant vectors and check that all users are fmas. It seems out of combiner context to me, doesn't it? However, I see some code that checks instruction's users during combining, maybe, it is the right way.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146494/new/

https://reviews.llvm.org/D146494



More information about the llvm-commits mailing list