[PATCH] D67981: [NFC][PowerPC] Adding FeatureFPU in the definition of FeatureISA3_0

Tue Oct 1 15:13:14 PDT 2019

jsji added inline comments.

================
Comment at: llvm/lib/Target/PowerPC/PPC.td:193
+                                     "Enable instructions added in ISA 3.0.",
+                                     [FeatureFPU]>;
 def FeatureP9Altivec : SubtargetFeature<"power9-altivec", "HasP9Altivec", "true",
----------------
nemanjai wrote:
> jsji wrote:
> > nemanjai wrote:
> > > jsji wrote:
> > > > jsji wrote:
> > > > > Looks like there is some misunderstanding of what `FeatureISA3_0` is for?
> > > > > 
> > > > > As in the description, `FeatureISA3_0` is for new instruction added in ISA 3.0, not all the instructions in ISA 3.0.
> > > > > Why it need to depend on FeatureFPU?
> > > > > 
> > > > > In other words, with the new patch `-mattr=-fpu` will also remove `isa-v30-instructions` effectively.
> > > > > So that means, if you add -mattr=-fpu, you can't use any of the new instruction added for P8? 
> > > > > Is that what you intended?
> > > > > 
> > > > I meant `P9`.
> > > Yes. That is very much part of the intent here. It is not possible to have a CPU that conforms to ISA 3.0 that does not support HW floating point operations (i.e. doesn't have a HW FPU).
> > > 
> > > Having an FPU does not imply we have an implementation of ISA 3.0. But having an implementation of ISA 3.0 **absolutely** implies having an FPU.
> > Yes, you are right, and I also agree that ` having an implementation of ISA 3.0 absolutely implies having an FPU`. 
> > 
> > However, looks like to me that `FeatureISA3_0` is NOT corresponding to `compliant implementation of ISA 3.0`.
> > But rather controlling whether we can generate *new instructions in ISA 3.0*.
> > Also, `FeatureFPU` is also not corresponding to `having an FPU`, but rather controlling whether we can generates FPU instructions.
> > 
> > So if we add the *Implies* to `FPU` here, we are more or less saying: all ISA 3.0 instructions are also FPU instructions,
> > we can only generate ISA 3.0 instructions if we are allow to generate FPU instructions.
> > 
> > Does this dependency really exists?
> > 
> > What if someone would like to avoid generating FPU instructions (eg: kernel users), 
> > but still would like to take advantage of P9 new instructions like `extswsli`, `maddld`,`modud`. 
> > 
> > I believe he/she can use `-mcpu=pwr9 -mattr=-fpu` right now, but will get performance deg if we apply this patch.
> > 
> > What do you think?
> > 
> If we are aware of a valid use case for something like this (i.e. we want instructions that were added in ISA 3.0) but we want a guarantee that no floating point instructions are emitted, we need to do a bit more work than just stop this patch from going in.
> 
> The reason we added this feature is to differentiate SPE codegen from codegen for CPUs that have an FPU. That is about all it is good for. As an example, the following will produce FP instructions:
> ```
> define dso_local signext i32 @test(double %a, double %b) local_unnamed_addr #0 {
> entry:
>   %cmp = fcmp olt double %a, %b
>   %conv = zext i1 %cmp to i32
>   ret i32 %conv
> }
> 
> llc -mattr=-fpu < a.ll | grep fcmpu
>         fcmpu 0, 1, 2
> ```
> 
> So the addition of this implication is really to avoid having to add the predicate for every pattern we want to add going forward.
> If we are aware of a valid use case for something like this (i.e. we want instructions that were added in ISA 3.0) but we want a guarantee that no floating point instructions are emitted

I believe this is an example usage in Linux Kernel build. https://bugs.llvm.org/show_bug.cgi?id=38886, where kernel builder will use `-msoft-float` , which is corresponding to `-mattr=-hard-float`.

And with this patch we will definitely lose performance when they build with -mcpu=power9.

An example would be:

```
$ cat ex.ll
@z = external local_unnamed_addr global i32*, align 8

define signext i32 @_Z2tcii(i32 signext %x, i32 signext %y){
entry:
  %0 = load i32*, i32** @z, align 8
  %add = add nsw i32 %y, %x
  %idxprom = sext i32 %add to i64
  %arrayidx = getelementptr inbounds i32, i32* %0, i64 %idxprom
  %1 = load i32, i32* %arrayidx, align 4
  ret i32 %1
}
```

The current code gen:
```
$ ~/llvm-git/build/bin/llc -mcpu=pwr9 -mattr=-hard-float < ex.ll|grep localentry -A10
        .localentry     _Z2tcii, .Lfunc_lep0-.Lfunc_gep0
# %bb.0:                                # %entry
        addis 5, 2, .LC0 at toc@ha
        ld 5, .LC0 at toc@l(5)
        ld 5, 0(5)
        add 3, 4, 3
        extswsli 3, 3, 2
        lwax 3, 5, 3
        blr
        .long   0
        .quad   0
```

With this patch:
```
$ ~/llvm-git/build/bin/llc -mcpu=pwr9 -mattr=-hard-float < ex.ll|grep localentry -A10
        .localentry     _Z2tcii, .Lfunc_lep0-.Lfunc_gep0
# %bb.0:                                # %entry
        addis 5, 2, .LC0 at toc@ha
        ld 5, .LC0 at toc@l(5)
        add 3, 4, 3
        extsw 3, 3
        sldi 3, 3, 2
        ld 5, 0(5)
        lwax 3, 5, 3
        blr
        .long   0
```

================
Comment at: llvm/lib/Target/PowerPC/PPC.td:193
+                                     "Enable instructions added in ISA 3.0.",
+                                     [FeatureFPU]>;
 def FeatureP9Altivec : SubtargetFeature<"power9-altivec", "HasP9Altivec", "true",
----------------
jsji wrote:
> nemanjai wrote:
> > jsji wrote:
> > > nemanjai wrote:
> > > > jsji wrote:
> > > > > jsji wrote:
> > > > > > Looks like there is some misunderstanding of what `FeatureISA3_0` is for?
> > > > > > 
> > > > > > As in the description, `FeatureISA3_0` is for new instruction added in ISA 3.0, not all the instructions in ISA 3.0.
> > > > > > Why it need to depend on FeatureFPU?
> > > > > > 
> > > > > > In other words, with the new patch `-mattr=-fpu` will also remove `isa-v30-instructions` effectively.
> > > > > > So that means, if you add -mattr=-fpu, you can't use any of the new instruction added for P8? 
> > > > > > Is that what you intended?
> > > > > > 
> > > > > I meant `P9`.
> > > > Yes. That is very much part of the intent here. It is not possible to have a CPU that conforms to ISA 3.0 that does not support HW floating point operations (i.e. doesn't have a HW FPU).
> > > > 
> > > > Having an FPU does not imply we have an implementation of ISA 3.0. But having an implementation of ISA 3.0 **absolutely** implies having an FPU.
> > > Yes, you are right, and I also agree that ` having an implementation of ISA 3.0 absolutely implies having an FPU`. 
> > > 
> > > However, looks like to me that `FeatureISA3_0` is NOT corresponding to `compliant implementation of ISA 3.0`.
> > > But rather controlling whether we can generate *new instructions in ISA 3.0*.
> > > Also, `FeatureFPU` is also not corresponding to `having an FPU`, but rather controlling whether we can generates FPU instructions.
> > > 
> > > So if we add the *Implies* to `FPU` here, we are more or less saying: all ISA 3.0 instructions are also FPU instructions,
> > > we can only generate ISA 3.0 instructions if we are allow to generate FPU instructions.
> > > 
> > > Does this dependency really exists?
> > > 
> > > What if someone would like to avoid generating FPU instructions (eg: kernel users), 
> > > but still would like to take advantage of P9 new instructions like `extswsli`, `maddld`,`modud`. 
> > > 
> > > I believe he/she can use `-mcpu=pwr9 -mattr=-fpu` right now, but will get performance deg if we apply this patch.
> > > 
> > > What do you think?
> > > 
> > If we are aware of a valid use case for something like this (i.e. we want instructions that were added in ISA 3.0) but we want a guarantee that no floating point instructions are emitted, we need to do a bit more work than just stop this patch from going in.
> > 
> > The reason we added this feature is to differentiate SPE codegen from codegen for CPUs that have an FPU. That is about all it is good for. As an example, the following will produce FP instructions:
> > ```
> > define dso_local signext i32 @test(double %a, double %b) local_unnamed_addr #0 {
> > entry:
> >   %cmp = fcmp olt double %a, %b
> >   %conv = zext i1 %cmp to i32
> >   ret i32 %conv
> > }
> > 
> > llc -mattr=-fpu < a.ll | grep fcmpu
> >         fcmpu 0, 1, 2
> > ```
> > 
> > So the addition of this implication is really to avoid having to add the predicate for every pattern we want to add going forward.
> > If we are aware of a valid use case for something like this (i.e. we want instructions that were added in ISA 3.0) but we want a guarantee that no floating point instructions are emitted
> 
> I believe this is an example usage in Linux Kernel build. https://bugs.llvm.org/show_bug.cgi?id=38886, where kernel builder will use `-msoft-float` , which is corresponding to `-mattr=-hard-float`.
> 
> And with this patch we will definitely lose performance when they build with -mcpu=power9.
> 
> An example would be:
> 
> ```
> $ cat ex.ll
> @z = external local_unnamed_addr global i32*, align 8
> 
> define signext i32 @_Z2tcii(i32 signext %x, i32 signext %y){
> entry:
>   %0 = load i32*, i32** @z, align 8
>   %add = add nsw i32 %y, %x
>   %idxprom = sext i32 %add to i64
>   %arrayidx = getelementptr inbounds i32, i32* %0, i64 %idxprom
>   %1 = load i32, i32* %arrayidx, align 4
>   ret i32 %1
> }
> ```
> 
> The current code gen:
> ```
> $ ~/llvm-git/build/bin/llc -mcpu=pwr9 -mattr=-hard-float < ex.ll|grep localentry -A10
>         .localentry     _Z2tcii, .Lfunc_lep0-.Lfunc_gep0
> # %bb.0:                                # %entry
>         addis 5, 2, .LC0 at toc@ha
>         ld 5, .LC0 at toc@l(5)
>         ld 5, 0(5)
>         add 3, 4, 3
>         extswsli 3, 3, 2
>         lwax 3, 5, 3
>         blr
>         .long   0
>         .quad   0
> ```
> 
> With this patch:
> ```
> $ ~/llvm-git/build/bin/llc -mcpu=pwr9 -mattr=-hard-float < ex.ll|grep localentry -A10
>         .localentry     _Z2tcii, .Lfunc_lep0-.Lfunc_gep0
> # %bb.0:                                # %entry
>         addis 5, 2, .LC0 at toc@ha
>         ld 5, .LC0 at toc@l(5)
>         add 3, 4, 3
>         extsw 3, 3
>         sldi 3, 3, 2
>         ld 5, 0(5)
>         lwax 3, 5, 3
>         blr
>         .long   0
> ```
> 
> As an example, the following will produce FP instructions:

Good example, this should be a bug, we should have a look and fix it later.

================
Comment at: llvm/lib/Target/PowerPC/PPC.td:193
+                                     "Enable instructions added in ISA 3.0.",
+                                     [FeatureFPU]>;
 def FeatureP9Altivec : SubtargetFeature<"power9-altivec", "HasP9Altivec", "true",
----------------
jsji wrote:
> jsji wrote:
> > nemanjai wrote:
> > > jsji wrote:
> > > > nemanjai wrote:
> > > > > jsji wrote:
> > > > > > jsji wrote:
> > > > > > > Looks like there is some misunderstanding of what `FeatureISA3_0` is for?
> > > > > > > 
> > > > > > > As in the description, `FeatureISA3_0` is for new instruction added in ISA 3.0, not all the instructions in ISA 3.0.
> > > > > > > Why it need to depend on FeatureFPU?
> > > > > > > 
> > > > > > > In other words, with the new patch `-mattr=-fpu` will also remove `isa-v30-instructions` effectively.
> > > > > > > So that means, if you add -mattr=-fpu, you can't use any of the new instruction added for P8? 
> > > > > > > Is that what you intended?
> > > > > > > 
> > > > > > I meant `P9`.
> > > > > Yes. That is very much part of the intent here. It is not possible to have a CPU that conforms to ISA 3.0 that does not support HW floating point operations (i.e. doesn't have a HW FPU).
> > > > > 
> > > > > Having an FPU does not imply we have an implementation of ISA 3.0. But having an implementation of ISA 3.0 **absolutely** implies having an FPU.
> > > > Yes, you are right, and I also agree that ` having an implementation of ISA 3.0 absolutely implies having an FPU`. 
> > > > 
> > > > However, looks like to me that `FeatureISA3_0` is NOT corresponding to `compliant implementation of ISA 3.0`.
> > > > But rather controlling whether we can generate *new instructions in ISA 3.0*.
> > > > Also, `FeatureFPU` is also not corresponding to `having an FPU`, but rather controlling whether we can generates FPU instructions.
> > > > 
> > > > So if we add the *Implies* to `FPU` here, we are more or less saying: all ISA 3.0 instructions are also FPU instructions,
> > > > we can only generate ISA 3.0 instructions if we are allow to generate FPU instructions.
> > > > 
> > > > Does this dependency really exists?
> > > > 
> > > > What if someone would like to avoid generating FPU instructions (eg: kernel users), 
> > > > but still would like to take advantage of P9 new instructions like `extswsli`, `maddld`,`modud`. 
> > > > 
> > > > I believe he/she can use `-mcpu=pwr9 -mattr=-fpu` right now, but will get performance deg if we apply this patch.
> > > > 
> > > > What do you think?
> > > > 
> > > If we are aware of a valid use case for something like this (i.e. we want instructions that were added in ISA 3.0) but we want a guarantee that no floating point instructions are emitted, we need to do a bit more work than just stop this patch from going in.
> > > 
> > > The reason we added this feature is to differentiate SPE codegen from codegen for CPUs that have an FPU. That is about all it is good for. As an example, the following will produce FP instructions:
> > > ```
> > > define dso_local signext i32 @test(double %a, double %b) local_unnamed_addr #0 {
> > > entry:
> > >   %cmp = fcmp olt double %a, %b
> > >   %conv = zext i1 %cmp to i32
> > >   ret i32 %conv
> > > }
> > > 
> > > llc -mattr=-fpu < a.ll | grep fcmpu
> > >         fcmpu 0, 1, 2
> > > ```
> > > 
> > > So the addition of this implication is really to avoid having to add the predicate for every pattern we want to add going forward.
> > > If we are aware of a valid use case for something like this (i.e. we want instructions that were added in ISA 3.0) but we want a guarantee that no floating point instructions are emitted
> > 
> > I believe this is an example usage in Linux Kernel build. https://bugs.llvm.org/show_bug.cgi?id=38886, where kernel builder will use `-msoft-float` , which is corresponding to `-mattr=-hard-float`.
> > 
> > And with this patch we will definitely lose performance when they build with -mcpu=power9.
> > 
> > An example would be:
> > 
> > ```
> > $ cat ex.ll
> > @z = external local_unnamed_addr global i32*, align 8
> > 
> > define signext i32 @_Z2tcii(i32 signext %x, i32 signext %y){
> > entry:
> >   %0 = load i32*, i32** @z, align 8
> >   %add = add nsw i32 %y, %x
> >   %idxprom = sext i32 %add to i64
> >   %arrayidx = getelementptr inbounds i32, i32* %0, i64 %idxprom
> >   %1 = load i32, i32* %arrayidx, align 4
> >   ret i32 %1
> > }
> > ```
> > 
> > The current code gen:
> > ```
> > $ ~/llvm-git/build/bin/llc -mcpu=pwr9 -mattr=-hard-float < ex.ll|grep localentry -A10
> >         .localentry     _Z2tcii, .Lfunc_lep0-.Lfunc_gep0
> > # %bb.0:                                # %entry
> >         addis 5, 2, .LC0 at toc@ha
> >         ld 5, .LC0 at toc@l(5)
> >         ld 5, 0(5)
> >         add 3, 4, 3
> >         extswsli 3, 3, 2
> >         lwax 3, 5, 3
> >         blr
> >         .long   0
> >         .quad   0
> > ```
> > 
> > With this patch:
> > ```
> > $ ~/llvm-git/build/bin/llc -mcpu=pwr9 -mattr=-hard-float < ex.ll|grep localentry -A10
> >         .localentry     _Z2tcii, .Lfunc_lep0-.Lfunc_gep0
> > # %bb.0:                                # %entry
> >         addis 5, 2, .LC0 at toc@ha
> >         ld 5, .LC0 at toc@l(5)
> >         add 3, 4, 3
> >         extsw 3, 3
> >         sldi 3, 3, 2
> >         ld 5, 0(5)
> >         lwax 3, 5, 3
> >         blr
> >         .long   0
> > ```
> > 
> > As an example, the following will produce FP instructions:
> 
> Good example, this should be a bug, we should have a look and fix it later.
> 
> 
> So the addition of this implication is really to avoid having to add the predicate for every pattern we want to add going forward.

Understand, can we achieve the goal by using a new predicate, like:

```def ISA3_0_and_FPU: Predicate<"PPCSubTarget->hasFPU() && PPCSubTarget->isISA3_0()">;```

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67981/new/

https://reviews.llvm.org/D67981