[PATCH] D145220: [X86][InstCombine] Simplify some `pternlog` intrinsics

Fri Mar 3 18:00:55 PST 2023

goldstein.w.n added inline comments.

================
Comment at: llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp:588
+  bool BCIsConst = isa<Constant>(B) && isa<Constant>(C);
+  bool ABCIsConst = isa<Constant>(A) && isa<Constant>(B) && isa<Constant>(C);
+
----------------
goldstein.w.n wrote:
> craig.topper wrote:
> > goldstein.w.n wrote:
> > > craig.topper wrote:
> > > > goldstein.w.n wrote:
> > > > > goldstein.w.n wrote:
> > > > > > craig.topper wrote:
> > > > > > > craig.topper wrote:
> > > > > > > > If everything is const can we someone use the immediate as a lookup table instead of manually describing every case?
> > > > > > > somehow* not someone
> > > > > > > If everything is const can we someone use the immediate as a lookup table instead of manually describing every case?
> > > > > > 
> > > > > > Yes although there are some IMM where we only need AC/AB/BC so think we would still want to check the IMM.
> > > > > > 
> > > > > > Are you concerned about compile time overhead? If not keeping the same logic for const / non-const seems simpler and less error prone.
> > > > > > > If everything is const can we someone use the immediate as a lookup table instead of manually describing every case?
> > > > > > 
> > > > > > Yes although there are some IMM where we only need AC/AB/BC so think we would still want to check the IMM.
> > > > > > 
> > > > > > Are you concerned about compile time overhead? If not keeping the same logic for const / non-const seems simpler and less error prone.
> > > > > 
> > > > > If all three are const we can just implement it with a loop above all the bits. Its just:
> > > > > ```
> > > > > for(i : bitwidth()) {
> > > > > res[i] = (imm >> ((A[i] << 2) + (B[i] << 1) + (C[i] << 0))) & 1
> > > > > }
> > > > > ```
> > > > > but still issue of we will miss cases where `C` is non-constant but also truly unneeded.
> > > > > 
> > > > I really want there to be some nice algorithm to do this optimization. It’s not easy to see that all 256 entries in the switch are correct. We have to trust that you transcribed them from Intel’s table correctly or we need to check that ourselves.
> > > > I really want there to be some nice algorithm to do this optimization. It’s not easy to see that all 256 entries in the switch are correct. We have to trust that you transcribed them from Intel’s table correctly or we need to check that ourselves.
> > > 
> > > We can do the constant case with the algorithm but I couldn't figure out how to generic turn the imm -> logic ops.
> > > 
> > > not intels table (don't know if there is one).
> > > Use the following not so clean pyscript: https://godbolt.org/z/aahK7qEhM
> > > tested that the results are correct (turned off the constant guards and use the following): https://godbolt.org/z/KMrMdoY3f
> > Intel has a table at the start of CHAPTER 5 INSTRUCTION SET REFERENCE, V in Volume 2C of the SDM
> That would have saved a moment :/
> 
> Given that they recommend a table as well not sure if there is a generic algorithm that has reasonable perf.
> I really want there to be some nice algorithm to do this optimization. It’s not easy to see that all 256 entries in the switch are correct. We have to trust that you transcribed them from Intel’s table correctly or we need to check that ourselves.

I added asserts for all cases. Could still be a typeof in the future if someone changes the return but forgets to do the assert, but does verify that what we have now is correct.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145220/new/

https://reviews.llvm.org/D145220