[PATCH] D145220: [X86][InstCombine] Simplify some `pternlog` intrinsics

Fri Mar 3 14:20:36 PST 2023

goldstein.w.n added inline comments.

================
Comment at: llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp:585
+
+  bool ABIsConst = isa<Constant>(A) && isa<Constant>(B);
+  bool ACIsConst = isa<Constant>(A) && isa<Constant>(C);
----------------
craig.topper wrote:
> goldstein.w.n wrote:
> > craig.topper wrote:
> > > Do we need to make sure we are looking at vectors of ConstantInts rather than vectors of ConstantExpr?
> > Will the ALU not fold with `ConstantExpr`? How can I test that?
> ConstantExpr can involve things that are conceptually constant but aren’t known to the compiler. Like the address of a global variable. SelectionDAG will expand ConstantExprs into instructions.
I see, the issue with `isa<ConstantInt>` is then it only works for splats, is there a check for evaluatable constant?

================
Comment at: llvm/lib/Target/X86/X86InstCombineIntrinsic.cpp:588
+  bool BCIsConst = isa<Constant>(B) && isa<Constant>(C);
+  bool ABCIsConst = isa<Constant>(A) && isa<Constant>(B) && isa<Constant>(C);
+
----------------
craig.topper wrote:
> goldstein.w.n wrote:
> > goldstein.w.n wrote:
> > > craig.topper wrote:
> > > > craig.topper wrote:
> > > > > If everything is const can we someone use the immediate as a lookup table instead of manually describing every case?
> > > > somehow* not someone
> > > > If everything is const can we someone use the immediate as a lookup table instead of manually describing every case?
> > > 
> > > Yes although there are some IMM where we only need AC/AB/BC so think we would still want to check the IMM.
> > > 
> > > Are you concerned about compile time overhead? If not keeping the same logic for const / non-const seems simpler and less error prone.
> > > > If everything is const can we someone use the immediate as a lookup table instead of manually describing every case?
> > > 
> > > Yes although there are some IMM where we only need AC/AB/BC so think we would still want to check the IMM.
> > > 
> > > Are you concerned about compile time overhead? If not keeping the same logic for const / non-const seems simpler and less error prone.
> > 
> > If all three are const we can just implement it with a loop above all the bits. Its just:
> > ```
> > for(i : bitwidth()) {
> > res[i] = (imm >> ((A[i] << 2) + (B[i] << 1) + (C[i] << 0))) & 1
> > }
> > ```
> > but still issue of we will miss cases where `C` is non-constant but also truly unneeded.
> > 
> I really want there to be some nice algorithm to do this optimization. It’s not easy to see that all 256 entries in the switch are correct. We have to trust that you transcribed them from Intel’s table correctly or we need to check that ourselves.
> I really want there to be some nice algorithm to do this optimization. It’s not easy to see that all 256 entries in the switch are correct. We have to trust that you transcribed them from Intel’s table correctly or we need to check that ourselves.

We can do the constant case with the algorithm but I couldn't figure out how to generic turn the imm -> logic ops.

not intels table (don't know if there is one).
Use the following not so clean pyscript: https://godbolt.org/z/aahK7qEhM
tested that the results are correct (turned off the constant guards and use the following): https://godbolt.org/z/KMrMdoY3f

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145220/new/

https://reviews.llvm.org/D145220