[PATCH] D104049: [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask

Thu Jun 17 09:13:39 PDT 2021

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:832-834
+  int Mask = (1ull << (64 - __builtin_clz(Arg->getZExtValue()))) - 1;
+  IntrinsicCall->setOperand(
+      1, ConstantInt::get(Arg->getType(), ~Arg->getZExtValue() & Mask));
----------------
gandhi21299 wrote:
> foad wrote:
> > This is wrong. If the value was 4 you will xor it with 7 giving 3, but you need to flip all the bits that amdgcn_class cares about, i.e. 10 low order bits. You should either xor with a fixed value of 0x3ff, or perhaps move this enum from AMDGPUInstCombineIntrinsic.cpp to a common header (maybe SIDefines.h?) and add an "ALL" value to it:
> > ```
> > 260:  case Intrinsic::amdgcn_class: {
> > 261-    enum {
> > 262-      S_NAN = 1 << 0,       // Signaling NaN
> > 263-      Q_NAN = 1 << 1,       // Quiet NaN
> > 264-      N_INFINITY = 1 << 2,  // Negative infinity
> > 265-      N_NORMAL = 1 << 3,    // Negative normal
> > 266-      N_SUBNORMAL = 1 << 4, // Negative subnormal
> > 267-      N_ZERO = 1 << 5,      // Negative zero
> > 268-      P_ZERO = 1 << 6,      // Positive zero
> > 269-      P_SUBNORMAL = 1 << 7, // Positive subnormal
> > 270-      P_NORMAL = 1 << 8,    // Positive normal
> > 271-      P_INFINITY = 1 << 9   // Positive infinity
> > 272-    };
> > ```
> Flipping only the low 10 bits fails several other tests, thoughts? @arsenm @foad 
You need to look at why they fail. Probably your patch changes (improves) the generated code, so the tests need to be updated to expect the improved code sequence.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104049/new/

https://reviews.llvm.org/D104049