[llvm] [libcxx] [libc] [clang] [flang] [compiler-rt] [libcxxabi] [lldb] [clang-tools-extra] [lld] [NVPTX] Improve lowering of v4i8 (PR #67866)

Wed Jan 24 16:19:06 PST 2024

================
@@ -2110,6 +2214,29 @@ def : Pat<(seteq Int1Regs:$a, Int1Regs:$b),
 def : Pat<(setueq Int1Regs:$a, Int1Regs:$b),
           (NOT1 (XORb1rr Int1Regs:$a, Int1Regs:$b))>;
 
+// comparisons of i8 extracted with BFE as i32
+def: Pat<(setgt (sext_inreg (trunc Int32Regs:$a), i8), (sext_inreg (trunc Int32Regs:$b), i8)),
+         (SETP_s32rr Int32Regs:$a, Int32Regs:$b, CmpGT)>;
+def: Pat<(setge (sext_inreg (trunc Int32Regs:$a), i8), (sext_inreg (trunc Int32Regs:$b), i8)),
+         (SETP_s32rr Int32Regs:$a, Int32Regs:$b, CmpGE)>;
+def: Pat<(setlt (sext_inreg (trunc Int32Regs:$a), i8), (sext_inreg (trunc Int32Regs:$b), i8)),
+         (SETP_s32rr Int32Regs:$a, Int32Regs:$b, CmpLT)>;
+def: Pat<(setle (sext_inreg (trunc Int32Regs:$a), i8), (sext_inreg (trunc Int32Regs:$b), i8)),
+         (SETP_s32rr Int32Regs:$a, Int32Regs:$b, CmpLE)>;
+
+def: Pat<(setugt (i16 (and (trunc Int32Regs:$a), 255)), (i16 (and (trunc Int32Regs:$b), 255))),
+         (SETP_u32rr Int32Regs:$a, Int32Regs:$b, CmpHI)>;
+def: Pat<(setuge (i16 (and (trunc Int32Regs:$a), 255)), (i16 (and (trunc Int32Regs:$b), 255))),
+         (SETP_u32rr Int32Regs:$a, Int32Regs:$b, CmpHS)>;
+def: Pat<(setult (i16 (and (trunc Int32Regs:$a), 255)), (i16 (and (trunc Int32Regs:$b), 255))),
+         (SETP_u32rr Int32Regs:$a, Int32Regs:$b, CmpLO)>;
+def: Pat<(setule (i16 (and (trunc Int32Regs:$a), 255)), (i16 (and (trunc Int32Regs:$b), 255))),
+         (SETP_u32rr Int32Regs:$a, Int32Regs:$b, CmpLS)>;
+def: Pat<(seteq (i16 (and (trunc Int32Regs:$a), 255)), (i16 (and (trunc Int32Regs:$b), 255))),
+         (SETP_u32rr Int32Regs:$a, Int32Regs:$b, CmpEQ)>;
+def: Pat<(setne (i16 (and (trunc Int32Regs:$a), 255)), (i16 (and (trunc Int32Regs:$b), 255))),
+         (SETP_u32rr Int32Regs:$a, Int32Regs:$b, CmpNE)>;
----------------
justinfargnoli wrote:

Hi @Artem-B, I believe that these patterns are not correct.

Doing the comparison, without first ensuring that the value only uses the lower 8 bits, can lead to incorrect results. For example, when %a := Ox100 and %b := Ox1,
setugt (i16 (and (trunc Int32Regs:$a), 255)), (i16 (and (trunc Int32Regs:$b), 255))) produces a false predicate. Whereas SETP_u32rr Int32Regs:$a, Int32Regs:$b, CmpHI will produce a true predicate.

(I'm not super familiar with the pattern syntax through, so please correct me if I'm misunderstanding something.)

https://github.com/llvm/llvm-project/pull/67866