[llvm] [NVPTX] Set boolean contents to zero-or-one (PR #108969)

Fraser Cormack via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 17 05:38:18 PDT 2024


https://github.com/frasercrmck created https://github.com/llvm/llvm-project/pull/108969

Note that this field is a property of SelectionDAG rather than one of the ISA. The previous value wasn't incorrect, per se, but since NVPTX uses 1-bit predicate registers, either choice is possible.

The problem with using zero-or-negative-one manifested in issues during instruction selection where LLVM would fold a sign-extension of a setcc into the setcc itself, which isn't a legal operation.
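To make the two conventions concrete, here is a minimal C++ sketch. It is not LLVM code, and every function name in it is hypothetical. It models how a true predicate looks once materialised in a 16-bit register under each convention, and what sign-extending an i1 produces. Under zero-or-negative-one the sign-extended value is bit-identical to the wide boolean, which is presumably why the combiner treats the sext as redundant and merges it into the setcc. Under zero-or-one the two differ, so the sext has to stay a separate operation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two SelectionDAG boolean conventions;
// these are illustrative helpers, not LLVM APIs.
constexpr int16_t boolAsZeroOrOne(bool p) { return p ? 1 : 0; }
constexpr int16_t boolAsZeroOrNegativeOne(bool p) { return p ? -1 : 0; }

// Models sext(i1) to i16: the single predicate bit is replicated into
// all 16 bits of the result.
constexpr int16_t sextI1ToI16(bool p) {
  return p ? int16_t(-1) : int16_t(0);
}

// Compares the sign-extended predicate against both conventions.
inline void checkConventions() {
  const bool preds[] = {false, true};
  for (bool p : preds) {
    // Matches the zero-or-negative-one form for both predicate values.
    assert(sextI1ToI16(p) == boolAsZeroOrNegativeOne(p));
  }
  // Differs from the zero-or-one form when the predicate is true.
  assert(sextI1ToI16(true) != boolAsZeroOrOne(true));
}
```

In PTX terms, the separate sext of a predicate corresponds to the `selp ... -1, 0, %p` form that appears in the new test.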

We could alternatively let this happen and then custom expand it back into a separate setcc + sext later. Or we could provide explicit patterns for the setp/selp sequence. However, preventing it from forming in the first place feels more appropriate to the ISA.

It does not appear to have a significant knock-on effect on codegen tests, aside from one minor change which looks harmless:

	selp.s64 	%rd6, -1, 0, %p1;
	add.s64 	%rd7, %rd5, %rd6;

to

	selp.u64 	%rd6, 1, 0, %p1;
	sub.s64 	%rd7, %rd5, %rd6;
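As a quick sanity check that the two sequences compute the same value, the sketch below models each one on 64-bit unsigned integers, where wraparound is well defined. The function names are hypothetical and only mirror the PTX above; this is an illustration of the arithmetic, not a statement about how the backend selects these instructions.

```cpp
#include <cassert>
#include <cstdint>

// Old sequence: selp.s64 yields -1 or 0, then add.s64.
constexpr uint64_t borrowViaAdd(uint64_t x, bool p) {
  uint64_t sel = p ? ~uint64_t{0} : uint64_t{0};  // -1 as a 64-bit pattern
  return x + sel;
}

// New sequence: selp.u64 yields 1 or 0, then sub.s64.
constexpr uint64_t borrowViaSub(uint64_t x, bool p) {
  uint64_t sel = p ? uint64_t{1} : uint64_t{0};
  return x - sel;
}

// Spot-checks that both sequences agree, including at the wraparound points.
inline void checkBorrowEquivalence() {
  const uint64_t samples[] = {0, 1, 5, UINT64_MAX};
  const bool preds[] = {false, true};
  for (uint64_t x : samples)
    for (bool p : preds)
      assert(borrowViaAdd(x, p) == borrowViaSub(x, p));
}
```

Adding an all-ones value and subtracting one are the same operation modulo 2^64, so the change only alters which constant the select materialises.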

>From f8eecfa65c812b61749c13c6723d2799266473d5 Mon Sep 17 00:00:00 2001
From: Fraser Cormack <fraser at codeplay.com>
Date: Tue, 17 Sep 2024 13:27:30 +0100
Subject: [PATCH] [NVPTX] Set boolean contents to zero-or-one

Note that this field is a property of SelectionDAG rather than one of
the ISA. The previous value wasn't incorrect, per se, but since NVPTX
uses 1-bit predicate registers, either choice is possible.

The problem with using zero-or-negative-one manifested in issues during
instruction selection where LLVM would fold a sign-extension of a setcc
into the setcc itself, which isn't a legal operation.

We could alternatively let this happen and then custom expand it back
into a separate setcc + sext later. Or we could provide explicit
patterns for the setp/selp sequence. However, preventing it from forming
in the first place feels more appropriate to the ISA.

It does not appear to have a significant knock-on effect on codegen
tests, aside from one minor change which looks harmless:

	selp.s64 	%rd6, -1, 0, %p1;
	add.s64 	%rd7, %rd5, %rd6;

to

	selp.u64 	%rd6, 1, 0, %p1;
	sub.s64 	%rd7, %rd5, %rd6;
---
 llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp |  4 +--
 llvm/test/CodeGen/NVPTX/add-sub-128bit.ll   |  4 +--
 llvm/test/CodeGen/NVPTX/sext-setcc.ll       | 29 +++++++++++++++++++++
 3 files changed, 33 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/CodeGen/NVPTX/sext-setcc.ll

diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index c5a40e4308860c..4495fdb45e74c9 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -416,8 +416,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
   MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = (unsigned) 0xFFFFFFFF;
   MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = (unsigned) 0xFFFFFFFF;
 
-  setBooleanContents(ZeroOrNegativeOneBooleanContent);
-  setBooleanVectorContents(ZeroOrNegativeOneBooleanContent);
+  setBooleanContents(ZeroOrOneBooleanContent);
+  setBooleanVectorContents(ZeroOrOneBooleanContent);
 
   // Jump is Expensive. Don't create extra control flow for 'and', 'or'
   // condition branches.
diff --git a/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll b/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll
index 9d451e90650df1..26eb5fb223a6ca 100644
--- a/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll
+++ b/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll
@@ -23,8 +23,8 @@ define i128 @test_add(i128 %a, i128 %b) {
 define i128 @test_sub(i128 %a, i128 %b) {
 ; NOCARRY:        sub.s64
 ; NOCARRY-NEXT:   setp.lt.u64
-; NOCARRY-NEXT:   selp.s64
-; NOCARRY-NEXT:   add.s64
+; NOCARRY-NEXT:   selp.u64
+; NOCARRY-NEXT:   sub.s64
 ; NOCARRY-NEXT:   sub.s64
 
 ; CARRY:          sub.cc.s64
diff --git a/llvm/test/CodeGen/NVPTX/sext-setcc.ll b/llvm/test/CodeGen/NVPTX/sext-setcc.ll
new file mode 100644
index 00000000000000..4b56100f347f49
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/sext-setcc.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 | FileCheck %s
+; RUN: %if ptxas-11.0 %{ llc < %s -march=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
+
+define <2 x i16> @sext_setcc(ptr %p) {
+; CHECK-LABEL: sext_setcc(
+; CHECK:       {
+; CHECK-NEXT:    .reg .pred %p<3>;
+; CHECK-NEXT:    .reg .b16 %rs<5>;
+; CHECK-NEXT:    .reg .b32 %r<3>;
+; CHECK-NEXT:    .reg .b64 %rd<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0: // %entry
+; CHECK-NEXT:    ld.param.u64 %rd1, [sext_setcc_param_0];
+; CHECK-NEXT:    ld.u32 %r1, [%rd1];
+; CHECK-NEXT:    mov.b32 {%rs1, %rs2}, %r1;
+; CHECK-NEXT:    setp.eq.s16 %p1, %rs1, 0;
+; CHECK-NEXT:    setp.eq.s16 %p2, %rs2, 0;
+; CHECK-NEXT:    selp.s16 %rs3, -1, 0, %p2;
+; CHECK-NEXT:    selp.s16 %rs4, -1, 0, %p1;
+; CHECK-NEXT:    mov.b32 %r2, {%rs4, %rs3};
+; CHECK-NEXT:    st.param.b32 [func_retval0+0], %r2;
+; CHECK-NEXT:    ret;
+entry:
+  %v = load <2 x i16>, ptr %p, align 4
+  %cmp = icmp eq <2 x i16> %v, zeroinitializer
+  %sext = sext <2 x i1> %cmp to <2 x i16>
+  ret <2 x i16> %sext
+}


