[llvm] [NVPTX] Set boolean contents to zero-or-one (PR #108969)
Fraser Cormack via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 17 05:38:18 PDT 2024
https://github.com/frasercrmck created https://github.com/llvm/llvm-project/pull/108969
Note that this field is a property of SelectionDAG rather than one of the ISA. The previous value wasn't incorrect, per se, but since NVPTX uses 1-bit predicate registers, either choice is possible.
The problem with using zero-or-negative-one manifested in issues during instruction selection where LLVM would fold a sign-extension of a setcc into the setcc itself, which isn't a legal operation.
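To make the fold concrete, here is a minimal sketch in plain C++ of what the two settings say about a widened comparison result. It is only a model, not LLVM code: `BoolContent`, `widenedSetcc` and `sextOfI1` are hypothetical names invented for this illustration, and i16 is chosen to match the new test.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two boolean-content settings discussed above.
enum class BoolContent { ZeroOrOne, ZeroOrNegativeOne };

// Value a "true"/"false" comparison result takes once it lives in an i16,
// according to the chosen boolean-content setting.
constexpr int16_t widenedSetcc(bool cond, BoolContent c) {
  return !cond ? int16_t{0}
               : (c == BoolContent::ZeroOrOne ? int16_t{1} : int16_t{-1});
}

// Sign-extending an i1 to i16 always yields 0 or -1, whatever the setting.
constexpr int16_t sextOfI1(bool cond) {
  return cond ? int16_t{-1} : int16_t{0};
}

// With zero-or-negative-one, a wide setcc already equals sext(i1 setcc),
// which is what makes the fold look attractive in the abstract.
static_assert(widenedSetcc(true, BoolContent::ZeroOrNegativeOne) == sextOfI1(true),
              "0/-1 booleans match a sign-extended i1");

// With zero-or-one the two differ for 'true', so the sext stays a separate step.
static_assert(widenedSetcc(true, BoolContent::ZeroOrOne) != sextOfI1(true),
              "0/1 booleans do not match a sign-extended i1");
```

Under this reading, switching to zero-or-one removes the equivalence that allowed the sign-extension to be merged into the comparison, so the setp/selp pair is kept as two operations.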
We could alternatively let this happen and then custom expand it back into a separate setcc + sext later. Or we could provide explicit patterns for the setp/selp sequence. However, preventing it from forming in the first place feels more appropriate to the ISA.
It does not appear to have a significant knock-on effect on codegen tests, aside from one minor change which looks harmless:
selp.s64 %rd6, -1, 0, %p1;
add.s64 %rd7, %rd5, %rd6;
to
selp.u64 %rd6, 1, 0, %p1;
sub.s64 %rd7, %rd5, %rd6;
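As a quick sanity check on why this change looks harmless, the two sequences can be modelled in plain C++: adding a value that is -1 or 0 is the same as subtracting a value that is 1 or 0. The function names `oldSequence` and `newSequence` are hypothetical and exist only for this sketch. Unsigned 64-bit arithmetic is used so that wrap-around is well defined in C++.

```cpp
#include <cstdint>

// Model of:  selp.s64 %rd6, -1, 0, %p1;  add.s64 %rd7, %rd5, %rd6;
constexpr uint64_t oldSequence(uint64_t rd5, bool p1) {
  // All-ones is the 64-bit pattern of -1.
  return rd5 + (p1 ? ~uint64_t{0} : uint64_t{0});
}

// Model of:  selp.u64 %rd6, 1, 0, %p1;  sub.s64 %rd7, %rd5, %rd6;
constexpr uint64_t newSequence(uint64_t rd5, bool p1) {
  return rd5 - (p1 ? uint64_t{1} : uint64_t{0});
}

// Both forms agree, including at the wrap-around boundary.
static_assert(oldSequence(5, true) == newSequence(5, true), "predicate true");
static_assert(oldSequence(5, false) == newSequence(5, false), "predicate false");
static_assert(oldSequence(0, true) == newSequence(0, true), "wraps identically");
```

Both forms use one select and one arithmetic instruction, so the instruction count is unchanged.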
From f8eecfa65c812b61749c13c6723d2799266473d5 Mon Sep 17 00:00:00 2001
From: Fraser Cormack <fraser at codeplay.com>
Date: Tue, 17 Sep 2024 13:27:30 +0100
Subject: [PATCH] [NVPTX] Set boolean contents to zero-or-one
Note that this field is a property of SelectionDAG rather than one of
the ISA. The previous value wasn't incorrect, per se, but since NVPTX
uses 1-bit predicate registers, either choice is possible.
The problem with using zero-or-negative-one manifested in issues during
instruction selection where LLVM would fold a sign-extension of a setcc
into the setcc itself, which isn't a legal operation.
We could alternatively let this happen and then custom expand it back
into a separate setcc + sext later. Or we could provide explicit
patterns for the setp/selp sequence. However, preventing it from forming
in the first place feels more appropriate to the ISA.
It does not appear to have a significant knock-on effect on codegen
tests, aside from one minor change which looks harmless:
selp.s64 %rd6, -1, 0, %p1;
add.s64 %rd7, %rd5, %rd6;
to
selp.u64 %rd6, 1, 0, %p1;
sub.s64 %rd7, %rd5, %rd6;
---
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp | 4 +--
llvm/test/CodeGen/NVPTX/add-sub-128bit.ll | 4 +--
llvm/test/CodeGen/NVPTX/sext-setcc.ll | 29 +++++++++++++++++++++
3 files changed, 33 insertions(+), 4 deletions(-)
create mode 100644 llvm/test/CodeGen/NVPTX/sext-setcc.ll
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index c5a40e4308860c..4495fdb45e74c9 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -416,8 +416,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = (unsigned) 0xFFFFFFFF;
MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = (unsigned) 0xFFFFFFFF;
- setBooleanContents(ZeroOrNegativeOneBooleanContent);
- setBooleanVectorContents(ZeroOrNegativeOneBooleanContent);
+ setBooleanContents(ZeroOrOneBooleanContent);
+ setBooleanVectorContents(ZeroOrOneBooleanContent);
// Jump is Expensive. Don't create extra control flow for 'and', 'or'
// condition branches.
diff --git a/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll b/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll
index 9d451e90650df1..26eb5fb223a6ca 100644
--- a/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll
+++ b/llvm/test/CodeGen/NVPTX/add-sub-128bit.ll
@@ -23,8 +23,8 @@ define i128 @test_add(i128 %a, i128 %b) {
define i128 @test_sub(i128 %a, i128 %b) {
; NOCARRY: sub.s64
; NOCARRY-NEXT: setp.lt.u64
-; NOCARRY-NEXT: selp.s64
-; NOCARRY-NEXT: add.s64
+; NOCARRY-NEXT: selp.u64
+; NOCARRY-NEXT: sub.s64
; NOCARRY-NEXT: sub.s64
; CARRY: sub.cc.s64
diff --git a/llvm/test/CodeGen/NVPTX/sext-setcc.ll b/llvm/test/CodeGen/NVPTX/sext-setcc.ll
new file mode 100644
index 00000000000000..4b56100f347f49
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/sext-setcc.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 | FileCheck %s
+; RUN: %if ptxas-11.0 %{ llc < %s -march=nvptx64 -mcpu=sm_80 -mattr=+ptx70 | %ptxas-verify -arch=sm_80 %}
+
+define <2 x i16> @sext_setcc(ptr %p) {
+; CHECK-LABEL: sext_setcc(
+; CHECK: {
+; CHECK-NEXT: .reg .pred %p<3>;
+; CHECK-NEXT: .reg .b16 %rs<5>;
+; CHECK-NEXT: .reg .b32 %r<3>;
+; CHECK-NEXT: .reg .b64 %rd<2>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0: // %entry
+; CHECK-NEXT: ld.param.u64 %rd1, [sext_setcc_param_0];
+; CHECK-NEXT: ld.u32 %r1, [%rd1];
+; CHECK-NEXT: mov.b32 {%rs1, %rs2}, %r1;
+; CHECK-NEXT: setp.eq.s16 %p1, %rs1, 0;
+; CHECK-NEXT: setp.eq.s16 %p2, %rs2, 0;
+; CHECK-NEXT: selp.s16 %rs3, -1, 0, %p2;
+; CHECK-NEXT: selp.s16 %rs4, -1, 0, %p1;
+; CHECK-NEXT: mov.b32 %r2, {%rs4, %rs3};
+; CHECK-NEXT: st.param.b32 [func_retval0+0], %r2;
+; CHECK-NEXT: ret;
+entry:
+ %v = load <2 x i16>, ptr %p, align 4
+ %cmp = icmp eq <2 x i16> %v, zeroinitializer
+ %sext = sext <2 x i1> %cmp to <2 x i16>
+ ret <2 x i16> %sext
+}