[llvm] 4fa8a54 - [AMDGPU] Add sanity check that fixes bad shift operation in AMD backend

Matt Arsenault via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 11 12:26:40 PDT 2023


Author: Konrad Kusiak
Date: 2023-08-11T15:26:35-04:00
New Revision: 4fa8a5487e3b1a4b2ce743b0008a912026aa3524

URL: https://github.com/llvm/llvm-project/commit/4fa8a5487e3b1a4b2ce743b0008a912026aa3524
DIFF: https://github.com/llvm/llvm-project/commit/4fa8a5487e3b1a4b2ce743b0008a912026aa3524.diff

LOG: [AMDGPU] Add sanity check that fixes bad shift operation in AMD backend

There is a problem with the
SILoadStoreOptimizer::dmasksCanBeCombined() function that can lead to
undefined behavior (UB).

This boolean function decides whether two DMask values can be combined
into one. The idea is that the bits which are set in one mask must not
overlap with the set bits of the other. Consider an example (10 bits
for simplicity):

Mask 1: 0101101000
Mask 2: 0000000110

Those can be combined into a single mask (their bitwise OR): 0101101110.

To check whether such a combination is possible, the code takes the
larger of the two masks and counts its trailing zeros, i.e. the number
of 0 bits from the LSB up to the first set bit. It then shifts 1u left
by that count and compares the result with the smaller mask. With the
example above, the larger mask 0101101000 has three trailing zeros, and
1u << 3 = 0000001000 is greater than the smaller mask 0000000110, so
the masks are disjoint. The problem is that when both masks are 0, the
count finds 32 zeros in the larger mask, and shifting a 32-bit value by
32 positions is UB.

The fix is a simple sanity check: bail out if the larger mask is 0.
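
For illustration, here is a minimal standalone sketch of the check with
the fix applied. It mirrors the logic in the patch below, but the
free-standing signature is hypothetical (the real function operates on
CombineInfo objects), and it uses std::countr_zero in place of
llvm::countr_zero:

    #include <algorithm>
    #include <bit>      // std::countr_zero (C++20)
    #include <cstdint>

    // Hypothetical standalone version of the dmask disjointness check.
    bool dmasksCanBeCombined(uint32_t DMask0, uint32_t DMask1) {
      uint32_t MaxMask = std::max(DMask0, DMask1);
      uint32_t MinMask = std::min(DMask0, DMask1);

      // The fix: if both masks are 0, countr_zero(MaxMask) returns 32
      // and the shift below would be UB, so bail out early.
      if (!MaxMask)
        return false;

      // The smaller mask may only use bit positions strictly below the
      // larger mask's lowest set bit.
      unsigned AllowedBitsForMin = std::countr_zero(MaxMask);
      if ((1u << AllowedBitsForMin) <= MinMask)
        return false;

      return true;
    }

With the example masks, dmasksCanBeCombined(0b0101101000, 0b0000000110)
returns true, while dmasksCanBeCombined(0, 0) now returns false instead
of invoking UB.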

https://reviews.llvm.org/D155051

Added: 
    

Modified: 
    llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
    llvm/test/CodeGen/AMDGPU/merge-image-load.mir
    llvm/test/CodeGen/AMDGPU/merge-image-sample.mir

Removed: 
    


################################################################################
diff --git a/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp b/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
index c252d30e250e27..17105965471f65 100644
--- a/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
@@ -871,6 +871,9 @@ bool SILoadStoreOptimizer::dmasksCanBeCombined(const CombineInfo &CI,
   unsigned MaxMask = std::max(CI.DMask, Paired.DMask);
   unsigned MinMask = std::min(CI.DMask, Paired.DMask);
 
+  if (!MaxMask)
+    return false;
+
   unsigned AllowedBitsForMin = llvm::countr_zero(MaxMask);
   if ((1u << AllowedBitsForMin) <= MinMask)
     return false;

diff --git a/llvm/test/CodeGen/AMDGPU/merge-image-load.mir b/llvm/test/CodeGen/AMDGPU/merge-image-load.mir
index e4ccdb5f7d3244..7def666ce5d392 100644
--- a/llvm/test/CodeGen/AMDGPU/merge-image-load.mir
+++ b/llvm/test/CodeGen/AMDGPU/merge-image-load.mir
@@ -172,6 +172,24 @@ body:             |
 ...
 ---
 
+# GFX9-LABEL: name: image_load_dmask_zero_not_merged
+# GFX9: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V4 %5, %3, 0, 0, 0, 0, 0, 0, -1, 0, implicit $exec :: (dereferenceable load (s32), addrspace 4)
+# GFX9: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V4 %5, %3, 0, 0, 0, 0, 0, 0, -1, 0, implicit $exec :: (dereferenceable load (s96), align 16, addrspace 4)
+
+name: image_load_dmask_zero_not_merged
+body:             |
+  bb.0.entry:
+    %0:sgpr_64 = COPY $sgpr0_sgpr1
+    %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0
+    %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99
+    %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0
+    %4:vgpr_32 = COPY %2.sub3
+    %5:vreg_128 = BUFFER_LOAD_DWORDX4_OFFSET %2:sgpr_128, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load (s128))
+    %6:vgpr_32 = IMAGE_LOAD_V1_V4 %5:vreg_128, %3:sgpr_256, 0, 0, 0, 0, 0, 0, -1, 0, implicit $exec :: (dereferenceable load (s32), addrspace 4)
+    %7:vreg_96 = IMAGE_LOAD_V3_V4 %5:vreg_128, %3:sgpr_256, 0, 0, 0, 0, 0, 0, -1, 0, implicit $exec :: (dereferenceable load (s96), align 16, addrspace 4)
+...
+---
+
 # GFX9-LABEL: name: image_load_dmask_not_disjoint_not_merged
 # GFX9: %{{[0-9]+}}:vgpr_32 = IMAGE_LOAD_V1_V4 %5, %3, 4, 0, 0, 0, 0, 0, -1, 0, implicit $exec :: (dereferenceable load (s32), addrspace 4)
 # GFX9: %{{[0-9]+}}:vreg_96 = IMAGE_LOAD_V3_V4 %5, %3, 11, 0, 0, 0, 0, 0, -1, 0, implicit $exec :: (dereferenceable load (s96), align 16, addrspace 4)

diff --git a/llvm/test/CodeGen/AMDGPU/merge-image-sample.mir b/llvm/test/CodeGen/AMDGPU/merge-image-sample.mir
index 0a86a5b7b6a5a0..445ec69570a9f9 100644
--- a/llvm/test/CodeGen/AMDGPU/merge-image-sample.mir
+++ b/llvm/test/CodeGen/AMDGPU/merge-image-sample.mir
@@ -172,6 +172,24 @@ body:             |
 ...
 ---
 
+# GFX9-LABEL: name: image_sample_l_dmask_zero_not_merged
+# GFX9: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V4 %5, %3, %2, 0, 0, 0, 0, 0, 0, -1, 0, implicit $exec :: (dereferenceable load (s32), addrspace 4)
+# GFX9: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V4 %5, %3, %2, 0, 0, 0, 0, 0, 0, -1, 0, implicit $exec :: (dereferenceable load (s96), align 16, addrspace 4)
+
+name: image_sample_l_dmask_zero_not_merged
+body:             |
+  bb.0.entry:
+    %0:sgpr_64 = COPY $sgpr0_sgpr1
+    %1:sreg_64_xexec = S_LOAD_DWORDX2_IMM %0, 36, 0
+    %2:sgpr_128 = COPY $sgpr96_sgpr97_sgpr98_sgpr99
+    %3:sgpr_256 = S_LOAD_DWORDX8_IMM %1, 208, 0
+    %4:vgpr_32 = COPY %2.sub3
+    %5:vreg_128 = BUFFER_LOAD_DWORDX4_OFFSET %2:sgpr_128, 0, 0, 0, 0, implicit $exec :: (dereferenceable invariant load (s128))
+    %6:vgpr_32 = IMAGE_SAMPLE_L_V1_V4 %5:vreg_128, %3:sgpr_256, %2:sgpr_128, 0, 0, 0, 0, 0, 0, -1, 0, implicit $exec :: (dereferenceable load (s32), addrspace 4)
+    %7:vreg_96 = IMAGE_SAMPLE_L_V3_V4 %5:vreg_128, %3:sgpr_256, %2:sgpr_128, 0, 0, 0, 0, 0, 0, -1, 0, implicit $exec :: (dereferenceable load (s96), align 16, addrspace 4)
+...
+---
+
 # GFX9-LABEL: name: image_sample_l_dmask_not_disjoint_not_merged
 # GFX9: %{{[0-9]+}}:vgpr_32 = IMAGE_SAMPLE_L_V1_V4 %5, %3, %2, 4, 0, 0, 0, 0, 0, -1, 0, implicit $exec :: (dereferenceable load (s32), addrspace 4)
 # GFX9: %{{[0-9]+}}:vreg_96 = IMAGE_SAMPLE_L_V3_V4 %5, %3, %2, 11, 0, 0, 0, 0, 0, -1, 0, implicit $exec :: (dereferenceable load (s96), align 16, addrspace 4)


        

