[llvm] [AMDGPU][LRO] LRO fix PHI same-BB filter; treat i8/i16 binops as profitable (PR #155800)

Thu Aug 28 03:02:54 PDT 2025

================
@@ -0,0 +1,67 @@
+; REQUIRES: amdgpu-registered-target
+; RUN: opt -S -passes=amdgpu-late-codegenprepare \
+; RUN:   -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a %s | FileCheck %s
+
+; Purpose:
+;  - Input has a loop-carried PHI of type <4 x i8> and byte-wise adds in the
+;    loop header (same basic block as the PHI).
+;  - After amdgpu-late-codegenprepare, the PHI must be coerced to i32 across
+;    the backedge, and a single dominating "bitcast i32 -> <4 x i8>" must be
+;    placed in the header (enabling SDWA-friendly lowering later).
+;
+; What we check:
+;  - PHI is i32 (no loop-carried <4 x i8> PHI remains).
+;  - A header-local bitcast i32 -> <4 x i8> exists and feeds the vector add.
+;  - The loopexit produces a bitcast <4 x i8> -> i32 for the backedge.
+
+target triple = "amdgcn-amd-amdhsa"
+
+define amdgpu_kernel void @lro_coerce_v4i8_phi(i8* nocapture %p, i32 %n) #0 {
----------------
arsenm wrote:

Use opaque pointers (ptr instead of i8*).
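
A rough sketch of what that change might look like, assuming only the pointer type in the signature needs updating. The commented lines mirror the signature quoted above; the stand-alone function `@opaque_ptr_example` and its body are hypothetical and are not taken from the patch.

```llvm
; Before (typed pointer, as quoted in the diff):
;   define amdgpu_kernel void @lro_coerce_v4i8_phi(i8* nocapture %p, i32 %n) #0 {
; After (opaque pointer); only the pointer type changes:
;   define amdgpu_kernel void @lro_coerce_v4i8_phi(ptr nocapture %p, i32 %n) #0 {

; Hypothetical stand-alone function (not from the patch) showing that with
; opaque pointers the pointee type lives on the instruction, not on the pointer.
define void @opaque_ptr_example(ptr %p, i32 %n) {
entry:
  %idx = zext i32 %n to i64
  ; The GEP names the element type explicitly; the pointer itself is just `ptr`.
  %q = getelementptr inbounds <4 x i8>, ptr %p, i64 %idx
  ; Loads and stores likewise carry the value type.
  %v = load <4 x i8>, ptr %q, align 4
  store <4 x i8> %v, ptr %p, align 4
  ret void
}
```

Any typed pointers in the test body (loads, stores, GEPs, pointer bitcasts) would presumably need the same treatment, and pointer-to-pointer bitcasts become unnecessary.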

https://github.com/llvm/llvm-project/pull/155800