[PATCH] D144715: [AMDGPU] Use `S_BFE_U64` for uniform i1-i64 ext

Fri Feb 24 04:03:40 PST 2023

arsenm added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/saddo.ll:36
+; SI-NEXT:    v_mov_b32_e32 v0, s4
+; SI-NEXT:    v_mov_b32_e32 v1, s5
 ; SI-NEXT:    buffer_store_dwordx2 v[0:1], off, s[0:3], 0
----------------
Pierre-vh wrote:
> Not sure if it's a regression here. Yes there's one more instruction, but we're using more scalar instructions so isn't it beneficial in the end?
It's one more instruction because of the copy to VGPR for the store. We probably should have a rematerialize-as-VALU optimization to handle this kind of case.

================
Comment at: llvm/test/CodeGen/AMDGPU/usubo.ll:19
+; SI-NEXT:    v_cmp_gt_u64_e32 vcc, s[0:1], v[0:1]
+; SI-NEXT:    s_bfe_u64 s[6:7], vcc, 0x10000
+; SI-NEXT:    s_add_u32 s6, s0, s6
----------------
Pierre-vh wrote:
> This looks a bit like a regression but I'm not sure how to address it. The pattern comes from `zext (setcc)`.
> I thought about adding a PatFrag that doesn't accept setcc operands to zext but it feels hacky.
> Thoughts?
Before this was a 32-bit select, so I assume this was a zext to i32 so I don't see why this zext to i64 change matters. What was the DAG here?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144715/new/

https://reviews.llvm.org/D144715