[llvm] [AMDGPU] Enable vectorization of i8 values. (PR #134934)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 16 23:11:13 PDT 2025
================
@@ -344,9 +344,12 @@ unsigned GCNTTIImpl::getMinVectorRegisterBitWidth() const {
unsigned GCNTTIImpl::getMaximumVF(unsigned ElemWidth, unsigned Opcode) const {
if (Opcode == Instruction::Load || Opcode == Instruction::Store)
return 32 * 4 / ElemWidth;
- return (ElemWidth == 16 && ST->has16BitInsts()) ? 2
- : (ElemWidth == 32 && ST->hasPackedFP32Ops()) ? 2
- : 1;
+ // For a given width, return the max number of elements that can be combined
+ // into a wider bit value:
+ return (ElemWidth == 8 && ST->has16BitInsts()) ? 4
----------------
arsenm wrote:
```suggestion
return (ElemWidth == 8) ? 4
```
I don't see how has16BitInsts is really relevant to the 8-bit case. I guess hasSDWA would be a more plausible reason, but given this is artificial usage anyway, it can probably be unconditional.
https://github.com/llvm/llvm-project/pull/134934