[PATCH] D145170: [AMDGPU] Vectorize misaligned global loads & stores

Thu Mar 2 09:47:23 PST 2023

jrbyrnes created this revision.
jrbyrnes added reviewers: rampitec, arsenm.
Herald added subscribers: kosarev, foad, kerbowa, hiraditya, tpr, dstuttard, yaxunl, jvesely, kzhuravl.
Herald added a project: All.
jrbyrnes requested review of this revision.
Herald added subscribers: llvm-commits, pcwang-thead, wdng.
Herald added a project: LLVM.

Based on experiments on gfx906, gfx908, gfx90a, and gfx1030, wider global loads / stores perform better than multiple narrower ones regardless of alignment. This is especially true when combining 8-bit loads / stores, where the speedup was usually 2x across all alignments.

Change-Id: I1713c6edfc189052b8a71dc1135f9a436c1042e0

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D145170

Files:
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.global.ll
  llvm/test/CodeGen/AMDGPU/global-i16-load-store.ll
  llvm/test/CodeGen/AMDGPU/load-constant-i16.ll
  llvm/test/CodeGen/AMDGPU/load-global-i16.ll
  llvm/test/CodeGen/AMDGPU/udiv.ll
  llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D145170.501905.patch
Type: text/x-patch
Size: 23382 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230302/6c882cb2/attachment.bin>