[all-commits] [llvm/llvm-project] b89236: [AMDGPU] Vectorize misaligned global loads & stores
Jeffrey Byrnes via All-commits
all-commits at lists.llvm.org
Fri Mar 3 13:19:14 PST 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: b89236a96f2f2f3e9b88d198585a8eda7fb2c443
https://github.com/llvm/llvm-project/commit/b89236a96f2f2f3e9b88d198585a8eda7fb2c443
Author: Jeffrey Byrnes <Jeffrey.Byrnes at amd.com>
Date: 2023-03-03 (Fri, 03 Mar 2023)
Changed paths:
M llvm/lib/Target/AMDGPU/AMDGPU.h
M llvm/lib/Target/AMDGPU/SIISelLowering.cpp
M llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.global.ll
A llvm/test/CodeGen/AMDGPU/global-i16-load-store.ll
M llvm/test/CodeGen/AMDGPU/load-constant-i16.ll
M llvm/test/CodeGen/AMDGPU/load-global-i16.ll
M llvm/test/CodeGen/AMDGPU/udiv.ll
M llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll
Log Message:
-----------
[AMDGPU] Vectorize misaligned global loads & stores
Based on experiments on gfx906, gfx908, gfx90a, and gfx1030, wider global loads and stores perform better than multiple narrower ones regardless of alignment. The effect is strongest when combining 8-bit loads and stores, where the speedup was usually 2x across all alignments.
Differential Revision: https://reviews.llvm.org/D145170
Change-Id: I6ee6c76e6ace7fc373cc1b2aac3818fc1425a0c1
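As a rough host-side C++ illustration of what "combining 8 bit loads" means (not the AMDGPU backend code touched by this commit), the sketch below reads the same four bytes two ways: as four separate 8-bit loads, and as one 32-bit load from a possibly misaligned address. The function names `load_bytes_narrow` and `load_bytes_wide` are hypothetical. The sketch assumes a little-endian target (AMDGPU is little-endian), and it uses `std::memcpy` so that the misaligned wide read is well-defined C++.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical "narrow" form: four separate 8-bit loads,
// reassembled into a 32-bit value in little-endian order.
uint32_t load_bytes_narrow(const unsigned char *p) {
  assert(p != nullptr);
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical "wide" form: a single 32-bit load. memcpy makes the
// access valid even when p is not 4-byte aligned; on a little-endian
// target it yields the same value as load_bytes_narrow.
uint32_t load_bytes_wide(const unsigned char *p) {
  assert(p != nullptr);
  uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}
```

Both functions return the same value for any starting offset. The commit's claim is that, on the GPUs listed above, the single wide access is the faster choice even when the address is misaligned.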