[all-commits] [llvm/llvm-project] b89236: [AMDGPU] Vectorize misaligned global loads & stores

Fri Mar 3 13:19:14 PST 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: b89236a96f2f2f3e9b88d198585a8eda7fb2c443
      https://github.com/llvm/llvm-project/commit/b89236a96f2f2f3e9b88d198585a8eda7fb2c443
  Author: Jeffrey Byrnes <Jeffrey.Byrnes at amd.com>
  Date:   2023-03-03 (Fri, 03 Mar 2023)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPU.h
    M llvm/lib/Target/AMDGPU/SIISelLowering.cpp
    M llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.global.ll
    A llvm/test/CodeGen/AMDGPU/global-i16-load-store.ll
    M llvm/test/CodeGen/AMDGPU/load-constant-i16.ll
    M llvm/test/CodeGen/AMDGPU/load-global-i16.ll
    M llvm/test/CodeGen/AMDGPU/udiv.ll
    M llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll

  Log Message:
  -----------
  [AMDGPU] Vectorize misaligned global loads & stores

Based on experiments on gfx906, gfx908, gfx90a, and gfx1030, wider global loads and stores outperform multiple narrower ones regardless of alignment. This is especially true when combining 8-bit loads and stores, where the speedup was usually 2x across all alignments.
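
As a rough illustration of the access pattern being combined, the sketch below contrasts four 8-bit loads with a single 32-bit load from an address that need not be 4-byte aligned. It is a host-side C++ analogy only, not the code in SIISelLowering.cpp; the function names load_narrow and load_wide are hypothetical, and the two functions return the same value only on a little-endian host.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: four separate 8-bit loads, assembled in
// little-endian order into one 32-bit value.
std::uint32_t load_narrow(const unsigned char *p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: one 32-bit load from a possibly misaligned
// address. memcpy keeps the access well-defined in C++; compilers
// typically lower it to a single wide load where the target allows it.
std::uint32_t load_wide(const unsigned char *p) {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}
```

Calling either helper on a pointer such as buf + 1 exercises the misaligned case; the commit's claim is that on the listed GPUs the single wide access is the faster of the two forms.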

Differential Revision: https://reviews.llvm.org/D145170

Change-Id: I6ee6c76e6ace7fc373cc1b2aac3818fc1425a0c1