[all-commits] [llvm/llvm-project] c799f8: [AMDGPU] Don't cluster stores

Mon Sep 14 05:40:46 PDT 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: c799f873cb9feaea265aa3df8f3372949f8263d0
      https://github.com/llvm/llvm-project/commit/c799f873cb9feaea265aa3df8f3372949f8263d0
  Author: Jay Foad <jay.foad at amd.com>
  Date:   2020-09-14 (Mon, 14 Sep 2020)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
    M llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement-stack-lower.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.i16.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.large.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/load-unaligned.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/store-local.128.ll
    M llvm/test/CodeGen/AMDGPU/GlobalISel/store-local.96.ll
    M llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll
    M llvm/test/CodeGen/AMDGPU/call-argument-types.ll
    M llvm/test/CodeGen/AMDGPU/cluster_stores.ll
    M llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.global.ll
    M llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll
    M llvm/test/CodeGen/AMDGPU/fshr.ll
    M llvm/test/CodeGen/AMDGPU/half.ll
    M llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll
    M llvm/test/CodeGen/AMDGPU/local-memory.amdgcn.ll
    M llvm/test/CodeGen/AMDGPU/memory_clause.ll
    M llvm/test/CodeGen/AMDGPU/merge-stores.ll
    M llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll
    M llvm/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll
    M llvm/test/CodeGen/AMDGPU/store-local.128.ll
    M llvm/test/CodeGen/AMDGPU/store-local.96.ll
    M llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll
    M llvm/test/CodeGen/AMDGPU/token-factor-inline-limit-test.ll
    M llvm/test/CodeGen/AMDGPU/widen-smrd-loads.ll

  Log Message:
  -----------
  [AMDGPU] Don't cluster stores

Clustering loads has caching benefits, but as far as I know there is no
advantage to clustering stores on any AMDGPU subtargets.

The disadvantage is that it tends to increase register pressure and
restricts scheduling freedom.

Differential Revision: https://reviews.llvm.org/D85530