[all-commits] [llvm/llvm-project] 25c5da: AMDGPU Reduce reported maximum group size to 1024

Tue Nov 12 17:41:47 PST 2019

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 25c5da5a426168b38fb3e9baa918faa75e4a92b4
      https://github.com/llvm/llvm-project/commit/25c5da5a426168b38fb3e9baa918faa75e4a92b4
  Author: Matt Arsenault <Matthew.Arsenault at amd.com>
  Date:   2019-11-13 (Wed, 13 Nov 2019)

  Changed paths:
    M llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
    M llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size-v3.ll
    M llvm/test/CodeGen/AMDGPU/attr-amdgpu-flat-work-group-size.ll
    M llvm/test/CodeGen/AMDGPU/large-work-group-promote-alloca.ll

  Log Message:
  -----------
  AMDGPU Reduce reported maximum group size to 1024

While some targets allow encoding 2048, this was never tested or
supported.

  Commit: 4b472139513ba460595804f8113497844b41fbcc
      https://github.com/llvm/llvm-project/commit/4b472139513ba460595804f8113497844b41fbcc
  Author: Matt Arsenault <Matthew.Arsenault at amd.com>
  Date:   2019-11-13 (Wed, 13 Nov 2019)

  Changed paths:
    M llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
    M llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll
    M llvm/test/CodeGen/AMDGPU/array-ptr-calc-i32.ll
    M llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props-v3.ll
    M llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props.ll
    M llvm/test/CodeGen/AMDGPU/lower-range-metadata-intrinsic-call.ll
    M llvm/test/CodeGen/AMDGPU/occupancy-levels.ll
    M llvm/test/CodeGen/AMDGPU/private-memory-r600.ll
    M llvm/test/CodeGen/AMDGPU/promote-alloca-addrspacecast.ll
    M llvm/test/CodeGen/AMDGPU/promote-alloca-to-lds-icmp.ll
    M llvm/test/CodeGen/AMDGPU/promote-alloca-to-lds-phi.ll
    M llvm/test/CodeGen/AMDGPU/promote-alloca-to-lds-select.ll

  Log Message:
  -----------
  AMDGPU: Switch backend default max workgroup size to 1024

Previously the backend defaulted to 256 rather than the maximum
supported size of 1024. Using a maximum lower than the hardware maximum
requires language runtimes to enforce that limit for correctness, which
no language runtime has done correctly. Switch the default to the
conservatively correct maximum, and require frontends to opt in to the
more optimization-friendly maximum of 256.
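
As a rough illustration of what opting in could look like at the IR
level: the sketch below assumes the opt-in is expressed through the
"amdgpu-flat-work-group-size" function attribute, which is the attribute
the attr-amdgpu-flat-work-group-size*.ll tests are named after. The log
message does not name the mechanism, so treat this as an assumption. The
kernel name and empty body are hypothetical.

```llvm
; Hypothetical kernel, for illustration only.
define amdgpu_kernel void @example_kernel() #0 {
  ret void
}

; Assumed opt-in: request a flat work-group size range of 1 to 256
; work-items instead of relying on the backend default.
attributes #0 = { "amdgpu-flat-work-group-size"="1,256" }
```

A kernel without such an attribute would presumably get the new 1024
default described above.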

I don't really understand why the changes in occupancy-levels.ll
increased the computed occupancy, which I expected to decrease. I'm
not sure if these tests should be forcing the old maximum.

Compare: https://github.com/llvm/llvm-project/compare/793b42a454ac...4b472139513b