[PATCH] D97062: [AMDGPU] Use divergent addresses for vector loads

Fri Feb 19 09:29:02 PST 2021

foad created this revision.
foad added reviewers: arsenm, rampitec.
Herald added subscribers: kerbowa, t-tye, tpr, dstuttard, yaxunl, nhaehnle, jvesely, kzhuravl.
foad requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.

Change some test cases to use divergent addresses for vector loads,
which should be the common case in real world code. Using uniform
addresses causes poor instruction selection for the surrounding
code which has to be fixed up post-register-allocation, and this causes
a lot of testsuite churn for a forthcoming patch to stop selecting
24-bit vector multiply instructions for uniform multiplies.

This shows up some problems in the idot tests where we fail to select
v_dot instructions because the patterns only match MUL_[UI]24 ISD nodes,
but the DAG contains i16 mul nodes instead.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D97062

Files:
  llvm/test/CodeGen/AMDGPU/idot2.ll
  llvm/test/CodeGen/AMDGPU/idot4s.ll
  llvm/test/CodeGen/AMDGPU/idot4u.ll
  llvm/test/CodeGen/AMDGPU/idot8s.ll
  llvm/test/CodeGen/AMDGPU/idot8u.ll
  llvm/test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.ll
  llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll