[PATCH] D123634: [AMDGPU] Split unaligned 4 DWORD DS operations

Tue Apr 12 14:18:26 PDT 2022

rampitec created this revision.
rampitec added reviewers: arsenm, foad.
Herald added subscribers: hsmhsm, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, jvesely, kzhuravl.
Herald added a project: All.
rampitec requested review of this revision.
Herald added a subscriber: wdng.
Herald added a project: LLVM.

Similarly to 3 DWORD operations, it is better for performance
to split unaligned 4 DWORD operations as long as they are at least
DWORD aligned. Performance data:
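As an illustration only, the choice that the following timings support can be written as a small alignment-driven selector. This is a sketch derived from the measured numbers, not the actual change to SIISelLowering.cpp (whose contents are not shown here). The enum and function names are hypothetical. The sketch also assumes unaligned DS access is enabled, so that a single b128 access is legal below 4-byte alignment.

```cpp
#include <cstdint>

// Hypothetical labels for the three 16-byte DS sequences measured below.
enum class DSLowering {
  B128,       // one ds_read_b128 / ds_write_b128
  Pair2xB64,  // one ds_read2_b64 / ds_write2_b64
  Two2xB32,   // two ds_read2_b32 / ds_write2_b32
};

// Pick the fastest sequence for a 16-byte LDS access, given its known
// alignment in bytes, following the gfx900/gfx1030 timings.
// Assumption: unaligned DS access is enabled (b128 legal below align 4).
constexpr DSLowering chooseDSLowering(std::uint32_t alignBytes) {
  return alignBytes >= 16 ? DSLowering::B128       // naturally aligned
       : alignBytes >= 8  ? DSLowering::Pair2xB64  // 8-byte aligned
       : alignBytes >= 4  ? DSLowering::Two2xB32   // DWORD aligned: split
                          : DSLowering::B128;      // below DWORD: b128 was fastest
}
```

For example, `chooseDSLowering(4)` yields `Two2xB32`, the case this patch targets. `chooseDSLowering(2)` keeps a single b128 access, because at 1- and 2-byte alignment the tables show every sequence is slow and b128 is still the fastest of the three.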

  Using platform: AMD Accelerated Parallel Processing
  Using device: gfx900:xnack-

  ds_write_b128                      aligned by 16:  4.9 sec
  ds_write2_b64                      aligned by 16:  5.1 sec
  ds_write2_b32 * 2                  aligned by 16:  5.5 sec
  ds_write_b128                      aligned by  1:  8.1 sec
  ds_write2_b64                      aligned by  1:  8.7 sec
  ds_write2_b32 * 2                  aligned by  1: 14.0 sec
  ds_write_b128                      aligned by  2:  8.1 sec
  ds_write2_b64                      aligned by  2:  8.7 sec
  ds_write2_b32 * 2                  aligned by  2: 14.0 sec
  ds_write_b128                      aligned by  4:  5.6 sec
  ds_write2_b64                      aligned by  4:  8.7 sec
  ds_write2_b32 * 2                  aligned by  4:  5.6 sec
  ds_write_b128                      aligned by  8:  5.6 sec
  ds_write2_b64                      aligned by  8:  5.1 sec
  ds_write2_b32 * 2                  aligned by  8:  5.6 sec
  ds_read_b128                       aligned by 16:  3.8 sec
  ds_read2_b64                       aligned by 16:  3.8 sec
  ds_read2_b32 * 2                   aligned by 16:  4.0 sec
  ds_read_b128                       aligned by  1:  4.6 sec
  ds_read2_b64                       aligned by  1:  8.1 sec
  ds_read2_b32 * 2                   aligned by  1: 14.0 sec
  ds_read_b128                       aligned by  2:  4.6 sec
  ds_read2_b64                       aligned by  2:  8.1 sec
  ds_read2_b32 * 2                   aligned by  2: 14.0 sec
  ds_read_b128                       aligned by  4:  4.6 sec
  ds_read2_b64                       aligned by  4:  8.1 sec
  ds_read2_b32 * 2                   aligned by  4:  4.0 sec
  ds_read_b128                       aligned by  8:  4.6 sec
  ds_read2_b64                       aligned by  8:  3.8 sec
  ds_read2_b32 * 2                   aligned by  8:  4.0 sec

  Using platform: AMD Accelerated Parallel Processing
  Using device: gfx1030

  ds_write_b128                      aligned by 16:  6.2 sec
  ds_write2_b64                      aligned by 16:  7.1 sec
  ds_write2_b32 * 2                  aligned by 16:  7.6 sec
  ds_write_b128                      aligned by  1: 24.1 sec
  ds_write2_b64                      aligned by  1: 25.2 sec
  ds_write2_b32 * 2                  aligned by  1: 43.7 sec
  ds_write_b128                      aligned by  2: 24.1 sec
  ds_write2_b64                      aligned by  2: 25.1 sec
  ds_write2_b32 * 2                  aligned by  2: 43.7 sec
  ds_write_b128                      aligned by  4: 14.4 sec
  ds_write2_b64                      aligned by  4: 25.1 sec
  ds_write2_b32 * 2                  aligned by  4:  7.6 sec
  ds_write_b128                      aligned by  8: 14.4 sec
  ds_write2_b64                      aligned by  8:  7.1 sec
  ds_write2_b32 * 2                  aligned by  8:  7.6 sec
  ds_read_b128                       aligned by 16:  6.2 sec
  ds_read2_b64                       aligned by 16:  6.3 sec
  ds_read2_b32 * 2                   aligned by 16:  7.5 sec
  ds_read_b128                       aligned by  1: 12.5 sec
  ds_read2_b64                       aligned by  1: 24.0 sec
  ds_read2_b32 * 2                   aligned by  1: 43.6 sec
  ds_read_b128                       aligned by  2: 12.5 sec
  ds_read2_b64                       aligned by  2: 24.0 sec
  ds_read2_b32 * 2                   aligned by  2: 43.6 sec
  ds_read_b128                       aligned by  4: 12.5 sec
  ds_read2_b64                       aligned by  4: 24.0 sec
  ds_read2_b32 * 2                   aligned by  4:  7.5 sec
  ds_read_b128                       aligned by  8: 12.5 sec
  ds_read2_b64                       aligned by  8:  6.3 sec
  ds_read2_b32 * 2                   aligned by  8:  7.5 sec
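To quantify the DWORD-aligned case, the "aligned by 4" rows above can be encoded directly and the speedup of the split sequence over a single b128 instruction computed. This is a minimal sketch. The struct and function names are hypothetical, and the numbers are copied from the tables, not re-measured.

```cpp
// Timings in seconds for DWORD-aligned (align 4) 16-byte accesses,
// copied from the gfx900 and gfx1030 tables above.
struct Align4Timing {
  const char* device;
  const char* op;
  double b128;   // single ds_read_b128 / ds_write_b128
  double split;  // two ds_read2_b32 / ds_write2_b32
};

constexpr Align4Timing kAlign4[] = {
    {"gfx900", "write", 5.6, 5.6},
    {"gfx900", "read", 4.6, 4.0},
    {"gfx1030", "write", 14.4, 7.6},
    {"gfx1030", "read", 12.5, 7.5},
};

// Speedup of splitting over issuing one b128 instruction (> 1.0 means
// the split is faster).
constexpr double splitSpeedup(const Align4Timing& t) {
  return t.b128 / t.split;
}
```

On gfx1030 the split is about 1.9x faster for writes (14.4 / 7.6) and about 1.7x faster for reads (12.5 / 7.5). On gfx900 it ties for writes and is about 1.15x faster for reads, so splitting is never slower in these measurements.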

https://reviews.llvm.org/D123634

Files:
  llvm/lib/Target/AMDGPU/DSInstructions.td
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp
  llvm/lib/Target/AMDGPU/SIInstrInfo.td
  llvm/test/CodeGen/AMDGPU/GlobalISel/load-unaligned.ll
  llvm/test/CodeGen/AMDGPU/ds-alignment.ll
  llvm/test/CodeGen/AMDGPU/ds_write2.ll
  llvm/test/CodeGen/AMDGPU/lds-misaligned-bug.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D123634.422333.patch
Type: text/x-patch
Size: 11178 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220412/b09cac3d/attachment.bin>