[PATCH] D123524: [AMDGCN] Split unaligned 3 DWORD DS operations
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 11 10:43:26 PDT 2022
rampitec created this revision.
rampitec added reviewers: arsenm, foad.
Herald added subscribers: hsmhsm, kerbowa, hiraditya, nhaehnle, jvesely.
Herald added a project: All.
rampitec requested review of this revision.
Herald added a subscriber: wdng.
Herald added a project: LLVM.
I have written a mini-test to check the performance. Overall,
the benefit of using b96 operations on data which is not known
to be aligned but happens to be aligned is small, while the
performance hit of using b96 operations on really unaligned
memory is high.
The only exception is when the data is not aligned even by 4;
in this case it is better to use b96.
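The rule described above can be sketched as a small decision function. This is only an illustration of the reasoning, not the code in the patch: the function name `use_single_b96` is hypothetical, and the 16-byte threshold for "aligned" is an assumption.

```python
def use_single_b96(known_align_bytes: int) -> bool:
    """Hypothetical helper: True keeps one 96-bit DS operation, False splits it.

    known_align_bytes is the alignment the compiler can prove for the access.
    The 16-byte threshold for "aligned" is an assumption of this sketch.
    """
    if known_align_bytes >= 16:
        # Known to be aligned: keep the single b96 operation.
        return True
    if known_align_bytes < 4:
        # Not even dword aligned: per the description, b96 is the better choice.
        return True
    # Known alignment of 4 or 8: split into smaller operations.
    return False


for align in (1, 2, 4, 8, 16):
    print(align, "b96" if use_single_b96(align) else "split")
```

How the access is split (b32 + b64 or three b32 operations) is deliberately left out of the sketch.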
Here is the test output on Vega and Navi:
Using platform: AMD Accelerated Parallel Processing
Using device: gfx900:xnack-
ds_write_b96 aligned: 3.4 sec
ds_write_b32 + ds_write_b64 aligned: 4.5 sec
ds_write_b32 * 3 aligned: 4.8 sec
ds_write_b96 misaligned by 1: 4.8 sec
ds_write_b32 + ds_write_b64 misaligned by 1: 7.2 sec
ds_write_b32 * 3 misaligned by 1: 10.0 sec
ds_write_b96 misaligned by 2: 4.8 sec
ds_write_b32 + ds_write_b64 misaligned by 2: 7.2 sec
ds_write_b32 * 3 misaligned by 2: 10.1 sec
ds_write_b96 misaligned by 4: 4.8 sec
ds_write_b32 + ds_write_b64 misaligned by 4: 4.2 sec
ds_write_b32 * 3 misaligned by 4: 4.9 sec
ds_write_b96 misaligned by 8: 4.8 sec
ds_write_b32 + ds_write_b64 misaligned by 8: 4.6 sec
ds_write_b32 * 3 misaligned by 8: 4.9 sec
ds_read_b96 aligned: 3.3 sec
ds_read_b32 + ds_read_b64 aligned: 4.9 sec
ds_read_b32 * 3 aligned: 2.6 sec
ds_read_b96 misaligned by 1: 4.1 sec
ds_read_b32 + ds_read_b64 misaligned by 1: 7.2 sec
ds_read_b32 * 3 misaligned by 1: 10.1 sec
ds_read_b96 misaligned by 2: 4.1 sec
ds_read_b32 + ds_read_b64 misaligned by 2: 7.2 sec
ds_read_b32 * 3 misaligned by 2: 10.1 sec
ds_read_b96 misaligned by 4: 4.1 sec
ds_read_b32 + ds_read_b64 misaligned by 4: 2.6 sec
ds_read_b32 * 3 misaligned by 4: 2.6 sec
ds_read_b96 misaligned by 8: 4.1 sec
ds_read_b32 + ds_read_b64 misaligned by 8: 4.9 sec
ds_read_b32 * 3 misaligned by 8: 2.6 sec
Using platform: AMD Accelerated Parallel Processing
Using device: gfx1030
ds_write_b96 aligned: 4.1 sec
ds_write_b32 + ds_write_b64 aligned: 13.0 sec
ds_write_b32 * 3 aligned: 4.5 sec
ds_write_b96 misaligned by 1: 12.5 sec
ds_write_b32 + ds_write_b64 misaligned by 1: 22.0 sec
ds_write_b32 * 3 misaligned by 1: 31.5 sec
ds_write_b96 misaligned by 2: 12.4 sec
ds_write_b32 + ds_write_b64 misaligned by 2: 22.0 sec
ds_write_b32 * 3 misaligned by 2: 31.5 sec
ds_write_b96 misaligned by 4: 12.4 sec
ds_write_b32 + ds_write_b64 misaligned by 4: 4.0 sec
ds_write_b32 * 3 misaligned by 4: 4.5 sec
ds_write_b96 misaligned by 8: 12.4 sec
ds_write_b32 + ds_write_b64 misaligned by 8: 13.0 sec
ds_write_b32 * 3 misaligned by 8: 4.5 sec
ds_read_b96 aligned: 3.8 sec
ds_read_b32 + ds_read_b64 aligned: 12.8 sec
ds_read_b32 * 3 aligned: 4.4 sec
ds_read_b96 misaligned by 1: 10.9 sec
ds_read_b32 + ds_read_b64 misaligned by 1: 21.8 sec
ds_read_b32 * 3 misaligned by 1: 31.5 sec
ds_read_b96 misaligned by 2: 10.9 sec
ds_read_b32 + ds_read_b64 misaligned by 2: 21.9 sec
ds_read_b32 * 3 misaligned by 2: 31.5 sec
ds_read_b96 misaligned by 4: 10.9 sec
ds_read_b32 + ds_read_b64 misaligned by 4: 3.8 sec
ds_read_b32 * 3 misaligned by 4: 4.5 sec
ds_read_b96 misaligned by 8: 10.9 sec
ds_read_b32 + ds_read_b64 misaligned by 8: 12.8 sec
ds_read_b32 * 3 misaligned by 8: 4.5 sec
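To make the trade-off easier to see, the following sketch computes the ratio of b96 time to b32 * 3 time for a few gfx1030 ds_write cases. The timings are copied from the output above; the variable names are illustrative only.

```python
# Timings in seconds, copied from the gfx1030 ds_write lines of the output above.
b96 = {"aligned": 4.1, "misaligned by 1": 12.5, "misaligned by 4": 12.4}
b32_x3 = {"aligned": 4.5, "misaligned by 1": 31.5, "misaligned by 4": 4.5}

# A ratio above 1 means the single b96 operation was slower than three b32 operations.
ratios = {case: b96[case] / b32_x3[case] for case in b96}

for case, ratio in ratios.items():
    print(f"{case}: b96 / (b32 * 3) = {ratio:.2f}")
```

The ratio is below 1 for the aligned and misaligned-by-1 cases and well above 1 for the misaligned-by-4 case, which matches the summary in the description.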
Fixes: SWDEV-330802
https://reviews.llvm.org/D123524
Files:
llvm/lib/Target/AMDGPU/DSInstructions.td
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
llvm/test/CodeGen/AMDGPU/ds-alignment.ll
llvm/test/CodeGen/AMDGPU/lds-misaligned-bug.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D123524.421963.patch
Type: text/x-patch
Size: 4354 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220411/39bd4ab3/attachment.bin>