[PATCH] D32723: AMDGPU: Allow vectorization of packed types
Changpeng Fang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 20 13:14:58 PDT 2017
cfang accepted this revision.
cfang added inline comments.
================
Comment at: lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:261
+  // TODO: Enable this again.
   if (VF == 1)
     return 1;
----------------
Interleaving was disabled based on the SHOC DeviceMemory readLocalMemory test. We asked CQE to run a complete performance
measurement around this, and the results were very positive. The main reason for disabling it was a concern about register usage.
I remember re-measuring DeviceMemory performance later, when the new waitcnt insertion was introduced, and it turned out that enabling it makes no difference for DeviceMemory readLocalMemory!
Not sure about the other tests that CQE found to benefit when it is disabled.
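To illustrate the register-usage concern in isolation, here is a minimal standalone sketch. It is not the code from this patch or from LLVM: the function names, the `VectorCap` parameter, and the example numbers are all hypothetical. It only models the idea that a scalar loop (VF == 1) is not interleaved, and that each extra interleaved copy of a loop body keeps another set of values live.

```cpp
// Hypothetical model, not LLVM code: names and numbers are illustrative only.

// Mirrors the shape of the quoted snippet: a scalar loop (VF == 1) is never
// interleaved. For vectorized loops the cap is an assumed, caller-supplied
// value rather than anything taken from the patch.
constexpr unsigned maxInterleaveFactor(unsigned VF, unsigned VectorCap) {
  return VF == 1 ? 1 : VectorCap;
}

// Rough estimate of simultaneously live registers: every interleaved copy of
// the loop body keeps its own set of values live at the same time.
constexpr unsigned estimatedLiveRegs(unsigned RegsPerIteration,
                                     unsigned InterleaveCount) {
  return RegsPerIteration * InterleaveCount;
}

// Compile-time checks of the model.
static_assert(maxInterleaveFactor(1, 8) == 1, "scalar loops stay at 1");
static_assert(estimatedLiveRegs(6, 4) == 24, "4 copies of 6 live values");
```

Under this simplified model, register pressure grows linearly with the interleave count, which is why whether interleaving pays off has to be settled by measurement rather than assumed.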
https://reviews.llvm.org/D32723