[PATCH] D32723: AMDGPU: Allow vectorization of packed types

Changpeng Fang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 20 13:14:58 PDT 2017


cfang accepted this revision.
cfang added inline comments.


================
Comment at: lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:261
+  // TODO: Enable this again.
   if (VF == 1)
     return 1;
----------------
Interleaving was disabled based on the SHOC DeviceMemory readLocalMemory test. We asked CQE to do a complete performance measurement around this, and the results were very positive. The main reason for disabling it was a concern about register usage.

I remember re-measuring DeviceMemory performance later, when the new waitcnt insertion was introduced, and it turned out that enabling interleaving makes no difference for DeviceMemory readLocalMemory!

I'm not sure about the other tests that CQE found to benefit from having it disabled.


https://reviews.llvm.org/D32723




