[R600/SI] Merge const_load for SI

Mon Apr 21 14:49:01 PDT 2014

Hi,

These two patches enable merging several S_LOAD_BUFFER_DWORD instructions into a single vector load.
The new pass does not use the SLPVectorizer because the SLPVectorizer doesn't seem to work on overloaded intrinsics.

I looked at the CodeXL output for AMD's OpenCL sample "ConstantBandwidth", and the driver does not seem to merge more than 4 dwords together.
That's why the new pass does not merge more than 4 scalar loads together either. I suspect the latency advantage of an 8x or 16x dword load doesn't outweigh
the additional scalar register consumption.
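To illustrate the 4-dword cap, here is a minimal sketch of how contiguous dword loads could be grouped. It is not the SIBufferLoadMerger code from the patch. `MergedLoad` and `mergeDwordLoads` are hypothetical names, and the sketch makes three assumptions. First, each load is modelled only by its byte offset into one constant buffer. Second, only widths of 1, 2 and 4 dwords are emitted, so a contiguous run of three is split into 2 + 1. Third, no alignment constraints are checked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one merged load: NumDwords consecutive dwords
// starting at byte offset Offset in the same constant buffer.
struct MergedLoad {
  uint32_t Offset;
  unsigned NumDwords;
};

// Cap on dwords per merged load, mirroring the 4x limit described above.
constexpr unsigned MaxDwords = 4;

// Groups single-dword loads (given as byte offsets into one buffer) into
// merged loads of 1, 2 or 4 dwords. Assumption: no 3-dword load exists,
// so a contiguous run of 3 is emitted as a 2-dword load followed by a
// new run starting at the third dword.
std::vector<MergedLoad> mergeDwordLoads(std::vector<uint32_t> Offsets) {
  // Sort and drop duplicate offsets (the same dword loaded twice).
  std::sort(Offsets.begin(), Offsets.end());
  Offsets.erase(std::unique(Offsets.begin(), Offsets.end()), Offsets.end());

  std::vector<MergedLoad> Result;
  size_t I = 0;
  while (I < Offsets.size()) {
    // Length of the contiguous run starting at Offsets[I], capped at MaxDwords.
    unsigned Run = 1;
    while (Run < MaxDwords && I + Run < Offsets.size() &&
           Offsets[I + Run] == Offsets[I] + 4 * Run)
      ++Run;

    // Round the run down to a supported width (4, 2 or 1).
    unsigned Width = Run >= 4 ? 4 : (Run >= 2 ? 2 : 1);
    assert(Width <= MaxDwords);

    Result.push_back({Offsets[I], Width});
    I += Width;
  }
  return Result;
}
```

For example, `mergeDwordLoads({16, 0, 4, 8, 12, 20, 32})` yields three loads: a 4-dword load at offset 0, a 2-dword load at offset 16 and a single dword at offset 32. Under this sketch, even longer contiguous runs are never merged beyond 4 dwords, which trades fewer load instructions for a bounded number of scalar registers per load.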

Vincent
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-R600-SI-Add-a-SIBufferLoadMerger-pass.patch
Type: text/x-patch
Size: 8862 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140421/ee7c60ba/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-R600-SI-Support-for-vector-overloads-of-load_const.patch
Type: text/x-patch
Size: 4742 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140421/ee7c60ba/attachment-0001.bin>