[R600/SI] Merge const_load for SI
hfinkel at anl.gov
Mon Apr 21 14:54:11 PDT 2014
----- Original Message -----
> From: "Vincent Lejeune" <vljn at ovi.com>
> To: llvm-commits at cs.uiuc.edu
> Sent: Monday, April 21, 2014 4:49:01 PM
> Subject: [R600/SI] Merge const_load for SI
> These two patches enable merging several S_BUFFER_LOAD_DWORD instructions
> into a vector instruction.
> They do not use the SLPVectorizer because it doesn't seem to work on
> overloaded intrinsics.
I suppose that the SLP vectorizer does not currently understand any target-specific memory intrinsics, but I don't see why that would prevent us from teaching it about some. Could we add some TTI callback to allow intrinsic vectorization without pushing target-dependent code into the SLP vectorizer itself?
> I used CodeXL output from AMD's OpenCL sample "ConstantBandwidth",
> and it looks like the driver does not attempt to merge more than 4
> dwords together,
> which is why the new pass does not attempt to merge more than 4 scalar
> loads together. I suspect the latency advantage of using an 8x or 16x
> dword load doesn't outweigh
> the additional scalar register consumption.
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory