[R600/SI] Merge const_load for SI

Mon Apr 21 14:54:11 PDT 2014

----- Original Message -----
> From: "Vincent Lejeune" <vljn at ovi.com>
> To: llvm-commits at cs.uiuc.edu
> Sent: Monday, April 21, 2014 4:49:01 PM
> Subject: [R600/SI] Merge const_load for SI
> 
> Hi,
> 
> 
> These two patches enable merging several S_BUFFER_LOAD_DWORD into a
> single vector instruction.
> They do not use the SLPVectorizer because it doesn't seem to work on
> overloaded intrinsics.

I suppose that the SLP vectorizer does not currently understand any target-specific memory intrinsics, but I don't see why that would prevent us from teaching it about some. Could we add some TTI callback to allow intrinsic vectorization without pushing target-dependent code into the SLP vectorizer itself?

 -Hal

> 
> I used CodeXL output from AMD's OpenCL sample "ConstantBandwidth",
> and it looks like the driver does not attempt to merge more than 4
> dwords together,
> which is why the new pass does not attempt to merge more than 4 scalar
> loads together. I suspect the latency advantage of using an 8x or 16x
> dword load doesn't outweigh
> the additional scalar register consumption.
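
To make the 4-dword cap concrete, here is a rough standalone sketch of the grouping step. It is not taken from the patches: MergedLoad and groupDwordLoads are hypothetical names, offsets are assumed to be sorted, unique, and expressed in dword units, and alignment constraints on the wider loads are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one merged load (illustration only).
struct MergedLoad {
  uint32_t FirstDword; // offset of the first dword, in dword units
  unsigned Width;      // number of consecutive dwords covered (power of two)
};

// Greedily group sorted, unique dword offsets into consecutive runs.
// Each group is capped at MaxWidth dwords and rounded down to a power of
// two, since only power-of-two load widths are assumed to exist.
std::vector<MergedLoad> groupDwordLoads(const std::vector<uint32_t> &Offsets,
                                        unsigned MaxWidth = 4) {
  assert(MaxWidth >= 1 && "cap must allow at least a single dword");
  std::vector<MergedLoad> Result;
  std::size_t I = 0;
  while (I < Offsets.size()) {
    // Length of the consecutive run starting at I, capped at MaxWidth.
    unsigned Run = 1;
    while (Run < MaxWidth && I + Run < Offsets.size() &&
           Offsets[I + Run] == Offsets[I] + Run)
      ++Run;
    // Round the run length down to the nearest power of two.
    unsigned Width = 1;
    while (Width * 2 <= Run)
      Width *= 2;
    Result.push_back({Offsets[I], Width});
    I += Width; // leftover offsets start the next group
  }
  return Result;
}
```

With the default cap, the offsets {0, 1, 2, 3, 4, 5, 8} come out as one 4-dword group at 0, one 2-dword group at 4, and a single dword at 8. Raising MaxWidth to 8 or 16 would allow wider groups, at the cost of longer contiguous scalar register ranges.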
> 
> 
> Vincent
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory