[llvm-dev] NVPTX - Reordering load instructions

Fri Jun 22 06:31:59 PDT 2018

Hi Hal, Justin,

Very useful information, thanks! LSV (the LoadStoreVectorizer)
definitely seems like the best way to approach this; I was too focused
on matching nvcc's output. It also doesn't look like NVPTX uses the
MachineScheduler, and enabling it together with load clustering didn't
seem to have any impact (though I didn't look into it very closely).

> I think the answer is, llvm can't tell that the loads are aligned.
> Ptxas can, but only because it's (apparently) doing vectorization
> *after* it resolves the shmem variables to physical addresses.  That
> is a cool trick, and llvm can't do it, because llvm never sees the
> physical shmem addresses.
>
> If you told llvm that the shmem variables were aligned to 16 bytes,
> LSV might do what you want here.  llvm and ptxas should be able to
> cooperate to give you the alignment you ask for in the IR.

That's pretty cool indeed: bumping the alignment of the shmem global
variables to 16 bytes lets LSV vectorize the loads and gets me most of
the way. Some operations still aren't vectorized, but I know where to
look now.
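
For illustration, here is a minimal sketch of the alignment reasoning
involved. It is a simplified model, not LLVM's actual code: the names
`known_alignment` and `provably_aligned` are made up for this example,
and it assumes a wide load is only formed when the address is provably
aligned to the full access size.

```cpp
#include <cstdint>

// Largest power of two known to divide the address (base + offset), given
// only the declared alignment of the base and a constant byte offset.
// Assumption: base_align is a nonzero power of two.
constexpr std::uint64_t known_alignment(std::uint64_t base_align,
                                        std::uint64_t offset) {
    const std::uint64_t bits = base_align | offset;
    return bits & (~bits + 1);  // isolate the lowest set bit
}

// Simplified legality check: an access of access_bytes is accepted only if
// the address is provably aligned to at least that many bytes.
constexpr bool provably_aligned(std::uint64_t base_align,
                                std::uint64_t offset,
                                std::uint64_t access_bytes) {
    return known_alignment(base_align, offset) >= access_bytes;
}

// With 4-byte alignment, a 16-byte (4 x float) load cannot be proven safe.
static_assert(!provably_aligned(4, 0, 16), "align 4 blocks a 16-byte load");
// With 16-byte alignment it can, at any offset that is a multiple of 16.
static_assert(provably_aligned(16, 32, 16), "align 16 allows a 16-byte load");
// At offset 8 only an 8-byte (2 x float) load is provably aligned.
static_assert(!provably_aligned(16, 8, 16) && provably_aligned(16, 8, 8),
              "offset 8 limits the provable alignment to 8 bytes");
```

In this model the 4-byte default rules out every 16-byte load, which
matches what I saw before raising the alignment on the variables.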

> It's possible that clang should opportunistically mark all shmem
> variables over a certain size as align(16) so that this happens
> automagically.  That would kind of be a weird heuristic, but maybe it
> makes sense.  I don't think that would make sense for LLVM to do that,
> though, so it wouldn't help you.

Easy enough for us to do this [1], so I'll try it out :-) That said,
nvcc itself emits code with `.align 4`, so maybe it relies on ptxas to
work out the alignment.

[1] https://github.com/JuliaGPU/CUDAnative.jl/pull/204

Best,
-- 
Tim Besard
Computer Systems Lab
Department of Electronics & Information Systems
Ghent University