[llvm-dev] Looking for suggestions: Inferring GPU memory accesses

Ees Kee via llvm-dev llvm-dev at lists.llvm.org
Sat Aug 22 09:11:22 PDT 2020

Hi Madhur and thanks for your answer.

> You can't replace thread specific variables with one thread ID.

Why not? Let me rephrase. What I'm looking for at this stage is to be able
to pick a thread in a block and see, for this particular thread, how many
memory accesses in the kernel are (statically) inferable.

For instance, for these kernels, if
you provide concrete values for the grid, block, and thread index, as well
as the scalar arguments, you can tell (manually) which offsets off of the
pointer arguments are being accessed by the kernel.
In contrast, in a kernel like this, you
can't infer them all because some indices are data-dependent.

What I'm looking for - and again, this is only a first step to something
bigger - is to automate this process.

On Sat, Aug 22, 2020 at 5:38 PM Madhur Amilkanthwar <
madhur13490 at gmail.com> wrote:

> CUDA/GPU programs are written for a SIMT (single instruction, multiple
> threads) model: programmers write a single program in such a way that each
> thread executes it with different data. So a program is one physical copy,
> but virtually it is run by several threads, and those grid/thread IDs are
> really meant for the semantics of the program. You can't replace
> thread-specific variables with one thread ID. Hence, I don't think what
> you're proposing would have much applicability in real-world benchmarks
> like Rodinia. If you have a strong motivating example then please provide
> a counter-argument, but in my experience it won't be very useful.
> In some corner cases it would be useful, but those would be a general case
> of uniform code blocks.
> On Sat, Aug 22, 2020 at 8:09 PM Ees Kee via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>> Hi all,
>> As part of my research I want to investigate the relation between the
>> grid's geometry and the memory accesses of a kernel in common GPU
>> benchmarks (e.g. Rodinia, Polybench, etc.). As a first step I want to
>> answer the following question:
>> - Given a kernel function with M possible memory accesses, for how many
>> of those M accesses can we statically infer the accessed location, given
>> concrete values for the grid/block dimensions and the executing thread?
>> (Assume CUDA only for now.)
>> My initial idea is to replace all uses of dim-related values, e.g.:
>>     __cuda_builtin_blockDim_t::__fetch_builtin_x()
>>     __cuda_builtin_gridDim_t::__fetch_builtin_x()
>> and index-related values, e.g.:
>>     __cuda_builtin_blockIdx_t::__fetch_builtin_x()
>>     __cuda_builtin_threadIdx_t::__fetch_builtin_x()
>> with ConstantInts. Then run constant folding on the result and check how
>> many GEPs have constant values.
>> Would something like this work, or are there complications I am not
>> thinking of? I'd appreciate any suggestions.
>> P.S. I am new to LLVM.
>> Thanks in advance,
>> Ees
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> --
> *Disclaimer: Views, concerns, thoughts, questions, and ideas expressed in
> this mail are my own and my employer has no stake in them. *
> Thank You.
> Madhur D. Amilkanthwar
