[llvm-dev] Looking for suggestions: Inferring GPU memory accesses

Sun Aug 23 10:43:52 PDT 2020

@Ees,
Oh, I see what you mean now. Such an analysis would be useful for a whole
thread block, not just a single thread, but as you say, you are after
something bigger than a single thread.
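To make the per-thread vs. per-block distinction concrete, here is a minimal Python sketch of the substitute-and-fold idea from the original question. It runs on a hypothetical toy expression IR, not real LLVM IR: the tuple encoding and the helpers `fold`, `count_static`, and `footprint` are made up for illustration. Builtins are replaced with concrete values and index expressions are folded. An index that depends on a value loaded from memory stays unknown.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy IR, for illustration only:
#   ("const", n)       integer literal
#   ("builtin", name)  CUDA builtin such as "threadIdx.x"
#   ("load", expr)     value read from memory at run time
#   ("add", a, b) / ("mul", a, b)
Expr = Tuple

def fold(e: Expr, env: Dict[str, int]) -> Optional[int]:
    """Constant-fold e after substituting builtins from env; None if not constant."""
    op = e[0]
    if op == "const":
        return e[1]
    if op == "builtin":
        return env.get(e[1])   # concrete grid/block/thread value, if provided
    if op == "load":
        return None            # data-dependent index: unknown at compile time
    if op in ("add", "mul"):
        lhs, rhs = fold(e[1], env), fold(e[2], env)
        if lhs is None or rhs is None:
            return None
        return lhs + rhs if op == "add" else lhs * rhs
    raise ValueError(f"unknown op {op!r}")

def count_static(accesses, env: Dict[str, int]) -> int:
    """Count the accesses whose index folds to a constant."""
    return sum(1 for _, idx in accesses if fold(idx, env) is not None)

def footprint(idx: Expr, env: Dict[str, int]) -> Optional[Set[int]]:
    """Indices touched by one access across all threads of the block, or None."""
    touched = set()
    for t in range(env["blockDim.x"]):
        v = fold(idx, {**env, "threadIdx.x": t})
        if v is None:
            return None
        touched.add(v)
    return touched

# i = blockIdx.x * blockDim.x + threadIdx.x
i = ("add", ("mul", ("builtin", "blockIdx.x"), ("builtin", "blockDim.x")),
     ("builtin", "threadIdx.x"))
# Kernel body: out[i] = in[i] + in[idx[i]]
accesses = [
    ("out[i]", i),
    ("in[i]", i),
    ("idx[i]", i),
    ("in[idx[i]]", ("load", i)),  # index is the value loaded from idx[i]
]

env = {"gridDim.x": 4, "blockDim.x": 256, "blockIdx.x": 2, "threadIdx.x": 5}
print(fold(i, env))                                       # 517
print(count_static(accesses, env), "of", len(accesses))   # 3 of 4
fp = footprint(i, env)
print(min(fp), max(fp), len(fp))                          # 512 767 256
```

In this toy, three of the four accesses fold for a single thread. `footprint` extends the same check to every thread in the block. A real pass would also have to handle several things the toy ignores:

* Once inlined, the builtins typically show up as calls to the `llvm.nvvm.read.ptx.sreg.*` intrinsics.
* A GEP whose base pointer is a kernel argument will not fold to a constant even when all of its indices do, so counting constant indices/offsets is probably the more useful metric.
* Indices driven by a loop induction variable only fold if the loop is fully unrolled.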

We published a short paper at ICS on this topic, which uses polyhedral
techniques to perform such analysis and reason about uncoalesced access
patterns in CUDA programs. You can find the paper at
https://dl.acm.org/doi/10.1145/2464996.2467288
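For the coalescing side, here is a much simpler, non-polyhedral way to see the idea. Enumerate the addresses one warp touches and count how many aligned memory segments they fall into. The sketch rests on three simplifying assumptions: 32-thread warps, 128-byte segments, and a 128-byte-aligned base pointer. These do not match the exact transaction model of any particular GPU. `segments_touched` is a hypothetical helper, not the method from the paper.

```python
from typing import Callable

WARP_SIZE = 32       # threads per warp
SEGMENT_BYTES = 128  # assumed transaction granularity (simplification)

def segments_touched(index_of: Callable[[int], int], elem_bytes: int,
                     first_thread: int = 0) -> int:
    """Distinct SEGMENT_BYTES-aligned segments touched by one warp.

    index_of maps threadIdx.x to the element index that thread accesses;
    the array base is assumed to be SEGMENT_BYTES-aligned.
    """
    return len({(index_of(t) * elem_bytes) // SEGMENT_BYTES
                for t in range(first_thread, first_thread + WARP_SIZE)})

# 4-byte elements (e.g. float)
print(segments_touched(lambda t: t, 4))       # a[tid]    -> 1
print(segments_touched(lambda t: 2 * t, 4))   # a[2*tid]  -> 2
print(segments_touched(lambda t: 32 * t, 4))  # a[32*tid] -> 32
```

Unit stride touches the minimum number of segments, which is the coalesced case. Larger strides scatter the warp across more segments.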

On Sun, Aug 23, 2020, 11:00 PM Johannes Doerfert via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi Ees,
>
> a while back we started a project with similar scope.
> Unfortunately the development slowed down and the plans to revive it
> this summer got tanked by the US travel restrictions.
>
> Anyway, there is some existing code that might be useful, though it is
> still at a prototype stage. While I'm obviously biased, I would suggest we
> continue from there.
>
> @Alex @Holger, can we put the latest version on GitHub or somewhere else
> to share it? I'm unsure whether the code I (might have) access to is
> the latest.
>
> @Ees I attached a recent paper and you might find the following links
> useful:
>
>     * 2017 LLVM Developers’ Meeting: J. Doerfert “Polyhedral Value &
> Memory Analysis” https://youtu.be/xSA0XLYJ-G0
>
>     * "Automated Partitioning of Data-Parallel Kernels using Polyhedral
> Compilation.", P2S2 2020 (slides and video
> https://www.mcs.anl.gov/events/workshops/p2s2/2020/program.php)
>
>
> Let us know what you think :)
>
> ~ Johannes
>
> On 8/22/20 9:38 AM, Ees Kee via llvm-dev wrote:
>  > Hi all,
>  >
>  > As part of my research I want to investigate the relation between the
>  > grid's geometry and the memory accesses of a kernel in common GPU
>  > benchmarks (e.g. Rodinia, Polybench). As a first step, I want to
>  > answer the following question:
>  >
>  > - Given a kernel function with M possible memory accesses, for how many
>  >   of those M accesses can we statically infer the accessed location,
>  >   given concrete values for the grid/block dimensions and the executing
>  >   thread?
>  >
>  > (Assume CUDA only for now)
>  >
>  > My initial idea is to replace all uses of dim-related values, e.g:
>  >     __cuda_builtin_blockDim_t::__fetch_builtin_x()
>  >     __cuda_builtin_gridDim_t::__fetch_builtin_x()
>  >
>  > and index related values, e.g:
>  >     __cuda_builtin_blockIdx_t::__fetch_builtin_x()
>  >     __cuda_builtin_threadIdx_t::__fetch_builtin_x()
>  >
>  > with ConstantInts. Then run constant folding on the result and check how
>  > many GEPs have constant values.
>  >
>  > Would something like this work, or are there complications I am not
>  > thinking of? I'd appreciate any suggestions.
>  >
>  > P.S. I am new to LLVM.
>  >
>  > Thanks in advance,
>  > Ees
>  >
>  >
>  > _______________________________________________
>  > LLVM Developers mailing list
>  > llvm-dev at lists.llvm.org
>  > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>