<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>@Madhur Thank you, I will have a look at the paper.</p>
<p>> Doing such analysis would be useful for a thread block and
not just a single thread</p>
<p>Do you have any concrete use cases in mind?</p>
<p>I was thinking that I could use such an analysis to, for
instance, visualize the memory accesses performed by the kernel
(or at least the ones that can be inferred statically). The relevant
literature I have found always involves tracing every access, so I'm
thinking that with something like this, tracing could
(potentially) be reduced significantly.</p>
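<p>A minimal sketch of the substitute-and-fold idea from the original
question, written as a toy Python model rather than an LLVM pass. The
tuple-based expression format and the names <code>fold</code>,
<code>count_static_accesses</code>, and <code>env</code> are made up for
illustration; a real implementation would work on LLVM IR instead.</p>
<pre>
```python
# Toy model (NOT LLVM): an index expression is a nested tuple, one of
#   ("const", 4)                  a literal constant
#   ("builtin", "threadIdx.x")    a grid/block builtin
#   ("unknown", "description")    a value not known statically
#   ("add", lhs, rhs) / ("mul", lhs, rhs)


def fold(expr, env):
    """Return an int if expr folds to a constant under env, else None."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "builtin":
        # Builtins are replaced by the concrete values given in env.
        return env.get(expr[1])
    if kind == "unknown":
        # E.g. a value loaded from memory: cannot be folded.
        return None
    lhs = fold(expr[1], env)
    rhs = fold(expr[2], env)
    if lhs is None or rhs is None:
        return None
    if kind == "add":
        return lhs + rhs
    if kind == "mul":
        return lhs * rhs
    raise ValueError("unsupported node: " + kind)


def count_static_accesses(accesses, env):
    """Count the accesses whose index folds to a constant."""
    return sum(1 for a in accesses if fold(a, env) is not None)


# Hypothetical kernel with three accesses:
#   a[blockIdx.x * blockDim.x + threadIdx.x]
#   b[threadIdx.x + 1]
#   c[idx[i]]   (data-dependent index)
tid = ("builtin", "threadIdx.x")
gid = ("add", ("mul", ("builtin", "blockIdx.x"), ("builtin", "blockDim.x")), tid)
accesses = [gid, ("add", tid, ("const", 1)), ("unknown", "idx[i]")]

# Concrete grid/block values chosen for this example only.
env = {"threadIdx.x": 3, "blockIdx.x": 2, "blockDim.x": 128}
print(count_static_accesses(accesses, env), "of", len(accesses))
```
</pre>
<p>With the values above, the first two indices fold to constants (259
and 4), while the data-dependent one does not, so the script reports 2 of
3.</p>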
<p>-Ees<br>
</p>
<div class="moz-cite-prefix">On 23-08-2020 19:43, Madhur
Amilkanthwar wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAMTh1gViKWN_yxcw69Ze2BnBx4wL79BB4QUWrmzZ1tdm-bWCqg@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="auto">@Ees,
<div dir="auto">Oh, I see what you mean now. Doing such analysis
would be useful for a thread block and not just a single
thread, but as you say, you are onto something bigger than just
a thread.</div>
<div dir="auto"><br>
</div>
<div dir="auto">We published a short paper at ICS on this topic,
which uses polyhedral techniques to do such analysis and
reason about uncoalesced access patterns in CUDA programs. You
can find the paper at</div>
<div dir="auto"><a
href="https://dl.acm.org/doi/10.1145/2464996.2467288"
moz-do-not-send="true">https://dl.acm.org/doi/10.1145/2464996.2467288</a><br>
</div>
<div dir="auto"><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, Aug 23, 2020, 11:00 PM
Johannes Doerfert via llvm-dev &lt;<a
href="mailto:llvm-dev@lists.llvm.org" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Ees,<br>
<br>
A while back we started a project with a similar scope.<br>
Unfortunately, development slowed down, and the plans to
revive it <br>
this summer got tanked by the US travel restrictions.<br>
<br>
Anyway, there is some existing code that might be useful,
though it is in <br>
a prototype stage. While I'm obviously biased, I would suggest
we <br>
continue from there.<br>
<br>
@Alex @Holger, can we put the latest version on GitHub or some
other <br>
place to share it? I'm unsure if the code I (might) have
access to is <br>
the latest.<br>
<br>
@Ees I attached a recent paper and you might find the
following links <br>
useful:<br>
<br>
* 2017 LLVM Developers’ Meeting: J. Doerfert, “Polyhedral
Value &amp; <br>
Memory Analysis” <a href="https://youtu.be/xSA0XLYJ-G0"
rel="noreferrer noreferrer" target="_blank"
moz-do-not-send="true">https://youtu.be/xSA0XLYJ-G0</a><br>
<br>
* "Automated Partitioning of Data-Parallel Kernels using
Polyhedral <br>
Compilation.", P2S2 2020 (slides and video <br>
<a
href="https://www.mcs.anl.gov/events/workshops/p2s2/2020/program.php"
rel="noreferrer noreferrer" target="_blank"
moz-do-not-send="true">https://www.mcs.anl.gov/events/workshops/p2s2/2020/program.php</a>)<br>
<br>
<br>
Let us know what you think :)<br>
<br>
~ Johannes<br>
<br>
<br>
<br>
<br>
On 8/22/20 9:38 AM, Ees Kee via llvm-dev wrote:<br>
> Hi all,<br>
><br>
> As part of my research I want to investigate the
relation between the<br>
> grid's geometry and the memory accesses of a kernel in
common GPU<br>
> benchmarks (e.g., Rodinia, PolyBench, etc.). As a first step
I want to<br>
> answer the following question:<br>
><br>
> - Given a kernel function with M possible memory
accesses, for how many of<br>
> those M accesses can we statically infer the location,
given concrete values<br>
> for the grid/block and executing thread?<br>
><br>
> (Assume CUDA only for now)<br>
><br>
> My initial idea is to replace all uses of dimension-related
values, e.g.:<br>
> __cuda_builtin_blockDim_t::__fetch_builtin_x()<br>
> __cuda_builtin_gridDim_t::__fetch_builtin_x()<br>
><br>
> and index-related values, e.g.:<br>
> __cuda_builtin_blockIdx_t::__fetch_builtin_x()<br>
> __cuda_builtin_threadIdx_t::__fetch_builtin_x()<br>
><br>
> with ConstantInts, then run constant folding on the
result and check how<br>
> many GEPs have constant values.<br>
><br>
> Would something like this work, or are there
complications I am not thinking of?<br>
> I'd appreciate any suggestions.<br>
><br>
> P.S. I am new to LLVM.<br>
><br>
> Thanks in advance,<br>
> Ees<br>
><br>
><br>
> _______________________________________________<br>
> LLVM Developers mailing list<br>
> <a href="mailto:llvm-dev@lists.llvm.org"
target="_blank" rel="noreferrer" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
> <a
href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
rel="noreferrer noreferrer" target="_blank"
moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
<br>
</blockquote>
</div>
</blockquote>
</body>
</html>