<div dir="ltr"><div>CUDA/GPU programs are written for the SIMT (single instruction, multiple threads) model, which is closely related to SIMD (single instruction, multiple data). The programmer writes a single program, and each thread executes it on different data. So there is one physical copy of the program, but it is logically run by many threads, and the grid/thread IDs are part of the program's semantics. You can't simply replace thread-specific variables with a single thread ID. <br></div><div><br></div><div>Hence, I don't think what you're proposing would have much applicability in real-world benchmarks like Rodinia. If you have a strong motivating example, please share it as a counterargument, but in my experience it won't be very useful. <br></div><div><br></div><div>It could be useful in some corner cases, but those would fall under the general case of uniform code blocks.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Aug 22, 2020 at 8:09 PM Ees Kee via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi all,</div><div><br></div>As part of my research I want to investigate the relation between the grid's geometry and the memory accesses of a kernel in common GPU benchmarks (e.g. Rodinia, Polybench, etc.). As a first step I want to answer the following question:<div><br><div>- Given a kernel function with M possible memory accesses. 
For how many of those M accesses can we statically infer the location, given concrete values for the grid/block and the executing thread?</div><div><br></div><div>(Assume CUDA only for now)</div><div><br></div><div>My initial idea is to replace all uses of dim-related values, e.g.:</div><div>    __cuda_builtin_blockDim_t::__fetch_builtin_x()<br>    __cuda_builtin_gridDim_t::__fetch_builtin_x()</div><div><br>and index-related values, e.g.:<br>    __cuda_builtin_blockIdx_t::__fetch_builtin_x()</div><div>    __cuda_builtin_threadIdx_t::__fetch_builtin_x()</div><div><br></div><div>with ConstantInts. Then run constant folding on the result and check how many GEPs have constant values. </div><div><br></div><div>Would something like this work, or are there complications I am not thinking of? I'd appreciate any suggestions.</div><div><br></div><div>P.S. I am new to LLVM</div><div><br></div><div>Thanks in advance,</div><div>Ees</div><div><br></div><div> </div></div></div>

</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><i style="font-size:12.8px">Disclaimer: Views, concerns, thoughts, questions, ideas expressed in this mail are of my own and my employer has no take in it. </i><br></div><div>Thank You.<br>Madhur D. Amilkanthwar<br><br></div></div></div>