<div dir="ltr"><div>Hi all,</div><div><br></div>As part of my research, I want to investigate the relation between the grid's geometry and the memory accesses of a kernel in common GPU benchmarks (e.g. Rodinia, PolyBench, etc.). As a first step, I want to answer the following question:<div><br><div>- Given a kernel function with M possible memory accesses, for how many of those M accesses can we statically infer the accessed location, given concrete values for the grid/block dimensions and the executing thread's indices?</div><div><br></div><div>(Assume CUDA only for now.)</div><div><br></div><div>My initial idea is to replace all uses of dimension-related values, e.g.:</div><div>    __cuda_builtin_blockDim_t::__fetch_builtin_x()<br>    __cuda_builtin_gridDim_t::__fetch_builtin_x()</div><div><br>and index-related values, e.g.:<br>    __cuda_builtin_blockIdx_t::__fetch_builtin_x()</div><div>    __cuda_builtin_threadIdx_t::__fetch_builtin_x()</div><div><br></div><div>with ConstantInts, then run constant folding on the result and check how many GEPs (getelementptr instructions) end up with constant indices.</div><div><br></div><div>Would something like this work, or are there complications I am not thinking of? I'd appreciate any suggestions.</div><div><br></div><div>P.S. I am new to LLVM.</div><div><br></div><div>Thanks in advance,</div><div>Ees</div><div><br></div><div> </div></div></div>