<div dir="ltr">Hi Nimit,<div>I am happy to see that work on uncoalesced accesses is becoming mainstream and targeting LLVM. When we started working on this back in 2013 (during my master's), we had only a small codebase and we never upstreamed it. Of course, our algorithm was the first attempt to detect uncoalesced accesses, and there has been a lot of intellectual progress on this topic since then.</div><div><br></div><div>I am also happy to see that you have cited our work in your paper :)</div><div><br></div><div>I second Jason's thoughts, specifically the one below:</div><div><br></div><div>>I wonder if it makes sense to promote the GPU Drano static analysis to a full-fledged LLVM tool (e.g., llvm/tools/llvm-gpudrano) instead of manually running a series of clang+opt steps? That might make it a bit more convenient to use. Even though the tool is today CUDA specific, it could have a target flag where only a NVIDIA/CUDA value is recognized and implemented. Just a thought and totally optional.</div><div><br></div><div>It makes sense to have this as a standalone tool. It would definitely lower the barrier to using it.</div><div><br></div><div>I would love to review the code when you create a patch!</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Nov 17, 2021 at 7:24 AM Jason Eckhardt via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<div dir="ltr">

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

Hi Nimit,</div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

<br>

</div>

<div>This is interesting and promising work! I haven't seen anyone else respond yet, so I'll offer a few "pre-review" observations and suggestions. As a first step, and if you haven't already,

<span>take a look at <a href="https://llvm.org/docs/DeveloperPolicy.html" id="gmail-m_6345632949188743765LPNoLPOWALinkPreview" target="_blank">

https://llvm.org/docs/DeveloperPolicy.html</a> and <a href="https://llvm.org/docs/GettingStarted.html#sending-patches" id="gmail-m_6345632949188743765LPNoLPOWALinkPreview_1" target="_blank">

https://llvm.org/docs/GettingStarted.html#sending-patches</a> for concrete information and steps for submitting your patch.<br>

</span></div>

<div>

<ol>

<li><span>Reviewers will ask you to format your code according to LLVM Coding Standards. I see a few issues, such as some non-standard variable naming and code formatting. Since this is new code, you can perform blanket "clang-format" and "clang-tidy" runs

 over all the code to resolve most issues automatically before submitting your patch (or at least before committing).</span></li><li><span>This work is billed in a way that sounds like a generic "GPU" analysis. However, as written today, it is CUDA/NVIDIA specific. For example, UncoalescedAnalysis::ExecuteInstruction references NVVM intrinsics (e.g., llvm.nvvm.read.ptx.sreg.tid.x). Also, UncoalescedAnalysis::getConstantExprValue

 hardcodes NVIDIA-specific address spaces 3 and 4 for Shared and Constant, respectively (hardcoding those values is itself another issue). That said, much of the code is generic. Probably what makes sense here is to isolate the above instances (and any other

 target dependences) into target-specific hooks. TargetTransformInfo might be the right place for these (see similar APIs like TTI::isSourceOfDivergence). In this way, the bulk of the code remains target-independent, and the various GPUs only need to implement

 their (hopefully very small) specific hooks/overrides to utilize the analyses. Of course, there are still external issues, such as which vendor tools/libraries are needed, and those would be documented separately by each GPU target.</span></li><li><span>There don't seem to be any simple unit/regression ("lit") tests. I do see that there are the benchmark directories NVIDIA_CUDA-8.0_Samples and rodinia_3.1, but these aren't suitable unit tests. In fact, those might be good to add to the LLVM "test

 suite" (see <a href="https://llvm.org/docs/TestSuiteGuide.html" id="gmail-m_6345632949188743765LPNoLPOWALinkPreview" target="_blank">

https://llvm.org/docs/TestSuiteGuide.html</a>). Instead, patches are usually required to have some way to unit-test them. Often these are small IR tests. In some cases, C++ test cases are used when it isn't feasible to test via IR (testing, say, operations

 on a new ADT). See, e.g., llvm/test/Analysis for examples of testing analysis passes.</span></li><li><span>I wonder if it makes sense to promote the GPU Drano static analysis to a full-fledged LLVM tool (e.g., llvm/tools/llvm-gpudrano) instead of manually running a series of clang+opt steps? That might make it a bit more convenient to use. Even though

 the tool is today CUDA specific, it could have a target flag where only a NVIDIA/CUDA value is recognized and implemented. Just a thought and totally optional.<br>

</span></li><li><span>Adding some documentation would be useful. At the minimum, you might add a paragraph to llvm/docs/Passes.rst. But a more substantial write-up (like the one for llvm/docs/Vectorizers.rst) would be even better.<br>

</span></li></ol>

</div>
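<div>To make the target-hook suggestion in item 2 concrete, here is a minimal standalone C++ sketch. It is only an illustration: <code>GPUTargetHooks</code>, <code>CudaHooks</code>, <code>MemSpace</code>, and both method names are hypothetical and are not part of TargetTransformInfo or any existing LLVM API. The only target facts it encodes are the ones mentioned above: address spaces 3 and 4 for Shared and Constant, and the llvm.nvvm.read.ptx.sreg.tid.* intrinsic names.</div>

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of memory spaces as the analysis might see them.
enum class MemSpace { Shared, Constant, Other };

// Hypothetical hook interface (not an existing LLVM API). The generic
// analysis would query these instead of hardcoding target details.
struct GPUTargetHooks {
  virtual ~GPUTargetHooks() = default;
  // Maps a numeric address space to an abstract memory space.
  virtual MemSpace classifyAddressSpace(unsigned AS) const = 0;
  // Returns true if the named intrinsic reads a thread index.
  virtual bool isThreadIdIntrinsic(const std::string &Name) const = 0;
};

// Sketch of a CUDA/NVIDIA implementation, using only the values named in
// this thread (3 = Shared, 4 = Constant).
struct CudaHooks final : GPUTargetHooks {
  MemSpace classifyAddressSpace(unsigned AS) const override {
    switch (AS) {
    case 3:
      return MemSpace::Shared;
    case 4:
      return MemSpace::Constant;
    default:
      return MemSpace::Other;
    }
  }
  bool isThreadIdIntrinsic(const std::string &Name) const override {
    // Prefix match covers the .x, .y and .z variants.
    return Name.rfind("llvm.nvvm.read.ptx.sreg.tid.", 0) == 0;
  }
};
```

<div>With something along these lines, the bulk of the analysis would only see <code>GPUTargetHooks</code>, and another GPU target would supply its own small subclass.</div>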

<div id="gmail-m_6345632949188743765appendonsend"></div>

<hr style="display:inline-block;width:98%">

<div id="gmail-m_6345632949188743765divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> llvm-dev <<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>> on behalf of Nimit Singhania via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>><br>

<b>Sent:</b> Monday, October 25, 2021 2:50 PM<br>

<b>To:</b> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a> <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>><br>

<b>Subject:</b> [llvm-dev] Static Analysis for GPU Program Performance in LLVM</font>

<div> </div>

</div>

<div>


<div>

<div dir="ltr">

<div>Hi there,</div>

<div><br>

</div>

<div>I would like to propose adding two static analyses to the LLVM framework that can help detect performance issues in GPU programs. The first analysis directly detects memory congestion issues across GPU threads. The second checks block-size independence for synchronization-free programs, which allows the block size to be tuned for performance without affecting correctness. Both analyses were developed as part of my PhD thesis and are available on GitHub. See the link below for more details:</div>

<div><br>

</div>

<div><a href="https://github.com/upenn-acg/gpudrano-static-analysis_v1.0" target="_blank">https://github.com/upenn-acg/gpudrano-static-analysis_v1.0</a></div>

<div><br>

</div>

<div>We would like to upstream these analyses to LLVM, and the work has several advantages. These are ground-breaking analyses that allow light-weight detection of performance and correctness issues in GPU programs that concern <i>inter-thread</i> behavior, and being light-weight lets them run efficiently at compile time. Inter-thread behavior is behavior that arises from the interaction between threads and is not local to any individual thread. Such analysis is difficult to perform on a generic multi-threaded program; however, the regularity of GPU parallelism makes these analyses feasible at compile time.</div>
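<div>As a rough illustration of why access patterns across threads matter, the standalone C++ sketch below counts how many memory segments one warp touches for a given per-thread index expression. It is not taken from the GPU Drano code: the function name, the 32-thread warp, the 128-byte segment size, and the 4-byte element size are all assumptions made for the example.</div>

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>

// Assumed parameters for illustration only; actual values depend on the GPU.
constexpr unsigned kWarpSize = 32;
constexpr std::uint64_t kSegmentBytes = 128;

// Counts the distinct memory segments touched when every thread of one warp
// loads a 4-byte element at index indexOf(tid). Fewer segments means the
// accesses can be served by fewer memory transactions.
std::size_t
segmentsTouched(const std::function<std::uint64_t(unsigned)> &indexOf) {
  std::set<std::uint64_t> segments;
  for (unsigned tid = 0; tid < kWarpSize; ++tid)
    segments.insert(indexOf(tid) * sizeof(float) / kSegmentBytes);
  return segments.size();
}
```

<div>Under these assumptions, an index of <code>tid</code> keeps the whole warp inside a single segment, while an index of <code>tid * 32</code> places every thread in a different segment.</div>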

<div><br>

</div>

<div>These analyses can be the basis for optimizations that improve the performance of GPU programs multifold. Given the complexity of GPU programming and the lack of tool support in this space, the analyses provide first steps towards robust tools for the analysis and optimization of GPU programs. Two publications describe this work; they are listed in the references below. I would be happy to answer any questions or concerns regarding this work.</div>

<div><br>

</div>

<div>Regards,</div>

<div>Nimit</div>

<div><br>

</div>

<div>References:<br>

1. FMSD 2021: Static analysis for detecting uncoalesced accesses in GPU programs, Rajeev Alur, Joseph Devietti, Omar Navarro Leija, and Nimit Singhania.</div>

<div>2. SAS 2018: Block-Size Independence of GPU Programs, Rajeev Alur, Joseph Devietti, and Nimit Singhania.

</div>

</div>

</div>

</div>

</div>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><i style="font-size:12.8px">Disclaimer: Views, concerns, thoughts, questions, ideas expressed in this mail are of my own and my employer has no take in it. </i><br></div><div>Thank You.<br>Madhur D. Amilkanthwar<br><br></div></div></div>