[llvm-dev] Static Analysis for GPU Program Performance in LLVM

Wed Dec 1 23:05:28 PST 2021

Thank you Jason and Madhur for your remarks. They are very helpful. I am
working on cleaning up the code base and porting the code into an LLVM tool.

I wanted to quickly check whether you know of a tutorial or a reference tool
that could guide me in creating a standalone tool. I would very much
appreciate your help.

Kind regards,
Nimit

On Tue, Nov 16, 2021 at 7:44 PM Madhur Amilkanthwar <madhur13490 at gmail.com>
wrote:

> Hi Nimit,
> I am happy to see that the work related to uncoalesced accesses is becoming
> mainstream and targeting LLVM. When we started working on this back in 2013
> (during my masters), we had just a small codebase and we never upstreamed
> it. Of course, our algorithm was the first attempt to detect uncoalesced
> accesses and there has been a lot of intellectual progress on this topic
> since then.
>
> I am happy to see that you have cited our work in your paper :)
>
> I second Jason's thoughts, specifically the following:
>
> >I wonder if it makes sense to promote the GPU Drano static analysis to a
> full-fledged LLVM tool (e.g., llvm/tools/llvm-gpudrano) instead of manually
> running a series of clang+opt steps? That might make it a bit more
> convenient to use. Even though the tool is today CUDA specific, it could
> have a target flag where only an NVIDIA/CUDA value is recognized and
> implemented. Just a thought and totally optional.
>
> It makes sense to have this as a standalone tool. It would definitely
> lower the barrier to using the tool.
>
> I would love to review the code when you create a patch!
>
>
>
> On Wed, Nov 17, 2021 at 7:24 AM Jason Eckhardt via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Nimit,
>>
>> This is interesting and promising work! I haven't seen anyone else
>> respond yet, so I'll just give a few "pre-review" observations/suggestions.
>> As a first step, and if you haven't already, take a look
>> at https://llvm.org/docs/DeveloperPolicy.html and
>> https://llvm.org/docs/GettingStarted.html#sending-patches for concrete
>> information and steps for submitting your patch.
>>
>>    1. Reviewers will ask you to format your code according to the LLVM
>>    Coding Standards. I see a few issues, such as some non-standard variable
>>    naming and code formatting. Since this is new code, you can perform blanket
>>    "clang-format" and "clang-tidy" runs over all the code to resolve most
>>    issues automatically before submitting your patch (or at least before
>>    committing).
>>    2. This work is billed in a way that sounds like a generic "GPU"
>>    analysis. However, as written today, it is CUDA/NVIDIA specific. For
>>    example, UncoalescedAnalysis::ExecuteInstruction references NVVM intrinsics
>>    (e.g., llvm.nvvm.read.ptx.sreg.tid.x).
>>    Also, UncoalescedAnalysis::getConstantExprValue hardcodes NVIDIA-specific
>>    address spaces 3 and 4 for Shared and Constant, respectively (hardcoding
>>    those values is itself another issue). That said, much of the code is
>>    generic. Probably what makes sense here is to isolate the above instances
>>    (and any other target dependences) into target-specific hooks.
>>    TargetTransformInfo might be the right place for these (see similar APIs
>>    like TTI::isSourceOfDivergence). In this way, the bulk of the code remains
>>    target-independent, and the various GPUs only need to implement their
>>    (hopefully very small) specific hooks/overrides to utilize the analyses. Of
>>    course, there are still external issues, such as which vendor-specific
>>    tools/libraries are needed, and those would be documented separately by
>>    each GPU target.
>>    3. There don't seem to be any simple unit/regression ("lit") tests. I
>>    do see that there are the benchmark directories NVIDIA_CUDA-8.0_Samples and
>>    rodinia_3.1, but these aren't suitable unit tests. In fact, those might be
>>    good to add to the LLVM "test suite" (see
>>    https://llvm.org/docs/TestSuiteGuide.html). Instead, patches are
>>    usually required to have some way to unit-test them. Often these are small
>>    IR tests. In some cases, C++ test cases are used when it isn't feasible to
>>    test via IR (testing, say, operations on a new ADT). See,
>>    e.g., llvm/test/Analysis for examples of testing analysis passes.
>>    4. I wonder if it makes sense to promote the GPU Drano static
>>    analysis to a full-fledged LLVM tool (e.g., llvm/tools/llvm-gpudrano)
>>    instead of manually running a series of clang+opt steps? That might make it
>>    a bit more convenient to use. Even though the tool is today CUDA specific,
>>    it could have a target flag where only an NVIDIA/CUDA value is recognized
>>    and implemented. Just a thought and totally optional.
>>    5. Adding some documentation would be useful. At the minimum, you
>>    might add a paragraph to llvm/docs/Passes.rst. But a more substantial
>>    write-up (like the one for llvm/docs/Vectorizers.rst) would be even better.
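A minimal sketch of the refactoring suggested in points 2 and 4, written as standalone C++ so it compiles without LLVM: the generic analysis queries a small interface instead of hardcoding address spaces 3 and 4 or NVVM intrinsic names, and a target string (standing in for the proposed target flag) selects the implementation. `GPUTargetHooks`, `NVPTXHooks`, `createTargetHooks`, and `isSharedMemory` are hypothetical names, not existing LLVM APIs; in-tree, these queries would more plausibly become `TargetTransformInfo` hooks alongside `isSourceOfDivergence`.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical interface for the target-specific facts the analysis needs.
class GPUTargetHooks {
public:
  virtual ~GPUTargetHooks() = default;
  // Address space of on-chip shared memory.
  virtual unsigned getSharedAddressSpace() const = 0;
  // Address space of constant memory.
  virtual unsigned getConstantAddressSpace() const = 0;
  // True if the named intrinsic returns the thread index along x.
  virtual bool isThreadIdXIntrinsic(const std::string &Name) const = 0;
};

// NVPTX/CUDA implementation, using the values the current code hardcodes.
class NVPTXHooks final : public GPUTargetHooks {
public:
  unsigned getSharedAddressSpace() const override { return 3; }
  unsigned getConstantAddressSpace() const override { return 4; }
  bool isThreadIdXIntrinsic(const std::string &Name) const override {
    return Name == "llvm.nvvm.read.ptx.sreg.tid.x";
  }
};

// Selects hooks from a target string, as a target flag might. Only
// NVPTX/CUDA is recognized; any other value yields nullptr.
std::unique_ptr<GPUTargetHooks> createTargetHooks(const std::string &Target) {
  if (Target == "nvptx" || Target == "cuda")
    return std::make_unique<NVPTXHooks>();
  return nullptr;
}

// Example of a generic query written only against the interface.
bool isSharedMemory(const GPUTargetHooks *Hooks, unsigned AddrSpace) {
  assert(Hooks && "no hooks for this target");
  return AddrSpace == Hooks->getSharedAddressSpace();
}
```

Keeping the interface this narrow means another GPU target would only need to supply its own address-space numbers and thread-index intrinsics, while the rest of the analysis stays shared.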
>>
>> ------------------------------
>> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Nimit
>> Singhania via llvm-dev <llvm-dev at lists.llvm.org>
>> *Sent:* Monday, October 25, 2021 2:50 PM
>> *To:* llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
>> *Subject:* [llvm-dev] Static Analysis for GPU Program Performance in LLVM
>>
>> Hi there,
>>
>> I would like to propose adding two static analyses to the LLVM framework
>> that can help detect performance issues in GPU programs. The first analysis
>> detects uncoalesced memory accesses, which cause memory congestion across
>> GPU threads; the second analysis checks block-size independence of
>> synchronization-free programs, which allows the block size to be tuned for
>> performance without affecting correctness. Both static analyses were
>> developed as part of my PhD thesis and are available on GitHub. Please see
>> the link below for more details:
>>
>> https://github.com/upenn-acg/gpudrano-static-analysis_v1.0
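To make the second property concrete: in this sketch, a kernel counts as block-size independent if it produces the same output when the same number of threads is split into blocks of different sizes. The code below is a sequential CPU simulation of a one-dimensional, synchronization-free kernel, checked by running it with several block sizes. It is an illustration only, not the static analysis in the repository, which establishes the property at compile time without executing the program. All names (`runKernel`, `globalIndexKernel`, `localIndexKernel`, `sameResultForBlockSizes`) are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// A 1-D kernel body: (blockIdx, blockDim, threadIdx, output buffer).
using Kernel = std::function<void(int, int, int, std::vector<int> &)>;

// Sequentially simulates a synchronization-free launch of TotalThreads
// threads split into blocks of BlockSize threads each.
std::vector<int> runKernel(const Kernel &K, int TotalThreads, int BlockSize) {
  assert(BlockSize > 0 && TotalThreads % BlockSize == 0 &&
         "sketch assumes the block size divides the thread count");
  std::vector<int> Out(TotalThreads, 0);
  for (int B = 0; B < TotalThreads / BlockSize; ++B)
    for (int T = 0; T < BlockSize; ++T)
      K(B, BlockSize, T, Out);
  return Out;
}

// Each element depends only on the global thread index, so the
// block size does not affect the result.
void globalIndexKernel(int BlockIdx, int BlockDim, int ThreadIdx,
                       std::vector<int> &Out) {
  int I = BlockIdx * BlockDim + ThreadIdx;
  Out[I] = 2 * I;
}

// Stores the index within the block, so the result changes with the
// block size.
void localIndexKernel(int BlockIdx, int BlockDim, int ThreadIdx,
                      std::vector<int> &Out) {
  int I = BlockIdx * BlockDim + ThreadIdx;
  Out[I] = ThreadIdx;
}

// Compares outputs across block sizes: a test, not a proof.
bool sameResultForBlockSizes(const Kernel &K, int TotalThreads,
                             const std::vector<int> &BlockSizes) {
  assert(!BlockSizes.empty() && "need at least one block size");
  const std::vector<int> Reference =
      runKernel(K, TotalThreads, BlockSizes.front());
  for (int BS : BlockSizes)
    if (runKernel(K, TotalThreads, BS) != Reference)
      return false;
  return true;
}
```

Running a few block sizes can only expose a dependence, never rule one out, which is why a compile-time check is attractive when tuning the block size.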
>>
>> We would like to upstream these analyses to LLVM. The work has several
>> advantages. These novel analyses allow lightweight compile-time detection
>> of performance and correctness issues in GPU programs that concern
>> *inter-thread* behavior, and because they are lightweight, they run
>> efficiently at compile time. Inter-thread behavior is behavior that arises
>> from the interaction between threads rather than being local to an
>> individual thread. Such analysis is difficult to perform for general
>> multi-threaded programs; however, the regularity of GPU parallelism makes
>> it feasible at compile time.
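Uncoalesced access is an example of such inter-thread behavior: whether a load or store is efficient depends on the addresses that all threads of a warp touch together, not on what any single thread does. The sketch below counts how many distinct memory segments one warp touches for a given access pattern. It assumes a 32-thread warp, 128-byte segments, and a simple criterion (more than one segment means uncoalesced), which simplifies real NVIDIA hardware; the function names are hypothetical. Unlike the static analysis, it evaluates concrete addresses instead of reasoning about them symbolically.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>

constexpr int WarpSize = 32;           // assumed warp width
constexpr uint64_t SegmentBytes = 128; // assumed memory segment size

// Byte address of A[Tid * Stride] for an array of ElemBytes-sized elements
// starting at byte address Base.
uint64_t stridedAddress(uint64_t Base, uint64_t ElemBytes, uint64_t Stride,
                        int Tid) {
  assert(Tid >= 0 && "thread index must be non-negative");
  return Base + ElemBytes * Stride * static_cast<uint64_t>(Tid);
}

// Counts the distinct SegmentBytes-sized segments touched when each thread
// Tid in [0, WarpSize) of one warp accesses byte address AddrOf(Tid).
int segmentsTouched(const std::function<uint64_t(int)> &AddrOf) {
  std::set<uint64_t> Segments;
  for (int Tid = 0; Tid < WarpSize; ++Tid)
    Segments.insert(AddrOf(Tid) / SegmentBytes);
  return static_cast<int>(Segments.size());
}

// Simplified criterion used in this sketch: an access is uncoalesced if the
// warp needs more than one segment.
bool isUncoalesced(const std::function<uint64_t(int)> &AddrOf) {
  return segmentsTouched(AddrOf) > 1;
}
```

For an aligned array of 4-byte elements, `A[tid]` keeps the warp within one segment, while `A[32 * tid]` spreads it across 32 segments, the kind of pattern the first analysis is meant to flag at compile time.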
>>
>> These analyses could form the basis for optimizations that improve the
>> performance of GPU programs multifold. Given the complexity of GPU
>> programming and the lack of tool support in this space, the analyses
>> provide a first step towards robust tools for the analysis and optimization
>> of GPU programs. This work has appeared in two publications, listed in the
>> references below. I would be happy to answer any questions or concerns
>> regarding this work.
>>
>> Regards,
>> Nimit
>>
>> References:
>> 1. FMSD 2021: Static analysis for detecting uncoalesced accesses in GPU
>> programs, Rajeev Alur, Joseph Devietti, Omar Navarro Leija, and Nimit
>> Singhania.
>> 2. SAS 2018: Block-Size Independence of GPU Programs, Rajeev Alur, Joseph
>> Devietti, and Nimit Singhania.
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
> --
> *Disclaimer: Views, concerns, thoughts, questions, and ideas expressed in
> this mail are my own, and my employer has no stake in them.*
> Thank You.
> Madhur D. Amilkanthwar
>
>