[llvm-dev] Static Analysis for GPU Program Performance in LLVM
Nimit Singhania via llvm-dev
llvm-dev at lists.llvm.org
Wed Dec 1 23:05:28 PST 2021
Thank you, Jason and Madhur, for your remarks; they are very helpful. I am
working on cleaning up the codebase and porting the code into an LLVM tool.
I wanted to quickly check whether you know of a tutorial or a reference tool
that could guide me in creating a standalone tool. I would very much
appreciate your help.
Kind regards,
Nimit
On Tue, Nov 16, 2021 at 7:44 PM Madhur Amilkanthwar <madhur13490 at gmail.com>
wrote:
> Hi Nimit,
> I am happy to see that the work on uncoalesced accesses is becoming
> mainstream and targeting LLVM. When we started working on this back in 2013
> (during my master's), we had just a small codebase and we never upstreamed
> it. Our algorithm was the first attempt to detect uncoalesced
> accesses, and there has been a lot of intellectual progress on this topic
> since then.
>
> I am happy to see that you have cited our work in your paper :)
>
> I second Jason's thoughts, specifically the point below:
>
> >I wonder if it makes sense to promote the GPU Drano static analysis to a
> full-fledged LLVM tool (e.g., llvm/tools/llvm-gpudrano) instead of manually
> running a series of clang+opt steps? That might make it a bit more
> convenient to use. Even though the tool is CUDA-specific today, it could
> have a target flag where only an NVIDIA/CUDA value is recognized and
> implemented. Just a thought and totally optional.
>
> It makes sense to have this as a standalone tool. It will
> definitely lower the barrier to using it.
>
> I would love to review the code when you create a patch!
>
>
>
> On Wed, Nov 17, 2021 at 7:24 AM Jason Eckhardt via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Nimit,
>>
>> This is interesting and promising work! I haven't seen anyone else
>> respond yet, so I'll just give a few "pre-review" observations/suggestions
>> that I noticed. As a first step, and if you haven't already, take a look
>> at https://llvm.org/docs/DeveloperPolicy.html and
>> https://llvm.org/docs/GettingStarted.html#sending-patches for concrete
>> information and steps for submitting your patch.
>>
>> 1. Reviewers will ask you to format your code according to the LLVM
>> Coding Standards. I see a few issues, such as some non-standard variable
>> naming and code formatting. Since this is new code, you can run
>> clang-format and clang-tidy over all of it to resolve most issues
>> automatically before submitting your patch (or at least before
>> committing).
>> 2. This work is billed in a way that sounds like a generic "GPU"
>> analysis. However, as written today, it is CUDA/NVIDIA specific. For
>> example, UncoalescedAnalysis::ExecuteInstruction references NVVM intrinsics
>> (e.g., llvm.nvvm.read.ptx.sreg.tid.x).
>> Also, UncoalescedAnalysis::getConstantExprValue hardcodes the NVIDIA-specific
>> address spaces 3 and 4 for Shared and Constant, respectively (hardcoding
>> those values is itself another issue). That said, much of the code is
>> generic. What probably makes sense here is to isolate the above instances
>> (and any other target dependences) into target-specific hooks.
>> TargetTransformInfo might be the right place for these (see similar APIs
>> like TTI::isSourceOfDivergence). In this way, the bulk of the code remains
>> target-independent, and each GPU only needs to implement its (hopefully
>> very small) set of specific hooks/overrides to use the analyses. There
>> are, of course, still external issues, such as which vendor-specific
>> tools/libraries are needed; those would be documented separately by each
>> GPU target.
>> 3. There don't seem to be any simple unit/regression ("lit") tests. I
>> do see the benchmark directories NVIDIA_CUDA-8.0_Samples and
>> rodinia_3.1, but these aren't suitable unit tests. In fact, those might be
>> good to add to the LLVM "test suite" (see
>> https://llvm.org/docs/TestSuiteGuide.html). Patches, on the other hand,
>> are usually required to include unit tests. Often these are small
>> IR tests. In some cases, C++ test cases are used when it isn't feasible to
>> test via IR (testing, say, operations on a new ADT). See,
>> e.g., llvm/test/Analysis for examples of testing analysis passes.
>> 4. I wonder if it makes sense to promote the GPU Drano static
>> analysis to a full-fledged LLVM tool (e.g., llvm/tools/llvm-gpudrano)
>> instead of manually running a series of clang+opt steps? That might make it
>> a bit more convenient to use. Even though the tool is CUDA-specific today,
>> it could have a target flag where only an NVIDIA/CUDA value is recognized
>> and implemented. Just a thought and totally optional.
>> 5. Adding some documentation would be useful. At a minimum, you
>> might add a paragraph to llvm/docs/Passes.rst. But a more substantial
>> write-up (like the one for llvm/docs/Vectorizers.rst) would be even better.
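Review points 2 and 4 can be illustrated together with a small standalone sketch. It assumes a hypothetical `GPUTargetHooks` interface (not an existing LLVM API; in-tree, these queries would more likely become TargetTransformInfo hooks) that holds the target-specific facts the analysis currently hardcodes: the NVVM thread-index intrinsics and address spaces 3 (shared) and 4 (constant). A hypothetical `createGPUTargetHooks` factory plays the role of the suggested target flag and recognizes only `nvptx`.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>

// Hypothetical interface: the target-specific queries the analysis needs.
// In LLVM proper these would more likely be TargetTransformInfo hooks.
class GPUTargetHooks {
public:
  virtual ~GPUTargetHooks() = default;
  // If Callee names an intrinsic that reads the thread index, return the
  // dimension it reads (0 = x, 1 = y, 2 = z); otherwise std::nullopt.
  virtual std::optional<unsigned> threadIdDim(const std::string &Callee) const = 0;
  virtual unsigned sharedAddressSpace() const = 0;
  virtual unsigned constantAddressSpace() const = 0;
};

// NVPTX values taken from the review comments: the tid intrinsics and
// address spaces 3 (shared) and 4 (constant).
class NVPTXHooks final : public GPUTargetHooks {
public:
  std::optional<unsigned> threadIdDim(const std::string &Callee) const override {
    const std::string Prefix = "llvm.nvvm.read.ptx.sreg.tid.";
    // Require exactly one character after the prefix.
    if (Callee.size() != Prefix.size() + 1 ||
        Callee.compare(0, Prefix.size(), Prefix) != 0)
      return std::nullopt;
    switch (Callee.back()) {
    case 'x': return 0u;
    case 'y': return 1u;
    case 'z': return 2u;
    default:  return std::nullopt;
    }
  }
  unsigned sharedAddressSpace() const override { return 3; }
  unsigned constantAddressSpace() const override { return 4; }
};

// Hypothetical factory behind a "--target=" style flag. Only "nvptx" is
// recognized; any other target yields nullptr so the tool can report an
// error instead of silently applying NVIDIA-specific values.
std::unique_ptr<GPUTargetHooks> createGPUTargetHooks(const std::string &Target) {
  if (Target == "nvptx")
    return std::make_unique<NVPTXHooks>();
  return nullptr;
}
```

With this split, the target-independent analysis would query the hooks object instead of comparing against intrinsic names and address-space constants directly, and supporting another GPU would mean adding one more subclass.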
>>
>> ------------------------------
>> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Nimit
>> Singhania via llvm-dev <llvm-dev at lists.llvm.org>
>> *Sent:* Monday, October 25, 2021 2:50 PM
>> *To:* llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
>> *Subject:* [llvm-dev] Static Analysis for GPU Program Performance in LLVM
>>
>> Hi there,
>>
>> I would like to propose the addition of two static analyses to the LLVM
>> framework that can help detect performance issues in GPU programs. The
>> first analysis detects memory congestion across GPU threads caused by
>> uncoalesced accesses; the second checks whether a synchronization-free
>> program is block-size independent, which allows the block size to be tuned
>> for performance without affecting correctness. Both static analyses were
>> developed as part of my PhD thesis and are available on GitHub. More
>> details are available here:
>>
>> https://github.com/upenn-acg/gpudrano-static-analysis_v1.0
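To make the first analysis concrete, here is a toy C++ sketch of the idea behind detecting uncoalesced accesses: track, for each array index, how it depends on `threadIdx.x`, and flag accesses where adjacent threads in a warp do not touch the same or adjacent elements. This is an illustration under simplifying assumptions, not the GPUDrano algorithm: the `AbsIndex` type and its helper functions are hypothetical, indices are counted in elements (element size and alignment are ignored), and only tid.x coefficients 0 and 1 are treated as coalesced.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy abstract value for an array index: a * tid.x + (warp-uniform part),
// with a known integer coefficient a, or unknown dependence on tid.x.
struct AbsIndex {
  std::optional<int64_t> TidCoeff; // std::nullopt = unknown
  static AbsIndex uniform() { return {0}; } // same value for all threads
  static AbsIndex tid() { return {1}; }     // threadIdx.x itself
  static AbsIndex unknown() { return {std::nullopt}; }
};

// Transfer function for addition: coefficients add.
AbsIndex add(AbsIndex A, AbsIndex B) {
  if (!A.TidCoeff || !B.TidCoeff)
    return AbsIndex::unknown();
  return {*A.TidCoeff + *B.TidCoeff};
}

// Multiplication by a compile-time constant C scales the coefficient.
AbsIndex mulConst(AbsIndex A, int64_t C) {
  if (!A.TidCoeff)
    return AbsIndex::unknown();
  return {*A.TidCoeff * C};
}

// Multiplication by a warp-uniform value unknown at compile time: stays
// uniform if A is uniform, otherwise the coefficient becomes unknown.
AbsIndex mulUniform(AbsIndex A) {
  if (A.TidCoeff && *A.TidCoeff == 0)
    return AbsIndex::uniform();
  return AbsIndex::unknown();
}

// Accepted as coalesced: all threads hit the same element (coefficient 0)
// or consecutive elements (coefficient 1). Everything else is flagged.
bool mayBeUncoalesced(const AbsIndex &Idx) {
  return !Idx.TidCoeff || (*Idx.TidCoeff != 0 && *Idx.TidCoeff != 1);
}
```

For example, `A[row * width + threadIdx.x]` with a warp-uniform `row` keeps coefficient 1 and is accepted, while `A[threadIdx.x * stride]` with a runtime `stride` becomes unknown and is flagged.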
>>
>> We would like to upstream these analyses to LLVM. The work has several
>> advantages. These are ground-breaking analyses that allow lightweight
>> compile-time detection of performance and correctness issues in GPU
>> programs that concern *inter-thread* behavior. Because they are
>> lightweight, they run efficiently at compile time. Inter-thread behavior
>> is behavior that arises from the interaction between threads rather than
>> being local to an individual thread. Such analysis is difficult to
>> perform for general multithreaded programs; however, the regularity of
>> GPU parallelism makes these analyses feasible at compile time.
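The block-size independence property can be illustrated with a dynamic spot-check. Note that this is the opposite of what the proposed analysis does: the analysis proves the property statically, whereas this hypothetical C++ harness (`ThreadCtx`, `launch`, and `sameForBlockSizes` are illustrative names) merely runs a synchronization-free 1-D kernel sequentially under several block sizes and compares the outputs. A kernel that indexes only by global thread id passes, while one whose result depends on `threadIdx.x` alone does not.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Per-thread built-ins mirroring CUDA's blockIdx.x, blockDim.x and
// threadIdx.x (1-D only, for brevity).
struct ThreadCtx {
  unsigned BlockIdx, BlockDim, ThreadIdx;
  unsigned globalId() const { return BlockIdx * BlockDim + ThreadIdx; }
};

// A "kernel" writes into Out and is assumed to use no barriers.
using Kernel = std::function<void(const ThreadCtx &, std::vector<int> &)>;

// Sequentially emulates a 1-D launch covering N threads. Threads with
// globalId() >= N are skipped, standing in for the usual `if (gid < n)`
// guard. Running threads one after another is only faithful because the
// kernel is synchronization-free and (by assumption) race-free.
std::vector<int> launch(const Kernel &K, unsigned N, unsigned BlockSize) {
  assert(BlockSize > 0 && "block size must be positive");
  std::vector<int> Out(N, 0);
  unsigned NumBlocks = (N + BlockSize - 1) / BlockSize;
  for (unsigned B = 0; B < NumBlocks; ++B)
    for (unsigned T = 0; T < BlockSize; ++T) {
      ThreadCtx Ctx{B, BlockSize, T};
      if (Ctx.globalId() < N)
        K(Ctx, Out);
    }
  return Out;
}

// Dynamic spot-check (not a static proof): does the output agree for
// every block size in Sizes?
bool sameForBlockSizes(const Kernel &K, unsigned N,
                       const std::vector<unsigned> &Sizes) {
  assert(!Sizes.empty() && "need at least one block size");
  std::vector<int> Ref = launch(K, N, Sizes.front());
  for (unsigned S : Sizes)
    if (launch(K, N, S) != Ref)
      return false;
  return true;
}
```

Sequential emulation is only meaningful here because the kernel has no barriers; programs that use `__syncthreads()` fall outside the scope of the second analysis as described.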
>>
>> These analyses can be the basis for optimizations that improve the
>> performance of GPU programs several-fold. Given the complexity of GPU
>> programming and the lack of tool support in this space, the analyses
>> provide the first steps towards robust tools for analyzing and optimizing
>> GPU programs. This work has appeared in two publications, listed in the
>> references below. I would be happy to answer any questions or concerns
>> regarding this work.
>>
>> Regards,
>> Nimit
>>
>> References:
>> 1. FMSD 2021: Static analysis for detecting uncoalesced accesses in GPU
>> programs, Rajeev Alur, Joseph Devietti, Omar Navarro Leija, and Nimit
>> Singhania.
>> 2. SAS 2018: Block-Size Independence of GPU Programs, Rajeev Alur, Joseph
>> Devietti, and Nimit Singhania.
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
> --
> *Disclaimer: The views, concerns, thoughts, questions, and ideas expressed
> in this mail are my own; my employer has no stake in them.*
> Thank You.
> Madhur D. Amilkanthwar
>
>