[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
Johannes Doerfert via llvm-dev
llvm-dev at lists.llvm.org
Mon Mar 30 18:51:53 PDT 2020
On 3/30/20 8:07 PM, Shiva Stanford wrote:
> 1. Thanks for the clarifications. I will stick to non-containerized OS X
> for now.
Sounds good. As long as you can build it and run the lit and
llvm-test-suite tests :)
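(For anyone following along, a typical build-and-test sequence looks roughly like this. This is a sketch: the directory names and the `-DLLVM_ENABLE_PROJECTS` selection are assumptions, adjust to your checkout.)

```shell
# Configure and build LLVM + Clang with Ninja in an out-of-tree build dir.
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
cmake -S llvm -B build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS="clang"
ninja -C build

# Run the lit-based regression tests.
ninja -C build check-llvm

# The llvm-test-suite lives in a separate repository and is configured
# against the just-built toolchain.
git clone https://github.com/llvm/llvm-test-suite.git
cmake -S llvm-test-suite -B test-suite-build -G Ninja \
  -DCMAKE_C_COMPILER=$PWD/build/bin/clang
ninja -C test-suite-build
```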
> 2. As an aside, I did try to build a Debian docker container by git cloning
> into it and using the Dockerfile in LLVM/utils/docker as a starting point:
> - some changes needed to update packages (GCC in particular needs to be
> latest) and the Debian image (Debian 9 instead of Debian 8) pretty much
> sets up the docker container well. But for some reason, the Ninja build
> tool within the CMake Generator fails. I am looking into it. Maybe I can
> produce a working docker workflow for others who want to build and work
> with LLVM in a container environment.
Feel free to propose a fix but I'm the wrong one to talk to ;)
> 3. I have submitted the final proposal to GSoC 2020 today after
> incorporating some comments and thoughts. When you all get a chance to
> review, let me know your thoughts.
Good. Can you share the Google Doc with me
(johannesdoerfert at gmail.com)? [Or did you already and I misplaced the
link? In that case, send it again ;)]
> 4. On GPU extension, my thoughts were around what an integrated compiler
> like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when GCC is substituted
> with LLVM and if that arrangement can be optimized for ML passes.
> But I am beginning to think that structuring this problem well and
> doing meaningful work over the summer might be a bit difficult.
> But I am beginning to think that structuring this problem well and
> doing meaningful work over the summer might be a bit difficult.
As far as I know, neither GCC nor Clang will behave much differently if
they are used by nvcc than in their standalone mode.
Having an "ML mode" is probably a generic thing to look at, though the
"high-level" optimizations are not necessarily performed on LLVM-IR.
> As mentors, do you have any thoughts on how LLVM might be integrated
> into a joint CPU-GPU compiler by the likes of Nvidia, Apple etc.?
I'm unsure what exactly you're asking. Clang can be used in CPU-GPU
compilation via CUDA, OpenCL, OpenMP offload, SYCL, ... is this it?
I'm personally mostly interested in generic optimizations in this space,
but I am actually quite interested. Some ideas:
- transfer latency hiding (another GSoC project),
- kernel granularity optimizations (not being worked on yet, but they
require some infrastructure changes that are as of now still in the
making),
- data "location" tracking so we can "move" computation to the right
device, e.g., for truly dependence-free loops like `pragma omp loop`.
I can list more things, but I'm unsure this is the direction you were
thinking of.
Cheers,
Johannes
> Best
> Shiva
>
>
>
> On Mon, Mar 30, 2020 at 5:30 PM Johannes Doerfert <
> johannesdoerfert at gmail.com> wrote:
>
>>
>> On 3/27/20 3:46 PM, Shiva Stanford wrote:
>>> Hi Johannes - great we are engaging on this.
>>>
>>> Some responses now and some later.
>>>
>>> 1. When you say setup LLVM dev environment + clang + tools etc., do
>>> you mean setting up the LLVM compiler code from the repo and building
>>> it locally? If so, yes, this is all done from my end - that is, I have
>>> built all this on my machine and compiled and run a couple of function
>>> passes. I have looked at some LLVM emits from clang tools but I will
>>> familiarize more. I have added some small code segments, modified
>>> CMakeLists and re-built code to get a feel for the packaging
>>> structure. Btw, is there a version of Bazel build for this? Right now,
>>> I am using OS X as the SDK as Apple is the one that has adopted LLVM
>>> the most. But I can switch to Linux containers to completely wall off
>>> the LLVM build against any OS X system builds to prevent path
>>> obfuscation and truly have a separate address space. Is there a
>>> preferable environment? In any case, I am thinking of containerizing
>>> the build, so OS X system paths don't interfere with include paths -
>>> have you received feedback from other developers on whether the
>>> include paths interfere with the OS X LLVM system build?
>>
>>
>> Setup sounds good.
>>
>> I have never used OS X but people do and I would expect it to be OK.
>>
>> I don't think you need to worry about this right now.
>>
>>
>>> 2. The attributor pass refactoring gives some specific direction as a
>>> startup project - so that's great. Let me study this pass and I will
>>> get back to you with more questions.
>>
>> Sure.
>>
>>
>>> 3. Yes, I will stick to the style guide (Baaaah...Stanford is strict on
>>> code styling and so are you guys :)) for sure.
>>
>> For better or worse.
>>
>>
>> Cheers,
>>
>> Johannes
>>
>>
>>
>>> On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert <
>>> johannesdoerfert at gmail.com> wrote:
>>>
>>>> Hi Shiva,
>>>>
>>>> apologies for the delayed response.
>>>>
>>>> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
>>>> > I am a grad CS student at Stanford and wanted to engage with EJ
>>>> > Park, Giorgis Georgakoudis, Johannes Doerfert to further develop
>>>> > the Machine Learning and Compiler Optimization concept.
>>>>
>>>> Cool!
>>>>
>>>>
>>>> > My background is in machine learning, cluster computing,
>>>> > distributed systems etc. I am a good C/C++ developer and have a
>>>> > strong background in algorithms and data structures.
>>>>
>>>> Sounds good.
>>>>
>>>>
>>>> > I am also taking an advanced compiler course this quarter at
>>>> > Stanford. So I would be studying several of these topics anyways -
>>>> > so I thought I might as well co-engage on the LLVM compiler infra
>>>> > project.
>>>>
>>>> Agreed ;)
>>>>
>>>>
>>>> > I am currently studying the background information on SCC Call
>>>> > Graphs, Dominator Trees and other Global and inter-procedural
>>>> > analyses to lay some groundwork on how to tackle this optimization
>>>> > pass using ML models. I have run a couple of all-program function
>>>> > passes and visualized call graphs to get familiarized with the
>>>> > LLVM optimization pass setup. I have also set up and learnt the
>>>> > use of GDB to debug function pass code.
>>>>
>>>> Very nice.
>>>>
>>>>
>>>> > I have submitted the ML and Compiler Optimization proposal to GSoC
>>>> > 2020. I have added an additional feature to enhance the ML
>>>> > optimization to include crossover code to GPU and investigate how
>>>> > the function call graphs can be visualized as SCCs across CPU and
>>>> > GPU implementations. If the extension to GPU is too much for a
>>>> > summer project, potentially we can focus on developing a framework
>>>> > for studying SCCs across a unified CPU-GPU setup and leave the
>>>> > coding, if feasible, to next summer. All preliminary ideas.
>>>>
>>>> I haven't looked at the proposals yet (I think we can only do so
>>>> after the deadline). TBH, I'm not sure I fully understand your
>>>> extension. Also, full disclosure, the project is pretty open-ended
>>>> from my side at least. I do not necessarily believe we (=llvm) are
>>>> ready for an ML-driven pass or even inference in practice. What I
>>>> want is to explore the use of ML to improve the code we have,
>>>> especially heuristics. We build analyses and transformations, but it
>>>> is hard to combine them in a way that balances compile-time,
>>>> code-size, and performance.
>>>>
>>>> Some high-level statements that might help to put my view into
>>>> perspective:
>>>>
>>>> I want to use ML to identify patterns and code features that we can
>>>> check for using common techniques, such that basing our decision
>>>> making on these patterns or features achieves better compile-time,
>>>> code-size, and/or performance.
>>>> I want to use ML to identify shortcomings in our existing heuristics,
>>>> e.g., transformation cut-off values or pass schedules. This could
>>>> also mean identifying alternative (combinations of) values that
>>>> perform substantially better (on some inputs).
>>>>
>>>>
>>>> > Not sure how to proceed from here. Hence my email to this list.
>>>> > Please let me know.
>>>>
>>>> The email to the list was a great first step. The next one usually
>>>> is to set up an LLVM development and testing environment, thus LLVM
>>>> + Clang + LLVM-Test Suite, that you can use. It is also advisable to
>>>> work on a small task before the GSoC to get used to LLVM
>>>> development.
>>>>
>>>> I don't have a really small ML "coding" task handy right now but the
>>>> project is more about experiments anyway. To get some LLVM development
>>>> experience we can just take a small task in the IPO Attributor pass.
>>>>
>>>> One thing we need and don't have is data. The Attributor is a
>>>> fixpoint iteration framework, so the number of iterations is a
>>>> pretty integral part. We have a statistics counter to determine if
>>>> the number required was higher than the given threshold, but not one
>>>> to determine the maximum iteration count required during
>>>> compilation. It would be great if you could add that, thus a
>>>> statistics counter that shows how many iterations were required
>>>> until a fixpoint was found across all invocations of the Attributor.
>>>> Does this make sense? Let me know what you think and feel free to
>>>> ask questions via email or on IRC.
>>>>
>>>> Cheers,
>>>> Johannes
>>>>
>>>> P.S. Check out the coding style guide and the how to contribute guide!
>>>>
>>>>
>>>> > Thank you
>>>> > Shiva Badruswamy
>>>> > shivastanford at gmail.com
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > LLVM Developers mailing list
>>>> > llvm-dev at lists.llvm.org
>>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>
>