[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations

Johannes Doerfert via llvm-dev llvm-dev at lists.llvm.org
Mon Mar 30 18:51:53 PDT 2020


On 3/30/20 8:07 PM, Shiva Stanford wrote:
 > 1. Thanks for the clarifications. I will stick to non-containerized OS X
 > for now.

Sounds good. As long as you can build it and run the lit and
llvm-test-suite tests :)


 > 2. As an aside, I did try to build a Debian docker container by git
 > cloning into it and using the Dockerfile in LLVM/utils/docker as a
 > starting point:
 >  - some changes were needed to update packages (GCC in particular needs
 > to be the latest) and the Debian image (Debian 9 instead of Debian 8);
 > that pretty much sets up the docker container well. But for some reason,
 > the Ninja build tool within the CMake generator fails. I am looking into
 > it. Maybe I can produce a working docker workflow for others who want to
 > build and work with LLVM in a container environment.

Feel free to propose a fix but I'm the wrong one to talk to ;)


 > 3. I have submitted the final proposal to GSoC 2020 today after
 > incorporating some comments and thoughts. When you all get a chance to
 > review, let me know your thoughts.

Good. Can you share the Google Doc with me
(johannesdoerfert at gmail.com)? [Or did you already and I misplaced the
link? In that case, send it again ;)]


 > 4. On GPU extension, my thoughts were around what an integrated compiler
 > like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when GCC is
 > substituted with LLVM and if that arrangement can be optimized for ML
 > passes. But I am beginning to think that structuring this problem well
 > and doing meaningful work over the summer might be a bit difficult.

As far as I know, neither GCC nor Clang behaves much differently when
driven by nvcc than in standalone mode.

Having an "ML-mode" is probably a generic thing to look at. Though, the
"high-level" optimizations are not necessarily performed in LLVM-IR.


 > As mentors, do you have any thoughts on how LLVM might be integrated
 > into a joint CPU-GPU compiler by the likes of Nvidia, Apple etc.?

I'm not sure exactly what you're asking. Clang can be used in CPU-GPU
compilation via CUDA, OpenCL, OpenMP offload, SYCL, ... is this it?
I'm personally mostly interested in generic optimizations in this
space, but quite interested nonetheless. Some ideas:
  - transfer latency hiding (another GSoC project),
  - kernel granularity optimizations (not being worked on yet; it
    requires some infrastructure changes that are as of now still in
    the making),
  - data "location" tracking so we can "move" computation to the right
    device, e.g., for really dependence-free loops like `pragma omp
    loop` (see the sketch below).
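
To make that last bullet concrete, here is a minimal sketch (the
function name is made up for illustration, not code from any patch) of
the kind of dependence-free loop I mean:

   #include <cstddef>

   // Every iteration is independent, so a compiler that tracks where
   // `x` and `y` currently "live" could legally run this loop on the
   // CPU or the GPU, whichever avoids the data transfer.
   void saxpy(std::size_t n, float a, const float *x, float *y) {
   #pragma omp loop
     for (std::size_t i = 0; i < n; ++i)
       y[i] = a * x[i] + y[i];
   }

Nothing in the loop itself pins the computation to one device; the
"location" tracking would supply exactly that placement decision.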

I can list more things, but I'm unsure whether this is the direction
you were thinking of.

Cheers,
   Johannes

 > Best
 > Shiva
 >
 >
 >
 > On Mon, Mar 30, 2020 at 5:30 PM Johannes Doerfert <
 > johannesdoerfert at gmail.com> wrote:
 >
 >>
 >> On 3/27/20 3:46 PM, Shiva Stanford wrote:
 >>> Hi Johannes - great we are engaging on this.
 >>>
 >>> Some responses now and some later.
 >>>
 >>> 1. When you say setup LLVM dev environment + clang + tools etc., do
 >>> you mean setting up the LLVM compiler code from the repo and building
 >>> it locally? If so, yes, this is all done from my end - that is, I
 >>> have built all this on my machine and compiled and run a couple of
 >>> function passes. I have looked at some LLVM output from clang tools
 >>> but I will familiarize myself more. I have added some small code
 >>> segments, modified CMakeLists and re-built the code to get a feel for
 >>> the packaging structure. Btw, is there a version of a Bazel build for
 >>> this? Right now, I am using OS X as the SDK as Apple is the one that
 >>> has adopted LLVM the most. But I can switch to Linux containers to
 >>> completely wall off the LLVM build against any OS X system builds to
 >>> prevent path obfuscation and truly have a separate address space. Is
 >>> there a preferable environment? In any case, I am thinking of
 >>> containerizing the build, so OS X system paths don't interfere with
 >>> include paths - have you received feedback from other developers on
 >>> whether the include paths interfere with the OS X LLVM system build?
 >>
 >>
 >> Setup sounds good.
 >>
 >> I have never used OS X but people do and I would expect it to be OK.
 >>
 >> I don't think you need to worry about this right now.
 >>
 >>
 >>> 2. The Attributor pass refactoring gives some specific direction as
 >>> a startup project - so that's great. Let me study this pass and I
 >>> will get back to you with more questions.
 >>
 >> Sure.
 >>
 >>
 >>> 3. Yes, I will stick to the style guide (Baaaah...Stanford is strict on
 >>> code styling and so are you guys :)) for sure.
 >>
 >> For better or worse.
 >>
 >>
 >> Cheers,
 >>
 >>    Johannes
 >>
 >>
 >>
 >>> On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert <
 >>> johannesdoerfert at gmail.com> wrote:
 >>>
 >>>> Hi Shiva,
 >>>>
 >>>> apologies for the delayed response.
 >>>>
 >>>> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
 >>>>   > I am a grad CS student at Stanford and wanted to engage with EJ
 >>>>   > Park, Giorgis Georgakoudis, Johannes Doerfert to further develop
 >>>>   > the Machine Learning and Compiler Optimization concept.
 >>>>
 >>>> Cool!
 >>>>
 >>>>
 >>>>   > My background is in machine learning, cluster computing,
 >>>>   > distributed systems, etc. I am a good C/C++ developer and have
 >>>>   > a strong background in algorithms and data structures.
 >>>>
 >>>> Sounds good.
 >>>>
 >>>>
 >>>>   > I am also taking an advanced compiler course this quarter at
 >>>>   > Stanford, so I would be studying several of these topics anyway
 >>>>   > - I thought I might as well co-engage on the LLVM compiler
 >>>>   > infra project.
 >>>>
 >>>> Agreed ;)
 >>>>
 >>>>
 >>>>   > I am currently studying the background information on SCC call
 >>>>   > graphs, dominator trees, and other global and inter-procedural
 >>>>   > analyses to lay some groundwork on how to tackle this
 >>>>   > optimization pass using ML models. I have run a couple of
 >>>>   > whole-program function passes and visualized call graphs to get
 >>>>   > familiarized with the LLVM optimization pass setup. I have also
 >>>>   > set up and learnt to use GDB to debug function pass code.
 >>>>
 >>>> Very nice.
 >>>>
 >>>>
 >>>>   > I have submitted the ML and Compiler Optimization proposal to
 >>>>   > GSoC 2020. I have added an additional feature to enhance the ML
 >>>>   > optimization to include crossover code to the GPU and
 >>>>   > investigate how the function call graphs can be visualized as
 >>>>   > SCCs across CPU and GPU implementations. If the extension to
 >>>>   > the GPU is too much for a summer project, potentially we can
 >>>>   > focus on developing a framework for studying SCCs across a
 >>>>   > unified CPU-GPU setup and leave the coding, if feasible, to
 >>>>   > next summer. All preliminary ideas.
 >>>>
 >>>> I haven't looked at the proposals yet (I think we can only do so
 >>>> after the deadline). TBH, I'm not sure I fully understand your
 >>>> extension. Also, full disclosure, the project is pretty open-ended,
 >>>> from my side at least. I do not necessarily believe we (=LLVM) are
 >>>> ready for an ML-driven pass or even inference in practice. What I
 >>>> want is to explore the use of ML to improve the code we have,
 >>>> especially heuristics. We build analyses and transformations, but
 >>>> it is hard to combine them in a way that balances compile-time,
 >>>> code-size, and performance.
 >>>>
 >>>> Some high-level statements that might help to put my view into
 >>>> perspective:
 >>>>
 >>>> I want to use ML to identify patterns and code features that we can
 >>>> check for using common techniques, such that when we base our
 >>>> decision making on these patterns or features we achieve better
 >>>> compile-time, code-size, and/or performance.
 >>>> I want to use ML to identify shortcomings in our existing
 >>>> heuristics, e.g., transformation cut-off values or pass schedules
 >>>> (a concrete sketch of such a cut-off is below). This could also
 >>>> mean identifying alternative (combinations of) values that perform
 >>>> substantially better (on some inputs).
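 >>>>
 >>>> To make "cut-off values" concrete, this is the pattern all over the
 >>>> codebase; the flag below is made up for illustration, not an actual
 >>>> option:
 >>>>
 >>>>   #include "llvm/Support/CommandLine.h"
 >>>>   using namespace llvm;
 >>>>
 >>>>   // Hypothetical threshold: one hard-coded default for all inputs.
 >>>>   // An ML study could suggest alternative values that do
 >>>>   // substantially better on some classes of input.
 >>>>   static cl::opt<unsigned> SomeXformThreshold(
 >>>>       "some-xform-threshold", cl::init(150),
 >>>>       cl::desc("Cost limit below which the transformation fires"));
 >>>>
 >>>>   // ... later, in the transformation:
 >>>>   //   if (Cost < SomeXformThreshold)
 >>>>   //     apply();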
 >>>>
 >>>>
 >>>>   > Not sure how to proceed from here. Hence my email to this list.
 >>>> Please let
 >>>>   > me know.
 >>>>
 >>>> The email to the list was a great first step. The next one usually
 >>>> is to set up an LLVM development and testing environment, that is,
 >>>> LLVM + Clang + LLVM Test Suite, that you can use. It is also advised
 >>>> to work on a small task before the GSoC to get used to LLVM
 >>>> development.
 >>>>
 >>>> I don't have a really small ML "coding" task handy right now but the
 >>>> project is more about experiments anyway. To get some LLVM development
 >>>> experience we can just take a small task in the IPO Attributor pass.
 >>>>
 >>>> One thing we need and don't have is data. The Attributor is a
 >>>> fixpoint iteration framework, so the number of iterations is a
 >>>> pretty integral part of it. We have a statistics counter to
 >>>> determine if the number required was higher than the given
 >>>> threshold, but not one to determine the maximum iteration count
 >>>> required during compilation. It would be great if you could add
 >>>> that, thus a statistics counter that shows how many iterations were
 >>>> required until a fixpoint was found, across all invocations of the
 >>>> Attributor. Does this make sense? Let me know what you think and
 >>>> feel free to ask questions via email or on IRC.
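 >>>>
 >>>> Roughly the shape I have in mind (a sketch only; where it goes in
 >>>> Attributor.cpp and what the iteration counter is called there is up
 >>>> to you):
 >>>>
 >>>>   #include "llvm/ADT/Statistic.h"
 >>>>
 >>>>   #define DEBUG_TYPE "attributor"
 >>>>
 >>>>   STATISTIC(MaxIterations,
 >>>>             "Maximum number of fixpoint iterations required across "
 >>>>             "all Attributor invocations");
 >>>>
 >>>>   // Once the fixpoint loop has converged, record this run's
 >>>>   // iteration count; updateMax keeps the largest value seen over
 >>>>   // the whole compilation. `Iterations` stands in for the actual
 >>>>   // loop counter.
 >>>>   void recordIterations(unsigned Iterations) {
 >>>>     MaxIterations.updateMax(Iterations);
 >>>>   }
 >>>>
 >>>> The counter would then show up in the usual `-stats` output next to
 >>>> the existing threshold counter.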
 >>>>
 >>>> Cheers,
 >>>>     Johannes
 >>>>
 >>>> P.S. Check out the coding style guide and the how to contribute guide!
 >>>>
 >>>>
 >>>>   > Thank you
 >>>>   > Shiva Badruswamy
 >>>>   > shivastanford at gmail.com
 >>>>   >
 >>>>   >
 >>>>   > _______________________________________________
 >>>>   > LLVM Developers mailing list
 >>>>   > llvm-dev at lists.llvm.org
 >>>>   > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
 >>>>
 >>>>
 >>
 >


