[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
Shiva Stanford via llvm-dev
llvm-dev at lists.llvm.org
Tue Mar 31 07:22:23 PDT 2020
Hi Johannes et al.:
1. I could not see anywhere that GSoC 2020 requires a timeline in the
proposal. If anything, they have laid out their own timeline, which lines
up with my summer break starting in June.
2. Here's their timeline link:
https://developers.google.com/open-source/gsoc/timeline
Best
Shiva
On Tue, Mar 31, 2020 at 3:22 AM Shiva Stanford <shivastanford at gmail.com>
wrote:
> 1. Draft proposals via gdoc. Final via PDF.
> 2. I did not see any timeline requests from GSoC, but spring quarter ends
> around June 6, possibly a week later due to coronavirus schedule delays.
> Summer begins then. I will look into it some more in the morning and see
> what I can add about timelines.
>
> Thanks.
>
> On Mon, Mar 30, 2020 at 11:43 PM Johannes Doerfert <
> johannesdoerfert at gmail.com> wrote:
>
>>
>> On 3/30/20 9:28 PM, Shiva Stanford wrote:
>> > Hi Johannes:
>> >
>> > 1. Attached is the submitted PDF.
>>
>> I thought they make you submit via gdoc and I also thought they wanted a
>> timeline and had other requirements. Please verify this so it's not a
>> problem (I base this on the proposals I've seen this year and not on the
>> information actually provided by GSoC).
>>
>>
>> > 2. I have a notes section where I state: I am still unsure of the GPU
>> > extension I proposed, as I don't know how LLVM plays into the GPU
>> > crossover space the way nvcc (Nvidia's compiler, which integrates gcc
>> > and PTX) does.
>>
>> You can use clang as "host compiler". As mentioned before, there is
>> clang-cuda, and OpenMP offloading also generates PTX for the GPU code.
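>>
>> For illustration, a minimal sketch of the OpenMP offloading side (the
>> toy function is mine; the flags are the usual clang offload flags, as
>> far as I know):
>>
>>   // Compiled e.g. with:
>>   //   clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda saxpy.cpp
>>   // clang emits PTX for the target region and host code for the rest.
>>   void saxpy(int n, float a, float *x, float *y) {
>>   #pragma omp target teams distribute parallel for \
>>       map(to: x[0:n]) map(tofrom: y[0:n])
>>     for (int i = 0; i < n; ++i)
>>       y[i] = a * x[i] + y[i];
>>   }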
>>
>>
>> > I don't know if there is a chance that function graphs in the CPU+GPU
>> > namespaces are seamless/continuous within nvcc, or if nvcc is just a
>> > wrapper that invokes gcc on the CPU sources and PTX on the GPU
>> > sources.
>>
>> Something like that as far as I know.
>>
>>
>> > So what I have said is: if there is time to investigate, we could
>> > look at this. But I am not sure I am even framing the problem
>> > statement correctly at this point.
>>
>> As I said, I'd be very happy for you to also work on GPU-related
>> things; what exactly can be defined over the next weeks.
>>
>> GPU offloading is by nature inter-procedural (take CUDA kernels) so
>> creating the infrastructure to alter the granularity of kernels
>> (when/where to fuse/split them) could be a task. For this it is fairly
>> important (as far as I know now) to predict the register usage
>> accurately. Using learning here might be interesting as well.
>>
>> As you mention in the pdf, one can also split the index space to balance
>> computation. When we implement something like `pragma omp loop` we can
>> also balance computations across multiple GPUs as long as we get the
>> data movement right.
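>>
>> For example, just a sketch from my side (nothing that exists yet; the
>> device numbers and the manual split are arbitrary) of what splitting the
>> index space of a dependence-free loop across two GPUs could look like
>> once the data movement is handled:
>>
>>   // Hypothetical manual split; `pragma omp loop` plus the right
>>   // infrastructure could eventually do this for us.
>>   void vec_add(int n, float *a, float *b, float *c) {
>>     int half = n / 2;
>>     // First half on device 0, asynchronously (nowait).
>>   #pragma omp target teams distribute parallel for device(0) nowait \
>>       map(to: a[0:half], b[0:half]) map(from: c[0:half])
>>     for (int i = 0; i < half; ++i)
>>       c[i] = a[i] + b[i];
>>     // Second half on device 1.
>>   #pragma omp target teams distribute parallel for device(1) \
>>       map(to: a[half:n-half], b[half:n-half]) map(from: c[half:n-half])
>>     for (int i = half; i < n; ++i)
>>       c[i] = a[i] + b[i];
>>     // Wait for the asynchronous target region on device 0.
>>   #pragma omp taskwait
>>   }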
>>
>>
>> > 3. I have added a tentative tasks section and made a note that the
>> > project is open ended and things are quite fluid and may change
>> > significantly.
>>
>> That is good. This is a moving target and an open-ended task; I expect
>> things to become clearer as we go, based on the data we gather.
>>
>> Cheers,
>> Johannes
>>
>>
>> > Cheers Shiva
>> >
>> >
>> > On Mon, Mar 30, 2020 at 6:52 PM Johannes Doerfert <
>> > johannesdoerfert at gmail.com> wrote:
>> >
>> >> On 3/30/20 8:07 PM, Shiva Stanford wrote:
>> >> > 1. Thanks for the clarifications. I will stick to
>> >> > non-containerized OS X for now.
>> >>
>> >> Sounds good. As long as you can build it and run lit and llvm-test
>> >> suite tests :)
>> >>
>> >>
>> >> > 2. As an aside, I did try to build a Debian docker container by git
>> >> > cloning into it and using the Dockerfile in LLVM/utils/docker as a
>> >> > starting point. Some changes were needed to update packages (GCC in
>> >> > particular needs to be recent), and the Debian image (Debian 9
>> >> > instead of Debian 8) pretty much sets up the docker container well.
>> >> > But for some reason, the Ninja build tool within the CMake generator
>> >> > fails. I am looking into it. Maybe I can produce a working docker
>> >> > workflow for others who want to build and work with LLVM in a
>> >> > container environment.
>> >>
>> >> Feel free to propose a fix but I'm the wrong one to talk to ;)
>> >>
>> >>
>> >> > 3. I have submitted the final proposal to GSoC 2020 today after
>> >> > incorporating some comments and thoughts. When you all get a chance
>> >> > to review, let me know your thoughts.
>> >>
>> >> Good. Can you share the google doc with me
>> >> (johannesdoerfert at gmail.com)? [Or did you already and I misplaced
>> >> the link? In that case send it again ;)]
>> >>
>> >>
>> >> > 4. On the GPU extension, my thoughts were around what an integrated
>> >> > compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when
>> >> > GCC is substituted with LLVM, and whether that arrangement can be
>> >> > optimized with ML passes. But I am beginning to think that
>> >> > structuring this problem well and doing meaningful work over the
>> >> > summer might be a bit difficult.
>> >>
>> >> As far as I know, neither GCC nor Clang will behave much differently
>> >> if they are used by nvcc than in their standalone mode.
>> >>
>> >> Having an "ML-mode" is probably a generic thing to look at. Though,
>> >> the "high-level" optimizations are not necessarily performed in
>> >> LLVM-IR.
>> >>
>> >>
>> >> > As mentors, do you have any thoughts on how LLVM might be
>> >> > integrated into a joint CPU-GPU compiler by the likes of Nvidia,
>> >> > Apple etc.?
>> >>
>> >> I'm unsure what exactly you are asking. Clang can be used in CPU-GPU
>> >> compilation via CUDA, OpenCL, OpenMP offload, SYCL, ... is this it?
>> >> I'm personally mostly interested in generic optimizations in this
>> >> space, but quite interested nonetheless. Some ideas:
>> >> - transfer latency hiding (another GSoC project),
>> >> - kernel granularity optimizations (not being worked on yet; it
>> >>   requires some infrastructure changes that are as of now still in
>> >>   the making),
>> >> - data "location" tracking so we can "move" computation to the right
>> >>   device, e.g., for really dependence-free loops like `pragma omp loop`
>> >>
>> >> I can list more things but I'm unsure this is the direction you were
>> >> thinking.
>> >>
>> >> Cheers, Johannes
>> >>
>> >> > Best Shiva
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Mar 30, 2020 at 5:30 PM Johannes Doerfert <
>> >> > johannesdoerfert at gmail.com> wrote:
>> >> >
>> >> >>
>> >> >> On 3/27/20 3:46 PM, Shiva Stanford wrote:
>> >> >>> Hi Johannes - great we are engaging on this.
>> >> >>>
>> >> >>> Some responses now and some later.
>> >> >>>
>> >> >>> 1. When you say set up an LLVM dev environment + clang + tools
>> >> >>> etc., do you mean set up the LLVM compiler code from the repo and
>> >> >>> build it locally? If so, yes, this is all done from my end - that
>> >> >>> is, I have built all this on my machine and compiled and run a
>> >> >>> couple of function passes. I have looked at some LLVM output from
>> >> >>> clang tools but I will familiarize myself more. I have added some
>> >> >>> small code segments, modified CMake lists and re-built the code to
>> >> >>> get a feel for the packaging structure. Btw, is there a version of
>> >> >>> a Bazel build for this? Right now, I am using OS X as the SDK, as
>> >> >>> Apple is the one that has adopted LLVM the most. But I can switch
>> >> >>> to Linux containers to completely wall off the LLVM build against
>> >> >>> any OS X system builds, to prevent path obfuscation and truly have
>> >> >>> a separate address space. Is there a preferable environment? In any
>> >> >>> case, I am thinking of containerizing the build so OS X system
>> >> >>> paths don't interfere with include paths - have you received
>> >> >>> feedback from other developers on whether the include paths
>> >> >>> interfere with an OS X system LLVM build?
>> >> >>
>> >> >>
>> >> >> Setup sounds good.
>> >> >>
>> >> >> I have never used OS X but people do and I would expect it to be
>> >> >> OK.
>> >> >>
>> >> >> I don't think you need to worry about this right now.
>> >> >>
>> >> >>
>> >> >>> 2. The Attributor pass refactoring gives some specific direction
>> >> >>> as a starter project - so that's great. Let me study this pass and
>> >> >>> I will get back to you with more questions.
>> >> >>
>> >> >> Sure.
>> >> >>
>> >> >>
>> >> >>> 3. Yes, I will stick to the style guide (Baaaah... Stanford is
>> >> >>> strict on code styling and so are you guys :)) for sure.
>> >> >>
>> >> >> For better or worse.
>> >> >>
>> >> >>
>> >> >> Cheers,
>> >> >>
>> >> >> Johannes
>> >> >>
>> >> >>
>> >> >>
>> >> >>> On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert <
>> >> >>> johannesdoerfert at gmail.com> wrote:
>> >> >>>
>> >> >>>> Hi Shiva,
>> >> >>>>
>> >> >>>> apologies for the delayed response.
>> >> >>>>
>> >> >>>> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
>> >> >>>> > I am a grad CS student at Stanford and wanted to engage with EJ
>> >> >>>> > Park, Giorgis Georgakoudis, Johannes Doerfert to further develop
>> >> >>>> > the Machine Learning and Compiler Optimization concept.
>> >> >>>>
>> >> >>>> Cool!
>> >> >>>>
>> >> >>>>
>> >> >>>> > My background is in machine learning, cluster computing,
>> >> >>>> > distributed systems, etc. I am a good C/C++ developer and have a
>> >> >>>> > strong background in algorithms and data structures.
>> >> >>>>
>> >> >>>> Sounds good.
>> >> >>>>
>> >> >>>>
>> >> >>>> > I am also taking an advanced compiler course this quarter at
>> >> >>>> > Stanford, so I would be studying several of these topics anyway -
>> >> >>>> > so I thought I might as well co-engage on the LLVM compiler
>> >> >>>> > infra project.
>> >> >>>>
>> >> >>>> Agreed ;)
>> >> >>>>
>> >> >>>>
>> >> >>>> > I am currently studying the background information on SCC call
>> >> >>>> > graphs, dominator trees, and other global and inter-procedural
>> >> >>>> > analyses to lay some groundwork on how to tackle this
>> >> >>>> > optimization pass using ML models. I have run a couple of
>> >> >>>> > all-program function passes and visualized call graphs to get
>> >> >>>> > familiar with the LLVM optimization pass setup. I have also set
>> >> >>>> > up and learned to use GDB to debug function pass code.
>> >> >>>>
>> >> >>>> Very nice.
>> >> >>>>
>> >> >>>>
>> >> >>>> > I have submitted the ML and Compiler Optimization proposal to
>> >> >>>> > GSoC 2020. I have added an additional feature to enhance the ML
>> >> >>>> > optimization to include crossover code to the GPU and to
>> >> >>>> > investigate how the function call graphs can be visualized as
>> >> >>>> > SCCs across CPU and GPU implementations. If the extension to the
>> >> >>>> > GPU is too much for a summer project, we can potentially focus
>> >> >>>> > on developing a framework for studying SCCs across a unified
>> >> >>>> > CPU+GPU setup and leave the coding, if feasible, to next summer.
>> >> >>>> > These are all preliminary ideas.
>> >> >>>>
>> >> >>>> I haven't looked at the proposals yet (I think we can only do so
>> >> >>>> after the deadline). TBH, I'm not sure I fully understand your
>> >> >>>> extension. Also, full disclosure, the project is pretty
>> >> >>>> open-ended, from my side at least. I do not necessarily believe we
>> >> >>>> (=LLVM) are ready for an ML-driven pass or even inference in
>> >> >>>> practice. What I want is to explore the use of ML to improve the
>> >> >>>> code we have, especially heuristics. We build analyses and
>> >> >>>> transformations, but it is hard to combine them in a way that
>> >> >>>> balances compile-time, code-size, and performance.
>> >> >>>>
>> >> >>>> Some high-level statements that might help to put my view into
>> >> >>>> perspective:
>> >> >>>>
>> >> >>>> I want to use ML to identify patterns and code features that we
>> >> >>>> can check for using common techniques, such that when we base our
>> >> >>>> decision making on these patterns or features we achieve better
>> >> >>>> compile-time, code-size, and/or performance. I want to use ML to
>> >> >>>> identify shortcomings in our existing heuristics, e.g.,
>> >> >>>> transformation cut-off values or pass schedules. This could also
>> >> >>>> mean identifying alternative (combinations of) values that perform
>> >> >>>> substantially better (on some inputs).
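>> >> >>>>
>> >> >>>> To make the cut-off point concrete, a rough sketch of the kind of
>> >> >>>> knob I mean (the option name and default are made up, not an
>> >> >>>> existing flag):
>> >> >>>>
>> >> >>>>   #include "llvm/Support/CommandLine.h"
>> >> >>>>   using namespace llvm;
>> >> >>>>
>> >> >>>>   // Hypothetical transformation cut-off; today such values are
>> >> >>>>   // fixed by hand, the idea is to let ML suggest better ones.
>> >> >>>>   static cl::opt<unsigned> ExampleUnrollCutoff(
>> >> >>>>       "example-unroll-cutoff", cl::Hidden, cl::init(150),
>> >> >>>>       cl::desc("Max cost before we refuse to unroll (made up)"));
>> >> >>>>
>> >> >>>>   static bool shouldUnroll(unsigned EstimatedCost) {
>> >> >>>>     // The heuristic stays a simple, checkable comparison; only
>> >> >>>>     // the threshold would be informed by a learned model.
>> >> >>>>     return EstimatedCost <= ExampleUnrollCutoff;
>> >> >>>>   }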
>> >> >>>>
>> >> >>>>
>> >> >>>> > Not sure how to proceed from here. Hence my email to this list.
>> >> >>>> > Please let me know.
>> >> >>>>
>> >> >>>> The email to the list was a great first step. The next one
>> >> >>>> usually is to set up an LLVM development and testing environment,
>> >> >>>> that is, LLVM + Clang + the LLVM test suite, that you can use. It
>> >> >>>> is also advised to work on a small task before GSoC to get used to
>> >> >>>> LLVM development.
>> >> >>>>
>> >> >>>> I don't have a really small ML "coding" task handy right now, but
>> >> >>>> the project is more about experiments anyway. To get some LLVM
>> >> >>>> development experience we can just take a small task in the IPO
>> >> >>>> Attributor pass.
>> >> >>>>
>> >> >>>> One thing we need and don't have is data. The Attributor is a
>> >> >>>> fixpoint iteration framework, so the number of iterations is a
>> >> >>>> pretty integral part. We have a statistics counter to determine if
>> >> >>>> the number required was higher than the given threshold, but not
>> >> >>>> one to determine the maximum iteration count required during
>> >> >>>> compilation. It would be great if you could add that, that is, a
>> >> >>>> statistics counter that shows how many iterations were required
>> >> >>>> until a fixpoint was found, across all invocations of the
>> >> >>>> Attributor. Does this make sense? Let me know what you think and
>> >> >>>> feel free to ask questions via email or on IRC.
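>> >> >>>>
>> >> >>>> Something along these lines (only a sketch; the statistic name and
>> >> >>>> the surrounding loop are illustrative, not the actual in-tree
>> >> >>>> Attributor code):
>> >> >>>>
>> >> >>>>   #include "llvm/ADT/Statistic.h"
>> >> >>>>   #define DEBUG_TYPE "attributor"
>> >> >>>>
>> >> >>>>   // Largest number of fixpoint iterations any single Attributor
>> >> >>>>   // invocation needed during this compilation.
>> >> >>>>   STATISTIC(MaxFixpointIterations,
>> >> >>>>             "Maximum #iterations required to reach a fixpoint");
>> >> >>>>
>> >> >>>>   void runFixpoint(/* illustrative stand-in for Attributor::run */) {
>> >> >>>>     unsigned IterationCounter = 0;
>> >> >>>>     bool Changed = true;
>> >> >>>>     while (Changed) {
>> >> >>>>       Changed = false;
>> >> >>>>       // ... update abstract attributes, set Changed on progress ...
>> >> >>>>       ++IterationCounter;
>> >> >>>>     }
>> >> >>>>     // Statistic::updateMax keeps the maximum across invocations.
>> >> >>>>     MaxFixpointIterations.updateMax(IterationCounter);
>> >> >>>>   }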
>> >> >>>>
>> >> >>>> Cheers, Johannes
>> >> >>>>
>> >> >>>> P.S. Check out the coding style guide and the how-to-contribute
>> >> >>>> guide!
>> >> >>>>
>> >> >>>>
>> >> >>>> > Thank you,
>> >> >>>> > Shiva Badruswamy
>> >> >>>> > shivastanford at gmail.com
>> >> >>>> >
>> >> >>>> >
>> >> >>>>
>> >> >>>>
>> >> >>
>> >> >
>> >>
>> >>
>> >
>>
>>