[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
    Shiva Stanford via llvm-dev 
    llvm-dev at lists.llvm.org
       
    Tue Mar 31 03:22:54 PDT 2020
    
    
  
1. Draft proposals via gdoc. Final via PDF.
2. I did not see any request for a timeline from GSoC, but spring quarter
ends around June 6, or perhaps a week later due to coronavirus schedule
delays. Summer begins then. I will look into it some more in the morning
and see what I can add to timelines.
Thanks.
On Mon, Mar 30, 2020 at 11:43 PM Johannes Doerfert <
johannesdoerfert at gmail.com> wrote:
>
> On 3/30/20 9:28 PM, Shiva Stanford wrote:
>  > Hi Johannes:
>  >
>  > 1. Attached is the submitted PDF.
>
> I thought they make you submit via gdoc and I also thought they wanted a
> timeline and had other requirements. Please verify this so it's not a
> problem (I base this on the proposals I've seen this year and not on the
> information actually provided by GSoC).
>
>
>  > 2. I have a notes section where I state: I am still unsure of the GPU
>  > extension I proposed, as I don't know how LLVM plays into the GPU
>  > crossover space the way nvcc (Nvidia's compiler, which integrates gcc
>  > and PTX) does.
>
> You can use clang as "host compiler". As mentioned before, there is
> clang-cuda and OpenMP offloading also generates PTX for the GPU code.
>
>
>  > I don't know if there is a chance that function graphs in the CPU+GPU
>  > namespaces are seamless/continuous within nvcc or if nvcc is just a
>  > wrapper that invokes gcc on the CPU sources and PTX on the GPU
>  > sources.
>
> Something like that as far as I know.
>
>
>  > So what I have said is  - if there is time to investigate we could
>  > look at this. But I am not sure I am even framing the problem
>  > statement correctly at this point.
>
> As I said, I'd be very happy for you to also work on GPU-related things;
> what exactly can be defined over the next weeks.
>
> GPU offloading is by nature inter-procedural (take CUDA kernels) so
> creating the infrastructure to alter the granularity of kernels
> (when/where to fuse/split them) could be a task. For this it is fairly
> important (as far as I know now) to predict the register usage
> accurately. Using learning here might be interesting as well.
>
> As you mention in the pdf, one can also split the index space to balance
> computation. When we implement something like `pragma omp loop` we can
> also balance computations across multiple GPUs as long as we get the
> data movement right.
>
>
>  > 3. I have added a tentative tasks section and made a note that the
>  > project is open ended and things are quite fluid and may change
>  > significantly.
>
> That is good. This is a moving target and an open-ended task; I expect
> things to be determined more clearly as we go and based on the data we
> gather.
>
> Cheers,
>    Johannes
>
>
>  > Cheers Shiva
>  >
>  >
>  > On Mon, Mar 30, 2020 at 6:52 PM Johannes Doerfert <
>  > johannesdoerfert at gmail.com> wrote:
>  >
>  >> On 3/30/20 8:07 PM, Shiva Stanford wrote:
>  >>  > 1. Thanks for the clarifications. I will stick to
>  >>  > non-containerized OS X for now.
>  >>
>  >> Sounds good. As long as you can build it and run lit and llvm-test
>  >> suite tests :)
>  >>
>  >>
>  >>  > 2. As an aside, I did try to build a Debian docker container by
>  >>  > git cloning into it and using the Dockerfile in LLVM/utils/docker
>  >>  > as a starting point:
>  >>  >  - some changes are needed to update packages (GCC in particular
>  >>  >  needs to be the latest) and the Debian image (Debian 9 instead of
>  >>  >  Debian 8); this pretty much sets up the docker container well. But
>  >>  >  for some reason, the Ninja build tool within the CMake generator
>  >>  >  fails. I am looking into it. Maybe I can produce a working docker
>  >>  >  workflow for others who want to build and work with LLVM in a
>  >>  >  container environment.
>  >>
>  >> Feel free to propose a fix but I'm the wrong one to talk to ;)
>  >>
>  >>
>  >>  > 3. I have submitted the final proposal to GSoC 2020 today after
>  >>  > incorporating some comments and thoughts. When you all get a
>  >>  > chance to review, let me know your thoughts.
>  >>
>  >> Good. Can you share the google docs with me
>  >> (johannesdoerfert at gmail.com)? [Or did you and I misplaced the link?
>  >> In that case send it again ;)]
>  >>
>  >>
>  >>  > 4. On GPU extension, my thoughts were around what an integrated
>  >>  > compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when
>  >>  > GCC is substituted with LLVM and if that arrangement can be
>  >>  > optimized for ML passes. But I am beginning to think that
>  >>  > structuring this problem well and doing meaningful work over the
>  >>  > summer might be a bit difficult.
>  >>
>  >> As far as I know, neither GCC nor Clang will behave much differently
>  >> if they are used by nvcc than in their standalone mode.
>  >>
>  >> Having an "ML-mode" is probably a generic thing to look at. Though,
>  >> the "high-level" optimizations are not necessarily performed in
>  >> LLVM-IR.
>  >>
>  >>
>  >>  > As mentors, do you have any thoughts on how LLVM might be
>  >>  > integrated into a joint CPU-GPU compiler by the likes of Nvidia,
>  >>  > Apple etc.?
>  >>
>  >> I'm unsure what exactly you are asking. Clang can be used in CPU-GPU
>  >> compilation via CUDA, OpenCL, OpenMP offload, SYCL, ... is this it?
>  >> I'm personally mostly interested in generic optimizations in this
>  >> space but actually quite interested. Some ideas:
>  >> - transfer latency hiding (another GSoC project),
>  >> - kernel granularity optimizations (not being worked on yet, and
>  >>   requires some infrastructure changes that are as of now still in
>  >>   the making),
>  >> - data "location" tracking so we can "move" computation to the right
>  >>   device, e.g., for really dependence-free loops like
>  >>   `pragma omp loop`
>  >>
>  >> I can list more things but I'm unsure this is the direction you were
>  >> thinking.
>  >>
>  >> Cheers, Johannes
>  >>
>  >>  > Best Shiva
>  >>  >
>  >>  >
>  >>  >
>  >>  > On Mon, Mar 30, 2020 at 5:30 PM Johannes Doerfert <
>  >>  > johannesdoerfert at gmail.com> wrote:
>  >>  >
>  >>  >>
>  >>  >> On 3/27/20 3:46 PM, Shiva Stanford wrote:
>  >>  >>> Hi Johannes - great we are engaging on this.
>  >>  >>>
>  >>  >>> Some responses now and some later.
>  >>  >>>
>  >>  >>> 1. When you say set up the LLVM dev environment + clang + tools
>  >>  >>> etc., do you mean set up the LLVM compiler code from the repo and
>  >>  >>> build it locally? If so, yes, this is all done on my end - that
>  >>  >>> is, I have built all this on my machine and compiled and run a
>  >>  >>> couple of function passes. I have looked at some LLVM emits from
>  >>  >>> clang tools but I will familiarize myself more. I have added some
>  >>  >>> small code segments, modified CMakeLists and re-built the code to
>  >>  >>> get a feel for the packaging structure. Btw, is there a version
>  >>  >>> of a Bazel build for this? Right now, I am using OS X as the SDK
>  >>  >>> as Apple is the one that has adopted LLVM the most. But I can
>  >>  >>> switch to Linux containers to completely wall off the LLVM build
>  >>  >>> against any OS X system builds to prevent path obfuscation and
>  >>  >>> truly have a separate address space. Is there a preferable
>  >>  >>> environment? In any case, I am thinking of containerizing the
>  >>  >>> build, so OS X system paths don't interfere with include paths -
>  >>  >>> have you received feedback from other developers on whether the
>  >>  >>> include paths interfere with the OS X LLVM system build?
>  >>  >>
>  >>  >>
>  >>  >> Setup sounds good.
>  >>  >>
>  >>  >> I have never used OS X but people do and I would expect it to be
>  >>  >> OK.
>  >>  >>
>  >>  >> I don't think you need to worry about this right now.
>  >>  >>
>  >>  >>
>  >>  >>> 2. The Attributor pass refactoring gives some specific direction
>  >>  >>> as a startup project - so that's great. Let me study this pass
>  >>  >>> and I will get back to you with more questions.
>  >>  >>
>  >>  >> Sure.
>  >>  >>
>  >>  >>
>  >>  >>> 3. Yes, I will stick to the style guide (Baaaah...Stanford is
>  >>  >>> strict on code styling and so are you guys :)) for sure.
>  >>  >>
>  >>  >> For better or worse.
>  >>  >>
>  >>  >>
>  >>  >> Cheers,
>  >>  >>
>  >>  >>    Johannes
>  >>  >>
>  >>  >>
>  >>  >>
>  >>  >>> On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert <
>  >>  >>> johannesdoerfert at gmail.com> wrote:
>  >>  >>>
>  >>  >>>> Hi Shiva,
>  >>  >>>>
>  >>  >>>> apologies for the delayed response.
>  >>  >>>>
>  >>  >>>> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
>  >>  >>>>   > I am a grad CS student at Stanford and wanted to engage
>  >>  >>>>   > with EJ Park, Giorgis Georgakoudis, Johannes Doerfert to
>  >>  >>>>   > further develop the Machine Learning and Compiler
>  >>  >>>>   > Optimization concept.
>  >>  >>>>
>  >>  >>>> Cool!
>  >>  >>>>
>  >>  >>>>
>  >>  >>>>   > My background is in machine learning, cluster computing,
>  >>  >>>>   > distributed systems etc. I am a good C/C++ developer and
>  >>  >>>>   > have a strong background in algorithms and data structures.
>  >>  >>>>
>  >>  >>>> Sounds good.
>  >>  >>>>
>  >>  >>>>
>  >>  >>>>   > I am also taking an advanced compiler course this quarter
>  >>  >>>>   > at Stanford. So I would be studying several of these topics
>  >>  >>>>   > anyway - so I thought I might as well co-engage on the LLVM
>  >>  >>>>   > compiler infra project.
>  >>  >>>>
>  >>  >>>> Agreed ;)
>  >>  >>>>
>  >>  >>>>
>  >>  >>>>   > I am currently studying the background information on SCC
>  >>  >>>>   > Call Graphs, Dominator Trees and other global and
>  >>  >>>>   > inter-procedural analysis to lay some groundwork on how to
>  >>  >>>>   > tackle this optimization pass using ML models. I have run a
>  >>  >>>>   > couple of all program function passes and visualized call
>  >>  >>>>   > graphs to get familiarized with the LLVM optimization pass
>  >>  >>>>   > setup. I have also set up and learnt the use of GDB to
>  >>  >>>>   > debug function pass code.
>  >>  >>>>
>  >>  >>>> Very nice.
>  >>  >>>>
>  >>  >>>>
>  >>  >>>>   > I have submitted the ML and Compiler Optimization proposal
>  >>  >>>>   > to GSoC 2020. I have added an additional feature to enhance
>  >>  >>>>   > the ML optimization to include crossover code to GPU and
>  >>  >>>>   > investigate how the function call graphs can be visualized
>  >>  >>>>   > as SCCs across CPU and GPU implementations. If the
>  >>  >>>>   > extension to GPU is too much for a summer project,
>  >>  >>>>   > potentially we can focus on developing a framework for
>  >>  >>>>   > studying SCCs across a unified CPU, GPU setup and leave the
>  >>  >>>>   > coding, if feasible, to next summer. All preliminary ideas.
>  >>  >>>>
>  >>  >>>> I haven't looked at the proposals yet (I think we can only
>  >>  >>>> after the deadline). TBH, I'm not sure I fully understand your
>  >>  >>>> extension. Also, full disclosure, the project is pretty
>  >>  >>>> open-ended from my side at least. I do not necessarily believe
>  >>  >>>> we (=llvm) are ready for an ML-driven pass or even inference in
>  >>  >>>> practice. What I want is to explore the use of ML to improve
>  >>  >>>> the code we have, especially heuristics. We build analyses and
>  >>  >>>> transformations but it is hard to combine them in a way that
>  >>  >>>> balances compile-time, code-size, and performance.
>  >>  >>>>
>  >>  >>>> Some high-level statements that might help to put my view into
>  >>  >>>> perspective:
>  >>  >>>>
>  >>  >>>> I want to use ML to identify patterns and code features that we
>  >>  >>>> can check for using common techniques but when we base our
>  >>  >>>> decision making on these patterns or features we achieve better
>  >>  >>>> compile-time, code-size, and/or performance.
>  >>  >>>>
>  >>  >>>> I want to use ML to identify shortcomings in our existing
>  >>  >>>> heuristics, e.g. transformation cut-off values or pass
>  >>  >>>> schedules. This could also mean to identify alternative
>  >>  >>>> (combination of) values that perform substantially better (on
>  >>  >>>> some inputs).
>  >>  >>>>
>  >>  >>>>
>  >>  >>>>   > Not sure how to proceed from here. Hence my email to this
>  >>  >>>>   > list. Please let me know.
>  >>  >>>>
>  >>  >>>> The email to the list was a great first step. The next one
>  >>  >>>> usually is to set up an LLVM development and testing
>  >>  >>>> environment, thus LLVM + Clang + LLVM-Test Suite that you can
>  >>  >>>> use. It is also advised to work on a small task before the GSoC
>  >>  >>>> to get used to the LLVM development.
>  >>  >>>>
>  >>  >>>> I don't have a really small ML "coding" task handy right now
>  >>  >>>> but the project is more about experiments anyway. To get some
>  >>  >>>> LLVM development experience we can just take a small task in
>  >>  >>>> the IPO Attributor pass.
>  >>  >>>>
>  >>  >>>> One thing we need and we don't have is data. The Attributor is
>  >>  >>>> a fixpoint iteration framework so the number of iterations is a
>  >>  >>>> pretty integral part. We have a statistics counter to determine
>  >>  >>>> if the number required was higher than the given threshold but
>  >>  >>>> not one to determine the maximum iteration count required
>  >>  >>>> during compilation. It would be great if you could add that,
>  >>  >>>> thus a statistics counter that shows how many iterations were
>  >>  >>>> required until a fixpoint was found across all invocations of
>  >>  >>>> the Attributor. Does this make sense? Let me know what you
>  >>  >>>> think and feel free to ask questions via email or on IRC.
>  >>  >>>>
>  >>  >>>> Cheers, Johannes
>  >>  >>>>
>  >>  >>>> P.S. Check out the coding style guide and the how-to-contribute
>  >>  >>>> guide!
>  >>  >>>>
>  >>  >>>>
>  >>  >>>>   > Thank you Shiva Badruswamy shivastanford at gmail.com
>  >>  >>>>   >
>  >>  >>>>   >
>  >>  >>>>   > _______________________________________________ LLVM
>  >>  >>>>   > Developers mailing list llvm-dev at lists.llvm.org
>  >>  >>>>   > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>  >>  >>>>
>  >>  >>>>
>  >>  >>
>  >>  >
>  >>
>  >>
>  >
>
>
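
For reference, the statistics counter requested in the quoted thread (the
maximum number of fixpoint iterations needed across all Attributor
invocations) could be sketched roughly as below. This is a standalone toy,
not Attributor code: `runToFixpoint`, `MaxFixpointIterations`, and
`NumRunsAboveThreshold` are hypothetical names, and plain globals stand in
for LLVM's statistics machinery (in tree one would presumably declare the
counter with the `STATISTIC` macro from llvm/ADT/Statistic.h instead).

```cpp
#include <algorithm>
#include <functional>

// Hypothetical stand-ins for statistics counters (assumption: real code
// would use LLVM's STATISTIC macro rather than plain globals).
static unsigned MaxFixpointIterations = 0; // max over all invocations
static unsigned NumRunsAboveThreshold = 0; // runs cut off by the limit

// Runs Step until it reports "no change" or MaxIterations is reached.
// Step returns true if the abstract state changed in that iteration.
// Returns the number of iterations performed by this invocation.
unsigned runToFixpoint(const std::function<bool()> &Step,
                       unsigned MaxIterations) {
  unsigned Iterations = 0;
  bool Changed = true;
  while (Changed && Iterations < MaxIterations) {
    Changed = Step();
    ++Iterations;
  }

  // Existing kind of counter: did this run hit the iteration limit
  // while the state was still changing?
  if (Changed)
    ++NumRunsAboveThreshold;

  // Requested counter: remember the largest iteration count seen so far.
  MaxFixpointIterations = std::max(MaxFixpointIterations, Iterations);
  return Iterations;
}
```

Calling `runToFixpoint` once per simulated invocation and reading
`MaxFixpointIterations` afterwards gives the worst-case iteration count over
the whole run, which is the data point the above-threshold counter alone
cannot provide.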