[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
Shiva Stanford via llvm-dev
llvm-dev at lists.llvm.org
Tue Mar 31 07:22:23 PDT 2020
Hi Johannes et al.:
1. I could not see anywhere that GSoC 2020 requires a timeline in the
proposal. If anything, they have laid out their own timeline, which lines
up with my summer break starting in June.
2. Here's their timeline link:
https://developers.google.com/open-source/gsoc/timeline
Best
Shiva
On Tue, Mar 31, 2020 at 3:22 AM Shiva Stanford <shivastanford at gmail.com>
wrote:
> 1. Draft proposals via gdoc. Final via PDF.
> 2. I did not see any timeline requests from GSoC, but spring quarter ends
> around June 6, possibly a week later due to coronavirus schedule delays.
> Summer begins then. I will look into it some more in the morning and see
> what I can add about timelines.
>
> Thanks.
>
> On Mon, Mar 30, 2020 at 11:43 PM Johannes Doerfert <
> johannesdoerfert at gmail.com> wrote:
>
>>
>> On 3/30/20 9:28 PM, Shiva Stanford wrote:
>> > Hi Johannes:
>> >
>> > 1. Attached is the submitted PDF.
>>
>> I thought they make you submit via gdoc and I also thought they wanted a
>> timeline and had other requirements. Please verify this so it's not a
>> problem (I base this on the proposals I've seen this year and not on the
>> information actually provided by GSoC).
>>
>>
>> > 2. I have a notes section where I state: I am still unsure of the GPU
>> > extension I proposed, as I don't know how LLVM plays into the GPU
>> > crossover space the way nvcc (Nvidia's compiler, which integrates gcc
>> > and PTX) does.
>>
>> You can use clang as "host compiler". As mentioned before, there is
>> clang-cuda, and OpenMP offloading also generates PTX for the GPU code.
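>>
>> For illustration, a minimal sketch of the OpenMP offloading side (the
>> toy function is mine; the flags are the usual clang offload flags, as
>> far as I know):
>>
>>   // Compiled e.g. with:
>>   //   clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda saxpy.cpp
>>   // clang emits PTX for the target region and host code for the rest.
>>   void saxpy(int n, float a, float *x, float *y) {
>>   #pragma omp target teams distribute parallel for \
>>       map(to: x[0:n]) map(tofrom: y[0:n])
>>     for (int i = 0; i < n; ++i)
>>       y[i] = a * x[i] + y[i];
>>   }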
>>
>>
>> > I don't know if there is a chance that function graphs in the CPU+GPU
>> > namespaces are seamless/continuous within nvcc, or if nvcc is just a
>> > wrapper that invokes gcc on the CPU sources and PTX on the GPU
>> > sources.
>>
>> Something like that as far as I know.
>>
>>
>> > So what I have said is: if there is time to investigate, we could
>> > look at this. But I am not sure I am even framing the problem
>> > statement correctly at this point.
>>
>> As I said, I'd be very happy for you to also work on GPU-related
>> things; what exactly can be defined over the next weeks.
>>
>> GPU offloading is by nature inter-procedural (take CUDA kernels) so
>> creating the infrastructure to alter the granularity of kernels
>> (when/where to fuse/split them) could be a task. For this it is fairly
>> important (as far as I know now) to predict the register usage
>> accurately. Using learning here might be interesting as well.
>>
>> As you mention in the pdf, one can also split the index space to balance
>> computation. When we implement something like `pragma omp loop` we can
>> also balance computations across multiple GPUs as long as we get the
>> data movement right.
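>>
>> For example, just a sketch from my side (nothing that exists yet; the
>> device numbers and the manual split are arbitrary) of what splitting the
>> index space of a dependence-free loop across two GPUs could look like
>> once the data movement is handled:
>>
>>   // Hypothetical manual split; `pragma omp loop` plus the right
>>   // infrastructure could eventually do this for us.
>>   void vec_add(int n, float *a, float *b, float *c) {
>>     int half = n / 2;
>>     // First half on device 0, asynchronously (nowait).
>>   #pragma omp target teams distribute parallel for device(0) nowait \
>>       map(to: a[0:half], b[0:half]) map(from: c[0:half])
>>     for (int i = 0; i < half; ++i)
>>       c[i] = a[i] + b[i];
>>     // Second half on device 1.
>>   #pragma omp target teams distribute parallel for device(1) \
>>       map(to: a[half:n-half], b[half:n-half]) map(from: c[half:n-half])
>>     for (int i = half; i < n; ++i)
>>       c[i] = a[i] + b[i];
>>     // Wait for the asynchronous target region on device 0.
>>   #pragma omp taskwait
>>   }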
>>
>>
>> > 3. I have added a tentative tasks section and made a note that the
>> > project is open ended and things are quite fluid and may change
>> > significantly.
>>
>> That is good. This is a moving target and an open-ended task; I expect
>> things to become clearer as we go, based on the data we gather.
>>
>> Cheers,
>> Johannes
>>
>>
>> > Cheers Shiva
>> >
>> >
>> > On Mon, Mar 30, 2020 at 6:52 PM Johannes Doerfert <
>> > johannesdoerfert at gmail.com> wrote:
>> >
>> >> On 3/30/20 8:07 PM, Shiva Stanford wrote:
>> >> > 1. Thanks for the clarifications. I will stick to
>> >> > non-containerized OS X for now.
>> >>
>> >> Sounds good. As long as you can build it and run lit and llvm-test
>> >> suite tests :)
>> >>
>> >>
>> >> > 2. As an aside, I did try to build a Debian docker container by git
>> >> > cloning into it and using the Dockerfile in LLVM/utils/docker as a
>> >> > starting point. Some changes were needed to update packages (GCC in
>> >> > particular needs to be recent), and the Debian image (Debian 9
>> >> > instead of Debian 8) pretty much sets up the docker container well.
>> >> > But for some reason, the Ninja build tool within the CMake generator
>> >> > fails. I am looking into it. Maybe I can produce a working docker
>> >> > workflow for others who want to build and work with LLVM in a
>> >> > container environment.
>> >>
>> >> Feel free to propose a fix but I'm the wrong one to talk to ;)
>> >>
>> >>
>> >> > 3. I have submitted the final proposal to GSoC 2020 today after
>> >> > incorporating some comments and thoughts. When you all get a chance
>> >> > to review, let me know your thoughts.
>> >>
>> >> Good. Can you share the google doc with me
>> >> (johannesdoerfert at gmail.com)? [Or did you already and I misplaced
>> >> the link? In that case send it again ;)]
>> >>
>> >>
>> >> > 4. On the GPU extension, my thoughts were around what an integrated
>> >> > compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when
>> >> > GCC is substituted with LLVM, and whether that arrangement can be
>> >> > optimized with ML passes. But I am beginning to think that
>> >> > structuring this problem well and doing meaningful work over the
>> >> > summer might be a bit difficult.
>> >>
>> >> As far as I know, neither GCC nor Clang will behave much differently
>> >> if they are used by nvcc than in their standalone mode.
>> >>
>> >> Having an "ML-mode" is probably a generic thing to look at. Though,
>> >> the "high-level" optimizations are not necessarily performed in
>> >> LLVM-IR.
>> >>
>> >>
>> >> > As mentors, do you have any thoughts on how LLVM might be
>> >> > integrated into a joint CPU-GPU compiler by the likes of Nvidia,
>> >> > Apple etc.?
>> >>
>> >> I'm unsure what exactly you are asking. Clang can be used in CPU-GPU
>> >> compilation via CUDA, OpenCL, OpenMP offload, SYCL, ... is this it?
>> >> I'm personally mostly interested in generic optimizations in this
>> >> space, but quite interested nonetheless. Some ideas:
>> >> - transfer latency hiding (another GSoC project),
>> >> - kernel granularity optimizations (not being worked on yet; it
>> >>   requires some infrastructure changes that are as of now still in
>> >>   the making),
>> >> - data "location" tracking so we can "move" computation to the right
>> >>   device, e.g., for really dependence-free loops like `pragma omp loop`
>> >>
>> >> I can list more things but I'm unsure this is the direction you were
>> >> thinking.
>> >>
>> >> Cheers, Johannes
>> >>
>> >> > Best Shiva
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Mar 30, 2020 at 5:30 PM Johannes Doerfert <
>> >> > johannesdoerfert at gmail.com> wrote:
>> >> >
>> >> >>
>> >> >> On 3/27/20 3:46 PM, Shiva Stanford wrote:
>> >> >>> Hi Johannes - great we are engaging on this.
>> >> >>>
>> >> >>> Some responses now and some later.
>> >> >>>
>> >> >>> 1. When you say set up an LLVM dev environment + clang + tools
>> >> >>> etc., do you mean set up the LLVM compiler code from the repo and
>> >> >>> build it locally? If so, yes, this is all done from my end - that
>> >> >>> is, I have built all this on my machine and compiled and run a
>> >> >>> couple of function passes. I have looked at some LLVM output from
>> >> >>> clang tools but I will familiarize myself more. I have added some
>> >> >>> small code segments, modified CMake lists and re-built the code to
>> >> >>> get a feel for the packaging structure. Btw, is there a version of
>> >> >>> a Bazel build for this? Right now, I am using OS X as the SDK, as
>> >> >>> Apple is the one that has adopted LLVM the most. But I can switch
>> >> >>> to Linux containers to completely wall off the LLVM build against
>> >> >>> any OS X system builds, to prevent path obfuscation and truly have
>> >> >>> a separate address space. Is there a preferable environment? In any
>> >> >>> case, I am thinking of containerizing the build so OS X system
>> >> >>> paths don't interfere with include paths - have you received
>> >> >>> feedback from other developers on whether the include paths
>> >> >>> interfere with an OS X system LLVM build?
>> >> >>
>> >> >>
>> >> >> Setup sounds good.
>> >> >>
>> >> >> I have never used OS X but people do and I would expect it to be
>> >> >> OK.
>> >> >>
>> >> >> I don't think you need to worry about this right now.
>> >> >>
>> >> >>
>> >> >>> 2. The Attributor pass refactoring gives some specific direction
>> >> >>> as a starter project - so that's great. Let me study this pass and
>> >> >>> I will get back to you with more questions.
>> >> >>
>> >> >> Sure.
>> >> >>
>> >> >>
>> >> >>> 3. Yes, I will stick to the style guide (Baaaah... Stanford is
>> >> >>> strict on code styling and so are you guys :)) for sure.
>> >> >>
>> >> >> For better or worse.
>> >> >>
>> >> >>
>> >> >> Cheers,
>> >> >>
>> >> >> Johannes
>> >> >>
>> >> >>
>> >> >>
>> >> >>> On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert <
>> >> >>> johannesdoerfert at gmail.com> wrote:
>> >> >>>
>> >> >>>> Hi Shiva,
>> >> >>>>
>> >> >>>> apologies for the delayed response.
>> >> >>>>
>> >> >>>> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
>> >> >>>> > I am a grad CS student at Stanford and wanted to engage with EJ
>> >> >>>> > Park, Giorgis Georgakoudis, Johannes Doerfert to further develop
>> >> >>>> > the Machine Learning and Compiler Optimization concept.
>> >> >>>>
>> >> >>>> Cool!
>> >> >>>>
>> >> >>>>
>> >> >>>> > My background is in machine learning, cluster computing,
>> >> >>>> > distributed systems, etc. I am a good C/C++ developer and have a
>> >> >>>> > strong background in algorithms and data structures.
>> >> >>>>
>> >> >>>> Sounds good.
>> >> >>>>
>> >> >>>>
>> >> >>>> > I am also taking an advanced compiler course this quarter at
>> >> >>>> > Stanford, so I would be studying several of these topics anyway -
>> >> >>>> > so I thought I might as well co-engage on the LLVM compiler
>> >> >>>> > infra project.
>> >> >>>>
>> >> >>>> Agreed ;)
>> >> >>>>
>> >> >>>>
>> >> >>>> > I am currently studying the background information on SCC call
>> >> >>>> > graphs, dominator trees, and other global and inter-procedural
>> >> >>>> > analyses to lay some groundwork on how to tackle this
>> >> >>>> > optimization pass using ML models. I have run a couple of
>> >> >>>> > all-program function passes and visualized call graphs to get
>> >> >>>> > familiar with the LLVM optimization pass setup. I have also set
>> >> >>>> > up and learned to use GDB to debug function pass code.
>> >> >>>>
>> >> >>>> Very nice.
>> >> >>>>
>> >> >>>>
>> >> >>>> > I have submitted the ML and Compiler Optimization proposal to
>> >> >>>> > GSoC 2020. I have added an additional feature to enhance the ML
>> >> >>>> > optimization to include crossover code to the GPU and to
>> >> >>>> > investigate how the function call graphs can be visualized as
>> >> >>>> > SCCs across CPU and GPU implementations. If the extension to the
>> >> >>>> > GPU is too much for a summer project, we can potentially focus
>> >> >>>> > on developing a framework for studying SCCs across a unified
>> >> >>>> > CPU+GPU setup and leave the coding, if feasible, to next summer.
>> >> >>>> > These are all preliminary ideas.
>> >> >>>>
>> >> >>>> I haven't looked at the proposals yet (I think we can only do so
>> >> >>>> after the deadline). TBH, I'm not sure I fully understand your
>> >> >>>> extension. Also, full disclosure, the project is pretty
>> >> >>>> open-ended, from my side at least. I do not necessarily believe we
>> >> >>>> (=LLVM) are ready for an ML-driven pass or even inference in
>> >> >>>> practice. What I want is to explore the use of ML to improve the
>> >> >>>> code we have, especially heuristics. We build analyses and
>> >> >>>> transformations, but it is hard to combine them in a way that
>> >> >>>> balances compile-time, code-size, and performance.
>> >> >>>>
>> >> >>>> Some high-level statements that might help to put my view into
>> >> >>>> perspective:
>> >> >>>>
>> >> >>>> I want to use ML to identify patterns and code features that we
>> >> >>>> can check for using common techniques, such that when we base our
>> >> >>>> decision making on these patterns or features we achieve better
>> >> >>>> compile-time, code-size, and/or performance. I want to use ML to
>> >> >>>> identify shortcomings in our existing heuristics, e.g.,
>> >> >>>> transformation cut-off values or pass schedules. This could also
>> >> >>>> mean identifying alternative (combinations of) values that perform
>> >> >>>> substantially better (on some inputs).
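>> >> >>>>
>> >> >>>> To make the cut-off point concrete, a rough sketch of the kind of
>> >> >>>> knob I mean (the option name and default are made up, not an
>> >> >>>> existing flag):
>> >> >>>>
>> >> >>>>   #include "llvm/Support/CommandLine.h"
>> >> >>>>   using namespace llvm;
>> >> >>>>
>> >> >>>>   // Hypothetical transformation cut-off; today such values are
>> >> >>>>   // fixed by hand, the idea is to let ML suggest better ones.
>> >> >>>>   static cl::opt<unsigned> ExampleUnrollCutoff(
>> >> >>>>       "example-unroll-cutoff", cl::Hidden, cl::init(150),
>> >> >>>>       cl::desc("Max cost before we refuse to unroll (made up)"));
>> >> >>>>
>> >> >>>>   static bool shouldUnroll(unsigned EstimatedCost) {
>> >> >>>>     // The heuristic stays a simple, checkable comparison; only
>> >> >>>>     // the threshold would be informed by a learned model.
>> >> >>>>     return EstimatedCost <= ExampleUnrollCutoff;
>> >> >>>>   }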
>> >> >>>>
>> >> >>>>
>> >> >>>> > Not sure how to proceed from here. Hence my email to this list.
>> >> >>>> > Please let me know.
>> >> >>>>
>> >> >>>> The email to the list was a great first step. The next one
>> >> >>>> usually is to set up an LLVM development and testing environment,
>> >> >>>> that is, LLVM + Clang + the LLVM test suite, that you can use. It
>> >> >>>> is also advised to work on a small task before GSoC to get used to
>> >> >>>> LLVM development.
>> >> >>>>
>> >> >>>> I don't have a really small ML "coding" task handy right now, but
>> >> >>>> the project is more about experiments anyway. To get some LLVM
>> >> >>>> development experience we can just take a small task in the IPO
>> >> >>>> Attributor pass.
>> >> >>>>
>> >> >>>> One thing we need and don't have is data. The Attributor is a
>> >> >>>> fixpoint iteration framework, so the number of iterations is a
>> >> >>>> pretty integral part. We have a statistics counter to determine if
>> >> >>>> the number required was higher than the given threshold, but not
>> >> >>>> one to determine the maximum iteration count required during
>> >> >>>> compilation. It would be great if you could add that, that is, a
>> >> >>>> statistics counter that shows how many iterations were required
>> >> >>>> until a fixpoint was found, across all invocations of the
>> >> >>>> Attributor. Does this make sense? Let me know what you think and
>> >> >>>> feel free to ask questions via email or on IRC.
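>> >> >>>>
>> >> >>>> Something along these lines (only a sketch; the statistic name and
>> >> >>>> the surrounding loop are illustrative, not the actual in-tree
>> >> >>>> Attributor code):
>> >> >>>>
>> >> >>>>   #include "llvm/ADT/Statistic.h"
>> >> >>>>   #define DEBUG_TYPE "attributor"
>> >> >>>>
>> >> >>>>   // Largest number of fixpoint iterations any single Attributor
>> >> >>>>   // invocation needed during this compilation.
>> >> >>>>   STATISTIC(MaxFixpointIterations,
>> >> >>>>             "Maximum #iterations required to reach a fixpoint");
>> >> >>>>
>> >> >>>>   void runFixpoint(/* illustrative stand-in for Attributor::run */) {
>> >> >>>>     unsigned IterationCounter = 0;
>> >> >>>>     bool Changed = true;
>> >> >>>>     while (Changed) {
>> >> >>>>       Changed = false;
>> >> >>>>       // ... update abstract attributes, set Changed on progress ...
>> >> >>>>       ++IterationCounter;
>> >> >>>>     }
>> >> >>>>     // Statistic::updateMax keeps the maximum across invocations.
>> >> >>>>     MaxFixpointIterations.updateMax(IterationCounter);
>> >> >>>>   }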
>> >> >>>>
>> >> >>>> Cheers, Johannes
>> >> >>>>
>> >> >>>> P.S. Check out the coding style guide and the how-to-contribute
>> >> >>>> guide!
>> >> >>>>
>> >> >>>>
>> >> >>>> > Thank you,
>> >> >>>> > Shiva Badruswamy
>> >> >>>> > shivastanford at gmail.com
>> >> >>>> >
>> >> >>>> >
>> >> >>>>
>> >> >>>>
>> >> >>
>> >> >
>> >>
>> >>
>> >
>>
>>