[llvm-dev] Machine learning and compiler optimizations: using inter-procedural analysis to select optimizations
Johannes Doerfert via llvm-dev
llvm-dev at lists.llvm.org
Mon Mar 30 23:42:23 PDT 2020
On 3/30/20 9:28 PM, Shiva Stanford wrote:
> Hi Johannes:
>
> 1. Attached is the submitted PDF.
I thought they make you submit via gdoc and I also thought they wanted a
timeline and had other requirements. Please verify this so it's not a
problem (I base this on the proposals I've seen this year and not on the
information actually provided by GSoC).
> 2. I have a notes section where I state: I am still unsure of the GPU
> extension I proposed as I dont know how LLVM plays into the GPU cross
over
> space like how nvcc (Nvidia's compiler integrates gcc and PTX) does.
You can use clang as "host compiler". As mentioned before, there is
clang-cuda and OpenMP offloading also generates PTX for the GPU code.
> I dont know if there is a chance that function graphs in the CPU+GPU
> name spaces are seamless/continupus within nvcc or if nvcc is just a
> wrapper that invokes gcc on the cpu sources and ptx on the gpu
> sources.
Something like that as far as I know.
> So what I have said is - if there is time to investigate we could
> look at this. But I am not sure I am even framing the problem
> statement correctly at this point.
As I said, I'd be very happy for you to also work on GPU related things,
what exactly can be defined over the next weeks.
GPU offloading is by nature inter-procedural (take CUDA kernels) so
creating the infrastructure to alter the granularity of kernels
(when/where to fuse/split them) could be a task. For this it is fairly
important (as far as I know now) to predict the register usage
accurately. Using learning here might be interesting as well.
As you mention in the pdf, one can also split the index space to balance
computation. When we implement something like `pragma omp loop` we can
also balance computations across multiple GPUs as long as we get the
data movement right.
> 3. I have added a tentative tasks section and made a note that the
> project is open ended and things are quite fluid and may change
> significantly.
That is good. This is a moving target and open ended task, I expect
things to be determined more clearly as we go and based on the data we
gather.
Cheers,
Johannes
> Cheers Shiva
>
>
> On Mon, Mar 30, 2020 at 6:52 PM Johannes Doerfert <
> johannesdoerfert at gmail.com> wrote:
>
>> On 3/30/20 8:07 PM, Shiva Stanford wrote:
>> > 1. Thanks for the clarifications. I will stick to
>> > non-containerized OS X for now.
>>
>> Sounds good. As long as you can build it and run lit and llvm-test
>> suite tests :)
>>
>>
>> > 2. As an aside, I did try to build a Debian docker container by
>> > git
>> cloning
>> > into it and using the Dockerfile in LLVM/utils/docker as a
>> > starting
>> point:
>> > - some changes needed to updated packages (GCC in particular
>> > needs to
>> be
>> > latest) and the Debian image (Debian 9 instead of Debian 8) pretty
>> > much sets up the docker container well. But for some reason, the
>> > Ninja build tool within the CMake Generator fails. I am looking
>> > into it. Maybe I can produce a working docker workflow for others
>> > who want to build and work with LLVM in a container environment.
>>
>> Feel free to propose a fix but I'm the wrong one to talk to ;)
>>
>>
>> > 3. I have submitted the final proposal today to GSoC 2020 today
>> > after incorporating some comments and thoughts. When you all get a
>> > chance to review, let me know your thoughts.
>>
>> Good. Can you share the google docs with me
>> (johannesdoerfert at gmail.com)? [Or did you and I misplaced the link?
>> In that case send it again ;)]
>>
>>
>> > 4. On GPU extension, my thoughts were around what an integrated
>> > compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when
>> > GCC is
>> substituted
>> > with LLVM and if that arrangement can be optimized for ML passes.
>> > But I am beginning to think that structuring this problem well and
>> > doing meaningful work over the summer might be a bit difficult.
>>
>> As far as I know, neither GCC nor Clang will behave much differently
>> if they are used by nvcc than in their standalone mode.
>>
>> Having an "ML-mode" is probably a generic thing to look at. Though,
>> the "high-level" optimizations are not necessarily performed in
>> LLVM-IR.
>>
>>
>> > As mentors, do you have any thoughts on how LLVM might be
>> > integrated into a joint CPU-GPU compiler by the likes of Nvidia,
>> > Apple etc.?
>>
>> I'm unsure what you ask exactly. Clang can be used in CPU-GPU
>> compilation via Cuda, OpenCL, OpenMP offload, Sycl, ... is this it?
>> I'm personally mostly interested in generic optimizations in this
>> space but actually quite interested. Some ideas: - transfer latency
>> hiding (another GSoC project), - kernel granularity optimizations
>> (not worked being worked on yet but requires some infrastructe
>> changes that are as of now still in the making), - data "location"
>> tracking so we can "move" computation to the right device, e.g., for
>> really dependence free loops like `pragma omp loop`
>>
>> I can list more things but I'm unsure this is the direction you were
>> thinking.
>>
>> Cheers, Johannes
>>
>> > Best Shiva
>> >
>> >
>> >
>> > On Mon, Mar 30, 2020 at 5:30 PM Johannes Doerfert <
>> > johannesdoerfert at gmail.com> wrote:
>> >
>> >>
>> >> On 3/27/20 3:46 PM, Shiva Stanford wrote:
>> >>> Hi Johannes - great we are engaging on this.
>> >>>
>> >>> Some responses now and some later.
>> >>>
>> >>> 1. When you say setup LLVM dev environment +. clang + tools etc,
>> >>> do
>> you
>> >>> mean setup LLVM compiler code from the repo and build it
>> >>> locally?
>> If so,
>> >>> yes, this is all done from my end - that is, I have built all
>> >>> this
>> on my
>> >>> machine and compiled and run a couple of function passes. I have
>> look at
>> >>> some LLVM emits from clang tools but I will familiarize more. I
>> >>> have
>> >> added
>> >>> some small code segments, modified CMAKE Lists and re-built code
>> >>> to
>> get a
>> >>> feel for the packaging structure. Btw, is there a version of
>> >>> Basel
>> build
>> >>> for this? Right now, I am using OS X as the SDK as Apple is the
>> >>> one
>> that
>> >>> has adopted LLVM the most. But I can switch to Linux containers
>> >>> to completely wall off the LLVM build against any OS X system
>> >>> builds to prevent path obfuscation and truly have a separate
>> >>> address space. Is
>> >> there
>> >>> a preferable environment? In any case, I am thinking of
>> >>> containerizing
>> >> the
>> >>> build, so OS X system paths don't interfere with include paths -
>> have you
>> >>> received feedback from other developers on whether the include
>> >>> paths interfere with OS X LLVM system build?
>> >>
>> >>
>> >> Setup sounds good.
>> >>
>> >> I have never used OS X but people do and I would expect it to be
>> >> OK.
>> >>
>> >> I don't think you need to worry about this right now.
>> >>
>> >>
>> >>> 2. The attributor pass refactoring gives some specific direction
>> >>> as a startup project - so that's great. Let me study this pass
>> >>> and I
>> will get
>> >>> back to you with more questions.
>> >>
>> >> Sure.
>> >>
>> >>
>> >>> 3. Yes, I will stick to the style guide (Baaaah...Stanford is
>> >>> strict
>> on
>> >>> code styling and so are you guys :)) for sure.
>> >>
>> >> For better or worse.
>> >>
>> >>
>> >> Cheers,
>> >>
>> >> Johannes
>> >>
>> >>
>> >>
>> >>> On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert <
>> >>> johannesdoerfert at gmail.com> wrote:
>> >>>
>> >>>> Hi Shiva,
>> >>>>
>> >>>> apologies for the delayed response.
>> >>>>
>> >>>> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
>> >>>> > I am a grad CS student at Stanford and wanted to engage
>> >>>> > with EJ
>> >> Park,
>> >>>> > Giorgis Georgakoudis, Johannes Doerfert to further develop
>> >>>> > the
>> >> Machine
>> >>>> > Learning and Compiler Optimization concept.
>> >>>>
>> >>>> Cool!
>> >>>>
>> >>>>
>> >>>> > My background is in machine learning, cluster computing,
>> distributed
>> >>>> > systems etc. I am a good C/C++ developer and have a strong
>> >> background in
>> >>>> > algorithms and data structure.
>> >>>>
>> >>>> Sounds good.
>> >>>>
>> >>>>
>> >>>> > I am also taking an advanced compiler course this quarter
>> >>>> > at
>> >>>> Stanford. So I
>> >>>> > would be studying several of these topics anyways - so I
>> >>>> > thought
>> I
>> >>>> might as
>> >>>> > well co-engage on the LLVM compiler infra project.
>> >>>>
>> >>>> Agreed ;)
>> >>>>
>> >>>>
>> >>>> > I am currently studying the background information on SCC
>> >>>> > Call
>> >> Graphs,
>> >>>> > Dominator Trees and other Global and inter-procedural
>> >>>> > analysis to
>> >> lay
>> >>>> some
>> >>>> > ground work on how to tackle this optimization pass using
>> >>>> > ML
>> models.
>> >>>> I have
>> >>>> > run a couple of all program function passes and visualized
>> >>>> > call
>> >> graphs
>> >>>> to
>> >>>> > get familiarized with the LLVM optimization pass setup. I
>> >>>> > have
>> also
>> >>>> setup
>> >>>> > and learnt the use of GDB to debug function pass code.
>> >>>>
>> >>>> Very nice.
>> >>>>
>> >>>>
>> >>>> > I have submitted the ML and Compiler Optimization proposal
>> >>>> > to
>> GSOC
>> >>>> 2020. I
>> >>>> > have added an additional feature to enhance the ML
>> >>>> > optimization
>> to
>> >>>> include
>> >>>> > crossover code to GPU and investigate how the function call
>> graphs
>> >> can
>> >>>> be
>> >>>> > visualized as SCCs across CPU and GPU implementations. If
>> >>>> > the
>> >>>> extension to
>> >>>> > GPU is too much for a summer project, potentially we can
>> >>>> > focus on developing a framework for studying SCCs across a
>> >>>> > unified CPU,
>> GPU
>> >> setup
>> >>>> > and leave the coding, if feasible, to next Summer. All
>> preliminary
>> >>>> ideas.
>> >>>>
>> >>>> I haven't looked at the proposals yet (I think we can only
>> >>>> after the deadline). TBH, I'm not sure I fully understand your
>> >>>> extension. Also, full disclosure, the project is pretty
>> >>>> open-ended from my side at
>> least.
>> >>>> I do not necessarily believe we (=llvm) is ready for a ML
>> >>>> driven
>> pass or
>> >>>> even inference in practice. What I want is to explore the use
>> >>>> of ML
>> to
>> >>>> improve the code we have, especially heuristics. We build
>> >>>> analysis
>> and
>> >>>> transformations but it is hard to combine them in a way that
>> >>>> balances compile-time, code-size, and performance.
>> >>>>
>> >>>> Some high-level statements that might help to put my view into
>> >>>> perspective:
>> >>>>
>> >>>> I want to use ML to identify patterns and code features that we
>> >>>> can check for using common techniques but when we base our
>> >>>> decision
>> making
>> >>>> on these patterns or features we achieve better compile-time,
>> code-size,
>> >>>> and/or performance. I want to use ML to identify shortcomings
>> >>>> in our existing heuristics, e.g. transformation cut-off values
>> >>>> or pass schedules. This could also mean to identify alternative
>> >>>> (combination of) values that perform substantially better (on
>> >>>> some inputs).
>> >>>>
>> >>>>
>> >>>> > Not sure how to proceed from here. Hence my email to this
>> >>>> > list.
>> >>>> Please let
>> >>>> > me know.
>> >>>>
>> >>>> The email to the list was a great first step. The next one
>> >>>> usually
>> is to
>> >>>> setup an LLVM development and testing environment, thus LLVM +
>> >>>> Clang
>> +
>> >>>> LLVM-Test Suite that you can use. It is also advised to work on
>> >>>> a
>> small
>> >>>> task before the GSoC to get used to the LLVM development.
>> >>>>
>> >>>> I don't have a really small ML "coding" task handy right now
>> >>>> but the project is more about experiments anyway. To get some
>> >>>> LLVM
>> development
>> >>>> experience we can just take a small task in the IPO Attributor
>> >>>> pass.
>> >>>>
>> >>>> One thing we need and we don't have is data. The Attributor is
>> >>>> a fixpoint iteration framework so the number of iterations is
>> >>>> pretty integral part. We have a statistics counter to determine
>> >>>> if the
>> number
>> >>>> required was higher than the given threshold but not one to
>> >>>> determine the maximum iteration count required during
>> >>>> compilation. It would be great if you could add that, thus a
>> >>>> statistics counter that shows how many iterations where
>> >>>> required until a fixpoint was found across all invocations of
>> >>>> the Attributor. Does this make sense? Let me know what you
>> >>>> think and feel free to ask questions via email or on IRC.
>> >>>>
>> >>>> Cheers, Johannes
>> >>>>
>> >>>> P.S. Check out the coding style guide and the how to contribute
>> guide!
>> >>>>
>> >>>>
>> >>>> > Thank you Shiva Badruswamy shivastanford at gmail.com
>> >>>> >
>> >>>> >
>> >>>> > _______________________________________________ LLVM
>> >>>> > Developers mailing list llvm-dev at lists.llvm.org
>> >>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >>>>
>> >>>>
>> >>
>> >
>>
>>
>
More information about the llvm-dev
mailing list