<div><div dir="auto">1. Draft proposals go via gdoc; the final version via PDF. </div><div dir="auto">2. I did not see any timeline requests from GSoC, but spring quarter ends around June 6, or perhaps a week later due to coronavirus schedule delays. Summer begins then. I will look into it some more in the morning and see what I can add to the timeline.</div><div dir="auto"><br></div><div dir="auto">Thanks. </div></div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 30, 2020 at 11:43 PM Johannes Doerfert <<a href="mailto:johannesdoerfert@gmail.com">johannesdoerfert@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
On 3/30/20 9:28 PM, Shiva Stanford wrote:<br>
> Hi Johannes:<br>
><br>
> 1. Attached is the submitted PDF.<br>
<br>
I thought they make you submit via gdoc and I also thought they wanted a<br>
timeline and had other requirements. Please verify this so it's not a<br>
problem (I base this on the proposals I've seen this year and not on the<br>
information actually provided by GSoC).<br>
<br>
<br>
> 2. I have a notes section where I state: I am still unsure of the GPU<br>
> extension I proposed, as I don't know how LLVM plays into the GPU<br>
> crossover space the way nvcc (Nvidia's compiler, which integrates gcc<br>
> and PTX) does.<br>
<br>
You can use clang as "host compiler". As mentioned before, there is<br>
clang-cuda and OpenMP offloading also generates PTX for the GPU code.<br>
<br>
<br>
> I dont know if there is a chance that function graphs in the CPU+GPU<br>
> namespaces are seamless/continuous within nvcc, or if nvcc is just a<br>
> wrapper that invokes gcc on the CPU sources and PTX on the GPU<br>
> sources.<br>
<br>
Something like that as far as I know.<br>
<br>
<br>
> So what I have said is - if there is time to investigate we could<br>
> look at this. But I am not sure I am even framing the problem<br>
> statement correctly at this point.<br>
<br>
As I said, I'd be very happy for you to also work on GPU-related things;<br>
what exactly that entails can be defined over the next few weeks.<br>
<br>
GPU offloading is by nature inter-procedural (take CUDA kernels) so<br>
creating the infrastructure to alter the granularity of kernels<br>
(when/where to fuse/split them) could be a task. For this it is fairly<br>
important (as far as I know now) to predict the register usage<br>
accurately. Using learning here might be interesting as well.<br>
<br>
As you mention in the pdf, one can also split the index space to balance<br>
computation. When we implement something like `pragma omp loop`, we can<br>
also balance computations across multiple GPUs, as long as we get the<br>
data movement right.<br>
<br>
<br>
> 3. I have added a tentative tasks section and made a note that the<br>
> project is open ended and things are quite fluid and may change<br>
> significantly.<br>
<br>
That is good. This is a moving target and an open-ended task; I expect<br>
things to be determined more clearly as we go, based on the data we<br>
gather.<br>
<br>
Cheers,<br>
Johannes<br>
<br>
<br>
> Cheers Shiva<br>
><br>
><br>
> On Mon, Mar 30, 2020 at 6:52 PM Johannes Doerfert <<br>
> <a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>> wrote:<br>
><br>
>> On 3/30/20 8:07 PM, Shiva Stanford wrote:<br>
>> > 1. Thanks for the clarifications. I will stick to<br>
>> > non-containerized OS X for now.<br>
>><br>
>> Sounds good. As long as you can build it and run lit and llvm-test<br>
>> suite tests :)<br>
>><br>
>><br>
>> > 2. As an aside, I did try to build a Debian docker container by git<br>
>> > cloning into it and using the Dockerfile in LLVM/utils/docker as a<br>
>> > starting point. Some changes are needed to update the packages (GCC<br>
>> > in particular needs to be the latest) and the Debian image (Debian 9<br>
>> > instead of Debian 8); that pretty much sets up the docker container<br>
>> > well. But for some reason, the Ninja build tool within the CMake<br>
>> > generator fails. I am looking into it. Maybe I can produce a working<br>
>> > docker workflow for others who want to build and work with LLVM in a<br>
>> > container environment.<br>
>><br>
>> Feel free to propose a fix but I'm the wrong one to talk to ;)<br>
>><br>
>><br>
>> > 3. I have submitted the final proposal to GSoC 2020 today after<br>
>> > incorporating some comments and thoughts. When you all get a chance<br>
>> > to review it, let me know your thoughts.<br>
>><br>
>> Good. Can you share the google doc with me<br>
>> (<a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>)? [Or did you already, and I misplaced<br>
>> the link? In that case, send it again ;)]<br>
>><br>
>><br>
>> > 4. On the GPU extension, my thoughts were around what an integrated<br>
>> > compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when<br>
>> > GCC is substituted with LLVM, and whether that arrangement can be<br>
>> > optimized for ML passes. But I am beginning to think that structuring<br>
>> > this problem well and doing meaningful work over the summer might be<br>
>> > a bit difficult.<br>
>><br>
>> As far as I know, neither GCC nor Clang will behave much differently<br>
>> if they are used by nvcc than in their standalone mode.<br>
>><br>
>> Having an "ML-mode" is probably a generic thing to look at, though the<br>
>> "high-level" optimizations are not necessarily performed at the<br>
>> LLVM-IR level.<br>
>><br>
>><br>
>> > As mentors, do you have any thoughts on how LLVM might be<br>
>> > integrated into a joint CPU-GPU compiler by the likes of Nvidia,<br>
>> > Apple etc.?<br>
>><br>
>> I'm unsure what exactly you are asking. Clang can be used in CPU-GPU<br>
>> compilation via CUDA, OpenCL, OpenMP offload, SYCL, ... is this it?<br>
>> I'm personally mostly interested in generic optimizations in this<br>
>> space. Some ideas:<br>
>> - transfer latency hiding (another GSoC project),<br>
>> - kernel granularity optimizations (not being worked on yet; this<br>
>>   requires some infrastructure changes that are as of now still in<br>
>>   the making),<br>
>> - data "location" tracking, so we can "move" computation to the right<br>
>>   device, e.g., for really dependence-free loops like `pragma omp loop`<br>
>><br>
>> I can list more things, but I'm unsure this is the direction you were<br>
>> thinking of.<br>
>><br>
>> Cheers, Johannes<br>
>><br>
>> > Best Shiva<br>
>> ><br>
>> ><br>
>> ><br>
>> > On Mon, Mar 30, 2020 at 5:30 PM Johannes Doerfert <<br>
>> > <a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>> wrote:<br>
>> ><br>
>> >><br>
>> >> On 3/27/20 3:46 PM, Shiva Stanford wrote:<br>
>> >>> Hi Johannes - great we are engaging on this.<br>
>> >>><br>
>> >>> Some responses now and some later.<br>
>> >>><br>
>> >>> 1. When you say set up an LLVM dev environment + clang + tools<br>
>> >>> etc., do you mean set up the LLVM compiler code from the repo and<br>
>> >>> build it locally? If so, yes, this is all done from my end; that<br>
>> >>> is, I have built all this on my machine and compiled and run a<br>
>> >>> couple of function passes. I have looked at some LLVM emits from<br>
>> >>> the clang tools, but I will familiarize myself more. I have added<br>
>> >>> some small code segments, modified CMakeLists, and re-built the<br>
>> >>> code to get a feel for the packaging structure. Btw, is there a<br>
>> >>> version of a Bazel build for this? Right now, I am using OS X as<br>
>> >>> the SDK, as Apple is the one that has adopted LLVM the most. But I<br>
>> >>> can switch to Linux containers to completely wall off the LLVM<br>
>> >>> build against any OS X system builds, to prevent path obfuscation<br>
>> >>> and truly have a separate address space. Is there a preferable<br>
>> >>> environment? In any case, I am thinking of containerizing the<br>
>> >>> build so OS X system paths don't interfere with include paths.<br>
>> >>> Have you received feedback from other developers on whether the<br>
>> >>> include paths interfere with the OS X LLVM system build?<br>
>> >><br>
>> >><br>
>> >> Setup sounds good.<br>
>> >><br>
>> >> I have never used OS X but people do and I would expect it to be<br>
>> >> OK.<br>
>> >><br>
>> >> I don't think you need to worry about this right now.<br>
>> >><br>
>> >><br>
>> >>> 2. The Attributor pass refactoring gives some specific direction<br>
>> >>> as a startup project, so that's great. Let me study this pass and<br>
>> >>> I will get back to you with more questions.<br>
>> >><br>
>> >> Sure.<br>
>> >><br>
>> >><br>
>> >>> 3. Yes, I will stick to the style guide (Baaaah...Stanford is<br>
>> >>> strict on code styling and so are you guys :)) for sure.<br>
>> >><br>
>> >> For better or worse.<br>
>> >><br>
>> >><br>
>> >> Cheers,<br>
>> >><br>
>> >> Johannes<br>
>> >><br>
>> >><br>
>> >><br>
>> >>> On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert <<br>
>> >>> <a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>> wrote:<br>
>> >>><br>
>> >>>> Hi Shiva,<br>
>> >>>><br>
>> >>>> apologies for the delayed response.<br>
>> >>>><br>
>> >>>> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:<br>
>> >>>> > I am a grad CS student at Stanford and wanted to engage with<br>
>> >>>> > EJ Park, Giorgis Georgakoudis, and Johannes Doerfert to further<br>
>> >>>> > develop the Machine Learning and Compiler Optimization concept.<br>
>> >>>><br>
>> >>>> Cool!<br>
>> >>>><br>
>> >>>><br>
>> >>>> > My background is in machine learning, cluster computing,<br>
>> >>>> > distributed systems, etc. I am a good C/C++ developer and have<br>
>> >>>> > a strong background in algorithms and data structures.<br>
>> >>>><br>
>> >>>> Sounds good.<br>
>> >>>><br>
>> >>>><br>
>> >>>> > I am also taking an advanced compiler course this quarter at<br>
>> >>>> > Stanford, so I would be studying several of these topics<br>
>> >>>> > anyway; I thought I might as well co-engage on the LLVM<br>
>> >>>> > compiler infra project.<br>
>> >>>><br>
>> >>>> Agreed ;)<br>
>> >>>><br>
>> >>>><br>
>> >>>> > I am currently studying the background information on SCC call<br>
>> >>>> > graphs, dominator trees, and other global and inter-procedural<br>
>> >>>> > analyses to lay some groundwork for how to tackle this<br>
>> >>>> > optimization pass using ML models. I have run a couple of<br>
>> >>>> > all-program function passes and visualized call graphs to get<br>
>> >>>> > familiarized with the LLVM optimization pass setup. I have also<br>
>> >>>> > set up and learnt the use of GDB to debug function pass code.<br>
>> >>>><br>
>> >>>> Very nice.<br>
>> >>>><br>
>> >>>><br>
>> >>>> > I have submitted the ML and Compiler Optimization proposal to<br>
>> >>>> > GSoC 2020. I have added an additional feature to enhance the ML<br>
>> >>>> > optimization to include crossover code to the GPU and<br>
>> >>>> > investigate how the function call graphs can be visualized as<br>
>> >>>> > SCCs across CPU and GPU implementations. If the extension to<br>
>> >>>> > the GPU is too much for a summer project, we can potentially<br>
>> >>>> > focus on developing a framework for studying SCCs across a<br>
>> >>>> > unified CPU-GPU setup and leave the coding, if feasible, to<br>
>> >>>> > next summer. All preliminary ideas.<br>
>> >>>><br>
>> >>>> I haven't looked at the proposals yet (I think we can only do so<br>
>> >>>> after the deadline). TBH, I'm not sure I fully understand your<br>
>> >>>> extension. Also, full disclosure, the project is pretty<br>
>> >>>> open-ended, from my side at least. I do not necessarily believe<br>
>> >>>> we (=LLVM) are ready for an ML-driven pass or even inference in<br>
>> >>>> practice. What I want is to explore the use of ML to improve the<br>
>> >>>> code we have, especially heuristics. We build analyses and<br>
>> >>>> transformations, but it is hard to combine them in a way that<br>
>> >>>> balances compile-time, code-size, and performance.<br>
>> >>>><br>
>> >>>> Some high-level statements that might help to put my view into<br>
>> >>>> perspective:<br>
>> >>>><br>
>> >>>> I want to use ML to identify patterns and code features that we<br>
>> >>>> can check for using common techniques, such that basing our<br>
>> >>>> decision making on these patterns or features achieves better<br>
>> >>>> compile-time, code-size, and/or performance. I also want to use<br>
>> >>>> ML to identify shortcomings in our existing heuristics, e.g.,<br>
>> >>>> transformation cut-off values or pass schedules. This could also<br>
>> >>>> mean identifying alternative (combinations of) values that<br>
>> >>>> perform substantially better (on some inputs).<br>
>> >>>><br>
>> >>>><br>
>> >>>> > Not sure how to proceed from here, hence my email to this<br>
>> >>>> > list. Please let me know.<br>
>> >>>><br>
>> >>>> The email to the list was a great first step. The next one<br>
>> >>>> usually is to set up an LLVM development and testing environment,<br>
>> >>>> i.e., LLVM + Clang + the LLVM test suite, that you can use. It is<br>
>> >>>> also advised to work on a small task before the GSoC to get used<br>
>> >>>> to LLVM development.<br>
>> >>>><br>
>> >>>> I don't have a really small ML "coding" task handy right now,<br>
>> >>>> but the project is more about experiments anyway. To get some<br>
>> >>>> LLVM development experience, we can just take a small task in the<br>
>> >>>> IPO Attributor pass.<br>
>> >>>><br>
>> >>>> One thing we need and don't have is data. The Attributor is a<br>
>> >>>> fixpoint iteration framework, so the number of iterations is a<br>
>> >>>> pretty integral part. We have a statistics counter to determine<br>
>> >>>> if the number required was higher than the given threshold, but<br>
>> >>>> not one to determine the maximum iteration count required during<br>
>> >>>> compilation. It would be great if you could add that: a<br>
>> >>>> statistics counter that shows how many iterations were required<br>
>> >>>> until a fixpoint was found, across all invocations of the<br>
>> >>>> Attributor. Does this make sense? Let me know what you think, and<br>
>> >>>> feel free to ask questions via email or on IRC.<br>
>> >>>><br>
>> >>>> Cheers, Johannes<br>
>> >>>><br>
>> >>>> P.S. Check out the coding style guide and the how to contribute<br>
>> guide!<br>
>> >>>><br>
>> >>>><br>
>> >>>> > Thank you Shiva Badruswamy <a href="mailto:shivastanford@gmail.com" target="_blank">shivastanford@gmail.com</a><br>
>> >>>> ><br>
>> >>>> ><br>
>> >>>> > _______________________________________________ LLVM<br>
>> >>>> > Developers mailing list <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
>> >>>> > <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
>> >>>><br>
>> >>>><br>
>> >><br>
>> ><br>
>><br>
>><br>
><br>
<br>
</blockquote></div></div>