<div dir="ltr"><div>1. Thanks for the clarifications. I will stick to non-containerized OS X for now. </div><div><br></div><div>2. As an aside, I did try to build a Debian docker container by git cloning into it and using the Dockerfile in LLVM/utils/docker as a starting point:</div><div> - some changes needed to updated packages (GCC in particular needs to be latest) and the Debian image (Debian 9 instead of Debian 8) pretty much sets up the docker container well. But for some reason, the Ninja build tool within the CMake Generator fails. I am looking into it. Maybe I can produce a working docker workflow for others who want to build and work with LLVM in a container environment. </div><div><br></div><div>3. I have submitted the final proposal today to GSoC 2020 today after incorporating some comments and thoughts. When you all get a chance to review, let me know your thoughts. </div><div><br></div><div>4. On GPU extension, my thoughts were around what an integrated compiler like Nvidia's nvcc (GCC for CPU + PTX for GPU) does when GCC is substituted with LLVM and if that arrangement can be optimized for ML passes. But I am beginning to think that structuring this problem well and doing meaningful work over the summer might be a bit difficult. As mentors, do you have any thoughts on how LLVM might be integrated into a joint CPU-GPU compiler by the likes of Nvidia, Apple etc.?</div><div><br></div><div>Best</div><div>Shiva</div><div><br></div><div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 30, 2020 at 5:30 PM Johannes Doerfert <<a href="mailto:johannesdoerfert@gmail.com">johannesdoerfert@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br>
On 3/27/20 3:46 PM, Shiva Stanford wrote:
> Hi Johannes - great that we are engaging on this.
>
> Some responses now and some later.
>
> 1. When you say set up an LLVM dev environment + Clang + tools etc., do
> you mean checking out the LLVM compiler code from the repo and building
> it locally? If so, yes, this is all done on my end - that is, I have
> built all of this on my machine and compiled and run a couple of
> function passes. I have looked at some of the LLVM output emitted by the
> Clang tools, but I will familiarize myself more. I have added some small
> code segments, modified CMakeLists files, and rebuilt the code to get a
> feel for the packaging structure. By the way, is there a Bazel build for
> this? Right now, I am using OS X as the platform, as Apple is the vendor
> that has adopted LLVM the most. But I can switch to Linux containers to
> completely wall off the LLVM build from any OS X system builds, to
> prevent path confusion and to truly have a separate address space. Is
> there a preferable environment? In any case, I am thinking of
> containerizing the build so OS X system paths don't interfere with the
> include paths - have you received feedback from other developers on
> whether the include paths interfere with the OS X system LLVM build?


Setup sounds good.

I have never used OS X, but people do, and I would expect it to be OK.

I don't think you need to worry about this right now.


> 2. The Attributor pass refactoring gives some specific direction as a
> startup project - so that's great. Let me study this pass and I will get
> back to you with more questions.

Sure.


> 3. Yes, I will stick to the style guide (Baaaah... Stanford is strict on
> code styling and so are you guys :)) for sure.

For better or worse.


Cheers,

Johannes


> On Thu, Mar 26, 2020 at 9:42 AM Johannes Doerfert
> <johannesdoerfert@gmail.com> wrote:
>
>> Hi Shiva,
>>
>> apologies for the delayed response.
>>
>> On 3/24/20 4:13 AM, Shiva Stanford via llvm-dev wrote:
>> > I am a grad CS student at Stanford and wanted to engage with EJ Park,
>> > Giorgis Georgakoudis, and Johannes Doerfert to further develop the
>> > Machine Learning and Compiler Optimization concept.
>>
>> Cool!
>>
>>
>> > My background is in machine learning, cluster computing, distributed
>> > systems, etc. I am a good C/C++ developer and have a strong background
>> > in algorithms and data structures.
>>
>> Sounds good.
>>
>>
>> > I am also taking an advanced compiler course this quarter at Stanford,
>> > so I would be studying several of these topics anyway - so I thought I
>> > might as well co-engage on the LLVM compiler infra project.
>>
>> Agreed ;)
>>
>>
>> > I am currently studying the background material on SCC call graphs,
>> > dominator trees, and other global and interprocedural analyses to lay
>> > some groundwork on how to tackle this optimization pass using ML
>> > models. I have run a couple of whole-program function passes and
>> > visualized call graphs to get familiar with the LLVM optimization pass
>> > setup. I have also set up and learned to use GDB to debug function
>> > pass code.
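
A minimal sketch of the kind of function pass mentioned here, written against the new pass manager; the pass name and the call-edge printing are illustrative only, not code from the proposal:

  // Toy function pass (new pass manager) that prints a function's direct
  // call edges. Purely illustrative; the name and logic are made up.
  #include "llvm/IR/Function.h"
  #include "llvm/IR/InstIterator.h"
  #include "llvm/IR/InstrTypes.h"
  #include "llvm/IR/PassManager.h"
  #include "llvm/Support/Casting.h"
  #include "llvm/Support/raw_ostream.h"
  using namespace llvm;

  namespace {
  struct CallEdgePrinterPass : PassInfoMixin<CallEdgePrinterPass> {
    PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
      // Walk every instruction and report direct callees.
      for (Instruction &I : instructions(F))
        if (auto *CB = dyn_cast<CallBase>(&I))
          if (Function *Callee = CB->getCalledFunction())
            errs() << F.getName() << " -> " << Callee->getName() << "\n";
      return PreservedAnalyses::all(); // We only read the IR.
    }
  };
  } // end anonymous namespace
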
>>
>> Very nice.
>>
>>
>> > I have submitted the ML and Compiler Optimization proposal to GSoC
>> > 2020. I have added an additional feature to enhance the ML
>> > optimization to include crossover code to the GPU and to investigate
>> > how the function call graphs can be visualized as SCCs across CPU and
>> > GPU implementations. If the extension to the GPU is too much for a
>> > summer project, we can potentially focus on developing a framework for
>> > studying SCCs across a unified CPU/GPU setup and leave the coding, if
>> > feasible, to next summer. These are all preliminary ideas.
>>
>> I haven't looked at the proposals yet (I think we can only do so after
>> the deadline). TBH, I'm not sure I fully understand your extension.
>> Also, full disclosure, the project is pretty open-ended, from my side at
>> least. I do not necessarily believe we (=LLVM) are ready for an
>> ML-driven pass, or even inference in practice. What I want is to explore
>> the use of ML to improve the code we have, especially heuristics. We
>> build analyses and transformations, but it is hard to combine them in a
>> way that balances compile time, code size, and performance.
>>
>> Some high-level statements that might help to put my view into
>> perspective:
>>
>> - I want to use ML to identify patterns and code features that we can
>>   check for using common techniques, such that when we base our
>>   decision making on these patterns or features we achieve better
>>   compile time, code size, and/or performance.
>> - I want to use ML to identify shortcomings in our existing heuristics,
>>   e.g., transformation cut-off values or pass schedules. This could
>>   also mean identifying alternative values (or combinations of values)
>>   that perform substantially better (on some inputs).
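
For concreteness, cut-off values like the ones mentioned above are typically plain command-line options compared against a computed cost. The sketch below is illustrative only; the option name, default value, and helper function are made up, not an existing LLVM flag:

  // Illustrative only: the option name, default value, and helper below
  // are made up, not an existing LLVM flag.
  #include "llvm/Support/CommandLine.h"
  using namespace llvm;

  static cl::opt<unsigned> ExampleInlineThreshold(
      "example-inline-threshold", cl::init(225), cl::Hidden,
      cl::desc("Cost threshold below which a call site is considered for "
               "inlining (illustrative)"));

  // A heuristic decision driven by a fixed cut-off; exactly the kind of
  // hand-tuned constant an offline ML study could try to improve.
  static bool shouldInline(unsigned CallSiteCost) {
    return CallSiteCost < ExampleInlineThreshold;
  }

Thresholds like this are currently hand-tuned constants, which is what makes them a natural target for the kind of ML-guided search described above.
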
>>
>>
>> > Not sure how to proceed from here. Hence my email to this list. Please
>> > let me know.
>>
>> The email to the list was a great first step. The next one usually is to
>> set up an LLVM development and testing environment, i.e., LLVM + Clang +
>> the LLVM test suite, that you can use. It is also advisable to work on a
>> small task before GSoC to get used to LLVM development.
>>
>> I don't have a really small ML "coding" task handy right now, but the
>> project is more about experiments anyway. To get some LLVM development
>> experience, we can just take a small task in the IPO Attributor pass.
>>
>> One thing we need and don't have is data. The Attributor is a fixpoint
>> iteration framework, so the number of iterations is a pretty integral
>> part of it. We have a statistics counter to determine if the number of
>> iterations required was higher than the given threshold, but not one to
>> determine the maximum iteration count required during compilation. It
>> would be great if you could add that: a statistics counter that shows
>> how many iterations were required until a fixpoint was found, across
>> all invocations of the Attributor. Does this make sense? Let me know
>> what you think, and feel free to ask questions via email or on IRC.
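
A minimal sketch of what such a counter could look like, using the STATISTIC machinery from llvm/ADT/Statistic.h and its updateMax helper to keep a running maximum; the names and placement are illustrative assumptions, not the existing Attributor code:

  // Illustrative sketch only; the names and placement are made up, not
  // the actual Attributor implementation.
  #include "llvm/ADT/Statistic.h"

  #define DEBUG_TYPE "attributor"

  STATISTIC(MaxIterationsAnyRun,
            "Maximum number of fixpoint iterations needed by a single "
            "Attributor invocation");

  // Imagine this is called once per Attributor run, after the fixpoint
  // loop, with the number of iterations that run needed.
  static void recordFixpointIterations(unsigned IterationsThisRun) {
    // Statistic::updateMax keeps the largest value seen, so after all
    // invocations the statistic holds the maximum iteration count.
    MaxIterationsAnyRun.updateMax(IterationsThisRun);
  }

Running opt with -stats would then report this value alongside the existing Attributor statistics.
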
>>
>> Cheers,
>> Johannes
>>
>> P.S. Check out the coding style guide and the how-to-contribute guide!
>>
>>
>> > Thank you
>> > Shiva Badruswamy
>> > shivastanford@gmail.com
>> >
>> >
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > llvm-dev@lists.llvm.org
>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev