[llvm-dev] [GSOC] "Project: Improve inter-procedural analyses and optimisations"

Wed Mar 18 16:22:38 PDT 2020

1) Apologies for being late to the discussion.
2) I will respond to multiple mails in this thread.

On 03/16, Fahad Nayyar wrote:
> Dear Stefanos,
> 
> Thanks for such a detailed explanation!
> I'll have to study your mail and try out some things before I can ask
> specific questions for further discussion.
> 
> But I want to discuss this point right away:
> 
> *> I'd start by doing diagrams about how different parts of the code
> interact with each other (e.g. where does the Attributor start? what does
> it call then? how are attributes created? etc.) When starting out, these
> things might not be important to tackle. But they helped me.*
> 
> I totally agree with your point of drawing diagrams about different parts
> of the code. I tried this but was not able to succeed. It would be very
> helpful if you can tell me what is the entry point of Attributor (which
> method is called the very first time?)?

As Stefanos noted, runAttributorOnFunctions is the "entry point". From
there we go to Attributor::run, which will then call
`AbstractAttribute::update` until a fixpoint was found or a timeout
reached.

> Also, is there any way we can debug opt command in gdb like fashion (ie.
> setting breakpoints, stepping through instructions one by one?)?
> This would help me a lot in the initial code reading period.

I recommend to run opt with -debug-only=attributor and look at the
output for a short code example. Not all abstract attributes print much
about their internal deduction process but you see the state before and
after an update at least.

Cheers,
  Johannes

> Thanks and Regards
> Fahad Nayyar
> 
> 
> 
> 
> On Mon, Mar 16, 2020 at 7:23 AM Stefanos Baziotis <
> stefanos.baziotis at gmail.com> wrote:
> 
> > Hi Farad,
> >
> > > I tried to do this for the NoUnwind attribute Hmm, I don't have
> > experience with this attribute but it seems like a good starting point
> > since it doesn't do much. First of all, be sure that you run with: opt
> > -passes=attributor -attributor-disable=false This uses the new pass
> > manager which is another discussion. Now, to the point: If you open
> > nounwind.ll, it has a bunch of test cases and I don't think it's a good
> > idea to run Attributor in all of them at first. So, break it into
> > individual tests. First of all, note that the Attributor follows an
> > optimistic path to attribute deduction. That is, you always start assuming
> > that an attribute is valid until you have info that it isn't. Seeing TEST 1
> > should be relatively obvious that it doesn't have something that breaks the
> > initial assumption, that's why you see no change in state between calls to
> > updateImpl(). But now we have to dig deeper: What can break our initial
> > assumption (which is that a function does not throw) in the case of
> > NoUnwind? You see in the topmost updateImpl() that they're a bunch Opcodes.
> > Those are instructions (inside our function) that can potentially break our
> > initial assumption (e.g. call to a function that either we know it throws
> > or at least it is not guaranteed that it doesn't). In the updateImpl(),
> > using checkForAllInstructions, we loop through all those instructions and
> > call for each one the predicate CheckForNoUnwind. Note a couple of things:
> > - TEST 1 has no such instructions whatsoever so if you put a print inside
> > the predicate, you'll see nothing. - You can see that the predicate returns
> > a boolean value. If it's true, we continue to the next instruction,
> > otherwise we stop. If we make it through all of them,
> > checkForAllInstructions() returns true. Otherwise false. The predicate
> > checks if any instruction breaks our assumption (we'll see shortly how). If
> > it does, we immediately indicate pessimistic fixpoint. What about TEST 2
> > and 3 ? Those 2 functions have a call inside but if you put a print inside
> > the predicate, you'll see again nothing. The reason for that brings us to
> > an important part of the Attributor and that is that "deadness" (and the
> > relative attribute) is very important. checkForAllInstructions() will only
> > go through instructions that are considered live. The 2 calls are not
> > considered because another part of the Attributor (which is out of topic
> > right now) has (somewhat) deduced that they go in an endless recursion. If
> > you run the Attributor with these 2 functions you'll see another important
> > point. Specifically that the function bodies have only an `unreachable`
> > instruction inside. That is, the attributor not only deduces (and provides)
> > info through attributes but it also transforms code. In this case it
> > changed the function bodies to unreachable. Finally, I think it's
> > interesting to see TEST 4: You see that it calls a function that we don't
> > know it doesn't throw. This should break our assumption. And it does.
> > Inside the topmost updateImpl() and inside the predicate, if you put a
> > print, you'll see the call (e.g. dbgs() << I << "\n"). We ask I.mayThrow().
> > Note that mayThrow() will return false in the case that we're somehow sure
> > that the instruction does not throw. In this case the instruction is a
> > call, so it will return true if we're sure that the called function does
> > not throw. But we're not, so we move forward. What then happens is a little
> > bit weird but it basically AANoUnwind asks AANoUnwindCallSite for info
> > because this instruction is a call site. Attributes ask one another and
> > this is very important in the Attributor because one's information is
> > useful in another. We do it with getAAFor. Without getting into too much
> > info, the other attribute says that it's not assumed unwind so we indicate
> > pessimistic fixpoint (the reality is that the order of calls between the
> > attributes is reverse, but again, out of topic). I hope that gave you a
> > better understanding! > I know how in [1] Johaanes explained the use of
> > MaxObjSize and Dereferenceable in the AliasAnalysis. But I would be happy
> > if I could come up with some even better example. As you said, that'll
> > probably take some time but that's ok There are opportunities everywhere.
> > For example, consider this: https://godbolt.org/z/HFWo_J It's a for loop
> > that does a load inside from %p. The load seems to be invariant, we could
> > move it out of the loop. -licm in the cmd arguments means it invokes the
> > Loop-Invariant Code Motion pass, which does such things. But it doesn't
> > move the load out. The reason for that is that consider the case where %n
> > == 0 and %p == null. In the initial code, we would never get into the loop
> > and we would not have a trap. While, with this transformation, we will have
> > and thus, we just changed the initial behavior. So, the transformation is
> > not done. However, if you put dereferenceable(4) attribute in %p, it will
> > be done. Because now you now you can certainly dereference. So, attribute
> > info is useful in ways we may not consider :) > I can start with TODO at
> > line no. 2605  // TODO: Return the number of reachable queries. I'm not
> > familiar with it. Currently it doesn't seem to do anything but I may miss
> > something. I suppose what it asks is to track how many queries to this
> > attribute have been done by outside users which should be easy. > Since
> > the code is very large right now, I thought to refer to some of the very
> > initial patches of attributor Maybe, I don't know. But I assume things
> > will have changed from then and you may get lost. I'd start by doing
> > diagrams about how different parts of the code interact with each other
> > (e.g. where does the Attributor start? what does it call then? how are
> > attributes created? etc.) When starting out, these things might not be
> > important to tackle. But they helped me. > How should I indicate to the
> > community that I have started working towards this issue (should I comment
> > on the issue page on github?)? I can try to work on AAReachability TODO after
> > solving this issue.
> > You can write it in the Github comments. I don't think you can / need to
> > do something else. Kind regards, Stefanos Baziotis
> >
> > Στις Δευ, 16 Μαρ 2020 στις 12:12 π.μ., ο/η Fahad Nayyar <
> > fahad17049 at iiitd.ac.in> έγραψε:
> >
> >> Dear Stefan and Stefanos,
> >>
> >> Thanks for your suggestions!
> >>
> >> > I'd suggest that you try to run the Attributor and follow a specific
> >> attribute's updates and see what it tries to deduce. That is, see its
> >> updateImpl(). With a couple of prints you can get a good idea of what it
> >> does and what info it gets from other attributes (and when it stops).
> >>
> >> I tried to do this for the NoUnwind attribute. I printed getState(), i
> >> sAssumedNoUnwind(), isKnownNoUnwind() in updateImpl method of classes
> >> AANoUnwind and AANoUnwindCallSite. I run the tests in nounwind.ll (/llvm/test/Transforms/Attributor/nounwind.ll).
> >> I used this command to run the test: “opt -attributor
> >> -attributor-disable=false nounwind.ll -S &> nounwind_out.ll”. But After
> >> seeing the output I was not able to understand how the attribute is
> >> changing for the tests. Its status was almost constant every time updateimp
> >> was called. Please tell me what other things should I try to print to
> >> better observe how the NoUnwind attribute is changing over the
> >> iterations of fix point analysis. Also please verify whether I am using the
> >> correct command to run the tests.
> >>
> >> > Also, probably this will be a very interesting panel discussion for
> >> you: https://www.youtube.com/watch?v=cC2cspQgSxM
> >>
> >> Thanks for suggesting this! I watched the video and now I understand the
> >> pros and cons of inlining. But I still think that It would take me a while
> >> before I can come up with a very good example demonstrating the use of of
> >> of the Attribues in some IPO pass. I know how in [1] Johaanes explained
> >> the use of MaxObjSize and Dereferenceable in the AliasAnalysis. But I
> >> would be happy if I could come up with some even better example.
> >>
> >> > You are somewhat right. However, H2S is not about 'use-after-free' bug
> >> detection, but rather its prevention. We already do this, see example.
> >> <https://godbolt.org/z/HgrC7H>
> >>
> >> Thanks for sharing the example. Just for clarification, was this example
> >> demonstrating the point that we can automatically correct use-after-free
> >> bugs using attributes? If yes, then I didn’t understand how and which
> >> attribute helped in this correction? Also is it not wrong to change the IR
> >> as in this example? Replacing %1 = tail call noalias i8* @malloc(i64 4)
> >> ;  tail call void @no_sync_func(i8* %1) with  %1 = alloca i8, i64 4 solved
> >> the use-after-free bug, but doesn’t it also change the semantic of the
> >> program?
> >>
> >> > In the meantime you could look at some TODOs in the Attributor itself
> >> and try those you see fit.
> >>
> >> I looked up some of the TODOs. I found AAReachability a very interesting
> >> attribute. I can start with TODO at line no. 2605  // TODO: Return the
> >> number of reachable queries. I can work towards this TODO. But I first
> >> want your advice on whether it looks doable for me. I can see that
> >> implementation of AAReachability attribute is not complete yet. I can
> >> try to learn more about it from D70233 <https://reviews.llvm.org/D70233>
> >> and D71617 <https://reviews.llvm.org/D71617>.
> >>
> >> I am trying to get more familiar with Attributor’s code. Since the code
> >> is very large right now, I thought to refer to some of the very initial
> >> patches of attributor (D59918, <https://reviews.llvm.org/D59918> D60012
> >> <https://reviews.llvm.org/D60012>, D63379
> >> <https://reviews.llvm.org/D63379>). I believe that by looking at these
> >> three I can get a better idea of the framework as a whole. Please suggest
> >> if this is a good idea or not. Also please suggest any other way by which I
> >> can improve my understanding of the code.
> >>
> >> I can see that Johanned have put up some issues for GSOC aspirants. I
> >> think that [2] <https://github.com/llvm/llvm-project/issues/179> ([Attributor]
> >> Cleanup and upstream `Attribute::MaxObjectSize`) will be a very good
> >> issue for me, It seems doable and I can get familiar with the whole process
> >> of writing a patch for an issue. How should I indicate to the community
> >> that I have started working towards this issue (should I comment on the
> >> issue page on github?)? I can try to work on AAReachability TODO after
> >> solving this issue.
> >>
> >> Thanks and Regards
> >>
> >>
> >> References
> >>
> >> [1] https://youtu.be/HVvvCSSLiTw
> >> [2] https://github.com/llvm/llvm-project/issues/179
> >>
> >>
> >>
> >>
> >> On Sat, Mar 14, 2020 at 4:12 PM Stefan Stipanovic <stefomeister at gmail.com>
> >> wrote:
> >>
> >>> Hi Fahad,
> >>>
> >>>
> >>>> > Improve dynamic memory related capabilities of Attributor. For
> >>>> example Improve HeapToStackConversions. Maybe such deductions can help
> >>>> safety (dis)provers. For example, can we improve the use-after-free
> >>>> bug detection using some attributes?
> >>>> Stefan should know more about H2S. Regarding the use-after-free, I
> >>>> don't think there's currently any plans for it directly, but they can be I
> >>>> assume.
> >>>
> >>>
> >>> You are somewhat right. However, H2S is not about 'use-after-free' bug
> >>> detection, but rather its prevention. We already do this, see example
> >>> <https://godbolt.org/z/HgrC7H>.
> >>>
> >>> In the rest of this post I'll try to help you familiarize yourself with
> >>>> the Attributor and maybe answer your questions.
> >>>> Johannes can then give you specific things to do to get started.
> >>>
> >>>
> >>> In the meantime you could look at some TODOs in the Attributor itself
> >>> and try those you see fit.
> >>>
> >>> If you have any questions, don't hesitate to ask.
> >>>
> >>> -stefan
> >>>
> >>> On Fri, Mar 13, 2020 at 10:14 PM Stefanos Baziotis <
> >>> stefanos.baziotis at gmail.com> wrote:
> >>>
> >>>> Hi Fahad,
> >>>>
> >>>> We're all happy to see you being interested in LLVM! More so in the
> >>>> Attributor! I'm a relatively new contributor so I
> >>>> think I can help. Please note that the Attributor, apart from Johannes
> >>>> (who CC'd), has at least another 2 great
> >>>> contributors, Hideto and Stefan (who I also CC'd). They were among the
> >>>> initial creators.
> >>>>
> >>>> In the rest of this post I'll try to help you familiarize yourself with
> >>>> the Attributor and maybe answer your questions.
> >>>> Johannes can then give you specific things to do to get started.
> >>>>
> >>>> Starting off, understanding the theory of data-flow analysis can help.
> >>>> I'd say don't get too hang up on it, you just
> >>>> have to understand the idea of fix-point analyses.
> >>>>
> >>>> I don't how much you know about the Attributor, so I'll defer a too
> >>>> long (or too beginner) description because you might already know
> >>>> a lot of things. You can of course any specific questions you want:
> >>>> A summary is:
> >>>> The Attributor tries to deduce attributes in different points of an
> >>>> LLVM IR program (you can see that in the video).
> >>>> The deduction of these attributes is inter-connected, which is the
> >>>> whole point of the Attributor. The attributes
> >>>> "ask" one another for information. For example, one attribute tries to
> >>>> see if a load loads from null pointer.
> >>>> But the pointer operand might be non-constant (like %v in LLVM IR).
> >>>> Well, another attribute, whose job is to do value simplification
> >>>> (i.e. constant folding / propagation etc.) might have folded that (%v)
> >>>> into the constant null. So, the former can ask him.
> >>>> These connections give the power and the complexity.
> >>>>
> >>>> The attributes have a state, that changes. When the state stops
> >>>> changing, it has reached a fixpoint, at which point
> >>>> the deduction of it stops. From the initialization of the attribute
> >>>> until a fixpoint is reached, the state changes
> >>>> in updates (called updateImpl() in the source code). This is where
> >>>> attributes try to deduce new things, ask one another
> >>>> and eventually try to reach a fixpoint.
> >>>>
> >>>> Finally, a fixpoint can be enforced. Because if we for some reason
> >>>> never stop changing, it would run forever.
> >>>> Note however that attributes should be programmed in a way that
> >>>> fixpoint should be able to be reached
> >>>> (This is where theory might help a little).
> >>>>
> >>>> I'd suggest that you try to run the Attributor and follow a specific
> >>>> attribute's updates and see what it tries to deduce.
> >>>> That is, see its updateImpl(). With a couple of prints you can get a
> >>>> good idea of what it does and what info it
> >>>> gets from other attributes (and when it stops). You can of course ask
> >>>> us if you're interested in a specific one, if
> >>>> there's something you don't understand etc.
> >>>>
> >>>> Now, to (try to) answer your questions and hopefully other people can
> >>>> help.
> >>>> > How Attributor can help for standard inter-procedural and
> >>>> intra-procedural analysis passes of LLVm. I’ve seen the tutorial [4].
> >>>> I would like to discuss ways of improving other optimization passes
> >>>> similarly (or some examples which have already been implemented).
> >>>>
> >>>> The Attributor AFAIK is self-contained. It's not in "production" yet
> >>>> and so it's not connected with other passes. At this point, LLVM is focused
> >>>> on heavy inlining, which while very useful, you'll lose a lot of the
> >>>> interprocedural information.
> >>>> Note that there are other transforms that do Inter-Procedural
> >>>> Optimization (
> >>>> https://github.com/llvm/llvm-project/tree/master/llvm/lib/Transforms/IPO)
> >>>> but they don't follow the idea of the Attributor.
> >>>> But they might follow a fix-point analysis.
> >>>>
> >>>> > Improve dynamic memory related capabilities of Attributor. For
> >>>> example Improve HeapToStackConversions. Maybe such deductions can help
> >>>> safety (dis)provers. For example, can we improve the use-after-free
> >>>> bug detection using some attributes?
> >>>> Stefan should know more about H2S. Regarding the use-after-free, I
> >>>> don't think there's currently any plans for it directly, but they can be I
> >>>> assume.
> >>>>
> >>>> > Improve Liveness related capabilities of Attributor. Again I want to
> >>>> consider whether some attribute deduction can help liveness (dis)provers.
> >>>> For example NoReturn, WillReturn can be improved. I am sure these 2
> >>>> attributes do not cover all the cases as it is an undecidable problem. But
> >>>> I was wondering whether there is room for improvement in their deduction
> >>>> mechanism. Liveness is certainly something that we're currently trying to
> >>>> improve and I don't think we'll ever stop. Most of the attributes interact
> >>>> with the deadness attribute (AAIsDead) both for asking it info and
> >>>> providing it info (i.e. the undefined-behavior attribute hopefully will at
> >>>> some point be able to tell AAIsDead that a block is dead because it
> >>>> contains UB). > Is there any attribute that tells whether a function
> >>>> has side-effects (does it always gives the same output for the same
> >>>> input? Or does it affect some global variable directly or indirectly?)? No
> >>>> AFAIK, although you might be interested in this:
> >>>> https://reviews.llvm.org/D74691#1887983
> >>>>
> >>>> I hope this was helpful! Don't hesitate to ask any questions.
> >>>>
> >>>> Kind regards,
> >>>> Stefanos Baziotis
> >>>>
> >>>> Στις Παρ, 13 Μαρ 2020 στις 10:25 μ.μ., ο/η Fahad Nayyar via llvm-dev <
> >>>> llvm-dev at lists.llvm.org> έγραψε:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> My name is Fahad Nayyar. I am an undergraduate student from India.
> >>>>>
> >>>>> I am interested to participate in GSOC under the project “Improve
> >>>>> inter-procedural analyses and optimizations”.
> >>>>>
> >>>>> I have been using LLVM for the past 8 months. I have written various
> >>>>> intra-procedural analysis in LLVM as FunctionPass for my course projects
> >>>>> and research projects. But I’ve not contributed to the LLVM community yet.
> >>>>> I am very excited to contribute to LLVM!
> >>>>>
> >>>>> I am not too familiar with the inter-procedural analysis
> >>>>> infrastructure of LLVM. I have written small toy inter-procedural dataflow
> >>>>> analysis (like taint analysis, reaching definitions, etc) for JAVA programs
> >>>>> using SOOT tool *[5].* I am familiar with the theory of
> >>>>> inter-procedural analysis (I’ve read some chapters of  [1],  [2] and
> >>>>> [3] for this).
> >>>>>
> >>>>> I am trying to understand the LLVM’s Attributor framework. I am
> >>>>> interested in these 3 aspects:
> >>>>>
> >>>>>    1.
> >>>>>
> >>>>>    How Attributor can help for standard inter-procedural and
> >>>>>    intra-procedural analysis passes of LLVm. I’ve seen the tutorial
> >>>>>    [4]. I would like to discuss ways of improving other optimization
> >>>>>    passes similarly (or some examples which have already been implemented).
> >>>>>    2.
> >>>>>
> >>>>>    Improve dynamic memory related capabilities of Attributor. For
> >>>>>    example Improve HeapToStackConversions. Maybe such deductions can
> >>>>>    help safety (dis)provers. For example, can we improve the use-after-free
> >>>>>    bug detection using some attributes?
> >>>>>    3.
> >>>>>
> >>>>>    Improve Liveness related capabilities of Attributor. Again I want
> >>>>>    to consider whether some attribute deduction can help liveness
> >>>>>    (dis)provers. For example NoReturn, WillReturn can be improved. I
> >>>>>    am sure these 2 attributes do not cover all the cases as it is an
> >>>>>    undecidable problem. But I was wondering whether there is room for
> >>>>>    improvement in their deduction mechanism.
> >>>>>    4.
> >>>>>
> >>>>>    Can we optimize the attribute deduction algorithm to reduce
> >>>>>    compile time?
> >>>>>    5.
> >>>>>
> >>>>>    Is there any attribute that tells whether a function has
> >>>>>    side-effects (does it always gives the same output for the same
> >>>>>    input? Or does it affect some global variable directly or indirectly?)?
> >>>>>
> >>>>>
> >>>>> It would be great if Johannes can provide me some TODOs before
> >>>>> submitting my proposal. Also please tell some specific IPO improvement
> >>>>> goals which you have in mind for this project. I would be most interested
> >>>>> in memory-related attributes, liveness deductions from attributes and
> >>>>> measurable better IPO using attribute deduction.
> >>>>>
> >>>>> Thanks and Regards.
> >>>>>
> >>>>> References:
> >>>>>
> >>>>> [1] Principles of Program Analysis.
> >>>>> <https://www.springer.com/gp/book/9783540654100>
> >>>>>
> >>>>> [2] Data Flow Analysis: Theory and Practice.
> >>>>> <https://dl.acm.org/doi/book/10.5555/1592955>
> >>>>>
> >>>>> [3] Static Program Analysis. <https://cs.au.dk/~amoeller/spa/spa.pdf>
> >>>>>
> >>>>> [4] 2019 LLVM Developers’ Meeting: J. Doerfert “The Attributor: A
> >>>>> Versatile Inter-procedural Fixpoint.."
> >>>>> <https://www.youtube.com/watch?v=HVvvCSSLiTw>
> >>>>> [5] Soot - A Java optimization framework
> >>>>> <https://github.com/Sable/soot>
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> LLVM Developers mailing list
> >>>>> llvm-dev at lists.llvm.org
> >>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>>>>
> >>>>

-- 

Johannes Doerfert
Researcher

Argonne National Laboratory
Lemont, IL 60439, USA

jdoerfert at anl.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200318/44dff725/attachment-0001.sig>