[llvm-dev] RFC: Pass to prune redundant profiling instrumentation

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Fri Mar 11 20:48:46 PST 2016


On Fri, Mar 11, 2016 at 8:09 PM, Xinliang David Li <xinliangli at gmail.com>
wrote:

>
>
> On Fri, Mar 11, 2016 at 7:43 PM, Sean Silva via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Fri, Mar 11, 2016 at 7:26 PM, Xinliang David Li <davidxl at google.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Mar 11, 2016 at 7:04 PM, Sean Silva <chisophugis at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Mar 11, 2016 at 6:00 PM, Vedant Kumar <vsk at apple.com> wrote:
>>>>
>>>>>
>>>>> > On Mar 11, 2016, at 5:28 PM, Sean Silva <chisophugis at gmail.com>
>>>>> wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Fri, Mar 11, 2016 at 12:47 PM, Vedant Kumar <vsk at apple.com>
>>>>> wrote:
>>>>> > There have been a lot of responses. I'll try to summarize the thread
>>>>> and respond
>>>>> > to some of the questions/feedback.
>>>>> >
>>>>> >
>>>>> > Summary
>>>>> > =======
>>>>> >
>>>>> > 1. We should teach GlobalDCE to strip out instrumented functions
>>>>> which the
>>>>> >    inliner cannot delete.
>>>>> >
>>>>> > 2. Sean suggests adding metadata to loads/stores of counters to
>>>>> convey that
>>>>> >    they do not alias normal program data.
>>>>> >
>>>>> >    I'm not familiar with AA, but off-hand this seems like a good
>>>>> idea.
>>>>> >
>>>>> > 3. Sean also suggests improving the optimizer to registerize
>>>>> counters when
>>>>> >    possible.
>>>>> >
>>>>> >    Ditto, seems like a good idea.
>>>>> >
>>>>> > 4. Sean has an alternate proposal for counter pruning which works by
>>>>> remapping
>>>>> >    FE-generated counters to counters generated by the IR
>>>>> instrumentation pass.
>>>>> >    David likes this proposal.
>>>>> >
>>>>> >    I have some questions about this but am willing to try it. IMO,
>>>>> the pruning
>>>>> >    pass I've suggested could complement this nicely -- I don't view
>>>>> them as
>>>>> >    mutually exclusive (though I could be wrong).
>>>>> >
>>>>> > 5. There seems to be consensus over the need to improve performance
>>>>> of
>>>>> >    FE-instrumented programs.
>>>>> >
>>>>> > 6. David is concerned that the proposed pruning pass is too narrow
>>>>> in scope,
>>>>> >    and would like to see results on a larger benchmark. I'll try to
>>>>> get some
>>>>> >    numbers.
>>>>> >
>>>>> > 7. David mentioned some optimizations that are possible if we make
>>>>> coverage
>>>>> >    information "yes or no" (per line). Justin pointed out that this
>>>>> doesn't
>>>>> >    work for a few of Apple's use cases.
>>>>> >
>>>>> >
>>>>> > Clarifications
>>>>> > ==============
>>>>> >
>>>>> > 1. In my original email, two of my numbers were incorrect. I listed
>>>>> the
>>>>> >    sizes of the __llvm_prf_alias sections, not the number of actual
>>>>> aliases.
>>>>> >    The corrected lines are:
>>>>> >
>>>>> > O3 + PGOInstr + Pruning:      0.606s (8.6% performance win, _42_
>>>>> aliases)
>>>>> > O3 + CoverageInstr + Pruning: 0.610s (11.6% performance win, _44_
>>>>> aliases)
>>>>> >
>>>>> > 2.
>>>>> >
>>>>> > >>  Determine which profile counters are essential.
>>>>> > >
>>>>> > > What is an "essential counter"?
>>>>> >
>>>>> > The pass I proposed works like this:
>>>>> >
>>>>> >   for (PC : ProfileCounters) {
>>>>> >     if (Update-Sites-Of(PC) == Update-Sites-Of(Other-PC)) {
>>>>> >       mark PC as "essential"
>>>>> >       mark Other-PC as "redundant" (i.e, alias of PC)
>>>>> >     }
>>>>> >   }
>>>>> >
>>>>> > Where Update-Sites-Of(PC) is the set of basic blocks in which PC is
>>>>> updated.
>>>>> > If multiple counters are updated at every one of PC's update sites,
>>>>> they are
>>>>> > all marked as aliases in one pass.
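
A minimal C++ sketch of that detection step (hypothetical types and
names such as findAliases; not the actual patch): keying a map on each
counter's full update-site set groups every counter that shares the
set, so all aliases fall out in a single pass.

  #include <map>
  #include <set>

  using Counter = unsigned;
  using BlockSet = std::set<const void *>; // opaque basic-block keys

  // Maps each redundant counter to the "essential" counter it aliases.
  std::map<Counter, Counter>
  findAliases(const std::map<Counter, BlockSet> &UpdateSites) {
    std::map<BlockSet, Counter> Representative;
    std::map<Counter, Counter> AliasOf;
    for (const auto &KV : UpdateSites) {
      auto It = Representative.find(KV.second);
      if (It == Representative.end())
        Representative.emplace(KV.second, KV.first); // first seen: essential
      else
        AliasOf.emplace(KV.first, It->second);       // same sites: redundant
    }
    return AliasOf;
  }
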
>>>>> >
>>>>> >
>>>>> > 3.
>>>>> >
>>>>> > >> Erase all updates to redundant profile counters.
>>>>> > >
>>>>> > > When you are saying "erase", you need to actually replace the
>>>>> multiple counter increment with *a new* counter, right? I.e. when replacing:
>>>>> > >
>>>>> > >     instrprof_increment(profc_f2);
>>>>> > >     instrprof_increment(profc_f1);
>>>>> > >
>>>>> > > you need to emit:
>>>>> > >
>>>>> > >     instrprof_increment(profc_f2f1_fused);
>>>>> >
>>>>> >
>>>>> > No. The pass only erases updates to counters which it can prove are
>>>>> > truly redundant. There is no need to create a new (or fused) counter
>>>>> > because profc_f1 == profc_f2 in all possible profiles.
>>>>> >
>>>>> > Oh, this makes it substantially less useful I would think (and I
>>>>> misunderstood your original proposal the same way that Mehdi did here).
>>>>> > I guess I'm not sure what situations your proposed transformation
>>>>> would catch.
>>>>> > The only one I can think of is something like:
>>>>> > - function foo() has an entry counter
>>>>> > - function bar() calls foo() from a BB that has a counter
>>>>> > - function bar() is the only caller of foo() and calls it in only
>>>>> one BB.
>>>>> > Can you add some debug dumping to your pass and give some real-world
>>>>> examples of counters that it merges?
>>>>>
>>>>> I still have the pruned IR from the example I brought up at the start
>>>>> of the thread:
>>>>>
>>>>>   https://ghostbin.com/paste/qus2s
>>>>>
>>>>> Look for the __llvm_prf_alias section.
>>>>>
>>>>
>>>> Could you maybe digest that a bit for us and give source-level examples
>>>> of the counters that end up aliased?
>>>>
>>>> Also, in the linked IR there is still a tremendous amount of counter
>>>> redundancy being left on the table that a "counter fusion" approach would
>>>> avoid.
>>>> E.g.
>>>> https://ghostbin.com/paste/qus2s#L306 - 6 counter updates in the same
>>>> BB
>>>> https://ghostbin.com/paste/qus2s#L286 - 3 counter updates in the same
>>>> BB
>>>> https://ghostbin.com/paste/qus2s#L261 - 2 counter updates in the same
>>>> BB
>>>> https://ghostbin.com/paste/qus2s#L191 - 5 counter updates in the same
>>>> BB
>>>> https://ghostbin.com/paste/qus2s#L435 - 9 counter updates in the same
>>>> BB
>>>> https://ghostbin.com/paste/qus2s#L373 - 2 counter updates in the same
>>>> BB
>>>>
>>>> It seems like the "identical counter merge" leaves a lot on the table
>>>> that even a simple BB-at-a-time counter fusion approach would get right.
>>>>
>>>> Also, notice that in that IR there are 21 BB's but 65 counters (even
>>>> after identical counter merge). This indicates that eliminating
>>>> the original counters entirely (besides recording a way to reconstruct
>>>> them, but that can be compressed) might actually reduce the total amount of
>>>> data in the final binary.
>>>>
>>>>
>>>>>
>>>>>
>>>>> > The fused counter approach (or the one I suggested which is just a
>>>>> generalization of that (see below)) seems like it can reduce the number of
>>>>> counter updates much more.
>>>>> >
>>>>> >
>>>>> > 4.
>>>>> >
>>>>> > >> The complexity of this pass is O(N*M), where N is the number of
>>>>> profile
>>>>> > >> counters, and M is the average number of updates per counter.
>>>>> > >
>>>>> > > What you're describing here is not clear to me? But I may have
>>>>> been totally misunderstanding the whole concept :)
>>>>> >
>>>>> > Hopefully the pseudo-code helps? Alternatively I can put a WIP patch
>>>>> up
>>>>> > on Phab.
>>>>> >
>>>>> > > It should be just O(N) where N is the number of instructions.
>>>>> >
>>>>> > The complexity of the pass is not upper-bounded by the number of
>>>>> > instructions in the program because certain loads and stores can be
>>>>> visited
>>>>> > more than once.
>>>>> >
>>>>> >
>>>>> > FE to IR Counter Remapping
>>>>> > ==========================
>>>>> >
>>>>> > I have a question about this plan:
>>>>> >
>>>>> > > for each CFG edge:
>>>>> > >     record which FE counters have ended up associated with it
>>>>> > > remove FE counters
>>>>> > > run IR instrumentation pass
>>>>> > > emit a side table mapping IR instr counters to FE counters
>>>>> >
>>>>> > Currently, -instrprof happens early in the pipeline. IIUC this is
>>>>> done to
>>>>> > allow the optimizer to work with load+add+stores, instead of profile
>>>>> update
>>>>> > intrinsics.
>>>>> >
>>>>> > Say we introduce a counter remapping pass like the one Sean
>>>>> suggested. It
>>>>> > should be run before -instrprof so that we don't waste time lowering
>>>>> a bunch
>>>>> > of instrprof_increment intrinsics which we'll have to throw away
>>>>> later.
>>>>> >
>>>>> > But that means that the CFGs that the counter remapping pass
>>>>> operates on won't
>>>>> > reflect changes made by the inliner (or any other optimizations
>>>>> which alter the
>>>>> > CFG), right?
>>>>> >
>>>>> > ISTM the pruning pass I've proposed is useful whether we're doing
>>>>> FE-based
>>>>> > instrumentation _or_ late instrumentation. Since it operates on
>>>>> loads+stores
>>>>> > directly, it can clean up redundant counter increments at any point
>>>>> in the
>>>>> > pipeline (after -instrprof).
>>>>> >
>>>>> >
>>>>> > What I suggested would operate on loads/stores and could be used for
>>>>> cleanup at any point in the pipeline too.
>>>>> >
>>>>> > It would just be a generic transformation "okay, we have some set of
>>>>> counter updates chosen earlier in the pipeline that are now mangled / made
>>>>> redundant by further optimization; let's re-run counter placement and emit
>>>>> some side tables mapping the newly chosen counters to the original
>>>>> counters". I'm not sure how complicated it is to reconstruct one set of
>>>>> counters from the other (just linear combinations, I think? So hopefully
>>>>> not too complicated).
>>>>>
>>>>> I think I understand what you're proposing.
>>>>>
>>>>> To not break code coverage, we would need to emit the 'side table' you
>>>>> described
>>>>> into the binary (so that we can correctly evaluate counter
>>>>> expressions).
>>>>>
>>>>
>>>> Yes. Otherwise the original counters cannot be reconstructed. This is
>>>> no different from emitting the aliases needed by the "identical counter
>>>> merge" approach.
>>>>
>>>>
>>>>>
>>>>> While I think this is feasible and can get better results, it might
>>>>> not *also* be
>>>>> simpler than running the pass I described. IMO what you've described would
>>>>> take more
>>>>> work to implement and would put a higher burden on the compiler (the
>>>>> pipeline looks
>>>>> something like -instrprof, -remove-counters, -pgo-instr-gen,
>>>>> -remap-counters).
>>>>> That's not a bad thing -- just something to be aware of.
>>>>>
>>>>
>>>> FWIW, I was imagining that this would be a single pass that would call
>>>> into the IR instr gen as a utility (some refactoring would probably be
>>>> needed, but that's doable).
>>>>
>>>> I expect it to be more work to implement the fully general thing I
>>>> described. But implementing a BB-at-a-time counter fusion I think would
>>>> likely be not much more complicated than "identical counter merge". It
>>>> would just be:
>>>>
>>>> for each BB:
>>>>     collect all counter updates in this BB, together with their
>>>> multiplicities
>>>>     create new counter
>>>>     emit side-table data that relates the new counter to an array of
>>>> (other counter, multiplicity of update)
>>>>
>>>> The runtime just emits the side-table and then llvm-profdata does:
>>>>
>>>> for each counter C:
>>>>     for (otherCounter, multiplicity) in side-table[C]:
>>>>         counters[otherCounter] += multiplicity * counters[C]
>>>>
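
To make the fusion step concrete, a hedged C++ sketch (names like
fuseBlockCounters, SideTable, and NextFreshCounter are made up):
collect the updates in one block with their multiplicities, allocate a
fresh counter, and record the mapping in the side table.

  #include <cstdint>
  #include <map>
  #include <utility>
  #include <vector>

  using CounterID = std::uint64_t;
  // fused counter -> [(original counter, multiplicity of update)]
  using SideTable =
      std::map<CounterID, std::vector<std::pair<CounterID, std::uint64_t>>>;

  // Replace all counter updates in one basic block with a single fresh
  // counter, recording how to reconstruct the originals.
  CounterID fuseBlockCounters(const std::vector<CounterID> &UpdatesInBB,
                              CounterID &NextFreshCounter, SideTable &Table) {
    std::map<CounterID, std::uint64_t> Multiplicity;
    for (CounterID C : UpdatesInBB)
      ++Multiplicity[C]; // e.g. 6 updates of one counter -> multiplicity 6
    CounterID Fused = NextFreshCounter++;
    Table[Fused].assign(Multiplicity.begin(), Multiplicity.end());
    return Fused; // the block now increments only this counter
  }
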
>>>>
>>> There are other issues that can complicate the matter.
>>>
>>> 1) The assumption in the algorithm is that the source counter has only
>>> one update site -- but it may instead have more than one update site due
>>> to inlining.  Only the counts collected in the out-of-line function
>>> instance's update site can be safely propagated to the 'fused' counters.
>>>
>>
>> We may be miscommunicating. `otherCounter` can take on the same value in
>> different iterations of `for each counter C:`. So the counter that is
>> present in the out-of-line version would be updated by the fused counters
>> from other BB's where it has been inlined, just as if there had in fact
>> been an increment at runtime in those other BB's.
>>
>
> There is no misunderstanding on this part :)  What I meant is that the
> right hand side of the propagation:
>
>  = multiplicity * counters[C]
>
> must be only from the out of line instance of C, not the aggregated
> counter value for C. Assuming C's own inline instance updates have all been
> 'fused' (which may not be true), this can be handled by forming a counter
> fusion graph and walking the graph in post-order for the propagation.
>

Ah, I see what you mean. Instead of counters[C] we would have to save a
copy of the original counters and reference that, since you may have
mutated counters[C].
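
Concretely, that fix amounts to propagating from a snapshot of the raw
values. A sketch (hypothetical names; it assumes the raw profile
contains a value for every fused counter, and covers only the
single-level case -- David's post-order walk over a fusion graph,
above, generalizes it):

  #include <cstdint>
  #include <map>
  #include <utility>
  #include <vector>

  using CounterID = std::uint64_t;
  using SideTable =
      std::map<CounterID, std::vector<std::pair<CounterID, std::uint64_t>>>;

  void reconstruct(std::map<CounterID, std::uint64_t> &Counters,
                   const SideTable &Table) {
    // Snapshot first: a counter referenced on the right-hand side must
    // contribute its original (out-of-line) count, not a value already
    // inflated by an earlier propagation step.
    const std::map<CounterID, std::uint64_t> Raw = Counters;
    for (const auto &Entry : Table)
      for (const auto &OM : Entry.second) // (otherCounter, multiplicity)
        Counters[OM.first] += OM.second * Raw.at(Entry.first);
  }
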


>
>
>
>>
>>> 2) the 'source' BB may not even have a 'source' counter created that is
>>> owned by the caller function -- so 'implicit' counters need to be used
>>>
>>
>> Yes, but I don't think the handling of this will be substantially more
>> complicated than aliases in the "merge identical counters" approach. More
>> likely, the correct factoring would be to have a clear distinction between
>> the counters that are incremented at runtime and the counters that the
>> original instrumentor (esp. frontends) wants to read in the final profdata
>> file. This seems like infrastructure needed for various kinds of
>> post-cleanup approaches.
>>
>>
>
> What is needed is another side table -- a virtual counter table which maps
> from virtual counters to real counter expressions -- so that the fusion
> side table can have virtual counters as source indices too.
>
>
>> 3) The side table can be huge for large applications.
>>>
>>
>> Yes, but the counter array is also huge, which could be substantially
>> reduced. Also the text size would be reduced. It is unclear to me that it
>> will actually cause significant growth.
>>
>
> The counter array may not get reduced unless the owning function is
> completely eliminated though.
>

In a large C++ program I remember looking at, inlining removed 40,000 of
60,000 functions. At least for C++ FE instrumentation (which is Vedant's
original example), I think the reduction in the counter array would be
significant. Perhaps for Swift or other languages it would not be though.
Another potential source of counter reduction is CFG simplifications.

-- Sean Silva


>
>
>>
>>
>>>
>>> IMO, doing post-cleanup (other than enhancing generic optimizations that
>>> can be applied to other globals) is not the right approach.
>>>
>>
>> I don't disagree. I am just providing an example that *if* we want
>> post-cleanup, there are more effective ways than "merge identical counters".
>>
>> I am actually not hugely interested in FE instrumentation now that we
>> have IR instrumentation. IR instrumentation clearly seems like the right
>> approach for the use cases that I am concerned about.
>> However, if I were strongly invested in FE instrumentation, I think that
>> a cleanup pass makes sense. The more general approach I propose would be a
>> way to translate FE instrumentation to IR instrumentation, thus we can
>> concentrate on IR instr.
>>
>>
> ok.
>
> thanks,
>
> David
>
>
>> -- Sean Silva
>>
>>
>>>
>>> David
>>>
>>>
>>>
>>>
>>>> -- Sean Silva
>>>>
>>>>
>>>>>
>>>>> vedant
>>>>>
>>>>>
>>>>> > In other words, what I was suggesting is basically a generalization
>>>>> of Mehdi's "fused" counter transformation.
>>>>> >
>>>>> > Ideally this would be a reusable transformation. We could run it once
>>>>> or multiple times throughout the optimization pipeline, and probably want
>>>>> to always run it at least once at the very end before codegen. (note that
>>>>> you only ever need at most a single side-table mapping the "current"
>>>>> counters to the "original" counters no matter how many times you re-run
>>>>> counter placement, since every time you re-run counter placement you are
>>>>> emitting a completely new side-table and deleting the old one, and the side
>>>>> table always reconstructs the original counters).
>>>>> >
>>>>> > The availability of this facility would enable frontends to be as
>>>>> naive as they want about their counter placement, which is probably a net
>>>>> win for LLVM as a toolkit. It would be analogous to the alloca trick that
>>>>> frontends use to avoid having to construct SSA themselves: the existence of
>>>>> mem2reg allows them to not have to worry about it.
>>>>> >
>>>>> > -- Sean Silva
>>>>> >
>>>>> >
>>>>> >
>>>>> > vedant
>>>>> >
>>>>> >
>>>>> > > On Mar 10, 2016, at 7:48 PM, Mehdi Amini <mehdi.amini at apple.com>
>>>>> wrote:
>>>>> > >
>>>>> > >>
>>>>> > >> On Mar 10, 2016, at 7:21 PM, Vedant Kumar via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>> > >>
>>>>> > >> Hi,
>>>>> > >>
>>>>> > >> I'd like to add a new pass to LLVM which removes redundant
>>>>> profile counter
>>>>> > >> updates. The goal is to speed up code coverage testing and
>>>>> profile generation
>>>>> > >> for PGO.
>>>>> > >>
>>>>> > >> I'm sending this email out to describe my approach, share some
>>>>> early results,
>>>>> > >> and gather feedback.
>>>>> > >>
>>>>> > >>
>>>>> > >> Problem Overview
>>>>> > >> ================
>>>>> > >>
>>>>> > >> A profile counter is redundant if it's incremented in exactly the
>>>>> same basic
>>>>> > >> blocks as some other profile counter. Consider the following
>>>>> module:
>>>>> > >>
>>>>> > >>   static void f1() {
>>>>> > >>     instrprof_increment(profc_f1);
>>>>> > >>   }
>>>>> > >>
>>>>> > >>   void f2() {
>>>>> > >>     instrprof_increment(profc_f2);
>>>>> > >>     f1();
>>>>> > >>   }
>>>>> > >>
>>>>> > >> Once the inliner runs and deletes f1, we're left with:
>>>>> > >>
>>>>> > >>   void f2() {
>>>>> > >>     instrprof_increment(profc_f2);
>>>>> > >>     instrprof_increment(profc_f1);
>>>>> > >>   }
>>>>> > >>
>>>>> > >> Now we can say profc_f1 is redundant (or, an alias for profc_f2).
>>>>> > >>
>>>>> > >> I've noticed that frontend-based instrumentation can generate
>>>>> many redundant
>>>>> > >> profile counters. This degrades performance and increases code
>>>>> size.  We can
>>>>> > >> address the problem by getting rid of redundant counter updates.
>>>>> The trick is
>>>>> > >> to make sure we get back the same profiles.
>>>>> > >>
>>>>> > >>
>>>>> > >> Proposed Solution
>>>>> > >> =================
>>>>> > >>
>>>>> > >> I propose a pruning pass which takes the following steps:
>>>>> > >>
>>>>> > >> 1. Delete functions with local linkage and only one use, if that
>>>>> use is in
>>>>> > >>    a profile data record.
>>>>> > >>
>>>>> > >>    These functions are left around by the inliner (it doesn't
>>>>> know that
>>>>> > >>    they're safe to delete). Deleting them reduces code size and
>>>>> simplifies
>>>>> > >>    subsequent analysis of profile counters.
>>>>> > >
>>>>> > > I think that this is something GlobalDCE should be taught about
>>>>> (if it does not already know).
>>>>> > >
>>>>> > >
>>>>> > >>
>>>>> > >> 2. Determine which profile counters are essential.
>>>>> > >
>>>>> > > What is an "essential counter"?
>>>>> > >
>>>>> > >>
>>>>> > >> 3. Erase all updates to redundant profile counters.
>>>>> > >
>>>>> > > When you are saying "erase", you need to actually replace the
>>>>> multiple counter increment with *a new* counter, right? I.e. when replacing:
>>>>> > >
>>>>> > >     instrprof_increment(profc_f2);
>>>>> > >     instrprof_increment(profc_f1);
>>>>> > >
>>>>> > > you need to emit:
>>>>> > >
>>>>> > >     instrprof_increment(profc_f2f1_fused);
>>>>> > >
>>>>> > >
>>>>> > >> 4. Emit the aliases into a new section in the binary.
>>>>> > >>
>>>>> > >>    Aliases are represented as { Dst: i64*, Src: i64* } tuples.
>>>>> Some changes
>>>>> > >>    in compiler-rt are required to walk the alias section and fill
>>>>> in the
>>>>> > >>    correct execution counts at program exit time.
>>>>> > >>
>>>>> > >> This pass needs to be run after the inliner in order to be
>>>>> effective.
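
As a rough illustration of the compiler-rt side (a sketch, not the
proposed implementation; fillAliasCounters is a hypothetical name, and
the section-bounds symbols follow the usual __start_/__stop_ linker
convention), the exit-time walk over the alias section could look like:

  #include <cstdint>

  // One alias record, per the proposal: { Dst: i64*, Src: i64* }.
  struct ProfAlias {
    std::uint64_t *Dst;       // redundant counter (its updates were erased)
    const std::uint64_t *Src; // essential counter it aliases
  };

  // Linker-provided bounds of the alias section (hypothetical names).
  extern ProfAlias __start___llvm_prf_alias[];
  extern ProfAlias __stop___llvm_prf_alias[];

  // At program exit, copy each essential count into its alias so the
  // profile reader sees the same values as with full instrumentation.
  static void fillAliasCounters() {
    for (ProfAlias *A = __start___llvm_prf_alias;
         A != __stop___llvm_prf_alias; ++A)
      *A->Dst = *A->Src;
  }
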
>>>>> > >>
>>>>> > >> The complexity of this pass is O(N*M), where N is the number of
>>>>> profile
>>>>> > >> counters, and M is the average number of updates per counter.
>>>>> > >
>>>>> > > What you're describing here is not clear to me? But I may have
>>>>> been totally misunderstanding the whole concept :)
>>>>> > >
>>>>> > >
>>>>> > > --
>>>>> > > Mehdi
>>>>> > >
>>>>> > >
>>>>> > >> In practice it is
>>>>> > >> a bit faster, since we can skip the analysis of counters which
>>>>> are discovered to
>>>>> > >> be redundant early on in the process.
>>>>> > >>
>>>>> > >>
>>>>> > >> Early Results
>>>>> > >> =============
>>>>> > >>
>>>>> > >> The pruning pass results in a 25% speed improvement in the example
>>>>> program above
>>>>> > >> (where f2 is called in a loop 10^8 times).
>>>>> > >>
>>>>> > >> Here is a slightly less contrived example:
>>>>> > >>
>>>>> > >> #include <vector>
>>>>> > >> #include <algorithm>
>>>>> > >> #include <cstdlib>
>>>>> > >>
>>>>> > >> static void escape(void *p) {
>>>>> > >>   asm volatile("" : : "g"(p) : "memory");
>>>>> > >> }
>>>>> > >>
>>>>> > >> int main(int argc, char **argv) {
>>>>> > >>   std::vector<int> V(atoi(argv[1]));
>>>>> > >>   escape(reinterpret_cast<void *>(V.data()));
>>>>> > >>   std::sort(V.begin(), V.end());
>>>>> > >>   return V[0];
>>>>> > >> }
>>>>> > >>
>>>>> > >> I get the following results on my desktop (10^8 elements, 5 runs
>>>>> each):
>>>>> > >>
>>>>> > >> O3:                           0.262s
>>>>> > >> O3 + PGOInstr:                0.663s
>>>>> > >> O3 + PGOInstr + Pruning:      0.606s (8.6% performance win, 672
>>>>> aliases)
>>>>> > >> O3 + CoverageInstr:           0.690s
>>>>> > >> O3 + CoverageInstr + Pruning: 0.610s (11.6% performance win, 688
>>>>> aliases)
>>>>> > >>
>>>>> > >>
>>>>> > >> Next Steps?
>>>>> > >> ===========
>>>>> > >>
>>>>> > >> Is the performance of instrumented code something we think we
>>>>> need to fix?
>>>>> > >> What's an acceptable compile-time overhead for running this
>>>>> pruning pass? Is
>>>>> > >> the general approach a non-starter for anybody?
>>>>> > >>
>>>>> > >> I'd like to get some feedback and gauge interest before pursuing
>>>>> this further.
>>>>> > >> Possible next steps include benchmarking instrumented versions of
>>>>> clang and
>>>>> > >> swift on the relevant test suites, running performance tests from
>>>>> lnt, running
>>>>> > >> compile-time tests, and measuring any code size differences.
>>>>> > >>
>>>>> > >>
>>>>> > >> thanks
>>>>> > >> vedant
>>>>> > >>