[PATCH] D17212: [ThinLTO] Support for call graph in per-module and combined summary.
Teresa Johnson via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 1 11:43:37 PST 2016
On Tue, Mar 1, 2016 at 11:32 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Mar 1, 2016, at 11:25 AM, Teresa Johnson <tejohnson at google.com> wrote:
>
> I have a prototype to identify all GV references not in functions.
> Basically I walk all GVs (functions, variables, aliases) in the module,
> walk the Users and look for any that are GlobalValues (so not an
> Instruction). But a few questions after looking at what it is capturing:
>
> For aliases @a = alias @b, we will end up creating a reference from a->b.
> Do we want these in the reference graph? On one hand, the alias
> representation is already shown in the module via the MODULE_CODE_ALIAS
> record. OTOH, if we want a full reference graph to do summary-based
> optimizations without reading the full module, I guess we need this
> relationship in a reference list.
>
>
> I don't know enough about aliases.
> Shouldn't we consider an alias group as a single entity?
>
Well presumably the group could be referenced via either name from another
module, so we might have to capture the relationship here for completeness.
>
>
> Also, I am seeing cases where there are GlobalVariables in the Module
> where the Users are not being captured either in the function level or the
> new module-level reference list. The specific example I found was a
> GlobalVariable used in a ConstantExpr. That ConstantExpr is used in another
> ConstantExpr, which in turn is used in a store instruction in a function:
>
> E.g. I see this with a vtable:
> GlobalVariable: @_ZTVN11xercesc_2_512DGXMLScannerE = unnamed_addr constant
> ...
> User (ConstantExpr): i8** getelementptr inbounds ([17 x i8*], [17 x i8*]*
> @_ZTVN11xercesc_2_512DGXMLScannerE, i64 0, i64 2)
>
> All references to this constant expression I can see in the dumped Module
> are nested in instructions within functions. E.g.:
> store i32 (...)** bitcast (i8** getelementptr inbounds ([17 x i8*], [17
> x i8*]* @_ZTVN11xercesc_2_512DGXMLScannerE, i64 0, i64 2) to i32 (...)**),
> i32 (...)*** %1, align 8
>
> So this GlobalVariable use is not identified and included in any reference
> list currently. Interestingly, the existing Value::isUsedInBasicBlock will
> not find it either, since it looks at operands directly hanging off
> instructions and Users of the value that are Instructions. Do we care about
> such uses?
>
>
> So any use of a global variable that would load or store through a
> constant gep would not be found? But it would be found if the gep isn't
> constant?
> This distinction seems arbitrary to me, and I would probably consider it
> as a bug for the purpose of the graph. Did I misunderstood something?
>
If it is not used directly in the operand list of the instruction it won't
currently be found. This seems like a bug in my algorithm that could be
fixed by recursively walking the operands on each User found in an operand
list. I was curious when I saw that isUsedInBasicBlock would miss these as
well, but I don't know enough about its single callsite to know whether
that is acceptable there. I can fix my code to recursively find these cases.
What I am wondering is if there will be cases where a GV is used at the
module-level in an initializer via an intervening ConstantExpr that I will
miss since those are being walked bottom up (looking at users of GVs rather
than recursively downwards in the Function/Instruction case). I may have to
do something more complicated to get these.
Teresa
> --
> Mehdi
>
>
>
> Teresa
>
> On Tue, Mar 1, 2016 at 9:01 AM, Teresa Johnson <tejohnson at google.com>
> wrote:
>
>>
>>
>> On Tue, Mar 1, 2016 at 8:42 AM, Mehdi Amini <mehdi.amini at apple.com>
>> wrote:
>>
>>>
>>> On Mar 1, 2016, at 8:27 AM, Teresa Johnson <tejohnson at google.com> wrote:
>>>
>>>
>>>
>>> On Tue, Mar 1, 2016 at 12:11 AM, Mehdi Amini <mehdi.amini at apple.com>
>>> wrote:
>>>
>>>>
>>>> On Feb 29, 2016, at 12:20 PM, Teresa Johnson <tejohnson at google.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> On Mon, Feb 29, 2016 at 11:32 AM, Mehdi Amini <mehdi.amini at apple.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On Feb 29, 2016, at 8:59 AM, Teresa Johnson <tejohnson at google.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 29, 2016 at 8:45 AM, Mehdi Amini <mehdi.amini at apple.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On Feb 29, 2016, at 8:18 AM, Teresa Johnson <tejohnson at google.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 26, 2016 at 1:54 PM, Teresa Johnson <tejohnson at google.com
>>>>>> > wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Feb 26, 2016 at 11:18 AM, Xinliang David Li <
>>>>>>> davidxl at google.com> wrote:
>>>>>>>
>>>>>>>> In previous discussions with Teresa, we mentioned recording static
>>>>>>>> callsite info in the summary -- this can drive more intelligent inline
>>>>>>>> decisions to enable more linker GC.
>>>>>>>>
>>>>>>>
>>>>>>> Yes, see WIP patch D17656.
>>>>>>>
>>>>>>>
>>>>>>>> Regarding global variable analysis -- I think there is a way for
>>>>>>>> thinLTO to work nicely in whole program mode.
>>>>>>>> The variable reference graph need edge that records access types
>>>>>>>> (read, write, address taken). The plugin step will merge the information
>>>>>>>> (with real linker feedback) ..
>>>>>>>>
>>>>>>>
>>>>>>> Based on various potential use cases mentioned here and in D17656
>>>>>>> and discussions with Mehdi, I am going to try expanding this to be a
>>>>>>> full-fledged reference graph. It may not include the access types
>>>>>>> initially. Need to think about how best to represent the variable and
>>>>>>> function reference information in the bitcode and the index...
>>>>>>>
>>>>>>
>>>>>> I collected some stats about the number of GV references in functions
>>>>>> across SPEC cpu2006, both as callees (expanded the current logic to
>>>>>> identify invokes as well), as well as other GV references. When each
>>>>>> function is written into bitcode, where I am scanning for current function
>>>>>> summary info, I iterate over all instruction operands(), and look for any
>>>>>> that are GlobalValues. Then I check if the function has a callsite and
>>>>>> depending on whether the GV is the callee or not, increment the appropriate
>>>>>> counter. I think that should get all of the GV references?
>>>>>>
>>>>>> Across all benchmarks and overall, most functions contain at least
>>>>>> one call (74% overall). The situation for non-callee GV refs is a bit more
>>>>>> mixed from benchmark to benchmark, but overall across all benchmarks most
>>>>>> (77%) have none.
>>>>>>
>>>>>> Given that, here is some thinking about the best bitcode
>>>>>> representation:
>>>>>>
>>>>>> For the current patch, each call is at least a tuple (callee value
>>>>>> id, static #callsites) or a triple including the profile count with PGO. I
>>>>>> suspect we don't care as much about the static # references or the profile
>>>>>> count of those references for non-callsite GV references. And as David
>>>>>> suggested, we may want to include other stats eventually for these normal
>>>>>> references (e.g. read, write, address taken). So I don't think we want to
>>>>>> use the same tuple or triple as we are using for the calls.
>>>>>>
>>>>>> Additionally, each bitcode record may only contain a single array, at
>>>>>> the end. So we will need to encode this single array with info in the
>>>>>> record on how to decode it. E.g. in the current patch I am using different
>>>>>> abbrev ids for the no call, calls without pgo and calls with pgo cases. But
>>>>>> as Mehdi noted, as we add additional cases this could result in a
>>>>>> combinatoric explosion of abbrev ids. And in any case, we would need to
>>>>>> note in the record how many entries are for callee references vs non-callee
>>>>>> GV references. Since most functions have at least one call and most have no
>>>>>> other GV references, we could optimize this a bit by including a VBR to
>>>>>> indicate the number of non-callee references (in most cases this will be a
>>>>>> single VBR chunk containing the value 0), and deducing the number of callee
>>>>>> references from the size of the record minus the number of non-callee
>>>>>> references. (In fact, I could get rid of the NOCALLS abbrev id variants in
>>>>>> the current patch using this approach to deduce whether there are any calls
>>>>>> encoded in the record.) However, the disadvantage of this approach is that
>>>>>> it requires an additional VBR per summary record.
>>>>>>
>>>>>> Another approach would be to include the references in a separate
>>>>>> subsequent summary section record. Since most summaries contain calls, we
>>>>>> may want to continue to include the calls in the main summary record, or
>>>>>> could also encode those in a separate record type for consistency. The
>>>>>> issue then becomes how to correlate those records with the main
>>>>>> corresponding summary record. For per-module summaries, the function
>>>>>> summary contains the value id, but including that in the reference records
>>>>>> could require a bigger size overhead. Additionally, the combined summaries
>>>>>> don't include value ids, they are associated with the corresponding VST
>>>>>> entry by including the bitcode offset of the summary record in the combined
>>>>>> VST. However, I think we can handle this by simply requiring that the
>>>>>> reference records immediately follow the corresponding summary record.
>>>>>> There is already precedence in the bitcode format for the order of the
>>>>>> record being important for correlation - specifically the order of the
>>>>>> MODULE_CODE_GLOBALVAR and MODULE_CODE_FUNCTION records implicitly encodes
>>>>>> their value ids (for correlation with VST entries).
>>>>>>
>>>>>> Given the above, I would suggest that we have 4 abbrev ids for the
>>>>>> function summary section. Using the PERMODULE case as an example, this
>>>>>> would be:
>>>>>>
>>>>>> FS_PERMODULE_ENTRY: [valueid, linkage, instcount]
>>>>>> FS_PERMODULE_CALLS: [n x (calleevalueid, callsitecount)]
>>>>>> FS_PERMODULE_CALLS_PROFILE: [n x (calleevalueid, callsitecount,
>>>>>> profilecount)]
>>>>>> FS_PERMODULE_REFS: [n x refvalueid]
>>>>>>
>>>>>> where only one of FS_PERMODULE_CALLS/FS_PERMODULE_CALLS_PROFILE would
>>>>>> be specified, and the ordering for each function would have to be ENTRY
>>>>>> then the CALLS and REFS records, with no intervening summary records from
>>>>>> other functions.
>>>>>>
>>>>>>
>>>>>> Isn't it conceptually having a block per function?
>>>>>> This is something it mentioned when we talked about the combinatorial
>>>>>> issue with codes.
>>>>>> What's the drawback of using blocks?
>>>>>>
>>>>>
>>>>> Just to clarify, I think you are suggesting something like the
>>>>> following organization:
>>>>>
>>>>> <FUNCTION_SUMMARY_BLOCK ...
>>>>> <SUMMARY_BLOCK ...>
>>>>> <FS_PERMODULE_ENTRY: [valueid, linkage, instcount]>
>>>>> <FS_PERMODULE_CALLS: [n x (calleevalueid, callsitecount)]>
>>>>> <FS_PERMODULE_REFS: [n x refvalueid]>
>>>>> </SUMMARY_BLOCK ...>
>>>>> ...
>>>>> </FUNCTION_SUMMARY_BLOCK>
>>>>>
>>>>> With one block per function?
>>>>>
>>>>>
>>>>> Yes.
>>>>>
>>>>> Does that have any advantage over the scheme where we don't have the
>>>>> additional level of block and the associated records must be adjacent?
>>>>>
>>>>>
>>>>> It seems more "structured" and representing more accurately the data
>>>>> we're storing.
>>>>> But I don't know all the trade-off associated with it? (for example is
>>>>> it possible for the record in a SUMMARY_BLOCK to use abbrev id from the
>>>>> FUNCTION_SUMMARY_BLOCK?)
>>>>>
>>>>
>>>> No, according to
>>>> http://llvm.org/docs/BitCodeFormat.html#define-abbrev-encoding: a
>>>> DEFINE_ABBREV "exists inside this immediate block — it is not visible in
>>>> subblocks or enclosing blocks". However, there is a special case where if
>>>> the DEFINE_ABBREV is in the module-level BLOCKINFO block it is visible to
>>>> the whole file. I've been putting the DEFINE_ABBREVS for the summary blocks
>>>> inside those blocks since they were only needed locally, but that would
>>>> have to be moved to the BLOCKINFO since we wouldn't want to have to repeat
>>>> them in every subblock. This has the disadvantage of bloating out the
>>>> abbrev ID space - right now the typical abbrev ID fits into a 3-bit space
>>>> and I don't know how close we are to that. If we add more to the BLOCKINFO
>>>> for example, then DEFINE_ABBREVs in nested blocks use higher numbers.
>>>>
>>>> I'm not sure we really need the subblocking shown above. Right now the
>>>> records per function is well-defined. For example, with the above scheme,
>>>> each function has a single ENTRY record, followed immediately by up to 1
>>>> each CALL and REF records.
>>>>
>>>> I collected some more stats across SPEC cpu2006 comparing the relative
>>>> overheads of the different schemes I described above. This assumes that
>>>> each call contributes 2 entries to each record's array size since it is a
>>>> tuple (assume non-PGO), and each non-call ref contributes 1 entry to the
>>>> associated record's array size. I compared
>>>> 1) keeping both calls and refs in a single record using a VBR to
>>>> indicate the number of non-call refs (I looked at both 4-bit and 6-bit VBRs
>>>> since this count is small or 0 in most cases), which means the array size
>>>> is the total number of calls*2+refs for that function
>>>> 2) keeping the calls in the main entry (so the array size is the number
>>>> of calls*2), plus when the ref count is non-zero, a second record requiring
>>>> a 3-bit abbrev id and where the array size is the number of non-call refs
>>>> 3) using separate records for both calls and non-call refs. So each
>>>> record (if the associated count is non-zero) contains the 3-bit abbrev id
>>>> and the array size is the number of calls*2 or the number of non-call refs,
>>>> respectively.
>>>>
>>>> I computed the actual VBR6 bit size required for each record for the
>>>> array size in each case.
>>>>
>>>> The results show that 2) is clearly smallest across cpu2006. 3) is a
>>>> bit bigger but might be worth it if splitting out calls into a separate
>>>> record seems more consistent and clear.
>>>>
>>>>
>>>> I don't have a strong opinion here.
>>>> How do you plan to take into account references from Globals to
>>>> function? (like a VTable for instance?)
>>>>
>>>
>>> This is not in the current patch I am working on since the only edges
>>> are in function summaries. The easiest thing to do in this case is to
>>> include a bit in the function summary indicating that there are uses that
>>> aren't within functions (function has a use that is a non-BlockAddress
>>> Constant, i.e. not on an Instruction), to ensure that such functions are
>>> treated conservatively with respect to whether they have uses or not. This
>>> probably belongs in a follow-up patch, e.g. maybe in D17656 which adds the
>>> total count of static callsites to each function summary in the combined
>>> index.
>>>
>>>
>>> What you are describing won't record the reference edges right? IIUC
>>> this will say if yes or no a function is referenced from somewhere.
>>>
>>
>> Right, we would need to add a record to record edges from module-level
>> global references to such values to get more information, since there is no
>> associated function summary.
>>
>> But after sending this I realized we can't just record this flag directly
>> when compiling into bitcode since the reference could be in another module.
>> So to capture such module-level global references, we will need to at least
>> record in the per-module bitcode which value ids are referenced at the
>> module-level. Either as a list of value ids per module, which would be used
>> to set the above bit in the combined index function summary, or as a full
>> reference list of edges from global value ids to the referenced value id.
>>
>> Do you think it will be much more useful to know which global value ids
>> are referencing the function at the module level, or just record that there
>> are such references to the function? I suppose if we want to also record
>> module-level references to non-function GVs we can't use a function summary
>> bit anyway.
>>
>> To get the full info we'd either need to generalize the function summary
>> section to be a global value section, and include a new record for the
>> module-level reference list, or create a new summary section for these
>> references.
>>
>> Teresa
>>
>>
>>
>>> --
>>> Mehdi
>>>
>>>
>>>
>>>
>>>
>>>>
>>>> I'm not sure offhand how much overhead is added by keeping each
>>>> function's summary in a separate sunblock.
>>>>
>>>>
>>>> Don't waste time on that, I'm fine with the other solutions.
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> Mehdi
>>>>
>>>>
>>>>
>>>>
>>>> Teresa
>>>>
>>>>
>>>>> I don't have a strong opinion on this, just wondering.
>>>>>
>>>>> --
>>>>> Mehdi
>>>>>
>>>>>
>>>>>
>>>>>> Also, the REFS case could eventually be extended to include other
>>>>>> information for each refvalueid.
>>>>>>
>>>>>> Like I said earlier, though, since most functions have calls, we
>>>>>> could reduce the required records by including the call array with the
>>>>>> other summary info (e.g. combine the CALLS with the ENTRY as in the current
>>>>>> patch). That would save a bit of space (one less abbrev id to define per
>>>>>> bitcode file, and 3 less bits per function by avoiding specifying the extra
>>>>>> abbrev id).
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Thanks,
>>>>>> Teresa
>>>>>>
>>>>>>
>>>>>>> Teresa
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> On Fri, Feb 26, 2016 at 11:06 AM, Mehdi Amini <
>>>>>>>> mehdi.amini at apple.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Feb 26, 2016, at 10:58 AM, Xinliang David Li <
>>>>>>>>> davidxl at google.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Feb 26, 2016 at 10:53 AM, Mehdi Amini <
>>>>>>>>> mehdi.amini at apple.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> > On Feb 26, 2016, at 10:38 AM, Teresa Johnson <
>>>>>>>>>> tejohnson at google.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > tejohnson added a comment.
>>>>>>>>>> >
>>>>>>>>>> > In http://reviews.llvm.org/D17212#362864, @joker.eph wrote:
>>>>>>>>>> >
>>>>>>>>>> >> How could we integrate accesses to global variable as part of
>>>>>>>>>> this?
>>>>>>>>>> >> It turns out that to be able to benefit from the linker
>>>>>>>>>> information on what symbol is exported during the import, this is a must.
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > Well without it you still can see which function symbols will
>>>>>>>>>> be exported, just not the variables, so you are running with less info and
>>>>>>>>>> I guess need to assume that all static variables will be exposed and
>>>>>>>>>> promote them.
>>>>>>>>>>
>>>>>>>>>> The scheme I am currently setting up is:
>>>>>>>>>>
>>>>>>>>>> 1) The linker gives us the list of symbols that need to be
>>>>>>>>>> "preserved" (can't internalize)
>>>>>>>>>> 2) Link the combined index
>>>>>>>>>> 3) Compute the import list for every module *by just looking at
>>>>>>>>>> the profile*
>>>>>>>>>> 4) Do the promotion
>>>>>>>>>>
>>>>>>>>>> There is absolutely no assumption for the promotion (last step):
>>>>>>>>>> you exactly know what will be imported by *every module*, and you can
>>>>>>>>>> promote the optimal minimal amount of symbols.
>>>>>>>>>>
>>>>>>>>>> All of that is good and should work with your call-graph patch
>>>>>>>>>> "as is".
>>>>>>>>>>
>>>>>>>>>> I'm looking to go two steps further during stage 3:
>>>>>>>>>>
>>>>>>>>>> 1) I want to drive the importing heuristic cost to actually take
>>>>>>>>>> into account the need for promotion.
>>>>>>>>>> I'll start some test on the extreme case by *forbiding* any
>>>>>>>>>> promotion, i.e. if a function references an internal function or global,
>>>>>>>>>> then it can't be imported in any other module. On the long term it may be
>>>>>>>>>> interesting to include this in the importing threshold.
>>>>>>>>>> This can be implemented with a flag or an int in the summary
>>>>>>>>>> "numberOfInternalGlobalsReferenced", but will not be enough for step 2
>>>>>>>>>> (below).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Interesting. What is the motivation for this type of tuning? Does
>>>>>>>>> is try to preserve more variable analysis (e.g, aliasing) in backend
>>>>>>>>> compilation?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes! LLVM optimization bails terribly with global variable, but
>>>>>>>>> likes a lot "internal" variables. On some other (platform dependent)
>>>>>>>>> aspect, accessing non-internal globals requires going through the GOT which
>>>>>>>>> is not the case with internal. Internal globals can be totally eliminated
>>>>>>>>> or turned into allocas, which in turn enables more other optimization.
>>>>>>>>> In the same way, in many aspects, the optimizer benefits from
>>>>>>>>> internal functions.
>>>>>>>>>
>>>>>>>>> Some things can be recovered with more effort (for instance in the
>>>>>>>>> linker plugin for ThinLTO we're running : internalize + global opt +
>>>>>>>>> global DCE *before* doing any promotion). But we will fight assumptions in
>>>>>>>>> the optimizer for a long time.
>>>>>>>>>
>>>>>>>>> All of this is driven by our current performance tuning of course.
>>>>>>>>> We are seeing Full-LTO generating a binary almost two times
>>>>>>>>> smaller on
>>>>>>>>> test-suite/trunk/SingleSource/Benchmarks/Adobe-C++/loop_unroll.cpp while
>>>>>>>>> ThinLTO is on the same level as the regular O3.
>>>>>>>>> (this is a single source benchmark, I wonder if the inliner
>>>>>>>>> heuristic wouldn't need some tuning for O3 in general).
>>>>>>>>>
>>>>>>>>> If you have any other idea or input :)
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Mehdi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2) I want to benefit from the linker information from stage 1 to
>>>>>>>>>> internalize symbols.
>>>>>>>>>> It means that the information about the fact that a function is
>>>>>>>>>> referencing an internal global *can't be in the summary* because the
>>>>>>>>>> front-end does not know that the global will be internalized.
>>>>>>>>>> This can be implemented by not having a "call graph" but a
>>>>>>>>>> "reference graph" (not sure on the terminology): i.e. edges would be there
>>>>>>>>>> for any uses of a symbol to another one and not only calls.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Reference graph is what GCC uses :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> > To refine that behavior for variables, yes, we'd need
>>>>>>>>>> additional info in the summary.
>>>>>>>>>> >
>>>>>>>>>> > (For davidxl or anyone else who didn't see the IRC
>>>>>>>>>> conversation, Mehdi is looking at doing pure summary-based importing
>>>>>>>>>> decisions in the linker step, then giving this info to the ThinLTO backends
>>>>>>>>>> to avoid promotion of local values that aren't exported. For a distributed
>>>>>>>>>> build if we wanted to do it this way the importing decisions would all be
>>>>>>>>>> made in the plugin step, then would need to be serialized out for the
>>>>>>>>>> distributed backends to check.)
>>>>>>>>>> >
>>>>>>>>>> > Two possibilities depending on the fidelity of the info you
>>>>>>>>>> think you need:
>>>>>>>>>> >
>>>>>>>>>> > 1. One possibility is to just put a flag in the function
>>>>>>>>>> summary if it accesses *any* local variables, and adjust the importing
>>>>>>>>>> threshold accordingly. Then in the ThinLTO backend for the exporting module
>>>>>>>>>> you need to check which of your own functions are on the import list, and
>>>>>>>>>> which local variables they access, and promote accordingly.
>>>>>>>>>> >
>>>>>>>>>> > 2. If it will be really beneficial to note exactly which local
>>>>>>>>>> variables are accessed by which function, we'll need to broaden the edges
>>>>>>>>>> list to include accesses to variables (I assume you only care about local
>>>>>>>>>> variables here). E.g. the per-module summary edge list for a function would
>>>>>>>>>> need to include value ids for any local variables referenced by that
>>>>>>>>>> function (not sure that the other parts of the triple, the static and
>>>>>>>>>> profile counts, are needed for that). Then in the combined VST we need to
>>>>>>>>>> include entries for GUIDs of things that don't have a function summary, but
>>>>>>>>>> are referenced by these edges. When accessing a function summary edge list
>>>>>>>>>> for a candidate function to import, you could then see the GUID of any
>>>>>>>>>> local variables accessed. You wouldn't know them by name, but if for
>>>>>>>>>> example you wanted a heuristic like "if >=N hot import candidate functions
>>>>>>>>>> from module M access a local variable with GUID G, go ahead and import
>>>>>>>>>> those and let G be promoted by the backend (which like in approach #1 needs
>>>>>>>>>> to check which local variables are accessed by any functions on an import
>>>>>>>>>> list)".
>>>>>>>>>> >
>>>>>>>>>> > Obviously 1) is easier and cheaper space-wise. What are your
>>>>>>>>>> thoughts?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So 1) is cheaper, but 2) a lot more powerful as explained above :)
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> yes -- we may find other good uses of it.
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Mehdi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>>>>>> 408-460-2413
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>>>>> 408-460-2413
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>>>> 408-460-2413
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>>> 408-460-2413
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>> 408-460-2413
>>>
>>>
>>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>> 408-460-2413
>>
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson at google.com |
> 408-460-2413
>
>
>
--
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160301/4fc3a2dc/attachment.html>
More information about the llvm-commits
mailing list