[PATCH] D17028: [ThinLTO] Use MD5 hash in function index.

Tue Feb 9 21:57:29 PST 2016

On Tue, Feb 9, 2016 at 9:49 PM, Xinliang David Li <davidxl at google.com>
wrote:

> On Tue, Feb 9, 2016 at 9:37 PM, Teresa Johnson <tejohnson at google.com>
> wrote:
> > tejohnson added inline comments.
> >
> > ================
> > Comment at: include/llvm/IR/FunctionInfo.h:151
> > @@ -151,1 +150,3 @@
> > +/// Map from function GUID to corresponding function info structures.
> > +typedef DenseMap<uint64_t, FunctionInfoList> FunctionInfoMapTy;
> >
> > ----------------
> > davidxl wrote:
> >> I am a little concerned with using DenseMap data structure here.
> >>
> >> Both DenseMap and StringMap are implemented as open hashtab with
> quadratic probing. The load factor of both are guaranteed to < 0.75 -- that
> means there are lots of empty buckets in a large table. The main
> differences are:
> >>
> >> 1) StringMap uses indirection -- each bucket contains pointer to the
> key-value pair object, so the bucket size is small. The copy is also
> cheaper when rehashing happens. However DenseMap is  key-value pair is
> embedded inside the bucket. That means if the value type has large size, it
> will cause lots of waste in memory.  The copy of dense map can also cause
> large overhead.
> >> 2) When resizing, DenseMap will round up the new size to the next power
> of 2 value -- this further increases the memory overhead.
> >>
> >> There is another bad side effect is that if elements are inserted into
> map one by one, it will incur lots of reallocation operation (just like
> vector) unless the size of the map is known before hand and properly
> resized at the beginning.
> >>
> >> I suggest using std::map if the size of the map is not known priori. If
> it is known, and if the map lookup happens after the map is created, it
> might be better to use a vector (pushback) followed by a sort after the map
> is populated.
> > Good point, I hadn't compared the relative overheads of these data
> structures.
> >
> > Unfortunately, we don't know the size of this map ahead of time. E.g.
> when merging multiple indexes to create the combined index. And we do
> lookups while adding (e.g. when creating the combined index we may have
> multiple comdat with the same name/GUID).
> >
> > So it sounds like the best bet for now is to use std::map, and we can
> investigate alternatives that might involve restructuring the accesses if
> this turns out not to be efficient enough.
> >
> > ================
> > Comment at: include/llvm/IR/Module.h:200
> > @@ -197,1 +199,3 @@
> >
> > +  /// Get the module's original source file name, either the same as the
> > +  /// ModuleID when compiling from source, or from name recorded in
> bitcode.
> > ----------------
> > davidxl wrote:
> >> This comment is not quite clear in its meaning.
> > The point I'm trying to make is that if we are compiling this from
> bitcode, it comes from the new bitcode record, otherwise it is the name of
> the input file (which is the same as the ModuleID). E.g.:
> >
> > clang -c foo.c -flto=thin // ModuleID = SourceFileName = foo.c
> > (foo.o is bitcode with new MODULE_SOURCE_FILENAME record = "foo.c")
> > clang foo.o -flto=thin // ModuleID = foo.o, SourceFileName = foo.c
> >
> > ================
> > Comment at: test/tools/gold/X86/thinlto.ll:27
> > @@ -26,3 +26,3 @@
> >  ; COMBINED-NEXT: <VALUE_SYMTAB
> > -; COMBINED-NEXT: <COMBINED_FNENTRY {{.*}} record string = '{{f|g}}'
> > -; COMBINED-NEXT: <COMBINED_FNENTRY {{.*}} record string = '{{f|g}}'
> > +; COMBINED-NEXT: <COMBINED_FNENTRY {{.*}}
> > +; COMBINED-NEXT: <COMBINED_FNENTRY {{.*}}
> > ----------------
> > davidxl wrote:
> >> test [offset, guid] pattern here?
> > I'm afraid testing the specific offsets will be brittle, unless I just
> check for any integer in that field. I could test for the GUIDs, but would
> have to handle either ordering as the order in the combined index is not
> currently guaranteed. Let me go ahead and check for the GUIDs as that
> maintains the same level of checking as we had with the function names.
>
> You can use [0-9]+ to match numeric ids.
>

Right. Wasn't sure whether you were suggesting checking just for the format
or for the specific values. Sounds like the former. That's easy, will do.

Teresa

>
> David
>
> >
> >
> > http://reviews.llvm.org/D17028
> >
> >
> >
>

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160209/671e7fba/attachment.html>