[LLVMdev] debugloc metadata variation
Duncan P. N. Exon Smith
dexonsmith at apple.com
Thu Oct 23 09:43:00 PDT 2014
> On 2014-Oct-23, at 09:19, David Blaikie <dblaikie at gmail.com> wrote:
>
> (sorry for the duplicate Fred, I failed at reply-all the first time)
>
> On Wed, Oct 22, 2014 at 6:33 PM, Frédéric Riss <friss at apple.com> wrote:
>
> > On Oct 22, 2014, at 4:57 PM, David Blaikie <dblaikie at gmail.com> wrote:
> >
> > Just working on some of the gmlt+fission debug info stuff and I came across a comment that might be relevant to reducing the number of distinct debugloc metadata nodes:
> >
> > "or some sub-optimal metadata that
> > // isn't structurally identical (see: file path/name info from clang, which
> > // includes the directory of the cpp file being built, even when the file name
> > // is absolute (such as an <> lookup header)))"
> >
> > Seems that the file path/name isn't well canonicalized so as to allow metadata level merging when linking. Might be helpful to figure out that issue at some point.
>
> Incidentally I worked on an issue last week where the line table would get entries representing the same file, but where the file/dir split wasn’t done at the same place. I have a patch that remerges them at emission, but I was planing on investigating more the source of the duplication before I submit anything.
>
> The cases I’ve seen have one duplicated entry though, nothing that could have a visible impact on memory consumption.
>
> So the particular case where I think this arises in a way that might be measurable is if you have a build system that changes directories to build subprojects (like our make build system, if I understand correctly - but not our cmake build system, again, if I understand correctly):
>
> imagine a simple directory layout:
>
> include/
> foo.h
> lib/
> a/
> a.cpp // includes foo.h and calls one inline function from it (or uses a type, etc) from some external function a()
> b/
> b.cpp // does the same thing as a.cpp, but with its own external function, b()
>
> if you run "clang++ -emit-llvm -S -Iinclude -c lib/a/a.cpp lib/b/b.cpp -g" you get two .ll files both with the obvious:
> !9 = metadata !{metadata !"include/foo.h", metadata !"/tmp/dbginfo/pathtest"}
> But if you do this instead: "cd lib/a; clang++ -emit-llvm -S -I../../include -c a.cpp -g; cd ../../lib/b; clang++ -emit-llvm -S -I../../include -c b.cpp -g" you get two different nodes:
>
> !9 = metadata !{metadata !"../../include/foo.h", metadata !"/tmp/dbginfo/pathtest/lib/b"}
>
> !9 = metadata !{metadata !"../../include/foo.h", metadata !"/tmp/dbginfo/pathtest/lib/a"}
> and now you're left with a situation in which almost all the metadata is different and any place you were relying on the standard metadata uniquing you won't get it :(
>
This might be fixed by making `MDFile` (or `DIFile`) first-class. We
just need to canonicalize on creation.
class MDFile {
public:
// Split the path at the right place.
MDFile *get(LLVMContext &C, StringRef Path);
// Convenience for callers, but the path gets canonicalized.
MDFile *get(LLVMContext &C, StringRef File, StringRef Dir);
StringRef getFilename() const;
StringRef getDirectory() const;
};
Note that whether we continue to use `MDString` under the hood is an
implementation detail.
However, path canonicalization (in particular, eating "..") requires
a `stat()` to do correctly on *NIX, so the implementation would have
to cache lookups. Doesn't seem hard though.
More information about the llvm-dev
mailing list