[Lldb-commits] [PATCH] Skip symlinks to the original file when searching for debug info

Mon Feb 23 18:42:51 PST 2015

I thought of something else too. separating the disk access access from the
path manipulation also provides a nice solution to being able to perform
operations on files that aren't tied to a particular path. For example, it
lets you write functions that can operate on inodes or fds, and you can
then turn around and express a FileSpec overload in terms of that for
greater code reuse. You could then expose this to Python perhaps, there are
places where we send fds in connection strings, so if you have an fd that
you know is on the remote, you can do something useful with it. Awkward to
make this work with a FileSpec.

An example of this might be a debugger command that lists details of all
fds a process has open (an extremely useful command which i plan to add for
windows down the line, that hopefully i can generalize to all platforms).
Now you see that fd 0x1C looks interesting, you want to inspect it further.
If you know the fd refers to a file, you want to get some file specific
details about it. But why should take a completely different code path than
if all you have is a path? There's potential for reuse here.

It gets even worse when you start factoring in the situation with OS
specific differences. There's tons of windows or NTFS specific details of a
file that we can query or manipulate that would just get in your way if you
had to look at the code every time you opened up FileSpec.cpp. Writing
SBFileSystem::Foo(file_spec) might be awkward, but i think it has a lot of
benefits that should be considered
On Mon, Feb 23, 2015 at 2:34 PM Zachary Turner <zturner at google.com> wrote:

> Can you think of any examples where it wouldn't work?  Since the FileSpec
> doesn't currently store the platform, it seems it's not necessary for any
> of our current use cases, although I admit I don't know what the future
> holds.
>
> Even if the FileSpec stores the Platform though, I still prefer just
> having a GetPlatform() method -- which might return null if no platform has
> been set -- and using that to call into another class.  Strictly as a
> library design issue, I think it's better and more flexible to keep the
> responsibilities separated.  What if I don't have a FileSpec but I want to
> call one of these methods?  I have to create a FileSpec now, permanently
> increasing the size of the string pool even when it might not have been
> important to me to save that path.  There's probably a setting or command
> somewhere to clear it, but anyway, the issue is that I might not always
> want to use it this way.  Coupling functionality together introduces
> inflexibility by design and forces you to use everything a specific way,
> even if it's not what you want.  Separating them gives you the flexibility
> to use it however you want.
>
> On Mon Feb 23 2015 at 2:21:37 PM <jingham at apple.com> wrote:
>
>>
>> > On Feb 23, 2015, at 2:09 PM, Zachary Turner <zturner at google.com> wrote:
>> >
>> > I agree that for the reasons you point out, writing a low level llvm
>> API call is not the right approach.  But a FileSystem class seems like a
>> great place for this.  A platform is not necessary to append a path
>> component or normalize a path, for example.  So why muddy up its
>> implementation with things that not everything cares about?  Would you
>> require every FileSpec to be created with a platform?  Or only support
>> using the currently selected platform for operations?  Or would you have
>> the operations take a platform as one of their arguments?  None of the
>> options is particularly appealing to me.
>> >
>> > So instead, why not this:
>> >
>> > FileSpec spec1("/Foo/Project/subdirA/../headers/foo.h");
>> > FileSpec spec2("/Foo/Project/subdirB/../headers/foo.h");
>> > FileSystem FS(platform_sp);
>> > if (FS.equivalent(spec1, spec2)) {
>> > }
>>
>> The problem with this is that the code that is operating on the file spec
>> at this point may or may not know how to interpret it.  For instance, in
>> the breakpoint case we could make this work, because the breakpoints can
>> get their hands on their Target and that knows what platform is relevant.
>> So it can use that platform.  But there's no guarantee that will always be
>> the case.  So it seems to me better to record that sort of information - if
>> you know it - when you know it, rather than lose the info and then hope
>> somebody else can reconstruct it after the fact.
>>
>> Jim
>>
>> >
>> > On Mon Feb 23 2015 at 2:03:31 PM <jingham at apple.com> wrote:
>> >
>> > > On Feb 23, 2015, at 1:51 PM, Zachary Turner <zturner at google.com>
>> wrote:
>> > >
>> > > Well I really like to keep ifdefs out of code.  I know that Host is
>> the place for ifdefs, and FileSpec is in Host, but I still like to keep
>> them to a minimum.  It's too easy to break things when you have to try to
>> figure out which ifdef paths you need to modify.  That was the reason for
>> creating the FileSystem class.  It exposes a platform agnostic interface to
>> low level FS operations.  Similar to llvm::sys::fs, and there's probably
>> some duplication there that could be removed, but I just didn't want to
>> break too much stuff at the time.
>> >
>> > >
>> > > I really wouldn't want to see a bunch of ifdefs ending up in FileSpec
>> (there's already some there that I'd like to see removed, as described
>> below).  A lot of what it does is consistent across all platforms and is
>> just focused on splitting apart a string, and constifying the directory and
>> filename portions.  And so I think it would be a shame to muck up this
>> (mostly) platform-agnostic class with some platform specific stuff.  It
>> would be great if there were a way to isolate the disk access part.  I
>> don't think it's too crazy to write something like
>> > >
>> > > FileSpec MyFile("/usr/bin/foo");
>> > > FileSystem::DirectoryEnumerator Enumerator(MyFile);
>> > > for (FileSpec &Child : Enumerator) {
>> > >     if (FileSystem::IsDirectory(Child)) {
>> > >         // Do something.
>> > >     }
>> > > }
>> > >
>> > > In fact I think it's even better.
>> > > 1) It improves readability by making it clear that you're hitting the
>> file system, so you should be careful about what type of path you're using.
>> > > 2) It improves modularity by isolating the code that hits the file
>> system from the code that doesn't.
>> > > 3) It cleans up the FileSpec implementation somewhat.  Currently
>> EnumerateDirectory is a big ugly function with a couple of different
>> preprocessor branches.
>> > > 4) Separating the responsibilities like this leads to more flexible
>> usage patterns.  Like having DirectoryEnumerator be an object that can be
>> used in a range-based for loop here, instead of forcing its usage to
>> conform to what you can write with a single function call.
>> >
>> > > Anyway, TL;DR of all this is that I know it was originally designed
>> to be all things related to files, but as it grows, it starts to become
>> really heavy.  I think it makes sense to consider a separation of
>> responsibilities.  Does a function that mmaps file contents really belong
>> in the same class as a function that resolves a symlink?  I don't think we
>> need or even should have 1 class that is all things file system.  We should
>> try to find ways to break it down into smaller, more targeted, more
>> meaningful components.
>> > >
>> >
>> > I don't think I agree with this.  For instance, when clang writes the
>> dependent file records in debug information it writes them relative to the
>> current compilation directory, so for complex compiles, on Unix you can end
>> up with one source file that contains paths like:
>> >
>> > /Foo/Project/subdirA/../headers/foo.h
>> >
>> > and another:
>> >
>> > /Foo/Project/subdirB/../headers/foo.h
>> >
>> > If you want to do equivalence of these two headers (e.g. if somebody
>> sets a breakpoint by full name) then I want to know "on the target
>> platform, can these two files be the same file."  It's weird that I can't
>> ask the file spec I was handed this question, it will give me an answer,
>> but probably a wrong one.  Instead I have to pass the paths to some
>> FileSystem call or even weirder some lower level llvm call that knows
>> nothing about these files, and for sure is going to fail if they are
>> remote...  The FileSpec has a chance to be able to answer this cause it was
>> around from the time the reference was created and we can add info as
>> needed to make this possible.  But watering it down to a string and then
>> trying to recover extra information later on seems very error-prone.  And
>> having to know which actor to ask this sort of question also seems
>> error-prone.
>> >
>> > Jim
>> >
>> >
>> > > On Mon Feb 23 2015 at 1:23:54 PM <jingham at apple.com> wrote:
>> > >
>> > > > On Feb 23, 2015, at 1:17 PM, Zachary Turner <zturner at google.com>
>> wrote:
>> > > >
>> > > > In http://reviews.llvm.org/D7836#128511, @jingham wrote:
>> > > >
>> > > >> That is not how we thought of FileSpec originally.  It clearly was
>> meant to deal with real files (for instance it has Exists methods & Read,
>> etc.)  It was meant to be a wrapper that would allow us to do all the
>> useful file manipulation in a system-independent way.  It ended up with
>> system dependencies just because we didn't have the time to keep it clean
>> when there were no ports of lldb that weren't Unix...  But that's life, not
>> design.
>> > > >>
>> > > >> Anyway, it would be weird to have some object you use to denote
>> the files you are interested in, but then when you actually want to do
>> something with them, you have to turn around and use another object.  And
>> it would be unfortunate to have to write system specific code at this point
>> except for the - hopefully very few - cases where the sort of thing we want
>> to do to a file is not able to be expressed in a system-independent way.
>> > > >>
>> > > >> Jim
>> > > >
>> > > >
>> > > > Well, already we have introduced the notion of the path syntax so
>> that FileSpec can in theory refer to a path that is invalid for the host
>> you're on.  In that case it doesn't make much sense to have a Windows path,
>> for example on Mac, and then try to do a file system operation on it.  When
>> I refactored some of Host months ago, there were other methods directly in
>> Host that would do things like get file permissions, etc.  So there was
>> already some separation between the FileSpec and the methods that hit the
>> file system.  Most of that stuff ended up in FileSystem, and the stuff that
>> remains in FileSpec is only there because I couldn't change the public API.
>> > > >
>> > > > But with the notion of a FileSpec that can refer to a path that
>> can't possibly be on your local host, I think it makes sense to consider
>> whether the file system access routines should be separated from FileSpec.
>> > >
>> > > Not sure, wouldn't it be better to optionally attach a platform to
>> FileSpec, so that FileSpecs can transparently handle remote access?
>> FileSpec's are how we specify files. Seems awkward to switch gears to
>> something else when you actually want to access them.
>> > >
>> > > Jim
>> > >
>> > >
>> > > >
>> > > >
>> > > > http://reviews.llvm.org/D7836
>> > > >
>> > > > EMAIL PREFERENCES
>> > > >  http://reviews.llvm.org/settings/panel/emailpreferences/
>> > > >
>> > > >
>> > >
>> >
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-commits/attachments/20150224/6661b4b6/attachment.html>