[PATCH] D87732: [Support] Provide sys::path::guess_style

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 24 18:52:36 PDT 2020


dblaikie added a subscriber: rnk.
dblaikie added a comment.

In D87732#2278612 <https://reviews.llvm.org/D87732#2278612>, @jhenderson wrote:

> In D87732#2278196 <https://reviews.llvm.org/D87732#2278196>, @phosek wrote:
>
>> In D87732#2275943 <https://reviews.llvm.org/D87732#2275943>, @jhenderson wrote:
>>
>>> I'm not convinced this is really correct. After all, a path with mixed separators (e.g. '/my/path\to\foo') could be a Windows path rooted at the current drive. This mixed style can sometimes happen when things get concatenated together, so I don't think it's compeletely unreasonable.
>>>
>>> I'm also concerned that if we guess wrong, the behaviour will end up breaking. For example, if the path were a Windows path "/my/path\../to/foo", I'd expect a remove_dots call (without worrying about slash normalisation) to result in "/my/to/foo", but if the code is using Posix style, will it actually result in the path being left unchanged?
>>>
>>> I can think of two competing alternatives:
>>>
>>> 1. add an option to `remove_dots` to not do the separator canonicalization. I'm not sure whether this can be done unambiguously however - which separator should be removed when a directory is removed due to dots? The one before the removed parts? After them?
>>> 2. change all '\' to '/' unconditionally. However, this might break Linux paths with '\' in.
>>
>> None of these solutions is perfect and I'm not even sure if there's one. Even if we improve `remove_dots`, there's still the existing issue with `sys::path::append`: which separator should we use when combining compilation dir, include dir and the file name.
>>
>> Maybe we should instead update Clang to do the normalization when generating debug info and always use `/`, and then always use `/` unconditionally when combining paths on the consumer side (but don't do any normalization)?
>
> You're probably right about no solution being perfect. I'm not a fan of using '/' on Windows, mostly because it looks weird, but also because at least in the past for Windows cmd, '/' paths didn't auto-complete properly. It's only a minor gripe though (and not really an issue nowadays since I mostly use PowerShell). Normalisation to at least remove dots would make a lot of sense (strings become shorter), and possibly even normalising the separator direction does too, since that would make it more likely paths end up being identical and therefore poolable in .debug_str/.debug_line_str.
>
> `sys::path::append` could probably unconditionally be Posix, since such paths will also be valid Windows paths always. Maybe we need some way of communicating target platform as an alternative to `native` style? If that were a command-line option, the caller at least would know which style it needs to work with.
>
> I'd like to hear others opinions - @dblaikie?

Not sure I have great enlightenment here. @rnk might have some opinions on Windows V Posix path separators.

I think it's probably good to do path normalization up in clang where it knows about real files and paths and what's what - if possible (well, maybe with line directives it doesn't know? I think it could be given arbitrary strings there, really). Rather than trying to guess about path names later.

Are there accessible /bugs/ in any of this? (cases where we currently (or with this patch) can produce incorrect behavior, not just somewhat less than ideal dumps? The PDB use of this string seems like it goes into the PDB debug info and is used by a debugger - but I guess in that case you're dealing with Windows where both path separators are acceptable, so no bugs - and is there no POSIX equivalent situation where guessing is done and a wrong guess results in a backslash that would make the path buggy/incorrect/)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87732/new/

https://reviews.llvm.org/D87732



More information about the llvm-commits mailing list