r178105 - [modules] Re-enable the "ambiguous expansion of macro" warning.

Tue Nov 12 18:03:27 PST 2013

On Tue, Nov 12, 2013 at 12:29 PM, Argyrios Kyrtzidis <akyrtzi at gmail.com>wrote:

>
> On Oct 30, 2013, at 6:41 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Wed, Oct 30, 2013 at 5:32 PM, Argyrios Kyrtzidis <akyrtzi at gmail.com>wrote:
>
>>
>> On Oct 28, 2013, at 1:13 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Mon, Oct 28, 2013 at 12:15 PM, Richard Smith <richard at metafoo.co.uk>wrote:
>>
>>> On Sat, Oct 26, 2013 at 10:14 AM, Argyrios Kyrtzidis <akyrtzi at gmail.com>wrote:
>>>
>>>> As I said above, I think there's a misunderstanding here; to reiterate:
>>>> when you say "I want to use the interface provided by M1. If I've not
>>>> imported M2" this is case 1) I mentioned and I agree with what you said
>>>> should happen. If this does not happen currently then it should be fixed.
>>>>
>>>> Should we focus on what should happen when importing *both* M1 and M2 ?
>>>>
>>>
>>> Yes, I think refocusing will help. In particular, I think this is the
>>> key question on which we have different views:
>>>
>>>
>>>> How do I know that two modules are *related* ? One module may have
>>>> imported the other but if it doesn't re-export it it might as well be an
>>>> *unrelated* one from my perspective; why do I need to care what modules it
>>>> imported internally ?
>>>>
>>>
>>> There are (I think) two different ways that two modules can have a
>>> conflict on what a macro name means:
>>>
>>> 1) Both modules want the name to mean something (which be mean "it's a
>>> macro expanding to X" or might be "it's not a macro").
>>> 2) One module wants to *override* what another module meant the name to
>>> mean (M2 says "it expands to FOO BAR", and M1 says "actually, I override M2
>>> and say it expands to BAZ")
>>>
>>> There are also two different ways that a module can express its intent
>>> for a macro name to mean something:
>>>
>>> A) It can #define the name to something
>>> B) It can #undef the name
>>>
>>> I think we agree that:
>>>
>>> 1A: If two modules want the same macro name to be defined to different
>>> things, and that macro name is used, that is clearly ambiguous and should
>>> be diagnosed.
>>>
>>> I'm on the fence about this one:
>>>
>>> 1B: If one module wants the macro name to be #defined to something, and
>>> another module wants it to be undefined, currently Clang silently picks the
>>> definition. I am not entirely convinced that's appropriate -- if my module
>>> explicitly #undef's a macro, that's a statement of intent: in my interface,
>>> I'm using that name to mean something that's not a macro. I don't think
>>> it's obvious that:
>>>
>>>   #undef X
>>>   #define X X
>>>
>>> and
>>>
>>>   #undef X
>>>
>>> should mean fundamentally different things here. Either way, I'm saying
>>> "I own the macro name X and it produces the token sequence X."
>>>
>>
>> Hmm, I think it's actually rather important that we diagnose this: on the
>> upgrade path from #includes to modules, we'll get a silent change of
>> behavior without this.
>>
>> If we do diagnose this case (where two imported modules disagree on a
>> macro: one says it should be defined and the other says it should not),
>> then I care a lot less about case 2. (It's enough for me that we diagnose
>> this; we don't need to preserve the existing behavior.)
>>
>> That said, I think we should take as a guiding rule that modules and
>> submodules should behave the same, and we currently violate that in the
>> case of macros.
>>
>>
>>> Case 2 is where we really seem to disagree. It's also a weird case for
>>> modules, since we rarely have cases in which the interface of one module
>>> includes "modify the interface of another module". Which brings me to your
>>> next point:
>>>
>>>
>>>> I'm suggesting that If I explicitly import 2 modules, and the modules'
>>>> interfaces disagree on what an identifier means, then there is ambiguity.
>>>> One module may or may not have imported the other module internally,
>>>> but that doesn't change the fact that the modules' interfaces disagree on
>>>> the meaning of an identifier.
>>>>
>>>
>>> The key thing for me is, if one module imports another module, undefines
>>> some part of it, then re-exports it, it seems pretty clear that it intends
>>> to *modify* the interface of that first module. This happens in other
>>> guises too: extending overload sets, specializing templates, shadowing a
>>> tag name with a non-tag name, and so on. In all other cases, importing both
>>> modules gives you the modified interface *with no ambiguity*. The ambiguity
>>> has already been resolved by the dominating module saying how to resolve
>>> it. And if I import a module that says that it modifies another module's
>>> interface, I expect to get that modified interface.
>>>
>>> So: 2A) if M2 defines our macro, and M1 imports M2 and undefines and
>>> redefines that macro, and either I import M1 or I import *both* M1 and M2,
>>> I've expressed my intent to use M1's resolution to the conflict, and I
>>> think I should get M1's definition.
>>> 2B) if M2 defines our macro, and M1 imports M2 and undefines the macro,
>>> and either I import M1 or I import both M1 and M2, I think the macro should
>>> not be defined.
>>>
>>> By exporting a macro undefinition that undefines M2's macro, M1 is
>>> saying "importing me undefines M2's macro definition". There's no ambiguity
>>> in that, and if I import both modules, that's what I expect to happen. This
>>> would behave in (largely) the same way we behave across submodules in the
>>> same module, which makes the intuitive model simpler. If M1 didn't want
>>> this behavior, it should not export the macro.
>>>
>>>
>>> Let me make my proposal a bit more concrete, since I think I've been
>>> vague up until now.
>>>
>>> Instead of storing a single flat list of macro definitions and
>>> undefines, we store a DAG. The predecessors of each directive are the set
>>> of directives that were visible at the point of the directive. (For a
>>> #define, these must all be either #undefs or #defines to the same value.)
>>> When we see a macro name being used, we take the set of visible leaves of
>>> this graph, check they are consistent[1], and use that consistent value as
>>> the value of the macro. Otherwise, we diagnose.
>>>
>>> [1] I'm not certain whether consistency should allow a mixture of
>>> #undefs and #defines, but I'm inclined to think it should not.
>>>
>>
>>
>> Richard, thanks for the detailed explanation, I'm convinced that if a
>> module undefines a macro of another module and re-exports it, the macro
>> should be considered undefined even if I import both modules.
>> I have another question, what if the module doesn't re-export the
>> imported module ? Specifically, M1 imports M2, undefines a macro from M2,
>> but then it doesn't re-export module M2 ? That seems like a sufficiently
>> different case, to be considered separately.
>>
>
> Yes, I agree that that case needs separate consideration.
>
> I think it's clear that:
> - If you import only M1, the macro should not be defined
> - If you import only M2, the macro should be defined
>
> More interesting questions:
>  - If you import M1 and M3 (where M3 defines the same macro), either the
> macro should be defined or any use of it should be ambiguous. Which?
>  - If you import M1 and M2, then either (1) macro is not defined, (2)
> macro is defined, or (3) uses of macro are ambiguous might be reasonable.
> Which?
>
>
> The user-facing model I'm currently thinking about is:
>  * Preprocessor directives in modules are entities in their own right
> (every definition and undefinition of a macro is a distinct entity).
>  * These entities are *visible* if they are not private and the
> corresponding (sub)module has been imported.
>  * A #define X or #undef X directive *overrides* all definitions of X
> that are visible at the point of the directive.
>  * A #define or #undef directive is *active* if it is visible and no
> visible directive overrides it.
>  * If a macro name is used, the set of active directives for that macro
> name must be consistent (either all #undefs, or all #defines defining the
> macro name to the same value).
>
>
> Hi Richard, apologies for taking so long to get back to you.
>
> I'd like to expand a bit on the user-facing model, particularly regarding
> a module that imports another module but does not re-export it (M1 imports
> M2 but does not re-export M2).
> What are the semantics here, is it:
> - User imports M1
> - M2 is not visible at all (hidden due to non getting exported)
> - The fact that M1 imports M2 is strictly an implementation detail, M1 can
> remove the import of M2 or change to importing M3 and there will be no
> noticeable change in the user's code (regardless of what modules the user
> imported).
> ?
>

To start off with, we're only talking about the case where M1 undefines
M2's macro, and *exports* that undefine. I separately want to move us to a
module where preprocessor directives are not exported by default, so
(slightly longer-term) I want M1 to need to take explicit action for this
to happen. In this case, M1 has exported that (1) the identifier in
question is not a macro, and (2) in a conflict between this intent and M2's
intent, M1's intent should win.

So: if M1 re-exports its undefine of the macro, and user code imports both
M1 and M2, it will not see the macro. If M1 stops importing M2 before
undefining the macro, then it's no longer undefining M2's macro, and the
user code will now see an ambiguity between M1's exported undef and M2's
exported define.

If M1 didn't re-export its undefine of the macro, then it's not part of
M1's interface, and the preprocessing of the user code is entirely
unaffected by it.

But in either case, if M1 imports M2, and user code imports M1, then M2 is
part of the user's program, and the user's program can break if M1 switches
from using M2 to using M3. This problem is not specific to macros: imagine
both the user program and M3 define a global function named 'foo'. If you
transitively include any part of a module, that module is part of your
program.

Also, if M1 importing M2 is an implementation detail that doesn't "leak
> out" to the user of M1, what about if M1 exports a declaration using a type
> from M2 ?
>

There are a number of issues here, and I'm not sure which one concerns you.
The way this works at the moment is that you can implicitly use the
declarations of things from M2, but you can't name them or use their
definitions. So:

// M2
#define s_n S_FIELD_NAME
struct S { int s_n; };

// M1
import M2; // and don't re-export

S *make();
void take(S *p);
void take_val(S s);

// user code
import M1;

int s_n, foo; // ok, not a (visible) macro
#define s_n foo // ok, no conflict
int bar = s_n; // ok, assigns from foo
#undef s_n // ok, undefines our s_n, not M2's

int main() {
  take(make()); // ok
  auto *p = make(); // ok
  S *p = make(); // error, 'S' is not visible
  take_val(*make()); // error, 'S::S' is not visible
  return make()->s_n; // error, definition of 'S' is not visible
}

import M2;
int k = make()->s_n; // ok, S's definition is visible and M2's s_n macro is
active now

It doesn't matter that S's field is not actually called 's_n', because S's
definition is visible only when S is visible. The existence of the type 'S'
leaks out of M1, but its name *mostly* does not. This will fail, though:

import M1;
struct S {}; // error, multiple definitions of S.

> That gives the following answers to the above:
>
> If M2 contained a #__private_macro directive for the macro name:
>  - If you import M1 and M3, where M3 defines the same macro that M1
> undefines, the macro is defined to M3's value (that is the only visible
> directive).
>  - If you import M1 and M2, the macro is defined to M2's value (that is
> the only visible directive).
>  - (For completeness) if you import M1, M2, and M3, any use of the macro
> is ambiguous.
> (That is, a #__private_macro undefinition has no effect.)
>
> If M2 did not contain a #__private_macro directive for the macro name:
>  - If you import M1 and M3, where M3 defines the same macro that M1
> undefines, then any use of the macro is ambiguous (because M1 said "it's
> not a macro", and M3 said "it's this macro").
>  - If you import M1 and M2, then the macro is not defined (M2's definition
> is not active, because M1's undefinition is visible and overrides it).
>  - If you import M1, M2, and M3, this is the same as the first case. Any
> use of the macro is ambiguous: the active directives are M1's undefinition
> and M3's definition, and those conflict.
>
> I think I'm comfortable with those answers. For case of importing both M1
> and M2, another option would be to treat it like importing M1 and M3.
> However, that leads to a situation where (M1 re-exports M2 and I import M1)
> behaves differently from (M1 does not re-export M2 and I import both M1 and
> M2), which seems surprising.
>
>
> What do you think? Are you comfortable with the above answers, or do you
> think we need to go in a different direction for this case?
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131112/d916680d/attachment.html>