[PATCH] Implements DR 712 (odr-use, potential results of an expression)

Fri Jun 7 15:21:46 PDT 2013

On Tue, Jun 4, 2013 at 9:19 PM, Faisal Vali <faisalv at gmail.com> wrote:
> OK - this patch adds the following:
>  0) Some codegen tests, and additional non-codegen tests.
>  1) If the pointer to member expression can be entirely folded into a
> constant, do so.
>  2) If the object expression in a member access (be it through dot, or
> pointer to member) is a constant
>      but the containing expression could not be folded into a constant - do
> not emit the object as a constant
>      - instead get the variable's address using getStaticLocalDeclAddress
> (it appears that constants are
>        hoisted out as globals in llvm IR) so that getelementptr has a
> pointer to work with.
>
> Am patiently awaiting feedback - because if i am on the right track, hoping
> i can get this committed soon; if I am way off, I would like to walk away
> from this, so that I can return to working on generic lambdas ;)

The Sema part looks good.

For the CodeGen part, I think you should approach the problem somewhat
differently. CodeGen emits expressions as RValues or LValues based on
how they are used, not based on whether the expressions themselves are
rvalues or lvalues, and in particular, an operand of an
lvalue-to-rvalue conversion is typically emitted directly as an
rvalue. So if you ensure that RValue emission never actually performs
a load in the cases which are not odr-uses, then you should be OK.

One problem is that, in some cases, the rvalue emission defers to
lvalue emission; emitting pointer-to-member access is one of those
cases. I think what you should do here is to teach the
ScalarExprEmitter to try to evaluate the right-hand side of the
pointer-to-member expression, and emit a direct field access if it
can.

> On Mon, Jun 3, 2013 at 2:54 PM, Faisal Vali <faisalv at gmail.com> wrote:
>>
>> An updated patch that fixes emission of pointers to members.
>>
>> Codegen tests and clang-format is still pending feedback regarding the
>> strategy i used to fix the member access issue.
>> Thank you!
>>
>> Faisal Vali
>>
>>
>>
>> On Sun, Jun 2, 2013 at 2:36 PM, Faisal Vali <faisalv at gmail.com> wrote:
>>>
>>>
>>>
>>> On Wed, May 29, 2013 at 5:01 PM, Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>>>
>>>> On Wed, May 29, 2013 at 11:19 AM, Faisal Vali <faisalv at gmail.com> wrote:
>>>
>>>
>>>> I assume you didn't mean to include the changes to ParseAST.cpp?
>>>>
>>>
>>>
>>> No I did not. Sorry - I'll try and be better about that - but it might
>>> happen again ;)
>>>
>>>
>>>
>>>> Index: include/clang/Sema/Sema.h
>>>> ===================================================================
>>>> --- include/clang/Sema/Sema.h (revision 182855)
>>>> +++ include/clang/Sema/Sema.h (working copy)
>>>> @@ -3016,6 +3016,9 @@
>>>>    void MarkMemberReferenced(MemberExpr *E);
>>>>
>>>>    void UpdateMarkingForLValueToRValue(Expr *E);
>>>> +  // Assess the expression for any references to a variable
>>>> +  // that do not entail an odr-use of that variable.
>>>> +  void UpdateNonODRUsedVariableReferences(Expr *E);
>>>>
>>>> Isn't this function the same thing as UpdateMarkingForLValueToRValue?
>>>> Maybe merge them into a single function? (FWIW, I prefer the old name.)
>>>>
>>> Done.
>>>
>>>>
>>>>
>>>> @@ -11508,13 +11508,23 @@
>>>>    // Note that we use the C++11 definition everywhere because nothing
>>>> in
>>>>    // C++03 depends on whether we get the C++03 version correct. The
>>>> second
>>>>    // part does not apply to references, since they are not objects.
>>>> +  // Per DR: 712 constant expressions in discarded value expressions
>>>> +  // are not odr-used either.
>>>>
>>>> The mention of discarded value expressions seems out of place here.
>>>>
>>>> +  //
>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3383.html#712
>>>>
>>>> We usually don't include such links in comments.
>>>>
>>>> +  //
>>>>    const VarDecl *DefVD;
>>>> -  if (E && !isa<ParmVarDecl>(Var) &&
>>>> +  if (E && SemaRef.getLangOpts().CPlusPlus11 &&
>>>> +                  E->isCXX11ConstantExpr(SemaRef.Context)) {
>>>> +    if (!Var->getType()->isReferenceType())
>>>> +      SemaRef.MaybeODRUseExprs.insert(E);
>>>>
>>>> Hah, this is a literal interpretation of the standard text, but isn't
>>>> what was intended. It allows DeclRefExprs referring to globals, for
>>>> instance. What was *intended* was to look for objects on which the
>>>> lvalue-to-rvalue conversion could be applied in a constant expression, that
>>>> is, what the existing code does:
>>>>
>>>>
>>>> +  }
>>>> +  else if (E && !isa<ParmVarDecl>(Var) &&
>>>>        Var->isUsableInConstantExpressions(SemaRef.Context) &&
>>>>        Var->getAnyInitializer(DefVD) && DefVD->checkInitIsICE()) {
>>>>
>>> Done.  The comment might need some additional work though.
>>>
>>>
>>>>
>>>>
>>>> Index: lib/Sema/SemaExprCXX.cpp
>>>> ===================================================================
>>>> --- lib/Sema/SemaExprCXX.cpp (revision 182855)
>>>> +++ lib/Sema/SemaExprCXX.cpp (working copy)
>>>> @@ -5518,6 +5518,78 @@
>>>>    return false;
>>>>  }
>>>>
>>>> +namespace {
>>>> +
>>>> +  struct ConstantExpressionODRCleaner : RecursiveASTVisitor<
>>>> +                                 ConstantExpressionODRCleaner> {
>>>> +
>>>>
>>>> We normally don't include blank lines in these cases. Also, maybe split
>>>> this line after the : instead of after the < ?
>>>>
>>> Done.
>>>
>>>>
>>>> A RecursiveASTVisitor seems overkill here, since you're neither using
>>>> the recursive walk, nor dispatch on anything other than Stmts. Maybe this
>>>> should be a ConstStmtVisitor instead?
>>>>
>>> Done.
>>>
>>>>
>>>> Maybe name this something like PotentialResultsSetFinder, since that's
>>>> closer to what it does.
>>>>
>>> Done.
>>>>
>>>>
>>>> +    llvm::SmallPtrSet<Expr*, 2>& MaybeODRUseExprs;
>>>>
>>>> " &", not "& ", please. I have other formatting-related comments on this
>>>> patch, but clang-format can tell you what they are. :-)
>>>>
>>>
>>> Once I obtain your feedback on this revision - and prior to my next
>>> revision, i'll try and figure out how to use clang-format - thanks!
>>>
>>>>
>>>>
>>>> +    typedef RecursiveASTVisitor<ConstantExpressionODRCleaner>
>>>> inherited;
>>>>
>>>> Should be "Inherited", bu just remove this since you're not actually
>>>> using it.
>>>>
>>>>
>>>> +    // C++14 CD, DR 712: 3.2 para 2
>>>>
>>>> Please use "C++1y" here so that we can more easily find it with a grep
>>>> later.
>>>>
>>> Done.
>>>
>>>>
>>>>
>>>> +    // normally, we don't need to do any additional conversions to
>>>> handle it,
>>>> +    // but if it is a volatile lvalue with a special form, we perform
>>>> an
>>>> +    // lvalue-to-rvalue conversion.  Additionally, if it was a c++11
>>>> constant
>>>> +    // expression.
>>>>
>>>> The added comment here doesn't look right; I think you should say
>>>> something like "Even if we do not perform an lvalue-to-rvalue conversion, we
>>>> pretend that one was performed when checking for odr-uses of variables."
>>>>
>>>>
>>> Done.
>>>
>>>>
>>>> Index: lib/CodeGen/CGExpr.cpp
>>>> ===================================================================
>>>>
>>>> You have CodeGen changes but no tests for them, please add some,
>>>> covering all the ways in which a variable can be referenced without being
>>>> odr-used.
>>>>
>>>
>>> Still no codegen tests on this revision - will include them with my next
>>> revision once i get your technical feedback on this one (and figure out the
>>> LLVM IR i need to
>>> ramp up on, so that I can start writing codegen tests for this).  Any
>>> preliminary guidance will be appreciated.
>>>
>>>>
>>>>
>>>> --- lib/CodeGen/CGExpr.cpp (revision 182855)
>>>> +++ lib/CodeGen/CGExpr.cpp (working copy)
>>>> @@ -1800,6 +1800,13 @@
>>>>    return CGF.EmitLValueForField(LV, FD);
>>>>  }
>>>>
>>>> +
>>>> +inline bool isLambdaCallOperator(const Decl *D) {
>>>>
>>>> This seems generally useful; perhaps it should be a member of
>>>> CXXMethodDecl? If not, s/inline/static/, please.
>>>>
>>> This still needs to be done.
>>>
>>>>
>>>>
>>>> +  if (!ND->isUsed(false) || (E->refersToEnclosingLocal() &&
>>>> +                              isLambdaCallOperator(CurCodeDecl))) {
>>>> +    if (const VarDecl *VD = dyn_cast<VarDecl>(ND)) {
>>>>
>>>> Please flip these two 'if's around, and merge this into the preceding
>>>> code (the end result should just be removing the isReferenceType() check
>>>> from the preceding code block and adding the lambda-enclosing-local check).
>>>>
>>>>
>>> Done.
>>>
>>>>
>>>> @@ -2476,7 +2497,23 @@
>>>>
>>>>  LValue CodeGenFunction::EmitMemberExpr(const MemberExpr *E) {
>>>>    Expr *BaseExpr = E->getBase();
>>>> +  ASTContext &Context = CGM.getContext();
>>>> +  APValue ConstExprResult;
>>>>
>>>> +  // Emit a field member access that evaluates to a constexpr s.x
>>>> +  if (Context.getLangOpts().CPlusPlus11 &&
>>>> E->isCXX11ConstantExpr(Context,
>>>> +          &ConstExprResult)) {
>>>> +    if (FieldDecl *FD = dyn_cast<FieldDecl>(E->getMemberDecl())) {
>>>> +      CharUnits Alignment = Context.getDeclAlign(FD);
>>>> +      llvm::Constant *Val =
>>>> +        CGM.EmitConstantValue(ConstExprResult, FD->getType(), this);
>>>> +      assert(Val && "failed to emit C++11 constant member expression");
>>>> +      // FIXME: Eventually we will want to emit vector element
>>>> references.
>>>> +      QualType T = E->getType();
>>>> +      return MakeAddrLValue(Val, T, Alignment);
>>>> +    }
>>>> +  }
>>>>
>>>> I don't think this approach can work. Consider this:
>>>>
>>>>   struct S { int x; };
>>>>   constexpr S a = { 1 };
>>>>   S b = { 2 };
>>>>   bool k;
>>>>   return (k ? a : b).x;
>>>>
>>>> This does not odr-use 'a', but it does odr-use 'b'. The MemberExpr is
>>>> not a constant expression here. I think instead what you should do is to
>>>> teach AggExprEmitter (and friends) to directly emit a constant value when
>>>> they visit a DeclRefExpr which is not an odr-use
>>>>
>>>
>>> I think I fixed this - but I must admit I was unable to figure out how to
>>> fix it through AggExprEmitter.  My fix uses a visitor in
>>> CodeGenFunction::EmitDeclRefExpr
>>> to try and figure out whether the DeclRefExpr is in a conditional
>>> operator context of an object expression of a member expression (which is
>>> pushed onto a stack), and if
>>> that is the case, it emits a pointer to the struct vs the constant
>>> version of the struct.  It seems to pass my tests - but like i said earlier,
>>> I have not had the time yet
>>> to write the appropriate LLVM-IR codegen tests.  Please let me know about
>>> the robustness of my fix - it does feel a little brittle to me.  If you
>>> stiill feel
>>> AggExprEmitter is the way to go with this, I could certainly use a little
>>> guidance as to how we enter into AggExprEmitter while emitting code for the
>>> conditional expr.
>>>
>>>>
>>>>
>>>> @@ -3239,6 +3276,23 @@
>>>>  LValue CodeGenFunction::
>>>>  EmitPointerToDataMemberBinaryExpr(const BinaryOperator *E) {
>>>>    llvm::Value *BaseV;
>>>> +
>>>> +  Expr *ObjExpr = E->getLHS();
>>>> +  ASTContext &Context = CGM.getContext();
>>>> +  APValue ConstExprResult;
>>>> +  // Emit a field member access that evaluates to a constexpr s.*memfn
>>>> +  if (Context.getLangOpts().CPlusPlus11 &&
>>>> E->isCXX11ConstantExpr(Context,
>>>> +        &ConstExprResult)) {
>>>> +    const MemberPointerType *MPT
>>>> +                  = E->getRHS()->getType()->getAs<MemberPointerType>();
>>>> +    llvm::Constant *Val =
>>>> +      CGM.EmitConstantValue(ConstExprResult, MPT->getPointeeType(),
>>>> this);
>>>> +    assert(Val && "failed to emit C++11 constant member expression");
>>>> +    // FIXME: Eventually we will want to emit vector element
>>>> references.
>>>> +    QualType T = E->getType();
>>>> +    return MakeAddrLValue(Val, T);
>>>> +  }
>>>>
>>>> In this case, you should constant-evaluate the RHS but not the LHS.
>>>> Also, use EvaluateAsRValue here; it doesn't matter whether we need to
>>>> perform non-standard constant folding. And there's no need to check for
>>>> C++11 mode.
>>>>
>>>>
>>> Done.
>>>
>>>
>>>>
>>>> Index: test/CXX/basic/basic.def.odr/p2-DR712-C++14-CD.cpp
>>>> ===================================================================
>>>> --- test/CXX/basic/basic.def.odr/p2-DR712-C++14-CD.cpp (revision 0)
>>>> +++ test/CXX/basic/basic.def.odr/p2-DR712-C++14-CD.cpp (working copy)
>>>>
>>>> Maybe just call this .../p2-potential-results.cpp -- this isn't
>>>> C++1y-specific. This file contains both UTF-8 characters in comments and
>>>> \r\n line separators, please fix.
>>>
>>>
>>> Done.
>>>
>>>
>>> Await your feedback ...
>>>
>>> Also *ping* the patch
>>> (http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130506/079656.html)
>>> was LGTM'd by Doug, anyway we can get that one committed please - thank you!
>>>
>>> Thanks again Richard!
>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sat, May 18, 2013 at 1:42 PM, Richard Smith <richard at metafoo.co.uk>
>>>>> wrote:
>>>>>>
>>>>>> On Sat, May 18, 2013 at 9:22 AM, Faisal Vali <faisalv at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Before I return to working on generic lambdas, I thought it might be
>>>>>>> useful to nail down DR 712 which reworded the definition of odr-use in terms
>>>>>>> of the potential results of an expression (in order to avoid odr-using a
>>>>>>> variable that represents a constant expression whenever we can get away with
>>>>>>> it).  This is relevant to lambdas, because it affects which variables get
>>>>>>> captured and which don't.
>>>>>>>
>>>>>>>
>>>>>>> Let me review how I think clang currently handles this (pre-712
>>>>>>> resolution  - please correct me where I err). While parsing and semanalyzing
>>>>>>> expressions, all variables that are usable in constant expressions (and so
>>>>>>> might not need to be odr-used) are stored in MaybeODRUseExprs (a set
>>>>>>> container) as DeclRefExprs (DRE).  Then, if an Lvalue-to-Rvalue conversion
>>>>>>> is performed on the DRE, it is removed from the set above (In
>>>>>>> UpdateMarkingForLValueToRValue).  Anything left in that set is marked as
>>>>>>> odr-used by ActOnFinishFullExpr.
>>>>>>>
>>>>>>>
>>>>>>> OK, with the above in mind, these are my initial thoughts on how to
>>>>>>> implement it so that not only do we handle those constant expressions that
>>>>>>> are lvalue-to-rvalue converted, but also those that are discarded, and also
>>>>>>> handle the conditional expressions and member-access expressions.
>>>>>>>
>>>>>>> The relevant hooks/callbacks/functions within clang are:
>>>>>>>   - UpdateMarkingForLValueToRValue
>>>>>>>   - IgnoredValueConversions
>>>>>>>
>>>>>>> A function i intend to write is GetPotentialResultsOfExpression: when
>>>>>>> passed an Expr* E, returns a vector of its potential results as
>>>>>>> DeclRefExpr's (and MemberExpr's??); this function should follow from  3.2
>>>>>>> para 2.
>>>>>>>
>>>>>>> Now in each hook, GetPotentialResults shall be called, and those
>>>>>>> expressions shall be removed from the MaybeODRUseExpr.
>>>>>>
>>>>>>
>>>>>> This seems like the right approach, although you presumably don't
>>>>>> actually need to build the vector and could just remove the odr-use marker
>>>>>> directly.
>>>>>>
>>>>>>>
>>>>>>> What are your thoughts on the above preliminary strategy?  I have
>>>>>>> this sense that Member expressions are going to require some more work, but
>>>>>>> we'll see...
>>>>>>>
>>>>>>> Also before I proceed any further, I was hoping to get some more
>>>>>>> clarity on 3.2 para 2 and para 3 - please see my queries below:
>>>>>>>
>>>>>>>
>>>>>>> Consider the following code:
>>>>>>>
>>>>>>>
>>>>>>> struct S {
>>>>>>>   const int mi;
>>>>>>>   constexpr S(int i) : mi(i) { }
>>>>>>> };
>>>>>>>
>>>>>>> int f(const int& );
>>>>>>> void g(int);
>>>>>>> int main() {
>>>>>>>   constexpr S r{5};
>>>>>>>   constexpr S s{r};
>>>>>>>   [](char c) {
>>>>>>>       c = s.mi;  // #1 does this odr-use 's'
>>>>>>
>>>>>>
>>>>>> No. Here, e = 's.mi', ex = 's' (a member of the set of potential
>>>>>> results of e), and e is subject to an lvalue-to-rvalue conversion.
>>>>>>
>>>>>>>
>>>>>>>       g(s.mi);   //  #2 what about this?
>>>>>>
>>>>>>
>>>>>> Also no, because an lvalue-to-rvalue conversion is applied to s.mi.
>>>>>>
>>>>>>>
>>>>>>>       s.mi;        // #3 and this?
>>>>>>
>>>>>>
>>>>>> Also no, because this is a discarded-value expression.
>>>>>>
>>>>>>>
>>>>>>>       c ? s.mi : r.mi; // #4 and this?
>>>>>>
>>>>>>
>>>>>> Also no.
>>>>>>
>>>>>>>
>>>>>>>       g( c ? s.mi : r.mi ); // #5
>>>>>>
>>>>>>
>>>>>> Also no.
>>>>>>
>>>>>>>
>>>>>>>   };
>>>>>>> }
>>>>>>>
>>>>>>> My intial suspicion was that neither s nor r were odr-used in the
>>>>>>> lambda (because the compiler can figure out the value of 'mi' at its point
>>>>>>> of use, and use it or discard it) - but then based on Richard's analysis of
>>>>>>> the example below where he states that s.operator int() is an odr-use of
>>>>>>> 's', i am now unsure.
>>>>>>
>>>>>>
>>>>>> Note that member function calls aren't listed in the rules of 3.2/2.
>>>>>> 's' is not in the set of potential results of 's.operator int()'.
>>>>>>
>>>>>>>
>>>>>>> Here is my attempt at trying to figure this out (e refers to
>>>>>>> expression, ex as a potentially-evaluated expression, using the language of
>>>>>>> 3.2 para 2 and 3)
>>>>>>>
>>>>>>> In #1,
>>>>>>>   the variable 's' appears as a potentially evaluated expression and
>>>>>>> satisfies the requirements for appearing in a constant expression, and it is
>>>>>>> an object, and it is an element of the set of potential results of an
>>>>>>> expression 'e' (s.mi) and the lvalue-to-rvalue conversion is applied to that
>>>>>>> expression so it is not odr-used?  But what if 'e' is deemed to be 's' in
>>>>>>> 's.mi' - no lvalue-to-rvalue conversion is applied so it should be captured?
>>>>>>
>>>>>>
>>>>>> The rule says: "x [...] is odr-used unless [...] ex is is an element
>>>>>> of the set of potential results of an expression e, where [...]"
>>>>>>
>>>>>> That is, if there *exists* such an 'e', then 'x' is not odr-used.
>>>>>>
>>>>>> Hope that helps!
>>>>>>
>>>>>>>
>>>>>>> OK, I'm going to give up on analyzing the rest, because I feel I'm
>>>>>>> missing something and am unable to think about this clearly.
>>>>>>>
>>>>>>> Any help or guidance will be appreciated!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 15, 2013 at 3:21 PM, Richard Smith
>>>>>>> <richard at metafoo.co.uk> wrote:
>>>>>>>>
>>>>>>>> On Tue, May 14, 2013 at 9:48 PM, Faisal Vali <faisalv at gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> While we're on the topic, can i ask you to clarify a few capture
>>>>>>>>> and constexpr
>>>>>>>>> questions below:
>>>>>>>>
>>>>>>>>
>>>>>>>> OK, the relevant context is [basic.def.odr]p2 and p3.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> struct S {
>>>>>>>>>   constexpr S() {}
>>>>>>>>>   constexpr operator int() const { return 0; }
>>>>>>>>> };
>>>>>>>>> void fooS(S s) { }
>>>>>>>>>
>>>>>>>>> void fooInt(int i) { }
>>>>>>>>>
>>>>>>>>> void f() {
>>>>>>>>>   constexpr S s {};
>>>>>>>>>   auto L = [] (int x) {
>>>>>>>>>       (void) s;  // 1
>>>>>>>>
>>>>>>>>
>>>>>>>> Here, the 's' expression is a discarded-value expression, and the
>>>>>>>> 's' variable is in the set of potential results of the 's' expression, and
>>>>>>>> 's' satisfies the constraints for appearing in a constant expression, so 's'
>>>>>>>> is not odr-used, so is not captured. We should not require a capture-default
>>>>>>>> here.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>       fooS(s);  // 2
>>>>>>>>
>>>>>>>>
>>>>>>>> Here, 's' is implicitly passed to the implicit S::S(const S&)
>>>>>>>> constructor. This is neither an lvalue-to-rvalue conversion nor a
>>>>>>>> discarded-value expression, so 's' is odr-used and we have an error due to
>>>>>>>> the missing capture-default.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>       fooInt(s); // 3
>>>>>>>>
>>>>>>>>
>>>>>>>> This is equivalent to 's.operator int()', which again odr-uses 's',
>>>>>>>> so requires a capture-default.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>   };
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Should the above be ok in spite of L not having a default capture?
>>>>>>>>> Currently clang errors on all of them individually - is that the
>>>>>>>>> right behavior?
>>>>>>>>>
>>>>>>>>> Also, how would you want me to start submitting patches for commit
>>>>>>>>> - do you want
>>>>>>>>> me to break up the patch into smaller patches? - and if so, do you
>>>>>>>>> have any thoughts
>>>>>>>>> on how I might want to break up the functionality per patch?
>>>>>>>>
>>>>>>>>
>>>>>>>> Smaller patches are definitely better, if you can decompose the
>>>>>>>> functionality into coherent smaller chunks. There are some hints on how to
>>>>>>>> decompose the functionality here (but this division may or may not work for
>>>>>>>> you):
>>>>>>>>
>>>>>>>>
>>>>>>>> http://clang.llvm.org/docs/InternalsManual.html#how-to-add-an-expression-or-statement
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>