[llvm-dev] RFC: Modernizing our use of auto

Sun Nov 25 06:43:15 PST 2018

I'm not advocating AAA.

However this is a proposal for more modern thinking regarding the 
permissiveness of auto in LLVM codebases.

Currently the rule on the use of auto is here:

https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

It is quite strict. It allows the use of auto for

* lambdas
* iterators because they are long to type
* casts to the type

but generally leaves the rest up to the reviewer, not the contributor.

The notes about "more readable", "easier to maintain", "obvious from the 
context" are very subjective.

I think these guidelines need to be made more clear both in that 
subjective sense, but also in the sense of use of new/recent language 
and library features, given that LLVM will allow C++17 features in a few 
months. For example:

* Can we use auto in the initializer of range-for statements?
* Can we use auto in the initializer of if() statements?
* Can we use auto with llvm:Optional instances?
* Can we use auto in c++14 lambda arguments with llvm::find_if(C, 
[](const auto& i) { ... }) for example?
* We need to use auto for structured bindings, if those are used.

I know some people are very opposed to use of auto except in exceptional 
circumstances, but

* Some features force it a bit
* Elsewhere it is becoming more common for it to be reasonable to use. 
People are not  going as far as AAA, but auto is generally used in 
blogs, slides, and other peoples code (eg 
https://skebanga.github.io/if-with-initializer/). It's becoming more 
normalized without becoming extreme.
* Our potential contributors will not see a problem with using it in the 
above cases

So, we might benefit from somewhat more-modern guidelines.

Reviewers are very subjective and that means it is hard to predict what 
the requirements will be, depending on the reviewer. This is confusing 
for contributors.

For example, I was told by a reviewer that auto should *not* be used in 
the initalizer of a range for loop.  However, even the next section of 
the coding guidelines suggest that is ok:

https://llvm.org/docs/CodingStandards.html#beware-unnecessary-copies-with-auto

So, perhaps the guidelines for reviewers needs to adapt, or the coding 
guidelines need to adapt.

Currently, reviewers reject code of the form

   void foo()
   {
     auto ParserInstance = Parser::getFromArgs(Args);
     if (ParserInstance)
       obj.someApi(ParserInstance);
   }

as unreadable while

   void foo()
   {
     obj.otherApi(Parser::getFromArgs(Args));
   }

is fine. It would be good to have an answer in the coding guidelines to 
whether

   void foo()
   {
     if (auto ParserInstance = Parser::getFromArgs(Args); ParserInstance)
       obj.someApi(ParserInstance);
   }

is ok or not.

In the case that came up in review for me, the code I submitted is

template <typename BaseT, typename DerivedT>
void registerIfNodeMatcher(...) {
   auto NodeKind = ASTNodeKind::getFromNodeKind<DerivedT>();
   if (!NodeKind.isNone())
     NodeCtors[NodeKind] = std::make_pair(MatcherName, Descriptor);
}

but it was rejected as unreadable and I was given a link to the coding 
guidelines.

'NodeKind' appears three times in one line in that code and the 
requirement is to add it again. ASTNodeKind is a very commonly used 
class in that part of Clang, so it should be familiar enough. We only 
need the local variable at all so that we can do a validity check. There 
is no reasonable need to put the type there or even to care what exactly 
the type is. All we need to know is that we check the validity of the 
instance before using it.

I'd like to update the coding guidelines so that code such as

void foo(...) {
   auto Bar = SomeAPI();
   if (/* validity condition */)
     OtherAPI(Bar);
}

is unambiguously acceptable in the coding guidelines.

That kind of thing is coming up as Optional is used more in the 
codebase, but also with objects which have a validity check such as 
isNone(), isValid(), isNull() or just `!= nullptr` etc.

I'd also like to update them so that

llvm::Optional<std::pair<std::string, MatcherCtor>>
getNodeConstructorType(ASTNodeKind targetType) {
   auto const &ctors = RegistryData->nodeConstructors();
   auto it = llvm::find_if(
       ctors, [targetType](const NodeConstructorMap::value_type &ctor) {
         return ctor.first.isSame(targetType);
       });
   if (it == ctors.end())
     return llvm::None;
   return it->second;
}

is acceptable. The `auto it` is already acceptable, but the `auto const& 
ctors` depends on an interpretation of the guideline and was rejected in 
the review I submitted.

In that code, we only create the local `ctors` variable because we need 
to compare the `end` iterator after the algorithm.  If auto is ok to use 
in range-for loops, then reasonably it should be ok in the above code too.

Naturally, the guidelines should only be changed if there is consensus 
to change them. No one is trying to force anything, but I'm trying to 
address an inconsistency in reviews and a pain point for a new 
contributor. I'm trying to see if I can make the experience easier by 
updating the guidelines. Reviewers should be able to fall back to a good 
guideline, even if it differs from their subjective taste.

This is a pain point for me as a relatively new contributor to LLVM 
because sometimes reviewers have differing opinions. We shouldn't be 
putting up barriers to new contributors, but trying to remove them 
instead. I put up

  https://reviews.llvm.org/D54877

to try to start the conversation, so I'm interested in opinions!

Thanks,

Stephen.