[llvm-dev] RFC: Modernizing our use of auto

Sun Dec 16 11:44:22 PST 2018

On 25/11/2018 14:43, Stephen Kelly via llvm-dev wrote:
> 
> However this is a proposal for more modern thinking regarding the 
> permissiveness of auto in LLVM codebases.
> 
> Currently the rule on the use of auto is here:

Hi,

Thanks for the input on this topic, which has appeared here on the 
mailing list, on the phab review, and on IRC.

Almost all respondants generally expressed the claim "The type must be 
obvious if auto is used", though as I wrote before the guide uses auto 
in context that the type is not obvious:

  for (const auto &Val : Container) { observe(Val); }

It seems that the respondants wish for 'almost never auto'. Fair enough, 
if the existing practice supports that.

There is one problem though, or rather ~56,000 problems. That's how many 
uses of auto there are in the entire codebase currently, according to 
someone on IRC.

I tried to get some rough idea of how it is used. Here are the uses in 
the lib folder of clang, which hopefully rules out 'generic' code in 
headers and use in tests etc.

  $ git grep "\bauto\b" lib/ | wc -l
  12104

about half of those have an equals and angle brackets on the other side 
(dyn_cast<T> etc):

  $ git grep -e "\bauto\b" --and --not -e "auto.* = .*<.*>" lib | wc -l
  7687

Then we can filter out the ones where it is used in for and range-for:

  $ git grep -e "\bauto\b" --and --not -e "auto.* = .*<.*>" --and --not 
-e "\bfor\b" lib | wc -l
  5007

about 2600 uses of `for (auto ...)`.

And try to eliminate uses where the result is an iterator created with 
begin/end etc:

  $ git grep -e "\bauto\b" --and --not -e "auto.* = .*<.*>" --and --not 
-e "\bfor\b" --and --not -e begin --and --not -e end --and --not -e next 
lib | wc -l
  4570

So that's 4500ish uses which remain after all of those filters.

Here is the same command run in my llvm clone:

  $ git grep -e "\bauto\b" --and --not -e "auto.* = .*<.*>" --and --not 
-e "\bfor\b" --and --not -e begin --and --not -e end --and --not -e next 
lib | wc -l
  8243

So, about 12000 uses in llvm+clang repos.

Here is a random selection of 15 after those filters in the clang repo:

git grep -e "\bauto\b" --and --not -e "auto.* = .*<.*>" --and --not -e 
"\bfor\b" --and --not -e begin --and --not -e end --and --not -e next 
lib | shuf -n 15
lib/Frontend/FrontendActions.cpp:  auto Buffer = 
FileMgr.getBufferForFile(getCurrentFile());
lib/Sema/SemaDecl.cpp:        auto IFace = MD->getClassInterface();
lib/Format/Format.cpp:  auto NewCode = applyAllReplacements(Code, Replaces);
lib/Parse/ParsePragma.cpp:  auto &Opt = Actions.getOpenCLOptions();
lib/StaticAnalyzer/Checkers/NumberObjectConversionChecker.cpp:  auto 
CppSuspiciousNumberObjectExprM =
lib/Lex/HeaderSearch.cpp:    auto *HFI = getExistingFileInfo(FE);
lib/Driver/ToolChains/Myriad.cpp:  const auto &TC =
lib/CodeGen/SwiftCallingConv.cpp:      if (auto commonTy = 
getCommonType(firstVecTy->getElementType(),
lib/AST/ASTDiagnostic.cpp:      if (auto nullability = 
AttributedType::stripOuterNullability(SugarRT)) {
lib/Sema/SemaStmtAttr.cpp:  auto *FnScope = S.getCurFunction();
lib/CodeGen/CGBlocks.cpp:          auto addr = 
LocalDeclMap.find(variable)->second;
lib/Serialization/ASTReaderDecl.cpp:  auto V = Record.readInt();
lib/Sema/SemaType.cpp:    // C++11 [dcl.spec.auto]p2: 'auto' is always 
fine if the declarator
lib/CodeGen/CGCall.cpp:        auto SrcLayout = 
CGM.getDataLayout().getStructLayout(STy);
lib/CodeGen/CGDeclCXX.cpp:    auto NL = 
ApplyDebugLocation::CreateEmpty(*this);

Do those uses conform to the guide? If they don't, then should the guide 
be updated? Are the types there 'obvious'?

Would the people who reject auto unless 'the type is obvious' accept all 
of the ~12000 uses that pass those filters?

How did all of those uses get into the codebase? Does it indicate that 
the guide is not followed, or does it indicate that the guide is too 
subjective, or that maybe the 'the type must be obvios' guide does not 
reflect the 'reality at the coalface' anymore? Should those uses of auto 
be changed?

How is a new contributor to react to any of that? What are the real 
criteria that we can use to determine where auto will cause a patch to 
be rejected? Does it only depend on who you get as a reviewer?

Here is a quote from this thread from Chris and supported by Chandler 
and Quentin at least:

 > Perhaps the rule came be rewritten more generally, as
 > in “auto should generally only be used when the inferred
 > type is obvious from context, e.g. as a result of a cast
 > like dyn_cast or as a result of constructing a value with
 > a constructor containing a type name.”?

Is it reasonable to have that as a rule if there are ~12000 uses which 
break that rule?

Here is some code from SourceManager.cpp:

  unsigned LineTableInfo::getLineTableFilenameID(StringRef Name) {
    auto IterBool = FilenameIDs.try_emplace(Name, FilenamesByID.size());
    if (IterBool.second)
      FilenamesByID.push_back(&*IterBool.first);
    return IterBool.first->second;
  }

What exactly is the type of FilenameIDs? Does knowing or not knowing 
that affect readability or maintainability? What is the exact type of 
IterBool? Does knowing or not knowing that affect readability or 
maintainability?

That code is very similar categorically to the code I submitted for 
review and which was rejected:

  llvm::Optional<std::pair<std::string, MatcherCtor>>
  getNodeConstructorType(ASTNodeKind targetType) {
    const auto &ctors = RegistryData->nodeConstructors();
    auto it = llvm::find_if(
        ctors, [targetType](const NodeConstructorMap::value_type &ctor) {
          return ctor.first.isSame(targetType);
        });
    if (it == ctors.end())
      return llvm::None;
    return it->second;
  }

Does 'ctors' being 'auto' make the code unreadable or unmaintainable? 
What if I instead write ad the review requests:

   const NodeConstructorMap &ctors = RegistryData->nodeConstructors();

Do you now understand something more than before? The type is still not 
obvious. NodeConstructorMap is just a typedef.

Both snippets are

* short
* algorithmic
* Require the local variable only for a trivial condition

and they should both be accepted or rejected, depending on whether that 
matters.

Those criteria are also the same for the other snippet that I posted before:

  template <typename BaseT, typename DerivedT>
  void registerIfNodeMatcher(...) {
    auto NodeKind = ASTNodeKind::getFromNodeKind<DerivedT>();
    if (!NodeKind.isNone())
      NodeCtors[NodeKind] = std::make_pair(MatcherName, Descriptor);
  }

[an update on this one: I changed it to

  template <typename BaseT, typename DerivedT>
  void registerIfNodeMatcher(...) {
    registerIfNodeMatcher(
        ASTNodeKind::getFromNodeKind<DerivedT>(),
        Descriptor, MatcherName);
  }

and it passed review. The return type of

  ASTNodeKind::getFromNodeKind<DerivedT>()

is no more obvious than before and now we don't have a variable name 
giving more info, so I have have further doubts that 'the type must be 
obvious' is a useful guide.  It seems instead that people reject code 
because they don't want to see 'auto'.]

Again, it is only reasonable that all three of these should be accepted 
or rejected independent of the reviewer/contributor. If that 
acceptance/rejection is dependent on the reviewer, then you have a bad 
situation.

Here is a recent commit which replaces a local typedef with auto:

  https://github.com/llvm-mirror/clang/commit/a7d1b31c

How did that get into the repo, given the interpretation of the guide in 
this thread? It appears that there are many interpretations of the guide.

If the rule is that 'the type must be obvious', as is claimed, then the 
rules are not being followed. New commits are introducing auto in 
categories where my use was rejected, and there are ~12000 uses which 
bypass some claimed 'acceptable auto' filters.

 From the perspective of a relatively new contributor, it looks like 
patches are accepted or rejected based on the use of `auto` depending on 
who is doing it, and who is reviewing.

As a new contributor, I want to know that if I have to follow a rule (or 
indeed follow the opinion of whoever is reviewing), then everyone has to 
follow the same rule. I want to know that it doesn't depend on who is 
reviewing. That doesn't currently look like the case, but maybe I'm 
missing something.

How is a new contributor to react to any of that? Is it ok that 
acceptance/rejection of a patch that uses auto depends on the reviewer? 
Especially for a contributor who sees auto finding more uses in modern 
c++ and in the ether of internet code.

Some people (At least Chris Lattner) have expressed interest in updating 
the guidelines. To what I'm not sure, but it would be good to have 
guidelines which reflect existing ground truth - even if you don't like 
that truth.

I also recommend that if someone tries to update the guidelines, they 
try to find out why the uses of `auto` which I described in this mail 
exist. It's really not 'in the interest of writability' and 'readability 
doesn't matter'. I find it a bit condescending to suggest that is the 
case. I suspect that in many cases, 'the precise type is not relevant 
here' could be part of the reason those 12000 uses of auto were written. 
Maybe some research (inside and outside of llvm) would lead to insights.

Thanks,

Stephen.