[llvm-dev] RFC: Modernizing our use of auto
Stephen Kelly via llvm-dev
llvm-dev at lists.llvm.org
Sun Dec 16 11:44:22 PST 2018
On 25/11/2018 14:43, Stephen Kelly via llvm-dev wrote:
>
> However this is a proposal for more modern thinking regarding the
> permissiveness of auto in LLVM codebases.
>
> Currently the rule on the use of auto is here:
Hi,
Thanks for the input on this topic, which has appeared here on the
mailing list, on the phab review, and on IRC.
Almost all respondants generally expressed the claim "The type must be
obvious if auto is used", though as I wrote before the guide uses auto
in context that the type is not obvious:
for (const auto &Val : Container) { observe(Val); }
It seems that the respondants wish for 'almost never auto'. Fair enough,
if the existing practice supports that.
There is one problem though, or rather ~56,000 problems. That's how many
uses of auto there are in the entire codebase currently, according to
someone on IRC.
I tried to get some rough idea of how it is used. Here are the uses in
the lib folder of clang, which hopefully rules out 'generic' code in
headers and use in tests etc.
$ git grep "\bauto\b" lib/ | wc -l
12104
about half of those have an equals and angle brackets on the other side
(dyn_cast<T> etc):
$ git grep -e "\bauto\b" --and --not -e "auto.* = .*<.*>" lib | wc -l
7687
Then we can filter out the ones where it is used in for and range-for:
$ git grep -e "\bauto\b" --and --not -e "auto.* = .*<.*>" --and --not
-e "\bfor\b" lib | wc -l
5007
about 2600 uses of `for (auto ...)`.
And try to eliminate uses where the result is an iterator created with
begin/end etc:
$ git grep -e "\bauto\b" --and --not -e "auto.* = .*<.*>" --and --not
-e "\bfor\b" --and --not -e begin --and --not -e end --and --not -e next
lib | wc -l
4570
So that's 4500ish uses which remain after all of those filters.
Here is the same command run in my llvm clone:
$ git grep -e "\bauto\b" --and --not -e "auto.* = .*<.*>" --and --not
-e "\bfor\b" --and --not -e begin --and --not -e end --and --not -e next
lib | wc -l
8243
So, about 12000 uses in llvm+clang repos.
Here is a random selection of 15 after those filters in the clang repo:
git grep -e "\bauto\b" --and --not -e "auto.* = .*<.*>" --and --not -e
"\bfor\b" --and --not -e begin --and --not -e end --and --not -e next
lib | shuf -n 15
lib/Frontend/FrontendActions.cpp: auto Buffer =
FileMgr.getBufferForFile(getCurrentFile());
lib/Sema/SemaDecl.cpp: auto IFace = MD->getClassInterface();
lib/Format/Format.cpp: auto NewCode = applyAllReplacements(Code, Replaces);
lib/Parse/ParsePragma.cpp: auto &Opt = Actions.getOpenCLOptions();
lib/StaticAnalyzer/Checkers/NumberObjectConversionChecker.cpp: auto
CppSuspiciousNumberObjectExprM =
lib/Lex/HeaderSearch.cpp: auto *HFI = getExistingFileInfo(FE);
lib/Driver/ToolChains/Myriad.cpp: const auto &TC =
lib/CodeGen/SwiftCallingConv.cpp: if (auto commonTy =
getCommonType(firstVecTy->getElementType(),
lib/AST/ASTDiagnostic.cpp: if (auto nullability =
AttributedType::stripOuterNullability(SugarRT)) {
lib/Sema/SemaStmtAttr.cpp: auto *FnScope = S.getCurFunction();
lib/CodeGen/CGBlocks.cpp: auto addr =
LocalDeclMap.find(variable)->second;
lib/Serialization/ASTReaderDecl.cpp: auto V = Record.readInt();
lib/Sema/SemaType.cpp: // C++11 [dcl.spec.auto]p2: 'auto' is always
fine if the declarator
lib/CodeGen/CGCall.cpp: auto SrcLayout =
CGM.getDataLayout().getStructLayout(STy);
lib/CodeGen/CGDeclCXX.cpp: auto NL =
ApplyDebugLocation::CreateEmpty(*this);
Do those uses conform to the guide? If they don't, then should the guide
be updated? Are the types there 'obvious'?
Would the people who reject auto unless 'the type is obvious' accept all
of the ~12000 uses that pass those filters?
How did all of those uses get into the codebase? Does it indicate that
the guide is not followed, or does it indicate that the guide is too
subjective, or that maybe the 'the type must be obvios' guide does not
reflect the 'reality at the coalface' anymore? Should those uses of auto
be changed?
How is a new contributor to react to any of that? What are the real
criteria that we can use to determine where auto will cause a patch to
be rejected? Does it only depend on who you get as a reviewer?
Here is a quote from this thread from Chris and supported by Chandler
and Quentin at least:
> Perhaps the rule came be rewritten more generally, as
> in “auto should generally only be used when the inferred
> type is obvious from context, e.g. as a result of a cast
> like dyn_cast or as a result of constructing a value with
> a constructor containing a type name.”?
Is it reasonable to have that as a rule if there are ~12000 uses which
break that rule?
Here is some code from SourceManager.cpp:
unsigned LineTableInfo::getLineTableFilenameID(StringRef Name) {
auto IterBool = FilenameIDs.try_emplace(Name, FilenamesByID.size());
if (IterBool.second)
FilenamesByID.push_back(&*IterBool.first);
return IterBool.first->second;
}
What exactly is the type of FilenameIDs? Does knowing or not knowing
that affect readability or maintainability? What is the exact type of
IterBool? Does knowing or not knowing that affect readability or
maintainability?
That code is very similar categorically to the code I submitted for
review and which was rejected:
llvm::Optional<std::pair<std::string, MatcherCtor>>
getNodeConstructorType(ASTNodeKind targetType) {
const auto &ctors = RegistryData->nodeConstructors();
auto it = llvm::find_if(
ctors, [targetType](const NodeConstructorMap::value_type &ctor) {
return ctor.first.isSame(targetType);
});
if (it == ctors.end())
return llvm::None;
return it->second;
}
Does 'ctors' being 'auto' make the code unreadable or unmaintainable?
What if I instead write ad the review requests:
const NodeConstructorMap &ctors = RegistryData->nodeConstructors();
Do you now understand something more than before? The type is still not
obvious. NodeConstructorMap is just a typedef.
Both snippets are
* short
* algorithmic
* Require the local variable only for a trivial condition
and they should both be accepted or rejected, depending on whether that
matters.
Those criteria are also the same for the other snippet that I posted before:
template <typename BaseT, typename DerivedT>
void registerIfNodeMatcher(...) {
auto NodeKind = ASTNodeKind::getFromNodeKind<DerivedT>();
if (!NodeKind.isNone())
NodeCtors[NodeKind] = std::make_pair(MatcherName, Descriptor);
}
[an update on this one: I changed it to
template <typename BaseT, typename DerivedT>
void registerIfNodeMatcher(...) {
registerIfNodeMatcher(
ASTNodeKind::getFromNodeKind<DerivedT>(),
Descriptor, MatcherName);
}
and it passed review. The return type of
ASTNodeKind::getFromNodeKind<DerivedT>()
is no more obvious than before and now we don't have a variable name
giving more info, so I have have further doubts that 'the type must be
obvious' is a useful guide. It seems instead that people reject code
because they don't want to see 'auto'.]
Again, it is only reasonable that all three of these should be accepted
or rejected independent of the reviewer/contributor. If that
acceptance/rejection is dependent on the reviewer, then you have a bad
situation.
Here is a recent commit which replaces a local typedef with auto:
https://github.com/llvm-mirror/clang/commit/a7d1b31c
How did that get into the repo, given the interpretation of the guide in
this thread? It appears that there are many interpretations of the guide.
If the rule is that 'the type must be obvious', as is claimed, then the
rules are not being followed. New commits are introducing auto in
categories where my use was rejected, and there are ~12000 uses which
bypass some claimed 'acceptable auto' filters.
From the perspective of a relatively new contributor, it looks like
patches are accepted or rejected based on the use of `auto` depending on
who is doing it, and who is reviewing.
As a new contributor, I want to know that if I have to follow a rule (or
indeed follow the opinion of whoever is reviewing), then everyone has to
follow the same rule. I want to know that it doesn't depend on who is
reviewing. That doesn't currently look like the case, but maybe I'm
missing something.
How is a new contributor to react to any of that? Is it ok that
acceptance/rejection of a patch that uses auto depends on the reviewer?
Especially for a contributor who sees auto finding more uses in modern
c++ and in the ether of internet code.
Some people (At least Chris Lattner) have expressed interest in updating
the guidelines. To what I'm not sure, but it would be good to have
guidelines which reflect existing ground truth - even if you don't like
that truth.
I also recommend that if someone tries to update the guidelines, they
try to find out why the uses of `auto` which I described in this mail
exist. It's really not 'in the interest of writability' and 'readability
doesn't matter'. I find it a bit condescending to suggest that is the
case. I suspect that in many cases, 'the precise type is not relevant
here' could be part of the reason those 12000 uses of auto were written.
Maybe some research (inside and outside of llvm) would lead to insights.
Thanks,
Stephen.
More information about the llvm-dev
mailing list