[llvm-dev] [cfe-dev] Modernizing LLVM Coding Style Guide and enforcing Clang-tidy

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Thu Dec 29 12:03:01 PST 2016


On Thu, Dec 29, 2016 at 11:46 AM Mehdi Amini <mehdi.amini at apple.com> wrote:

> On Dec 29, 2016, at 11:20 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Thu, Dec 29, 2016 at 10:04 AM Mehdi Amini via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> On Dec 29, 2016, at 5:54 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>
> From yesterday discussion, Daniel Berlin proposed using emplace_back
> everywhere to make code easier to maintain. I think it is valid argument,
> but I believe it would reduce readability.
>
>
> Just to be clear; I proposed not trying to switch uses back and forth
> without real data, and to come to some agreement about what should be
> written in the first place, preferably based on real data, and then,
> switching in some organized fashion, not just randomly :)
>
> IE Either have a clear coding standard and enforce it, or leave uses alone
> one way or the other without some demonstration that it actually matters.
>
> I would *personally* prefer to just use emplace_back everywhere, and not
> try to switch between the two, without demonstration of real benefit.
>
> (i'd also accept the other way, use push_back everywhere, and not try to
> switch between the two, without demonstration of real benefit).
>
> This preference is based on a simple truth: People suck at deciding when
> to use one or the other. Yes, i know how to use it and when to use it.
> Lots of our types are cheaply movable, etc, so you probably won't see any
> performance difference even when using them "right".  People have enough
> style/etc issues without having to try to think hard about this.
>
>
> I agree that “people suck at deciding”, me in the first place in the line,
> and that’s why I like clear and good guideline and stick to it unless
> "demonstration of real benefit”.
> I also think we can have clear guideline that are different from “always
> use emplace_back” or “always use push_back”, like Piotr is suggesting.
>
>
> I don't think Piotr is suggesting "always use push_back" but "always use
> push_back when it's valid" & I'd second this.
>
>
> Define “valid”? Just that it will compile?
>

Yep


>
> To show a simpler example:
>
> std::unique_ptr<T> u(v);
>
> std::unique_ptr<T> u = v;
>
> I'd generally advocate for using copy init (the second form - it doesn't
> copy in this case, it moves) over direct init (the first form) because it's
> more legible - copy init can only execute implicit ctor operations, whereas
> direct init can invoke implicit and explicit operations. So from a
> readability perspective seeing the latter is easier than seing the former -
> I know the operation is further constrained to simpler/less interesting
> operations (eg: 'v' can't be a raw pointer in the second form, it might be
> (& then I have to worry about whether that's an ownership transfer that's
> intended), etc)
>
> & push_back V emplace_back is the same choice - push_back means only
> implicit ops are used and so it's more readable
>
>
> I don’t see what’s more readable about “only implicit ctor”.
>

It limits the set of operations that can be performed by the code. So when
I read it there's less I have to think about/consider. (& specifically, the
implicit operations tend to be simpler/safer/more obvious - copy/move or
operations that are similar - not complex/interesting things like "taking
ownership from a raw pointer" or "creating a vector of N elements")


> Emplace is very explicit that we intended to construct an object, I don’t
> understand what hurts readability here?
>

Going back to the example above, given the following two lines:

std::unique_ptr<T> u(foo());
std::unique_ptr<T> u = foo();

(& the equivalent: emplace_back(foo()) V push_back(foo()) for a vector of
unique_ptr)

the copy init/push_back are simpler to read because they aren't as powerful
- I don't have to wonder if something is taking ownership there (or if I'm
creating a vector of N ints, etc). I know it's a simple/obvious operation,
generally (because others shouldn't be implicit).


>
>
> , emplace_back requires more careful attention.
>
> I think this is a reasonably good stylistic rule - but I'm happy leaving
> an open to judgment for sure - there might be readability reasons that an
> emplace_back may work better in some cases.
>
> I'd think of this like "else after return" - we don't have any
> enforcement, sometimes we make judgment about it being better (for
> consistency - sometimes there's a pattern such that else makes the code
> more readable), we haven't gone back to do any cleanup of the codebase, but
> it's pretty consistently applied/desired in code review.
>
>
> I have a different impression: we are actively cleaning the codebase with
> tidy-checks.
>

I think that's where Danny is pushing back - I'm on the fence about that.
So I'd push back on the automated cleanup (& have done so on several
occasions as these sort of patches have been sent) & encourage efforts to
provide the tools (like clang-tidy integration for patches and editors) to
avoid mistakes going forward in an advisory (rather than mandatory) way.


> For instance look for `git log --author=Zelenko`.
> Another example is that a few months ago the entire LLDB codebase has been
> formatted with clang-format, after marking the specific places where it was
> not desirable (tables for instance) with comment to disable clang-format.
>

*nod* Only because it was so far out of LLVM's style (by design
originally). LLVM's generally "close enough" that the cleanup isn't
desired/intended.

I'd say the same for this particular instance - implicit V explicit ctor
calls. Mostly we write them using copy init, not direct init. I suspect
push_back/emplace_back is similar (if only due to history - and perhaps due
to my/other code review feedback encouraging push_back when possible in
reviews that might otherwise be going to more emplace_back) so I'm not sure
I care too much about the cleanup.

I think the issue Danny was getting at - which I agree - is unless we have
good tools in place to not make these mistakes again going forward (I think
clang-format has sort of reached that point - it's got easy integration in
editors and source control systems) then there's limited merit in doing the
cleanup. +1 to that.

If we have good integration for clang-tidy, then I'd be more OK with doing
cleanup.

Either way (good or not integration) - if we have some people who have ways
of using clang-tidy in their development process, I'd say it's worth having
a project-wide clang-tidy config. I'd vote for putting "use implicit-only
ops (like copy init and push_back) over explicit ops (like direct init and
emplace_back) where both compile" in that list.

- Dave


>
>> Mehdi
>
>
>
> All that said - sure, I'd like to have tools to help me get this right
> (else after return and "prefer copy init style over direct init") & make
> that really easy to do.
>
> - Dave
>
>
>
> That said, I’m not convinced the relative “ugliness" of emplace_back  (I’m
> not sure what’s ugly there, I only see it as a habit to take)  vs push_back
> is enough to justify a set of rules to decide between the two, I’d be fine
> with “always using emplace_back” in the name of “simple rule is better when
> possible".
>
>
> So that isn't what I would necessarily propose for LLVM.
>
> For LLVM, my answer would be "either make tool do what we want, force use
> of tool, or be consistent and obvious with what we want and then fix it if
> it leads to performance issue”
>
>
> +1
> Ultimately whatever we do is not practical if it can’t be done by a tool.
>
>
> Not sure I follow this - we have lots of things we don't currently do with
> a tool that's still part of our style & stuff we try to catch with code
> review, etc. (else after return is my best example)
>
>
>
>
>
>
>
>
> There is also other reason - emplace_back can't take arguments of some
> types like bitfields, NULL, static member. Using emplace can also lead to
> memory leak in case of smart ptrs
> "
>
> auto *ptr = new int(1);v.push_back(std::unique_ptr<int>(ptr));
>
> This is because replacing it with emplace_back could cause a leak of this
> pointer if emplace_back would throw exception before emplacement (e.g.
> not enough memory to add new element).".
>
>
> This seems, IMHO, not a useful argument for LLVM since it does not try to
> use exception based error handling to recover.
>
>
> This, and also we should not write code like that (naked pointers, calling
> new), using make_unique<> for instance would prevent any such situation.
> Passing a raw pointer to a container of unique_ptr can be flagged by a tool.
>
>
> Sure - but it generalizes further than that, it's just a nice simple
> example.
>
> std::vector<int> v(i);
>
> 'i' could be an int & this would make a vector of that many ints, or it
> could be another vector & this is making a copy, etc.
>
> std::vector<int> v = i;
>
> I know taht's not creating a vector of 'i' many ints.
>
> Yeah, not a great example either - but the general idea applies, the
> latter form is less powerful & so is easier to read because you don't have
> to worry/think about it as much.
>
> I think we mostly do this sort of thing out of habit anyway - but
> codifying it (& if possible, tools to help) could be nice.
>
>
>
>
>
>> Mehdi
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161229/0a49cd10/attachment-0001.html>


More information about the llvm-dev mailing list