[PATCH] D107378: Make enum iteration with seq safe by default

Wed Sep 29 10:21:09 PDT 2021

kuhar added a comment.

In D107378#3028380 <https://reviews.llvm.org/D107378#3028380>, @aaron.ballman wrote:

> In D107378#3028339 <https://reviews.llvm.org/D107378#3028339>, @dblaikie wrote:
>
>> In D107378#3028305 <https://reviews.llvm.org/D107378#3028305>, @aaron.ballman wrote:
>>
>>> In D107378#3026392 <https://reviews.llvm.org/D107378#3026392>, @dblaikie wrote:
>>>
>>>> @jkuhar - is there some data you could include in the patch description/commit message to help explain/motivate this functionality? Common bugs (links to commits that fixed bugs that would've been found earlier if we had this functionality would be good) introduced by not having this safety/protection over iterating over enums?
>>>>
>>>> @dexonsmith @aaron.ballman - curious what you two think of this for general design. Reckon it's worth the macros and novel traits system (using an inline function rather than/in addition to a template specialization) to allow the trait to be specified closer to the type in nested type situations?
>>>
>>> I'm also curious about the motivation behind this. I'm all for preventing misuse where someone tries to form a sequence over enumerations, but I'm not convinced that enumerators form the same notional sequence. As you point out, the enumerators don't have to take on contiguous values. Without some motivating use cases, I think it's better to avoid the extra machinery entirely and prevent this use with enumeration types at all, but I could be convinced if there are some good use cases for it.
>>
>> FWIW I was sort of leaning the other way - that most enums are contiguous and there's probably some benefit to iterating over them all from time to time. But happy for the discussion either way.
>
> I guess I look at it more as: a sequence is a bounded contiguous range of integer values and wanting to iterate over enumerators is reflection. The enumerators don't have to be contiguous (even if they frequently are), nor do they have a notional relationship to one another (even if they frequently do), so they're a bit different from a sequence even if there's a lot of notional overlap. Because of this, and because reflection provides a whole host of other kinds of sequences (lists of enumerators, lists of field members, lists of member functions, etc), I think it should be a separate interface from sequence of contiguous integers.
>
> But maybe others disagree?

In D107378#3028380 <https://reviews.llvm.org/D107378#3028380>, @aaron.ballman wrote:

> In D107378#3028339 <https://reviews.llvm.org/D107378#3028339>, @dblaikie wrote:
>
>> In D107378#3028305 <https://reviews.llvm.org/D107378#3028305>, @aaron.ballman wrote:
>>
>>> In D107378#3026392 <https://reviews.llvm.org/D107378#3026392>, @dblaikie wrote:
>>>
>>>> @jkuhar - is there some data you could include in the patch description/commit message to help explain/motivate this functionality? Common bugs (links to commits that fixed bugs that would've been found earlier if we had this functionality would be good) introduced by not having this safety/protection over iterating over enums?
>>>>
>>>> @dexonsmith @aaron.ballman - curious what you two think of this for general design. Reckon it's worth the macros and novel traits system (using an inline function rather than/in addition to a template specialization) to allow the trait to be specified closer to the type in nested type situations?
>>>
>>> I'm also curious about the motivation behind this. I'm all for preventing misuse where someone tries to form a sequence over enumerations, but I'm not convinced that enumerators form the same notional sequence. As you point out, the enumerators don't have to take on contiguous values. Without some motivating use cases, I think it's better to avoid the extra machinery entirely and prevent this use with enumeration types at all, but I could be convinced if there are some good use cases for it.
>>
>> FWIW I was sort of leaning the other way - that most enums are contiguous and there's probably some benefit to iterating over them all from time to time. But happy for the discussion either way.
>
> I guess I look at it more as: a sequence is a bounded contiguous range of integer values and wanting to iterate over enumerators is reflection. The enumerators don't have to be contiguous (even if they frequently are), nor do they have a notional relationship to one another (even if they frequently do), so they're a bit different from a sequence even if there's a lot of notional overlap. Because of this, and because reflection provides a whole host of other kinds of sequences (lists of enumerators, lists of field members, lists of member functions, etc), I think it should be a separate interface from sequence of contiguous integers.
>
> But maybe others disagree?

Let me start with some brief history:

- I started something similar for an out-of-tree llvm-based compiler, LLPC: https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/interface/lgc/EnumIterator.h. At the time, the implementation could not use `llvm::seq` because it did not support enums.
- A couple of weeks later, I learned that `seq` was enhanced to support enums. But when I looked at the implementation, I learned that it blindly accepts all enum types, even though it does not make sense to iterate over all of them.

Before this work, the status quo was that people would create hand written for loops like `for (MyEnum val = MyEnum::A; val != MyEnum::X, val = static_cast<MyEnum>(static_cast<unsigned>(val) + 1))`. This has 3 main issues:

1. The loop is verbose and ugly.
2. The underlying type may not be `unsigned`, and in our codebase nobody used `std::underlying_type` instead. But that would make it even more verbose. Some places implemented custom operators or helper increment functions instead.
3. The enum may not be continuous. I don't have links to public bugs in llvm or llpc to point to, but this showed up a few times internally, and I personally messed it up a few times. The issue is that even if the original code had continuous values, modifying the enum definition may invalidate all loops that use this type. This was difficult for me to grep for, given that for loops like this can be written slightly differently, with values hoisted out or custom increment operators, etc. In my opinion, this pattern is very bugprone.

I believe that this patch fixes all 3 issues. The most tricky one to handle is 3.: I found that the most reliable way to ensure that enumeration is safe is to disallow it by default and place this logic/declaration just after enum definition. This way, a person modifying the enum does not have to hunt for all of the uses across the codebase, and is likely to notice potential issues because of the close proximity of the declaration in the source code.

>From my perspective, I'm happy with `enumRange` in LLPC, which is even more restrictive than what is proposed here (it also forces users to specify the first and end enum value). However, I'm concerned that the unsafe default is behavior is allowed with `llvm::seq`.

If you see this patch as overcomplicated, maybe it would make sense to keep the underlying iteration mechanism of `iota_range` in place, but disallow enum iteration with `seq` and implement it separately, with extra safety checks outside of `iota_range`?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107378/new/

https://reviews.llvm.org/D107378