[cfe-commits] [Patch] -Wduplicate-enum which fixes PR6343

Richard Trieu rtrieu at google.com
Tue Aug 14 14:32:24 PDT 2012


Looks like I need to clarify a few points about my code.  First, at most
one diagnostic will be emitted for each enum constant.  If there are five
constants with the same value, one warning and four notes will be emitted.

On Mon, Aug 13, 2012 at 10:22 PM, Ted Kremenek <kremenek at apple.com> wrote:

> Hi Richard,
>
> Thanks for making great progress on this!  I agree with Richard Smith's
> suggestion about not using an std::map, but is there a reason to not just
> use a DenseMap to see if there are any enum collisions?  Fundamentally your
> algorithm is O(n*log n), not O(n).  Since you are converted everything to
> an int64_t, could you not just use a DenseMap<int64_t, EnumConstantDecl*>?
>  You could then iterate over the EnumConstantDecls in one linear pass,
> converting them to int64_t's on the fly for comparison purposes.  A few
> advantages I see are:
>
> (1) No sorting, which is O(n log n)
>
> (2) Diagnostics are based on declaration order without having to result to
> a stable_sort(), which is even more costly.  If you want to report all
> duplicates as notes, the mapped value could easily be made into a
> PointerIntPair, where the "int" corresponds to an index into another vector
> where you collect the duplicates.  Since the number of cases where
> duplicates will exist is few, you're optimizing for the common case.
>
>
I think it's best to back up a little in the loop.

 +    llvm::APSInt Value = (*I)->getInitVal();

At this point, a valid enum is found for the warning, which is stored in I.
 Now, from here to the end of this iteration of the for-loop, process every
enum which has the same Value.

With the current approach, I'm not certain, but something about the
> following looks curious:
>
> +    // Find the first duplicate enum.
> +    llvm::SmallVector<EnumConstantDecl*, 10>::iterator Iter = I + 1;
> +    for (; Iter != E && llvm::APSInt::isSameValue(Value,
> (*Iter)->getInitVal());
> +         ++Iter) {
> +      if (ValidDuplicateEnum(*Iter, Enum))
> +        break;
> +    }
> +
>
> This loop is similar to the next loop.  Before emitting the warning, check
that there exists a enum to put into a loop.  If an enum constant is found,
break, emitting the warning, then start the loop again, emitting notes.

>
> If the array is sorted by constant value, I don't believe there is any
> need to keep marching through the rest of the items if the next item isn't
> a match.  I may be mistaken, but I think this leaves your implementation
> still being quadratic in the worst case.
>
All the enums of the same value is handled at one go.  At the end, I is set
to Iter so that the next iteration will skip all same value enums and
starts at the next value.  This keeps the vector transverse at linear time.


> Further, the following:
>
> +
> +    if (Iter == E)
> +      break;
> +
>
> From the previous loop, there's three ways to exit the loop.
1) Iter == E is true, every enum constant from I to the end have the same
value, but there's not a valid enum for the diagnostic.  We can just quit
checking at this point.
2) isSameValue is false, Iter is a different value than I, so I becomes
Iter for the next iteration of the outer for loop.
3) ValidDuplicateEnum is true, emit the warning, start the next loop and
start emitting notes


> appears to break out of the entire comparison entirely.  Isn't it possible
> for Iter == E if no match was found in the preceding loop?  For example,
> from your test cases, I'd expect to see a warning here:
>
> +
> +enum {
> +  I1 = -1,
> +  I2,
> +  I3,
> +  I4 = -2,
> +  I5,
> +  I6
> +};
>
>
> Is there a reason why I1 and I5 are not reported as duplicates?  (or I3
> and I6)?
>

Anonymous enums are no longer warned on.  Some code likes to dump all the
constants into a single anonymous enum.   Looks like I forgot to add this
feature to the list of changes.

>
> At a high level, I honestly find this logic to be more complicated than I
> would have expected.  The sorting seems unnecessary, and will report
> diagnostics in an unnatural order (first based on enum constant value, then
> on declaration order).  A straight linear pass seems more naturally to me,
> and DenseMap is very efficient.
>
Is there a comparison between the different containers in LLVM and the STL
containers?  Also, will the DenseMap fix the unnatural order of the
diagnostics?


>  If you are worried about the malloc() traffic, you could always keep a
> DenseMap around in Sema just for this warning, and clear it out whenever
> entering CheckForDuplicateEnumValues().  You then pay the malloc() cost
> when you need a bigger map, which will quickly amortize the cost.
>
Seems a little strange to keep around a data structure in Sema for a
non-default diagnostic.

>
> I honestly could be missing something here, so I apologize if I just am
> not reading the code correctly.
>
I think I have addressed any misunderstandings inline.  I'll have a look at
DenseMap and see if it works better.

>
> Cheers,
> Ted
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120814/a951a35d/attachment.html>


More information about the cfe-commits mailing list