[PATCH] D16204: ValueTracking: Use fixed array for assumption exclude set in Query; NFC
Matthias Braun via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 14 17:20:53 PST 2016
MatzeB added a comment.
In http://reviews.llvm.org/D16204#327512, @reames wrote:
> In http://reviews.llvm.org/D16204#327492, @MatzeB wrote:
>
> > In http://reviews.llvm.org/D16204#327491, @reames wrote:
> >
> > > Given MaxDepth is less than the Small size specified for the SmallPtrSet, I'm not sure why this is faster? The only difference I can come up with in the new code vs the old is a) the check for isSmall and b) the fact the insert code is outlined for SmallPtrSet.
> > >
> > > I'd really doubt that (a) is the cause. If it is (b), we should just fix the SmallPtrSet impl to inline the common case of the insertion logic.
> > >
> > > Please try measuring with a tweaked version of SmallPtrSet. Unless you can show that the hand rolled code is still much faster, I'm hesitant to take this change.
> >
> >
> > In the majority of the cases the set isn't used at all, I didn't see any performance differences when tweaking the insertion/query logic.
>
>
> Wow, that was a fast turn around. How'd you managed to build something after touching an ADT header that fast much less performance test it? I want to know your secret. :)
>
> I have to admit, this really surprises me.
>
> > The gains appear to come from the simplified initialization (and maybe reduced stack usage?).
>
>
> Can you back up this claim? I'm open to being convinced this is true, but it would imply that we're creating *a lot* of Query objects. Do you believe that to be true? If so, we could explore using a stack discipline for the exclusions rather than passing around copies of the full Query.
1. I did not modify any ADT headers, I only experimented within ValueTracking.cpp. With tuning the insertion/query logic I meant working on the Query structure inside ValueTracking.cpp, and if you read the code you need to have an llvm.assume intrinsic or 1 specific comparison for the set to be used so unsurprisingly that does not happen often.
computeKnownBits() is indeed called a lot and if you compare the initialisation of a SmallPtrSet:
: SmallArray(SmallStorage), CurArray(SmallStorage), CurArraySize(SmallSize)
// ...
if (!isSmall() && NumElements*4 < CurArraySize && CurArraySize > 32)
return shrink_and_clear();
// Fill the array with empty markers.
memset(CurArray, -1, CurArraySize*sizeof(void*));
NumElements = 0;
NumTombstones = 0;
compared to:
: NumExcluded(0)
You also get additional const void **SmallArray;, const void **CurArray;, unsigned CurArraySize;, unsigned NumElements;, unsigned NumTombstones; members for the SmallPtrSet.
I don't know if this is really the instructions or the stack space saving and I am hoping it's not just cache effects because the stack alignment changed. But I did take care to measure a release clang, on a machine with most daemons killed, ASLR disabled, 3 different longer testcases (combine.c from 403.gcc, sqlite3 amalgamation, webkit javascript interpreter in a single file) run multiple times, so I am confident the measurements themself are real.
Repository:
rL LLVM
http://reviews.llvm.org/D16204
More information about the llvm-commits
mailing list