[PATCH] Use Rvalue refs in APInt

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 22 13:10:48 PDT 2016


LGTM

Optional:

I don't think LLVM is really using braced init (like this: "auto One =
APInt{129, "1", 16};" or like this: "APInt V{129, HexString, 16};"), so I'd
recommend switching those to the usual ctor syntax: APInt X(y, z);.
(Honestly, that ctor should probably be explicit, and shouldn't be called
with {} init, but no one (in LLVM or otherwise) has bothered to go & mark
all the multi-arg ctors as explicit post-C++11.)

The test cases could probably be factored into something shorter to remove
the duplication (a type-parameterized gtest, where the type is a functor with
op- or op+, etc.), but I wouldn't spend too much time on it. Simple/explicit
tests are good too.

- Dave

On Fri, Jul 22, 2016 at 11:56 AM Pete Cooper <peter_cooper at apple.com> wrote:

> On Jul 22, 2016, at 8:18 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
> I've forgotten most of this review (but I've reread the thread to get as
> much context as I can) - but is there a reason you have op-(APInt, uint64),
> but not op-(uint64, APInt) (or am I misreading the patch and that's already
> somewhere?) (same question for op+). Seems like that'll catch someone by
> surprise when APInt - uint64 performs better than uint64 - APInt?
>
> You’re right, there was no good reason to leave those out.  I think my
> reasoning was that I wanted to make use of the add_1 and sub_1 helpers
> which isn’t easy in the latter case.  As it is, I’ve added them.  The sub
> case won’t necessarily be fast, as it has to loop over the whole value to
> negate it, but it should still save the allocation, which was the main
> reason for this whole series of patches.
>
> Also, it was just odd that ‘APInt + 1’ compiled but ‘1 + APInt’ gave an
> error.  Even if the implementation was no faster, it was still worth adding
> these just for consistency.  Nice catch!
>
> So same patch as before and also added those methods and tests for them.
>
> Thanks,
> Pete
>
>
>
> - Dave
>
> On Thu, Jul 21, 2016 at 7:02 PM Pete Cooper <peter_cooper at apple.com>
> wrote:
>
>> Sorry David, I let this slip by then was on vacation.
>>
>> Your patch worked much better than mine.  I fixed one of the - operators
>> which was the wrong way around, but otherwise the logic is the same as your
>> patch.
>>
>> I also added tests which should hit every method added and checks all of
>> the getRawData()’s as you suggested was possible.
>>
>> Cheers,
>> Pete
>>
>>
>> On Jun 6, 2016, at 3:59 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>> On Sun, Jun 5, 2016 at 11:55 AM, Pete Cooper <peter_cooper at apple.com>
>> wrote:
>>
>>>
>>> On Jun 4, 2016, at 9:10 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>> Looks like you just have a variable rename in the unary minus
>>> implementation - commit separately or drop that change perhaps?
>>>
>>> Oh yeah.  Good catch.  Will do that separately.
>>>
>>>
>>> I don't think you need versions that take two rvalue refs (+(&&, &&)),
>>> do you? (until +=/-= get smart and have an overload that takes an rvalue
>>> ref parameter and uses it to steal the right hand side buffer if it's
>>> bigger than the left hand side buffer or something like that?)
>>>
>>> I had a compile error somewhere in the LLVM codebase without this
>>> version.  I can’t remember where it is, but a small test (attached to the
>>> end of the email if you want to hack on it) which triggers it is:
>>>
>>> rvalue.cpp:66:22: error: use of overloaded operator '+' is ambiguous
>>> (with operand types 'APInt' and 'APInt')
>>>   APInt d2 = (a * b) + (a * b);
>>>
>>>
>>> & can you pass by value instead of by rvalue ref - does that work/choose
>>> the right overloads?
>>>
>>> Doesn’t seem to.  Using the above as an example, if I remove the && from
>>> both arguments then I get:
>>>
>>> rvalue.cpp:72:22: error: use of overloaded operator '+' is ambiguous
>>> (with operand types 'APInt' and 'APInt')
>>>   APInt d2 = (a * b) + (a * b);
>>>              ~~~~~~~ ^ ~~~~~~~
>>> rvalue.cpp:35:14: note: candidate function
>>> inline APInt operator+(APInt a, APInt b) {
>>>              ^
>>> rvalue.cpp:41:14: note: candidate function
>>> inline APInt operator+(APInt &&a, const APInt &b) {
>>>              ^
>>> rvalue.cpp:47:14: note: candidate function
>>> inline APInt operator+(const APInt &a, APInt &&b) {
>>>              ^
>>> rvalue.cpp:53:14: note: candidate function
>>> inline APInt operator+(const APInt &a, const APInt &b) {
>>>
>>> Note, removing the && from all the variants doesn’t improve the
>>> situation.
>>>
>>
>> Attached an example that I think works - but I haven't tested. There may
>> be some accidental infinite recursion in there - the version of the patch
>> you have didn't seem to pass all the tests anyway.
>>
>> (also noticed you ended up with both member and non-member version of
>> unary operator-, my patch drops the member one (you could probably move
>> operator~ out as a non-member too, but that's pretty orthogonal))
>>
>>
>>>
>>> inline APInt operator-(APInt LHS, const APInt &RHS) {
>>>   LHS -= RHS;
>>>   return LHS; // shouldn't need std::move here because you're returning
>>> a local
>>> }
>>>
>>> I wondered about this too.  I turned on -Wpessimizing-move to see if
>>> what I was doing was wrong but it didn’t fire.  Interestingly, with this
>>> method:
>>>
>>> inline APInt operator+(APInt &&a, const APInt &b) {
>>>   printf("APInt::+(&&, &)\n");
>>>   a += b;
>>>   return a;
>>>
>>>
>> This one shouldn't produce a move (& you should add std::move explicitly)
>> because 'a' is not a local here, it's a reference. When it's passed by
>> value there's no need for the std::move:
>>
>> blaikie at blaikie-linux:/tmp/dbginfo$ cat -n test.cpp
>>      1  struct foo {
>>      2    foo(foo&&) = default;
>>      3  };
>>      4
>>      5  foo f(foo g) {
>>      6    return g;
>>      7  }
>>      8  foo f2(foo &&g) {
>>      9    return g;
>>     10  }
>> blaikie at blaikie-linux:/tmp/dbginfo$ clang++-tot -std=c++11 test.cpp
>> -fsyntax-only
>> test.cpp:9:10: error: call to implicitly-deleted copy constructor of 'foo'
>>   return g;
>>          ^
>> test.cpp:2:3: note: copy constructor is implicitly deleted because 'foo'
>> has a user-declared move constructor
>>   foo(foo&&) = default;
>>   ^
>> 1 error generated.
>>
>>
>>> }
>>>
>>> and with/without the std::move on the return.  The above version will
>>> call APInt::APInt(&) but the std::move version will call APInt::APInt(&&).
>>> I used printfs to verify this.  So looks like there is a difference here,
>>> even though I totally agree with you that we’re returning a local so it
>>> shouldn’t need the std::move.  I’m not sure if this is a bug, or just
>>> subtlety in rvalue semantics.  Would love to know the answer though.
>>>
>>>
>>> Then you shouldn't need the op-(const APInt&,const APInt&) version, for
>>> example.
>>>
>>> Not sure if it’s a result of the other &&’s ending up being required, but
>>> I’ve tested without a (const APInt&, const APInt&) version and I get
>>> ambiguous overload errors.  Seems like I’m going to need it.
>>>
>>>
>>> Tests?
>>>
>>> I was wondering about this.  I can certainly test all the variants to
>>> make sure I get the correct numerical results from APInt and I’ll add what
>>> tests are needed for that.  I wouldn’t be able to test whether we get a
>>> certain number of mallocs, unless it’s ok to implement my own malloc/free
>>> in the APInt unit test?
>>>
>>
>> Yeah, I'd certainly at least test that we get all the right answers
>> (potentially using a data expanded test to exercise all the operations with
>> the same values for different combinations of lvalues, rvalues, and uints).
>>
>> As for testing the avoidance of allocation... hrm... I mean it's
>> essentially a non-observable performance thing, and our tests don't really
>> test performance, so perhaps that's fine. In theory you could test that
>> moving happened by caching the result of "getRawData" and check that the
>> pointer value is the same? Not sure if that's a good test.
>>
>>
>>>
>>> Thanks for all the comments so far.  Will try get an updated patch
>>> tomorrow.
>>>
>>> Cheers,
>>> Pete
>>>
>>>
>>>
>>>
>>> On Fri, Jun 3, 2016 at 10:42 AM, Pete Cooper <peter_cooper at apple.com>
>>> wrote:
>>>
>>>> Hi David, Sanjoy
>>>>
>>>> Here’s an updated patch which provides all of the combinations of
>>>> operator[+-] I can think to add.
>>>>
>>>> All of the new ones are outside the class definition so that we can
>>>> reduce duplication and have them call each other.
>>>>
>>>> The one thing I noticed while doing this work was that the already
>>>> existing operator+= and -= methods really did exactly what I wanted.  So
>>>> i’ve implemented + and - in terms of += and -=.
>>>>
>>>> Is that ok, or is it frowned upon?  I can imagine some people would
>>>> prefer that += calls + and not the other way round.  But it is very
>>>> convenient as you can see with this patch.
>>>>
>>>> Comments very welcome.
>>>>
>>>> BTW, this reduces total allocations by about 400k from 19.1M to 18.7M.
>>>>
>>>> Cheers,
>>>> Pete
>>>>
>>>>
>>>>
>>>> On Jun 2, 2016, at 3:24 PM, Pete Cooper via llvm-commits <
>>>> llvm-commits at lists.llvm.org> wrote:
>>>>
>>>>
>>>> On Jun 2, 2016, at 2:28 PM, Sanjoy Das <sanjoy at playingwithpointers.com>
>>>> wrote:
>>>>
>>>> On Wed, Jun 1, 2016 at 9:43 AM, Pete Cooper <peter_cooper at apple.com>
>>>> wrote:
>>>>
>>>> Another interesting data point is the compile time.  On my test case,
>>>> SCEV::getRange is 8.9% of compile time which is a lot.  But of that, 6.3%
>>>> is just in ConstantRange::multiply.  This method is heavy APInt code, and
>>>> especially malloc traffic.
>>>>
>>>>
>>>> Yeah, that is definitely too high! Just to check: I assume you mean
>>>> 8.9% of opt -O2 or something similar?
>>>>
>>>> Yep, that’s right.  ‘opt -O2 verify-uselistorder.bc -o opt.bc’.  The
>>>> verify-uselistorder input is the pre-optimized, but post-linked, bitcode
>>>> from LTOing that tool.
>>>>
>>>> BTW, I just looked at the latest numbers and the commits I’ve made so
>>>> far save 3% of compile time on this use case.  So the 8.9% is more like
>>>> 5.9% now.  And still a little more to come.
>>>>
>>>>
>>>> Many of the speedups I’ve been finding involve doing less work
>>>> (r271020 which avoids the latter half of ConstantRange::multiply and saves
>>>> 3M allocations), and fixing cases of unnecessary APInt allocations
>>>> (r270959).  This patch is along the same lines as the latter where we have
>>>> malloc traffic we can avoid.
>>>>
>>>>
>>>> Making too many fixes on the APInt algorithms to avoid allocations
>>>> seems like we're solving the issue at the wrong layer.  I think fixing
>>>> infrastructural issues so that we _can_ be a little sloppy (within
>>>> reason) in extending integers without thinking too much about malloc
>>>> traffic is the right path.
>>>>
>>>> I completely agree.  There are certainly limits to how far to push
>>>> this.  For example, this code in ConstantRange::multiply:
>>>>
>>>>   auto L = {this_min * Other_min, this_min * Other_max,
>>>>             this_max * Other_min, this_max * Other_max};
>>>>
>>>>
>>>> Once I have the rvalue-ref versions of the APInt methods (a change which
>>>> I think is reasonable), the above could be changed to:
>>>>
>>>>   auto L = {this_min * Other_min, std::move(this_min) * Other_max,
>>>>             this_max * std::move(Other_min),
>>>>             std::move(this_max) * Other_max};
>>>>
>>>> This would avoid 3 allocations out of 4, because we will then use the
>>>> rvalue APInt methods.  However, I think this might be a little too much
>>>> hacking.  So yeah, I totally agree with you, and hopefully we can solve
>>>> cases like this one in a more reasonable way than gratuitous use of
>>>> std::move() or other APInt hackery :)
>>>>
>>>>
>>>> But you're doing the work, so you get to decide the path forward. :)
>>>>
>>>> Sounds good to me :)
>>>>
>>>>
>>>>
>>>> ConstantRange stats (bit width and count of hits in
>>>> ConstantRange::ConstantRange)
>>>>
>>>>
>>>> This is great!  Is this a bootstrap of clang or something?
>>>>
>>>> Actually, same use case as before: ‘opt -O2 verify-uselistorder’.  It’s
>>>> a nice small bitcode file which takes about 20s to optimize.
>>>>
>>>>
>>>> Btw, there are couple of bitwidths here that I find interesting, e.g.
>>>> I'd not have expected this many i70 ConstantRange allocations.
>>>>
>>>> Yeah, some of these are a bit surprising.  2^n and (2^n)+1 both seem
>>>> likely due to the IR itself and SCEV, but anything else is a little odd.  I
>>>> may take a look at the 258 bit case just because there are so many of them.
>>>>
>>>> Pete
>>>>
>>>>
>>>> -- Sanjoy
>>>>
>>>> 1: 30850028
>>>> 2: 7238
>>>> 3: 5733
>>>> 4: 92
>>>> 5: 817
>>>> 6: 294
>>>> 7: 192
>>>> 8: 363498
>>>> 9: 896
>>>> 11: 330
>>>> 12: 378
>>>> 13: 385
>>>> 14: 125
>>>> 16: 30256
>>>> 18: 272
>>>> 20: 98
>>>> 24: 10
>>>> 25: 62
>>>> 26: 13
>>>> 27: 181
>>>> 28: 8
>>>> 31: 98
>>>> 32: 2003134
>>>> 33: 132
>>>> 34: 128
>>>> 36: 76
>>>> 38: 2130
>>>> 41: 3
>>>> 57: 262
>>>> 58: 244
>>>> 59: 342
>>>> 60: 2418
>>>> 61: 1211
>>>> 62: 190
>>>> 63: 226
>>>> 64: 5118228
>>>> 65: 128400
>>>> 66: 4236
>>>> 67: 14826
>>>> 68: 15408
>>>> 69: 13417
>>>> 70: 7959
>>>> 71: 347
>>>> 96: 88
>>>> 128: 364826
>>>> 129: 379580
>>>> 130: 19092
>>>> 256: 4734
>>>> 257: 19132
>>>> 258: 71826
>>>> 514: 4650
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sanjoy Das
>>>> http://playingwithpointers.com
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>> <apint.diff>
>>
>>
>>
>

