[PATCH] Use Rvalue refs in APInt

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 22 08:18:08 PDT 2016


I've forgotten most of this review (but I've reread the thread to get as
much context as I can) - but is there a reason you have op-(APInt, uint64)
but not op-(uint64, APInt)? (Or am I misreading the patch and that's already
in there somewhere?) Same question for op+. It seems like that'll catch
someone by surprise when APInt - uint64 performs better than uint64 - APInt.
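
(For illustration, a symmetric pair could look roughly like the sketch below -
it assumes an in-place negate() helper and a uint64_t overload of +=, which I
haven't checked the patch for, so treat those as hypothetical:)

inline APInt operator+(uint64_t LHS, APInt RHS) {
  RHS += LHS;   // + commutes, so accumulate into the (possibly moved-in) APInt
  return RHS;
}

inline APInt operator-(uint64_t LHS, APInt RHS) {
  RHS.negate(); // compute LHS - RHS as (-RHS) + LHS, reusing RHS's storage
  RHS += LHS;
  return RHS;
}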

- Dave

On Thu, Jul 21, 2016 at 7:02 PM Pete Cooper <peter_cooper at apple.com> wrote:

> Sorry David, I let this slip by and then was on vacation.
>
> Your patch worked much better than mine.  I fixed one of the - operators,
> which was the wrong way around, but otherwise the logic is the same as your
> patch.
>
> I also added tests which should hit every method added and check all of
> the getRawData() results, as you suggested was possible.
>
> Cheers,
> Pete
>
>
> On Jun 6, 2016, at 3:59 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
> On Sun, Jun 5, 2016 at 11:55 AM, Pete Cooper <peter_cooper at apple.com>
> wrote:
>
>>
>> On Jun 4, 2016, at 9:10 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> Looks like you just have a variable rename in the unary minus
>> implementation - commit separately or drop that change perhaps?
>>
>> Oh yeah.  Good catch.  Will do that separately.
>>
>>
>> I don't think you need versions that take two rvalue refs (+(&&, &&)), do
>> you? (At least not until +=/-= get smart and grow an overload that takes an
>> rvalue ref parameter and uses it to steal the right-hand side's buffer if
>> it's bigger than the left-hand side's, or something like that.)
>>
>> I had a compile error somewhere in the LLVM codebase without this
>> version.  I can’t remember where it is, but a small test (attached to the
>> end of the email if you want to hack on it) which triggers it is:
>>
>> rvalue.cpp:66:22: error: use of overloaded operator '+' is ambiguous
>> (with operand types 'APInt' and 'APInt')
>>   APInt d2 = (a * b) + (a * b);
>>
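>> For what it’s worth, the version in question is roughly the following (a
>> sketch of what I have locally, not necessarily line-for-line what’s in the
>> attached patch):
>>
>> inline APInt operator+(APInt &&a, APInt &&b) {
>>   a += b;               // both operands are expiring; accumulate into a
>>   return std::move(a);  // and hand its storage to the result
>> }
>>
>> With two prvalues, as in (a * b) + (a * b), this candidate is no worse than
>> the (&&, const&) and (const&, &&) ones on either argument and strictly
>> better on one, so the call is no longer ambiguous.
>>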
>>
>> & can you pass by value instead of by rvalue ref - does that work/choose
>> the right overloads?
>>
>> Doesn’t seem to.  Using the above as an example, if I remove the && from
>> both arguments then I get:
>>
>> rvalue.cpp:72:22: error: use of overloaded operator '+' is ambiguous
>> (with operand types 'APInt' and 'APInt')
>>   APInt d2 = (a * b) + (a * b);
>>              ~~~~~~~ ^ ~~~~~~~
>> rvalue.cpp:35:14: note: candidate function
>> inline APInt operator+(APInt a, APInt b) {
>>              ^
>> rvalue.cpp:41:14: note: candidate function
>> inline APInt operator+(APInt &&a, const APInt &b) {
>>              ^
>> rvalue.cpp:47:14: note: candidate function
>> inline APInt operator+(const APInt &a, APInt &&b) {
>>              ^
>> rvalue.cpp:53:14: note: candidate function
>> inline APInt operator+(const APInt &a, const APInt &b) {
>>
>> Note, removing the && from all the variants doesn’t improve the situation.
>>
>
> Attached an example that I think works - but I haven't tested it. There may
> be some accidental infinite recursion in there - the version of the patch
> you have didn't seem to pass all the tests anyway.
>
> (I also noticed you ended up with both member and non-member versions of
> unary operator-; my patch drops the member one. (You could probably move
> operator~ out as a non-member too, but that's pretty orthogonal.))
>
>
>>
>> inline APInt operator-(APInt LHS, const APInt &RHS) {
>>   LHS -= RHS;
>>   return LHS; // shouldn't need std::move here because you're returning
>>               // a local (a by-value parameter)
>> }
>>
>> I wondered about this too.  I turned on -Wpessimizing-move to see if what
>> I was doing was wrong but it didn’t fire.  Interestingly, with this method:
>>
>> inline APInt operator+(APInt &&a, const APInt &b) {
>>   printf("APInt::+(&&, &)\n");
>>   a += b;
>>   return a;
>>
>>
> This one shouldn't produce a move (& you should add std::move explicitly)
> because 'a' is not a local here, it's a reference. When it's passed by
> value there's no need for the std::move:
>
> blaikie@blaikie-linux:/tmp/dbginfo$ cat -n test.cpp
>      1  struct foo {
>      2    foo(foo&&) = default;
>      3  };
>      4
>      5  foo f(foo g) {
>      6    return g;
>      7  }
>      8  foo f2(foo &&g) {
>      9    return g;
>     10  }
> blaikie@blaikie-linux:/tmp/dbginfo$ clang++-tot -std=c++11 test.cpp -fsyntax-only
> test.cpp:9:10: error: call to implicitly-deleted copy constructor of 'foo'
>   return g;
>          ^
> test.cpp:2:3: note: copy constructor is implicitly deleted because 'foo'
> has a user-declared move constructor
>   foo(foo&&) = default;
>   ^
> 1 error generated.
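>
> So for the rvalue-ref-taking overload you'd want the explicit move, roughly
> like this (a sketch to illustrate the point above, not the attached patch):
>
> inline APInt operator+(APInt &&a, const APInt &b) {
>   a += b;
>   return std::move(a); // 'a' names an rvalue reference parameter, which is
>                        // an lvalue here, so without the move this copies
> }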
>
>
>> }
>>
>> and with/without the std::move on the return.  The above version will call
>> the copy constructor APInt::APInt(const APInt &), but the std::move version
>> will call the move constructor APInt::APInt(APInt &&).  I used printfs to
>> verify this.  So it looks like there is a difference here, even though I
>> totally agree with you that we’re returning a local so it shouldn’t need
>> the std::move.  I’m not sure if this is a bug or just a subtlety of rvalue
>> semantics.  Would love to know the answer though.
>>
>>
>> Then you shouldn't need the op-(const APInt&,const APInt&) version, for
>> example.
>>
>> Not sure if it’s a result of the other &&’s ending up being required, but
>> I’ve tested without a (const APInt&, const APInt&) version and I get
>> ambiguous overload errors.  Seems like I’m going to need it.
>>
>>
>> Tests?
>>
>> I was wondering about this.  I can certainly test all the variants to
>> make sure I get the correct numerical results from APInt, and I’ll add
>> whatever tests are needed for that.  I wouldn’t be able to test whether we
>> get a certain number of mallocs, though, unless it’s ok to implement my own
>> malloc/free in the APInt unit test?
>>
>
> Yeah, I'd certainly at least test that we get all the right answers
> (potentially using a data-expanded test to exercise all the operations with
> the same values for different combinations of lvalues, rvalues, and uints).
>
> As for testing the avoidance of allocation... hrm... I mean it's
> essentially a non-observable performance thing, and our tests don't really
> test performance, so perhaps that's fine. In theory you could test that
> moving happened by caching the result of "getRawData" and checking that the
> pointer value is the same? Not sure if that's a good test.
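>
> Something like this, maybe (a hypothetical gtest sketch, not from your
> patch - it assumes the rvalue-ref operator+ overload is in place and uses a
> 128-bit value so the words live in a heap-allocated buffer):
>
> #include "llvm/ADT/APInt.h"
> #include "gtest/gtest.h"
> using namespace llvm;
>
> TEST(APIntTest, AddOfRValueReusesBuffer) {
>   APInt A(128, 1), B(128, 2);
>   const uint64_t *Buf = A.getRawData(); // heap storage backing the wide value
>   APInt C = std::move(A) + B;           // should pick the rvalue-ref overload
>   EXPECT_EQ(Buf, C.getRawData());       // buffer was stolen, not reallocated
>   EXPECT_EQ(APInt(128, 3), C);          // and the arithmetic is still right
> }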
>
>
>>
>> Thanks for all the comments so far.  Will try to get an updated patch
>> tomorrow.
>>
>> Cheers,
>> Pete
>>
>>
>>
>>
>> On Fri, Jun 3, 2016 at 10:42 AM, Pete Cooper <peter_cooper at apple.com>
>> wrote:
>>
>>> Hi David, Sanjoy
>>>
>>> Here’s an updated patch which provides all of the combinations of
>>> operator[+-] I can think to add.
>>>
>>> All of the new ones are outside the class definition so that we can
>>> reduce duplication and have them call each other.
>>>
>>> The one thing I noticed while doing this work was that the existing
>>> operator+= and -= methods really did exactly what I wanted, so I’ve
>>> implemented + and - in terms of += and -=.
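>>>
>>> For example, the + overloads end up with roughly this shape (a simplified
>>> sketch of two of the combinations, not the patch verbatim):
>>>
>>> inline APInt operator+(APInt a, const APInt &b) {
>>>   a += b;             // the copy (or move) of the left operand becomes the result
>>>   return a;
>>> }
>>>
>>> inline APInt operator+(const APInt &a, APInt &&b) {
>>>   b += a;             // + commutes, so accumulate into the movable right operand
>>>   return std::move(b);
>>> }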
>>>
>>> Is that ok, or is it frowned upon?  I can imagine some people would
>>> prefer that += calls + and not the other way round.  But it is very
>>> convenient as you can see with this patch.
>>>
>>> Comments very welcome.
>>>
>>> BTW, this reduces total allocations by about 400k from 19.1M to 18.7M.
>>>
>>> Cheers,
>>> Pete
>>>
>>>
>>>
>>> On Jun 2, 2016, at 3:24 PM, Pete Cooper via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>>
>>> On Jun 2, 2016, at 2:28 PM, Sanjoy Das <sanjoy at playingwithpointers.com>
>>> wrote:
>>>
>>> On Wed, Jun 1, 2016 at 9:43 AM, Pete Cooper <peter_cooper at apple.com>
>>> wrote:
>>>
>>> Another interesting data point is the compile time.  On my test case,
>>> SCEV::getRange is 8.9% of compile time, which is a lot.  But of that, 6.3%
>>> is just in ConstantRange::multiply.  This method is heavy on APInt code,
>>> and especially on malloc traffic.
>>>
>>>
>>> Yeah, that is definitely too high! Just to check: I assume you mean
>>> 8.9% of opt -O2 or something similar?
>>>
>>> Yep, that’s right.  ‘opt -O2 verify-uselistorder.bc -o opt.bc’.  The
>>> verify-uselistorder input is the pre-optimized, but post-linked, bitcode
>>> from LTOing that tool.
>>>
>>> BTW, I just looked at the latest numbers, and the commits I’ve made so
>>> far save 3% of compile time on this use case.  So the 8.9% is more like
>>> 5.9% now.  And there’s still a little more to come.
>>>
>>>
>>> Many of the speedups I’ve been finding involve doing less work (r271020,
>>> which avoids the latter half of ConstantRange::multiply and saves 3M
>>> allocations) and fixing cases of unnecessary APInt allocations (r270959).
>>> This patch is along the same lines as the latter, where we have malloc
>>> traffic we can avoid.
>>>
>>>
>>> Making too many fixes to the APInt algorithms to avoid allocations
>>> seems like solving the issue at the wrong layer.  I think fixing the
>>> infrastructural issues so that we _can_ be a little sloppy (within
>>> reason) in extending integers, without thinking too much about malloc
>>> traffic, is the right path.
>>>
>>> I completely agree.  There are certainly limits to how far to push
>>> this.  For example, this code in ConstantRange::multiply:
>>>
>>>   auto L = {this_min * Other_min, this_min * Other_max,
>>>             this_max * Other_min, this_max * Other_max};
>>>
>>>
>>> Once I have the Rvalue ref versions of the APInt methods (a change which
>>> I think is reasonable), the above could be changed to:
>>>
>>>   auto L = {this_min * Other_min, std::move(this_min) * Other_max,
>>>             this_max * std::move(Other_min),
>>>             std::move(this_max) * Other_max};
>>>
>>> This would avoid 3 allocations out of 4 because we will then use the
>>> Rvalue APInt methods.  However, I think this might be a little too much
>>> hacking.  So yeah, I totally agree with you, and hopefully we can solve
>>> cases like this one in a more reasonable way than gratuitous use of
>>> std::move() or other APInt hackery :)
>>>
>>>
>>> But you're doing the work, so you get to decide the path forward. :)
>>>
>>> Sounds good to me :)
>>>
>>>
>>>
>>> ConstantRange stats (bit width and count of hits in
>>> ConstantRange::ConstantRange)
>>>
>>>
>>> This is great!  Is this a bootstrap of clang or something?
>>>
>>> Actually the same use case as before: ‘opt -O2 verify-uselistorder’.  It’s
>>> a nice small bitcode file which takes about 20s to optimize.
>>>
>>>
>>> Btw, there are a couple of bitwidths here that I find interesting, e.g.
>>> I'd not have expected this many i70 ConstantRange allocations.
>>>
>>> Yeah, some of these are a bit surprising.  2^n and (2^n)+1 both seem
>>> likely to be due to the IR itself and SCEV, but anything else is a little
>>> odd.  I may take a look at the 258-bit case just because there are so many
>>> of them.
>>>
>>> Pete
>>>
>>>
>>> -- Sanjoy
>>>
>>> 1: 30850028
>>> 2: 7238
>>> 3: 5733
>>> 4: 92
>>> 5: 817
>>> 6: 294
>>> 7: 192
>>> 8: 363498
>>> 9: 896
>>> 11: 330
>>> 12: 378
>>> 13: 385
>>> 14: 125
>>> 16: 30256
>>> 18: 272
>>> 20: 98
>>> 24: 10
>>> 25: 62
>>> 26: 13
>>> 27: 181
>>> 28: 8
>>> 31: 98
>>> 32: 2003134
>>> 33: 132
>>> 34: 128
>>> 36: 76
>>> 38: 2130
>>> 41: 3
>>> 57: 262
>>> 58: 244
>>> 59: 342
>>> 60: 2418
>>> 61: 1211
>>> 62: 190
>>> 63: 226
>>> 64: 5118228
>>> 65: 128400
>>> 66: 4236
>>> 67: 14826
>>> 68: 15408
>>> 69: 13417
>>> 70: 7959
>>> 71: 347
>>> 96: 88
>>> 128: 364826
>>> 129: 379580
>>> 130: 19092
>>> 256: 4734
>>> 257: 19132
>>> 258: 71826
>>> 514: 4650
>>>
>>>
>>>
>>>
>>> --
>>> Sanjoy Das
>>> http://playingwithpointers.com
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
> <apint.diff>
>
>
>