[PATCH] Use Rvalue refs in APInt

Fri Jul 22 14:04:57 PDT 2016

> On Jul 22, 2016, at 1:10 PM, David Blaikie <dblaikie at gmail.com> wrote:
> 
> LGTM
Great.  Thanks!  r276470.
> 
> Optional:
> 
> I don't think LLVM really uses braced init (like this: "auto One = APInt{129, "1", 16};" or like this: "APInt V{129, HexString, 16};"), so I'd recommend switching those to the usual ctor syntax: APInt X(y, z);  Honestly, that ctor should probably be explicit and shouldn't be called with {} init, but no one (LLVM or otherwise) has bothered to go and mark all the multi-arg ctors as explicit post-C++11.
Good point.  I changed all of the APInt constructions to use the regular syntax.
> 
> The test cases could probably be factored into something shorter to remove the duplication (a type expanded gtest, where the type is a functor with op- or op+, etc) but I wouldn't spend too much time on it. Simple/explicit tests are good too.
I took a quick look, but the factored version wasn’t coming out any simpler.  I’ll stick with the explicit tests as you said.  If anyone feels strongly the other way then I can try again.

Thanks again for all the help.  Much appreciated.

Cheers,
Pete
> 
> - Dave
> 
> On Fri, Jul 22, 2016 at 11:56 AM Pete Cooper <peter_cooper at apple.com> wrote:
>> On Jul 22, 2016, at 8:18 AM, David Blaikie <dblaikie at gmail.com> wrote:
>> 
>> I've forgotten most of this review (but I've reread the thread to get as much context as I can) - but is there a reason you have op-(APInt, uint64), but not op-(uint64, APInt) (or am I misreading the patch and that's already somewhere?) (same question for op+). Seems like that'll catch someone by surprise when APInt+uint64 performs better than uint64+APInt?
> 
> You’re right, there was no good reason to leave those out.  I think my reasoning was that I wanted to make use of the add_1 and sub_1 helpers, which isn’t easy in the latter case.  I’ve now added them.  The sub case won’t necessarily be fast, as it has to loop over the whole value to negate it, but it should still save the allocation, which was the main reason for this whole series of patches.
> 
> Also, it was just odd that ‘APInt + 1’ compiled but ‘1 + APInt’ gave an error.  Even if the implementation was no faster, it was still worth adding these just for consistency.  Nice catch!
> 
> So same patch as before and also added those methods and tests for them.
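For illustration, here is a minimal sketch of the symmetric overloads discussed above, with the integer on the left-hand side.  It assumes a toy multi-word type; Wide, negate() and demoSymmetricOps() are made-up names for this example and are not the APInt API.  The subtraction computes (-RHS) + LHS in the right-hand operand's storage, so it walks the whole value but does not allocate.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy multi-word integer (little-endian 64-bit words). Hypothetical type,
// not llvm::APInt; it wraps modulo 2^(64 * word count).
struct Wide {
  std::vector<uint64_t> Words;
  explicit Wide(std::vector<uint64_t> W) : Words(std::move(W)) {}

  // Add one word in place, propagating the carry upward.
  Wide &operator+=(uint64_t RHS) {
    for (uint64_t &W : Words) {
      W += RHS;
      if (W >= RHS)
        break;  // no overflow, so no carry to propagate
      RHS = 1;  // overflow: carry one into the next word
    }
    return *this;
  }

  // Two's complement negation in place: flip all bits, then add one.
  // This is the loop over the whole value mentioned in the thread.
  void negate() {
    for (uint64_t &W : Words)
      W = ~W;
    *this += 1;
  }
};

// Integer on the left, rvalue on the right: reuse the right-hand buffer.
inline Wide operator+(uint64_t LHS, Wide &&RHS) {
  RHS += LHS;
  return std::move(RHS);
}

// LHS - RHS == (-RHS) + LHS, computed in RHS's storage.
inline Wide operator-(uint64_t LHS, Wide &&RHS) {
  RHS.negate();
  RHS += LHS;
  return std::move(RHS);
}

// Lvalue on the right: make the one unavoidable copy, then reuse it.
inline Wide operator+(uint64_t LHS, const Wide &RHS) { return LHS + Wide(RHS); }
inline Wide operator-(uint64_t LHS, const Wide &RHS) { return LHS - Wide(RHS); }

// Small self-check; call it from main() to try the sketch.
inline void demoSymmetricOps() {
  Wide X({5, 0});
  Wide Sum = 1 + X;  // X is left untouched
  assert(Sum.Words[0] == 6 && X.Words[0] == 5);
  Wide Diff = 10 - Wide({3, 0});
  assert(Diff.Words[0] == 7 && Diff.Words[1] == 0);
}
```

The const-reference overloads forward to the rvalue ones, so each operation is written once and both operand orders compile.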
> 
> Thanks,
> Pete
> 
> 
>> 
>> - Dave
>> 
>> On Thu, Jul 21, 2016 at 7:02 PM Pete Cooper <peter_cooper at apple.com> wrote:
>> Sorry David, I let this slip by then was on vacation.
>> 
>> Your patch worked much better than mine.  I fixed one of the - operators which was the wrong way around, but otherwise the logic is the same as your patch.
>> 
>> I also added tests which should hit every method added and check getRawData() in each case, as you suggested was possible.
>> 
>> Cheers,
>> Pete
>> 
>> 
>> 
>>> On Jun 6, 2016, at 3:59 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>> 
>> 
>>> 
>>> On Sun, Jun 5, 2016 at 11:55 AM, Pete Cooper <peter_cooper at apple.com> wrote:
>>> 
>>>> On Jun 4, 2016, at 9:10 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>>> 
>>>> Looks like you just have a variable rename in the unary minus implementation - commit separately or drop that change perhaps?
>>> Oh yeah.  Good catch.  Will do that separately.
>>>> 
>>>> I don't think you need versions that take two rvalue refs (+(&&, &&)), do you? (At least until +=/-= get smart and have an overload that takes an rvalue ref parameter and uses it to steal the right hand side buffer if it's bigger than the left hand side buffer or something like that?)
>>> I had a compile error somewhere in the LLVM codebase without this version.  I can’t remember where it is, but a small test (attached to the end of the email if you want to hack on it) which triggers it is:
>>> 
>>> rvalue.cpp:66:22: error: use of overloaded operator '+' is ambiguous (with operand types 'APInt' and 'APInt')
>>>   APInt d2 = (a * b) + (a * b);
>>> 
>>>> 
>>>> & can you pass by value instead of by rvalue ref - does that work/choose the right overloads?
>>> Doesn’t seem to.  Using the above as an example, if I remove the && from both arguments then I get:
>>> 
>>> rvalue.cpp:72:22: error: use of overloaded operator '+' is ambiguous (with operand types 'APInt' and 'APInt')
>>>   APInt d2 = (a * b) + (a * b);
>>>              ~~~~~~~ ^ ~~~~~~~
>>> rvalue.cpp:35:14: note: candidate function
>>> inline APInt operator+(APInt a, APInt b) {
>>>              ^
>>> rvalue.cpp:41:14: note: candidate function
>>> inline APInt operator+(APInt &&a, const APInt &b) {
>>>              ^
>>> rvalue.cpp:47:14: note: candidate function
>>> inline APInt operator+(const APInt &a, APInt &&b) {
>>>              ^
>>> rvalue.cpp:53:14: note: candidate function
>>> inline APInt operator+(const APInt &a, const APInt &b) {
>>> 
>>> Note, removing the && from all the variants doesn’t improve the situation.
>>> 
>>> Attached an example that I think works - but I haven't tested. There may be some accidental infinite recursion in there - the version of the patch you have didn't seem to pass all the tests anyway.
>>> 
>>> (also noticed you ended up with both member and non-member version of unary operator-, my patch drops the member one (you could probably move operator~ out as a non-member too, but that's pretty orthogonal))
>>>  
>>>> 
>>>> inline APInt operator-(APInt LHS, const APInt &RHS) {
>>>>   LHS -= RHS;
>>>>   return LHS; // shouldn't need std::move here because you're returning a local
>>>> }
>>> I wondered about this too.  I turned on -Wpessimizing-move to see if what I was doing was wrong but it didn’t fire.  Interestingly, with this method:
>>> 
>>> inline APInt operator+(APInt &&a, const APInt &b) {
>>>   printf("APInt::+(&&, &)\n");
>>>   a += b;
>>>   return a;
>>> 
>>> This one shouldn't produce a move (& you should add std::move explicitly) because 'a' is not a local here, it's a reference. When it's passed by value there's no need for the std::move:
>>> 
>>> blaikie at blaikie-linux:/tmp/dbginfo$ cat -n test.cpp
>>>      1  struct foo {
>>>      2    foo(foo&&) = default;
>>>      3  };
>>>      4  
>>>      5  foo f(foo g) {
>>>      6    return g;
>>>      7  }
>>>      8  foo f2(foo &&g) {
>>>      9    return g;
>>>     10  }
>>> blaikie at blaikie-linux:/tmp/dbginfo$ clang++-tot -std=c++11 test.cpp -fsyntax-only
>>> test.cpp:9:10: error: call to implicitly-deleted copy constructor of 'foo'
>>>   return g;
>>>          ^
>>> test.cpp:2:3: note: copy constructor is implicitly deleted because 'foo' has a user-declared move constructor
>>>   foo(foo&&) = default;
>>>   ^
>>> 1 error generated.
>>>  
>>> }
>>> 
>>> and with/without the std::move on the return.  The above version will call APInt::APInt(&) but the std::move version will call APInt::APInt(&&).  I used printfs to verify this.  So looks like there is a difference here, even though I totally agree with you that we’re returning a local so it shouldn’t need the std::move.  I’m not sure if this is a bug, or just subtlety in rvalue semantics.  Would love to know the answer though.
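As a standalone illustration of the difference described above, here is a small sketch that counts constructor calls instead of using printfs.  Tracked, byValue, byRefPlain, byRefMoved and copiesMadeBy are hypothetical names for this example only.  The comments describe the C++11/14/17 rules that applied to this thread; C++20 later changed 'return A' for an rvalue-reference parameter into an implicit move, so the plain variant's count depends on the language mode.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type: counts copy and move constructions.
int Copies = 0;
int Moves = 0;
struct Tracked {
  Tracked() = default;
  Tracked(const Tracked &) { ++Copies; }
  Tracked(Tracked &&) noexcept { ++Moves; }
};

// By-value parameter: 'A' is a local object, so 'return A' moves implicitly.
inline Tracked byValue(Tracked A) { return A; }

// Rvalue-reference parameter: 'A' is a named reference, i.e. an lvalue.
// Under C++11/14/17 rules 'return A' copies; C++20 relaxed this to a move.
inline Tracked byRefPlain(Tracked &&A) { return A; }

// Explicit std::move makes the move unconditional in every language mode.
inline Tracked byRefMoved(Tracked &&A) { return std::move(A); }

// Resets the counters, runs one call, and reports how many copies it made.
template <typename Fn> int copiesMadeBy(Fn F) {
  Copies = Moves = 0;
  Tracked T;
  Tracked R = F(std::move(T));
  (void)R;
  return Copies;
}
```

So the by-value form needs no std::move on the return, while the rvalue-reference form does if the copy is to be avoided in pre-C++20 modes.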
>>> 
>>>> 
>>>> Then you shouldn't need the op-(const APInt&,const APInt&) version, for example.
>>> Not sure if it’s a result of the other &&’s ending up being required, but I’ve tested without a (const APInt&,const APInt&) version and I get ambiguous overload errors.  Seems like I’m going to need it.
>>>> 
>>>> Tests?
>>> I was wondering about this.  I can certainly test all the variants to make sure I get the correct numerical results from APInt and I’ll add what tests are needed for that.  I wouldn’t be able to test whether we get a certain number of mallocs, unless it’s ok to implement my own malloc/free in the APInt unit test?
>>> 
>>> Yeah, I'd certainly at least test that we get all the right answers (potentially using a data expanded test to exercise all the operations with the same values for different combinations of lvalues, rvalues, and uints).
>>> 
>>> As for testing the avoidance of allocation... hrm... I mean it's essentially a non-observable performance thing, and our tests don't really test performance, so perhaps that's fine. In theory you could test that moving happened by caching the result of "getRawData" and check that the pointer value is the same? Not sure if that's a good test.
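A rough sketch of the pointer-caching idea, assuming a toy heap-backed type rather than APInt itself: Num, rawData() and sumReusesLeftBuffer() are invented for this example, with rawData() standing in for getRawData().  The sketch also carries all four overloads, including the (&&, &&) one that resolves the ambiguity for two temporaries reported earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for a heap-backed integer (hypothetical; not llvm::APInt).
struct Num {
  std::vector<uint64_t> Words;  // little-endian; operands share one length
  explicit Num(std::vector<uint64_t> W) : Words(std::move(W)) {}
  const uint64_t *rawData() const { return Words.data(); }

  // Word-by-word addition with carry, done in place.
  Num &operator+=(const Num &RHS) {
    uint64_t Carry = 0;
    for (std::size_t I = 0; I < Words.size(); ++I) {
      uint64_t Sum = Words[I] + Carry;
      uint64_t C1 = Sum < Carry;         // overflow from adding the carry
      Sum += RHS.Words[I];
      uint64_t C2 = Sum < RHS.Words[I];  // overflow from adding RHS
      Words[I] = Sum;
      Carry = C1 | C2;
    }
    return *this;
  }
};

// Four overloads; the (&&, &&) one disambiguates 'temporary + temporary'.
inline Num operator+(const Num &L, const Num &R) { Num T(L); T += R; return T; }
inline Num operator+(Num &&L, const Num &R) { L += R; return std::move(L); }
inline Num operator+(const Num &L, Num &&R) { R += L; return std::move(R); }
inline Num operator+(Num &&L, Num &&R) { L += R; return std::move(L); }

// True if 'std::move(A) + B' hands back A's original buffer.
inline bool sumReusesLeftBuffer(Num A, const Num &B) {
  const uint64_t *Before = A.rawData();
  Num Sum = std::move(A) + B;
  return Sum.rawData() == Before;
}
```

A pointer comparison like this only shows that storage was reused for a type that always keeps its words on the heap; whether it is a good test for a real class is a separate question, as noted above.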
>>>  
>>> 
>>> Thanks for all the comments so far.  Will try to get an updated patch tomorrow.
>>> 
>>> Cheers,
>>> Pete
>>> 
>>> 
>>> 
>>>> 
>>>> On Fri, Jun 3, 2016 at 10:42 AM, Pete Cooper <peter_cooper at apple.com> wrote:
>>>> Hi David, Sanjoy
>>>> 
>>>> Here’s an updated patch which provides all of the combinations of operator[+-] I can think to add.
>>>> 
>>>> All of the new ones are outside the class definition so that we can reduce duplication and have them call each other.
>>>> 
>>>> The one thing I noticed while doing this work was that the already existing operator+= and -= methods really did exactly what I wanted.  So I’ve implemented + and - in terms of += and -=.
>>>> 
>>>> Is that ok, or is it frowned upon?  I can imagine some people would prefer that += calls + and not the other way round.  But it is very convenient as you can see with this patch.
>>>> 
>>>> Comments very welcome.
>>>> 
>>>> BTW, this reduces total allocations by about 400k from 19.1M to 18.7M.
>>>> 
>>>> Cheers,
>>>> Pete
>>>> 
>>>> 
>>>> 
>>>>> On Jun 2, 2016, at 3:24 PM, Pete Cooper via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>>>> 
>>>>>> 
>>>>>> On Jun 2, 2016, at 2:28 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>>>>>> 
>>>>>> On Wed, Jun 1, 2016 at 9:43 AM, Pete Cooper <peter_cooper at apple.com> wrote:
>>>>>> 
>>>>>>> Another interesting data point is the compile time.  On my test case, SCEV::getRange is 8.9% of compile time which is a lot.  But of that, 6.3% is just in ConstantRange::multiply.  This method is heavy APInt code, and especially malloc traffic.
>>>>>> 
>>>>>> Yeah, that is definitely too high! Just to check: I assume you mean
>>>>>> 8.9% of opt -O2 or something similar?
>>>>> Yep, that’s right.  ‘opt -O2 verify-uselistorder.bc -o opt.bc’.  The verify-uselistorder is the pre-optimized, but post-linked, bitcode when LTOing that tool.
>>>>> 
>>>>> BTW, I just looked at the latest numbers and the commits I’ve made so far save 3% of compile time on this use case.  So the 8.9% is more like 5.9% now.  And still a little more to come.
>>>>>> 
>>>>>>> Many of the speedups I’ve been finding involve doing less work (r271020, which avoids the latter half of ConstantRange::multiply and saves 3M allocations), and fixing cases of unnecessary APInt allocations (r270959).  This patch is along the same lines as the latter, where we have malloc traffic we can avoid.
>>>>>> 
>>>>>> Making too many fixes on the APInt algorithms to avoid allocations
>>>>>> seems like we're solving the issue at the wrong layer.  I think fixing
>>>>>> infrastructural issues so that we _can_ be a little sloppy (within
>>>>>> reason) in extending integers without thinking too much about malloc
>>>>>> traffic is the right path.
>>>>> I completely agree.  There are certainly limits to how far to push this.  For example, this code in ConstantRange::multiply:
>>>>> 
>>>>>   auto L = {this_min * Other_min, this_min * Other_max,
>>>>>             this_max * Other_min, this_max * Other_max};
>>>>> 
>>>>> Once I have the Rvalue ref version of the APInt methods (a change which I think is reasonable), the above could be changed to:
>>>>> 
>>>>>   auto L = {this_min * Other_min, std::move(this_min) * Other_max,
>>>>>             this_max * std::move(Other_min), std::move(this_max) * Other_max};
>>>>> 
>>>>> This would avoid 3 allocations out of 4 because we will then use the Rvalue APInt methods.  However, I think this might
>>>>> be a little too much hacking.  So yeah, I totally agree with you, and hopefully we can solve cases like this one in a more
>>>>> reasonable way than gratuitous use of std::move() or other APInt hackery :)
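To make the "3 allocations out of 4" arithmetic concrete, here is a sketch under stated assumptions: Num is a made-up single-word, heap-backed type whose copy constructor bumps a counter, and allocsAllLvalues / allocsWithMoves are hypothetical helpers, not anything in ConstantRange.  It relies on the elements of a braced initializer list being evaluated left to right, which is why each operand can be used as an lvalue before it is moved from later in the list.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

int Allocs = 0;  // counts copies, each of which allocates a fresh buffer

// Hypothetical heap-backed value; one word keeps the arithmetic trivial.
struct Num {
  std::vector<uint64_t> W;
  explicit Num(uint64_t V) : W(1, V) {}
  Num(const Num &O) : W(O.W) { ++Allocs; }
  Num(Num &&) noexcept = default;
  Num &operator*=(const Num &R) { W[0] *= R.W[0]; return *this; }
};

// Only the (const &, const &) form has to copy; the others reuse a buffer.
inline Num operator*(const Num &L, const Num &R) { Num T(L); T *= R; return T; }
inline Num operator*(Num &&L, const Num &R) { L *= R; return std::move(L); }
inline Num operator*(const Num &L, Num &&R) { R *= L; return std::move(R); }

// All four products from lvalues: every product copies one operand.
inline int allocsAllLvalues(Num Lo, Num Hi, Num OLo, Num OHi) {
  int Before = Allocs;
  Num L[] = {Lo * OLo, Lo * OHi, Hi * OLo, Hi * OHi};
  (void)L;
  return Allocs - Before;
}

// Same products, moving each operand at its last use in the list.
inline int allocsWithMoves(Num Lo, Num Hi, Num OLo, Num OHi) {
  int Before = Allocs;
  Num L[] = {Lo * OLo, std::move(Lo) * OHi,
             Hi * std::move(OLo), std::move(Hi) * OHi};
  (void)L;
  return Allocs - Before;
}
```

With this toy type the first helper reports four copies and the second reports one, which matches the saving described above; it also shows how easily a reordering of the list would turn a moved-from operand into a bug.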
>>>>>> 
>>>>>> But you're doing the work, so you get to decide the path forward. :)
>>>>> Sounds good to me :)
>>>>>> 
>>>>>> 
>>>>>>> ConstantRange stats (bit width and count of hits in ConstantRange::ConstantRange)
>>>>>> 
>>>>>> This is great!  Is this a bootstrap of clang or something?
>>>>> Actually same use case as before.  ‘opt -O2 verify-uselistorder’.  It’s a nice small bitcode file which takes about 20s to optimize.
>>>>>> 
>>>>>> Btw, there are a couple of bitwidths here that I find interesting, e.g.
>>>>>> I'd not have expected this many i70 ConstantRange allocations.
>>>>> Yeah, some of these are a bit surprising.  2^n and (2^n)+1 both seem likely due to the IR itself and SCEV, but anything else is a little odd.  I may take a look at the 258 bit case just because there are so many of them.
>>>>> 
>>>>> Pete
>>>>>> 
>>>>>> -- Sanjoy
>>>>>> 
>>>>>>> 1: 30850028
>>>>>>> 2: 7238
>>>>>>> 3: 5733
>>>>>>> 4: 92
>>>>>>> 5: 817
>>>>>>> 6: 294
>>>>>>> 7: 192
>>>>>>> 8: 363498
>>>>>>> 9: 896
>>>>>>> 11: 330
>>>>>>> 12: 378
>>>>>>> 13: 385
>>>>>>> 14: 125
>>>>>>> 16: 30256
>>>>>>> 18: 272
>>>>>>> 20: 98
>>>>>>> 24: 10
>>>>>>> 25: 62
>>>>>>> 26: 13
>>>>>>> 27: 181
>>>>>>> 28: 8
>>>>>>> 31: 98
>>>>>>> 32: 2003134
>>>>>>> 33: 132
>>>>>>> 34: 128
>>>>>>> 36: 76
>>>>>>> 38: 2130
>>>>>>> 41: 3
>>>>>>> 57: 262
>>>>>>> 58: 244
>>>>>>> 59: 342
>>>>>>> 60: 2418
>>>>>>> 61: 1211
>>>>>>> 62: 190
>>>>>>> 63: 226
>>>>>>> 64: 5118228
>>>>>>> 65: 128400
>>>>>>> 66: 4236
>>>>>>> 67: 14826
>>>>>>> 68: 15408
>>>>>>> 69: 13417
>>>>>>> 70: 7959
>>>>>>> 71: 347
>>>>>>> 96: 88
>>>>>>> 128: 364826
>>>>>>> 129: 379580
>>>>>>> 130: 19092
>>>>>>> 256: 4734
>>>>>>> 257: 19132
>>>>>>> 258: 71826
>>>>>>> 514: 4650
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Sanjoy Das
>>>>>> http://playingwithpointers.com
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>>> <apint.diff>
>> 
> 
