[llvm-dev] RFC: A change in InstCombine canonical form

Tue Mar 22 13:37:48 PDT 2016

But not what David was stating, unless I misread?  I was specifically 
responding to David's wording:
"If we're talking about making an optimization able to ignore the 
bitcast instructions - yes, that work is unnecessary & perhaps 
questionable given the typeless pointer work. Not outright off limits, 
but the same time might be better invested in moving typeless pointers 
forward if the contributor is so inclined/able to shift in that direction."

Both "perhaps questionable" and "not outright off limits" seem to 
strongly imply such work should be discouraged.  I disagree with that 
view which is why I wrote my response.

Philip

On 03/22/2016 01:34 PM, Mehdi Amini wrote:
> This is roughly what I wrote...
>
>> On Mar 22, 2016, at 1:31 PM, Philip Reames <listmail at philipreames.com 
>> <mailto:listmail at philipreames.com>> wrote:
>>
>> I feel very strongly that blocking work on making optimization 
>> bitcast-ignorant on the typeless pointer work would be a major 
>> mistake.  Unless we expected the typeless pointer work to be 
>> concluded within the near term (say 3-6 months maximum), we should 
>> not block any development which would be accepted in the typeless 
>> pointer work wasn't planned.
>>
>> In my view, this is one of the largest mistakes we've made with the 
>> pass manager work, it has seriously cost us, and I don't want to see 
>> it happening again.
>>
>> Philip
>>
>> On 03/22/2016 01:09 PM, David Blaikie wrote:
>>> Ultimately everything is going to be made to not rely on the types 
>>> of pointers - that's nearly equivalent to bitcast-ignorant (the 
>>> difference being that the presence of an extra instruction (the 
>>> bitcast) might trip up some optimizations - but the presence of the 
>>> /type/ information implied by the bitcast should not trip up or be 
>>> necessary for optimizations (two sides of the same coin))
>>>
>>> If we're talking about making an optimization able to ignore the 
>>> bitcast instructions - yes, that work is unnecessary & perhaps 
>>> questionable given the typeless pointer work. Not outright off 
>>> limits, but the same time might be better invested in moving 
>>> typeless pointers forward if the contributor is so inclined/able to 
>>> shift in that direction.
>>>
>>> But if we're talking about the work to make the optimization not use 
>>> the type of pointers as a crutch - that work is a necessary 
>>> precursor to the typeless pointer work and would be most welcome.
>>>
>>> - David
>>>
>>> On Tue, Mar 22, 2016 at 12:39 PM, Mehdi Amini <mehdi.amini at apple.com 
>>> <mailto:mehdi.amini at apple.com>> wrote:
>>>
>>>     I don't really mind, but the intermediate stage will not be very
>>>     nice: that a lot of code / tests that needs to be written with
>>>     bitcast, and all of that while they are deemed to disappear. The
>>>     added value isn't clear to me considering the added work. I'm
>>>     not sure it wouldn't add more work for all the cleanup required
>>>     by the "typeless pointer", but I'm not sure what's involved here
>>>     and if David thinks the intermediate steps of handling bit casts
>>>     everywhere is not making it harder I'm fine with it.
>>>
>>>     -- 
>>>     Mehdi
>>>
>>>>     On Mar 22, 2016, at 12:36 PM, Philip Reames
>>>>     <listmail at philipreames.com <mailto:listmail at philipreames.com>>
>>>>     wrote:
>>>>
>>>>     I'd phrase this differently: being pointer-bitcast agnostic is
>>>>     a step towards support typeless pointers.  :)  We can either
>>>>     become bitcast agnostic all in one big change or
>>>>     incrementally.  Personally, I'd prefer the later since it
>>>>     reduces the risk associated with enabling typeless pointers in
>>>>     the end.
>>>>
>>>>     Philip
>>>>
>>>>     On 03/22/2016 12:16 PM, Mehdi Amini via llvm-dev wrote:
>>>>>     I don't know enough about the tradeoff for 1, but 2 seems like
>>>>>     a bandaid for something that is not a correctness issue
>>>>>     neither a regression. I'm not sure it justifies "bandaid
>>>>>     patches" while there is a clear path forward, i.e. typeless
>>>>>     pointers, unless there is an acknowledgement that typeless
>>>>>     pointers won't be there before a couple of years.
>>>>>
>>>>>     -- 
>>>>>     Mehdi
>>>>>
>>>>>>     On Mar 22, 2016, at 11:32 AM, Ehsan Amiri
>>>>>>     <ehsanamiri at gmail.com> wrote:
>>>>>>
>>>>>>     Back to the discussion on the RFC, I still see some advantage
>>>>>>     in following the proposed solution. I see two paths forward:
>>>>>>
>>>>>>     1- Change canonical form, possibly lower memcpy to
>>>>>>     non-integer load and store in InstCombine. Then teach the
>>>>>>     backends to convert that to integer load and store if that is
>>>>>>     more profitable. Notice that we are talking about loads that
>>>>>>     have no use other than store. So it is a fairly simple change
>>>>>>     in the backends.
>>>>>>
>>>>>>     2- Do not change the canonical form. Then for this bug, we
>>>>>>     need to teach select speculation to see through bitcasts. We
>>>>>>     will probably need to teach other optimizations to see though
>>>>>>     bitcasts in the future as problems are uncovered. That is
>>>>>>     until typeless pointer work is complete. Once the typeless
>>>>>>     pointer work is complete, we have some extra code in each
>>>>>>     optimization for seeing through bitcasts which is possibly no
>>>>>>     longer needed.
>>>>>>
>>>>>>     Based on this I think (1) is the right thing to do. But there
>>>>>>     might be other reasons for the current canonical form that I
>>>>>>     am not aware of. Please let me know what you think.
>>>>>>
>>>>>>     Thanks
>>>>>>     Ehsan
>>>>>>
>>>>>>     On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie
>>>>>>     <dblaikie at gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>         On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri
>>>>>>         <ehsanamiri at gmail.com> wrote:
>>>>>>
>>>>>>             David,
>>>>>>
>>>>>>             Could you give us an update on the status of typeless
>>>>>>             pointer work? How much work is left and when you
>>>>>>             think it might be ready?
>>>>>>
>>>>>>
>>>>>>         It's a bit of an onion peel, really - since it will
>>>>>>         eventually involve generalizing/fixing every optimization
>>>>>>         that's currently leaning on typed pointers to keep the
>>>>>>         performance while removing the crutch they're currently
>>>>>>         leaning on. (in cases where bitcasts are literally just
>>>>>>         getting in the way, those won't require cleaning up &
>>>>>>         should just become "free performance wins" once we remove
>>>>>>         them, though)
>>>>>>
>>>>>>         At the moment we can roundtrip every LLVM IR test case
>>>>>>         through bitcode and textual IR (reading/writing both
>>>>>>         formats) while using only a narrow whitelist of places
>>>>>>         that request the type of a pointer (things like the
>>>>>>         verifier, the parser/printer where it actually needs the
>>>>>>         typed pointer to verify it matches the explicit type, etc).
>>>>>>
>>>>>>         The next thing on the list is probably figuring out the
>>>>>>         byval/inalloca representation (add an explicit pointee
>>>>>>         type? just make the number of bytes explicit with no type
>>>>>>         information?).
>>>>>>
>>>>>>         Then start migrating optimizations over - doing the same
>>>>>>         sort of testing I did for the IR/bitcode roundtripping -
>>>>>>         assert that the pointee type is not accessed, whitelist
>>>>>>         places that need it until the bitcasts go away, fix
>>>>>>         anything else... it'll still be a fair bit of work & I
>>>>>>         don't really know how much. It should parallelize pretty
>>>>>>         well (doing any of this work is really helpful, each
>>>>>>         optimization is indepednent, etc) if anyone wants to/is
>>>>>>         able to help.
>>>>>>
>>>>>>         - Dave
>>>>>>
>>>>>>
>>>>>>             Thanks
>>>>>>             Ehsan
>>>>>>
>>>>>>
>>>>>>             On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie
>>>>>>             <dblaikie at gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>                 On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via
>>>>>>                 llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>                     Hi,
>>>>>>
>>>>>>                     How do it interact with the "typeless
>>>>>>                     pointers" work?
>>>>>>
>>>>>>
>>>>>>                 Right - the goal of the typeless pointer work is
>>>>>>                 to fix all these bugs related to "didn't look
>>>>>>                 through bitcasts" in optimizations. Sometimes
>>>>>>                 that's going to mean more work (because the code
>>>>>>                 is leaning on the absence of bitcasts & the
>>>>>>                 presence of convenient (but not guaranteed) type
>>>>>>                 information to inform optimization decisions) but
>>>>>>                 if we remove typed pointers while keeping
>>>>>>                 optimization quality in the cases we have today,
>>>>>>                 then we should've also fixed the cases that were
>>>>>>                 broken because the type information didn't end up
>>>>>>                 aligning to produce the optimal output.
>>>>>>
>>>>>>                 & I know I've been off the typeless pointer stuff
>>>>>>                 for a few months working on llvm-dwp - but happy
>>>>>>                 for any help (the next immediate piece is
>>>>>>                 probably figuring out teh right representation
>>>>>>                 for byval and inalloca - there were some code
>>>>>>                 reviews sent out for that that I'll need to come
>>>>>>                 back around to - but also any optimizations
>>>>>>                 people want to help rework/improve would be great
>>>>>>                 too & I can provide some techniques/tools to help
>>>>>>                 people approach those)
>>>>>>
>>>>>>                 - Dave
>>>>>>
>>>>>>
>>>>>>                     Thanks,
>>>>>>
>>>>>>                     -- 
>>>>>>                     Mehdi
>>>>>>
>>>>>>>                     On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via
>>>>>>>                     llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>                     === PROBLEM === (See this bug
>>>>>>>                     https://llvm.org/bugs/show_bug.cgi?id=26445)
>>>>>>>
>>>>>>>                     IR contains code for loading a float from
>>>>>>>                     float * and storing it to a float * address.
>>>>>>>                     After canonicalization of load in
>>>>>>>                     InstCombine [1], new bitcasts are added to
>>>>>>>                     the IR (see bottom of the email for code
>>>>>>>                     samples). This prevents select speculation
>>>>>>>                     in SROA to work. Also after SROA we have
>>>>>>>                     bitcasts from int32 to float. (Whereas
>>>>>>>                     originally after instCombine, bitcasts are
>>>>>>>                     only done on pointer types).
>>>>>>>
>>>>>>>                     === PROPOSED SOLUTION===
>>>>>>>
>>>>>>>                     [1] implies that we need load
>>>>>>>                     canonicalization when we load a value only
>>>>>>>                     to store it again. The reason is to avoid
>>>>>>>                     generating slightly different code (due to
>>>>>>>                     different ways of adding bitcasts), in
>>>>>>>                     different situations. In all examples
>>>>>>>                     presented in [1] there is a non-zero number
>>>>>>>                     of bitcasts. I think when we load a value of
>>>>>>>                     type T from a T* address and store it as a
>>>>>>>                     type T value to one or more T* address (and
>>>>>>>                     there is no other use or store), we can
>>>>>>>                     redefine canonical form to mean there should
>>>>>>>                     not be any bitcasts. So we still have a
>>>>>>>                     canonical form, but its definition is
>>>>>>>                     slightly different.
>>>>>>>
>>>>>>>                     === REASONS FOR / AGAINST===
>>>>>>>
>>>>>>>                     Hal Finkel warns that while this may be
>>>>>>>                     useful for power pc, this may hurt more than
>>>>>>>                     one other platform and become a very large
>>>>>>>                     project. Despite this he is fine with
>>>>>>>                     bringing up the issue to the mailing list to
>>>>>>>                     get feedback, mostly because this seems
>>>>>>>                     inline with our future direction of having a
>>>>>>>                     unique type for all pointers. (Hal please
>>>>>>>                     correct me if I misunderstood your comment)
>>>>>>>
>>>>>>>                     This is a much simpler fix compared to
>>>>>>>                     alternatives. (ignoring potential regressions)
>>>>>>>
>>>>>>>                     === ALTERNATIVE SOLUTION ===
>>>>>>>
>>>>>>>                     Fix select speculation in SROA to see
>>>>>>>                     through bitcasts. Handle remaining bitcasts
>>>>>>>                     during code gen. Other alternative solutions
>>>>>>>                     are welcome.
>>>>>>>
>>>>>>>                     Should I implement the proposed solution or
>>>>>>>                     is it too risky? I understand that we may
>>>>>>>                     need to undo it if it breaks too many
>>>>>>>                     things. Comments are welcome.
>>>>>>>
>>>>>>>
>>>>>>>                     [1]
>>>>>>>                     http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
>>>>>>>                     r226781  git commit id: b778cbc0c8
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                     Code Samples (only relevant part is copied):
>>>>>>>
>>>>>>>                     -------------------- Before Canonicalization
>>>>>>>                     (contains call to std::max):
>>>>>>>                     --------------------
>>>>>>>                     entry:
>>>>>>>                       %max_value = alloca float, align 4
>>>>>>>                       %1 = load float, float* %input, align 4,
>>>>>>>                     !tbaa !1
>>>>>>>                       store float %1, float* %max_value, align
>>>>>>>                     4, !tbaa !1
>>>>>>>
>>>>>>>                     for.body:
>>>>>>>                       %call = call dereferenceable(4) float*
>>>>>>>                     @_ZSt3maxIfERKT_S2_S2_(float*
>>>>>>>                     dereferenceable(4) %max_value, float*
>>>>>>>                     dereferenceable(4) %arrayidx1)
>>>>>>>                       %3 = load float, float* %call, align 4,
>>>>>>>                     !tbaa !1
>>>>>>>                       store float %3, float* %max_value, align
>>>>>>>                     4, !tbaa !1
>>>>>>>
>>>>>>>                     -------------------- After Canonicalization
>>>>>>>                     (contains call to
>>>>>>>                     std::max):--------------------
>>>>>>>
>>>>>>>                     entry:
>>>>>>>                       %max_value = alloca float, align 4
>>>>>>>                       %1 = bitcast float* %input to i32*
>>>>>>>                       %2 = load i32, i32* %1, align 4, !tbaa !1
>>>>>>>                       %3 = bitcast float* %max_value to i32*
>>>>>>>                       store i32 %2, i32* %3, align 4, !tbaa !1
>>>>>>>
>>>>>>>                     for.body:
>>>>>>>                       %call = call dereferenceable(4) float*
>>>>>>>                     @_ZSt3maxIfERKT_S2_S2_(float* nonnull
>>>>>>>                     dereferenceable(4) %max_value, float*
>>>>>>>                     dereferenceable(4) %arrayidx1)
>>>>>>>                       %5 = bitcast float* %call to i32*
>>>>>>>                       %6 = load i32, i32* %5, align 4, !tbaa !1
>>>>>>>                       %7 = bitcast float* %max_value to i32*
>>>>>>>                       store i32 %6, i32* %7, align 4, !tbaa !1
>>>>>>>
>>>>>>>                     -------------------- After SROA (the call to
>>>>>>>                     std::max is inlined now):--------------------
>>>>>>>                     entry:
>>>>>>>                     %max_value.sroa.0 = alloca i32
>>>>>>>                       %0 = bitcast float* %input to i32*
>>>>>>>                       %1 = load i32, i32* %0, align 4, !tbaa !1
>>>>>>>                       store i32 %1, i32* %max_value.sroa.0
>>>>>>>
>>>>>>>                     for.body:
>>>>>>>                     %max_value.sroa.0.0.max_value.sroa.0.0.6 =
>>>>>>>                     load i32, i32* %max_value.sroa.0
>>>>>>>                       %3 = bitcast i32
>>>>>>>                     %max_value.sroa.0.0.max_value.sroa.0.0.6 to
>>>>>>>                     float
>>>>>>>                     %max_value.sroa.0.0.max_value.sroa_cast8 =
>>>>>>>                     bitcast i32* %max_value.sroa.0 to float*
>>>>>>>                       %__b.__a.i = select i1 %cmp.i, float*
>>>>>>>                     %arrayidx1, float*
>>>>>>>                     %max_value.sroa.0.0.max_value.sroa_cast8
>>>>>>>                       %5 = bitcast float* %__b.__a.i to i32*
>>>>>>>                       %6 = load i32, i32* %5, align 4, !tbaa !1
>>>>>>>                       store i32 %6, i32* %max_value.sroa.0
>>>>>>>
>>>>>>>                     -------------------- After SROA when
>>>>>>>                     Canonicalization is turned
>>>>>>>                     off--------------------
>>>>>>>                     entry:
>>>>>>>                       %0 = load float, float* %input, align 4,
>>>>>>>                     !tbaa !1
>>>>>>>
>>>>>>>                     for.cond: ; preds = %for.body, %entry
>>>>>>>                       %max_value.0 = phi float [ %0, %entry ], [
>>>>>>>                     %.sroa.speculated, %for.body ]
>>>>>>>
>>>>>>>                     for.body:
>>>>>>>                       %1 = load float, float* %arrayidx1, align
>>>>>>>                     4, !tbaa !1
>>>>>>>                       %cmp.i = fcmp olt float %max_value.0, %1
>>>>>>>                     %.sroa.speculate.load.true = load float,
>>>>>>>                     float* %arrayidx1, align 4, !tbaa !1
>>>>>>>                     %.sroa.speculated = select i1 %cmp.i, float
>>>>>>>                     %.sroa.speculate.load.true, float %max_value.0
>>>>>>>                     _______________________________________________
>>>>>>>                     LLVM Developers mailing list
>>>>>>>                     llvm-dev at lists.llvm.org
>>>>>>>                     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>>
>>>>>>                     _______________________________________________
>>>>>>                     LLVM Developers mailing list
>>>>>>                     llvm-dev at lists.llvm.org
>>>>>>                     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>     _______________________________________________
>>>>>     LLVM Developers mailing list
>>>>>     llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>>>>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/0f9ac7d6/attachment-0001.html>