[llvm-dev] RFC: A change in InstCombine canonical form
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Tue Mar 22 15:42:15 PDT 2016
On 03/22/2016 02:44 PM, Ehsan Amiri wrote:
> Thanks.
>
> *Phillip, *As Hal said I do not think (1) is a very large item. Please
> let me know if I am mistaken.
I have no specific reason to believe it will be a large amount of work,
but prior experience tells me changes to canonical form have a tendency
of exposing unexpected issues. To be clear, I am supportive of you
implementing solution 1.
>
> *David *I think (1) is more inline with typeless pointer work than
> (2). Contributing to typeless pointer work will be great, but given
> its unknown time frame we cannot stop fixing existing problems. Of
> course, we should follow an approach consistent with the long-term
> solution.
>
> On Tue, Mar 22, 2016 at 5:30 PM, David Blaikie <dblaikie at gmail.com
> <mailto:dblaikie at gmail.com>> wrote:
>
>
>
> On Tue, Mar 22, 2016 at 1:41 PM, Mehdi Amini
> <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote:
>
> Sorry I should have been more clear (writing to many email in
> parallel)
>
> You're right. I was adding a +1 to you here.
>
> Especially I wrote "unless there is an acknowledgement that
> typeless pointers won't be there before a couple of years"
> with the PassManager in mind, and I was expecting from David
> some good indication of a timeframe for the typeless pointers.
> If the typeless pointer work is stalled or if it is not
> planned for LLVM 3.9,
>
>
> It's neither stalled nor planned, as such.
>
> I agree with Philip to not block anything.
>
>
> All I'm suggesting is that if there are people who want to fix
> these bugs, I'd really appreciate them helping out on the typeless
> pointer work - it's totally parallelizable/shardable/shareable
> work & the project as a whole seemed to agree it was the right
> direction. Why not just get it done?
>
> The Pass Manager work is a bit different & harder to share at
> certain points (I don't tihnk there's ever been a point at which
> someone could ask me "what can I do to help with the typeless
> pointer work" I couldn't've given some pretty large areas I wasn't
> anywhere near touching that they could've gotten their teeth into)
> - I think it's reaching the point where multiple passes can be
> ported independently & seen some work like that in Polly, etc. So
> at this point it seems like if people want to address the issues
> the new pass manager is aimed at addressing, they could pitch in
> on that effort.
>
> That said, I'm not in a position (nor would I do so, even if I
> were) to block other patches, just to encourage people to help out
> on the bigger efforts in whatever way they can (either directly,
> or indirectly through ensuring stopgaps/new work is done in a way
> that makes that future work easier (where it's reasonable to judge
> what might be easier or harder, etc), etc)
>
> - Dave
>
>
> --
> Mehdi
>
>
>> On Mar 22, 2016, at 1:37 PM, Philip Reames
>> <listmail at philipreames.com
>> <mailto:listmail at philipreames.com>> wrote:
>>
>> But not what David was stating, unless I misread? I was
>> specifically responding to David's wording:
>> "If we're talking about making an optimization able to ignore
>> the bitcast instructions - yes, that work is unnecessary &
>> perhaps questionable given the typeless pointer work. Not
>> outright off limits, but the same time might be better
>> invested in moving typeless pointers forward if the
>> contributor is so inclined/able to shift in that direction."
>>
>> Both "perhaps questionable" and "not outright off limits"
>> seem to strongly imply such work should be discouraged. I
>> disagree with that view which is why I wrote my response.
>>
>> Philip
>>
>> On 03/22/2016 01:34 PM, Mehdi Amini wrote:
>>> This is roughly what I wrote...
>>>
>>>> On Mar 22, 2016, at 1:31 PM, Philip Reames
>>>> <listmail at philipreames.com
>>>> <mailto:listmail at philipreames.com>> wrote:
>>>>
>>>> I feel very strongly that blocking work on making
>>>> optimization bitcast-ignorant on the typeless pointer work
>>>> would be a major mistake. Unless we expected the typeless
>>>> pointer work to be concluded within the near term (say 3-6
>>>> months maximum), we should not block any development which
>>>> would be accepted in the typeless pointer work wasn't planned.
>>>>
>>>> In my view, this is one of the largest mistakes we've made
>>>> with the pass manager work, it has seriously cost us, and I
>>>> don't want to see it happening again.
>>>>
>>>> Philip
>>>>
>>>> On 03/22/2016 01:09 PM, David Blaikie wrote:
>>>>> Ultimately everything is going to be made to not rely on
>>>>> the types of pointers - that's nearly equivalent to
>>>>> bitcast-ignorant (the difference being that the presence
>>>>> of an extra instruction (the bitcast) might trip up some
>>>>> optimizations - but the presence of the /type/ information
>>>>> implied by the bitcast should not trip up or be necessary
>>>>> for optimizations (two sides of the same coin))
>>>>>
>>>>> If we're talking about making an optimization able to
>>>>> ignore the bitcast instructions - yes, that work is
>>>>> unnecessary & perhaps questionable given the typeless
>>>>> pointer work. Not outright off limits, but the same time
>>>>> might be better invested in moving typeless pointers
>>>>> forward if the contributor is so inclined/able to shift in
>>>>> that direction.
>>>>>
>>>>> But if we're talking about the work to make the
>>>>> optimization not use the type of pointers as a crutch -
>>>>> that work is a necessary precursor to the typeless pointer
>>>>> work and would be most welcome.
>>>>>
>>>>> - David
>>>>>
>>>>> On Tue, Mar 22, 2016 at 12:39 PM, Mehdi Amini
>>>>> <mehdi.amini at apple.com <mailto:mehdi.amini at apple.com>> wrote:
>>>>>
>>>>> I don't really mind, but the intermediate stage will
>>>>> not be very nice: that a lot of code / tests that
>>>>> needs to be written with bitcast, and all of that
>>>>> while they are deemed to disappear. The added value
>>>>> isn't clear to me considering the added work. I'm not
>>>>> sure it wouldn't add more work for all the cleanup
>>>>> required by the "typeless pointer", but I'm not sure
>>>>> what's involved here and if David thinks the
>>>>> intermediate steps of handling bit casts everywhere is
>>>>> not making it harder I'm fine with it.
>>>>>
>>>>> --
>>>>> Mehdi
>>>>>
>>>>>> On Mar 22, 2016, at 12:36 PM, Philip Reames
>>>>>> <listmail at philipreames.com
>>>>>> <mailto:listmail at philipreames.com>> wrote:
>>>>>>
>>>>>> I'd phrase this differently: being pointer-bitcast
>>>>>> agnostic is a step towards support typeless
>>>>>> pointers. :) We can either become bitcast agnostic
>>>>>> all in one big change or incrementally. Personally,
>>>>>> I'd prefer the later since it reduces the risk
>>>>>> associated with enabling typeless pointers in the end.
>>>>>>
>>>>>> Philip
>>>>>>
>>>>>> On 03/22/2016 12:16 PM, Mehdi Amini via llvm-dev wrote:
>>>>>>> I don't know enough about the tradeoff for 1, but 2
>>>>>>> seems like a bandaid for something that is not a
>>>>>>> correctness issue neither a regression. I'm not sure
>>>>>>> it justifies "bandaid patches" while there is a
>>>>>>> clear path forward, i.e. typeless pointers, unless
>>>>>>> there is an acknowledgement that typeless pointers
>>>>>>> won't be there before a couple of years.
>>>>>>>
>>>>>>> --
>>>>>>> Mehdi
>>>>>>>
>>>>>>>> On Mar 22, 2016, at 11:32 AM, Ehsan Amiri
>>>>>>>> <ehsanamiri at gmail.com
>>>>>>>> <mailto:ehsanamiri at gmail.com>> wrote:
>>>>>>>>
>>>>>>>> Back to the discussion on the RFC, I still see some
>>>>>>>> advantage in following the proposed solution. I see
>>>>>>>> two paths forward:
>>>>>>>>
>>>>>>>> 1- Change canonical form, possibly lower memcpy to
>>>>>>>> non-integer load and store in InstCombine. Then
>>>>>>>> teach the backends to convert that to integer load
>>>>>>>> and store if that is more profitable. Notice that
>>>>>>>> we are talking about loads that have no use other
>>>>>>>> than store. So it is a fairly simple change in the
>>>>>>>> backends.
>>>>>>>>
>>>>>>>> 2- Do not change the canonical form. Then for this
>>>>>>>> bug, we need to teach select speculation to see
>>>>>>>> through bitcasts. We will probably need to teach
>>>>>>>> other optimizations to see though bitcasts in the
>>>>>>>> future as problems are uncovered. That is until
>>>>>>>> typeless pointer work is complete. Once the
>>>>>>>> typeless pointer work is complete, we have some
>>>>>>>> extra code in each optimization for seeing through
>>>>>>>> bitcasts which is possibly no longer needed.
>>>>>>>>
>>>>>>>> Based on this I think (1) is the right thing to do.
>>>>>>>> But there might be other reasons for the current
>>>>>>>> canonical form that I am not aware of. Please let
>>>>>>>> me know what you think.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Ehsan
>>>>>>>>
>>>>>>>> On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie
>>>>>>>> <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri
>>>>>>>> <ehsanamiri at gmail.com
>>>>>>>> <mailto:ehsanamiri at gmail.com>> wrote:
>>>>>>>>
>>>>>>>> David,
>>>>>>>>
>>>>>>>> Could you give us an update on the status
>>>>>>>> of typeless pointer work? How much work is
>>>>>>>> left and when you think it might be ready?
>>>>>>>>
>>>>>>>>
>>>>>>>> It's a bit of an onion peel, really - since it
>>>>>>>> will eventually involve generalizing/fixing
>>>>>>>> every optimization that's currently leaning on
>>>>>>>> typed pointers to keep the performance while
>>>>>>>> removing the crutch they're currently leaning
>>>>>>>> on. (in cases where bitcasts are literally just
>>>>>>>> getting in the way, those won't require
>>>>>>>> cleaning up & should just become "free
>>>>>>>> performance wins" once we remove them, though)
>>>>>>>>
>>>>>>>> At the moment we can roundtrip every LLVM IR
>>>>>>>> test case through bitcode and textual IR
>>>>>>>> (reading/writing both formats) while using only
>>>>>>>> a narrow whitelist of places that request the
>>>>>>>> type of a pointer (things like the verifier,
>>>>>>>> the parser/printer where it actually needs the
>>>>>>>> typed pointer to verify it matches the explicit
>>>>>>>> type, etc).
>>>>>>>>
>>>>>>>> The next thing on the list is probably figuring
>>>>>>>> out the byval/inalloca representation (add an
>>>>>>>> explicit pointee type? just make the number of
>>>>>>>> bytes explicit with no type information?).
>>>>>>>>
>>>>>>>> Then start migrating optimizations over - doing
>>>>>>>> the same sort of testing I did for the
>>>>>>>> IR/bitcode roundtripping - assert that the
>>>>>>>> pointee type is not accessed, whitelist places
>>>>>>>> that need it until the bitcasts go away, fix
>>>>>>>> anything else... it'll still be a fair bit of
>>>>>>>> work & I don't really know how much. It should
>>>>>>>> parallelize pretty well (doing any of this work
>>>>>>>> is really helpful, each optimization is
>>>>>>>> indepednent, etc) if anyone wants to/is able to
>>>>>>>> help.
>>>>>>>>
>>>>>>>> - Dave
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Ehsan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 16, 2016 at 1:12 PM, David
>>>>>>>> Blaikie <dblaikie at gmail.com
>>>>>>>> <mailto:dblaikie at gmail.com>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 16, 2016 at 8:34 AM, Mehdi
>>>>>>>> Amini via llvm-dev
>>>>>>>> <llvm-dev at lists.llvm.org
>>>>>>>> <mailto:llvm-dev at lists.llvm.org>> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> How do it interact with the
>>>>>>>> "typeless pointers" work?
>>>>>>>>
>>>>>>>>
>>>>>>>> Right - the goal of the typeless
>>>>>>>> pointer work is to fix all these bugs
>>>>>>>> related to "didn't look through
>>>>>>>> bitcasts" in optimizations. Sometimes
>>>>>>>> that's going to mean more work (because
>>>>>>>> the code is leaning on the absence of
>>>>>>>> bitcasts & the presence of convenient
>>>>>>>> (but not guaranteed) type information
>>>>>>>> to inform optimization decisions) but
>>>>>>>> if we remove typed pointers while
>>>>>>>> keeping optimization quality in the
>>>>>>>> cases we have today, then we should've
>>>>>>>> also fixed the cases that were broken
>>>>>>>> because the type information didn't end
>>>>>>>> up aligning to produce the optimal output.
>>>>>>>>
>>>>>>>> & I know I've been off the typeless
>>>>>>>> pointer stuff for a few months working
>>>>>>>> on llvm-dwp - but happy for any help
>>>>>>>> (the next immediate piece is probably
>>>>>>>> figuring out teh right representation
>>>>>>>> for byval and inalloca - there were
>>>>>>>> some code reviews sent out for that
>>>>>>>> that I'll need to come back around to -
>>>>>>>> but also any optimizations people want
>>>>>>>> to help rework/improve would be great
>>>>>>>> too & I can provide some
>>>>>>>> techniques/tools to help people
>>>>>>>> approach those)
>>>>>>>>
>>>>>>>> - Dave
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> --
>>>>>>>> Mehdi
>>>>>>>>
>>>>>>>>> On Mar 16, 2016, at 6:41 AM, Ehsan
>>>>>>>>> Amiri via llvm-dev
>>>>>>>>> <llvm-dev at lists.llvm.org
>>>>>>>>> <mailto:llvm-dev at lists.llvm.org>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> === PROBLEM === (See this bug
>>>>>>>>> https://llvm.org/bugs/show_bug.cgi?id=26445)
>>>>>>>>>
>>>>>>>>> IR contains code for loading a
>>>>>>>>> float from float * and storing it
>>>>>>>>> to a float * address. After
>>>>>>>>> canonicalization of load in
>>>>>>>>> InstCombine [1], new bitcasts are
>>>>>>>>> added to the IR (see bottom of the
>>>>>>>>> email for code samples). This
>>>>>>>>> prevents select speculation in
>>>>>>>>> SROA to work. Also after SROA we
>>>>>>>>> have bitcasts from int32 to float.
>>>>>>>>> (Whereas originally after
>>>>>>>>> instCombine, bitcasts are only
>>>>>>>>> done on pointer types).
>>>>>>>>>
>>>>>>>>> === PROPOSED SOLUTION===
>>>>>>>>>
>>>>>>>>> [1] implies that we need load
>>>>>>>>> canonicalization when we load a
>>>>>>>>> value only to store it again. The
>>>>>>>>> reason is to avoid generating
>>>>>>>>> slightly different code (due to
>>>>>>>>> different ways of adding
>>>>>>>>> bitcasts), in different
>>>>>>>>> situations. In all examples
>>>>>>>>> presented in [1] there is a
>>>>>>>>> non-zero number of bitcasts. I
>>>>>>>>> think when we load a value of type
>>>>>>>>> T from a T* address and store it
>>>>>>>>> as a type T value to one or more
>>>>>>>>> T* address (and there is no other
>>>>>>>>> use or store), we can redefine
>>>>>>>>> canonical form to mean there
>>>>>>>>> should not be any bitcasts. So we
>>>>>>>>> still have a canonical form, but
>>>>>>>>> its definition is slightly different.
>>>>>>>>>
>>>>>>>>> === REASONS FOR / AGAINST===
>>>>>>>>>
>>>>>>>>> Hal Finkel warns that while this
>>>>>>>>> may be useful for power pc, this
>>>>>>>>> may hurt more than one other
>>>>>>>>> platform and become a very large
>>>>>>>>> project. Despite this he is fine
>>>>>>>>> with bringing up the issue to the
>>>>>>>>> mailing list to get feedback,
>>>>>>>>> mostly because this seems inline
>>>>>>>>> with our future direction of
>>>>>>>>> having a unique type for all
>>>>>>>>> pointers. (Hal please correct me
>>>>>>>>> if I misunderstood your comment)
>>>>>>>>>
>>>>>>>>> This is a much simpler fix
>>>>>>>>> compared to alternatives.
>>>>>>>>> (ignoring potential regressions)
>>>>>>>>>
>>>>>>>>> === ALTERNATIVE SOLUTION ===
>>>>>>>>>
>>>>>>>>> Fix select speculation in SROA to
>>>>>>>>> see through bitcasts. Handle
>>>>>>>>> remaining bitcasts during code
>>>>>>>>> gen. Other alternative solutions
>>>>>>>>> are welcome.
>>>>>>>>>
>>>>>>>>> Should I implement the proposed
>>>>>>>>> solution or is it too risky? I
>>>>>>>>> understand that we may need to
>>>>>>>>> undo it if it breaks too many
>>>>>>>>> things. Comments are welcome.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
>>>>>>>>> r226781 git commit id: b778cbc0c8
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Code Samples (only relevant part
>>>>>>>>> is copied):
>>>>>>>>>
>>>>>>>>> -------------------- Before
>>>>>>>>> Canonicalization (contains call to
>>>>>>>>> std::max): --------------------
>>>>>>>>> entry:
>>>>>>>>> %max_value = alloca float, align 4
>>>>>>>>> %1 = load float, float* %input,
>>>>>>>>> align 4, !tbaa !1
>>>>>>>>> store float %1, float*
>>>>>>>>> %max_value, align 4, !tbaa !1
>>>>>>>>>
>>>>>>>>> for.body:
>>>>>>>>> %call = call dereferenceable(4)
>>>>>>>>> float*
>>>>>>>>> @_ZSt3maxIfERKT_S2_S2_(float*
>>>>>>>>> dereferenceable(4) %max_value,
>>>>>>>>> float* dereferenceable(4) %arrayidx1)
>>>>>>>>> %3 = load float, float* %call,
>>>>>>>>> align 4, !tbaa !1
>>>>>>>>> store float %3, float*
>>>>>>>>> %max_value, align 4, !tbaa !1
>>>>>>>>>
>>>>>>>>> -------------------- After
>>>>>>>>> Canonicalization (contains call to
>>>>>>>>> std::max):--------------------
>>>>>>>>>
>>>>>>>>> entry:
>>>>>>>>> %max_value = alloca float, align 4
>>>>>>>>> %1 = bitcast float* %input to i32*
>>>>>>>>> %2 = load i32, i32* %1, align 4,
>>>>>>>>> !tbaa !1
>>>>>>>>> %3 = bitcast float* %max_value
>>>>>>>>> to i32*
>>>>>>>>> store i32 %2, i32* %3, align 4,
>>>>>>>>> !tbaa !1
>>>>>>>>>
>>>>>>>>> for.body:
>>>>>>>>> %call = call dereferenceable(4)
>>>>>>>>> float*
>>>>>>>>> @_ZSt3maxIfERKT_S2_S2_(float*
>>>>>>>>> nonnull dereferenceable(4)
>>>>>>>>> %max_value, float*
>>>>>>>>> dereferenceable(4) %arrayidx1)
>>>>>>>>> %5 = bitcast float* %call to i32*
>>>>>>>>> %6 = load i32, i32* %5, align 4,
>>>>>>>>> !tbaa !1
>>>>>>>>> %7 = bitcast float* %max_value
>>>>>>>>> to i32*
>>>>>>>>> store i32 %6, i32* %7, align 4,
>>>>>>>>> !tbaa !1
>>>>>>>>>
>>>>>>>>> -------------------- After SROA
>>>>>>>>> (the call to std::max is inlined
>>>>>>>>> now):--------------------
>>>>>>>>> entry:
>>>>>>>>> %max_value.sroa.0 = alloca i32
>>>>>>>>> %0 = bitcast float* %input to i32*
>>>>>>>>> %1 = load i32, i32* %0, align 4,
>>>>>>>>> !tbaa !1
>>>>>>>>> store i32 %1, i32* %max_value.sroa.0
>>>>>>>>>
>>>>>>>>> for.body:
>>>>>>>>> %max_value.sroa.0.0.max_value.sroa.0.0.6
>>>>>>>>> = load i32, i32* %max_value.sroa.0
>>>>>>>>> %3 = bitcast i32
>>>>>>>>> %max_value.sroa.0.0.max_value.sroa.0.0.6
>>>>>>>>> to float
>>>>>>>>> %max_value.sroa.0.0.max_value.sroa_cast8
>>>>>>>>> = bitcast i32* %max_value.sroa.0
>>>>>>>>> to float*
>>>>>>>>> %__b.__a.i = select i1 %cmp.i,
>>>>>>>>> float* %arrayidx1, float*
>>>>>>>>> %max_value.sroa.0.0.max_value.sroa_cast8
>>>>>>>>> %5 = bitcast float* %__b.__a.i
>>>>>>>>> to i32*
>>>>>>>>> %6 = load i32, i32* %5, align 4,
>>>>>>>>> !tbaa !1
>>>>>>>>> store i32 %6, i32* %max_value.sroa.0
>>>>>>>>>
>>>>>>>>> -------------------- After SROA
>>>>>>>>> when Canonicalization is turned
>>>>>>>>> off--------------------
>>>>>>>>> entry:
>>>>>>>>> %0 = load float, float* %input,
>>>>>>>>> align 4, !tbaa !1
>>>>>>>>>
>>>>>>>>> for.cond: ; preds = %for.body, %entry
>>>>>>>>> %max_value.0 = phi float [ %0,
>>>>>>>>> %entry ], [ %.sroa.speculated,
>>>>>>>>> %for.body ]
>>>>>>>>>
>>>>>>>>> for.body:
>>>>>>>>> %1 = load float, float*
>>>>>>>>> %arrayidx1, align 4, !tbaa !1
>>>>>>>>> %cmp.i = fcmp olt float
>>>>>>>>> %max_value.0, %1
>>>>>>>>> %.sroa.speculate.load.true = load
>>>>>>>>> float, float* %arrayidx1, align 4,
>>>>>>>>> !tbaa !1
>>>>>>>>> %.sroa.speculated = select i1
>>>>>>>>> %cmp.i, float
>>>>>>>>> %.sroa.speculate.load.true, float
>>>>>>>>> %max_value.0
>>>>>>>>> _______________________________________________
>>>>>>>>> LLVM Developers mailing list
>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>> <mailto:llvm-dev at lists.llvm.org>
>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>> <mailto:llvm-dev at lists.llvm.org>
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/1030d65c/attachment-0001.html>
More information about the llvm-dev
mailing list