[llvm-dev] RFC: A change in InstCombine canonical form
Philip Reames via llvm-dev
llvm-dev at lists.llvm.org
Tue Mar 22 13:37:48 PDT 2016
But not what David was stating, unless I misread? I was specifically
responding to David's wording:
"If we're talking about making an optimization able to ignore the
bitcast instructions - yes, that work is unnecessary & perhaps
questionable given the typeless pointer work. Not outright off limits,
but the same time might be better invested in moving typeless pointers
forward if the contributor is so inclined/able to shift in that direction."
Both "perhaps questionable" and "not outright off limits" seem to
strongly imply such work should be discouraged. I disagree with that
view which is why I wrote my response.
Philip
On 03/22/2016 01:34 PM, Mehdi Amini wrote:
> This is roughly what I wrote...
>
>> On Mar 22, 2016, at 1:31 PM, Philip Reames <listmail at philipreames.com
>> <mailto:listmail at philipreames.com>> wrote:
>>
>> I feel very strongly that blocking work on making optimization
>> bitcast-ignorant on the typeless pointer work would be a major
>> mistake. Unless we expected the typeless pointer work to be
>> concluded within the near term (say 3-6 months maximum), we should
>> not block any development which would be accepted in the typeless
>> pointer work wasn't planned.
>>
>> In my view, this is one of the largest mistakes we've made with the
>> pass manager work, it has seriously cost us, and I don't want to see
>> it happening again.
>>
>> Philip
>>
>> On 03/22/2016 01:09 PM, David Blaikie wrote:
>>> Ultimately everything is going to be made to not rely on the types
>>> of pointers - that's nearly equivalent to bitcast-ignorant (the
>>> difference being that the presence of an extra instruction (the
>>> bitcast) might trip up some optimizations - but the presence of the
>>> /type/ information implied by the bitcast should not trip up or be
>>> necessary for optimizations (two sides of the same coin))
>>>
>>> If we're talking about making an optimization able to ignore the
>>> bitcast instructions - yes, that work is unnecessary & perhaps
>>> questionable given the typeless pointer work. Not outright off
>>> limits, but the same time might be better invested in moving
>>> typeless pointers forward if the contributor is so inclined/able to
>>> shift in that direction.
>>>
>>> But if we're talking about the work to make the optimization not use
>>> the type of pointers as a crutch - that work is a necessary
>>> precursor to the typeless pointer work and would be most welcome.
>>>
>>> - David
>>>
>>> On Tue, Mar 22, 2016 at 12:39 PM, Mehdi Amini <mehdi.amini at apple.com
>>> <mailto:mehdi.amini at apple.com>> wrote:
>>>
>>> I don't really mind, but the intermediate stage will not be very
>>> nice: that a lot of code / tests that needs to be written with
>>> bitcast, and all of that while they are deemed to disappear. The
>>> added value isn't clear to me considering the added work. I'm
>>> not sure it wouldn't add more work for all the cleanup required
>>> by the "typeless pointer", but I'm not sure what's involved here
>>> and if David thinks the intermediate steps of handling bit casts
>>> everywhere is not making it harder I'm fine with it.
>>>
>>> --
>>> Mehdi
>>>
>>>> On Mar 22, 2016, at 12:36 PM, Philip Reames
>>>> <listmail at philipreames.com <mailto:listmail at philipreames.com>>
>>>> wrote:
>>>>
>>>> I'd phrase this differently: being pointer-bitcast agnostic is
>>>> a step towards support typeless pointers. :) We can either
>>>> become bitcast agnostic all in one big change or
>>>> incrementally. Personally, I'd prefer the later since it
>>>> reduces the risk associated with enabling typeless pointers in
>>>> the end.
>>>>
>>>> Philip
>>>>
>>>> On 03/22/2016 12:16 PM, Mehdi Amini via llvm-dev wrote:
>>>>> I don't know enough about the tradeoff for 1, but 2 seems like
>>>>> a bandaid for something that is not a correctness issue
>>>>> neither a regression. I'm not sure it justifies "bandaid
>>>>> patches" while there is a clear path forward, i.e. typeless
>>>>> pointers, unless there is an acknowledgement that typeless
>>>>> pointers won't be there before a couple of years.
>>>>>
>>>>> --
>>>>> Mehdi
>>>>>
>>>>>> On Mar 22, 2016, at 11:32 AM, Ehsan Amiri
>>>>>> <ehsanamiri at gmail.com> wrote:
>>>>>>
>>>>>> Back to the discussion on the RFC, I still see some advantage
>>>>>> in following the proposed solution. I see two paths forward:
>>>>>>
>>>>>> 1- Change canonical form, possibly lower memcpy to
>>>>>> non-integer load and store in InstCombine. Then teach the
>>>>>> backends to convert that to integer load and store if that is
>>>>>> more profitable. Notice that we are talking about loads that
>>>>>> have no use other than store. So it is a fairly simple change
>>>>>> in the backends.
>>>>>>
>>>>>> 2- Do not change the canonical form. Then for this bug, we
>>>>>> need to teach select speculation to see through bitcasts. We
>>>>>> will probably need to teach other optimizations to see though
>>>>>> bitcasts in the future as problems are uncovered. That is
>>>>>> until typeless pointer work is complete. Once the typeless
>>>>>> pointer work is complete, we have some extra code in each
>>>>>> optimization for seeing through bitcasts which is possibly no
>>>>>> longer needed.
>>>>>>
>>>>>> Based on this I think (1) is the right thing to do. But there
>>>>>> might be other reasons for the current canonical form that I
>>>>>> am not aware of. Please let me know what you think.
>>>>>>
>>>>>> Thanks
>>>>>> Ehsan
>>>>>>
>>>>>> On Wed, Mar 16, 2016 at 2:13 PM, David Blaikie
>>>>>> <dblaikie at gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 16, 2016 at 11:00 AM, Ehsan Amiri
>>>>>> <ehsanamiri at gmail.com> wrote:
>>>>>>
>>>>>> David,
>>>>>>
>>>>>> Could you give us an update on the status of typeless
>>>>>> pointer work? How much work is left and when you
>>>>>> think it might be ready?
>>>>>>
>>>>>>
>>>>>> It's a bit of an onion peel, really - since it will
>>>>>> eventually involve generalizing/fixing every optimization
>>>>>> that's currently leaning on typed pointers to keep the
>>>>>> performance while removing the crutch they're currently
>>>>>> leaning on. (in cases where bitcasts are literally just
>>>>>> getting in the way, those won't require cleaning up &
>>>>>> should just become "free performance wins" once we remove
>>>>>> them, though)
>>>>>>
>>>>>> At the moment we can roundtrip every LLVM IR test case
>>>>>> through bitcode and textual IR (reading/writing both
>>>>>> formats) while using only a narrow whitelist of places
>>>>>> that request the type of a pointer (things like the
>>>>>> verifier, the parser/printer where it actually needs the
>>>>>> typed pointer to verify it matches the explicit type, etc).
>>>>>>
>>>>>> The next thing on the list is probably figuring out the
>>>>>> byval/inalloca representation (add an explicit pointee
>>>>>> type? just make the number of bytes explicit with no type
>>>>>> information?).
>>>>>>
>>>>>> Then start migrating optimizations over - doing the same
>>>>>> sort of testing I did for the IR/bitcode roundtripping -
>>>>>> assert that the pointee type is not accessed, whitelist
>>>>>> places that need it until the bitcasts go away, fix
>>>>>> anything else... it'll still be a fair bit of work & I
>>>>>> don't really know how much. It should parallelize pretty
>>>>>> well (doing any of this work is really helpful, each
>>>>>> optimization is indepednent, etc) if anyone wants to/is
>>>>>> able to help.
>>>>>>
>>>>>> - Dave
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Ehsan
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 16, 2016 at 1:12 PM, David Blaikie
>>>>>> <dblaikie at gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 16, 2016 at 8:34 AM, Mehdi Amini via
>>>>>> llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> How do it interact with the "typeless
>>>>>> pointers" work?
>>>>>>
>>>>>>
>>>>>> Right - the goal of the typeless pointer work is
>>>>>> to fix all these bugs related to "didn't look
>>>>>> through bitcasts" in optimizations. Sometimes
>>>>>> that's going to mean more work (because the code
>>>>>> is leaning on the absence of bitcasts & the
>>>>>> presence of convenient (but not guaranteed) type
>>>>>> information to inform optimization decisions) but
>>>>>> if we remove typed pointers while keeping
>>>>>> optimization quality in the cases we have today,
>>>>>> then we should've also fixed the cases that were
>>>>>> broken because the type information didn't end up
>>>>>> aligning to produce the optimal output.
>>>>>>
>>>>>> & I know I've been off the typeless pointer stuff
>>>>>> for a few months working on llvm-dwp - but happy
>>>>>> for any help (the next immediate piece is
>>>>>> probably figuring out teh right representation
>>>>>> for byval and inalloca - there were some code
>>>>>> reviews sent out for that that I'll need to come
>>>>>> back around to - but also any optimizations
>>>>>> people want to help rework/improve would be great
>>>>>> too & I can provide some techniques/tools to help
>>>>>> people approach those)
>>>>>>
>>>>>> - Dave
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --
>>>>>> Mehdi
>>>>>>
>>>>>>> On Mar 16, 2016, at 6:41 AM, Ehsan Amiri via
>>>>>>> llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>> === PROBLEM === (See this bug
>>>>>>> https://llvm.org/bugs/show_bug.cgi?id=26445)
>>>>>>>
>>>>>>> IR contains code for loading a float from
>>>>>>> float * and storing it to a float * address.
>>>>>>> After canonicalization of load in
>>>>>>> InstCombine [1], new bitcasts are added to
>>>>>>> the IR (see bottom of the email for code
>>>>>>> samples). This prevents select speculation
>>>>>>> in SROA to work. Also after SROA we have
>>>>>>> bitcasts from int32 to float. (Whereas
>>>>>>> originally after instCombine, bitcasts are
>>>>>>> only done on pointer types).
>>>>>>>
>>>>>>> === PROPOSED SOLUTION===
>>>>>>>
>>>>>>> [1] implies that we need load
>>>>>>> canonicalization when we load a value only
>>>>>>> to store it again. The reason is to avoid
>>>>>>> generating slightly different code (due to
>>>>>>> different ways of adding bitcasts), in
>>>>>>> different situations. In all examples
>>>>>>> presented in [1] there is a non-zero number
>>>>>>> of bitcasts. I think when we load a value of
>>>>>>> type T from a T* address and store it as a
>>>>>>> type T value to one or more T* address (and
>>>>>>> there is no other use or store), we can
>>>>>>> redefine canonical form to mean there should
>>>>>>> not be any bitcasts. So we still have a
>>>>>>> canonical form, but its definition is
>>>>>>> slightly different.
>>>>>>>
>>>>>>> === REASONS FOR / AGAINST===
>>>>>>>
>>>>>>> Hal Finkel warns that while this may be
>>>>>>> useful for power pc, this may hurt more than
>>>>>>> one other platform and become a very large
>>>>>>> project. Despite this he is fine with
>>>>>>> bringing up the issue to the mailing list to
>>>>>>> get feedback, mostly because this seems
>>>>>>> inline with our future direction of having a
>>>>>>> unique type for all pointers. (Hal please
>>>>>>> correct me if I misunderstood your comment)
>>>>>>>
>>>>>>> This is a much simpler fix compared to
>>>>>>> alternatives. (ignoring potential regressions)
>>>>>>>
>>>>>>> === ALTERNATIVE SOLUTION ===
>>>>>>>
>>>>>>> Fix select speculation in SROA to see
>>>>>>> through bitcasts. Handle remaining bitcasts
>>>>>>> during code gen. Other alternative solutions
>>>>>>> are welcome.
>>>>>>>
>>>>>>> Should I implement the proposed solution or
>>>>>>> is it too risky? I understand that we may
>>>>>>> need to undo it if it breaks too many
>>>>>>> things. Comments are welcome.
>>>>>>>
>>>>>>>
>>>>>>> [1]
>>>>>>> http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html
>>>>>>> r226781 git commit id: b778cbc0c8
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Code Samples (only relevant part is copied):
>>>>>>>
>>>>>>> -------------------- Before Canonicalization
>>>>>>> (contains call to std::max):
>>>>>>> --------------------
>>>>>>> entry:
>>>>>>> %max_value = alloca float, align 4
>>>>>>> %1 = load float, float* %input, align 4,
>>>>>>> !tbaa !1
>>>>>>> store float %1, float* %max_value, align
>>>>>>> 4, !tbaa !1
>>>>>>>
>>>>>>> for.body:
>>>>>>> %call = call dereferenceable(4) float*
>>>>>>> @_ZSt3maxIfERKT_S2_S2_(float*
>>>>>>> dereferenceable(4) %max_value, float*
>>>>>>> dereferenceable(4) %arrayidx1)
>>>>>>> %3 = load float, float* %call, align 4,
>>>>>>> !tbaa !1
>>>>>>> store float %3, float* %max_value, align
>>>>>>> 4, !tbaa !1
>>>>>>>
>>>>>>> -------------------- After Canonicalization
>>>>>>> (contains call to
>>>>>>> std::max):--------------------
>>>>>>>
>>>>>>> entry:
>>>>>>> %max_value = alloca float, align 4
>>>>>>> %1 = bitcast float* %input to i32*
>>>>>>> %2 = load i32, i32* %1, align 4, !tbaa !1
>>>>>>> %3 = bitcast float* %max_value to i32*
>>>>>>> store i32 %2, i32* %3, align 4, !tbaa !1
>>>>>>>
>>>>>>> for.body:
>>>>>>> %call = call dereferenceable(4) float*
>>>>>>> @_ZSt3maxIfERKT_S2_S2_(float* nonnull
>>>>>>> dereferenceable(4) %max_value, float*
>>>>>>> dereferenceable(4) %arrayidx1)
>>>>>>> %5 = bitcast float* %call to i32*
>>>>>>> %6 = load i32, i32* %5, align 4, !tbaa !1
>>>>>>> %7 = bitcast float* %max_value to i32*
>>>>>>> store i32 %6, i32* %7, align 4, !tbaa !1
>>>>>>>
>>>>>>> -------------------- After SROA (the call to
>>>>>>> std::max is inlined now):--------------------
>>>>>>> entry:
>>>>>>> %max_value.sroa.0 = alloca i32
>>>>>>> %0 = bitcast float* %input to i32*
>>>>>>> %1 = load i32, i32* %0, align 4, !tbaa !1
>>>>>>> store i32 %1, i32* %max_value.sroa.0
>>>>>>>
>>>>>>> for.body:
>>>>>>> %max_value.sroa.0.0.max_value.sroa.0.0.6 =
>>>>>>> load i32, i32* %max_value.sroa.0
>>>>>>> %3 = bitcast i32
>>>>>>> %max_value.sroa.0.0.max_value.sroa.0.0.6 to
>>>>>>> float
>>>>>>> %max_value.sroa.0.0.max_value.sroa_cast8 =
>>>>>>> bitcast i32* %max_value.sroa.0 to float*
>>>>>>> %__b.__a.i = select i1 %cmp.i, float*
>>>>>>> %arrayidx1, float*
>>>>>>> %max_value.sroa.0.0.max_value.sroa_cast8
>>>>>>> %5 = bitcast float* %__b.__a.i to i32*
>>>>>>> %6 = load i32, i32* %5, align 4, !tbaa !1
>>>>>>> store i32 %6, i32* %max_value.sroa.0
>>>>>>>
>>>>>>> -------------------- After SROA when
>>>>>>> Canonicalization is turned
>>>>>>> off--------------------
>>>>>>> entry:
>>>>>>> %0 = load float, float* %input, align 4,
>>>>>>> !tbaa !1
>>>>>>>
>>>>>>> for.cond: ; preds = %for.body, %entry
>>>>>>> %max_value.0 = phi float [ %0, %entry ], [
>>>>>>> %.sroa.speculated, %for.body ]
>>>>>>>
>>>>>>> for.body:
>>>>>>> %1 = load float, float* %arrayidx1, align
>>>>>>> 4, !tbaa !1
>>>>>>> %cmp.i = fcmp olt float %max_value.0, %1
>>>>>>> %.sroa.speculate.load.true = load float,
>>>>>>> float* %arrayidx1, align 4, !tbaa !1
>>>>>>> %.sroa.speculated = select i1 %cmp.i, float
>>>>>>> %.sroa.speculate.load.true, float %max_value.0
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> llvm-dev at lists.llvm.org
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160322/0f9ac7d6/attachment-0001.html>
More information about the llvm-dev
mailing list