PATCH: Early if-converter store if-conversion

Thu Apr 25 10:20:49 PDT 2013

On Apr 25, 2013, at 11:45 AM, Andrew Trick <atrick at apple.com> wrote:

> 
> On Apr 25, 2013, at 7:45 AM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> 
>> 
>> On Apr 25, 2013, at 9:23 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>> 
>>> ----- Original Message -----
>>>> From: "Arnold" <aschwaighofer at apple.com>
>>>> To: "Andrew Trick" <atrick at apple.com>
>>>> Cc: llvm-commits at cs.uiuc.edu
>>>> Sent: Thursday, April 25, 2013 8:16:42 AM
>>>> Subject: Re: PATCH: Early if-converter store if-conversion
>>>> 
>>>> 
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>> On Apr 25, 2013, at 3:42 AM, Andrew Trick <atrick at apple.com> wrote:
>>>> 
>>>>> 
>>>>> On Apr 23, 2013, at 1:22 PM, Arnold Schwaighofer
>>>>> <aschwaighofer at apple.com> wrote:
>>>>> 
>>>>>> This patch performs a similar optimization than my previous
>>>>>> attempt did in SimplifyCFG.
>>>>>> 
>>>>>> This time we convert conditional stores in the early-if converter
>>>>>> where we can use trace metrics to gauge the “closeness” of the
>>>>>> two stores. If they are likely to be execute close together the
>>>>>> cost of the second store should be cheap compared to
>>>>>> mispredicting a branch.
>>>>>> 
>>>>>> This patch needs more tests and probably a target lowering call
>>>>>> whether the (sub)target wants to if convert stores. But I wanted
>>>>>> to get peoples opinions/input. Also it is not polished yet (so
>>>>>> feel free to ignore formatting and similar concerns).
>>>>>> 
>>>>>> I have (correctness) tested the patch on test-suite+external and
>>>>>> "-mllvm -x86-early-ifcvt=true -mllvm -stress-early-ifcvt=true”
>>>>>> and observed no correctness regressions.
>>>>>> 
>>>>>> Performance will have to be analysed (give that the previous patch
>>>>>> in simplify cfg did not cause too many regressions I don’t expect
>>>>>> many to crop up here - but you never know). Jakob said that there
>>>>>> are still some performance regressions with the early-if
>>>>>> converter (without my patch). So, I wanted to wait with
>>>>>> performance analysis until they are resolved.
>>>>>> 
>>>>>> I have however confirmed that we still get the 20% in hmmer ;)
>>>>>> 
>>>>>> 
>>>>>> Best,
>>>>>> Arnold
>>>>>> 
>>>>>> Attached are two patches: the first adds an api to targetinstrinfo
>>>>>> to query and modify “register to memory stores”. During
>>>>>> if-conversion we need to change the value register of such stores
>>>>>> and need to compare two stores’ memory locations. The second
>>>>>> implements the if-conversion in the early-if conversion pass.
>>>>>> 
>>>>>> <0001-TargetInstrInfo-Add-api-to-query-and-change-simple-r.patch><0002-Early-if-conversion-Implement-if-conversion-of-store.patch>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> The thread following my simplify cfg commit
>>>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130415/172206.html
>>>>>> has a discussion why we rather wanted to do this in simplify cfg.
>>>>> 
>>>>> It's sad that you needed new hooks specifically for this transform.
>>>>> 
>>>> 
>>>> I believe that is the cost of doing transforms in codegen were you
>>>> have target specific instructions and need to know their semantics.
>>>> At least that has been my experience in the past.
>>>> 
>>>> Maybe it is not necessary in this case. I believe it is.
>>>> 
>>>>> I thought you could use mayStore along with an option to
>>>>> isSafeToMove to allow stores. Then MachineMemOperand to determine
>>>>> aliasing. Then maybe check if the aliasing stores have a single
>>>>> vreg that differs and cmove on that?
>>>> 
>>>> 
>>>> You are assuming that this vreg has the semantics of the value to be
>>>> stored but it could mean something else. What about a theoretical
>>>> "inc mem reg" increase a value in memory by register
>>>> instruction.(without returning the value)
>> 
>> Would such an instruction have two machine memory operands (one for reading, one for writing)?
>> 
>> Anyway, I think we are in dangerous waters making assumptions about the semantics of an instructions by just looking at its operands, its may store property and having one vreg operand that is different from a preceding similar operation.
>> 
>> Also the memory operand does not qualify the set of machine instr operands that designate a memory location (on x86 it is 3 or so) I think. So we wouldn't know whether a differing vreg operand designates (part of the) memory address or the value being stored.
> 
> You're right. We can't assume a store is repeatable. I thought we could weed out weird cases by checking MachineMemOperand properties, but on second thought probably not.
> 
> We should have an MI abstraction that provides simple load/store semantics. It would be nice if that abstraction were minimal and reused across phases that care: if-convert, scheduler, load/store optimizer. It strikes me as odd that we need MachineMemOperands, getLdStBaseRegImmOfs, and haveRegToMemStoresSameLocation.

Are you thinking of a generic mi instruction like a PHI node that is lowered later. Or a more generic TTI api. I agree with you that we should avoid a proliferation of to specify apis.

Back to the patch in question:

The reason I did not rely on MachineMemOperands for comparing memory locations is that I was not sure whether they would be there reliably (they can be omitted safely) in the cases we care about and we would loose opportunity because of there non-presence. Maybe, this is a non-concern for the cases we care about …

I think, I can get rid of haveRegToMemStoresSameLocation api call and just use the MachineMemOperand if we say we consider it a performance bug if the MachineMemOperand got lost along the way hampering our desire to if-convert a store in question.

We can also get rid of changeRegToMemStore ("change the register of a register to memory instruction"). After making sure that the MachineMemOperand is designating the same location, we only need to make sure that there is only one vreg different in the instruction (we already checked that it is a register to memory move using “isRegToMemStore”).

So we would be left with just “isRegToMemoryStore”.

> This shouldn't hold up the patch! I just want to eventually evolve toward a simpler API.

I also think as we move more optimizations to the machine level we want to make sure to keep apis as generic a possible. I have seen what happens otherwise … and it is not nice. Thanks for pushing me on this.