[LLVMdev] Separating loop nests based on profile information?
Philip Reames
listmail at philipreames.com
Thu Jan 15 17:07:22 PST 2015
On 01/15/2015 09:59 AM, Adam Nemet wrote:
>
>> On Jan 8, 2015, at 12:37 PM, Philip Reames <listmail at philipreames.com
>> <mailto:listmail at philipreames.com>> wrote:
>>
>>
>> On 01/07/2015 06:42 PM, Adam Nemet wrote:
>>>
>>>>
>>>> How does this compare with classical approaches of loop peeling,
>>>> partitioning, fission, or whatever you might call it? Is there any
>>>> literature behind this approach or some literature it should be
>>>> compared with? (I genuinely don't know this area that well, so I'm
>>>> of little help here…)
>>>
>>> I think it’s pretty different from loop distribution/fission. In
>>> loop distribution, in order to split the loop into multiple
>>> consecutive loops, you need to reason about the memory references so
>>> having a function call makes that difficult/impossible. Phillip’s
>>> idea works around this exact issue.
>>>
>>> I explored this space quite a bit recently because we’re working on
>>> a proposal to add loop distribution to LLVM. I hope to send out the
>>> proposal this week.
>> Just to note, I'm going to be very interested in seeing this. I have
>> concerns about profitability, but having a mechanism is clearly a
>> good thing. :)
>> Are you also looking at loop fusion? I've seen that come up in
>> practice for a few cases where it would *really* help. Generally,
>> filter/reduce patterns implemented by hand. Nowhere near the top of
>> my priority list, but if you happened to be working on it, I'm happy
>> to help review and brainstorm.
>
> Yes, we’re planning to look at fusion next. Thanks for all your
> feedback so far!
>
> As to your example, do you mean chain of operations
> functional-programming style? E.g. with OO syntax:
>
> list.map(func1).map(func2).filter(func3).reduce(func4), where these
> could be squashed into one loop.
That's exactly what I was thinking of.
>
> Thanks,
> Adam
>
>>> So I see no issue with trying to handle loops with low-probablity
>>> function calls with this and fully self-contained loops with
>>> classical loop distribution.
>>>
>>> Thanks,
>>> Adam
>>>
>>>> Some of your points I have quick feedback on:
>>>>
>>>> Points for discussion:
>>>>
>>>> * Is using profile information for this purpose even a
>>>> reasonable thing to do?
>>>>
>>>> Yes!
>>>>
>>>> * I chose to implement this without relying on the existing
>>>> block frequency analysis. My reasoning was that a) this is
>>>> a rarely taken case and adding an expensive analysis
>>>> dependency probably wasn't worthwhile and b) that block
>>>> frequency analysis was more expensive/precise than I really
>>>> needed. Is this reasonable?
>>>>
>>>> I think we should always use the analyses. Either BlockFrequency or
>>>> BranchProbability. I think probably both in the common joint usage
>>>> (frequency of the loop header combined with probability of the cold
>>>> region).
>>>>
>>>> * If so, is the notion of 'rareness' of a loop block
>>>> something that's worth extracting out on it's own and
>>>> reusing? Are there other similar uses anyone can think of?
>>>> * Currently, I'm only supporting a fairly small set of
>>>> controlling conditions. Are there important cases I'm not
>>>> considering?
>>>>
>>>> To both of these, I think the general combination to use is to
>>>> identify the set of blocks dominated by a block which is in the
>>>> loop body of a hot loop, and is cold relative to the other
>>>> successors of its predecessor(s). These form cold "regions" as I
>>>> think of them without requiring the complexity of the region analysis.
>>>>
>>>> * Since the rarest latch is often deep in a loop - with other
>>>> "if (X) continue;" (i.e. latches) before it - this tends to
>>>> create loops with multiple exiting blocks. Some of the
>>>> existing passes might not deal with this well, is that a
>>>> major concern? Suggestions for how to analysis and validate?
>>>>
>>>> I'm somewhat concerned about this, but want to think more about the
>>>> fundamental transformation.
>>>>
>>>> * Currently, I've structured this as pulling off the rarest
>>>> latch as an outer iteration. I could also pull off the most
>>>> popular latch as an inner iteration. This might give
>>>> different tradeoffs; thoughts?
>>>>
>>>> Generally, any thoughts anyone have on the problem or approach
>>>> are welcome. I'm not particular attached to the particular
>>>> approach laid out here and if there's a more advantageous
>>>> approach, all the better.
>>>>
>>>> Thanks for pushing on this! ;] Now I need to go and ponder a lot so
>>>> i can reply more deeply on the actual transform.
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu
>>>> <mailto:LLVMdev at cs.uiuc.edu>http://llvm.cs.uiuc.edu
>>>> <http://llvm.cs.uiuc.edu/>
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150115/a72a6082/attachment.html>
More information about the llvm-dev
mailing list