[LLVMdev] Improving loop vectorizer support for loops with a volatile iteration variable

Wed Jul 15 18:09:12 PDT 2015

On Wed, Jul 15, 2015 at 5:34 PM, Chandler Carruth <chandlerc at google.com>
wrote:

> On Wed, Jul 15, 2015 at 12:55 PM Hyojin Sung <hsung at us.ibm.com> wrote:
>
>> Hi all,
>>
>> I would like to propose an improvement of the “almost dead” block
>> elimination in Transforms/Utils/Local.cpp so that it will preserve the
>> canonical loop form for loops with a volatile iteration variable.
>>
>> *** Problem statement
>> Nested loops in LCALS Subset B (https://codesign.llnl.gov/LCALS.php)
>> are not vectorized with LLVM -O3 because the LLVM loop vectorizer fails the
>> test that checks whether the loop latch and the exiting block of a loop are
>> the same. The loops are vectorizable, and get vectorized with LLVM -O2
>>
> I would be interested to know why -O2 succeeds here.
>
>
>> and also with other commercial compilers (icc, xlc).
>>
>> *** Details
>> These loops ended up with a loop latch that differs from the exiting block
>> after a series of optimizations including loop unswitching, jump threading,
>> simplify-the-CFG, and loop simplify. The fundamental problem here is that
>> the above optimizations cannot recognize a loop with a volatile iteration
>> variable and do not preserve its canonical loop structure.
>>
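To make the "latch and exiting block are the same" condition concrete, here is a minimal sketch on a toy CFG. This is not LLVM code: `ToyCFG`, `ToyLoop`, and the helper names are made up for illustration, and blocks are plain integer indices. It only shows the shape of the test described above.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model (assumption, not LLVM's data structures):
// cfg[b] lists the successor blocks of block b.
using ToyCFG = std::vector<std::vector<int>>;

struct ToyLoop {
    int header;            // block that every back edge targets
    std::set<int> blocks;  // all blocks of the loop, header included
};

const int kNone = -1;  // "no unique block found"

// The single in-loop block with an edge back to the header, or kNone
// when there is no such block or more than one.
int uniqueLatch(const ToyCFG& cfg, const ToyLoop& loop) {
    int latch = kNone;
    for (int b : loop.blocks) {
        for (int s : cfg[b]) {
            if (s != loop.header) continue;
            if (latch != kNone && latch != b) return kNone;
            latch = b;
        }
    }
    return latch;
}

// The single in-loop block with an edge leaving the loop, or kNone
// when there is no such block or more than one.
int uniqueExiting(const ToyCFG& cfg, const ToyLoop& loop) {
    int exiting = kNone;
    for (int b : loop.blocks) {
        for (int s : cfg[b]) {
            if (loop.blocks.count(s) != 0) continue;  // edge stays inside
            if (exiting != kNone && exiting != b) return kNone;
            exiting = b;
        }
    }
    return exiting;
}

// True only when the loop has one latch, one exiting block, and they
// are the same block.
bool latchIsExitingBlock(const ToyCFG& cfg, const ToyLoop& loop) {
    assert(loop.blocks.count(loop.header) == 1 && "header must be in the loop");
    const int latch = uniqueLatch(cfg, loop);
    const int exiting = uniqueExiting(cfg, loop);
    return latch != kNone && latch == exiting;
}
```

In this model, a loop whose bottom block both branches back to the header and leaves the loop passes the check. A loop whose header holds the exit branch while a separate block jumps back unconditionally does not.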
> Ok, meta-level question first:
>
> Why do we care about performance of loops with a volatile iteration
> variable? That seems both counter-intuitive and unlikely to be a useful
> goal. We simply don't optimize volatile operations well in *any* part of
> the optimizer, and I'm not sure why we need to start trying to fix that.
> This seems like an irreparably broken benchmark, but perhaps there is a
> motivation I don't yet see.
>

A quick look at the tarball on the linked site suggests that the volatile
iteration variable is used on purpose so that the outer "run thing N times
and take the average" loop can't be optimized away.
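For readers who have not seen the pattern, a rough sketch of what such a loop nest tends to look like follows. This is not taken from the LCALS sources; `runKernel`, the array names, and the arithmetic in the body are all hypothetical. The point is only that the outer repetition counter is volatile, so every read and write of it has to be performed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical benchmark kernel (not LCALS code). Returns how many
// times the inner loop body ran.
long runKernel(std::vector<double>& out, const std::vector<double>& in,
               int numSamples) {
    assert(out.size() == in.size());
    long innerIterations = 0;
    // Volatile repetition counter: each access to `sample` must really
    // happen, so the compiler cannot drop or collapse the repetitions
    // even though every pass writes the same values.
    for (volatile int sample = 0; sample < numSamples; sample = sample + 1) {
        // The inner loop is the part one would like to see vectorized.
        for (std::size_t i = 0; i < out.size(); ++i) {
            out[i] = 2.0 * in[i] + 1.0;
            ++innerIterations;
        }
    }
    return innerIterations;
}
```

The inner loop itself has an ordinary induction variable. Only the surrounding repetition loop is pinned down by the volatile accesses.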

-- Sean Silva

>
>
> Assuming that sufficient motivation arises to try to fix this, see my
> comments below:
>
>
>>
>>
>> (1) Loop unswitching generates several placeholder BBs that contain only
>> PHI nodes after separating out a shorter path with no inner loop execution
>> from a standard path.
>>
>> (2) The jump threading and simplify-the-CFG passes independently call
>> TryToSimplifyUncondBranchFromEmptyBlock() in
>> Transforms/Utils/Local.cpp to get rid of almost empty BBs.
>>
>> (3) TryToSimplifyUncondBranchFromEmptyBlock() eliminates the
>> placeholder BBs after loop unswitching and merges them into subsequent
>> blocks including the header of the inner loop. Before eliminating the
>> blocks, the function checks if the block is a loop header by looking at its
>> PHI nodes so that it can be saved, but the test fails for loops with a
>> volatile iteration variable.
>>
> Why does this fail for a volatile iteration variable but not for a
> non-volatile one? I think understanding that will be key to understanding
> how it should be fixed.
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>