[LLVMdev] LICM/store-aliasing of global loads

Tue Jul 22 18:49:33 PDT 2008

On Jul 22, 2008, at 2:51 PM, Stefanus Du Toit wrote:

> On 22-Jul-08, at 1:22 PM, Dan Gohman wrote:
>>> If I apply the patch found further down, LICM moves the loads out  
>>> (as
>>> expected), but of course this is a fairly specific fix.
>>
>> Slightly better than checking for GlobalVariabls directly
>> is to call the AliasAnalysis' pointsToConstantMemory method.
>> BasicAliasAnalysis' implementation of that does exactly the same
>> thing,
>> checking for constant GlobalVariables, but it would allow alias
>> analyses to do more sophisticated things. Could you submit a patch
>> for this?
>
> Ah, I was going to ask if there was such a method.
>
> I'll submit a patch once I'm done testing it.

Thanks!

>
>
>>> What's the right way to handle this? Should Basic AA handle this
>>> case?
>>> Will the RA be aware that it can remat these loads or do I need to  
>>> do
>>> something else to allow it to know this? Will the scheduler be aware
>>> that it can reorder them?
>>
>> It would be nice to have an AA that's smart enough to do things
>> like this. However for now, having code use
>> AliasAnalysis::pointsToConstantMemory should cover many of the
>> obvious cases.
>
> OK, are there any other obvious places this should go? I expect
> optimizations that use getModRefInfo on stores rather than alias sets
> won't need any changes.

At a glance, it looks like DeadStoreElimination and
MemoryDependenceAnalysis could also use this.

>
>
>>> Obviously I can also move the loads to the entry block of the
>>> function, but that does not address the RA/scheduling issues and is
>>> difficult to do in general due to some additional semantics in our
>>> frontend.
>>
>> In the scheduling department, LLVM is not yet using any alias
>> information. You can experiment with the -combiner-alias-analysis and
>> -combiner-global-alias-analysis options, which use AliasAnalysis
>> queries and do a pretty good job, but aren't very efficient and not
>> very widely tested. Ideally we'd like to do something better here.
>
> Could you expand on this a bit (or point me to past discussions/...)?

The codegen dependency graphs and the scheduler are capable of
working with precise memory dependencies, but right now they're
being given conservative dependencies. Take a look at the
-view-sunit-dags graphs on code with lots of simple loads and
stores, and you'll often see tall chains of dashed blue lines.
That's what we'd like to avoid :-).

This can be fixed by either teaching SelectionDAGISel.cpp how
to use AliasAnalysis information to compute better dependencies
when building the graph, or by teaching DAGCombiner.cpp how to
use AliasAnalysis information to optimize the dependencies.
The options I mentioned above take the latter approach, though
they probably have room for improvement.

One other factor here is that some of the transformations that
LoopStrengthReduction and CodeGenPrepare do lower getelementptrs to
ptrtoint+arithmetic+inttoptr, and this interferes with
BasicAliasAnalysis. There's been some discussion of this on
llvmdev, but I don't have a link handy.

> This can be pretty important on architectures like Cell SPU.

I agree that it's an important capability. We're gradually working
on adding the requisite pieces, and we welcome help :-).

>
>
>> For register allocation, LLVM currently has some simple hooks which
>> individual targets use to specify which loads are rematerializable.
>> See isReallyTriviallyReMaterializable. Currently this code is all
>> target-specific and doesn't use AliasAnalysis information, but I
>> think it could be reasonably generalized to use the new
>> MachineMemOperand information to be less target-dependent and to
>> make at least AliasAnalysis::pointsToConstantMemory queries.
>
> OK, I will have a look. I assume the reference to M_REMATERIALIZABLE
> in the comment for it should really be TID::Rematerializable? I also
> noticed that the documentation for
> TargetInstrDesc::isRematerializable() says "This flag is deprecated,
> please don't use it anymore" -- could you explain what replaces it?

I think the idea is to move towards having more complete instruction
descriptions so that the rematerializability of an instruction can be
inferred instead of being an explicit flag, but I don't know what
stage this transition is in.

Dan