[llvm-commits] [PATCH] Make dependency DAG construction to use AA

Tue May 8 13:16:04 PDT 2012

Sergei,

I've not yet tried the patch, but a few comments from looking though
the code:

> +/// This returns true if the two MIs may alias each other.
> +/// If these are not even memory operations, we still may need
> +/// chain deps between them. The question really is - could
> +/// these two MIs be reordered during scheduling from memory
> dependency +/// point of view.
> +static bool MIsNeedChainEdge(AliasAnalysis *AA, const
> MachineFrameInfo *MFI,
> +                             MachineInstr *MIa,
> +                             MachineInstr *MIb) {
> +
> +  if (!MIa || !MIb || (MIa == MIb))
> +    return false;

Based on the comment above, I'd think that MIa == MIb would return true
(they "alias" each other). I can obviously understand why you might want
to invert the degenerate case, but that should be specifically noted.

> +  // If we are dealing with two "normal" loads, we do not need an
> edge
> +  // between them - they could be reordered.
> +  if (MIa->mayLoad() && MIb->mayLoad())
> +    return false;

As far as I know, an instruction can both load and store (PPC actually
has several of these). Should this be?

  if (!MIa->mayStore() && !MIb->mayStore())
    return false;

> +  int64_t MinOffset = std::min(MMOa->getOffset(), MMOb->getOffset());
> +  int64_t Overlapa = MMOa->getSize() + MMOa->getOffset() - MinOffset;
> +  int64_t Overlapb = MMOb->getSize() + MMOb->getOffset() - MinOffset;
> +
> +  AliasAnalysis::AliasResult AAResult = AA->alias(
> +  AliasAnalysis::Location(MMOa->getValue(), Overlapa,
> +                          MMOa->getTBAAInfo()),
> +  AliasAnalysis::Location(MMOb->getValue(), Overlapb,
> +                          MMOb->getTBAAInfo()));
> +
> +  return (AAResult != AliasAnalysis::NoAlias);

Can you please comment on this. Does it work if one of the offsets is
negative? Are you assuming that the value pointers into any given
memory region will be unique?

FWIW, this is similar to a check done in the PPC 970 hazard recognizer
in lib/Target/PowerPC/PPCHazardRecognizers.cpp:107. There, however, a
false negative is not fatal ;)

> +/// This function assumes that "downward" from SUb there exist
> +/// tail/leaf of already constructed DAG. It iterates downward and
> +/// checks whether SUa can be aliasing any node dominated
> +/// by SUb.
> +static void adjustChainDeps(AliasAnalysis *AA, const
> MachineFrameInfo *MFI,
> +            SUnit *SUa, SUnit *ExitSU, std::set<SUnit *> &CheckList)
> {

This function does not take an SUb, please update the comment.

This looks promising. Thanks again for working on this!

 -Hal

On Mon, 7 May 2012 13:31:33 -0500
"Sergei Larin" <slarin at codeaurora.org> wrote:

> Traditional ping :) 
> Thanks.
> 
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
> 
> 
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu
> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Sergei Larin
> Sent: Monday, April 30, 2012 1:35 PM
> To: 'Andrew Trick'; 'Hal Finkel'; 'Evan Cheng'
> Cc: llvm-commits at cs.uiuc.edu
> Subject: [llvm-commits] [PATCH] Make dependency DAG construction to
> use AA
> 
> Hello everyone, 
> 
>    Presented is a non-heroic optimization intended to introduce alias
> analysis usage into dependency graph construction used for MI
> scheduling (pre- post- RA, bundling etc.).
> Currently any memory objects are conservatively considered dependent
> on each other (aliasing) and as such are serialized via chain
> dependencies. Attached patch is an attempt to introduce alias
> analysis into the process to eliminate chain dependencies known to be
> false. 
> 
>   Some design assumptions:
> -	AA is treated as a "black box" to provide clean and
> minimalistic interface.
> -	Only likely cases are checked. Certain objects like calls or
> volatiles are treated conservatively in the original algorithm, and
> could not be reasoned about easily. Proposed algorithm extension
> avoids checking against such objects. This saves compile time without
> any (detectable) loss of performance.
> -	Theoretically, the whole task of chain edge detection could
> become quadratic to the number of memory objects in a basic block
> with this patch, but in practice it never does. It is further kept in
> check by max depth search budget that prevents even theoretical
> possibility of any corner cases. During preliminary review it has
> been suggested that worklist could perform better, but I was trying
> to do all processing in place and with minimal reiteration, so I will
> leave this point up to collective reviewer opinion.
> 
>   The feature is currently under a general flag, off by default. It
> is up to individual back-end maintainers whether it is desirable
> feature for them. One side effect of this patch is a massive release
> of ILP in some cases, which on occasion could clogs current
> schedulers, and can increase reg pressure on some targets. My quick
> test of test-suite on x86 showed fractional _decrease_ of average
> compile time (0.5%). There is also a spectrum of performance
> differences. Different targets might see different results.
> 
>   Test cases for instruction scheduling are notoriously tricky, so I
> took a somewhat direct shortcut. I am not sure how it would be
> accepted by the community, but as I see it, this is the most reliable
> way to verify that check actually takes place. The test is written
> for the Hexagon back end, and since Hexagon is sort of in flux right
> now, I disable it for the time being. Obviously regular check-all
> passes (with and without optimization enabled) on X86 and Hexagon.
> 
>   As always, comments are very welcome, and thank you for your time
> and effort.
> 
> Sergei Larin
> 
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory