[llvm] r217371 - [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.

Hal Finkel hfinkel at anl.gov
Thu Oct 9 20:52:44 PDT 2014


----- Original Message -----
> From: "Sanjin Sijaric" <ssijaric at codeaurora.org>
> To: "Andrew Trick" <atrick at apple.com>, "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm commits" <llvm-commits at cs.uiuc.edu>, "Matt Arsenault" <arsenm2 at gmail.com>, mcrosier at codeaurora.org
> Sent: Wednesday, October 1, 2014 7:46:42 PM
> Subject: RE: [llvm] r217371 - [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
> 
> Hi Andy,
> 
> Thanks for clarifying. I'll add a check to do this only when dealing
> with virtual registers. The reason some chain edges are not being
> discarded by the BasicAA query inside MIsNeedChainEdge is that the
> check for isIdentifiedObject (called by isUnsafeMemoryObject) is done
> before the AA part has a chance to determine that there is no
> aliasing.

Interesting. Can BasicAA resolve the aliasing if you remove that check? It is actually not clear to me that it is necessary.

So if I run opt -basicaa -aa-eval -print-all-alias-modref-info -disable-output on test/CodeGen/AArch64/arm64-triv-disjoint-mem-access.ll, I get:
Function: func: 5 pointers, 0 call sites
  MayAlias:	i32* %0, i32** @a
  MayAlias:	i32* %0, i32* %arrayidx
  MayAlias:	i32* %arrayidx, i32** @a
  MayAlias:	i32* %0, i32* %arrayidx1
  MayAlias:	i32* %arrayidx1, i32** @a
  MayAlias:	i32* %arrayidx, i32* %arrayidx1
  MayAlias:	i32* %0, i32* @m
  NoAlias:	i32* @m, i32** @a
  MayAlias:	i32* %arrayidx, i32* @m
  MayAlias:	i32* %arrayidx1, i32* @m

I'm not sure which edge is being broken here by the new code (it would be great if you could note that in the test case), but I just don't think BasicAA will really help you here at all (unless it is %0 = load i32** @a, align 8, !tbaa !1 vs. store i32 %add, i32* @m, align 4, !tbaa !5, but I'd find that confusing).

Andy, I was wondering: might there be a way to provide the callback with enough information about the base-register dependencies when we have physical registers, so that it could still be truly useful post-RA as well (while still providing independently correct answers)?
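
To make that concrete, here is a rough sketch of the kind of interface I have in mind (purely hypothetical; the flag name and signature are mine, not anything that exists today). The DAG builder already tracks register definitions as it scans the block, so it could tell the target whether either base register is redefined between the two instructions:

  // Hypothetical extension of the hook from r217371: the extra flag would
  // let a target compare physical base registers safely post-RA.
  virtual bool
  areMemAccessesTriviallyDisjoint(MachineInstr *MIa, MachineInstr *MIb,
                                  bool BaseRegsUnmodifiedBetween,
                                  AliasAnalysis *AA = nullptr) const;

With something like that, the AArch64 implementation could keep working after register allocation instead of having to bail out on physical base registers entirely.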

Thanks again,
Hal

> 
> Thanks,
> Sanjin
> 
> From: Andrew Trick [mailto:atrick at apple.com]
> Sent: Tuesday, September 30, 2014 7:44 PM
> To: Hal Finkel
> Cc: llvm commits; Matt Arsenault; mcrosier at codeaurora.org; Sanjin Sijaric
> Subject: Re: [llvm] r217371 - [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
> 
> On Sep 30, 2014, at 3:36 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> Andy,
> 
> Can you take a quick look at this? I think that what Sanjin says
> makes sense; do you agree? Is there a way to add an assert in case
> something goes wrong?
> 
> Sanjin, if nothing else, please add a comment in the code to this
> effect.
> 
> 
> Thanks, Hal; sometimes I need to be copied on commits to notice them.
> 
> Is BasicAA unable to pick up on this because it can't answer queries
> about aliasing within a single loop iteration? It seems like long
> term that may be something to revisit. There's no fundamental reason
> we need to decode machine instructions to figure this case out.
> 
> But given that's a pretty major design issue, I'm OK with checking
> disjoint access at the MI level.
> 
> As you said, areMemAccessesTriviallyDisjoint is lying if it reports
> true for physical registers. I don't see the harm in explicitly
> checking the register type. Otherwise, there does need to be a very
> bold comment in the function header.
> 
> -Andy
> 
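For the check Andy suggests, the guard could be as simple as this sketch (mine, not part of the patch; it assumes the base registers have already been extracted by the getLdStBaseRegImmOfsWidth helper in the patch below):

  // Bail out on physical base registers: their contents may be redefined
  // between the two instructions, so equal register numbers do not imply
  // equal addresses.
  if (TargetRegisterInfo::isPhysicalRegister(BaseRegA) ||
      TargetRegisterInfo::isPhysicalRegister(BaseRegB))
    return false;

placed just before the BaseRegA == BaseRegB comparison in the AArch64 implementation.
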
> Thanks again,
> Hal
> 
> ----- Original Message -----
> From: "Sanjin Sijaric" <ssijaric at codeaurora.org>
> To: mcrosier at codeaurora.org
> Cc: "Hal Finkel" <hfinkel at anl.gov>, llvm-commits at cs.uiuc.edu, "Matt Arsenault" <arsenm2 at gmail.com>
> Sent: Monday, September 29, 2014 5:22:47 PM
> Subject: RE: [llvm] r217371 - [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
> Hi Hal,
> 
> The original purpose of the function was to discard trivial chain
> edges when constructing the dependence graph for scheduling. If
> BaseReg is a physical register that gets modified, then we may have
> something like:
> 
>   (1) LD R1 = A(BaseReg, Offset1)
>   (2) BaseReg = ....
>   (3) ST R2, B(BaseReg, Offset2)
> 
> Given the inputs (1) and (3), areMemAccessesTriviallyDisjoint may
> determine that there is no need for an edge between them. There will
> be no direct dependence between (1) and (3), but there will be
> register dependences between (1) and (2) and between (2) and (3). I
> think this should be enough to prevent any bad scheduling from
> happening.
> 
> Thanks,
> Sanjin
> 
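(To spell out the register dependences Sanjin is describing: (2) defines BaseReg, so the DAG builder adds, roughly,

  (1) --anti--> (2)    ; (1) reads BaseReg before (2) redefines it
  (2) --data--> (3)    ; (3) reads the BaseReg defined by (2)

and the transitive chain (1) -> (2) -> (3) keeps the accesses ordered even without a memory edge. Per Andy's comment above, the remaining question is whether the hook's answer is independently correct on its own terms.)
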
> -----Original Message-----
> From: Chad Rosier [mailto:mcrosier at codeaurora.org]
> Sent: Monday, September 29, 2014 6:47 AM
> To: ssijaric at codeaurora.org
> Cc: Hal Finkel; Chad Rosier; llvm-commits at cs.uiuc.edu; Matt Arsenault
> Subject: Re: [llvm] r217371 - [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
> 
> +Sanjin
> 
> Adding Sanjin since this is his work.
> 
> Chad
> 
> ----- Original Message -----
> From: "Chad Rosier" <mcrosier at codeaurora.org>
> To: llvm-commits at cs.uiuc.edu
> Sent: Monday, September 8, 2014 9:43:48 AM
> Subject: [llvm] r217371 - [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
> 
> Author: mcrosier
> Date: Mon Sep 8 09:43:48 2014
> New Revision: 217371
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=217371&view=rev
> Log:
> [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
> 
> Patch by Sanjin Sijaric <ssijaric at codeaurora.org>!
> Phabricator Review: http://reviews.llvm.org/D5103
> 
> Added:
>     llvm/trunk/test/CodeGen/AArch64/arm64-triv-disjoint-mem-access.ll
> Modified:
>     llvm/trunk/include/llvm/Target/TargetInstrInfo.h
>     llvm/trunk/lib/CodeGen/ScheduleDAGInstrs.cpp
>     llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp
>     llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h
> 
> Modified: llvm/trunk/include/llvm/Target/TargetInstrInfo.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetInstrInfo.h?rev=217371&r1=217370&r2=217371&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/Target/TargetInstrInfo.h (original)
> +++ llvm/trunk/include/llvm/Target/TargetInstrInfo.h Mon Sep  8 09:43:48 2014
> @@ -1192,6 +1192,20 @@ public:
>      return nullptr;
>    }
> 
> +  // areMemAccessesTriviallyDisjoint - Sometimes, it is possible for the
> +  // target to tell, even without aliasing information, that two MIs access
> +  // different memory addresses. This function returns true if two MIs access
> +  // different memory addresses, and false otherwise.
> +  virtual bool
> +  areMemAccessesTriviallyDisjoint(MachineInstr *MIa, MachineInstr *MIb,
> +                                  AliasAnalysis *AA = nullptr) const {
> +    assert(MIa && (MIa->mayLoad() || MIa->mayStore()) &&
> +           "MIa must load from or modify a memory location");
> +    assert(MIb && (MIb->mayLoad() || MIb->mayStore()) &&
> +           "MIb must load from or modify a memory location");
> +    return false;
> +  }
> +
>  private:
>    int CallFrameSetupOpcode, CallFrameDestroyOpcode;
>  };
> 
> Modified: llvm/trunk/lib/CodeGen/ScheduleDAGInstrs.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/ScheduleDAGInstrs.cpp?rev=217371&r1=217370&r2=217371&view=diff
> ==============================================================================
> --- llvm/trunk/lib/CodeGen/ScheduleDAGInstrs.cpp (original)
> +++ llvm/trunk/lib/CodeGen/ScheduleDAGInstrs.cpp Mon Sep  8 09:43:48 2014
> @@ -511,9 +511,18 @@ static inline bool isUnsafeMemoryObject(
>  static bool MIsNeedChainEdge(AliasAnalysis *AA, const MachineFrameInfo *MFI,
>                               MachineInstr *MIa,
>                               MachineInstr *MIb) {
> +  const MachineFunction *MF = MIa->getParent()->getParent();
> +  const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();
> +
>    // Cover a trivial case - no edge is need to itself.
>    if (MIa == MIb)
>      return false;
> +
> +  // Let the target decide if memory accesses cannot possibly overlap.
> +  if ((MIa->mayLoad() || MIa->mayStore()) &&
> +      (MIb->mayLoad() || MIb->mayStore()))
> +    if (TII->areMemAccessesTriviallyDisjoint(MIa, MIb, AA))
> +      return false;
> 
>    // FIXME: Need to handle multiple memory operands to support all targets.
>    if (!MIa->hasOneMemOperand() || !MIb->hasOneMemOperand())
> 
> Modified: llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp?rev=217371&r1=217370&r2=217371&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp (original)
> +++ llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp Mon Sep  8 09:43:48 2014
> @@ -607,6 +607,42 @@ bool AArch64InstrInfo::isCoalescableExtI
>    }
>  }
> 
> +bool
> +AArch64InstrInfo::areMemAccessesTriviallyDisjoint(MachineInstr *MIa,
> +                                                  MachineInstr *MIb,
> +                                                  AliasAnalysis *AA) const {
> +  const TargetRegisterInfo *TRI = &getRegisterInfo();
> +  unsigned BaseRegA = 0, BaseRegB = 0;
> +  int OffsetA = 0, OffsetB = 0;
> +  int WidthA = 0, WidthB = 0;
> +
> +  assert(MIa && (MIa->mayLoad() || MIa->mayStore()) &&
> +         "MIa must be a store or a load");
> +  assert(MIb && (MIb->mayLoad() || MIb->mayStore()) &&
> +         "MIb must be a store or a load");
> +
> +  if (MIa->hasUnmodeledSideEffects() || MIb->hasUnmodeledSideEffects() ||
> +      MIa->hasOrderedMemoryRef() || MIb->hasOrderedMemoryRef())
> +    return false;
> +
> +  // Retrieve the base register, offset from the base register and width.
> +  // Width is the size of memory that is being loaded/stored (e.g. 1, 2, 4,
> +  // 8). If base registers are identical, and the offset of a lower memory
> +  // access + the width doesn't overlap the offset of a higher memory access,
> +  // then the memory accesses are different.
> +  if (getLdStBaseRegImmOfsWidth(MIa, BaseRegA, OffsetA, WidthA, TRI) &&
> +      getLdStBaseRegImmOfsWidth(MIb, BaseRegB, OffsetB, WidthB, TRI)) {
> +    if (BaseRegA == BaseRegB) {
> 
> I think that this makes sense only for SSA virtual registers, but not
> for physical ones (because the value in a physical register might have
> changed in between the two instructions). If I'm right, then you'll
> want to exclude physical registers here (at least any that might be
> modified during the function's execution).
> 
> -Hal
> 
> +      int LowOffset = OffsetA < OffsetB ? OffsetA : OffsetB;
> +      int HighOffset = OffsetA < OffsetB ? OffsetB : OffsetA;
> +      int LowWidth = (LowOffset == OffsetA) ? WidthA : WidthB;
> +      if (LowOffset + LowWidth <= HighOffset)
> +        return true;
> +    }
> +  }
> +  return false;
> +}
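
For reference, the check above is a half-open interval test. A standalone sketch of the same logic (mine, not part of the patch):

  #include <algorithm>
  #include <cassert>

  // Accesses [OffsetA, OffsetA + WidthA) and [OffsetB, OffsetB + WidthB)
  // off the same (unmodified) base register are disjoint iff the lower
  // access ends at or before the higher access begins.
  static bool triviallyDisjoint(int OffsetA, int WidthA,
                                int OffsetB, int WidthB) {
    int LowOffset = std::min(OffsetA, OffsetB);
    int HighOffset = std::max(OffsetA, OffsetB);
    int LowWidth = (LowOffset == OffsetA) ? WidthA : WidthB;
    return LowOffset + LowWidth <= HighOffset;
  }

  int main() {
    // From the test case below: ldr from a[1] (offset 4, width 4) vs.
    // str to a[2] (offset 8, width 4): 4 + 4 <= 8, so disjoint.
    assert(triviallyDisjoint(4, 4, 8, 4));
    // Overlapping pair: 4 + 8 > 8, so not provably disjoint.
    assert(!triviallyDisjoint(4, 8, 8, 4));
    return 0;
  }
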
> +
>  /// analyzeCompare - For a comparison instruction, return the source
>  /// registers in SrcReg and SrcReg2, and the value it compares against in
>  /// CmpValue. Return true if the comparison instruction can be analyzed.
> @@ -1270,6 +1306,102 @@ AArch64InstrInfo::getLdStBaseRegImmOfs(M
>    };
>  }
> 
> +bool AArch64InstrInfo::getLdStBaseRegImmOfsWidth(
> +    MachineInstr *LdSt, unsigned &BaseReg, int &Offset, int &Width,
> +    const TargetRegisterInfo *TRI) const {
> +  // Handle only loads/stores with base register followed by immediate offset.
> +  if (LdSt->getNumOperands() != 3)
> +    return false;
> +  if (!LdSt->getOperand(1).isReg() || !LdSt->getOperand(2).isImm())
> +    return false;
> +
> +  // Offset is calculated as the immediate operand multiplied by the scaling
> +  // factor. Unscaled instructions have scaling factor set to 1.
> +  int Scale = 0;
> +  switch (LdSt->getOpcode()) {
> +  default:
> +    return false;
> +  case AArch64::LDURQi:
> +  case AArch64::STURQi:
> +    Width = 16;
> +    Scale = 1;
> +    break;
> +  case AArch64::LDURXi:
> +  case AArch64::LDURDi:
> +  case AArch64::STURXi:
> +  case AArch64::STURDi:
> +    Width = 8;
> +    Scale = 1;
> +    break;
> +  case AArch64::LDURWi:
> +  case AArch64::LDURSi:
> +  case AArch64::LDURSWi:
> +  case AArch64::STURWi:
> +  case AArch64::STURSi:
> +    Width = 4;
> +    Scale = 1;
> +    break;
> +  case AArch64::LDURHi:
> +  case AArch64::LDURHHi:
> +  case AArch64::LDURSHXi:
> +  case AArch64::LDURSHWi:
> +  case AArch64::STURHi:
> +  case AArch64::STURHHi:
> +    Width = 2;
> +    Scale = 1;
> +    break;
> +  case AArch64::LDURBi:
> +  case AArch64::LDURBBi:
> +  case AArch64::LDURSBXi:
> +  case AArch64::LDURSBWi:
> +  case AArch64::STURBi:
> +  case AArch64::STURBBi:
> +    Width = 1;
> +    Scale = 1;
> +    break;
> +  case AArch64::LDRXui:
> +  case AArch64::STRXui:
> +    Scale = Width = 8;
> +    break;
> +  case AArch64::LDRWui:
> +  case AArch64::STRWui:
> +    Scale = Width = 4;
> +    break;
> +  case AArch64::LDRBui:
> +  case AArch64::STRBui:
> +    Scale = Width = 1;
> +    break;
> +  case AArch64::LDRHui:
> +  case AArch64::STRHui:
> +    Scale = Width = 2;
> +    break;
> +  case AArch64::LDRSui:
> +  case AArch64::STRSui:
> +    Scale = Width = 4;
> +    break;
> +  case AArch64::LDRDui:
> +  case AArch64::STRDui:
> +    Scale = Width = 8;
> +    break;
> +  case AArch64::LDRQui:
> +  case AArch64::STRQui:
> +    Scale = Width = 16;
> +    break;
> +  case AArch64::LDRBBui:
> +  case AArch64::STRBBui:
> +    Scale = Width = 1;
> +    break;
> +  case AArch64::LDRHHui:
> +  case AArch64::STRHHui:
> +    Scale = Width = 2;
> +    break;
> +  };
> +
> +  BaseReg = LdSt->getOperand(1).getReg();
> +  Offset = LdSt->getOperand(2).getImm() * Scale;
> +  return true;
> +}
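
(A quick worked example of the scaling: LDRWui is scaled, Scale = Width = 4, so an immediate of 2 yields Offset = 2 * 4 = 8 bytes; the unscaled LDURWi has Scale = 1, Width = 4, so it needs an immediate of 8 to address the same byte offset. Both encodings therefore normalize to the same Offset before the disjointness comparison above.)
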
> +
>  /// Detect opportunities for ldp/stp formation.
>  ///
>  /// Only called for LdSt for which getLdStBaseRegImmOfs returns true.
> 
> Modified: llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h?rev=217371&r1=217370&r2=217371&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h (original)
> +++ llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h Mon Sep  8 09:43:48 2014
> @@ -52,6 +52,10 @@ public:
>    bool isCoalescableExtInstr(const MachineInstr &MI, unsigned &SrcReg,
>                               unsigned &DstReg, unsigned &SubIdx) const override;
> 
> +  bool
> +  areMemAccessesTriviallyDisjoint(MachineInstr *MIa, MachineInstr *MIb,
> +                                  AliasAnalysis *AA = nullptr) const override;
> +
>    unsigned isLoadFromStackSlot(const MachineInstr *MI,
>                                 int &FrameIndex) const override;
>    unsigned isStoreToStackSlot(const MachineInstr *MI,
> @@ -90,6 +94,10 @@ public:
>                                unsigned &Offset,
>                                const TargetRegisterInfo *TRI) const override;
> 
> +  bool getLdStBaseRegImmOfsWidth(MachineInstr *LdSt, unsigned &BaseReg,
> +                                 int &Offset, int &Width,
> +                                 const TargetRegisterInfo *TRI) const;
> +
>    bool enableClusterLoads() const override { return true; }
> 
>    bool shouldClusterLoads(MachineInstr *FirstLdSt, MachineInstr *SecondLdSt,
> 
> Added: llvm/trunk/test/CodeGen/AArch64/arm64-triv-disjoint-mem-access.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-triv-disjoint-mem-access.ll?rev=217371&view=auto
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AArch64/arm64-triv-disjoint-mem-access.ll (added)
> +++ llvm/trunk/test/CodeGen/AArch64/arm64-triv-disjoint-mem-access.ll Mon Sep  8 09:43:48 2014
> @@ -0,0 +1,31 @@
> +; RUN: llc < %s -mtriple=arm64-linux-gnu -mcpu=cortex-a53 -enable-aa-sched-mi | FileCheck %s
> +; Check that the scheduler moves the load from a[1] past the store into a[2].
> +@a = common global i32* null, align 8
> +@m = common global i32 0, align 4
> +
> +; Function Attrs: nounwind
> +define i32 @func(i32 %i, i32 %j, i32 %k) #0 {
> +entry:
> +; CHECK: ldr {{w[0-9]+}}, [x[[REG:[0-9]+]], #4]
> +; CHECK: str {{w[0-9]+}}, [x[[REG]], #8]
> +  %0 = load i32** @a, align 8, !tbaa !1
> +  %arrayidx = getelementptr inbounds i32* %0, i64 2
> +  store i32 %i, i32* %arrayidx, align 4, !tbaa !5
> +  %arrayidx1 = getelementptr inbounds i32* %0, i64 1
> +  %1 = load i32* %arrayidx1, align 4, !tbaa !5
> +  %add = add nsw i32 %k, %i
> +  store i32 %add, i32* @m, align 4, !tbaa !5
> +  ret i32 %1
> +}
> +
> +attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "stack-protector-buffer-size"="8" "unsafe-fp-math"="true" "use-soft-float"="false" }
> +
> +!llvm.ident = !{!0}
> +
> +!0 = metadata !{metadata !"clang version 3.6.0 "}
> +!1 = metadata !{metadata !2, metadata !2, i64 0}
> +!2 = metadata !{metadata !"any pointer", metadata !3, i64 0}
> +!3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0}
> +!4 = metadata !{metadata !"Simple C/C++ TBAA"}
> +!5 = metadata !{metadata !6, metadata !6, i64 0}
> +!6 = metadata !{metadata !"int", metadata !3, i64 0}
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory



