[llvm] r217371 - [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.

Andrew Trick atrick at apple.com
Tue Sep 30 19:44:20 PDT 2014


> On Sep 30, 2014, at 3:36 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> Andy,
> 
> Can you take a quick look at this? I think that what Sanjin says makes sense; do you agree? Is there a way to add an assert in case something goes wrong?
> 
> Sanjin, if nothing else, please add a comment in the code to this effect.

Thanks Hal, sometimes I need to be copied on commits to notice them.

Is BasicAA unable to pick up on this because it can't answer queries about aliasing within a single loop iteration? That seems like something to revisit long term: there's no fundamental reason we need to decode machine instructions to figure this case out.

But given that's a pretty major design issue, I'm OK with checking disjoint access at the MI level.

As you said, areMemAccessesTriviallyDisjoint is lying if it reports true for physical base registers, since their contents can change between the two accesses. I don't see the harm in explicitly checking the register type. Otherwise, there needs to be a very bold comment in the function header.
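To make the point concrete, here is a minimal, self-contained sketch of the kind of check being discussed. The names and the MemAccess struct are hypothetical stand-ins, not the actual LLVM MachineInstr/TargetRegisterInfo API: it only reports "trivially disjoint" when both bases are SSA virtual registers (so the base value cannot change between the accesses) and the [Offset, Offset + Width) byte intervals do not overlap.

```cpp
#include <algorithm>

// Hypothetical, simplified stand-in for a machine memory access: a base
// register (tagged virtual vs. physical), a byte offset from that base,
// and the access width in bytes.
struct MemAccess {
  unsigned BaseReg;
  bool BaseIsVirtual;
  int Offset;
  int Width;
};

// Returns true only when the two accesses provably cannot overlap.
bool areTriviallyDisjoint(const MemAccess &A, const MemAccess &B) {
  // Guard against physical base registers: the same register number does
  // not imply the same address if the register may be redefined between
  // the two instructions.
  if (!A.BaseIsVirtual || !B.BaseIsVirtual)
    return false;
  if (A.BaseReg != B.BaseReg)
    return false;
  // Same base: disjoint iff the lower access ends at or before the
  // higher access begins.
  int LowOffset = std::min(A.Offset, B.Offset);
  int HighOffset = std::max(A.Offset, B.Offset);
  int LowWidth = (LowOffset == A.Offset) ? A.Width : B.Width;
  return LowOffset + LowWidth <= HighOffset;
}
```

With the guard in place, two word accesses at offsets 0 and 4 off the same virtual base are disjoint, while the same pair off a physical base is conservatively reported as possibly aliasing.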

-Andy
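Sanjin's argument in the quoted message below, that the register dependences (1)->(2) and (2)->(3) transitively pin the ordering even when the direct (1)->(3) chain edge is dropped, can be sketched with a toy graph (purely illustrative; this is not the ScheduleDAG API):

```cpp
#include <vector>

// Toy dependence graph: node 0 = the load (1), node 1 = the BaseReg
// redefinition (2), node 2 = the store (3).  An edge u -> v means
// "u must be scheduled before v".
using Graph = std::vector<std::vector<int>>;

// True if From is ordered before To by following edges transitively.
bool mustPrecede(const Graph &G, int From, int To) {
  for (int Succ : G[From])
    if (Succ == To || mustPrecede(G, Succ, To))
      return true;
  return false;
}
```

With only the register-dependence edges {0->1, 1->2}, mustPrecede still orders node 0 before node 2, which is why omitting the memory chain edge between (1) and (3) cannot by itself produce a bad schedule.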

> 
> Thanks again,
> Hal
> 
> ----- Original Message -----
>> From: "Sanjin Sijaric" <ssijaric at codeaurora.org>
>> To: mcrosier at codeaurora.org
>> Cc: "Hal Finkel" <hfinkel at anl.gov>, llvm-commits at cs.uiuc.edu, "Matt Arsenault" <arsenm2 at gmail.com>
>> Sent: Monday, September 29, 2014 5:22:47 PM
>> Subject: RE: [llvm] r217371 - [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
>> 
>> Hi Hal,
>> 
>> The original purpose of the function was for discarding trivial chain
>> edges when constructing the dependence graph for scheduling.  If
>> BaseReg is a physical register that gets modified, then we may have
>> something like:
>> 
>> LD R1 = A(BaseReg, Offset1)    (1)
>> 
>> BaseReg = ....                 (2)
>> 
>> ST R2, B(BaseReg, Offset2)     (3)
>> 
>> Given the inputs (1) and (3) to areMemAccessesTriviallyDisjoint, it
>> may be determined that there is no need for an edge between them.
>> There will be no dependence between (1) and (3), but there will be
>> dependences between "(1) and (2)" and "(2) and (3)".  I think this
>> should be enough to prevent any bad scheduling from happening.
>> 
>> Thanks,
>> Sanjin
>> 
>> -----Original Message-----
>> From: Chad Rosier [mailto:mcrosier at codeaurora.org]
>> Sent: Monday, September 29, 2014 6:47 AM
>> To: ssijaric at codeaurora.org
>> Cc: Hal Finkel; Chad Rosier; llvm-commits at cs.uiuc.edu; Matt Arsenault
>> Subject: Re: [llvm] r217371 - [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
>> 
>> +Sanjin
>> 
>> Adding Sanjin since this is his work.
>> 
>> Chad
>> 
>>> ----- Original Message -----
>>>> From: "Chad Rosier" <mcrosier at codeaurora.org>
>>>> To: llvm-commits at cs.uiuc.edu
>>>> Sent: Monday, September 8, 2014 9:43:48 AM
>>>> Subject: [llvm] r217371 - [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
>>>> 
>>>> Author: mcrosier
>>>> Date: Mon Sep  8 09:43:48 2014
>>>> New Revision: 217371
>>>> 
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=217371&view=rev
>>>> Log:
>>>> [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
>>>> 
>>>> Patch by Sanjin Sijaric <ssijaric at codeaurora.org>!
>>>> Phabricator Review: http://reviews.llvm.org/D5103
>>>> 
>>>> Added:
>>>>    llvm/trunk/test/CodeGen/AArch64/arm64-triv-disjoint-mem-access.ll
>>>> Modified:
>>>>    llvm/trunk/include/llvm/Target/TargetInstrInfo.h
>>>>    llvm/trunk/lib/CodeGen/ScheduleDAGInstrs.cpp
>>>>    llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp
>>>>    llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h
>>>> 
>>>> Modified: llvm/trunk/include/llvm/Target/TargetInstrInfo.h
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetInstrInfo.h?rev=217371&r1=217370&r2=217371&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/include/llvm/Target/TargetInstrInfo.h (original)
>>>> +++ llvm/trunk/include/llvm/Target/TargetInstrInfo.h Mon Sep  8 09:43:48 2014
>>>> @@ -1192,6 +1192,20 @@ public:
>>>>     return nullptr;
>>>>   }
>>>> 
>>>> +  // areMemAccessesTriviallyDisjoint - Sometimes, it is possible for the
>>>> +  // target to tell, even without aliasing information, that two MIs
>>>> +  // access different memory addresses. This function returns true if
>>>> +  // two MIs access different memory addresses, and false otherwise.
>>>> +  virtual bool
>>>> +  areMemAccessesTriviallyDisjoint(MachineInstr *MIa, MachineInstr *MIb,
>>>> +                                  AliasAnalysis *AA = nullptr) const {
>>>> +    assert(MIa && (MIa->mayLoad() || MIa->mayStore()) &&
>>>> +           "MIa must load from or modify a memory location");
>>>> +    assert(MIb && (MIb->mayLoad() || MIb->mayStore()) &&
>>>> +           "MIb must load from or modify a memory location");
>>>> +    return false;
>>>> +  }
>>>> +
>>>> private:
>>>>   int CallFrameSetupOpcode, CallFrameDestroyOpcode;
>>>> };
>>>> 
>>>> Modified: llvm/trunk/lib/CodeGen/ScheduleDAGInstrs.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/ScheduleDAGInstrs.cpp?rev=217371&r1=217370&r2=217371&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/CodeGen/ScheduleDAGInstrs.cpp (original)
>>>> +++ llvm/trunk/lib/CodeGen/ScheduleDAGInstrs.cpp Mon Sep  8 09:43:48 2014
>>>> @@ -511,9 +511,18 @@ static inline bool isUnsafeMemoryObject(
>>>> static bool MIsNeedChainEdge(AliasAnalysis *AA, const MachineFrameInfo *MFI,
>>>>                              MachineInstr *MIa,
>>>>                              MachineInstr *MIb) {
>>>> +  const MachineFunction *MF = MIa->getParent()->getParent();
>>>> +  const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();
>>>> +
>>>>   // Cover a trivial case - no edge is need to itself.
>>>>   if (MIa == MIb)
>>>>     return false;
>>>> +
>>>> +  // Let the target decide if memory accesses cannot possibly overlap.
>>>> +  if ((MIa->mayLoad() || MIa->mayStore()) &&
>>>> +      (MIb->mayLoad() || MIb->mayStore()))
>>>> +    if (TII->areMemAccessesTriviallyDisjoint(MIa, MIb, AA))
>>>> +      return false;
>>>> 
>>>>   // FIXME: Need to handle multiple memory operands to support all targets.
>>>>   if (!MIa->hasOneMemOperand() || !MIb->hasOneMemOperand())
>>>> 
>>>> Modified: llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp?rev=217371&r1=217370&r2=217371&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp (original)
>>>> +++ llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp Mon Sep  8 09:43:48 2014
>>>> @@ -607,6 +607,42 @@ bool AArch64InstrInfo::isCoalescableExtI
>>>>   }
>>>> }
>>>> 
>>>> +bool
>>>> +AArch64InstrInfo::areMemAccessesTriviallyDisjoint(MachineInstr *MIa,
>>>> +                                                  MachineInstr *MIb,
>>>> +                                                  AliasAnalysis *AA) const {
>>>> +  const TargetRegisterInfo *TRI = &getRegisterInfo();
>>>> +  unsigned BaseRegA = 0, BaseRegB = 0;
>>>> +  int OffsetA = 0, OffsetB = 0;
>>>> +  int WidthA = 0, WidthB = 0;
>>>> +
>>>> +  assert(MIa && (MIa->mayLoad() || MIa->mayStore()) &&
>>>> +         "MIa must be a store or a load");
>>>> +  assert(MIb && (MIb->mayLoad() || MIb->mayStore()) &&
>>>> +         "MIb must be a store or a load");
>>>> +
>>>> +  if (MIa->hasUnmodeledSideEffects() || MIb->hasUnmodeledSideEffects() ||
>>>> +      MIa->hasOrderedMemoryRef() || MIb->hasOrderedMemoryRef())
>>>> +    return false;
>>>> +
>>>> +  // Retrieve the base register, offset from the base register, and
>>>> +  // width. Width is the size of memory that is being loaded/stored
>>>> +  // (e.g. 1, 2, 4, 8).  If the base registers are identical, and the
>>>> +  // offset of the lower memory access + its width doesn't overlap the
>>>> +  // offset of the higher memory access, then the memory accesses are
>>>> +  // different.
>>>> +  if (getLdStBaseRegImmOfsWidth(MIa, BaseRegA, OffsetA, WidthA, TRI) &&
>>>> +      getLdStBaseRegImmOfsWidth(MIb, BaseRegB, OffsetB, WidthB, TRI)) {
>>>> +    if (BaseRegA == BaseRegB) {
>>> 
>>> I think that this makes sense only for SSA virtual registers, but not
>>> for physical ones (because the value in a physical register might have
>>> changed in between the two instructions). If I'm right, then you'll
>>> want to exclude physical registers here (at least any that might be
>>> modified during the function's execution).
>>> 
>>> -Hal
>>> 
>>>> +      int LowOffset = OffsetA < OffsetB ? OffsetA : OffsetB;
>>>> +      int HighOffset = OffsetA < OffsetB ? OffsetB : OffsetA;
>>>> +      int LowWidth = (LowOffset == OffsetA) ? WidthA : WidthB;
>>>> +      if (LowOffset + LowWidth <= HighOffset)
>>>> +        return true;
>>>> +    }
>>>> +  }
>>>> +  return false;
>>>> +}
>>>> +
>>>> /// analyzeCompare - For a comparison instruction, return the source
>>>> /// registers in SrcReg and SrcReg2, and the value it compares against
>>>> /// in CmpValue.
>>>> /// Return true if the comparison instruction can be analyzed.
>>>> @@ -1270,6 +1306,102 @@ AArch64InstrInfo::getLdStBaseRegImmOfs(M
>>>>   };
>>>> }
>>>> 
>>>> +bool AArch64InstrInfo::getLdStBaseRegImmOfsWidth(
>>>> +    MachineInstr *LdSt, unsigned &BaseReg, int &Offset, int &Width,
>>>> +    const TargetRegisterInfo *TRI) const {
>>>> +  // Handle only loads/stores with a base register followed by an
>>>> +  // immediate offset.
>>>> +  if (LdSt->getNumOperands() != 3)
>>>> +    return false;
>>>> +  if (!LdSt->getOperand(1).isReg() || !LdSt->getOperand(2).isImm())
>>>> +    return false;
>>>> +
>>>> +  // Offset is calculated as the immediate operand multiplied by the
>>>> +  // scaling factor.  Unscaled instructions have scaling factor set to 1.
>>>> +  int Scale = 0;
>>>> +  switch (LdSt->getOpcode()) {
>>>> +  default:
>>>> +    return false;
>>>> +  case AArch64::LDURQi:
>>>> +  case AArch64::STURQi:
>>>> +    Width = 16;
>>>> +    Scale = 1;
>>>> +    break;
>>>> +  case AArch64::LDURXi:
>>>> +  case AArch64::LDURDi:
>>>> +  case AArch64::STURXi:
>>>> +  case AArch64::STURDi:
>>>> +    Width = 8;
>>>> +    Scale = 1;
>>>> +    break;
>>>> +  case AArch64::LDURWi:
>>>> +  case AArch64::LDURSi:
>>>> +  case AArch64::LDURSWi:
>>>> +  case AArch64::STURWi:
>>>> +  case AArch64::STURSi:
>>>> +    Width = 4;
>>>> +    Scale = 1;
>>>> +    break;
>>>> +  case AArch64::LDURHi:
>>>> +  case AArch64::LDURHHi:
>>>> +  case AArch64::LDURSHXi:
>>>> +  case AArch64::LDURSHWi:
>>>> +  case AArch64::STURHi:
>>>> +  case AArch64::STURHHi:
>>>> +    Width = 2;
>>>> +    Scale = 1;
>>>> +    break;
>>>> +  case AArch64::LDURBi:
>>>> +  case AArch64::LDURBBi:
>>>> +  case AArch64::LDURSBXi:
>>>> +  case AArch64::LDURSBWi:
>>>> +  case AArch64::STURBi:
>>>> +  case AArch64::STURBBi:
>>>> +    Width = 1;
>>>> +    Scale = 1;
>>>> +    break;
>>>> +  case AArch64::LDRXui:
>>>> +  case AArch64::STRXui:
>>>> +    Scale = Width = 8;
>>>> +    break;
>>>> +  case AArch64::LDRWui:
>>>> +  case AArch64::STRWui:
>>>> +    Scale = Width = 4;
>>>> +    break;
>>>> +  case AArch64::LDRBui:
>>>> +  case AArch64::STRBui:
>>>> +    Scale = Width = 1;
>>>> +    break;
>>>> +  case AArch64::LDRHui:
>>>> +  case AArch64::STRHui:
>>>> +    Scale = Width = 2;
>>>> +    break;
>>>> +  case AArch64::LDRSui:
>>>> +  case AArch64::STRSui:
>>>> +    Scale = Width = 4;
>>>> +    break;
>>>> +  case AArch64::LDRDui:
>>>> +  case AArch64::STRDui:
>>>> +    Scale = Width = 8;
>>>> +    break;
>>>> +  case AArch64::LDRQui:
>>>> +  case AArch64::STRQui:
>>>> +    Scale = Width = 16;
>>>> +    break;
>>>> +  case AArch64::LDRBBui:
>>>> +  case AArch64::STRBBui:
>>>> +    Scale = Width = 1;
>>>> +    break;
>>>> +  case AArch64::LDRHHui:
>>>> +  case AArch64::STRHHui:
>>>> +    Scale = Width = 2;
>>>> +    break;
>>>> +  };
>>>> +
>>>> +  BaseReg = LdSt->getOperand(1).getReg();
>>>> +  Offset = LdSt->getOperand(2).getImm() * Scale;
>>>> +  return true;
>>>> +}
>>>> +
>>>> /// Detect opportunities for ldp/stp formation.
>>>> ///
>>>> /// Only called for LdSt for which getLdStBaseRegImmOfs returns
>>>> true.
>>>> 
>>>> Modified: llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h?rev=217371&r1=217370&r2=217371&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h (original)
>>>> +++ llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h Mon Sep  8 09:43:48 2014
>>>> @@ -52,6 +52,10 @@ public:
>>>>   bool isCoalescableExtInstr(const MachineInstr &MI, unsigned &SrcReg,
>>>>                              unsigned &DstReg, unsigned &SubIdx) const override;
>>>> 
>>>> +  bool
>>>> +  areMemAccessesTriviallyDisjoint(MachineInstr *MIa, MachineInstr *MIb,
>>>> +                                  AliasAnalysis *AA = nullptr) const override;
>>>> +
>>>>   unsigned isLoadFromStackSlot(const MachineInstr *MI,
>>>>                                int &FrameIndex) const override;
>>>>   unsigned isStoreToStackSlot(const MachineInstr *MI,
>>>> @@ -90,6 +94,10 @@ public:
>>>>                             unsigned &Offset,
>>>>                             const TargetRegisterInfo *TRI) const override;
>>>> 
>>>> +  bool getLdStBaseRegImmOfsWidth(MachineInstr *LdSt, unsigned &BaseReg,
>>>> +                                 int &Offset, int &Width,
>>>> +                                 const TargetRegisterInfo *TRI) const;
>>>> +
>>>>   bool enableClusterLoads() const override { return true; }
>>>> 
>>>>   bool shouldClusterLoads(MachineInstr *FirstLdSt, MachineInstr
>>>>   *SecondLdSt,
>>>> 
>>>> Added: llvm/trunk/test/CodeGen/AArch64/arm64-triv-disjoint-mem-access.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-triv-disjoint-mem-access.ll?rev=217371&view=auto
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/AArch64/arm64-triv-disjoint-mem-access.ll (added)
>>>> +++ llvm/trunk/test/CodeGen/AArch64/arm64-triv-disjoint-mem-access.ll Mon Sep  8 09:43:48 2014
>>>> @@ -0,0 +1,31 @@
>>>> +; RUN: llc < %s -mtriple=arm64-linux-gnu -mcpu=cortex-a53 -enable-aa-sched-mi | FileCheck %s
>>>> +; Check that the scheduler moves the load from a[1] past the store into a[2].
>>>> +@a = common global i32* null, align 8
>>>> +@m = common global i32 0, align 4
>>>> +
>>>> +; Function Attrs: nounwind
>>>> +define i32 @func(i32 %i, i32 %j, i32 %k) #0 {
>>>> +entry:
>>>> +; CHECK: ldr {{w[0-9]+}}, [x[[REG:[0-9]+]], #4]
>>>> +; CHECK: str {{w[0-9]+}}, [x[[REG]], #8]
>>>> +  %0 = load i32** @a, align 8, !tbaa !1
>>>> +  %arrayidx = getelementptr inbounds i32* %0, i64 2
>>>> +  store i32 %i, i32* %arrayidx, align 4, !tbaa !5
>>>> +  %arrayidx1 = getelementptr inbounds i32* %0, i64 1
>>>> +  %1 = load i32* %arrayidx1, align 4, !tbaa !5
>>>> +  %add = add nsw i32 %k, %i
>>>> +  store i32 %add, i32* @m, align 4, !tbaa !5
>>>> +  ret i32 %1
>>>> +}
>>>> +
>>>> +attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "stack-protector-buffer-size"="8" "unsafe-fp-math"="true" "use-soft-float"="false" }
>>>> +
>>>> +!llvm.ident = !{!0}
>>>> +
>>>> +!0 = metadata !{metadata !"clang version 3.6.0 "}
>>>> +!1 = metadata !{metadata !2, metadata !2, i64 0}
>>>> +!2 = metadata !{metadata !"any pointer", metadata !3, i64 0}
>>>> +!3 = metadata !{metadata !"omnipotent char", metadata !4, i64 0}
>>>> +!4 = metadata !{metadata !"Simple C/C++ TBAA"}
>>>> +!5 = metadata !{metadata !6, metadata !6, i64 0}
>>>> +!6 = metadata !{metadata !"int", metadata !3, i64 0}
>>>> 
>>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>> 
>>> 
>>> --
>>> Hal Finkel
>>> Assistant Computational Scientist
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>> 
>> 
>> 
>> 
>> 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
