[PATCH] D42006: AArch64: Omit callframe setup/destroy when not necessary

Jun Lim via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 2 08:42:36 PST 2018


I looked at the hot function of spec2006/astar
(_ZN7way2obj12releasepointEii), as its score is pretty stable. After this
change, I can see one extra MOV in the entry block, and a spill STR is moved
up within the entry block (note that the entry block has an impact on the
score). The other changes in the function are all related to different spill
decisions: different spills/reloads in different blocks. Overall, when I
enabled r322917 only for this function, I observed 1.33% more dynamic
instructions and about a 3% regression on Falkor. Unfortunately, the
different spill decisions caused by this change seem to lead to more
executions of spills/reloads in this particular case. Please let me know
your thoughts.

# Entry block of _ZN7way2obj12releasepointEii:
     without r322917
--------------------------
stp x22, x21, [sp,#176]
stp x20, x19, [sp,#192]
stp x29, x30, [sp,#208]
mov x19, x0
ldrsw x8, [x19,#4424]
sxtw x10, w2
sxtw x12, w1
madd x8, x8, x10, x12
ldr x9, [x19,#8]
add x9, x9, x8, lsl #2
ldrh w11, [x9]
ldrh w10, [x19,#16]
cmp w11, w10
str x2, [sp,#120]
b.eq L1

  with r322917
------------------------
stp x22, x21, [sp,#176]
stp x20, x19, [sp,#192]
stp x29, x30, [sp,#208]
mov x19, x0
ldrsw x8, [x19,#4424]
mov w26, w2
sxtw x10, w26
sxtw x11, w1
str x11, [sp]
madd x8, x8, x10, x11
ldr x9, [x19,#8]
add x9, x9, x8, lsl #2
ldrh w11, [x9]
ldrh w10, [x19,#16]
cmp w11, w10
b.eq L1
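
For anyone who wants to check the two entry blocks mechanically, here is a
minimal sketch (not part of the original measurement) that compares the
opcode counts of the two listings above and reports where the first STR
sits in each. The helper names mnemonics and opcode_delta are made up for
illustration; the listings are copied verbatim from this message.

```python
from collections import Counter

def mnemonics(listing):
    """Return the opcode mnemonic of each non-empty line in a listing."""
    return [line.split()[0] for line in listing.strip().splitlines() if line.strip()]

def opcode_delta(before, after):
    """Map each mnemonic to (count in after - count in before), omitting zeros."""
    b, a = Counter(mnemonics(before)), Counter(mnemonics(after))
    return {op: a[op] - b[op] for op in sorted(set(a) | set(b)) if a[op] != b[op]}

# Entry block without r322917 (copied from the listing above).
WITHOUT = """
stp x22, x21, [sp,#176]
stp x20, x19, [sp,#192]
stp x29, x30, [sp,#208]
mov x19, x0
ldrsw x8, [x19,#4424]
sxtw x10, w2
sxtw x12, w1
madd x8, x8, x10, x12
ldr x9, [x19,#8]
add x9, x9, x8, lsl #2
ldrh w11, [x9]
ldrh w10, [x19,#16]
cmp w11, w10
str x2, [sp,#120]
b.eq L1
"""

# Entry block with r322917 (copied from the listing above).
WITH = """
stp x22, x21, [sp,#176]
stp x20, x19, [sp,#192]
stp x29, x30, [sp,#208]
mov x19, x0
ldrsw x8, [x19,#4424]
mov w26, w2
sxtw x10, w26
sxtw x11, w1
str x11, [sp]
madd x8, x8, x10, x11
ldr x9, [x19,#8]
add x9, x9, x8, lsl #2
ldrh w11, [x9]
ldrh w10, [x19,#16]
cmp w11, w10
b.eq L1
"""

delta = opcode_delta(WITHOUT, WITH)
str_pos_without = mnemonics(WITHOUT).index("str")  # 0-based position of the STR
str_pos_with = mnemonics(WITH).index("str")

print("opcode delta:", delta)
print("STR position without/with:", str_pos_without, str_pos_with)
```

This only compares static instruction counts within one block; it says
nothing about dynamic counts or about the spill/reload changes in the other
blocks.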


-----Original Message-----
From: Matthias Braun [mailto:matze at braunis.de] 
Sent: Tuesday, January 30, 2018 7:19 PM
To: reviews+D42006+public+b98fc578ea6d8e5b at reviews.llvm.org
Cc: t.p.northover at gmail.com; aemerson at apple.com; eli.friedman at gmail.com;
Quentin Colombet <qcolombet at apple.com>; gberry at codeaurora.org;
efriedma at codeaurora.org; junbuml at codeaurora.org; renato.golin at linaro.org;
mcrosier at codeaurora.org; javed.absar at arm.com; kristof.beyls at arm.com;
llvm-commits at lists.llvm.org; kanheim at a-bix.com; james.molloy at arm.com;
diana.picus at linaro.org; florian.hahn at arm.com
Subject: Re: [PATCH] D42006: AArch64: Omit callframe setup/destroy when not
necessary

So I took 453.povray (spec2006) and I cannot reproduce the performance
regression on our devices (if there is any change, then it is below the
measurement noise; that particular benchmark seems to be noisy in general
for us). I tried the ref and train input sets.

I used the following flags to hopefully get close to what you measured:
    -O3 -Xclang,-target-feature -Xclang,+use-postra-scheduler
-fno-math-errno -ffp-contract=fast -fomit-frame-pointer      [1]

Also comparing assembly of the hottest functions (from train dataset) showed
nothing that would explain a performance swing:

pov::All_CSG_Intersect_Intersections(pov::Object_Struct*, pov::Ray_Struct*,
pov::istack_struct*)
    - some small changes in register numbering, same instructions
pov::All_Plane_Intersections(pov::Object_Struct*, pov::Ray_Struct*,
pov::istack_struct*)
    - no changes
pov::All_Sphere_Intersections(pov::Object_Struct*, pov::Ray_Struct*,
pov::istack_struct*)
    - 3 instructions scheduled differently, 1 stp moves closer to a memcpy
pov::Check_And_Enqueue(pov::Priority_Queue_Struct*, pov::BBox_Tree_Struct*,
pov::Bounding_Box_Struct*, pov::Rayinfo_Struct*)
    - 1 lsl scheduled later
pov::Intersect_BBox_Tree(pov::BBox_Tree_Struct*, pov::Ray_Struct*,
pov::istk_entry*, pov::Object_Struct**, bool)
    - ldr moved inside a sequence of ldrs.
pov::DNoise(double*, double*)
    - no changes
pov::All_Quadric_Intersections(pov::Object_Struct*, pov::Ray_Struct*,
pov::istack_struct*)
    - no changes

Could the swings be explained by something in your environment? Could you
dive in and see whether you can spot differences in the assembly?

- Matthias



[1] For reference:
clang++  -DNDEBUG  -save-temps=obj -save-stats=obj -Xclang,-target-feature
-Xclang,+use-postra-scheduler -fno-math-errno -ffp-contract=fast
-fomit-frame-pointer  -B
/Applications/Xcode.app/Contents/Developer/Toolchains/iOS11.1.xctoolchain/usr/bin
-O3 -DNDEBUG -arch arm64 -isysroot
/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS11.1.Internal.sdk
-w -Werror=date-time -DSPEC_CPU
-DSPEC_CPU_MACOSX -DSPEC_CPU_LITTLEENDIAN -DSPEC_CPU_LP64
-Wno-implicit-function-declaration        (followed by -MD/-MT/-MF and
filenames)

> On Jan 28, 2018, at 7:33 AM, Chad Rosier via Phabricator <reviews at reviews.llvm.org> wrote:
> 
> mcrosier added a comment.
> 
> Would it be possible to revert r322917 while we investigate the
> regressions?  We also identified a 3.61% regression in SPEC2006/bzip2, so
> here's the complete list of regressions we are currently seeing due to this
> change:
> 
> With -O3 -fno-math-errno -ffp-contract=fast -fomit-frame-pointer
> -mcpu=falkor:
> 
>  Spec2006/astar -3.25%
>  Spec2006/bzip2 -3.61%
>  Spec2006/povray -5.28%
>  Spec2017/povray -6.08%
> 
> With -O3 -flto -fuse-ld=gold -fno-math-errno -ffp-contract=fast
> -fwhole-program-vtables -fvisibility=hidden -fomit-frame-pointer
> -mcpu=falkor:
> 
>  Spec2006/astar -4.20%
>  Spec2006/h264ref -2.15%
> 
> 
> All tests were run on Falkor, but hopefully these issues can be reproduced
> on other targets.  Please let us know if you need any assistance
> reproducing, Matthias.
> 
> Chad
> 
> 
> Repository:
>  rL LLVM
> 
> https://reviews.llvm.org/D42006
> 
> 
> 



