Disable arm-gepopt by default

Gerolf Hoflehner ghoflehner at apple.com
Mon Apr 20 19:03:39 PDT 2015


Hi James,

from a birds view this optimization needs more work, quite possibly more so in other passes rather than the algorithm itself.  In our environment we haven’t seen *any* gains from it. Could you and/or Hao take this on after the code is disabled/backed out?

The twoFish example is a double whammo with code size up *and* performance down:  the size of the assembly files differs by 100 lines (~10%), so there must be more to the different than just one simple combiner pattern fix. One issue I spotted quickly are the store pairs (rather than two str) when the optimization is off.

Similarly,  for test/CodeGen/AArch64/arm64-cse.ll,  the differences are:

< -aarch64-gep-opt=false 
> -aarch64-gep-opt=true 
22,23c22,23
< 	ldr	 w8, [x1]
< 	subs	w8, w8, #1              ; =1
---
> 	ldrsw	 x8, [x1]
> 	cmp	 w8, #1                 ; =1
26,27c26,29
< 	str	 w8, [x1]
< 	add	x0, x0, w8, sxtw
---
> 	sub	w9, w8, #1              ; =1
> 	add	 x8, x0, x8
> 	str	 w9, [x1]
> 	sub	x0, x8, #1              ; =1

FWIW, the other test case that uses the flag to disable the optimization looks OK. It is not clear to me why the flag was needed.

Thanks
Gerolf



> On Apr 20, 2015, at 1:54 PM, James Molloy <james at jamesmolloy.co.uk> wrote:
> 
> Hi Michael,
> 
> From a bleary-eyed look at those dumps, it looks to me as if fewer addressing modes are being matched. For example I'm seeing stuff like this:
> 
> -00000001000314b8		ubfm	x1, x2, #1, #8
> -00000001000314bc		ldr	w1, [x12, x1, lsl #2]
> +0000000100031be4		add	x15, x11, w15, uxtb #2
> +0000000100031be8		ldr	w15, [x15, #2120]
> 
> So, is this just pattern permutation - addressing modes aren't being matched as well as they can because the DAG has been permuted? (should be easy to fix if so). Or is it due to information loss somewhere along the line? (I seem to remember that the gep matching pass will produce some inttoptr/ptrtoint pairs sometimes).
> 
> My main concern with disabling it is the likelihood that it'll be left unfixed, and bitrot. Do you intend to look at it after disabling it? I'd be happier if I knew it was on someone's backlog.
> 
> Adding Hao, as he wrote this and hopefully can have more insight!
> 
> Cheers,
> 
> James
> 
> On Mon, 20 Apr 2015 at 21:36 Michael Zolotukhin <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote:
> Hi James,
> 
> I’m attaching two disasm files for the TwoFish hot function, the first one (slow.s) is obtained by compiler with default options, the second one (fast.s) - with aarch64-gep-opt=false.
> It’s a pretty big function, and for me it’s not obvious what piece of code is responsible for the total performance. However, slow.s is 100 lines longer, and that’s corresponds to the performance I observe (for slow.s geekbench score of the Twofish is 733, for fast.s - 951, higher is better).
> 
> The tests arm64-cse.ll and arm64-addr-mode-folding.ll also might give some ideas what’s wrong - and we need to fix them, or properly explain why it’s ok to disable them (please note that they were written exactly to detect issues that had been fixed once).
> 
> I totally support that we need to understand what’s wrong, and what I suggest is a temporal disabling until it’s done.
> 
> Thanks,
> Michael
> 
> 
> 
>> On Apr 18, 2015, at 9:23 AM, James Molloy <james at jamesmolloy.co.uk <mailto:james at jamesmolloy.co.uk>> wrote:
>> 
>> Hi Michael,
>> 
>> I'm not too keen on disabling this without understanding what it's doing wrong. It seems like a large hammer. 
>> 
>> If this is the pass I'm thinking of, it was introduced by Hao to help remove many redundant multiplies formed by complex GEPs, and did manifest in spec, gobmk was the main culprit iirc. 
>> 
>> It will also affect in order cores more than out of order ones- would it be ok to understand whether the regression is a deficiency in the pass easily fixed or if it is something more fundamental?
>> 
>> James
>> On Fri, 17 Apr 2015 at 19:48, Michael Zolotukhin <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote:
>> Hi,
>> 
>> Recently I’ve found that LLVM regressed on a TwoFish test (from geekbench) when comparing relatively new compiler and a year-old one. The performance regression is pretty big - about 20%, so I decided to investigate it further and understand what change exposed that. Triaging led me to r222328 and r222331, which introduced a new ARM64-specific pass for hoisting constant offsets from addresses. It claimed some gains on SPECS, but when I turned it off, I saw no regressions there (I checked SPEC2006/ref-input set, built with PGO and LTO).
>> 
>> Moreover, when the patch was committed, it effectively disabled two existing regression tests, which is not good I think.
>> 
>> Is it ok to disable this pass by default and enable the tests back? We can turn it back on when we are confident that all tests pass, and there are no new regressions. Any objections?
>> 
>> Thanks,
>> Michael
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150420/6fae6eb5/attachment.html>


More information about the llvm-commits mailing list