[LLVMdev] Testing the new CFL alias analysis

George Burgess IV george.burgess.iv at gmail.com
Fri Sep 19 20:27:30 PDT 2014


Thanks all for the feedback. :)

- George

> On Sep 18, 2014, at 1:10 PM, Das, Dibyendu <Dibyendu.Das at amd.com> wrote:
> 
> For CPU2006 4-copy specint rate runs, we measured some small gains (2%, 3%, and 6%, respectively) for bzip2, gcc, and sjeng, and some small losses (-3% and -3%, resp.) for h264ref and astar. This is for x86 and did not use PGO, but used LTO and -m32 (along with the new CFL alias flags). Overall, there is about a 0.5% gain in specint rate.
> 
> -Dibyendu Das
> AMD Compiler Group
> 
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel
> Sent: Tuesday, September 16, 2014 7:06 AM
> To: Gerolf Hoflehner
> Cc: George Burgess IV; LLVM Dev
> Subject: Re: [LLVMdev] Testing the new CFL alias analysis
> 
> ----- Original Message -----
>> From: "Gerolf Hoflehner" <ghoflehner at apple.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>, "Jiangning Liu" <liujiangning1 at gmail.com>, "George Burgess IV"
>> <george.burgess.iv at gmail.com>
>> Sent: Monday, September 15, 2014 7:58:59 PM
>> Subject: Re: [LLVMdev] Testing the new CFL alias analysis
>> 
>> I filed bugzilla pr20954.
> 
> Thanks!
> 
> -Hal
> 
>> 
>> 
>> -Gerolf
>> 
>> On Sep 15, 2014, at 2:56 PM, Gerolf Hoflehner <ghoflehner at apple.com> wrote:
>> 
>> 
>> 
>> On CINT2006 ARM64/ref input/lto+pgo I measure practically no
>> performance difference for the 7 benchmarks that compile. This
>> includes bzip2 (although a different source base than in CINT2000), mcf,
>> hmmer, sjeng, h264ref, astar, and xalancbmk.
>> 
>> 
>> 
>> On Sep 15, 2014, at 11:59 AM, Hal Finkel <hfinkel at anl.gov> wrote:
>> 
>> 
>> 
>> ----- Original Message -----
>> 
>> 
>> From: "Gerolf Hoflehner" <ghoflehner at apple.com>
>> To: "Jiangning Liu" <liujiangning1 at gmail.com>, "George Burgess IV"
>> <george.burgess.iv at gmail.com>, "Hal Finkel"
>> <hfinkel at anl.gov>
>> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
>> Sent: Sunday, September 14, 2014 12:15:02 AM
>> Subject: Re: [LLVMdev] Testing the new CFL alias analysis
>> 
>> In lto+pgo, some of the CINT2006 benchmarks (5 out of 12, with usual
>> suspects like perlbench and gcc among them, using -flto -Wl,-mllvm,-use-cfl-aa
>> -Wl,-mllvm,-use-cfl-aa-in-codegen) don’t compile.
>> 
>> On what platform? Could you bugpoint it and file a report?
>>
>> Ok, I’ll see if I can get a small test case.
>> 
>> 
>> Has the implementation been tested with lto?
>> 
>> I have not.
>> 
>> 
>> 
>> If not, please
>> stress the implementation more.
>> Do we know the reasons for the gains? Where did you expect the biggest gains?
>> 
>> I don't want to make a global statement here. My expectation is that
>> we'll see wins from increasing register pressure ;) -- hoisting more
>> loads out of loops (there are certainly cases involving multiple
>> levels of dereferencing and insert/extract instructions where
>> CFL can provide a NoAlias answer where BasicAA gives up).
>> Obviously, we'll also have problems if we increase pressure too much.
>> 
>> Maybe. But I prefer to let the OoO hardware handle hoisting. It is
>> hard to tune in the compiler.
>> I’m also curious about the impact on loop transformations.
>> 
>> 
>> Some of the losses will likely boil down to increased register 
>> pressure.
>> 
>> Agreed.
>> 
>> 
>> Looks like the current performance numbers pose a good challenge for 
>> gaining new and refreshing insights into our heuristics (and for 
>> smoothing out the implementation along the way).
>> 
>> It certainly seems that way.
>> 
>> Thanks again,
>> Hal
>> 
>> 
>> Cheers
>> Gerolf
>> 
>> 
>> On Sep 12, 2014, at 1:27 AM, Jiangning Liu <liujiangning1 at gmail.com> wrote:
>>
>> Hi Hal,
>> 
>> I ran SPEC2000 on a cortex-a57 (AArch64) and got the following
>> results.
>>
>> (These numbers measure the run-time change; negative means better
>> performance.)
>> 
>> spec.cpu2000.ref.183_equake 33.77%
>> spec.cpu2000.ref.179_art 13.44%
>> spec.cpu2000.ref.256_bzip2 7.80%
>> spec.cpu2000.ref.186_crafty 3.69%
>> spec.cpu2000.ref.175_vpr 2.96%
>> spec.cpu2000.ref.176_gcc 1.77%
>> spec.cpu2000.ref.252_eon 1.77%
>> spec.cpu2000.ref.254_gap 1.19%
>> spec.cpu2000.ref.197_parser 1.15%
>> spec.cpu2000.ref.253_perlbmk 1.11%
>> spec.cpu2000.ref.300_twolf -1.04%
>> 
>> So we can see that almost all of them got worse performance.
>> 
>> The command line option I'm using is "-O3 -std=gnu89 -ffast-math 
>> -fslp-vectorize -fvectorize -mcpu=cortex-a57 -mllvm -use-cfl-aa -mllvm 
>> -use-cfl-aa-in-codegen"
>> 
>> I didn't try compile time, and I think your test on a POWER7 native
>> build should already mean something for other hosts. Also, I don't
>> have a good benchmark suite for compile-time testing. My past
>> experience shows that both the llvm-test-suite (single/multiple) and the
>> SPEC benchmarks are not good benchmarks for compile-time testing.
>> 
>> Thanks,
>> -Jiangning
>> 
>> 
>> 2014-09-04 1:11 GMT+08:00 Hal Finkel <hfinkel at anl.gov>:
>> 
>> 
>> Hello everyone,
>> 
>> One of Google's summer interns, George Burgess IV, created an 
>> implementation of the CFL pointer-aliasing analysis algorithm, and 
>> this has now been added to LLVM trunk. Now we should determine whether 
>> it is worthwhile adding this to the default optimization pipeline. For 
>> ease of testing, I've added the command line option -use-cfl-aa which 
>> will cause the CFL analysis to be added to the optimization pipeline. 
>> This can be used with the opt program, and also via Clang by passing: 
>> -mllvm -use-cfl-aa.
>> 
>> For the purpose of testing with those targets that make use of 
>> aliasing analysis during code generation, there is also a 
>> corresponding -use-cfl-aa-in-codegen option.
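Putting the flags from the two paragraphs above together, the invocations would look roughly like this. This is a hedged sketch: the file names are placeholders, and the opt spelling assumes the analysis is enabled under the same -use-cfl-aa flag name that clang's -mllvm passthrough uses.

```shell
# Enable CFL-AA in the optimization pipeline via clang's -mllvm passthrough:
clang -O3 -mllvm -use-cfl-aa test.c -o test

# Additionally use it during code generation:
clang -O3 -mllvm -use-cfl-aa -mllvm -use-cfl-aa-in-codegen test.c -o test

# Or directly on LLVM IR with the opt tool:
opt -O3 -use-cfl-aa test.ll -S -o test.opt.ll
```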
>> 
>> Running the test suite on one of our IBM POWER7 systems (comparing
>> -O3 -mcpu=native to -O3 -mcpu=native -mllvm -use-cfl-aa -mllvm
>> -use-cfl-aa-in-codegen [testing without use in code generation was
>> essentially the same]), I see no significant compile-time changes, and
>> the following performance results:
>> speedup:
>> MultiSource/Benchmarks/mafft/pairlocalalign: -11.5862% +/- 5.9257%
>> 
>> slowdown:
>> MultiSource/Benchmarks/FreeBench/neural/neural: 158.679% +/- 22.3212%
>> MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset:
>> 0.627176% +/- 0.290698%
>> MultiSource/Benchmarks/Ptrdist/ks/ks: 57.5457% +/- 21.8869%
>> 
>> I ran the test suite 20 times in each configuration, using make -j48
>> each time, so I'll only pick up large changes. I've not yet
>> investigated the cause of the slowdowns (or the speedup), and I really
>> need people to try this on x86, ARM, etc. It appears, however, that the
>> better aliasing-analysis results might have some negative unintended
>> consequences, and we'll need to look at those closely.
>> 
>> Please let me know how this fares on your systems!
>> 
>> Thanks again,
>> Hal
>> 
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu 
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> 
>> 
>> 
>> 
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev




More information about the llvm-dev mailing list