[LLVMdev] IR Passes and TargetTransformInfo: Straw Man
shuxin.llvm at gmail.com
Mon Jul 29 16:39:12 PDT 2013
On 7/29/13 4:07 PM, Andrew Trick wrote:
> On Jul 27, 2013, at 5:47 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>> Hi, Sean:
>> I'm sorry, I lied. I didn't mean to. I did try to avoid making a *BIG* change
>> to the IPO pass ordering for now. However, when I made a minor change to
>> populateLTOPassManager() by separating module passes from non-module passes, I
>> saw quite a few performance differences, most of them degradations. Attacking
>> these degradations one by one in a piecemeal manner is a waste of time. We might as
>> well define the pass ordering for the Pre-IPO, IPO and Post-IPO phases at this time,
>> and hopefully once and for all.
>> In order to repair my image as a liar, I am posting some preliminary results on this cozy
>> Saturday afternoon, which I normally devote to daydreaming :-)
>> So far I have only measured the MultiSource benchmarks on my iMac (late
>> 2012 model), and the command to run the benchmarks is
>> "make TEST=simple report OPTFLAGS='-O3 -flto'".
>> In terms of execution time, some benchmarks degrade, but more improve, a few of them
>> quite substantially. User time is used for comparison. I measured the
>> results twice, and they are basically very stable. As far as I can tell from the results,
>> the proposed pass ordering is basically a change for the better.
>> Interestingly enough, if I combine populatePreIPOPassMgr() as the preIPO phase
>> (see the patch) with the original populateLTOPassManager() for both IPO and postIPO,
>> I see a significant improvement in "Benchmarks/Trimaran/netbench-crc/netbench-crc"
>> (about 94%, 0.5665s before vs. 0.0295s after). As I write this mail, I have not yet had a chance
>> to figure out why this combination improves this benchmark this much.
>> In terms of compile time, the results report that my change improves compile
>> time by about 2x, which is nonsense. I guess test-script doesn't count
>> The new pass ordering Pre-IPO, IPO, and PostIPO are defined by
>> I will discuss this with Andy next Monday in order to be consistent with the
>> pass-ordering design he is envisioning, and measure more benchmarks, then
>> post the patch and results to the community for discussion and approval.
> I don't have any objection to this as long as your compile times are comparable.
> The major differences that I could spot are:
> You've moved the second iteration of some scalar opts into post-IPO:
> - JumpThreading
> - CorrelatedValueProp
I don't see why we need so many iterations, so I got rid of them.
> You no longer run InstCombine after the first round of scalar opts (in preIPO) and before the second round (in PostIPO).
> You now have an extra (3rd) SROA in PostIPO.
I call SROA for dead code elimination, seriously!
The dead-whatever-elimination passes (even the ones called aggressive) do
not eliminate the last store to a local variable. Shame! Shame! Shame!
It seems we don't have a better way, since we don't like mem-ssa. We have
to call SROA, an all-in-one algorithm, to perform such stuff.
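
To illustrate the kind of code meant here, a minimal sketch follows. It assumes that "last store to a local variable" means a store to a non-escaping local that nothing reads afterwards. The function and variable names are hypothetical and are not taken from the patch or from any benchmark. Whether a particular dead-store or dead-code pass removes the store depends on the LLVM version and the pass pipeline.

```cpp
// Hypothetical example: `tmp` is a local aggregate whose address never
// escapes the function, so every access to it is visible to the optimizer.
int sum_first_two(int a, int b) {
    int tmp[2];
    tmp[0] = a;
    tmp[1] = b;
    int result = tmp[0] + tmp[1];
    // Last store to the local: nothing reads tmp[0] after this point,
    // so the store has no observable effect and is dead.
    tmp[0] = 0;
    return result;
}
```

Once a pass such as SROA splits `tmp` into scalar values, the final store has no remaining use and goes away together with the stack slot, which is the effect described above.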
> I don't see a problem, but I'd like to understand the rationale. I think it would be valuable to capture some of the motivation behind the standard pass ordering and any changes we make to it. Sometimes part of the design becomes obsolete but no one can be sure. Shall we start a new doc under LLVM subsystems?