[PATCH] Allow BB duplication threshold to be adjusted through JumpThreading's ctor

Tue Sep 30 01:28:45 PDT 2014

----- Original Message -----
> From: "Owen Anderson" <resistor at mac.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Michael Liao" <michael.liao at intel.com>, llvm-commits at cs.uiuc.edu,
> reviews+D5444+public+de6f72cb2e4729d3 at reviews.llvm.org
> Sent: Tuesday, September 30, 2014 3:01:55 AM
> Subject: Re: [PATCH] Allow BB duplication threshold to be adjusted through JumpThreading's ctor
> 
> 
> 
> 
> On Sep 30, 2014, at 12:51 AM, Hal Finkel < hfinkel at anl.gov > wrote:
> 
> 
> 
> ----- Original Message -----
> 
> 
> From: "Michael Liao" < michael.liao at intel.com >
> To: "Hal Finkel" < hfinkel at anl.gov >
> Cc: reviews+D5444+public+de6f72cb2e4729d3 at reviews.llvm.org ,
> spatel at rotateright.com , llvm-commits at cs.uiuc.edu ,
> nrotem at apple.com
> Sent: Monday, September 29, 2014 11:10:38 PM
> Subject: Re: [PATCH] Allow BB duplication threshold to be adjusted
> through JumpThreading's ctor
> 
> 
> 
> On Mon, 29 Sep, 2014 at 5:10 PM, Hal Finkel < hfinkel at anl.gov >
> wrote:
> 
> 
> ----- Original Message -----
> 
> 
> From: "Michael Liao" < michael.liao at intel.com >
> To: "michael liao" < michael.liao at intel.com >, nrotem at apple.com ,
> hfinkel at anl.gov
> Cc: spatel at rotateright.com, llvm-commits at cs.uiuc.edu
> Sent: Monday, September 29, 2014 6:34:36 PM
> Subject: Re: [PATCH] Allow BB duplication threshold to be
> adjusted
> through JumpThreading's ctor
> 
> Hi Hal
> 
> Yeah, "noduplicate" could prevent duplication of barrier calls, but
> this patch is meant to address a potential issue on processors with
> divergent control flow, commonly found in GPUs, e.g. AMD/NVIDIA
> ones. The scenario is that, if a BB is duplicated to enable more
> jump threading, targets with divergent CF may execute more
> instructions when the condition is a divergent one.
> 
> As for updating that threshold from TTI, yeah, if we are interested
> in that case, I could come up with another patch considering both
> TTI and a user-specified threshold.
> 
> I suppose that I don't understand what you mean by "if we are
> interested." Generally speaking, ctor parameters are useful only
> for
> clients who are not using the standard optimization pipeline, and
> we'd like the standard optimization pipeline to generally work well
> for a wide range of targets. Thus, a TTI interface is preferred.
> 
> OK, I will add another patch with TTI support.
> 
> 
> 
> 
> 
> From a cost modeling perspective, how can you tell whether the
> instruction duplication will be worthwhile? Can this be something
> like 2*(instruction costs) <= (branch cost)?
> 
> To be honest, I have no concrete answer, as the instruction cost
> may change significantly after merging two BBs, which is not fully
> considered in the current cost model, e.g., if inst-fold kicks in
> after duplicating that BB and folds all instructions. Probably a
> better place to address that is a similar pass in the backend with
> a detailed target model. So far, this patch only allows coarse
> control of that threshold.
> 
> The problem of estimating what costs will be after instruction
> folding is faced by many mid-level passes, and while a
> machine-instruction-level pass could do a better job at cost
> modeling, those passes often run too late to enable other
> optimizations, interact with inlining, etc.
> 
> That having been said, currently, getJumpThreadDuplicationCost does
> not use any of the current TTI-based cost modeling infrastructure
> (it pre-dates TTI), and I agree that it will provide a poor estimate
> of the ultimate cost because it has no understanding of what the
> target will be able to fold. I suspect it would be better to make
> the function work more like CodeMetrics::analyzeBasicBlock so that
> the target can inform the estimation of the cost of each instruction
> (even the base TTI implementation has some intelligence that can be
> applied). I suspect that providing getJumpThreadDuplicationCost with
> an actual target-informed method for estimating costs will
> ultimately yield better results for everyone.
> 
> 
> 
> If you want to go the cost model route, you’re going to want to
> introduce some concept of duplication cost for an instruction. This
> cost would be marginal on most CPUs (the only cost is code size),
> but significant on GPUs (or CPUs programmed in a SPMD fashion) where
> duplicated instructions have significant cost, even if unexecuted,
> because they reduce the utilization of the machine’s vector width.

Agreed. Normally (on a CPU), however, we can choose whether or not to predicate based on the code size, so it is not as much of a problem (although I can imagine that JumpThreading could hamper our ability to predicate later). So logic like if (TTI->blockLikelyToBePredicated(...)) { cost += TTI->dupCost(...) } certainly makes sense to me.
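
A minimal, self-contained sketch of that logic follows. It does not use LLVM headers: Inst, Block, TargetInfoSketch, duplicationCost and shouldDuplicate are hypothetical stand-ins invented for illustration, and blockLikelyToBePredicated and dupCost are the hooks proposed above, not existing TTI interfaces. The per-instruction costs and the predication rule are assumptions as well.

```cpp
#include <vector>

// Hypothetical stand-ins for illustration only; not real LLVM types.
struct Inst {
  unsigned BaseCost; // assumed generic cost of the instruction
  unsigned DupCost;  // assumed extra cost when a duplicate ends up predicated
};

struct Block {
  std::vector<Inst> Insts;
};

// Sketch of a target-information object exposing the two proposed hooks.
struct TargetInfoSketch {
  bool PredicatesSmallBlocks; // e.g. a GPU or a CPU programmed SPMD-style
  unsigned PredicationLimit;  // assumed size limit for predicating a block

  bool blockLikelyToBePredicated(const Block &BB) const {
    return PredicatesSmallBlocks && BB.Insts.size() <= PredicationLimit;
  }

  unsigned dupCost(const Inst &I) const { return I.DupCost; }
};

// Sum the base cost of every instruction; add the target's duplication
// cost only when the block is likely to be predicated.
unsigned duplicationCost(const Block &BB, const TargetInfoSketch &TTI) {
  unsigned Cost = 0;
  const bool Predicated = TTI.blockLikelyToBePredicated(BB);
  for (const Inst &I : BB.Insts) {
    Cost += I.BaseCost;
    if (Predicated)
      Cost += TTI.dupCost(I);
  }
  return Cost;
}

// Duplicate the block only if its estimated cost fits the threshold.
bool shouldDuplicate(const Block &BB, const TargetInfoSketch &TTI,
                     unsigned Threshold) {
  return duplicationCost(BB, TTI) <= Threshold;
}
```

With this shape, the same block can pass the threshold on a target that does not predicate and fail it on one that does, which is the distinction a fixed ctor parameter cannot express.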

 -Hal

> 
> 
> —Owen
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory