[LLVMdev] Is LLVM expressive enough to represent asynchronous exceptions?

Tue Jun 14 18:35:57 PDT 2011

On Jun 14, 2011, at 1:07 AM, Duncan Sands wrote:

> Hi Andrew,
> 
>> No. Duncan suggested that he could hitch a ride with us through France. The problem is, we're not driving to Spain at all and there doesn't appear to be any place to transfer.
>> 
>> The point is, you're not going to be able to leverage most of a CFG-based optimizing compiler if don't use the CFG to express control flow.
> 
> when Chris first came up with his proposal for attaching exception handling info
> to basic blocks, I argued that it would be better to keep explicit control flow
> by having lots of basic blocks.  This has costs.  First off, you get a lot of
> control flow edges, which can cause some optimizers to take a long time.  This
> could hopefully be dealt with by having such algorithms ignore/spend less time
> on exception handling cfg edges.  Secondly, you use up more memory by allocating
> a lot of basic blocks.  This could be mitigated by internally using "lightweight
> basic blocks" to represent all the little basic blocks that would have all been
> part of one big basic block if you weren't allowing trapping instructions. 

Compilation time issues in CodeGen are usually caused by large basic blocks or "deep" CFGs. Passes are quadratic in both of these dimensions. The number of edges in the CFG is irrelevant for a given CFG depth. In other words, exception edges do not fundamentally increase the complexity of the CFG. I think this will hold true for the mid-level optimizer too.

If we need to optimize the memory footprint of the IR, there is plenty of room for improvement. To be concerned about this, I would need to see data on the footprint of CFG edges relative to total memory.

If any optimizations are currently inhibited by adding exception edges, I think it's an artificial limitation. Fixing the limitation will improve optimization independent of exception handling. For example, a loop optimization that only handles single-exit loops is simply incomplete regardless of whether we support exceptions.

This isn't the first time I've heard the small block fallacy. I'll try to give a quick data point. I suppressed basic block merging and hacked SimplifyCFG to split blocks at all call sites. Then I looked at the impact on 403.gcc and lencod at -O3. This is all I had time for.

It looks like the aggressive block splitting did have a small impact on compile time. Up to 10% for some passes. Much of the overhead is in isel, which doesn't make sense to me. We should get a speedup, but we may be suffering from extra DAG bloat for the copy nodes at block boundaries.

The regalloc slowdown is also counterintuitive to me. But the way GreedyRA works today, I think the small blocks force it into a more exhaustive search for split points. This could be managed.

I didn't notice a significant score change in either case, though that's not really what I was interested in.

** lencod trunk (unmodified)

Program | GCCAS  Bytecode LLC compile LLC-BETA compile JIT codegen | GCC CBE LLC     LLC-BETA JIT | GCC/CBE GCC/LLC GCC/LLC-BETA LLC/LLC-BETA
lencod  | 6.9222 2335248  8.6951      *                *           | *   *    7.2000 *        *   | n/a     n/a     n/a          n/a

  opt Total Execution Time: 6.9222 seconds (6.9200 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   1.0654 ( 16.0%)   0.0024 (  0.9%)   1.0678 ( 15.4%)   1.0678 ( 15.4%)  Global Value Numbering
   0.5236 (  7.9%)   0.0041 (  1.6%)   0.5277 (  7.6%)   0.5271 (  7.6%)  Induction Variable Simplification
   0.5145 (  7.7%)   0.0020 (  0.8%)   0.5165 (  7.5%)   0.5166 (  7.5%)  Combine redundant instructions

  llc Total Execution Time: 8.6951 seconds (8.6967 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   2.7376 ( 33.2%)   0.2429 ( 55.4%)   2.9804 ( 34.3%)   2.9815 ( 34.3%)  X86 DAG->DAG Instruction Selection
   1.8160 ( 22.0%)   0.0093 (  2.1%)   1.8252 ( 21.0%)   1.8250 ( 21.0%)  Loop Strength Reduction
   0.8806 ( 10.7%)   0.0653 ( 14.9%)   0.9459 ( 10.9%)   0.9461 ( 10.9%)  Greedy Register Allocator

  1219 branchfolding    - Number of block tails merged
  1687 branchfolding    - Number of branches optimized
  1847 branchfolding    - Number of dead blocks removed

** lencod split blocks at calls

Program | GCCAS  Bytecode LLC compile LLC-BETA compile JIT codegen | GCC CBE LLC     LLC-BETA JIT | GCC/CBE GCC/LLC GCC/LLC-BETA LLC/LLC-BETA
lencod  | 7.3408 2392336  9.1578      *                *           | *   *    7.2500 *        *   | n/a     n/a     n/a          n/a

  Total Execution Time: 7.3408 seconds (7.3399 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   1.0714 ( 15.1%)   0.0044 (  1.7%)   1.0757 ( 14.7%)   1.0760 ( 14.7%)  Global Value Numbering
   0.5184 (  7.3%)   0.0050 (  1.9%)   0.5233 (  7.1%)   0.5229 (  7.1%)  Induction Variable Simplification
   0.5201 (  7.3%)   0.0027 (  1.0%)   0.5228 (  7.1%)   0.5226 (  7.1%)  Combine redundant instructions

  Total Execution Time: 9.1578 seconds (9.4429 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   2.9271 ( 33.8%)   0.3016 ( 60.6%)   3.2287 ( 35.3%)   3.2297 ( 34.2%)  X86 DAG->DAG Instruction Selection
   1.8756 ( 21.7%)   0.0101 (  2.0%)   1.8857 ( 20.6%)   1.8859 ( 20.0%)  Loop Strength Reduction
   0.9343 ( 10.8%)   0.0651 ( 13.1%)   0.9993 ( 10.9%)   0.9996 ( 10.6%)  Greedy Register Allocator

  1260 branchfolding    - Number of block tails merged
  1677 branchfolding    - Number of branches optimized
  5732 branchfolding    - Number of dead blocks removed

-Andy

-------------- next part --------------
A non-text attachment was scrubbed...
Name: splitblocks.diff
Type: application/octet-stream
Size: 1823 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110614/d9eb6b11/attachment.obj>