[PATCH] D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection

Tue Jan 23 05:42:10 PST 2018

alex-t added a comment.

> Specifically which nodes are a problem here?  We should query the IR DivergenceAnalysis to compute isSDNodeSourceOfDivergence for a CopyFromReg from a live-in virtual register.  (Not sure there's an existing map from registers to values, but you could easily construct one; basically the inverse of FunctionLoweringInfo::ValueMap.)

In one of my previous posts I have explained what control dependencies are. Let's try again.
Consider the following OpenCL code:

  uint tid = get_global_id(0);    // returns the ID of the individual workitem
  if (tid < 10) {
    x = 2;
  } else {
    x = 3;
  } 
  z = y + x; // all threads 0-9 have x= 2, others x= 3

Please note that the addition "z = y + x" is divergent because different threads compute different values of "z".
Please also note that this addition does not depend on "tid" or any other divergent data.  It is not possible to discover this dependency analyzing individual block. We need CFG information.
Divergence Analysis on IR covers control dependencies by means of special PHI-nodes processing. 
For regular node the node divergence is computed as literally logical OR of all operands divergence bits.
For PHI-node it adds to the list all the branch instructions that terminate basic blocks in PHI's source blocks post-dominance frontier.

All the above means that we cannot just drop the IR divergence analysis results.  DAG only reflects data dependencies.
Analyzing individual block on the DAG we can only follow data dependencies. So if we try to match the divergence bits computed on the IR  (counting control flow)
with those computed on the individual block DAG we'll get in assert on the divergence bits set on the nodes control dependent on the divergent branches.

To track all the nodes divergent by the control dependencies we'd need to sustain special data structure along the all stages of the DAG processing.
This all looks too resource consuming.

There is one possible trade-off:
We can add virtual hook in TargetTransformInfo to query if the target support divergence analysis driven selection. It returns true iif the target ensures it has no transformations that may break divergence data integrity.
For AMDGPU that is always true.

If the target does not support this we don't use the divergence bit at all.

This would allow us to use the functionality w/o any even theoretical threat to other targets.

https://reviews.llvm.org/D35267