[llvm-dev] Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection

Alexander Timofeev via llvm-dev llvm-dev at lists.llvm.org
Wed Nov 15 05:24:51 PST 2017

Hi All,

I'm writing to introduce and draw attention to a proposed change that
was published for review several months ago.


I also tried to contact the code owners directly but have had no reply yet.

Brief story:

In SIMT architectures VGPRs are a high-demand resource. At the same time,
a significant part of the computation operates on naturally scalar data.
Those computations can be performed by the SALU, saving a lot of VGPRs.
This is intended to increase occupancy.
Also, splitting the data flow into scalar and vector parts gives more
flexibility to the instruction scheduler, which can increase HW utilization.

On GPU targets we say that an instruction is vector if it operates on VGPR
operands, each lane of which contains a different value.
We say an instruction is scalar if it operates on SGPRs, which are shared
among all the threads in the warp.

Divergence Analysis was introduced by F. Pereira & Co in 2013 and is now
part of the LLVM core analyses.
Unfortunately its results are mostly useless because there is no way to
inform the instruction selection DAG about the divergence property of a
particular instruction.
Put simply, an IR operation that has no divergent operands produces a
uniform result and should be selected to a scalar instruction.
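
To make the rule above concrete, here is a minimal sketch in C++ of how
the divergence property propagates from operands to results. It is a toy
model, not LLVM's DivergenceAnalysis API: ToyInst, divergentSource and
computeDivergence are names invented for this illustration, and it only
handles straight-line code (divergence caused by divergent control flow
is ignored).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical, not LLVM's API): each instruction lists the
// indices of the earlier instructions it uses as operands.
struct ToyInst {
  std::vector<std::size_t> operands; // indices into the instruction list
  bool divergentSource;              // e.g. a read of the per-lane thread id
};

// Returns a vector where element i is true if the result of instruction i
// may differ between lanes. Assumes operands are defined before their users.
std::vector<bool> computeDivergence(const std::vector<ToyInst> &insts) {
  std::vector<bool> divergent(insts.size(), false);
  for (std::size_t i = 0; i < insts.size(); ++i) {
    bool d = insts[i].divergentSource;
    for (std::size_t op : insts[i].operands) {
      assert(op < i && "operands must be defined before use");
      d = d || divergent[op]; // one divergent operand taints the result
    }
    divergent[i] = d;
  }
  return divergent;
}
```

An instruction whose entry stays false is a candidate for scalar
selection; everything reachable from a divergent source must stay on the
vector side.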

We used to pass divergence data for memory access instructions through
metadata, simply because MemSDNode has a memory operand that refers to the IR.
This approach is restricted to memory accesses only. That is why we would
need another pass working on the machine code to propagate the divergence
property from the loaded value through the computations and finally to the
store of the result. Besides requiring one more pass,
it would repeat on machine instructions the same algorithm that divergence
analysis has already run over the IR.

Since the SDNode flags field was recently extended to 16 bits and 5 of
those bits are still unoccupied, we have a chance to use them for passing
divergence data to instruction selection.
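
As a rough illustration of the idea, the sketch below stores one
divergence bit in a 16-bit flags word and lets a selection routine pick a
scalar or vector load from it. Everything here is hypothetical:
ToyNodeFlags, ToyOpcode and selectLoad are invented names, and the bit
position (bit 11, the first one free if 11 of the 16 bits are in use) is
an assumption that does not reflect the real SDNodeFlags layout.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 16-bit node-flags word. The bit position is
// illustrative only and is not taken from LLVM's SDNodeFlags.
class ToyNodeFlags {
  std::uint16_t bits = 0;
  static constexpr std::uint16_t DivergentBit =
      static_cast<std::uint16_t>(1u << 11);

public:
  void setDivergent(bool v) {
    bits = static_cast<std::uint16_t>(v ? (bits | DivergentBit)
                                        : (bits & ~DivergentBit));
  }
  bool isDivergent() const { return (bits & DivergentBit) != 0; }
  std::uint16_t raw() const { return bits; }
};

// Hypothetical opcodes standing in for target scalar and vector loads.
enum class ToyOpcode { ScalarLoad, VectorLoad };

// Selection keyed on the flag: a uniform load can go to the scalar unit,
// a divergent one has to stay on the vector unit.
ToyOpcode selectLoad(const ToyNodeFlags &flags) {
  return flags.isDivergent() ? ToyOpcode::VectorLoad : ToyOpcode::ScalarLoad;
}
```

The point of keeping the bit on the node itself is that selection needs
no extra lookup back into the IR, so the same check could later be applied
to non-memory nodes as well.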

This change introduces a possible approach to implementing this.
It passes DA data for load instructions only. If accepted, we'll go ahead
and add the same code to handle other instructions as well.

I'd appreciate any advice and/or opinions.

Thanks in advance.

