[LLVMdev] Suggestion for a more flexible DFA Packetizer

Fri Mar 13 14:43:35 PDT 2015

Hi All,
   I would like to suggest (and implement) an improvement to LLVM's DFA
Packetizer.
I am currently implementing a backend for LLVM to compile towards a VLIW
ISA.
One of the specificities of this ISA is that it is not a pure VLIW : it has
5 executions slots (2 ALUs, 1 MAU/FPU, 1 BCU and 1 LSU) but they are not
totally independant. For example Some instructions use the MAU + part of
the LSU, forbidding other instructions to be packed in the same bundle.
This makes the architecture difficult to describe in the current state of
Instruction itinerary description of LLVM.

  To describe the instruction itineraries correctly I tried to used 0
cycles latency pipeline stages (multiple InstrStage<1, [FUs], *0> * for a
same itineraries but it seems the standard DFA based Packetizer only look
at the first pipeline stages when selecting the units used by an
instruction (for DFA state init or transition).

So I made the following changes:
-> exend the TableGen FuncUnit class towards
class FuncUnit<list<FuncUnit> subunits = []> {
   list<FuncUnit> SubUnits = subunits;
}
-> duplicate the DFAPacketizerEmitter (one of TableGen backends) towards a
new backend which works with unit state which are not only one bit set in
the unsigned value but multiple bits (masks),
In fact an architecture is now described by simple units (SubUnits == [])
and complex units (SubUnits != []) both are encoded by a single bit set in
an unsigned mask when generated by the SubtargetEmitter, but when computing
the Packetizer DFA we swap bit resource for mask resources to encode that a
single units may in fact use several simple/basic resources.

This require the modificaiton of
addInsnClass and canAddInsnClass Function + DFA class +
DFAPacketizerEmitter class

for example in canAddInsnClass rather than testing (~*SI & InsnClass) to
see if an InsnClass is compatible with the current state we check than none
of the multiple resource (bitmask) of the InsnClass is set in the state. It
looks like this
for (unsigned j = 0; j < sizeof(unsigned) * 8; ++j) {
  if ((1 << j) & InsnClass) { // the j-th is a possibility for InsnClass
     InsnResource = unit_resource[j];
     if (SI & InsnResource == 0) return true;
   }
unit_reousrce is an unsigned array containing the bit mask of cumulated
resource for every FuncUnits. It is computed when reading the FU list of
the Itinerary list, and contains one bit for simple units and multiple bits
for compound units.

the addInsnClass is modified in a similar way

  unsigned ResultingResourceState = thisState | (0x1 << j);

becomes

  unsigned ResultingResourceState = thisState | unit_resources[j];

The modification implies some slowdown: in canAddInsnClass we change a
bitwise test by a loop of tests, we must first compute unit_resources and
transmit it from DFAPacketizerEmitter to DFA and State objects. Moreover,
generally the architecture being described will be more complex with a more
complex DFA. In my backend, I did not notice any visible increase in LLVM
build time, but I will try to make some measures and try to improve the
implementation if the community want to go forward with it.

The modifications are working in my version, they will need some clean-ups
before an actual patch can be submitted. I want to know if those
modifications have not been suggested (and rejected) before and if they
might be of interest for someone before doing the work of generalizing the
work I did for my backend to other.

best regards,
Nicolas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150313/b83b8a14/attachment.html>