[llvm-dev] Encoding instructions in packets

Wed Jul 7 16:27:21 PDT 2021

Hello,

I am pretty inexperienced with the MC layer and am looking at encoding
instructions for a machine with an unusual encoding scheme.  Each group
of three instructions is encoded in a 64-bit packet along with a handful
of extra control bits.  These are not VLIW packets but simply the way
individual instructions are encoded.  Packets may straddle basic block
boundaries and control may exit or enter a packet at any of its
instructions.  In other words, it is purely an encoding scheme.

To throw an even bigger wrench into things, each instruction is not a
byte-multiple number of bits and the three instructions aren't encoded
in order.  The first and third instructions are contiguous but the
third instruction is inserted in the middle of the second instruction
such that it splits it.  So it looks something like this:

.-------------------------.
| CTL | Instr 2 | Instr 1 | Word 0
+-------------------------+
| CTL | Instr 2 | Instr 3 | Word 1
'-------------------------'

It seems that MCCodeEmitter::encodeInstruction assumes that a single
instruction is encoded.  But here I can't encode Instr 2 with that
interface since its encoding isn't contiguous.  Moreover, the CTL bits
depend on the contents of all three instructions so can't be set until
we've seen all three.
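
To make the layout concrete, here is a rough standalone sketch of packing
one packet.  It is not LLVM code, and every number in it is an assumption
made up for illustration: 20-bit instructions, Instr 2 split into two
10-bit halves, 2 control bits per 32-bit word, and a placeholder
computeCtl() standing in for the real control-bit rule.  It only shows
that neither Instr 2 nor the CTL bits can be written until all three
instructions are known.

```cpp
#include <cassert>
#include <cstdint>

// All widths below are hypothetical; the real machine's widths are not
// stated in this post.
constexpr unsigned InstrBits = 20; // assumed width of one instruction
constexpr unsigned HalfBits = 10;  // assumed: Instr 2 split into two halves
constexpr unsigned CtlBits = 2;    // assumed control bits per 32-bit word
constexpr unsigned CtlShift = InstrBits + HalfBits;
static_assert(InstrBits + HalfBits + CtlBits == 32, "fields must fill a word");

// Placeholder control-bit rule (XOR of the low nibbles). The real rule is
// target-specific; the point is that it needs all three instructions.
inline uint32_t computeCtl(uint32_t I1, uint32_t I2, uint32_t I3) {
  return (I1 ^ I2 ^ I3) & 0xF;
}

// Pack three instructions into one 64-bit packet laid out as
//   Word 0: CTL[1:0] | Instr 2 low half  | Instr 1
//   Word 1: CTL[3:2] | Instr 2 high half | Instr 3
// Word 0 occupies the low 32 bits of the result (an assumption).
inline uint64_t packPacket(uint32_t I1, uint32_t I2, uint32_t I3) {
  const uint32_t InstrMask = (1u << InstrBits) - 1;
  const uint32_t HalfMask = (1u << HalfBits) - 1;
  assert((I1 & ~InstrMask) == 0 && (I2 & ~InstrMask) == 0 &&
         (I3 & ~InstrMask) == 0 && "instruction does not fit its slot");

  const uint32_t Ctl = computeCtl(I1, I2, I3);
  const uint32_t Word0 =
      ((Ctl & 0x3) << CtlShift) | ((I2 & HalfMask) << InstrBits) | I1;
  const uint32_t Word1 =
      ((Ctl >> 2) << CtlShift) | ((I2 >> HalfBits) << InstrBits) | I3;
  return (uint64_t(Word1) << 32) | Word0;
}
```

With this layout the two halves of Instr 2 land in different words, which
is why a per-instruction emitter cannot produce its bytes in one
contiguous write.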

Originally I had planned to "queue up" three instructions before having
encodeInstruction actually write any bits, but I think the MC layer
assumes encodeInstruction always writes something out.  Is that true or
can I actually get away with encodeInstruction not emitting anything?
There is the pesky detail of handling the last packet and I don't see
any kind of "end function" interface to pad the final packet if needed.
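
A minimal standalone sketch of the "queue up" idea, outside of LLVM, looks
something like the following.  PacketQueue, packThree, HypotheticalNop and
the little-endian byte order are all names and assumptions invented for
this illustration; none of them is an MC-layer interface.  The sketch does
not answer whether the MC layer tolerates an encodeInstruction call that
emits nothing, it only shows the buffering and the explicit flush() that
the final, partially filled packet would need.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical NOP encoding, assumed to be all zeros.
constexpr uint32_t HypotheticalNop = 0;

// Placeholder packing routine: simply concatenates three 20-bit fields.
// The real interleaved layout and control bits would go here instead.
inline uint64_t packThree(uint32_t I1, uint32_t I2, uint32_t I3) {
  return uint64_t(I1) | (uint64_t(I2) << 20) | (uint64_t(I3) << 40);
}

class PacketQueue {
  std::array<uint32_t, 3> Pending{};
  size_t Count = 0;
  std::vector<uint8_t> &Out;

  // Write the three pending instructions as one 8-byte packet.
  void emitPacket() {
    const uint64_t P = packThree(Pending[0], Pending[1], Pending[2]);
    for (unsigned B = 0; B < 8; ++B) // little-endian order is an assumption
      Out.push_back(uint8_t(P >> (8 * B)));
    Count = 0;
  }

public:
  explicit PacketQueue(std::vector<uint8_t> &Out) : Out(Out) {}

  // Buffer one instruction; bytes appear only on every third call.
  void push(uint32_t Instr) {
    assert(Count < 3 && "queue should have been emitted already");
    Pending[Count++] = Instr;
    if (Count == 3)
      emitPacket();
  }

  // Pad a partially filled packet with NOPs and emit it. Something has to
  // call this once at the very end, which is the missing hook in question.
  void flush() {
    if (Count == 0)
      return;
    while (Count < 3)
      Pending[Count++] = HypotheticalNop;
    emitPacket();
  }
};
```

Two of every three push() calls write no bytes at all, so anything that
records an offset at push() time would see a stale position for those
instructions.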

Assuming that won't work, my next thought was to encode each instruction
sequentially, such that each MCFragment has a single instruction, with
separate fragments for the control bits.  By carefully controlling how
the bits are emitted I *think* (hope?) I can keep things such that
MCFixup offsets remain valid.  The fixup offset is from the start of the
section, yes?  So as long as things stay encoded that way until after
fixup/relaxation/etc. it should be fine?

Then in either MCAsmBackend::finishLayout (as overridden in a target
class) or MCELFStreamer::finishImpl (as overridden in a target class) I
could run through all the MCFragments, combining every three fragments
(plus the control fragments) into a single MCFragment encoded as a
packet, discarding the separate fragments I no longer need.  I'm
assuming here that as long as the fragments for every three instructions
total 64 bits, the fixup addresses will remain correct even after
re-encoding into packets.  True?

I have no idea if either of these options is a viable path.

I don't want to use MCInstrBundle to represent a packet because those
can't straddle basic block boundaries and so I'd need to insert NOPs in
every basic block that is not a multiple of three instructions.
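
The padding cost is simple arithmetic, sketched below as standalone C++.
The helper names are made up for illustration, and the block sizes passed
in would come from wherever the instruction counts are available.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// NOPs needed to round one basic block up to a multiple of three
// instructions (0, 1 or 2).
inline size_t nopsToPad(size_t NumInstrs) { return (3 - NumInstrs % 3) % 3; }

// Instruction count of a block after padding to whole packets.
inline size_t paddedSize(size_t NumInstrs) {
  const size_t Padded = NumInstrs + nopsToPad(NumInstrs);
  assert(Padded % 3 == 0 && "padded block must fill whole packets");
  return Padded;
}

// Total NOPs inserted across a function if no packet may straddle a
// basic block boundary.
inline size_t totalPaddingNops(const std::vector<size_t> &BlockSizes) {
  size_t Total = 0;
  for (size_t N : BlockSizes)
    Total += nopsToPad(N);
  return Total;
}
```

For example, blocks of 1, 2, 3 and 4 instructions would need 2, 1, 0 and 2
NOPs respectively, so half as many NOPs as real instructions.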

Is there another option I'm overlooking?  If not, is either of the
paths above viable?

Thanks for your help!

                          -David