[llvm-dev] Tablegen backend for emulator core?

Tue Mar 23 04:41:39 PDT 2021

Hi John,

This is an area I've remained very interested in since doing the work
leading up to that talk. I've since worked on a second simulator using
MCDisassembler as the decoder, but sadly haven't had the time to
explore using TableGen to describe the semantics.

It has, however, still been ticking over in my brain, so I have some
more thoughts on what would need to be considered. I think it would be
a good addition to LLVM and would be happy to see it added; in my mind
it compares to what CGEN adds to the GNU toolchain.

The first question is what kind of simulation we want LLVM to model.
If it's a simple instruction set simulator, then in some regards I
don't imagine this being too hard. But if we want to stretch as far as
modelling full pipelines (which are unlikely to be generated
automatically), it would be good to generate the components, even if
we have to build the pipeline by hand (I think the scheduling info in
TableGen is probably insufficient for this).
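For the simple instruction set simulator case, the core is a
fetch/decode/execute loop around one big switch. A minimal sketch,
assuming a made-up three-instruction ISA: the opcodes, the pre-decoded
`Insn` struct and `CPUState` are all hypothetical. In an MC-based
simulator the decode step would come from MCDisassembler and the switch
would be over the MCInst opcode; the case bodies are the part a TableGen
backend would ideally emit.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy ISA, for illustration only.
enum Opcode : uint8_t { ADDI = 0, ADD = 1, HALT = 2 };

// Pre-decoded form of one instruction (stands in for an MCInst).
struct Insn {
  Opcode Op;
  uint8_t Rd, Rs1, Rs2;
  int32_t Imm;
};

struct CPUState {
  std::array<uint32_t, 8> Regs{};
  uint32_t PC = 0; // index into the instruction list, not a byte address
};

// Execute until HALT or until PC runs off the end of the program.
// Returns the number of instructions retired (HALT included).
unsigned run(CPUState &S, const std::vector<Insn> &Prog) {
  unsigned Retired = 0;
  while (S.PC < Prog.size()) {
    const Insn &I = Prog[S.PC]; // fetch + decode are trivial here
    assert(I.Rd < S.Regs.size() && I.Rs1 < S.Regs.size() &&
           I.Rs2 < S.Regs.size() && "register index out of range");
    switch (I.Op) { // the switch a generator would emit
    case ADDI:
      S.Regs[I.Rd] = S.Regs[I.Rs1] + static_cast<uint32_t>(I.Imm);
      break;
    case ADD:
      S.Regs[I.Rd] = S.Regs[I.Rs1] + S.Regs[I.Rs2];
      break;
    case HALT:
      return Retired + 1;
    }
    ++S.PC;
    ++Retired;
  }
  return Retired;
}
```

Everything outside the switch (state layout, fetch, PC update) is the
hand-built scaffolding; only the per-opcode bodies need to come from
instruction definitions.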

The other large task would be identifying the semantics that TableGen
patterns don't give us and working out a nice way to describe them in
instruction definitions. For architectures where instructions modify
status bits that later branch instructions use, those bits are
typically modelled as registers that are implicitly written, so the
definitions would need to be extended to describe these effects. I'm
thinking you might have a second field describing extra semantics that
don't make sense for code generation, but I'm not 100% sure on that
one. If we can auto-generate that part, IMO it would dramatically
reduce how much needs to be written manually.
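As a concrete picture of the implicit-write problem, here is roughly
the C++ a generator would need to emit for a compare that only sets
flags and a branch that only reads them. This is a hand-written sketch
under assumptions: the Z/N flag layout, the `CMP`/`BEQ` names and the
fixed 4-byte instruction size are hypothetical. Nothing in the
compare's explicit operands mentions the `SR` write, which is exactly
the information the extra semantics field would have to carry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status-register layout.
constexpr uint32_t FlagZ = 1u << 0; // result was zero
constexpr uint32_t FlagN = 1u << 1; // result was negative (sign bit set)

struct CPUState {
  uint32_t Regs[8] = {};
  uint32_t SR = 0; // implicit def of CMP, implicit use of BEQ
  uint32_t PC = 0; // byte address; instructions assumed to be 4 bytes
};

// CMP rs1, rs2: no explicit destination; its only effect is the
// implicit write of SR.
void execCMP(CPUState &S, unsigned Rs1, unsigned Rs2) {
  assert(Rs1 < 8 && Rs2 < 8 && "register index out of range");
  uint32_t Diff = S.Regs[Rs1] - S.Regs[Rs2];
  S.SR = 0;
  if (Diff == 0)
    S.SR |= FlagZ;
  if (Diff & 0x80000000u)
    S.SR |= FlagN;
}

// BEQ offset: PC-relative branch taken iff Z is set; reading SR is
// the implicit use.
void execBEQ(CPUState &S, int32_t Offset) {
  if (S.SR & FlagZ)
    S.PC += static_cast<uint32_t>(Offset);
  else
    S.PC += 4;
}
```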

One thing to keep in mind, though, is that at some scale a single
switch on Insn.getOpcode() might not be the best model for the more
complex and varied architectures. If you have a couple of generations
of cores in one backend, you might want to split the simulator into
different loops, one per generation (maybe reusing ProcessorModels or
something similar here?). That would also help identify, in a more
mentally scalable way, which instructions still need manual semantics
written.
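One way to picture the per-generation split: generate one execute
function per core generation and choose it once when the simulated core
is configured, instead of testing the generation inside every case. The
generations, opcodes and the `selectExecutor` helper below are
hypothetical; keying the choice off a processor model is an assumption,
not something that exists today. A `false` return doubles as a record
of which instructions a given generation doesn't support or still lacks
semantics for.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical opcodes shared by two generations of one architecture.
enum Opcode { ADD, MUL, POPCNT };

struct CPUState {
  uint32_t A = 0, B = 0, Result = 0;
};

// Returns false if the opcode isn't available on this generation.
using ExecFn = bool (*)(CPUState &, Opcode);

// Generation 1: no multiplier, no popcount.
bool executeGen1(CPUState &S, Opcode Op) {
  switch (Op) {
  case ADD: S.Result = S.A + S.B; return true;
  default:  return false;
  }
}

// Generation 2 adds MUL and POPCNT.
bool executeGen2(CPUState &S, Opcode Op) {
  switch (Op) {
  case ADD: S.Result = S.A + S.B; return true;
  case MUL: S.Result = S.A * S.B; return true;
  case POPCNT: {
    uint32_t N = 0;
    for (uint32_t V = S.A; V; V &= V - 1) // clear lowest set bit
      ++N;
    S.Result = N;
    return true;
  }
  default: return false;
  }
}

// Pick the execute function once, when the core is configured.
ExecFn selectExecutor(unsigned Generation) {
  switch (Generation) {
  case 1: return executeGen1;
  case 2: return executeGen2;
  default: throw std::invalid_argument("unknown core generation");
  }
}
```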

Practically speaking, if something like this were written, I think
there's a good model to follow in how GlobalISel's reuse of TableGen
patterns has gone: a TableGen backend generates what it can for some
instructions, someone writes the missing parts by hand, and over time
more things move from hand-written to auto-generated.
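The generated-plus-hand-written split could look something like the
sketch below, loosely in the spirit of how a GlobalISel instruction
selector calls its TableGen-generated `selectImpl()` and handles the
rest in C++. `executeGenerated`, `executeManual` and the opcodes are
hypothetical names for illustration: the generated function reports
whether it handled an opcode, and anything it declines falls through to
hand-written code.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

enum Opcode { ADD, SUB, DIVMOD };

struct CPUState {
  uint32_t A = 0, B = 0, R0 = 0, R1 = 0;
};

// Stand-in for TableGen output: only opcodes whose semantics could be
// derived from patterns are handled; the rest report "not handled".
bool executeGenerated(CPUState &S, Opcode Op) {
  switch (Op) {
  case ADD: S.R0 = S.A + S.B; return true;
  case SUB: S.R0 = S.A - S.B; return true;
  default:  return false;
  }
}

// Hand-written semantics for what the generator can't express yet,
// e.g. an instruction with two results and a trap condition.
void executeManual(CPUState &S, Opcode Op) {
  switch (Op) {
  case DIVMOD:
    if (S.B == 0)
      throw std::domain_error("divide-by-zero trap");
    S.R0 = S.A / S.B;
    S.R1 = S.A % S.B;
    return;
  default:
    throw std::logic_error("opcode has no semantics, generated or manual");
  }
}

// Try the generated code first, fall back to the hand-written code.
void execute(CPUState &S, Opcode Op) {
  if (!executeGenerated(S, Op))
    executeManual(S, Op);
}
```

As semantics for more instructions become expressible in TableGen,
their cases move out of `executeManual` and into the generated function
without the caller changing.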

As for driving the simulator, I've found the "forked objdump" approach
works well, but if this were in-tree I'd expect the driver to be more
specialised (and likely written from scratch) to fit in. It may also be
possible to repurpose parts of LLDB's lldb-server, pulling in some
"MCSimulator" library, to get something that speaks RSP (the GDB Remote
Serial Protocol, for people using LLVM + GDB), but I'm not familiar
enough with all its components to know how feasible this would be.
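For a sense of what speaking RSP involves at the bottom layer: each
packet is framed as `$`, the payload, `#`, and a two-hex-digit checksum
that is the sum of the payload bytes modulo 256. This is a minimal
sketch of framing and checking only; escaping, run-length encoding,
`+`/`-` acknowledgements and the actual command set are left out, and
the function names are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>

// Checksum: sum of the payload bytes modulo 256.
uint8_t rspChecksum(const std::string &Payload) {
  uint8_t Sum = 0;
  for (unsigned char C : Payload)
    Sum = static_cast<uint8_t>(Sum + C);
  return Sum;
}

// Frame a payload as "$<payload>#<checksum as two hex digits>".
// Assumes the payload contains nothing that needs escaping.
std::string rspFrame(const std::string &Payload) {
  char Hex[3];
  std::snprintf(Hex, sizeof(Hex), "%02x",
                static_cast<unsigned>(rspChecksum(Payload)));
  return "$" + Payload + "#" + Hex;
}

// Parse a framed packet; returns the payload if it is well formed and
// the checksum matches, std::nullopt otherwise.
std::optional<std::string> rspUnframe(const std::string &Packet) {
  if (Packet.size() < 4 || Packet.front() != '$')
    return std::nullopt;
  std::size_t Hash = Packet.size() - 3;
  if (Packet[Hash] != '#')
    return std::nullopt;
  std::string Payload = Packet.substr(1, Hash - 1);
  unsigned Expected = 0;
  if (std::sscanf(Packet.c_str() + Hash + 1, "%2x", &Expected) != 1)
    return std::nullopt;
  if (Expected != rspChecksum(Payload))
    return std::nullopt;
  return Payload;
}
```

For example, GDB's "read all registers" request `g` goes over the wire
as `$g#67`, since `g` is 0x67 in ASCII.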

Overall I still think this would be a great addition to LLVM, and I
think the raw pieces are there. As with most things, I think this will
live or die by having people who will use and maintain it. I'm not sure
how many others also have interest or thoughts in this area, but I
would certainly welcome such an addition.

Thanks,
Simon

On Sun, Mar 21, 2021 at 7:45 PM John Byrd <jbyrd at giganticsoftware.com> wrote:
>
>
>> > And I realized that, although I could write an emulator in the traditional manner, tablegen already has most of the information it needs to automatically generate the guts of an emulator.
>>
>> > Simon Cook (CCed) previously used LLVM MC to help write a simulator <https://llvm.org/devmtg/2016-01/slides/fosdem16-aapsim.pdf>, which might be worth taking a look at. Though I understood from your email that you're imagining relying more heavily on TableGen for generating the execution loop.
>
>
> Thank you.  I think Cook understood what I was hinting at, towards the end of his presentation.  You could build such a simulator by creating a large switch statement based on MCInsts the way that Cook has done, or you could theoretically let tablegen create that switch for you... llvm-tblgen -gen-simulator was the way he put this idea.  At the least, the concept maintains tablegen's DRY approach to representing machine instructions.
>
> jwb