[llvm] [TableGen] fix tlbgen for EncodingByHwMode (PR #84906)

Sun Mar 17 15:18:31 PDT 2024

nvjle wrote:

> > > @nvjle @topperc Hello, could you please help me review my code.
> > 
> > 
> > @superZWT123 Please give me a day or two to review this code. From a very quick scan, it actually undoes some of the work I did recently (or at least does it differently)-- which may be fine-- but I need to examine carefully. I'm using this feature in a downstream production backend with a complicated ISA (generously speaking) and even more complicated encoding schemes (100+ decoder tables). It tends to exercise the encode/decode heavily.
> > Also, my previous patch to suppress duplicates had a command line option so that existing downstream backends could opt-in to that feature-- at least temporarily. That is, a downstream backend could rebase and their disassembler would not need any changes due to decoder table naming. Something like that would probably be good to keep initially.
> > All of that said, I am certainly glad to see others helping to production-harden this feature.
> 
> Hi [nvjle](https://github.com/nvjle). I hope all is well with you. Since you mentioned needing a day or two to carefully review my code, three days have now passed, and I’m curious if you’ve had the opportunity to complete your review. Are there any suggestions or adjustments you’d like me to make? If not, may I proceed with merging my code into the main branch? I look forward to your valuable feedback. Thanks!

Hi @superZWT123,
I've done some reviewing and done a trial incorporation into a production downstream backend.
To summarize briefly, there are some significant problems with this patch in its current form. The first problem is that, in the mentioned downstream backend with these decoder changes applied, all decoder duplicate suppression is reverted compared to the existing duplicate suppression. This definitely can not be merged as is. The second problem is that with the encoding changes, I'm seeing miscompiles-- but I've not had time to investigate precisely what is happening there (enormous 200,000+ line TD files, complicated encoding schemes, binary diffs-- TBD).  

Patches to the decode and encode can be subtle-- especially in larger backends. Here are some specific requests and observations:
1. Typically in LLVM we adhere to this policy for submitting patches: https://llvm.org/docs/Contributing.html#how-to-submit-a-patch . Specifically, independent/orthogonal changes should be separate patches. This patch incorporates two completely independent issues (even though both rely on `EncodingByHwMode`) and each is subtle and needs to be considered separately. In this patch, I would respectfully request that it be split into two orthogonal parts-- one for the `CodeEmitterGen`+`SubtargetEmitter`+`RegisterInfoEmitter` set of changes and the other for the `DecoderEmitter` changes. This will allow detailed attention to each (which they need) and independent timelines for potential merging of each (and likewise for downstream backends-- they can independently incorporate/test/revert orthogonal changes). In any case, I'll give some initial thoughts below which can go into the follow-up patches preemptively for quicker turnaround.
2. The decoder emitter changes here essentially undo/cancel much of the existing support for duplicate suppression (in our backend, 100% of duplicates are no longer detected). It also undoes the command line option to opt out of suppression. This means any existing downstream backend that already relies on `EncodingByHwMode` may need to have its decoder tables altered. It is better in the short term to allow targets to decide whether or not to apply suppression-- particularly in production backends who are necessarily conservative or just want to change on their own timeline (consider downstream backends who pull-in upstream changes daily/weekly on an automated basis). This can be further discussed in a separate decoder patch, but at the moment, this is significantly worse than that current state of affairs. The good news is, I think by merely using "" (empty name) instead of `AllModes` as the shared/common decoder table name in the existing code, you would have the effect you are trying to achieve (I used that name to make it explicit in the generated tables that the table applies to all instructions that do not have a per-HwMode override, but a blank name is perfectly acceptable too-- specifically for the case when a backend did not previously have `EncodingByHwMode`, but will add it).
4. I see some obvious inefficiencies and coding standard nits (https://llvm.org/docs/CodingStandards.html) that you could attend to while splitting the patch. For example, in the generated `getRegInfosHwModeId`, there is an entire loop nest to simply detect that a single bit is set. That can be replaced entirely with a single bitwise expression (or use `llvm::has_single_bit` + zero check). A number of comments need to be updated (all comments should be standard prose/caps/full-stop/etc).
5. A simple TableGen unit test is needed for the reginfo/subtarget-emitters changes.  Also, I think it might make sense to have two different APIs for this since `getHwMode` can now return a set. Maybe the original for callers who expect exactly one live mode at a given moment (the original intent, as far as I know), and a second `getHwModeSet` for callers expecting a bit vector of live modes. If not that, at the very least, some comments need to be added explaining what this API now returns for future readers of this and related code.

I would say 3 days of review time is not terrible in LLVM time, especially for this sort of patch. I will certainly try to turn around as quickly as possible-- with the emitter changes being likely quicker to turnaround.

FWIW, in recent changes to TableGen, I've been very careful myself about making narrow, isolated changes. This helps reviewers more easily understand and review the code (lessen cognitive load for people who largely work full time and need to context switch). It also makes it easier to pinpoint any problems later and revert/change relatively small amounts of code. This does require a bit more effort to upstream patches from downstream/private backends, but this is exactly what all of us "living downstream" are doing.

https://github.com/llvm/llvm-project/pull/84906