[PATCH] D93849: [RISCV] Define vmclr.m/vmset.m intrinsics.

Mon Dec 28 11:51:43 PST 2020

frasercrmck accepted this revision.
frasercrmck added a comment.
This revision is now accepted and ready to land.

In D93849#2472560 <https://reviews.llvm.org/D93849#2472560>, @craig.topper wrote:

> If we expand them early there's no guarantee the register allocator would pick the same register for the inputs as the output. Our downstream repo, and probably BSC's repo, were expanding them in a custom inserter and putting the same vreg on the inputs and the outputs, with the input operands marked undef. That seemed to work, but I'm surprised it isn't flagged as a violation of SSA by the machine verifier. Expanding after register allocation allows us to force the source registers explicitly.
>
> I would imagine a high-performance out-of-order processor would want to detect these instructions as a special case of xor/xnor and not read the data from the input register. Whether they would check that the inputs and the output are the same register, or just that the inputs are the same register, I'm not sure. X86 CPUs for many generations have recognized xor eax, eax as a special zero idiom that doesn't have a dependency on the previous value of eax. The X86 backend also uses pseudos and a post-RA expansion. For scheduling, X86 has various WriteZero classes the pseudos are assigned to, and I believe there is special code in most of the scheduler models to detect xor with the same input post-RA.

I similarly don't understand how the pre-RA `undef` inputs managed to "just work".

That's really interesting to learn about the X86 aliases/idioms, thanks. The processors I've worked on have been rather different so that might explain where I was coming from (see below).
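
To spell out why the value read from the input register doesn't matter, here is a minimal C++ sketch. It assumes a mask register can be modelled as a 64-bit integer, and that the two intrinsics expand to an xor and an xnor of a register with itself, as Craig describes. The helper names (`mask_xor`, `mask_xnor`, `vmclr_model`, `vmset_model`) are hypothetical and are not LLVM or RVV APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a mask register is treated as a plain 64-bit value.
constexpr std::uint64_t mask_xor(std::uint64_t a, std::uint64_t b) {
  return a ^ b;
}
constexpr std::uint64_t mask_xnor(std::uint64_t a, std::uint64_t b) {
  return ~(a ^ b);
}

// Model of the post-RA expansion: both sources are the destination register,
// so the argument is whatever stale value the register held beforehand.
constexpr std::uint64_t vmclr_model(std::uint64_t vd) { return mask_xor(vd, vd); }
constexpr std::uint64_t vmset_model(std::uint64_t vd) { return mask_xnor(vd, vd); }

// The result is the same for any previous register contents.
static_assert(vmclr_model(0) == 0, "clear from zero");
static_assert(vmclr_model(0xDEADBEEFULL) == 0, "clear from arbitrary value");
static_assert(vmset_model(0) == ~std::uint64_t{0}, "set from zero");
static_assert(vmset_model(0xDEADBEEFULL) == ~std::uint64_t{0}, "set from arbitrary value");
```

The identity only holds when both sources are the same register, which is exactly what expanding after register allocation guarantees.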

In D93849#2472580 <https://reviews.llvm.org/D93849#2472580>, @khchen wrote:

> I did it this way just because I think a PseudoInst should be expanded after RA,
> which can ensure the source and dest use the same register (non-SSA). I haven't
> considered the pros/cons yet.
> Could you please share whether any target also expands PseudoInsts in the
> ISel custom inserter? I'm wondering whether, in order to get more precise schedule info,
> any target would ideally prefer to expand pseudos as early as possible.
> Maybe another way is to add schedule info for those pseudo insts?

My previous (downstream) project had a complex auto-generated per-operand scheduling model with all sorts of bypasses, all to satisfy an exposed pipeline. It was therefore important for us to have information about the underlying instruction that was as accurate as possible, as early as possible, to avoid hundreds of special cases and copy/paste code in the rest of the compiler. So our general approach was to limit the number of pseudos that survived past pre-RA scheduling. I can't point to an upstream target that does this. I haven't yet familiarised myself with the RISCV scheduling info, so I can't be of much help about the best approach there. If Craig's right, then it sounds like it's up to each RISC-V implementation whether these aliases are "special" from the processor's point of view or just assembler syntactic sugar.

I think Craig's comments help resolve the matter about where to do this expansion. So LGTM!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93849/new/

https://reviews.llvm.org/D93849