[PATCH] D45994: AMDGPU/GlobalISel: Enable TableGen'd instruction selector

Wed May 9 11:03:13 PDT 2018

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:478
+AMDGPUInstructionSelector::selectVSRC0(MachineOperand &Root) const {
+  auto Res = selectImm(Root);
+  if (Res != None)
----------------
tstellar wrote:
> tstellar wrote:
> > tstellar wrote:
> > > nhaehnle wrote:
> > > > arsenm wrote:
> > > > > We don't fold immediates in the DAG selector now, so why do this here?
> > > > In fact, I remember there are some edge cases where folding immediates directly doesn't work due to moveToVALU. Basically, if you initially generate an instruction like S_ADD with an immediate and that instruction then gets moved to VALU it becomes a VOP3-encoded V_ADD which cannot have an immediate (because of the restriction that instructions can be at most 64 bits long), leading to machine instruction verifier errors.
> > > > 
> > > > The question is whether the same issue may apply here when going through GlobalISel in corner cases, even if we try to be better at selection the correct ALU directly. I think it makes sense to be a bit conservative initially.
> > > > We don't fold immediates in the DAG selector now, so why do this here?
> > > 
> > > Because we can?  Ideally doing folding in during ISel would allow us to eliminate or greatly simplify the SIFoldOperands pass, but admittedly, global-isel is so new I don't know if that would be possible.   I can go either way on folding during ISel, but it would at least be nice to have a target feature to enable this.
> > > 
> > > How do you feel about selecting directly to _e32 variants when we can instead of _e64, which also something that global-isel is doing?
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > >Because we can? Ideally doing folding in during ISel would allow us to eliminate or greatly simplify the SIFoldOperands pass, but admittedly, global-isel is so new I don't know if that would be possible. I can go either way on folding during ISel, but it would at least be nice to have a target feature to enable this.
> > 
> > Actually, thinking about this more I would prefer to have global-isel do immediate folding by default with a target option to disable it.  global-isel is still very experimental for AMDGPU, and I would like to take advantage of as many new features as possible while it's still in the experimental state.  This will also help us learn what's possible and what isn't.
> In theory global-isel won't just be better at selecting the correct ALU, it will *always* select the correct ALU, so we shouldn't have to worry about corner cases like this,  but this is all in theory until we actually have a more complete global-isel implementation.
I don't particularly like either. I don't think doing these during selection eliminates the need to do these things later. Other passes will be introducing and changing instructions, which will introduce new places these need to happen. It's true most of the places that need this now are junk created by SIFixSGPRCopies hacks.

I also think for good debug info constants are supposed to be materialized into a register before use. There are also multiple use and code size considerations when an immediate is used multiple times, and it's probably not appropriate for a single instruction pattern to be considering those, and a proper optimization pass would have to do it.

There is some stuff that has become problematic we do now in SIFoldOperands (the way clamp folding is implemented for example starts to break down with packed operations) that would be better to move when we have more semantically pure pseudoinstructions. Similarly, SDWA would probably be easier do handle during isel (although it has some of the other code size consideration issues as immediates)

Doing folding (especially for encoding shrinking) must be done later, since things like frame index elimination introduce new constants to deal with very late.

Repository:
  rL LLVM

https://reviews.llvm.org/D45994