[llvm-dev] Dealing with illegal operand mappings in RegBankSelect

Wed Feb 27 11:58:32 PST 2019

Thanks for the details, Matt. Now I think I can suggest an alternative, or provide a more reasonable explanation of why what you’re doing is right!

Assuming we stick to your current approach, it makes sense to mark the operation legal for vgpr because, from RBS’s point of view, the boundaries of the expansion of the instruction are indeed vgpr.

Now, if we want to leverage the infrastructure provided by RBS for this case, a possible alternative would require unrolling the loop (or using a pseudo that would fake the loop unrolling).
Here is what it would look like:
Essentially we would replace:
```
= use <NxsM> vgpr
```
with
```
sgpr1, …, sgprN = extract_reg vgpr
= use sM sgpr1
…
= use sM sgprN
```
In terms of mapping, you would describe vgpr as being broken down into N sgpr elements. The applyMapping would insert the `extract_reg` for repairing and expand the vector use into the scalar uses (or one pseudo with N sgprs as input).
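
As a rough illustration of that rewrite, here is a small Python model that turns one vector use into an `extract_reg` plus N scalar uses. It only manipulates strings that mirror the snippets above; `scalarize_use` and its parameters are hypothetical names for this sketch and are not part of the RegBankSelect API.

```python
from typing import List


def scalarize_use(vgpr: str, num_elts: int, elt_bits: int) -> List[str]:
    """Model rewriting `= use <N x sM> vgpr` into one extract plus N scalar uses.

    Hypothetical helper for illustration only; it emits text, not real MIR.
    """
    if num_elts < 1:
        raise ValueError("need at least one element")
    # One scalar register per vector element: sgpr1 ... sgprN.
    sgprs = [f"sgpr{i}" for i in range(1, num_elts + 1)]
    # Repairing code: break the vector register into N scalar registers.
    insts = [f"{', '.join(sgprs)} = extract_reg {vgpr}"]
    # Expanded use: one scalar use per extracted element.
    insts += [f"= use s{elt_bits} {reg}" for reg in sgprs]
    return insts


if __name__ == "__main__":
    for inst in scalarize_use("vgpr", 4, 32):
        print(inst)
```

The single `extract_reg` line plays the role of the repairing code, and the remaining lines are the expanded use.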

Honestly, I don’t know how much benefit you would get from exposing these details to RBS today.
The advantages I see moving forward are:
- At some point I wanted to teach RBS to materialize the repairing code next to the definition (as opposed to next to the use, as we do right now) so that we can reuse that repairing code for something else (what you’ve pointed out in your previous reply).
- In the future RBS is supposed to be smart enough to decide that something needs to be scalarized because its uses are cheaper that way. I.e., in your case, the definition of vgpr could be scalarized from the start and we wouldn’t have to insert this repairing code (in other words, RBS would have chosen to scalarize the def of the vgpr into sgpr instead of repairing the use of the vgpr into sgpr).

Therefore, if we choose not to expose these details to RBS, we shut the door on potential improvements in how the repairing points are inserted/shared and in how the cost model chooses the best instructions based on their uses.

Obviously, I haven’t spent a lot of time thinking about your case, hence it could be completely bogus!

Cheers,
-Quentin

> On Feb 26, 2019, at 4:58 PM, Matt Arsenault <arsenm2 at gmail.com> wrote:
> 
> 
> 
>> On Feb 26, 2019, at 7:46 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>> 
>>> The only use I would have for the copy is as a means of passing which registers were already created for the new mapping, after which point I would need to delete it.
>> 
>> Could you describe in pseudo code what the expansion of vgpr into sgpr looks like?
>> e.g., = use vgpr
>> And you only support = use sgpr 
>> 
> 
> It’s serializing the vector operation. There’s an additional optimization to reduce the number of loop iterations when multiple work items/lanes/threads have the same value in them, which happens in practice, but essentially it does:
> 
> Save Execution Mask
> For (Lane : Wavefront/Warp) {
>   Enable Lane, Disable all other lanes
>   SGPR = read SGPR value for current lane from VGPR
>   VGPRResult[Lane] = use_op SGPR
> }
> Restore Execution Mask
> 
> Eventually it might be nice to have optimizations to only emit one of these loops when multiple consecutive instructions need the same register handled (which I suspect will happen frequently with image samplers), but I haven’t really thought about what that should look like yet.
> 
> -Matt
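
The loop in the quoted pseudo code, together with the optimization for lanes that share a value, can be modeled in a few lines of Python. This is only a sketch under stated assumptions: lanes are list slots, the execution mask is a list of booleans, and `waterfall` and `use_op` are hypothetical names, not AMDGPU or GlobalISel APIs.

```python
from typing import Callable, List, Optional, Tuple


def waterfall(
    vgpr: List[int],
    exec_mask: List[bool],
    use_op: Callable[[int], int],
) -> Tuple[List[Optional[int]], int]:
    """Serialize a scalar-only operation over the active lanes of a vector value.

    Returns the per-lane results (None for inactive lanes) and the number of
    loop iterations. Hypothetical model for illustration only.
    """
    if len(vgpr) != len(exec_mask):
        raise ValueError("vgpr and exec_mask must have the same number of lanes")
    # "Save execution mask": work on a copy so the caller's mask is untouched.
    remaining = list(exec_mask)
    result: List[Optional[int]] = [None] * len(vgpr)
    iterations = 0
    while any(remaining):
        lane = remaining.index(True)  # first lane still to be handled
        sgpr = vgpr[lane]             # read the scalar value held by that lane
        scalar_result = use_op(sgpr)  # the operation only accepts a scalar
        # Optimization: service every remaining lane holding the same value.
        for i in range(len(vgpr)):
            if remaining[i] and vgpr[i] == sgpr:
                result[i] = scalar_result
                remaining[i] = False
        iterations += 1
    # "Restore execution mask": nothing to undo, only the copy was modified.
    return result, iterations


if __name__ == "__main__":
    print(waterfall([7, 7, 3, 7], [True, True, True, False], lambda x: x * 2))
```

In this model the iteration count equals the number of distinct values among the active lanes, rather than the number of active lanes.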
