[llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

Thu Jan 14 16:41:57 PST 2016

Hi,

As part of the effort to bring up GlobalISel, I would like your feedback on the best way to map LLVM IR values to MachineInstr values (virtual registers), in particular when aggregate types are involved.

I am looking for a long-term solution.
The short-term plan is to replicate the SDAG solution.

** Context **

The first step of GlobalISel is to translate the LLVM IR into the MachineInstr representation. During this process, we need to be able to tell where each LLVM IR value lives in its machine representation, so that we can feed the right operands to the related machine instructions.

Right now, in SelectionDAG, we have a mapping from one Value* to one SDValue. But this is not the whole story!

** Problem With Aggregate Types **

* Facts *

- Aggregate types cannot be expressed with a single EVT (Extended Value Type)
- SDValues are typed with a single EVT.

Therefore, values with aggregate types need to be broken apart to be represented by SDValues.

* Implications *

- Instructions that may accept aggregate types, e.g., load, store, and select, must be split to handle the different components that compose a Value. (Look for the uses of ComputeValueVTs in the SelectionDAGBuilder and the related loops this implies.)
- The use of aggregate types generates a bunch of MERGE_VALUE nodes, whose sole purpose is to glue together all the components that make up the aggregate type.

Therefore, in practice, values with aggregate types are mapped to several SDValues hidden in MERGE_VALUE nodes.

* Summary *

Values with aggregate types map to a list of SDValues and, consequently, the handling of instructions is uneven between the ones that need to support aggregate types and the ones that do not.
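
To make the "one Value, several components" point concrete, here is a minimal sketch of the kind of flattening that ComputeValueVTs performs. It is only an illustration: ToyType, flattenLeaves, and flatten are hypothetical names over a toy type model, not LLVM's Type hierarchy or the actual SelectionDAGBuilder code.

```cpp
#include <utility>
#include <vector>

// Toy type model (hypothetical, NOT llvm::Type): either a scalar leaf with a
// bit width, or a struct made of nested fields.
struct ToyType {
  unsigned ScalarBits = 0;     // non-zero only for scalar leaves
  std::vector<ToyType> Fields; // non-empty only for structs

  static ToyType scalar(unsigned Bits) {
    ToyType T;
    T.ScalarBits = Bits;
    return T;
  }
  static ToyType aggregate(std::vector<ToyType> Fs) {
    ToyType T;
    T.Fields = std::move(Fs);
    return T;
  }
};

// Appends the bit width of every scalar leaf of T, in declaration order.
void flattenLeaves(const ToyType &T, std::vector<unsigned> &Out) {
  if (T.ScalarBits != 0) {
    Out.push_back(T.ScalarBits);
    return;
  }
  for (const ToyType &F : T.Fields)
    flattenLeaves(F, Out);
}

// Convenience wrapper: one entry per component the value is split into.
std::vector<unsigned> flatten(const ToyType &T) {
  std::vector<unsigned> Out;
  flattenLeaves(T, Out);
  return Out;
}
```

With this model, a { i16, i16 } value flattens to two 16-bit components, whereas a plain i32 flattens to a single one. Every instruction that can take an aggregate has to loop over that list.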

** Possible Solutions **

* A. Replicate SDAG Solution *

Not much to add to the title. One can see that in the sketch of the patch for the IRTranslator (from the original GlobalISel email http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html), we have a map from a Value to a list of virtual registers.

Pros:
- We know it worked for SDAG.
- We know backends would generate reasonable code from that.

Cons:
- We need to pay (compile time, memory consumption) for the extra logic even for every non-aggregate type.
- Translation is not homogeneous between, say, add and store.
- Premature splitting that needs to be fixed later(?), e.g., by merging consecutive stores.
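
As a rough illustration of option A, the map from a Value to a list of virtual registers could look like the sketch below. This is not the data structure from the IRTranslator patch: ValueId, VReg, and ValueToVRegs are hypothetical stand-ins, with plain integers replacing Value pointers and register numbers.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

using ValueId = unsigned; // hypothetical stand-in for a pointer to an IR Value
using VReg = unsigned;    // hypothetical stand-in for a virtual register number

class ValueToVRegs {
  std::unordered_map<ValueId, std::vector<VReg>> Map;
  VReg NextVReg = 0;

public:
  // Returns the vregs holding V, creating NumParts fresh ones on first use.
  // A scalar value asks for one part; an aggregate asks for one per component.
  const std::vector<VReg> &getOrCreate(ValueId V, unsigned NumParts) {
    auto It = Map.find(V);
    if (It != Map.end()) {
      assert(It->second.size() == NumParts && "component count changed");
      return It->second;
    }
    std::vector<VReg> &Regs = Map[V];
    for (unsigned I = 0; I != NumParts; ++I)
      Regs.push_back(NextVReg++);
    return Regs;
  }
};
```

Note that even a scalar value gets its own one-element list here, which is the compile-time and memory cost listed in the first con above.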

* B. Map Aggregate Type to One (Big) Virtual Register *

Have one big virtual register for the whole aggregate type. Only the size of the register matters for the instructions that handle aggregate types, so we can still use EVT for virtual registers.

Pros:
- Translation is fast.
- Code is easy to understand.
- No premature splitting, for instance for load instructions.

Cons:
- Useless bits (padding) are carried around and may be hard to eliminate.
- Exposes weird types to legalization more often.
- Backends may not cope very well with that kind of code.
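
To give a feel for the padding concern in option B, the sketch below computes how wide the single big register would be for a struct of scalars, and how many of its bits are padding. It is a toy model under loud assumptions: every field is a power-of-two width of at least 8 bits and is naturally aligned (alignment equals size). A real implementation would ask the target's data layout instead. Layout and layoutOfStruct are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct Layout {
  unsigned SizeInBits;  // width of the one big virtual register
  unsigned PaddingBits; // bits in that register that carry no data
};

// Toy layout computation. ASSUMPTION: natural alignment, power-of-two field
// widths >= 8 bits. This is not how a real target's layout is queried.
Layout layoutOfStruct(const std::vector<unsigned> &FieldBits) {
  unsigned Offset = 0, MaxAlign = 8, Payload = 0;
  for (unsigned Bits : FieldBits) {
    assert(Bits >= 8 && (Bits & (Bits - 1)) == 0 && "toy model limitation");
    Offset = (Offset + Bits - 1) / Bits * Bits; // round up to field alignment
    Offset += Bits;
    Payload += Bits;
    if (Bits > MaxAlign)
      MaxAlign = Bits;
  }
  // Tail padding: round the total size up to the largest field alignment.
  unsigned Size = (Offset + MaxAlign - 1) / MaxAlign * MaxAlign;
  return {Size, Size - Payload};
}
```

Under these assumptions, { i16, i16 } fits in a 32-bit register with no padding, while { i8, i32 } needs a 64-bit register of which 24 bits are padding that get carried through every copy.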

* C. Ideas *

Any other ideas?

** Feedback Needed **

I would like your feedback on how we should model aggregate types during the IR translation with respect to the mapping to their machine representation.

This is not a blocker since, at first, I will replicate the SDAG solution, but I want us to have a conversation about what the best long-term solution is.

I will file a PR so that we remember to come back to this part of the prototype's implementation once we agree on what the solution should be and start looking at productizing the selector.

** Example **

Consider the following LLVM IR input:
%struct.bar = type { i16, i16 }

define void @foo(i16 signext %a, i16 signext %b, %struct.bar* nocapture %addr) #0 {
entry:
  %tmp = insertvalue %struct.bar undef, i16 %a, 0
  %tmp2 = insertvalue %struct.bar %tmp, i16 %b, 1
  store %struct.bar %tmp2, %struct.bar* %addr, align 4
  ret void
}

Which is roughly:
struct bar {
  short a;
  short b;
};

void foo(short a, short b, struct bar *addr) {
  struct bar tmp;
  tmp.a = a;
  tmp.b = b;
  *addr = tmp;
}

* Solution A: Replicate SDAG *

Note: (#) is the size of the virtual register.

- Translation:

arg1(32) = copy R0
arg2(32) = copy R1
addr(32) = copy R2
a(16) = truncate arg1(32)
b(16) = truncate arg2(32)
tmp5(16), tmp6(16) = merge_value a(16), b(16)
store tmp5(16), addr(32), 0
store tmp6(16), addr(32), 2

- Legalization:
arg1(32) = copy R0
arg2(32) = copy R1
addr(32) = copy R2
tmp5(16) = extract_subreg arg1(32), 0
tmp6(16) = extract_subreg arg2(32), 0
store tmp5(16), addr(32), 0
store tmp6(16), addr(32), 2

* Solution B: One to one mapping Value to Vreg *

- Translation:

arg1(32) = copy R0
arg2(32) = copy R1
addr(32) = copy R2
a(16) = truncate arg1(32)
b(16) = truncate arg2(32)
tmp(32) = concat a(16), b(16)
store tmp(32), addr

- Legalization:
arg1(32) = copy R0
arg2(32) = copy R1
addr(32) = copy R2
a(32) = and arg1(32), 0xFFFF
b(32) = and arg2(32), 0xFFFF
tmp(32) = concat a(32/16), b(32/16) ; <-- although the input registers are 32 bits, we are really only interested in the low 16 bits of both a and b.
store tmp(32), addr ; <-- legal

The problem now is: how do we legalize the concat with generic code?
I.e., we’ll end up with either load/store sequences (maybe storing i16 is not even legal) or SHIFT and OR, and that may be painful to optimize later.
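
For reference, the SHIFT and OR lowering of the concat would amount to something like the sketch below. It assumes a little-endian layout with a in the low half, and it models memory as explicit bytes so that it does not depend on the host. concatShiftOr, storeWide, and storeSplit are hypothetical helper names, not generic opcodes.

```cpp
#include <array>
#include <cstdint>

// concat a(32/16), b(32/16) lowered with AND, SHIFT and OR.
// ASSUMPTION: little-endian layout, so field 0 (a) lands in the low 16 bits.
uint32_t concatShiftOr(uint32_t A, uint32_t B) {
  return (A & 0xFFFFu) | ((B & 0xFFFFu) << 16);
}

using Bytes = std::array<uint8_t, 4>;

// Solution B: one 32-bit little-endian store of the concatenated value.
Bytes storeWide(uint32_t V) {
  return {uint8_t(V), uint8_t(V >> 8), uint8_t(V >> 16), uint8_t(V >> 24)};
}

// Solution A: two 16-bit little-endian stores at byte offsets 0 and 2.
Bytes storeSplit(uint16_t A, uint16_t B) {
  return {uint8_t(A), uint8_t(A >> 8), uint8_t(B), uint8_t(B >> 8)};
}
```

Both paths leave the same four bytes in memory, e.g., storeWide(concatShiftOr(0x1234, 0x5678)) matches storeSplit(0x1234, 0x5678). The question is whether later passes can see through the mask, shift, and or to recover the two independent halves.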

Thanks,
-Quentin