[llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

Thu Jan 14 16:53:18 PST 2016

----- Original Message -----
> From: "Quentin Colombet via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Thursday, January 14, 2016 6:41:57 PM
> Subject: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to	MachineInstr translation for aggregate type
> 
> 
> Hi,
> 
> 
> As part of the effort to bring up GlobalISel, I would like your
> feedbacks on the best way to map LLVM IR values into MachineInstr
> values (virtual registers), in particular when aggregate types get
> involved.
> 
> 
> I am looking for a long term solution.
> Short term is to replicate SDAG solution.
> 
> 
> 
> 
> ** Context **
> 
> 
> The first step of GlobalISel is to translate the LLVM IR into
> MachineInstr representation. During this process we need to be able
> to tell where are the different LLVM IR values with respect to their
> machine representation, such that we can feed the right element to
> the related machine instructions.
> 
> 
> Right now, in SelectionDAG, we have a mapping from one Value* to one
> SDValue. But this is not the whole story!
> 
> 
> 
> 
> ** Problem With Aggregate Types **
> 
> 
> * Facts *
> 
> 
> - Aggregate types cannot be expressed with a single EVT (Extended
> Value Type)
> - SDValues are typed with a single EVT.
> 
> 
> Therefore, values with aggregate types need to be break apart to be
> represented by a SDValue.
> 
> 
> * Implications *
> 
> 
> - Instructions that may accept aggregate type, e.g., load, store, and
> select, must be split to handle the different component that compose
> a Value. (Look for the use of ComputeValueVTs in the
> SelecionDAGBuilder and the related loops this implies).
> - Use of aggregate type generates a bunch of MERGE_VALUE nodes, which
> sole purpose is to glue all the components that make the aggregate
> type together.
> 
> 
> Therefore, in practice, values with aggregate types are mapped to
> several SDValue hidden in MERGE_VALUE nodes.
> 
> 
> * Summary *
> 
> 
> Values with aggregate type map to a list of SDValue and,
> consequently, the handling of the instructions is uneven between the
> ones that need to support aggregate type and the ones that do not.
> 
> 
> 
> 
> ** Possible Solutions **
> 
> 
> * A. Replicate SDAG Solution *
> 
> 
> Not much to add from the title. One can see that in the sketch of
> patch for the IRTranslator (from the original GlobalISel email
> http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html
> ), we have a map from a Value to a list of virtual registers.
> 
> 
> Pros:
> - We know it worked for SDAG.
> - We know backends would generate reasonable code from that.
> 
> 
> Cons:
> - We need to pay (compile time, memory consumption) for the extra
> logic for every non-aggregate type.
> - Translation is not homogeneous between, say, add and store.
> - Premature splitting that need to be fixed later(?), e.g., merge
> consecutive stores.
> 
> 
> 
> 
> * B. Map Aggregate Type to One (Big) Virtual Register *
> 
> 
> Have one big virtual register for the whole aggregate type. Only the
> size of the register matters for the instructions that handle
> aggregate type, so we can still use EVT for virtual registers.
> 
> 
> Pros:
> - Translation is fast.
> - Code is easy to understand.
> - No premature splitting for instance for load instructions.
> 
> 
> Cons:
> - Useless bits (padding) are carried around and may be hard to
> eliminate.

We could associate some metadata with the register somehow to represent the padding bits (we already do something like this for memcpys).

> - Expose weird types to legalization more often.
> - Backends may not cope very well with that kind of code.
> 

Given that every backend is going to need major work to truly make this transition, I'd advise not to worry too much about your last item.

Seems like this second option is better than replicating the SDAG solution, so I recommend we pursue that.

 -Hal

> 
> * C. Ideas *
> 
> 
> Any other ideas?
> 
> 
> 
> 
> ** Feedback Needed **
> 
> 
> I want to have your feedbacks on how we should model aggregate types
> during the IR translation with respect to the mapping to their
> machine representation.
> 
> 
> This is not a road blocker as, at first, I will replicate SDAG
> solution, but I want we have a conversation on what is the best long
> term solution.
> 
> 
> I will file a PR so that we remember we would need to come back to
> this part of the implementation of the prototype when we agreed on
> what the solution should be and look at productizing the selector.
> 
> 
> 
> 
> ** Example **
> 
> 
> Consider the following LLVM IR input:
> 
> %struct.bar = type { i16, i16 }
> 
> 
> define void @foo(i16 signext %a, i16 signext %b, %struct.bar*
> nocapture %addr) #0 {
> entry:
> %tmp = insertvalue %struct.bar undef, i16 %a, 0
> %tmp2 = insertvalue %struct.bar %tmp, i16 %b, 1
> store %struct.bar %tmp2, %struct.bar* %addr, align 4
> ret void
> }
> 
> 
> Which is roughly:
> 
> struct bar {
> short a;
> short b;
> };
> 
> 
> void foo(short a, short b, struct bar *addr) {
> struct bar tmp;
> tmp.a = a;
> tmp.b = b;
> *addr = tmp;
> }
> 
> 
> * Solution A: Replicate SDAG *
> 
> 
> Note: (#) is the size of the virtual register.
> 
> 
> - Translation:
> 
> 
> 
> 
> arg1(32) = copy R0
> arg2(32) = copy R1
> addr(32) = copy R2
> a(16) = truncate arg1(32)
> b(16) = truncate arg2(32)
> tmp5(16), tmp6(16) = merge_value a(16), b(16)
> store tmp5(16), addr(32), 0
> store tmp6(16), addr(32), 4
> 
> 
> - Legalization:
> 
> 
> 
> arg1(32) = copy R0
> arg2(32) = copy R1
> addr(32) = copy R2
> tmp5(16) = extract_subreg a(32), 0
> 
> tmp6(16) = extract_subreg b(32), 0
> store tmp5(16), addr(32), 0
> store tmp6(16), addr(32), 4
> 
> 
> 
> 
> * Solution B: One to one mapping Value to Vreg *
> 
> 
> - Translation:
> 
> 
> arg1(32) = copy R0
> arg2(32) = copy R1
> addr(32) = copy R2
> a(16) = truncate arg1(32)
> b(16) = truncate arg2(32)
> tmp(32) = concat a(16), b(16)
> store tmp(32), addr
> 
> 
> - Legalization:
> 
> 
> 
> arg1(32) = copy R0
> arg2(32) = copy R1
> addr(32) = copy R2
> a(32) = and arg1(32), 0xFFFF
> b(32) = and arg2(32), 0xFFFF
> tmp(32) = concat a(32/16), b(32/16) ; <— although the input registers
> are 32bits we are really only interested in the 16bit low part of
> both a and b.
> store tmp(32), addr ; <— legal
> 
> 
> The problem now is how do we legalize that thing with generic code?
> I.e., we’ll end up with either load/store sequences (maybe storing
> i16 is not even legal) or SHIFT and OR and that may be painful to
> optimize later.
> 
> 
> Thanks,
> -Quentin
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory