[llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

Fri Jan 15 11:43:58 PST 2016

> On Jan 15, 2016, at 11:41 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> 
> From: "Quentin Colombet" <qcolombet at apple.com <mailto:qcolombet at apple.com>>
> To: "Hal Finkel" <hfinkel at anl.gov <mailto:hfinkel at anl.gov>>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
> Sent: Friday, January 15, 2016 1:37:50 PM
> Subject: Re: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type
> 
> Hi Hal,
> 
> Thanks for your quick reply.
> 
> On Jan 14, 2016, at 4:53 PM, Hal Finkel <hfinkel at anl.gov <mailto:hfinkel at anl.gov>> wrote:
> 
> 
> From: "Quentin Colombet via llvm-dev" <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
> To: "llvm-dev" <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>>
> Sent: Thursday, January 14, 2016 6:41:57 PM
> Subject: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to	MachineInstr translation for aggregate type
> 
> 
> Hi,
> 
> 
> As part of the effort to bring up GlobalISel, I would like your
> feedbacks on the best way to map LLVM IR values into MachineInstr
> values (virtual registers), in particular when aggregate types get
> involved.
> 
> 
> I am looking for a long term solution.
> Short term is to replicate SDAG solution.
> 
> 
> 
> 
> ** Context **
> 
> 
> The first step of GlobalISel is to translate the LLVM IR into
> MachineInstr representation. During this process we need to be able
> to tell where are the different LLVM IR values with respect to their
> machine representation, such that we can feed the right element to
> the related machine instructions.
> 
> 
> Right now, in SelectionDAG, we have a mapping from one Value* to one
> SDValue. But this is not the whole story!
> 
> 
> 
> 
> ** Problem With Aggregate Types **
> 
> 
> * Facts *
> 
> 
> - Aggregate types cannot be expressed with a single EVT (Extended
> Value Type)
> - SDValues are typed with a single EVT.
> 
> 
> Therefore, values with aggregate types need to be break apart to be
> represented by a SDValue.
> 
> 
> * Implications *
> 
> 
> - Instructions that may accept aggregate type, e.g., load, store, and
> select, must be split to handle the different component that compose
> a Value. (Look for the use of ComputeValueVTs in the
> SelecionDAGBuilder and the related loops this implies).
> - Use of aggregate type generates a bunch of MERGE_VALUE nodes, which
> sole purpose is to glue all the components that make the aggregate
> type together.
> 
> 
> Therefore, in practice, values with aggregate types are mapped to
> several SDValue hidden in MERGE_VALUE nodes.
> 
> 
> * Summary *
> 
> 
> Values with aggregate type map to a list of SDValue and,
> consequently, the handling of the instructions is uneven between the
> ones that need to support aggregate type and the ones that do not.
> 
> 
> 
> 
> ** Possible Solutions **
> 
> 
> * A. Replicate SDAG Solution *
> 
> 
> Not much to add from the title. One can see that in the sketch of
> patch for the IRTranslator (from the original GlobalISel email
> http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html <http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html>
> ), we have a map from a Value to a list of virtual registers.
> 
> 
> Pros:
> - We know it worked for SDAG.
> - We know backends would generate reasonable code from that.
> 
> 
> Cons:
> - We need to pay (compile time, memory consumption) for the extra
> logic for every non-aggregate type.
> - Translation is not homogeneous between, say, add and store.
> - Premature splitting that need to be fixed later(?), e.g., merge
> consecutive stores.
> 
> 
> 
> 
> * B. Map Aggregate Type to One (Big) Virtual Register *
> 
> 
> Have one big virtual register for the whole aggregate type. Only the
> size of the register matters for the instructions that handle
> aggregate type, so we can still use EVT for virtual registers.
> 
> 
> Pros:
> - Translation is fast.
> - Code is easy to understand.
> - No premature splitting for instance for load instructions.
> 
> 
> Cons:
> - Useless bits (padding) are carried around and may be hard to
> eliminate.
> 
> We could associate some metadata with the register somehow to represent the padding bits (we already do something like this for memcpy).
> 
> The problem I have with metadata is that they can be lost.
> 
> Sorry, I meant metadata in the more-general sense (not actual IR metadata can can be dropped). Something actually associated with the vreg itself.
> 
> That being said, having a mechanism to tell the optimizer what are in a register would be useful. Right now, to get this kind of information we use the computeKnownBits analysis and I wonder how could we generalize that so that the information is cheaper to get and pass along the way.
> 
> Agreed. Reducing the computeKnownBits cost would be a good thing. We could use caching (or an actual forward analysis), and that might help.

Agreed.

Q.

> 
>  -Hal
> 
> 
> 
> - Expose weird types to legalization more often.
> - Backends may not cope very well with that kind of code.
> 
> 
> Given that every backend is going to need major work to truly make this transition, I'd advise not to worry too much about your last item.
> 
> Seems like this second option is better than replicating the SDAG solution, so I recommend we pursue that.
> 
> My guts tell me that indeed the second option is better, but the lack of actual data makes me nervous.
> Anyway, we don’t have to support aggregates to lay the basic pipeline. And when the basic pipeline is set, we can make actual measurements!
> 
> I’ve filed https://llvm.org/bugs/show_bug.cgi?id=26161 <https://llvm.org/bugs/show_bug.cgi?id=26161> to keep track of that issue.
> 
> Thanks,
> -Quentin
> 
> 
> 
> -Hal
> 
> 
> * C. Ideas *
> 
> 
> Any other ideas?
> 
> 
> 
> 
> ** Feedback Needed **
> 
> 
> I want to have your feedbacks on how we should model aggregate types
> during the IR translation with respect to the mapping to their
> machine representation.
> 
> 
> This is not a road blocker as, at first, I will replicate SDAG
> solution, but I want we have a conversation on what is the best long
> term solution.
> 
> 
> I will file a PR so that we remember we would need to come back to
> this part of the implementation of the prototype when we agreed on
> what the solution should be and look at productizing the selector.
> 
> 
> 
> 
> ** Example **
> 
> 
> Consider the following LLVM IR input:
> 
> %struct.bar = type { i16, i16 }
> 
> 
> define void @foo(i16 signext %a, i16 signext %b, %struct.bar*
> nocapture %addr) #0 {
> entry:
> %tmp = insertvalue %struct.bar undef, i16 %a, 0
> %tmp2 = insertvalue %struct.bar %tmp, i16 %b, 1
> store %struct.bar %tmp2, %struct.bar* %addr, align 4
> ret void
> }
> 
> 
> Which is roughly:
> 
> struct bar {
> short a;
> short b;
> };
> 
> 
> void foo(short a, short b, struct bar *addr) {
> struct bar tmp;
> tmp.a = a;
> tmp.b = b;
> *addr = tmp;
> }
> 
> 
> * Solution A: Replicate SDAG *
> 
> 
> Note: (#) is the size of the virtual register.
> 
> 
> - Translation:
> 
> 
> 
> 
> arg1(32) = copy R0
> arg2(32) = copy R1
> addr(32) = copy R2
> a(16) = truncate arg1(32)
> b(16) = truncate arg2(32)
> tmp5(16), tmp6(16) = merge_value a(16), b(16)
> store tmp5(16), addr(32), 0
> store tmp6(16), addr(32), 4
> 
> 
> - Legalization:
> 
> 
> 
> arg1(32) = copy R0
> arg2(32) = copy R1
> addr(32) = copy R2
> tmp5(16) = extract_subreg a(32), 0
> 
> tmp6(16) = extract_subreg b(32), 0
> store tmp5(16), addr(32), 0
> store tmp6(16), addr(32), 4
> 
> 
> 
> 
> * Solution B: One to one mapping Value to Vreg *
> 
> 
> - Translation:
> 
> 
> arg1(32) = copy R0
> arg2(32) = copy R1
> addr(32) = copy R2
> a(16) = truncate arg1(32)
> b(16) = truncate arg2(32)
> tmp(32) = concat a(16), b(16)
> store tmp(32), addr
> 
> 
> - Legalization:
> 
> 
> 
> arg1(32) = copy R0
> arg2(32) = copy R1
> addr(32) = copy R2
> a(32) = and arg1(32), 0xFFFF
> b(32) = and arg2(32), 0xFFFF
> tmp(32) = concat a(32/16), b(32/16) ; <— although the input registers
> are 32bits we are really only interested in the 16bit low part of
> both a and b.
> store tmp(32), addr ; <— legal
> 
> 
> The problem now is how do we legalize that thing with generic code?
> I.e., we’ll end up with either load/store sequences (maybe storing
> i16 is not even legal) or SHIFT and OR and that may be painful to
> optimize later.
> 
> 
> Thanks,
> -Quentin
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
> 
> 
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160115/a8bba1aa/attachment.html>