[llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

Fri Jan 15 11:41:14 PST 2016

----- Original Message -----

> From: "Quentin Colombet" <qcolombet at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, January 15, 2016 1:37:50 PM
> Subject: Re: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to
> MachineInstr translation for aggregate type

> Hi Hal,

> Thanks for your quick reply.

> > On Jan 14, 2016, at 4:53 PM, Hal Finkel < hfinkel at anl.gov > wrote:
> 

> > ----- Original Message -----
> 

> > > From: "Quentin Colombet via llvm-dev" < llvm-dev at lists.llvm.org >
> > 
> 
> > > To: "llvm-dev" < llvm-dev at lists.llvm.org >
> > 
> 
> > > Sent: Thursday, January 14, 2016 6:41:57 PM
> > 
> 
> > > Subject: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to
> > > MachineInstr translation for aggregate type
> > 
> 

> > > Hi,
> > 
> 

> > > As part of the effort to bring up GlobalISel, I would like your
> > 
> 
> > > feedbacks on the best way to map LLVM IR values into MachineInstr
> > 
> 
> > > values (virtual registers), in particular when aggregate types
> > > get
> > 
> 
> > > involved.
> > 
> 

> > > I am looking for a long term solution.
> > 
> 
> > > Short term is to replicate SDAG solution.
> > 
> 

> > > ** Context **
> > 
> 

> > > The first step of GlobalISel is to translate the LLVM IR into
> > 
> 
> > > MachineInstr representation. During this process we need to be
> > > able
> > 
> 
> > > to tell where are the different LLVM IR values with respect to
> > > their
> > 
> 
> > > machine representation, such that we can feed the right element
> > > to
> > 
> 
> > > the related machine instructions.
> > 
> 

> > > Right now, in SelectionDAG, we have a mapping from one Value* to
> > > one
> > 
> 
> > > SDValue. But this is not the whole story!
> > 
> 

> > > ** Problem With Aggregate Types **
> > 
> 

> > > * Facts *
> > 
> 

> > > - Aggregate types cannot be expressed with a single EVT (Extended
> > 
> 
> > > Value Type)
> > 
> 
> > > - SDValues are typed with a single EVT.
> > 
> 

> > > Therefore, values with aggregate types need to be break apart to
> > > be
> > 
> 
> > > represented by a SDValue.
> > 
> 

> > > * Implications *
> > 
> 

> > > - Instructions that may accept aggregate type, e.g., load, store,
> > > and
> > 
> 
> > > select, must be split to handle the different component that
> > > compose
> > 
> 
> > > a Value. (Look for the use of ComputeValueVTs in the
> > 
> 
> > > SelecionDAGBuilder and the related loops this implies).
> > 
> 
> > > - Use of aggregate type generates a bunch of MERGE_VALUE nodes,
> > > which
> > 
> 
> > > sole purpose is to glue all the components that make the
> > > aggregate
> > 
> 
> > > type together.
> > 
> 

> > > Therefore, in practice, values with aggregate types are mapped to
> > 
> 
> > > several SDValue hidden in MERGE_VALUE nodes.
> > 
> 

> > > * Summary *
> > 
> 

> > > Values with aggregate type map to a list of SDValue and,
> > 
> 
> > > consequently, the handling of the instructions is uneven between
> > > the
> > 
> 
> > > ones that need to support aggregate type and the ones that do
> > > not.
> > 
> 

> > > ** Possible Solutions **
> > 
> 

> > > * A. Replicate SDAG Solution *
> > 
> 

> > > Not much to add from the title. One can see that in the sketch of
> > 
> 
> > > patch for the IRTranslator (from the original GlobalISel email
> > 
> 
> > > http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html
> > 
> 
> > > ), we have a map from a Value to a list of virtual registers.
> > 
> 

> > > Pros:
> > 
> 
> > > - We know it worked for SDAG.
> > 
> 
> > > - We know backends would generate reasonable code from that.
> > 
> 

> > > Cons:
> > 
> 
> > > - We need to pay (compile time, memory consumption) for the extra
> > 
> 
> > > logic for every non-aggregate type.
> > 
> 
> > > - Translation is not homogeneous between, say, add and store.
> > 
> 
> > > - Premature splitting that need to be fixed later(?), e.g., merge
> > 
> 
> > > consecutive stores.
> > 
> 

> > > * B. Map Aggregate Type to One (Big) Virtual Register *
> > 
> 

> > > Have one big virtual register for the whole aggregate type. Only
> > > the
> > 
> 
> > > size of the register matters for the instructions that handle
> > 
> 
> > > aggregate type, so we can still use EVT for virtual registers.
> > 
> 

> > > Pros:
> > 
> 
> > > - Translation is fast.
> > 
> 
> > > - Code is easy to understand.
> > 
> 
> > > - No premature splitting for instance for load instructions.
> > 
> 

> > > Cons:
> > 
> 
> > > - Useless bits (padding) are carried around and may be hard to
> > 
> 
> > > eliminate.
> > 
> 

> > We could associate some metadata with the register somehow to
> > represent the padding bits (we already do something like this for
> > memcpy).
> 

> The problem I have with metadata is that they can be lost.
Sorry, I meant metadata in the more-general sense (not actual IR metadata can can be dropped). Something actually associated with the vreg itself. 

> That being said, having a mechanism to tell the optimizer what are in
> a register would be useful. Right now, to get this kind of
> information we use the computeKnownBits analysis and I wonder how
> could we generalize that so that the information is cheaper to get
> and pass along the way.
Agreed. Reducing the computeKnownBits cost would be a good thing. We could use caching (or an actual forward analysis), and that might help. 

-Hal 

> > > - Expose weird types to legalization more often.
> > 
> 
> > > - Backends may not cope very well with that kind of code.
> > 
> 

> > Given that every backend is going to need major work to truly make
> > this transition, I'd advise not to worry too much about your last
> > item.
> 

> > Seems like this second option is better than replicating the SDAG
> > solution, so I recommend we pursue that.
> 

> My guts tell me that indeed the second option is better, but the lack
> of actual data makes me nervous.
> Anyway, we don’t have to support aggregates to lay the basic
> pipeline. And when the basic pipeline is set, we can make actual
> measurements!

> I’ve filed https://llvm.org/bugs/show_bug.cgi?id=26161 to keep track
> of that issue.

> Thanks,
> -Quentin

> > -Hal
> 

> > > * C. Ideas *
> > 
> 

> > > Any other ideas?
> > 
> 

> > > ** Feedback Needed **
> > 
> 

> > > I want to have your feedbacks on how we should model aggregate
> > > types
> > 
> 
> > > during the IR translation with respect to the mapping to their
> > 
> 
> > > machine representation.
> > 
> 

> > > This is not a road blocker as, at first, I will replicate SDAG
> > 
> 
> > > solution, but I want we have a conversation on what is the best
> > > long
> > 
> 
> > > term solution.
> > 
> 

> > > I will file a PR so that we remember we would need to come back
> > > to
> > 
> 
> > > this part of the implementation of the prototype when we agreed
> > > on
> > 
> 
> > > what the solution should be and look at productizing the
> > > selector.
> > 
> 

> > > ** Example **
> > 
> 

> > > Consider the following LLVM IR input:
> > 
> 

> > > %struct.bar = type { i16, i16 }
> > 
> 

> > > define void @foo(i16 signext %a, i16 signext %b, %struct.bar*
> > 
> 
> > > nocapture %addr) #0 {
> > 
> 
> > > entry:
> > 
> 
> > > %tmp = insertvalue %struct.bar undef, i16 %a, 0
> > 
> 
> > > %tmp2 = insertvalue %struct.bar %tmp, i16 %b, 1
> > 
> 
> > > store %struct.bar %tmp2, %struct.bar* %addr, align 4
> > 
> 
> > > ret void
> > 
> 
> > > }
> > 
> 

> > > Which is roughly:
> > 
> 

> > > struct bar {
> > 
> 
> > > short a;
> > 
> 
> > > short b;
> > 
> 
> > > };
> > 
> 

> > > void foo(short a, short b, struct bar *addr) {
> > 
> 
> > > struct bar tmp;
> > 
> 
> > > tmp.a = a;
> > 
> 
> > > tmp.b = b;
> > 
> 
> > > *addr = tmp;
> > 
> 
> > > }
> > 
> 

> > > * Solution A: Replicate SDAG *
> > 
> 

> > > Note: (#) is the size of the virtual register.
> > 
> 

> > > - Translation:
> > 
> 

> > > arg1(32) = copy R0
> > 
> 
> > > arg2(32) = copy R1
> > 
> 
> > > addr(32) = copy R2
> > 
> 
> > > a(16) = truncate arg1(32)
> > 
> 
> > > b(16) = truncate arg2(32)
> > 
> 
> > > tmp5(16), tmp6(16) = merge_value a(16), b(16)
> > 
> 
> > > store tmp5(16), addr(32), 0
> > 
> 
> > > store tmp6(16), addr(32), 4
> > 
> 

> > > - Legalization:
> > 
> 

> > > arg1(32) = copy R0
> > 
> 
> > > arg2(32) = copy R1
> > 
> 
> > > addr(32) = copy R2
> > 
> 
> > > tmp5(16) = extract_subreg a(32), 0
> > 
> 

> > > tmp6(16) = extract_subreg b(32), 0
> > 
> 
> > > store tmp5(16), addr(32), 0
> > 
> 
> > > store tmp6(16), addr(32), 4
> > 
> 

> > > * Solution B: One to one mapping Value to Vreg *
> > 
> 

> > > - Translation:
> > 
> 

> > > arg1(32) = copy R0
> > 
> 
> > > arg2(32) = copy R1
> > 
> 
> > > addr(32) = copy R2
> > 
> 
> > > a(16) = truncate arg1(32)
> > 
> 
> > > b(16) = truncate arg2(32)
> > 
> 
> > > tmp(32) = concat a(16), b(16)
> > 
> 
> > > store tmp(32), addr
> > 
> 

> > > - Legalization:
> > 
> 

> > > arg1(32) = copy R0
> > 
> 
> > > arg2(32) = copy R1
> > 
> 
> > > addr(32) = copy R2
> > 
> 
> > > a(32) = and arg1(32), 0xFFFF
> > 
> 
> > > b(32) = and arg2(32), 0xFFFF
> > 
> 
> > > tmp(32) = concat a(32/16), b(32/16) ; <— although the input
> > > registers
> > 
> 
> > > are 32bits we are really only interested in the 16bit low part of
> > 
> 
> > > both a and b.
> > 
> 
> > > store tmp(32), addr ; <— legal
> > 
> 

> > > The problem now is how do we legalize that thing with generic
> > > code?
> > 
> 
> > > I.e., we’ll end up with either load/store sequences (maybe
> > > storing
> > 
> 
> > > i16 is not even legal) or SHIFT and OR and that may be painful to
> > 
> 
> > > optimize later.
> > 
> 

> > > Thanks,
> > 
> 
> > > -Quentin
> > 
> 
> > > _______________________________________________
> > 
> 
> > > LLVM Developers mailing list
> > 
> 
> > > llvm-dev at lists.llvm.org
> > 
> 
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > 
> 

> > --
> 
> > Hal Finkel
> 
> > Assistant Computational Scientist
> 
> > Leadership Computing Facility
> 
> > Argonne National Laboratory
> 
-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160115/147b4080/attachment-0001.html>