<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi,<div class=""><br class=""></div><div class="">As part of the effort to bring up GlobalISel, I would like your feedbacks on the best way to map LLVM IR values into MachineInstr values (virtual registers), in particular when aggregate types get involved.</div><div class=""><br class=""></div><div class="">I am looking for a long term solution.</div><div class="">Short term is to replicate SDAG solution.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">** Context **</div><div class=""><br class=""></div><div class="">The first step of GlobalISel is to translate the LLVM IR into MachineInstr representation. During this process we need to be able to tell where are the different LLVM IR values with respect to their machine representation, such that we can feed the right element to the related machine instructions.</div><div class=""><br class=""></div><div class="">Right now, in SelectionDAG, we have a mapping from one Value* to one SDValue. But this is not the whole story!</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">** Problem With Aggregate Types **</div><div class=""><br class=""></div><div class="">* Facts *</div><div class=""><br class=""></div><div class="">- Aggregate types cannot be expressed with a single EVT (Extended Value Type)</div><div class="">- SDValues are typed with a single EVT.</div><div class=""><br class=""></div><div class="">Therefore, values with aggregate types need to be break apart to be represented by a SDValue.</div><div class=""><br class=""></div><div class="">* Implications *</div><div class=""><br class=""></div><div class="">- Instructions that may accept aggregate type, e.g., load, store, and select, must be split to handle the different component that compose a Value. (Look for the use of ComputeValueVTs in the SelecionDAGBuilder and the related loops this implies).</div><div class="">- Use of aggregate type generates a bunch of MERGE_VALUE nodes, which sole purpose is to glue all the components that make the aggregate type together.</div><div class=""><br class=""></div><div class="">Therefore, in practice, values with aggregate types are mapped to several SDValue hidden in MERGE_VALUE nodes.</div><div class=""><br class=""></div><div class="">* Summary *</div><div class=""><br class=""></div><div class="">Values with aggregate type map to a list of SDValue and, consequently, the handling of the instructions is uneven between the ones that need to support aggregate type and the ones that do not.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">** Possible Solutions **</div><div class=""><br class=""></div><div class="">* A. Replicate SDAG Solution *</div><div class=""><br class=""></div><div class="">Not much to add from the title. One can see that in the sketch of patch for the IRTranslator (from the original GlobalISel email <a href="http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html" class="">http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html</a>), we have a map from a Value to a list of virtual registers.</div><div class=""><br class=""></div><div class="">Pros:</div><div class="">- We know it worked for SDAG.</div><div class="">- We know backends would generate reasonable code from that.</div><div class=""><br class=""></div><div class="">Cons:</div><div class="">- We need to pay (compile time, memory consumption) for the extra logic for every non-aggregate type.</div><div class="">- Translation is not homogeneous between, say, add and store.</div><div class="">- Premature splitting that need to be fixed later(?), e.g., merge consecutive stores.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">* B. Map Aggregate Type to One (Big) Virtual Register *</div><div class=""><br class=""></div><div class="">Have one big virtual register for the whole aggregate type. Only the size of the register matters for the instructions that handle aggregate type, so we can still use EVT for virtual registers.</div><div class=""><br class=""></div><div class="">Pros:</div><div class="">- Translation is fast.</div><div class="">- Code is easy to understand.</div><div class="">- No premature splitting for instance for load instructions.</div><div class=""><br class=""></div><div class="">Cons:</div><div class="">- Useless bits (padding) are carried around and may be hard to eliminate.</div><div class="">- Expose weird types to legalization more often.</div><div class="">- Backends may not cope very well with that kind of code.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">* C. Ideas *</div><div class=""><br class=""></div><div class="">Any other ideas?</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">** Feedback Needed **</div><div class=""><br class=""></div><div class="">I want to have your feedbacks on how we should model aggregate types during the IR translation with respect to the mapping to their machine representation.</div><div class=""><br class=""></div><div class="">This is not a road blocker as, at first, I will replicate SDAG solution, but I want we have a conversation on what is the best long term solution.</div><div class=""><br class=""></div><div class="">I will file a PR so that we remember we would need to come back to this part of the implementation of the prototype when we agreed on what the solution should be and look at productizing the selector.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">** Example **</div><div class=""><br class=""></div><div class="">Consider the following LLVM IR input:</div><div class=""><div class=""><font face="Menlo" style="font-size: 11px;" class="">%struct.bar = type { i16, i16 }</font></div><div class=""><font face="Menlo" style="font-size: 11px;" class=""><br class=""></font></div><div class=""><font face="Menlo" style="font-size: 11px;" class="">define void @foo(i16 signext %a, i16 signext %b, %struct.bar* nocapture %addr) #0 {</font></div><div class=""><font face="Menlo" style="font-size: 11px;" class="">entry:</font></div><div class=""><font face="Menlo" style="font-size: 11px;" class=""> %tmp = insertvalue %struct.bar undef, i16 %a, 0</font></div><div class=""><font face="Menlo" style="font-size: 11px;" class=""> %tmp2 = insertvalue %struct.bar %tmp, i16 %b, 1</font></div><div class=""><font face="Menlo" style="font-size: 11px;" class=""> store %struct.bar %tmp2, %struct.bar* %addr, align 4</font></div><div class=""><font face="Menlo" style="font-size: 11px;" class=""> ret void</font></div><div class=""><font face="Menlo" style="font-size: 11px;" class="">}</font></div></div><div class=""><br class=""></div><div class="">Which is roughly:</div><div class=""><div class="">struct bar {</div><div class=""> short a;</div><div class=""> short b;</div><div class="">};</div><div class=""><br class=""></div><div class="">void foo(short a, short b, struct bar *addr) {</div><div class=""> struct bar tmp;</div><div class=""> tmp.a = a;</div><div class=""> tmp.b = b;</div><div class=""> *addr = tmp;</div><div class="">}</div></div><div class=""><br class=""></div><div class="">* Solution A: Replicate SDAG *</div><div class=""><br class=""></div><div class="">Note: (#) is the size of the virtual register.</div><div class=""><br class=""></div><div class="">- Translation:</div><div class=""><br class=""></div><div class=""><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><div style="font-family: Helvetica; font-size: 12px;" class="">arg1(32) = copy R0</div><div style="font-family: Helvetica; font-size: 12px;" class="">arg2(32) = copy R1</div><div style="font-family: Helvetica; font-size: 12px;" class="">addr(32) = copy R2</div><div style="font-family: Helvetica; font-size: 12px;" class="">a(16) = truncate arg1(32)</div><div style="font-family: Helvetica; font-size: 12px;" class="">b(16) = truncate arg2(32)</div><div style="font-family: Helvetica; font-size: 12px;" class="">tmp5(16), tmp6(16) = <b class="">merge_value</b> a(16), b(16)</div><div style="font-family: Helvetica; font-size: 12px;" class=""><b class="">store</b> tmp5(16), addr(32), 0</div><div style="font-family: Helvetica; font-size: 12px;" class=""><b class="">store</b> tmp6(16), addr(32), 4</div><div style="font-family: Helvetica; font-size: 12px;" class=""><br class=""></div><div style="font-family: Helvetica; font-size: 12px;" class="">- Legalization:</div><div style="font-family: Helvetica; font-size: 12px;" class=""><div class=""><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><div style="font-family: Helvetica; font-size: 12px;" class="">arg1(32) = copy R0</div></div></div><div class="">arg2(32) = copy R1</div><div class="">addr(32) = copy R2</div><div class="">tmp5(16) = extract_subreg a(32), 0</div><div class=""><div class="">tmp6(16) = extract_subreg b(32), 0</div><div class="">store tmp5(16), addr(32), 0</div><div class="">store tmp6(16), addr(32), 4</div></div></div><div style="font-family: Helvetica; font-size: 12px;" class=""><br class=""></div><div style="font-family: Helvetica; font-size: 12px;" class=""><br class=""></div><div style="font-family: Helvetica; font-size: 12px;" class="">* Solution B: One to one mapping Value to Vreg *</div><div style="font-family: Helvetica; font-size: 12px;" class=""><br class=""></div><div style="font-family: Helvetica; font-size: 12px;" class="">- Translation:</div><div style="font-family: Helvetica; font-size: 12px;" class=""><br class=""></div><div style="font-family: Helvetica; font-size: 12px;" class="">arg1(32) = copy R0</div></div></div><div class="">arg2(32) = copy R1</div><div class="">addr(32) = copy R2</div><div class="">a(16) = truncate arg1(32)</div><div class="">b(16) = truncate arg2(32)</div><div class="">tmp(32) = <b class="">concat</b> a(16), b(16)</div><div class=""><b class="">store</b> tmp(32), addr</div><div class=""><br class=""></div><div class="">- Legalization:</div><div class=""><div class=""><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><div style="font-family: Helvetica; font-size: 12px;" class="">arg1(32) = copy R0</div></div></div><div class="">arg2(32) = copy R1</div><div class="">addr(32) = copy R2</div><div class="">a(32) = and arg1(32), 0xFFFF</div><div class="">b(32) = and arg2(32), 0xFFFF</div><div class="">tmp(32) = concat a(32/16), b(32/16) ; <— although the input registers are 32bits we are really only interested in the 16bit low part of both a and b.</div><div class="">store tmp(32), addr ; <— legal</div></div><div class=""><br class=""></div><div class="">The problem now is how do we legalize that thing with generic code?</div><div class="">I.e., we’ll end up with either load/store sequences (maybe storing i16 is not even legal) or SHIFT and OR and that may be painful to optimize later.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">-Quentin</div></body></html>