<div dir="ltr"><div class="uyb8Gf"><div class="F3hlO"><div text="#000000" bgcolor="#FFFFFF">Hi Philip,</div><div text="#000000" bgcolor="#FFFFFF"><br></div><div text="#000000" bgcolor="#FFFFFF">> I think after reading your link I'm actually more confused. This might just be a wording problem, but let me ask a couple of clarifying questions.<br><br>Sorry about that :( Every time I explain this I get slightly more embarrassed, because it is indeed weird and ugly (though it was certainly the least ugly solution).<br><br>> 1) After compiling the code sequence below (from that page), does the in-memory bit pattern differ? The page seemed to contradict itself. <br><pre>> %0 = load <4 x i32> %x
> %1 = bitcast <4 x i32> %0 to <2 x i64>
> store <2 x i64> %1, <2 x i64>* %y
</pre>Yes. The memory pattern differs. This is the first diagram on the right at <a href="http://llvm.org/docs/BigEndianNEON.html#bitconverts">http://llvm.org/docs/BigEndianNEON.html#bitconverts</a>.</div><div text="#000000" bgcolor="#FFFFFF"><br></div><div text="#000000" bgcolor="#FFFFFF">> 2) If so, does this mean that performing dead-store-elimination is illegal for ARM?<br><br>Yes, for vector types whose corresponding load differs from the store type. </div><div text="#000000" bgcolor="#FFFFFF"><br></div><div text="#000000" bgcolor="#FFFFFF">%0 = load <4 x i32> %x</div><div text="#000000" bgcolor="#FFFFFF">store <4 x i32> %0, <4 x i32>* %x</div><div text="#000000" bgcolor="#FFFFFF"><br></div><div text="#000000" bgcolor="#FFFFFF">is still fine. I should go and check that DSE doesn't do bad things for big-endian NEON, actually...</div><div text="#000000" bgcolor="#FFFFFF"><br>> 3) Are loads and stores ever allowed to fault based on the in-memory representation? <br><br>No (thank goodness!)</div><div text="#000000" bgcolor="#FFFFFF"><br>> 4) What happens if we have a load of <2 x i64> following the store above and we do DSE the store before forwarding its value?</div></div></div><div class="uyb8Gf"><br></div>The store can't be DSE'd as above, but value forwarding is fine. It's fine because the IR is strongly typed: there's no way to remove that bitcast and still have well-formed IR. However, folding bitcasts into memory operands is explicitly illegal:<br><pre>%1 = bitcast <4 x i32> %x to <2 x i64>
store <2 x i64> %1, <2 x i64>* %y
  =>
%p = bitcast <2 x i64>* %y to <4 x i32>*
store <4 x i32> %x, <4 x i32>* %p ; ILLEGAL!
</pre><div>There's a hook somewhere in CodeGenPrepare (CGP) that disables an optimization that tries to do this.</div><div><br></div><div>So in IR, because it's strongly typed, there aren't really many special cases or things to worry about. But in SDAG things get more difficult. 
SDAG is weakly typed, so nothing structural stops bitconverts from being folded away entirely. SDAG can merge bitconverts, (bitconvert (bitconvert %x)) -> (bitconvert %x), but it mustn't remove them completely.</div><div><br></div><div>I hope I've explained that OK. CCing Tim, who can hopefully pick more holes in the explanation.</div><div><br></div><div>Also, could you please point me to where the documentation seems contradictory? Then I'll fix it. I wrote it for exactly this scenario!</div><div><br></div><div>Cheers,</div><div><br></div><div>James</div><br><div class="gmail_quote"><div dir="ltr">On Wed, 13 Jan 2016 at 00:42 Quentin Colombet <<a href="mailto:qcolombet@apple.com">qcolombet@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>Hi James,</div><div><br></div>I am also confused!<div><br></div><div><div></div></div></div><div style="word-wrap:break-word"><div><div><blockquote type="cite"><div>On Jan 12, 2016, at 4:11 PM, Philip Reames <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>> wrote:</div><br><div>
<div text="#000000" bgcolor="#FFFFFF">
I think after reading your link I'm actually more confused. This
might just be a wording problem, but let me ask a couple of
clarifying questions.<br>
<br>
1) After compiling the code sequence below (from that page), does
the in-memory bit pattern differ? The page seemed to contradict
itself. <br></div></div></blockquote><div><br></div></div></div></div><div style="word-wrap:break-word"><div><div>+1</div><div><br></div><div>Thanks,</div><div>Q.</div></div></div><div style="word-wrap:break-word"><div><div><br><blockquote type="cite"><div><div text="#000000" bgcolor="#FFFFFF">
<pre>%0 = load <4 x i32> %x
%1 = bitcast <4 x i32> %0 to <2 x i64>
store <2 x i64> %1, <2 x i64>* %y
</pre>
2) If so, does this mean that performing dead-store-elimination is
illegal for ARM?<br>
<br>
3) Are loads and stores ever allowed to fault based on the in-memory
representation? <br>
<br>
4) What happens if we have a load of <2 x i64> following the
store above and we do DSE the store before forwarding its value?<br>
<br>
Philip<br>
<br>
<br>
<div>On 01/12/2016 05:55 AM, James Molloy
via llvm-dev wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi,
<div><br>
</div>
<div>
<div>
<div>
<div link="blue" vlink="purple" lang="EN-GB"><p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif">>
I found this thinking quite difficult to explain.
Does it make sense?</span></p>
<div><span style="font-size:11pt;font-family:Calibri,sans-serif">It
might help to link to the documentation on why
bitcasts are weird on big-endian NEON: </span><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px"><a href="http://llvm.org/docs/BigEndianNEON.html#bitconverts" target="_blank">http://llvm.org/docs/BigEndianNEON.html#bitconverts</a></span></font></div>
<div><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px"><br>
</span></font></div>
<div><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px">Cheers,</span></font></div>
<div><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px"><br>
</span></font></div>
<div><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px">James</span></font></div>
</div>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr">On Tue, 12 Jan 2016 at 13:23 Daniel Sanders via
llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div link="blue" vlink="purple" lang="EN-GB">
<div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Hi,</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I
haven't found much time to look into the LLVM-IR-level
optimizations yet, so I'm not sure how they handle
bitcasts. With that disclaimer in mind, I expect it's
fine for the LLVM-IR-level optimizations to handle
them using either definition, since the two are equivalent
at the LLVM-IR level. My thinking is that LLVM-IR is
consistent about how virtual bits are assigned to
types, and that "no-op" bitcasts needing a non-zero number of instructions arise only when
there is inconsistency.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">At
the LLVM-IR level, bits 0-127 of <4 x i32> map
directly onto bits 0-127 of <2 x i64> using the
identity map. It's therefore ok to interpret such
bitcasts as zero-instruction no-ops. As far as I can
tell, LLVM-IR has been defined such that the identity
map can be used for bitcasts between all same-sized
types, and also such that bitcasting between
different-sized types is invalid.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Similarly,
most targets have a single mapping of virtual bit
numbers to physical bit numbers for each size that is
applied consistently when mapping a type to memory.
For example 32-bits map like so:</span></p><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Little
Endian Targets: virtual register bits
{0..7,8..15,16..23,24..31} map to physical memory bits
{0..7,8..15,16..23,24..31}</span></p><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Big
Endian Targets: virtual register bits
{0..7,8..15,16..23,24..31} map to physical memory bits
{24..31,16..23,8..15,0..7}</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">regardless
of whether it's a float or an i32. We therefore need
zero instructions to re-map physical memory bits for
one type onto another type.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">The
same idea holds for physical register classes. There's
a single consistent mapping from physical memory bits
to physical register bits that applies for all types
that can be stored in that class. As long as this is
the case, the load/store and zero-instruction
interpretation of bitcasts are equivalent.</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">In
the case of big-endian MSA and NEON, there isn't a
single consistent mapping from physical memory bits to
physical register bits so the equivalence in the two
definitions breaks down:</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">
i128: virtual register bits {0..31, 32..63, 64..95,
96..127} map to physical memory bits {96..127,
64..95, 32..63, 0..31}</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">
<4 x i32>: virtual register bits {0..31, 32..63,
64..95, 96..127} map to physical memory bits {0..31,
32..63, 64..95, 96..127}</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">
<2 x i64>: virtual register bits {0..31, 32..63,
64..95, 96..127} map to physical memory bits {32..63,
0..31, 96..127, 64..95}</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">With
these inconsistent mappings we require instructions to
bitcast between the types.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I
found this thinking quite difficult to explain. Does
it make sense?</span></p>
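<div>The three big-endian mappings listed above can be reproduced with a small model. This is a sketch under stated assumptions, not LLVM code: it assumes vectors are loaded and stored element by element (lane 0 at the lowest address, each element big-endian), and it works at the same 32-bit granularity as the table, so the byte reversal inside each 32-bit chunk is not shown. The function names are hypothetical.</div>

```python
# Hypothetical model, not LLVM code. Assumes a 128-bit register whose lanes are
# stored element by element: lane 0 at the lowest address, each lane big-endian.
# Bits are grouped into 32-bit chunks, matching the granularity of the table.

REG_BITS = 128
CHUNK_BITS = 32


def be_chunk_map(elem_bits: int) -> list:
    """For each 32-bit chunk of virtual register bits, return the index of the
    32-bit memory chunk it is stored to, given lanes of `elem_bits` bits."""
    chunks_per_lane = elem_bits // CHUNK_BITS
    mapping = []
    for chunk in range(REG_BITS // CHUNK_BITS):
        lane, offset = divmod(chunk, chunks_per_lane)
        # Lane order is preserved in memory, but inside a lane the most
        # significant chunk comes first (big-endian).
        mapping.append(lane * chunks_per_lane + (chunks_per_lane - 1 - offset))
    return mapping


def describe(mapping: list) -> str:
    """Render a chunk map in the {lo..hi, ...} notation used above."""
    return ", ".join(f"{m * CHUNK_BITS}..{m * CHUNK_BITS + CHUNK_BITS - 1}"
                     for m in mapping)


for name, elem_bits in [("i128", 128), ("<4 x i32>", 32), ("<2 x i64>", 64)]:
    print(f"{name}: {{{describe(be_chunk_map(elem_bits))}}}")
```

<div>Because the three maps differ, no single memory-to-register mapping serves all three types under these assumptions, which is why a bitcast between them needs real instructions.</div>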
</div>
</div>
<div link="blue" vlink="purple" lang="EN-GB">
<div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">>
</span>I am fine with treating bitcasts as equivalent
store/load pairs in GISel, I just want to be sure we do
not have a semantic gap between the LLVM-IR and the
backend if we do.</p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
</div>
</div>
<div link="blue" vlink="purple" lang="EN-GB">
<div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I
think a gap would arise from not having a GISel
equivalent to ISD::BITCAST (gBITCAST?) available when
it's necessary for correctness. However, I agree that
GISel should delete bitcasts for the common case where
the store/load and zero-instruction definitions are
equivalent.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
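<div>For the common case, where a target applies one byte order to every type of a given size, the store/load reading and the zero-instruction reading agree. A minimal sketch using Python's standard <code>struct</code> module shows this for f32 and i32; the helper name is hypothetical and the code only models memory, not any particular target.</div>

```python
import struct


def bitcast_f32_to_i32(value: float, byte_order: str) -> int:
    """Apply the store/load definition of bitcast: store `value` as an f32,
    then reload the same four bytes as an unsigned 32-bit integer.
    `byte_order` is "<" for little-endian or ">" for big-endian."""
    memory = struct.pack(byte_order + "f", value)       # store as f32
    (bits,) = struct.unpack(byte_order + "I", memory)   # load as i32
    return bits


for order in ("<", ">"):
    # The bytes in memory differ between the two byte orders...
    print(order, struct.pack(order + "f", 1.0).hex())
    # ...but the reloaded bits do not, so no instructions are needed.
    print(order, hex(bitcast_f32_to_i32(1.0, order)))
```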
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm"><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"> Quentin Colombet [mailto:<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>]
<br>
<b>Sent:</b> 11 January 2016 17:23<br>
<b>To:</b> Daniel Sanders<br>
<b>Cc:</b> Tim Northover (<a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a>);
llvm-dev</span></p>
</div>
</div>
</div>
</div>
</div>
<div link="blue" vlink="purple" lang="EN-GB">
<div>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm"><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"><br>
<b>Subject:</b> Re: [llvm-dev] [GlobalISel] A
Proposal for global instruction selection</span></p>
</div>
</div>
</div>
</div>
</div>
<div link="blue" vlink="purple" lang="EN-GB">
<div>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt"><div> <br></div><p class="MsoNormal">Hi Daniel,</p>
<div><div> <br></div>
</div>
<div><p class="MsoNormal">Thanks for the pointers; I wasn’t
aware of the second thread you’ve mentioned.</p>
</div>
<div><div> <br></div>
</div>
<div><p class="MsoNormal">I may be wrong, but I think
LLVM-IR optimizations really treat bitcasts as
no-op casts, in the sense that no instructions are
required.</p>
</div>
<div><div> <br></div>
</div>
<div><p class="MsoNormal">Is there anyone that could chime
in on that?</p>
</div>
<div><div> <br></div>
</div>
<div><p class="MsoNormal">However, it seems SelectionDAG
sticks to the load/store semantic:</p>
</div>
<div><p class="MsoNormal"><span>"BITCAST
- This operator converts between integer, vector
and FP values, as if the value was
<b>stored to memory with one type and loaded from
the same address with the other type</b> (or
equivalently for vector format conversions, etc)."</span></p>
</div>
<div><div> <br></div>
</div>
<div><p class="MsoNormal">I am fine with treating bitcasts
as equivalent store/load pairs in GISel, I just want
to be sure we do not have a semantic gap between the
LLVM-IR and the backend if we do.</p>
</div>
<div><div> <br></div>
</div>
<div><p class="MsoNormal">Thanks,</p>
</div>
<div><p class="MsoNormal">-Quentin</p>
</div>
<div><div> <br></div>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div><p class="MsoNormal">On Jan 11, 2016, at 7:43
AM, Daniel Sanders <<a href="mailto:Daniel.Sanders@imgtec.com" target="_blank">Daniel.Sanders@imgtec.com</a>>
wrote:</p>
</div><div> <br></div>
<div>
<div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Hi,</span></p>
</div>
<div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
</div>
<div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">It
was a comment by Tim that first made me
aware of it (see<span> </span><a href="http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html" target="_blank">http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html</a></span><span> </span>but
I think he commented on one of my patches
before that).</p>
</div>
<div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
</div>
<div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I
asked about it on llvm-dev a couple of weeks
later (<a href="http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html" target="_blank"><span style="color:purple">http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html</span></a>)
highlighting the contradiction and was
told that 'no-op cast' referred to the
lack of math rather than a requirement
that zero instructions are used. It's
therefore my understanding that shuffling
the bits to preserve the load/store based
definition isn't considered to be changing
the bits.</span></p>
</div>
<div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
</div>
<div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I
think the main thing the current
definition is unclear on is whether it
refers to the bits in a physical machine
register or the bits in the LLVM-IR
virtual register. Most of the time these
two views are the same but this doesn't
quite work for big-endian MSA/NEON. For
example:</span></p>
</div>
<div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">%0
= bitcast <4 x i32> <i32 1, i32
2, i32 3, i32 4> to <2 x i64></span></p>
</div>
<div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">%0
= <2 x i64> <i64 (1 << 32)
| 2, i64 (3 << 32) | 4></span></p>
</div>
<div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">are
equivalent to each other in LLVM-IR terms,
but the constants are physically laid out
in MSA registers as:</span></p>
</div>
<div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">0x00000004000000030000000200000001
# <4 x i32> <i32 1, i32 2, i32 3,
i32 4></span></p>
</div>
<div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">0x00000003000000040000000100000002
# <2 x i64> <i64 (1 << 32)
| 2, i64 (3 << 32) | 4></span></p>
</div>
<div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">and
we must therefore shuffle the bits to
preserve LLVM-IR's point of view.</span></p>
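<div>The shuffle can be checked with a small Python model of a 128-bit register. This is a sketch, not MSA or NEON code: it assumes element-wise loads and stores (lane i of width E at register bits [i*E, (i+1)*E), stored big-endian at byte offset i*E/8), and all function names are hypothetical. It applies the LangRef definition literally (store with one type, reload with the other), reproduces the two register images above, and compares the result with simply reusing the register unchanged. On AArch64 NEON, a REV64 on 32-bit elements performs the word swap the model arrives at.</div>

```python
# Hypothetical model of a 128-bit vector register on a big-endian target that
# loads and stores vectors element by element: lane i of width E bits occupies
# register bits [i*E, (i+1)*E) and memory bytes [i*E/8, (i+1)*E/8), with each
# element stored big-endian.

REG_BITS = 128


def store(reg: int, elem_bits: int) -> bytes:
    """Write the register to 16 bytes of memory as lanes of `elem_bits`."""
    nbytes, mask = elem_bits // 8, (1 << elem_bits) - 1
    return b"".join(((reg >> (i * elem_bits)) & mask).to_bytes(nbytes, "big")
                    for i in range(REG_BITS // elem_bits))


def load(mem: bytes, elem_bits: int) -> int:
    """Read 16 bytes of memory into a register as lanes of `elem_bits`."""
    nbytes, reg = elem_bits // 8, 0
    for i in range(REG_BITS // elem_bits):
        lane = int.from_bytes(mem[i * nbytes:(i + 1) * nbytes], "big")
        reg |= lane << (i * elem_bits)
    return reg


def bitcast(reg: int, from_bits: int, to_bits: int) -> int:
    """LangRef definition: store with one type, reload with the other."""
    return load(store(reg, from_bits), to_bits)


def from_lanes(values, elem_bits: int) -> int:
    """Build a register value from lane values, lane 0 in the low bits."""
    return sum(v << (i * elem_bits) for i, v in enumerate(values))


def swap_words_in_doublewords(reg: int) -> int:
    """Swap the two 32-bit words inside each 64-bit half of the register."""
    out = 0
    for half in range(REG_BITS // 64):
        dword = (reg >> (64 * half)) & ((1 << 64) - 1)
        swapped = ((dword & 0xFFFFFFFF) << 32) | (dword >> 32)
        out |= swapped << (64 * half)
    return out


v4i32 = from_lanes([1, 2, 3, 4], 32)
v2i64 = bitcast(v4i32, 32, 64)
print(f"{v4i32:#034x}")  # 0x00000004000000030000000200000001
print(f"{v2i64:#034x}")  # 0x00000003000000040000000100000002

# The bitcast is a real shuffle of register bits...
print(v2i64 == swap_words_in_doublewords(v4i32))  # True
# ...and treating it as zero instructions would store different bytes.
print(store(v2i64, 64) == store(v4i32, 32))  # True
print(store(v4i32, 64) == store(v4i32, 32))  # False
```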
</div>
<div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
</div>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<div><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">From:</span></b><span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"> </span></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">Quentin Colombet [<a href="mailto:qcolombet@apple.com" target="_blank">mailto:qcolombet@apple.com</a>]<span> </span><br>
<b>Sent:</b><span> </span>07 January
2016 19:58<br>
<b>To:</b><span> </span>Daniel
Sanders<br>
<b>Cc:</b><span> </span>llvm-dev<br>
<b>Subject:</b><span> </span>Re:
[llvm-dev] [GlobalISel] A Proposal
for global instruction selection</span></p>
</div>
</div>
</div>
<div><div> <br></div>
</div>
<div><p class="MsoNormal">Hi Daniel,</p>
</div>
<div>
<div><div> <br></div>
</div>
</div>
<div>
<div><p class="MsoNormal">I had a quick look at
the language reference for bitcast and I
have a different reading than what you
were pointing out.</p>
</div>
</div>
<div>
<div><p class="MsoNormal">Indeed, my takeaway
is:</p>
</div>
</div>
<div>
<div><p class="MsoNormal"><span>"It
is<span> </span><b>always a </b></span><em><b><span>no-op
cast</span></b></em><span> because
no bits change with this conversion."</span></p>
</div>
</div>
<div>
<div><div> <br></div>
</div>
</div>
<div>
<div><p class="MsoNormal">In other words,
deleting all bitcast instructions should
be fine.</p>
</div>
</div>
<div>
<div><div> <br></div>
</div>
</div>
<div>
<div><p class="MsoNormal">My understanding of
the quote you’ve highlighted is that it
tells C programmers that this is like a
memcpy, not a cast :).</p>
</div>
</div>
<div>
<div><div> <br></div>
</div>
</div>
<div>
<div><p class="MsoNormal">Cheers,</p>
</div>
</div>
<div>
<div><p class="MsoNormal">-Quentin</p>
</div>
<div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div><p class="MsoNormal">On Nov 20,
2015, at 6:53 AM, Daniel Sanders
<<a href="mailto:Daniel.Sanders@imgtec.com" target="_blank"><span style="color:purple">Daniel.Sanders@imgtec.com</span></a>>
wrote:</p>
</div>
</div>
<div><div> <br></div>
</div>
<div>
<div>
<div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Hi,</span></p>
</div>
</div>
<div>
<div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
</div>
</div>
<div>
<div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I
haven't had a chance to read all
of this yet, but one minor
thing occurred to me during
your presentation that I want
to mention. At one point you
mentioned deleting all the
bitcast instructions since
they're equivalent to nops but
this isn't always true.</span></p>
</div>
</div>
<div>
<div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
</div>
</div>
<div>
<div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">The<span> </span><a href="http://llvm.org/docs/LangRef.html" target="_blank">http://llvm.org/docs/LangRef.html</a></span><span> </span>definition
of the bitcast instruction
includes this sentence:</p>
</div>
</div>
<div>
<div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">The
conversion is done as if the
value had been stored to
memory and read back as type
ty2.</span></p>
</div>
</div>
<div>
<div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">For
big-endian MSA, this is
equivalent to a shuffling of
the bits in the register
because endianness only
changes the byte order within
each element. The order of the
elements is unaffected by
endianness. IIRC, big-endian
NEON is the same way.</span></p>
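<div>The point that endianness only changes the byte order within each element, not the order of the elements, can be illustrated with Python's standard <code>struct</code> module by packing the same <4 x i16> value both ways. This sketch models memory layout only; the helper name is hypothetical, and it says nothing about how MSA or NEON registers are organized internally.</div>

```python
import struct


def layout(elements, byte_order: str) -> bytes:
    """Memory image of a vector of 16-bit elements, element 0 first.
    `byte_order` is "<" (little-endian) or ">" (big-endian)."""
    return struct.pack(f"{byte_order}{len(elements)}H", *elements)


elements = [0x0102, 0x0304, 0x0506, 0x0708]
le, be = layout(elements, "<"), layout(elements, ">")
print(le.hex())  # 0201040306050807
print(be.hex())  # 0102030405060708

# Each element stays in the same slot; only its two bytes are swapped.
for i in range(len(elements)):
    assert le[2 * i:2 * i + 2] == be[2 * i:2 * i + 2][::-1]
```

<div>Because the reversal is applied per element, the same bytes reloaded with a different element width end up in different register bits on a big-endian target.</div>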
</div>
</div>
<div>
<div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
</div>
</div>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<div>
<div><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">From:</span></b><span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"> </span></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">llvm-dev [<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">mailto:llvm-dev-bounces@lists.llvm.org</a></span>]<span> </span><b>On
Behalf Of<span> </span></b>Quentin
Colombet via llvm-dev<br>
<b>Sent:</b><span> </span>18
November 2015 19:27<br>
<b>To:</b><span> </span>llvm-dev<br>
<b>Subject:</b><span> </span>[llvm-dev]
[GlobalISel] A Proposal
for global instruction
selection</p>
</div>
</div>
</div>
</div>
<div>
<div><div> <br></div>
</div>
</div>
<div>
<div>
<div>
<div><p class="MsoNormal">Hi,<br>
<span style="color:#12c00e"><br>
</span>With this email, I
would like to kick off the
development for the next
instruction selector that
I described during the
last LLVM Dev Meeting.<br>
For the motivations, see
Jakob’s proposal (<a href="http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html</a>)
and for the proposal, see
the slides (Keynote: <a href="http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co" target="_blank">http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co</a> or
PDF: <a href="http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co" target="_blank">http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co</a>)
or the talk (<a href="https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2" target="_blank">https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2</a>).</p>
</div>
</div>
</div>
<div>
<div>
<div><p class="MsoNormal"><br>
TL;DR: This is happening
now, feedback invited!<br>
<br>
*** Context ***<br>
<span style="color:#12c00e"><br>
</span>At the last
LLVM Dev Meeting, I
presented a proposal for
the next instruction
selector, GlobalISel. The
proposal is basically
summarized in “High Level
Prototype Design” and
“Roadmap”. (If you want
further details, feel free
to reach out to me.)<br>
<span style="color:#00afcd"><br>
</span>The first step of
the development plan is to
prototype the new
framework on open source.
The idea is to <b>start
prototyping now(!)</b> and
have the discussion
ongoing in parallel. The
reason for this approach is
to have code that can be
used to inform those
discussions, e.g., by
collecting data and trying
different design
approaches. Regarding the
discussion, I have listed
a few points where your
feedback would be
particularly appreciated
(see Feedback Invite).</p>
</div>
</div>
</div>
<div>
<div>
<div><p class="MsoNormal"><span style="color:#00afcd"><br>
</span>Also, as I have
mentioned in my talk, some
issues are controversial
but I expect them to be
resolved during prototype
development. Specifically,
these concern aspects of
legalization (should parts
of it be done at the LLVM
IR level or all at the MI
level?) and code re-use
for the instruction combiner.
Please feel free to bring
up your specific concerns
as I move along with the
development plan.<br>
<span style="color:#00afcd"><br>
</span>I expect the design
to evolve with our
experimental findings and
your feedback
and contributions.<br>
Nonetheless, we expect to
nail down some design
decisions once and for all
as the prototype
progresses. I have
marked them with
the tag <b>[final]</b>.<br>
<span style="color:#12c00e"><br>
<br>
<br>
</span>*** Feedback Invite
***<br>
<span style="color:#00afcd"><br>
</span>If you follow and
support this work, you need
to be aware of three
things and I am eager to
hear your feedback and
thoughts about them: the
overall goals of Global
ISel, the goals of the
prototype, and the impact
of the prototype work on
backend design. <br>
<span style="color:#00afcd"><br>
</span>In the section
“Goals”, I defined
(repeated for people that
saw the talk) the goals
for the Global ISel
design.<br>
- Do you see anything
missing?<br>
- Do you see something
that should not be there? <br>
<span style="color:#00afcd"><br>
</span>The prototype will
answer critical design
questions (see “Design
Questions the Prototype
Addresses at the End of
M1" for examples) before
the actual design of Global
ISel is finalized, but it
cannot cover everything.<br>
Specifically, we will <b>*not*</b> look
into improving TableGen or
reusing InstCombine (see
“Proposed Approach” for the
rationale). Please let me
know if you see any issue
with that.<br>
<span style="color:#00afcd"><br>
</span>There is also basic
groundwork needed to
prepare for Global ISel
and I need to extend the
core MachineInstr-level
APIs as explained during
the talk. For this, I
prepared sketches of
patches to illustrate them
and describe the details
in the “Implications”
section below. Please have
a look at the patches to
have a better idea of the
expected impact.<br>
<span style="color:#00afcd"><br>
</span>If there is
anything else you want to
discuss related to Global
ISel, feel free to reach out to
me. In particular, several
people expressed interest
during the LLVM
Dev Meeting in
contributing to the
project. Let me know what
your area of interest is,
so that we can coordinate
our efforts.<br>
Anyhow, please add
[GlobalISel] in the
subject line to help
categorize the emails.<br>
<span style="color:#00afcd"><br>
<br>
<br>
</span>*** Goals ***<br>
<span style="color:#12c00e"><br>
</span>The high level
goals of the new
instruction selector are:<br>
- Global instruction
selector.<br>
- Fast instruction
selector.<br>
- Shared code path for
fast and good instruction
selection.<br>
- IR that represents ISA
concepts better.<br>
- More flexible
instruction selector.<br>
- Easier to
maintain/understand
framework, in particular
legalization.<br>
- Self-contained machine
representation, no back
links to LLVM IR.<br>
- No change to LLVM IR.<br>
<span style="color:#5856d6"><br>
</span>Note: The goals
are common to all targets.
In particular, we do not
intend to work on target-specific
features for the
prototype.<br>
The bottom line is: please
make sure those goals are
compatible with what you
want to achieve for your
target, even if your
requirement does not get
listed here.<br>
<br>
<span style="color:#12c00e"><br>
<br>
</span>*** Proposed
Approach ***<br>
<span style="color:#12c00e"><br>
</span>In this section, I
describe the approach I
plan to pursue in the
prototype and the roadmap
to get there. The final
design will flow out of
it.<br>
<span style="color:#12c00e"><br>
</span>For this prototype,
we purposely exclude any
work to improve or use
TableGen or InstCombine <b>[final].</b> We
will keep in mind, however,
that some of the C++ code
we write will be
table-generated at some
point.<br>
The rationale is that we do
not want to lay down a new
TableGen/InstCombine
infrastructure before
being able to work on the
ISel framework itself.<br>
<span style="color:#12c00e"><br>
</span>The prototype
vehicle will be <b>AArch64</b>.
None of the changes for
GlobalISel will negatively
impact the existing ISel.<br>
<span style="color:#12c00e"><br>
<br>
</span>** High Level
Prototype Design **<br>
<span style="color:#12c00e"><br>
</span>As shown in the
talk, the expected
pipeline for the prototype
is:<br>
<b>LLVM IR </b>->
IRTranslator -> <b>Generic (G)
MachineInstr</b> ->
Legalizer ->
RegBankSelect -> Select
-> <b>MachineInstr</b><br>
<span style="color:#12c00e"><br>
</span>Where:<br>
- Terms in <b>bold</b> are
intermediate
representations.<br>
- Generic MachineInstrs
are machine instructions
with a generic opcode,
e.g., ADD, COPY.</p>
</div>
</div>
</div>
<div>
<div>
<div><p class="MsoNormal">-
IRTranslator: Translate
LLVM IR to (G)
MachineInstr.<br>
- Legalizer: Legalize
illegal (G) MachineInstr
to legal (G) MachineInstr.<br>
- RegBankSelect: Map each
virtual register that has
only a size to a virtual
register with a register bank.<br>
- Select: Translate the
remaining (G) MachineInstr
to MachineInstr.<br>
<br>
<span style="color:#00afcd"><br>
<br>
</span>** Implications **<br>
<span style="color:#00afcd"><br>
</span>As part of the
bring-up of the prototype,
we need to extend some of
the core
MachineInstr-level APIs:<br>
- Need to remember
FastMath flags for each
MachineInstr.<br>
- Need to know the type
of each MachineInstr. We
don’t want ADD8, ADD16,
etc.<br>
- Extend the
MachineRegisterInfo to
support size as well as
register classes for
virtual registers.<br>
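<br>
One way to picture the MachineRegisterInfo extension is a virtual-register table whose entries hold either a size (a generic vreg, before selection) or a register class. The sketch below assumes std::variant for that and uses hypothetical names (VRegTable, RegClass, createGenericVReg, constrain); it is not the interface in the attached patches.<br>

```cpp
// Hypothetical sketch: each virtual register records either a size in bits
// (generic vreg) or a register class (after selection). Not LLVM's API.
#include <cassert>
#include <string>
#include <variant>
#include <vector>

struct RegClass {
  std::string Name;
  unsigned Size; // bits
};

class VRegTable {
  // Entry i describes vreg i: a size in bits or a register class.
  std::vector<std::variant<unsigned, const RegClass *>> Info;

public:
  unsigned createGenericVReg(unsigned SizeInBits) {
    Info.emplace_back(SizeInBits);
    return static_cast<unsigned>(Info.size() - 1);
  }
  unsigned createVReg(const RegClass &RC) {
    Info.emplace_back(&RC);
    return static_cast<unsigned>(Info.size() - 1);
  }
  bool isGeneric(unsigned Reg) const {
    return std::holds_alternative<unsigned>(Info[Reg]);
  }
  // The size is queryable in both states: directly, or via the class.
  unsigned getSize(unsigned Reg) const {
    if (isGeneric(Reg))
      return std::get<unsigned>(Info[Reg]);
    return std::get<const RegClass *>(Info[Reg])->Size;
  }
  // Selection turns a generic vreg into a classed one; sizes must agree.
  bool constrain(unsigned Reg, const RegClass &RC) {
    if (getSize(Reg) != RC.Size)
      return false;
    Info[Reg] = &RC;
    return true;
  }
};
```

Keeping the size available in both states lets a pass ask for a vreg's width before and after selection without caring which kind of vreg it holds.<br>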
<span style="color:#00afcd"><br>
</span>I have sketched the
changes in the attached
patches to help picture
how they would
impact the existing APIs.</p>
</div>
</div>
</div>
<div>
<div>
<div><div> <br></div>
</div>
</div>
</div>
<div>
<div>
<div><p class="MsoNormal">Note: I
do not intend to commit
those changes as they are.
They will go through the usual
review process in due
time.</p>
</div>
</div>
</div>
<div>
<div>
<div><p class="MsoNormal"><br>
The patches contain “//
***”-like comments that
give a rough explanation
of why those changes are
needed w.r.t. the goals.<br>
The patches could be
reordered, since the
dependencies between
them are not sequential.
Anyhow, here are the
patches:<br>
1. Introduce (some of) the
generic opcodes.<br>
2. Make MachineFunction
more independent of LLVM
IR to eventually be able
to delete the LLVM IR
instance from the memory.<br>
3. Extend MachineInstr to
represent additional
information attached to
generic opcodes.<br>
4. Teach
MachineRegisterInfo about
size for virtual
registers.<br>
5. Introduce a helper
class to build
MachineInstr related
objects.<br>
6. Add new target hooks to
lower the ABI directly to
MachineInstr.<br>
7. Introduce the
IRTranslator pass.<br>
<br>
<span style="color:#12c00e"><br>
</span>** Roadmap for the
Prototype **<br>
<span style="color:#00afcd"><br>
</span>We plan to split
the prototype into three
main milestones:<br>
1. Translation: LLVM IR to
(G) MachineInstr
translation.<br>
2. Basic selector: Legal
LLVM IR to target-specific
MachineInstr.<br>
3. Simple legalization:
Support scalar type
legalization and some
vector instructions.<br>
<span style="color:#00afcd"><br>
</span>Notes:<br>
- For #1, we will not
support any fancy
instructions like landing
pads or switches.<br>
- Each milestone should
take about 3-4 months.</p>
</div>
</div>
</div>
<div>
<div>
<div><p class="MsoNormal">- At
the end of #2, we would
have a FastISel-like
selector.<br>
<span style="color:#00afcd"><br>
</span>Each milestone will
be detailed right before
we start it. The rationale
is that we want to
incorporate what we
learned from the
prototype into the next
milestone. In other words,
in this email, <b>I only
describe the first
milestone</b> in detail,
and I will give more
details on each subsequent
milestone shortly before
we start it. For
your information, here is
the rest of the
intended roadmap for the <b>full</b> project:<br>
4. Productization: Clean
up implementation,
stabilize the APIs.<br>
5. Complex legalization:
Extend legalization
support to everything
missing.<br>
6. Completeness: Fill the
blanks, e.g., landing pad.<br>
7. Clean-up and
performance: Add the
necessary bits to reach
parity with, or beat,
SelectionDAG-generated
code.<br>
8. Transition: Document
how to switch, provide
tools to help.<br>
<span style="color:#00afcd"><br>
<br>
</span>** Milestone 1 **<br>
<span style="color:#12c00e"><br>
</span>The first phase is
focused on the
IRTranslator pass.<br>
<span style="color:#12c00e"><br>
</span>The IRTranslator is
responsible for
translating the LLVM IR
into Generic MachineInstr.
The IRTranslator pass uses
some target hooks
to perform the ABI
lowering. We can either
define a new API for them,
e.g., ABILoweringInfo, or
extend the existing
TargetLowering.<br>
Moreover, the prototype
will focus on simple
instructions, i.e., we will
not support switches or
landing pads in this
iteration.<br>
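<br>
To make the ABI-lowering hook more concrete, here is a hypothetical sketch of what an ABILoweringInfo-style interface could look like, paired with a toy calling convention loosely modelled on AAPCS64: integer arguments of at most 64 bits only, the first eight in W/X registers and the rest in 8-byte stack slots. The class names, method signature and simplified rules are assumptions for illustration; the real convention also covers floating-point and aggregate arguments and platform variants.<br>

```cpp
// Hypothetical ABI-lowering hook; the interface and the toy convention
// are illustrative assumptions, not the AArch64 backend's code.
#include <cassert>
#include <string>
#include <vector>

// Where one incoming argument lives once the ABI has been lowered.
struct ArgLoc {
  bool InReg;
  std::string Reg;       // physical register name when InReg
  unsigned StackOffset;  // byte offset from the incoming SP otherwise
};

class ABILoweringInfo {
public:
  virtual ~ABILoweringInfo() = default;
  // Given the bit width of each formal argument, decide where it lives.
  virtual std::vector<ArgLoc>
  lowerFormalArguments(const std::vector<unsigned> &ArgSizes) const = 0;
};

// Simplified rule for integer arguments of at most 64 bits: the first
// eight go in W0-W7 (X0-X7 for 64-bit values), the rest in 8-byte slots.
class ToyAArch64ABI : public ABILoweringInfo {
public:
  std::vector<ArgLoc>
  lowerFormalArguments(const std::vector<unsigned> &ArgSizes) const override {
    std::vector<ArgLoc> Locs;
    unsigned NextReg = 0, NextStack = 0;
    for (unsigned Size : ArgSizes) {
      if (NextReg < 8) {
        std::string Prefix = Size == 64 ? "X" : "W";
        Locs.push_back({true, Prefix + std::to_string(NextReg++), 0});
      } else {
        Locs.push_back({false, "", NextStack});
        NextStack += 8;
      }
    }
    return Locs;
  }
};
```

In this sketch's design, the IRTranslator would consume the returned locations to emit copies from physical registers, or loads from the stack, into generic virtual registers.<br>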
<span style="color:#12c00e"><br>
</span>At the end of M1,
the prototype will not be
able to produce code,
since we will only have
the beginning of the
GlobalISel pipeline.
Instead, we will test the
IRTranslator by checking the
generic output it
produces from the test
IR.<br>
<span style="color:#12c00e"><br>
</span>* Design Decisions
*<br>
<span style="color:#12c00e"><br>
</span>- The IRTranslator
is a final class. Its
purpose is to move from
the LLVM IR world to the
MachineInstr world <b>[final]</b>.<br>
- Lower the ABI as part of
the translation process <b>[final]</b>.<br>
<span style="color:#12c00e"><br>
</span>* Design Questions
the Prototype Addresses at
the End of M1 *<br>
<span style="color:#12c00e"><br>
</span>- Handling of
aggregate types during the
translation.<br>
- Lowering of switches.<br>
- What about a Module pass
for Machine passes?<br>
- Introduce new APIs to
have a clearer separation
between:<br>
&nbsp;&nbsp;&nbsp;- Legalization
(setOperationAction, etc.)<br>
&nbsp;&nbsp;&nbsp;- Cost/Combine related
(isXXXFree, etc.)<br>
&nbsp;&nbsp;&nbsp;- Lowering related
(LowerFormal, etc.)<br>
- What is the contract
with the backends? Is it
still “should be able to
select any valid LLVM IR”?<br>
<span style="color:#00afcd"><br>
</span>Thanks,</p>
</div>
</div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div><p class="MsoNormal">-Quentin</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div><div> <br></div>
</div>
</div>
</div>
</div>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote>
</div>
</blockquote>
<br>
</div>
</div></blockquote></div></div></div></blockquote></div></div>