<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 9/25/2017 9:14 AM, Björn Pettersson
A wrote:<br>
</div>
<blockquote type="cite"
cite="mid:HE1PR0701MB25723C2FC2B41565BBA52F69B07A0@HE1PR0701MB2572.eurprd07.prod.outlook.com">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered
medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle18
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle19
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">(Not sure if this exactly maps to “truncating
store”, but I think it at least touches some of the subjects
discussed in this thread)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">Our out-of-tree target needs several patches to
get things working correctly for us.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">We have introduced i24 and i40 types in
ValueTypes/MachineValueTypes (in addition to the normal
pow-of-2 types). And we have vectors of those (v2i40,
v4i40).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">And the byte size in our target is 16 bits.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">When storing an i40 we need to store it as
three 16-bit bytes, i.e. 48 bits.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">When storing a v4i40 vector it will be stored
as 4x48 bits.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">One thing that we have had to patch is the
getStoreSize() method in ValueTypes/MachineValueTypes where
we assume that vectors are bitpacked when the element size
is smaller than the byte size (“BitsPerByte”):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"> /// Return the number of bytes overwritten
by a store of the specified value<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"> /// type.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"> unsigned getStoreSize() const {<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">- return (getSizeInBits() + 7) / 8;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">+ // We assume that vectors with elements
smaller than the byte size are<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">+ // bitpacked, and that elements at least as
large as the byte size are padded<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">+ // (e.g. the i40 type for Phoenix is stored
using 3 bytes (48 bits)).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">+ bool PadElementsToByteSize =<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">+ isVector() &amp;&amp;
getScalarSizeInBits() &gt;= BitsPerByte;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">+ if (PadElementsToByteSize)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">+ return getVectorNumElements() *
getScalarType().getStoreSize();<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">+ return (getSizeInBits() +
(BitsPerByte-1)) / BitsPerByte;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"> }<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">The patch seems to work for in-tree target
tests as well as our out-of-tree target.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">Whether it is a correct assumption for all targets
is beyond my knowledge. Maybe only i1 vectors should be
bitpacked?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">Anyway, I think the bitpacked case is very
special (we do not use it for our target…).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">AFAIK bitcast is defined as writing to memory
followed by a load using a different type. And I think that
doing several scalar operations should give the same result
as when using vectors. So bitcast of bitpacked vectors
should probably be avoided?</span></p>
</div>
</blockquote>
<br>
Yes, store+load is the right definition of bitcast. And in fact,
the backend will lower a bitcast to a store+load to a stack
temporary in cases where there isn't some other lowering specified.<br>
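<br>
As a minimal sketch of that store+load view (plain C++, not LLVM code; the helper name and the little-endian byte order are assumptions for illustration), a bitcast from &lt;4 x i8&gt; to i32 can be modeled as writing the elements to a temporary and reading the same bytes back as one integer:<br>
<pre>
```cpp
// Hypothetical model of "bitcast = store + load"; not taken from LLVM.
// Assumes a little-endian target with 8-bit bytes and a 32-bit unsigned.
unsigned bitcastV4I8ToI32(const unsigned char elems[4]) {
  unsigned char slot[4]; // stands in for the stack temporary

  // Store: element i of the vector lands in byte i of the slot.
  for (int i = 0; i < 4; ++i)
    slot[i] = elems[i];

  // Load: reassemble the same four bytes as a single 32-bit integer.
  unsigned result = 0;
  for (int i = 0; i < 4; ++i)
    result |= unsigned(slot[i]) << (8 * i);
  return result;
}
```
</pre>
With elements 0x11, 0x22, 0x33, 0x44 this yields 0x44332211, so the result of the bitcast is fixed entirely by how the vector is laid out in memory.<br>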
<br>
The end result is probably going to be pretty inefficient unless
your target has a special instruction to handle it (x86 has pmovmskb
for i1 vector bitcasts, but otherwise you probably end up with some
terrible lowering involving a lot of shifts).<br>
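<br>
For illustration, the shift-based fallback amounts to something like the scalar loop below. This is only a sketch: the function name is made up, and placing element 0 in the least significant bit is an assumed layout.<br>
<pre>
```cpp
// Hypothetical scalar model of bitcasting <8 x i1> to i8 without a
// dedicated instruction: one shift and one OR per lane.
// Assumes element 0 maps to bit 0 of the result.
unsigned char packI1Lanes(const bool lanes[8]) {
  unsigned mask = 0;
  for (int i = 0; i < 8; ++i)
    mask |= (lanes[i] ? 1u : 0u) << i;
  return (unsigned char)mask;
}
```
</pre>
A movmsk-style instruction produces this kind of mask in a single operation, which is why the generic expansion looks so poor by comparison.<br>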
<br>
<blockquote type="cite"
cite="mid:HE1PR0701MB25723C2FC2B41565BBA52F69B07A0@HE1PR0701MB2572.eurprd07.prod.outlook.com">
<div class="WordSection1">
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">This also reminded me of the following test
case that is in trunk: test/CodeGen/X86/pr20011.ll<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">%destTy = type { i2, i2 }<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">define void @crash(i64 %x0, i64 %y0, %destTy*
nocapture %dest) nounwind {<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">; X64-LABEL: crash:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">; X64: # BB#0:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">; X64-NEXT: andl $3, %esi<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">; X64-NEXT: movb %sil, (%rdx)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">; X64-NEXT: andl $3, %edi<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">; X64-NEXT: movb %dil, (%rdx)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">; X64-NEXT: retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"> %x1 = trunc i64 %x0 to i2<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"> %y1 = trunc i64 %y0 to i2<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"> %1 = bitcast %destTy* %dest to &lt;2 x
i2&gt;*<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"> %2 = insertelement
&lt;2 x i2&gt; undef, i2 %x1, i32 0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"> %3
= insertelement &lt;2 x i2&gt; %2, i2 %y1, i32 1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"> store &lt;2 x i2&gt; %3, &lt;2 x i2&gt;* %1,
align 1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"> ret void<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">As you can see from the “X64” checks, the behavior
is quite weird.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">Both movb instructions write to the same
address. So the result of the store &lt;2 x i2&gt; will be
the same as when only storing one of the elements.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">Is this really expected?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="mso-fareast-language:EN-US"
lang="EN-US">We have emailed Simon Pilgrim, who added the
test case to show that the compiler no longer crashes on it
(see
<a href="https://bugs.llvm.org/show_bug.cgi?id=20011"
moz-do-not-send="true">https://bugs.llvm.org/show_bug.cgi?id=20011</a>).
But even if the compiler doesn’t crash, the behavior seems
wrong to me. <br>
</span></p>
</div>
</blockquote>
<p>Yes, the behavior here is wrong.
DAGTypeLegalizer::SplitVecOp_STORE, DAGTypeLegalizer::SplitVecRes_LOAD, etc.
assume the element size is a multiple of 8 bits. I'm sure this has
been discussed before, but I guess nobody ever wrote a patch to
fix it...?<br>
</p>
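<p>A toy model of the failure, assuming the split computes the second half's byte offset by integer-dividing a bit count by 8 (the helper names below are hypothetical and this is not the actual DAGTypeLegalizer code):</p>
<pre>
```cpp
// Hypothetical model of splitting a <2 x i2> store into two half stores.
// Not the real legalizer logic; it only illustrates the rounding problem.

// Byte offset of the second half, computed as if sizes were whole bytes.
unsigned secondHalfByteOffset(unsigned halfSizeInBits) {
  return halfSizeInBits / 8; // 2 / 8 == 0 for an i2 half
}

// Each half is written with a full-byte store, like the two movb in the
// quoted test. Element 1 goes first and element 0 second, matching the
// order of the quoted checks.
void storeV2I2Split(unsigned char *mem, unsigned elt0, unsigned elt1) {
  const unsigned halfBits = 2;
  mem[secondHalfByteOffset(halfBits)] = (unsigned char)(elt1 & 3u);
  mem[0] = (unsigned char)(elt0 & 3u); // same byte: element 1 is lost
}

// What a bitpacked store would produce instead, assuming element 0
// occupies the low bits.
unsigned char storeV2I2Packed(unsigned elt0, unsigned elt1) {
  return (unsigned char)((elt0 & 3u) | ((elt1 & 3u) << 2));
}
```
</pre>
<p>Storing the elements 1 and 2 leaves just 1 in memory under the split model, whereas the packed form would be 9 (binary 1001). Which packed layout is actually intended is a separate question; the sketch only shows why two sub-byte halves end up at the same address.</p>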
<p>-Eli<br>
</p>
<pre class="moz-signature" cols="72">--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>
</body>
</html>