<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Arial","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:714157082;
mso-list-type:hybrid;
mso-list-template-ids:-437589798 2102831726 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l0:level1
{mso-level-start-at:0;
mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:12.0pt;
font-family:Wingdings;
mso-fareast-font-family:Calibri;
mso-bidi-font-family:"Times New Roman";
color:windowtext;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-family:Wingdings"><span style="mso-list:Ignore">Ø<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]>It's interesting that it’s such a performance issue though<span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">I don’t think it really is much of a performance issue, except perhaps on Quark. On all recent IA32 processors, unaligned accesses perform effectively the same as aligned accesses unless they cross a cache-line boundary. And since alignment within structs is 8 bytes, an object that is allocated dynamically will often be “well aligned” too, because memory allocators tend to return well-aligned memory.<o:p></o:p></span></p>
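<p class="MsoNormal">As a rough illustration of the cache-line point, the check below reports whether a given access would straddle two lines. It is only a sketch: the helper name crosses_cache_line and the 64-byte line size are assumptions made for the example, not something stated in this thread.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size. 64 bytes is common on recent x86 parts,
   but it is an assumption here, not a figure from the thread. */
#define CACHE_LINE_BYTES 64u

/* Returns 1 if an access of `size` bytes starting at `addr`
   touches more than one cache line, 0 otherwise. */
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= CACHE_LINE_BYTES);
    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

<p class="MsoNormal">Under these assumptions, an 8-byte double at offset 60 straddles two lines, while the same double at offset 4 is misaligned but stays within one line.</p>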
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">So that leaves potential penalties for things allocated on the stack. Again, the compiler can use extra instructions in the prolog and epilog of a routine to ensure a higher than minimum stack alignment if it thinks there is a performance reason for doing so.<o:p></o:p></span></p>
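<p class="MsoNormal">What that extra prolog work amounts to is rounding the stack pointer down to the wanted boundary. x86 compilers typically do it by and-ing esp with a negative power of two, and they restore the original value in the epilog. A minimal sketch of the arithmetic, with realign_stack as a hypothetical helper name:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Rounds a stack-pointer value down to a multiple of `align`
   (which must be a power of two). The x86 stack grows downward,
   so rounding down only ever reserves extra space. */
static uintptr_t realign_stack(uintptr_t sp, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}
```

<p class="MsoNormal">For instance, a stack pointer of 0x1004 realigned to 16 bytes becomes 0x1000, and one that is already aligned is left unchanged.</p>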
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">But this discussion was about the ABIs and what they guarantee. The ABI for IA32 Windows guarantees only 4-byte stack alignment upon entry to a function. The ABI that gcc assumes for IA32 Linux guarantees 16-byte alignment, but that can be controlled by the -mpreferred-stack-boundary option described below. The Linux kernel, for example, is built with gcc using the 4-byte alignment setting of that option.<o:p></o:p></span></p>
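<p class="MsoNormal">For reference, gcc documents the argument of -mpreferred-stack-boundary as an exponent: the stack is kept aligned to 2 raised to num bytes. A small sketch of that mapping (stack_boundary_bytes is a name made up for this example):</p>

```c
#include <assert.h>

/* Converts the `num` given to -mpreferred-stack-boundary=num into
   the stack alignment in bytes, i.e. 2 raised to num. */
static unsigned stack_boundary_bytes(unsigned num)
{
    assert(num < 16);   /* keeps the shift well defined for any unsigned */
    return 1u << num;
}
```

<p class="MsoNormal">So num=4 corresponds to 16-byte alignment, and num=2 to the 4-byte guarantee.</p>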
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">Kevin<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> John Sully [mailto:john@csquare.ca]
<br>
<b>Sent:</b> Thursday, January 15, 2015 11:52 AM<br>
<b>To:</b> mats petersson<br>
<b>Cc:</b> Smith, Kevin B; cfe-dev@cs.uiuc.edu<br>
<b>Subject:</b> Re: [cfe-dev] Default stack alignment for x86 changed<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">Clang is really no different than MSVC here (I just double checked). For SSE you always had to specify the alignment required because it was never guaranteed by the compiler (especially when you get into mandatory 16-byte alignment).
It's interesting that it's such a performance issue though; unless you're really memory constrained, it seems the size/speed trade-off is clearly in favour of 8-byte alignment even though it's not technically necessary.<o:p></o:p></p>
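<p class="MsoNormal">One portable way to specify the alignment, assuming a C11 compiler, is alignas from stdalign.h (compiler-specific spellings such as __declspec(align(16)) or __attribute__((aligned(16))) also exist). The sketch below, with a made-up function name, checks that a local declared this way lands on a 16-byte boundary:</p>

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* The example assumes four floats fill one 16-byte SSE register. */
static_assert(sizeof(float) == 4, "example assumes 4-byte float");

/* Returns 1 if a local buffer declared with alignas(16) starts on a
   16-byte boundary. The compiler has to honour the request itself,
   realigning the stack if the ABI guarantee is weaker. */
static int local_is_16_byte_aligned(void)
{
    alignas(16) float lanes[4] = {0.0f, 1.0f, 2.0f, 3.0f};
    return ((uintptr_t)lanes % 16u) == 0;
}
```

<p class="MsoNormal">Without the alignas, nothing in a 4-byte stack ABI would promise that result.</p>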
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On Thu, Jan 15, 2015 at 11:16 AM, mats petersson &lt;<a href="mailto:mats@planetcatfish.com" target="_blank">mats@planetcatfish.com</a>&gt; wrote:<o:p></o:p></p>
<p class="MsoNormal">To be clear: double does not REQUIRE 8 byte alignment, but<br>
(reasonably modern, like "Pentium onwards", so ca 1994-5 ish) x86<br>
processors "prefer" 8-byte alignment for "double" values, since<br>
they can then be read in ONE cycle on a 64-bit bus.<br>
<br>
And of course, SSE instructions that aren't specifically designed for<br>
unaligned loads will require a 16-byte alignment. Or does SSE code<br>
automatically modify the alignment criteria for the function?<br>
<br>
Further, shouldn't the stack be aligned to "LargestAlignment" or<br>
whatever it is called? Otherwise, any structure alignment will surely<br>
be "lost"?<br>
<br>
--<br>
Mats<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal"><br>
On 15 January 2015 at 18:38, Smith, Kevin B &lt;<a href="mailto:kevin.b.smith@intel.com">kevin.b.smith@intel.com</a>&gt; wrote:<br>
> Although alignof(double) on Windows returns 8, the actual minimum stack alignment is still 4. Here is a source example illustrating<br>
> this.<br>
><br>
> #include &lt;stdlib.h&gt;<br>
><br>
> int a = __alignof(double);<br>
><br>
> extern void crud1(int i, double *p);<br>
><br>
> void crud(void) {<br>
> double dummy;<br>
> crud1(0, &amp;dummy);<br>
> }<br>
><br>
> Assembly code produced from VS 2012, compiling with cl -Fa -c -O2 crud.c<br>
> _DATA SEGMENT<br>
> _a DD 08H<br>
> _DATA ENDS<br>
> PUBLIC _crud<br>
> EXTRN _crud1:PROC<br>
> ; Function compile flags: /Ogtpy<br>
> ; COMDAT _crud<br>
> _TEXT SEGMENT<br>
> _dummy$ = -8<br>
> _crud PROC<br>
> ; File d:\users\kbsmith1\tc_tmp1\crud.c<br>
> ; Line 7<br>
> sub esp, 8<br>
> ; Line 9<br>
> lea eax, DWORD PTR _dummy$[esp+8]<br>
> push eax<br>
> push 0<br>
> call _crud1<br>
> ; Line 10<br>
> add esp, 16<br>
> ret 0<br>
><br>
> You can see that __alignof(double) produced 8 by the initialization value of a. You can also see that there is no code at the beginning of function crud to<br>
> align the stack. So, if it comes in on a 4 byte boundary, it will remain on a 4 byte boundary, and since it subs 8 from esp, if it comes in on an 8 byte boundary<br>
> it will stay on an 8 byte boundary. Now consider the call to crud1. This pushes two parameters, and then the call pushes the return address. So, if the stack<br>
> comes in 8 byte aligned, at the entry to crud1, the stack is now only 4 byte aligned.<br>
><br>
> For this reason, on Windows, although __alignof(double) is 8, it doesn't follow that every double * value is 8 byte aligned.<br>
><br>
> Also, for IA32 on Linux, a 4 byte minimum stack alignment used to be specified by the Sys V ABI, which is pretty much the only one you can find references to on the web. However, for quite a number of years, gcc's default on Linux has been to ensure 16 byte stack
alignment at function entry, so that every function that uses SSE/SSE2 instructions (and might possibly need to spill) doesn't have to perform dynamic stack alignment. In gcc this is controlled by the -mpreferred-stack-boundary=num option.
<a href="https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/i386-and-x86-64-Options.html" target="_blank">
https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/i386-and-x86-64-Options.html</a> says the default for this option is 4, implying 16 byte stack alignment.<br>
><br>
> Kevin Smith<br>
><br>
> -----Original Message-----<br>
> From: <a href="mailto:cfe-dev-bounces@cs.uiuc.edu">cfe-dev-bounces@cs.uiuc.edu</a> [mailto:<a href="mailto:cfe-dev-bounces@cs.uiuc.edu">cfe-dev-bounces@cs.uiuc.edu</a>] On Behalf Of palparni<br>
> Sent: Thursday, January 15, 2015 9:34 AM<br>
> To: <a href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a><br>
> Subject: Re: [cfe-dev] Default stack alignment for x86 changed<br>
><br>
> I understand, so the change was made for Unix-based systems in mind.<br>
> Unfortunately the win32 x86 ABI seems to require doubles to be 64-bit<br>
> aligned. Could we perhaps keep the 8-byte alignment only for win32 targets?<br>
><br>
> Thanks,<br>
> Alpar<br>
><br>
><br>
><br>
> --<br>
> View this message in context: <a href="http://clang-developers.42468.n3.nabble.com/Default-stack-alignment-for-x86-changed-tp4043481p4043483.html" target="_blank">
http://clang-developers.42468.n3.nabble.com/Default-stack-alignment-for-x86-changed-tp4043481p4043483.html</a><br>
> Sent from the Clang Developers mailing list archive at Nabble.com.<br>
> _______________________________________________<br>
> cfe-dev mailing list<br>
> <a href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a><br>
> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>
<o:p></o:p></p>
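<p class="MsoNormal">The stack arithmetic in Kevin's crud listing can be replayed in a few lines of C. This is only a model of the ESP updates shown in the assembly above, and esp_at_crud1_entry is a hypothetical name:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Replays the stack-pointer changes in the listing for crud() and
   returns the value ESP would hold on entry to crud1(). */
static uint32_t esp_at_crud1_entry(uint32_t esp_at_crud_entry)
{
    assert(esp_at_crud_entry % 4u == 0);  /* the 4-byte ABI guarantee */
    uint32_t esp = esp_at_crud_entry;
    esp -= 8;   /* sub esp, 8  : room for the local double `dummy`  */
    esp -= 4;   /* push eax    : second argument, the address of it */
    esp -= 4;   /* push 0      : first argument                     */
    esp -= 4;   /* call _crud1 : pushes the return address          */
    return esp;
}
```

<p class="MsoNormal">Feeding in an 8-byte-aligned entry value such as 0x1000 gives a result that is 4 mod 8, matching the description above. An entry value of 0x1004 gives an 8-byte-aligned result, but in that case dummy itself sits at an address that is 4 mod 8.</p>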
</div>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</body>
</html>