<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 12/21/14 4:27 AM, Kuperstein,
Michael M wrote:<br>
</div>
<blockquote
cite="mid:251BD6D4E6A77E4586B482B33960D228442297A0@HASMSX106.ger.corp.intel.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<meta name="Generator" content="Microsoft Word 14 (filtered
medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
line-height:115%;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
color:black;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
color:black;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman","serif";
color:black;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0cm;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";
color:black;}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:"Calibri","sans-serif";}
span.EmailStyle19
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;
color:black;}
span.EmailStyle23
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal"><span style="color:#1F497D">Which
performance guidelines are you referring to?</span></p>
</div>
</blockquote>
Table C-21 in "Intel® 64 and IA-32 Architectures Optimization
Reference Manual", September 2014.<br>
<br>
It hasn't changed. It still lists push and pop instructions as 2-3
times more expensive than mov. And that's not taking into account any
optimizations that might get undone by the stack pointer changing.
I'm just speculating, but I suspect that mov being faster has to do
with not having to modify a register every time.<br>
<br>
With that as a basis, the fastest entry/exit sequences just use
subl/addl on the stack pointer and don't use push at all. For most
C functions, you don't even have to materialize a frame pointer (if
the unwind mechanisms are set up to handle that). Not that I am
recommending changing the x86_32 code generation to do that.<br>
<blockquote
cite="mid:251BD6D4E6A77E4586B482B33960D228442297A0@HASMSX106.ger.corp.intel.com"
type="cite">
<div class="WordSection1">
<p class="MsoNormal"><span style="color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">I’m not that
familiar with decade-old CPUs, but to the best of my
knowledge, this is not true on current hardware.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">There is one
specific circumstance where PUSHes should be avoided – for
Atom/Silvermont processors, the memory form of PUSH is
inefficient, so the register-freeing optimization below may
not be profitable (see 14.3.3.6 and 15.3.1.2 in the Intel
optimization reference manual). <o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Having said
that, one distinct possibility is to have the heuristic make
different decisions depending on the optimization flags,
that is, to be much more aggressive for optsize functions.<o:p></o:p></span></p>
<p class="MsoNormal"><a moz-do-not-send="true"
name="_MailEndCompose"><span style="color:#1F497D"><o:p> </o:p></span></a></p>
<div>
<div style="border:none;border-top:solid #B5C4DF
1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="line-height:normal"><b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext">From:</span></b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext">
Herbie Robinson [<a class="moz-txt-link-freetext" href="mailto:HerbieRobinson@verizon.net">mailto:HerbieRobinson@verizon.net</a>]
<br>
<b>Sent:</b> Sunday, December 21, 2014 10:58<br>
<b>To:</b> Kuperstein, Michael M; <a class="moz-txt-link-abbreviated" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a><br>
<b>Subject:</b> Re: [LLVMdev] [RFC] [X86] Mov to push
transformation in x86-32 call sequences<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">According to the Intel performance
guidelines, pushes are significantly slower than moves to
the extent that they should be avoided as much as possible. It's
been a decade since I dealt with this, so I don't
remember the numbers, but I'm pretty sure the changes you
are proposing will slow the code down.<br>
<br>
People who care about speed more than code size are probably
not going to like this very much...<br>
<br>
<br>
On 12/21/14 3:17 AM, Kuperstein, Michael M wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoPlainText">Hello all,<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">In r223757 I’ve committed a patch that
performs, for the 32-bit x86 calling convention, the
transformation of MOV instructions that push function
arguments onto the stack into actual PUSH instructions.<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">For example, it will transform this:<o:p></o:p></p>
<p class="MsoPlainText">subl $16, %esp<o:p></o:p></p>
<p class="MsoPlainText">movl $4, 12(%esp)<o:p></o:p></p>
<p class="MsoPlainText">movl $3, 8(%esp)<o:p></o:p></p>
<p class="MsoPlainText">movl $2, 4(%esp)<o:p></o:p></p>
<p class="MsoPlainText">movl $1, (%esp)<o:p></o:p></p>
<p class="MsoPlainText">calll _func<o:p></o:p></p>
<p class="MsoPlainText">addl $16, %esp<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">Into this:<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">pushl $4<o:p></o:p></p>
<p class="MsoPlainText">pushl $3<o:p></o:p></p>
<p class="MsoPlainText">pushl $2<o:p></o:p></p>
<p class="MsoPlainText">pushl $1<o:p></o:p></p>
<p class="MsoPlainText">calll _func<o:p></o:p></p>
<p class="MsoPlainText">addl $16, %esp<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">The main motivation for this is code
size (a “pushl $4” is 2 bytes, a “movl $4, 12(%esp)” is 8
bytes), but there are some other advantages, as shown below.<o:p></o:p></p>
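<p class="MsoPlainText">As a rough illustration of the size argument, the C sketch below tallies the bytes of both call sequences from the example. The byte counts are assumptions based on the usual IA-32 encodings, with 8-bit immediates and displacements where they fit. The enum and function names are made up for this sketch and are not part of LLVM.</p>
<pre>
```c
#include <assert.h>
#include <stddef.h>

/* Assumed encoded sizes in bytes for the instructions in the example. */
enum {
    SIZE_PUSH_IMM8      = 2, /* 6A ib           pushl $4            */
    SIZE_MOV_IMM_ESP    = 7, /* C7 04 24 id     movl $1, (%esp)     */
    SIZE_MOV_IMM_ESP_D8 = 8, /* C7 44 24 d8 id  movl $4, 12(%esp)   */
    SIZE_ADJ_ESP_IMM8   = 3, /* 83 /r ib        subl/addl $16, %esp */
    SIZE_CALL_REL32     = 5  /* E8 cd           calll _func         */
};

/* Bytes for a call that stores nargs immediate arguments with movl. */
size_t mov_sequence_size(size_t nargs) {
    size_t total = SIZE_CALL_REL32;
    assert(nargs <= 32); /* keeps every displacement within 8 bits */
    if (nargs == 0)
        return total;
    total += 2 * SIZE_ADJ_ESP_IMM8;             /* subl before, addl after */
    total += SIZE_MOV_IMM_ESP;                  /* the slot at (%esp)      */
    total += (nargs - 1) * SIZE_MOV_IMM_ESP_D8; /* slots at disp8(%esp)    */
    return total;
}

/* Bytes for the same call when the arguments are pushed instead. */
size_t push_sequence_size(size_t nargs) {
    size_t total = SIZE_CALL_REL32;
    assert(nargs <= 32);
    if (nargs == 0)
        return total;
    total += nargs * SIZE_PUSH_IMM8;
    total += SIZE_ADJ_ESP_IMM8; /* addl after the call (caller clean-up) */
    return total;
}
```
</pre>
<p class="MsoPlainText">Under these assumptions, the four-argument mov sequence comes to 42 bytes and the push sequence to 16.</p>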
<p class="MsoPlainText">The way this works in r223757 is by
intercepting call frame simplification in the Prolog/Epilog
Inserter, and replacing the mov sequence with pushes. Right
now it only handles functions which do not have a reserved
call frame (a small minority of cases), and I'd like to
extend it to cover other cases where it is profitable.<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">The currently implemented approach has
a couple of drawbacks:<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">1) Push vs. having a reserved call
frame: <br>
This transformation is always profitable when we do not have
a reserved call frame. When a reserved frame can be used,
however, there is a trade-off. For example, in a function
that contains only one call site, and no other stack
allocations, pushes are a clear win, since having a reserved
call frame wouldn't save any instructions. On the other
hand, if a function contains 10 call sites, and only one of
them can use pushes, then it is most probably a loss – not
reserving a call frame will cost us 10 add instructions,
and the pushes gain very little. I’d like to be able to make
the decision on whether we want to have a reserved frame or
pushes by considering the entire function. I don't think
this can be done in the context of PEI.
<o:p></o:p></p>
<p class="MsoPlainText">Note that in theory we could have both
a reserved call frame and have some specific call sites in
the function use pushes, but this is fairly tricky because
it requires much more precise tracking of the stack pointer
state. That is something I’m not planning to implement at
this point.<o:p></o:p></p>
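<p class="MsoPlainText">The trade-off in 1) can be sketched as a whole-function size comparison. The cost model below is hypothetical: it assumes 8 bytes per mov store, 2 bytes per push and 3 bytes per stack pointer adjustment, charges the reserved frame one sub/add pair per function, and charges every call site its own adjustments when no frame is reserved. The type and function names are invented for this sketch and do not exist in LLVM.</p>
<pre>
```c
#include <assert.h>
#include <stddef.h>

/* Assumed encoded sizes in bytes; illustrative only. */
enum { MOV_BYTES = 8, PUSH_BYTES = 2, ADJ_BYTES = 3 };

struct call_site {
    unsigned nargs; /* stack arguments passed at this call site  */
    int can_push;   /* nonzero if the site is eligible for pushes */
};

/* Reserved call frame: every argument is a mov, and the frame is
   allocated and released once for the whole function. */
long size_with_reserved_frame(const struct call_site *sites, size_t n) {
    long total = 0;
    int any_args = 0;
    assert(sites != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += (long)sites[i].nargs * MOV_BYTES;
        if (sites[i].nargs > 0)
            any_args = 1;
    }
    return any_args ? total + 2 * ADJ_BYTES : total;
}

/* No reserved frame: eligible sites push and pay one add after the call,
   the others keep their movs and pay a sub and an add around the call. */
long size_without_reserved_frame(const struct call_site *sites, size_t n) {
    long total = 0;
    assert(sites != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (sites[i].nargs == 0)
            continue;
        if (sites[i].can_push)
            total += (long)sites[i].nargs * PUSH_BYTES + ADJ_BYTES;
        else
            total += (long)sites[i].nargs * MOV_BYTES + 2 * ADJ_BYTES;
    }
    return total;
}

/* Returns nonzero when giving up the reserved frame is the smaller option. */
int prefer_pushes(const struct call_site *sites, size_t n) {
    return size_without_reserved_frame(sites, n) <
           size_with_reserved_frame(sites, n);
}
```
</pre>
<p class="MsoPlainText">With this model, a function with a single push-eligible call site prefers pushes, while a function with ten two-argument call sites of which only one is eligible keeps the reserved frame.</p>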
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">2) Register allocation inefficiency:<br>
Ideally, pushes can be used to make direct memory-to-memory
movs, freeing up registers, and saving quite a lot of code.<o:p></o:p></p>
<p class="MsoPlainText">For example, for this (this is
obviously a constructed example, but code of this kind does
exist in the wild):<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">void foo(int a, int b, int c, int d,
int e, int f, int g, int h);<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">struct st { int arr[8]; };<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">void bar(struct st* p)<o:p></o:p></p>
<p class="MsoPlainText">{<o:p></o:p></p>
<p class="MsoPlainText">  foo(p->arr[0], p->arr[1],
p->arr[2], p->arr[3], p->arr[4], p->arr[5],
p->arr[6], p->arr[7]);<o:p></o:p></p>
<p class="MsoPlainText">}<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">We currently generate (with -m32 -O2)
this:<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText"> pushl %ebp<o:p></o:p></p>
<p class="MsoPlainText"> movl %esp, %ebp<o:p></o:p></p>
<p class="MsoPlainText"> pushl %ebx<o:p></o:p></p>
<p class="MsoPlainText"> pushl %edi<o:p></o:p></p>
<p class="MsoPlainText"> pushl %esi<o:p></o:p></p>
<p class="MsoPlainText"> subl $44, %esp<o:p></o:p></p>
<p class="MsoPlainText"> movl 8(%ebp), %eax<o:p></o:p></p>
<p class="MsoPlainText"> movl 28(%eax), %ecx<o:p></o:p></p>
<p class="MsoPlainText"> movl %ecx,
-20(%ebp) # 4-byte Spill<o:p></o:p></p>
<p class="MsoPlainText"> movl 24(%eax), %edx<o:p></o:p></p>
<p class="MsoPlainText"> movl 20(%eax), %esi<o:p></o:p></p>
<p class="MsoPlainText"> movl 16(%eax), %edi<o:p></o:p></p>
<p class="MsoPlainText"> movl 12(%eax), %ebx<o:p></o:p></p>
<p class="MsoPlainText"> movl 8(%eax), %ecx<o:p></o:p></p>
<p class="MsoPlainText"> movl %ecx,
-24(%ebp) # 4-byte Spill<o:p></o:p></p>
<p class="MsoPlainText"> movl (%eax), %ecx<o:p></o:p></p>
<p class="MsoPlainText"> movl %ecx,
-16(%ebp) # 4-byte Spill<o:p></o:p></p>
<p class="MsoPlainText"> movl 4(%eax), %eax<o:p></o:p></p>
<p class="MsoPlainText"> movl -20(%ebp),
%ecx # 4-byte Reload<o:p></o:p></p>
<p class="MsoPlainText"> movl %ecx, 28(%esp)<o:p></o:p></p>
<p class="MsoPlainText"> movl %edx, 24(%esp)<o:p></o:p></p>
<p class="MsoPlainText"> movl %esi, 20(%esp)<o:p></o:p></p>
<p class="MsoPlainText"> movl %edi, 16(%esp)<o:p></o:p></p>
<p class="MsoPlainText"> movl %ebx, 12(%esp)<o:p></o:p></p>
<p class="MsoPlainText"> movl -24(%ebp),
%ecx # 4-byte Reload<o:p></o:p></p>
<p class="MsoPlainText"> movl %ecx, 8(%esp)<o:p></o:p></p>
<p class="MsoPlainText"> movl %eax, 4(%esp)<o:p></o:p></p>
<p class="MsoPlainText"> movl -16(%ebp),
%eax # 4-byte Reload<o:p></o:p></p>
<p class="MsoPlainText"> movl %eax, (%esp)<o:p></o:p></p>
<p class="MsoPlainText"> calll _foo<o:p></o:p></p>
<p class="MsoPlainText"> addl $44, %esp<o:p></o:p></p>
<p class="MsoPlainText"> popl %esi<o:p></o:p></p>
<p class="MsoPlainText"> popl %edi<o:p></o:p></p>
<p class="MsoPlainText"> popl %ebx<o:p></o:p></p>
<p class="MsoPlainText"> popl %ebp<o:p></o:p></p>
<p class="MsoPlainText"> retl<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">Which is fairly horrible. <o:p></o:p></p>
<p class="MsoPlainText">Some parameters get mov-ed up to four
times - a mov from the struct into a register, a register
spill, a reload, and finally a mov onto the stack.<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">What we’d like to generate is
something like:<o:p></o:p></p>
<p class="MsoPlainText"> pushl %ebp<o:p></o:p></p>
<p class="MsoPlainText"> movl %esp, %ebp<o:p></o:p></p>
<p class="MsoPlainText"> movl 8(%ebp), %eax<o:p></o:p></p>
<p class="MsoPlainText"> pushl 28(%eax)<o:p></o:p></p>
<p class="MsoPlainText"> pushl 24(%eax)<o:p></o:p></p>
<p class="MsoPlainText"> pushl 20(%eax)<o:p></o:p></p>
<p class="MsoPlainText"> pushl 16(%eax)<o:p></o:p></p>
<p class="MsoPlainText"> pushl 12(%eax)<o:p></o:p></p>
<p class="MsoPlainText"> pushl 8(%eax)<o:p></o:p></p>
<p class="MsoPlainText"> pushl 4(%eax)<o:p></o:p></p>
<p class="MsoPlainText"> pushl (%eax)<o:p></o:p></p>
<p class="MsoPlainText"> calll _foo<o:p></o:p></p>
<p class="MsoPlainText"> addl $32, %esp<o:p></o:p></p>
<p class="MsoPlainText"> popl %ebp<o:p></o:p></p>
<p class="MsoPlainText"> retl<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">To produce the code above, the
transformation has to run pre-reg-alloc. Otherwise, even if
we fold loads into the push, it's too late to recover from
spills.
<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">The direction I'd like to take with
this is:<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">1) Add an X86-specific
MachineFunctionPass that does the mov -> push
transformation and runs pre-reg-alloc.
<o:p></o:p></p>
<p class="MsoPlainText">It will:<o:p></o:p></p>
<p class="MsoPlainText">* Make a decision on whether promoting
some (or all) of the call sites to use pushes is worth
giving up on the reserved call frame.<o:p></o:p></p>
<p class="MsoPlainText">* If it is, perform the mov -> push
transformation for the selected call sites.<o:p></o:p></p>
<p class="MsoPlainText">* Fold loads into the pushes while
doing the transformation.
<o:p></o:p></p>
<p class="MsoPlainText">As an alternative, I could try to
teach the peephole optimizer to do it (right now it won't
even try to do this folding because PUSHes store to memory),
but getting it right in the general case seems complex.
<o:p></o:p></p>
<p class="MsoPlainText">I think I'd rather do folding of the
simple (but common) cases on the fly.<o:p></o:p></p>
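<p class="MsoPlainText">The "simple (but common)" folding case can be sketched on a toy instruction list. This is not LLVM code: the instruction kinds, the struct and the function names are hypothetical. The sketch assumes each virtual register has a single definition and that nothing between the load and the push (including the earlier pushes) writes to the loaded address.</p>
<pre>
```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction kinds; K_NOP marks a load that has been folded away. */
enum kind { K_LOAD, K_PUSH_REG, K_PUSH_MEM, K_NOP };

struct minst {
    enum kind kind;
    int vreg;   /* K_LOAD: register defined; K_PUSH_REG: register used */
    int base;   /* K_LOAD / K_PUSH_MEM: base register of the address   */
    int offset; /* K_LOAD / K_PUSH_MEM: displacement from the base     */
};

/* Count how many pushes read the given virtual register. */
size_t count_uses(const struct minst *code, size_t n, int vreg) {
    size_t uses = 0;
    for (size_t i = 0; i < n; i++)
        if (code[i].kind == K_PUSH_REG && code[i].vreg == vreg)
            uses++;
    return uses;
}

/* Rewrite "vreg = load [base+off]; push vreg" into "push [base+off]"
   when the push is the only use of vreg. Returns the number of folds. */
size_t fold_loads_into_pushes(struct minst *code, size_t n) {
    size_t folded = 0;
    assert(code != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (code[i].kind != K_LOAD)
            continue;
        if (count_uses(code, n, code[i].vreg) != 1)
            continue; /* the value is needed elsewhere, keep the load */
        for (size_t j = i + 1; j < n; j++) {
            if (code[j].kind == K_PUSH_REG && code[j].vreg == code[i].vreg) {
                code[j].kind = K_PUSH_MEM;
                code[j].base = code[i].base;
                code[j].offset = code[i].offset;
                code[i].kind = K_NOP;
                folded++;
                break;
            }
        }
    }
    return folded;
}
```
</pre>
<p class="MsoPlainText">A load whose value is pushed twice is left alone, which is the kind of case a more general peephole would have to reason about.</p>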
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">2) Alter the semantics of
ADJCALLSTACKDOWN/ADJCALLSTACKUP slightly.
<o:p></o:p></p>
<p class="MsoPlainText">Doing the mov->push transformation
before PEI means I'd have to leave the ADJCALLSTACKDOWN/UP
pair unbalanced.<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">E.g. something like:<o:p></o:p></p>
<p class="MsoPlainText">ADJCALLSTACKDOWN32 0,
%ESP<imp-def>, %EFLAGS<imp-def,dead>,
%ESP<imp-use>
<o:p></o:p></p>
<p class="MsoPlainText">%vreg9<def,dead> = COPY %ESP;
GR32:%vreg9 <o:p></o:p></p>
<p class="MsoPlainText">PUSH32rmm %vreg0, 1, %noreg, 28,
%noreg, %ESP<imp-def>, %ESP<imp-use>;
GR32:%vreg0
<o:p></o:p></p>
<p class="MsoPlainText">PUSH32rmm %vreg0, 1, %noreg, 24,
%noreg, %ESP<imp-def>, %ESP<imp-use>;
GR32:%vreg0
<o:p></o:p></p>
<p class="MsoPlainText">PUSH32rmm %vreg0, 1, %noreg, 20,
%noreg, %ESP<imp-def>, %ESP<imp-use>;
GR32:%vreg0
<o:p></o:p></p>
<p class="MsoPlainText">PUSH32rmm %vreg0, 1, %noreg, 16,
%noreg, %ESP<imp-def>, %ESP<imp-use>;
GR32:%vreg0
<o:p></o:p></p>
<p class="MsoPlainText">PUSH32rmm %vreg0, 1, %noreg, 12,
%noreg, %ESP<imp-def>, %ESP<imp-use>;
GR32:%vreg0
<o:p></o:p></p>
<p class="MsoPlainText">PUSH32rmm %vreg0, 1, %noreg, 8,
%noreg, %ESP<imp-def>, %ESP<imp-use>;
GR32:%vreg0
<o:p></o:p></p>
<p class="MsoPlainText">PUSH32rmm %vreg0, 1, %noreg, 4,
%noreg, %ESP<imp-def>, %ESP<imp-use>;
GR32:%vreg0
<o:p></o:p></p>
<p class="MsoPlainText">PUSH32rmm %vreg0<kill>, 1,
%noreg, 0, %noreg, %ESP<imp-def>, %ESP<imp-use>;
GR32:%vreg0<o:p></o:p></p>
<p class="MsoPlainText">CALLpcrel32 <ga:@foo>,
<regmask>, %ESP<imp-use>, %ESP<imp-def><o:p></o:p></p>
<p class="MsoPlainText">ADJCALLSTACKUP32 32, 0,
%ESP<imp-def>, %EFLAGS<imp-def,dead>,
%ESP<imp-use><o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">This, rightly, gets flagged by the
verifier.<o:p></o:p></p>
<p class="MsoPlainText">My proposal is to add an additional
parameter to ADJCALLSTACKDOWN to express the amount of
adjustment the call sequence itself does. This is somewhat
similar to the second parameter of ADJCALLSTACKUP, which
allows adjustment for callee stack clean-up. <o:p></o:p></p>
<p class="MsoPlainText">So, in this case, we will get an
“ADJCALLSTACKDOWN32 32, 32” instead of the
“ADJCALLSTACKDOWN32 0”. The verifier will be happy, and PEI
will know it doesn't need to do any stack pointer
adjustment.<o:p></o:p></p>
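<p class="MsoPlainText">To make the proposed operand concrete, here is a toy model of the balance check in C. It is only a sketch of the idea, not the actual machine verifier or PEI logic: the opcodes, struct fields and function names are hypothetical, and every push is assumed to move the stack pointer by 4 bytes.</p>
<pre>
```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes standing in for the machine instructions in the example. */
enum opcode { ADJ_DOWN, PUSH32, CALL, ADJ_UP };

struct inst {
    enum opcode op;
    int amount;     /* ADJ_DOWN / ADJ_UP: call frame size in bytes        */
    int seq_adjust; /* ADJ_DOWN only: bytes pushed by the sequence itself */
};

/* Fill out[] with DOWN, npushes pushes, CALL, UP and return the length.
   out must have room for npushes + 3 entries. */
size_t build_call_sequence(struct inst *out, int down_amount,
                           int down_seq_adjust, int npushes, int up_amount) {
    size_t n = 0;
    assert(out != NULL && npushes >= 0);
    out[n++] = (struct inst){ADJ_DOWN, down_amount, down_seq_adjust};
    for (int i = 0; i < npushes; i++)
        out[n++] = (struct inst){PUSH32, 0, 0};
    out[n++] = (struct inst){CALL, 0, 0};
    out[n++] = (struct inst){ADJ_UP, up_amount, 0};
    return n;
}

/* A sequence is balanced when DOWN and UP agree on the frame size and the
   pushes inside it account for exactly the declared in-sequence adjustment. */
int call_sequence_balanced(const struct inst *code, size_t n) {
    int open = 0, frame = 0, declared = 0, pushed = 0;
    for (size_t i = 0; i < n; i++) {
        switch (code[i].op) {
        case ADJ_DOWN:
            if (open)
                return 0; /* nested sequences are not modelled */
            open = 1;
            frame = code[i].amount;
            declared = code[i].seq_adjust;
            pushed = 0;
            break;
        case PUSH32:
            if (open)
                pushed += 4;
            break;
        case CALL:
            break;
        case ADJ_UP:
            if (!open || code[i].amount != frame || pushed != declared)
                return 0;
            open = 0;
            break;
        }
    }
    return !open;
}

/* Bytes the frame lowering would still have to subtract from the stack pointer. */
int pei_adjustment(const struct inst *down) {
    return down->amount - down->seq_adjust;
}
```
</pre>
<p class="MsoPlainText">In this model the sequence with “ADJCALLSTACKDOWN32 0” and eight pushes is rejected, while “ADJCALLSTACKDOWN32 32, 32” is accepted and leaves a remaining adjustment of 0.</p>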
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">Does this sound like the right
approach?<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">Any suggestions, as well as warnings
of potential pitfalls, are welcome. :-)<o:p></o:p></p>
<p class="MsoPlainText"> <o:p></o:p></p>
<p class="MsoPlainText">Thanks,<o:p></o:p></p>
<p class="MsoPlainText"> Michael<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p>---------------------------------------------------------------------<br>
Intel Israel (74) Limited<o:p></o:p></p>
<p>This e-mail and any attachments may contain confidential
material for<br>
the sole use of the intended recipient(s). Any review or
distribution<br>
by others is strictly prohibited. If you are not the
intended<br>
recipient, please contact the sender and delete all copies.<o:p></o:p></p>
<p class="MsoNormal" style="line-height:normal"><span
style="font-size:12.0pt;font-family:"Times New
Roman","serif""><br>
<br>
<br>
<o:p></o:p></span></p>
<pre>_______________________________________________<o:p></o:p></pre>
<pre>LLVM Developers mailing list<o:p></o:p></pre>
<pre><a moz-do-not-send="true" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a moz-do-not-send="true" href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a><o:p></o:p></pre>
<pre><a moz-do-not-send="true" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><o:p></o:p></pre>
</blockquote>
<p class="MsoNormal" style="line-height:normal"><span
style="font-size:12.0pt;font-family:"Times New
Roman","serif""><o:p> </o:p></span></p>
</div>
</blockquote>
<br>
</body>
</html>