<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Hi James,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I don't think it's going to be possible to avoid all false positives, while<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">still detecting a reasonable amount of invalid code. I've tried to pick the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">rules to balance the two, rather than sticking to one extreme, and that's the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">reason I've made these warnings rather than errors. That said, this is a<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">prototype, and there's obviously room for improvement.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I think it would be useful to allow the user to suppress these warnings, both<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">on a per-rule and per-statement basis. That's a bit hard to do with the checks<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">happening in the backend, but maybe if Reid's suggestion of moving this into<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">clang works out, that would make this easier.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> This will warn if you write to a hardcoded register without having marked it<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> clobbered. But that's perfectly valid if you've first copied the old value<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> somewhere (which, I'll note, also requires reading the register which is not<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> declared as an input!), and then restore it before leaving the inline-asm.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> You might be able to salvage this warning by only warning on a write to an<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> arbitrary register if there hasn't previously been a read of that register<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> (on the assumption that the read might have saved it).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Do you know of any cases where this is a useful pattern in real code? I'd<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">expect that it would be better to simply mark any used registers as clobbered,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">and let the compiler save them if they happen to be live at that point.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> This triggers on store instructions without having either declared an output<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> memory operand or marked memory as clobbered. But it's valid to store to<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> memory, so long as you aren't writing to memory which is accessible to the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> compiler. A special case of that is writing to the stack (as you've noted in<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> your test-case), but it applies just as well to other inaccessible memory as<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> well.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Maybe we could add a heuristic to allow memory accesses if the stack pointer is<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">written to, but I don't think there's any way to avoid the warning in every<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">valid case without completely disabling it, thus failing to diagnose the common<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">mistakes.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> Some instructions may not have their effects fully described, when they're<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> not generated by the compiler. In fact, some may not be _able_ to be<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> described (e.g. the syscall/trap/software-interrupt instruction might read or<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> write any registers). You'll probably need to just give up any<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> dataflow-sensitive analysis if you see one of these.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Good point, we should probably assume that any register listed as clobbered or<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">a physical register operand is set by these instructions, so we don't emit the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">"reading an undefined register" warnings after them. I think we can also add<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">call instructions to that list, as a regular function call could also write<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">registers, and might not even be using a standard calling convention.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> As you noted, as soon as there's any control flow, you're giving incorrect<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> warnings. Again just giving up on any dataflow-sensitive warnings might make<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> sense here.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Yes, I think we need to suppress some of the warnings if there is control flow<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">inside the assembly. I think the way to do this would be to buffer these<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">warnings until the end of the block, and only emit them if<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">MCInstrDesc::mayAffectControlFlow was false for every instruction.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> Also, what happens to the substitution-tracking and analysis when there's a<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> macro call (or even definition)? Is this getting data pre- or post-<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> assembly-macro expansion?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">This happens after assembly macro expansion, so it sees every instruction in<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">the order in which it gets sent to the MCStreamer.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Oliver<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> James Y Knight [mailto:jyknight@google.com]
<br>
<b>Sent:</b> 26 November 2018 21:42<br>
<b>To:</b> Oliver Stannard<br>
<b>Cc:</b> llvm-dev; nd<br>
<b>Subject:</b> Re: [llvm-dev] [RFC] Checking inline assembly for validity<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">This sort of verification definitely seems interesting. However, many of the checks you've added look like they'll have false positives in perfectly valid inline-asm. IMO, a warning for incorrect inline-asm is only going to be useful if
can avoid triggering on correct code -- it should trigger only when it's can be _certain_ that there's an error.<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Some examples of where this can go wrong in your current patch:<o:p></o:p></p>
</div>
<div>
<div>
<p class="MsoNormal">- This will warn if you write to a hardcoded register without having marked it clobbered. But that's perfectly valid if you've first copied the old value somewhere (which, I'll note, also requires reading the register which is not declared
as an input!), and then restore it before leaving the inline-asm. You might be able to salvage this warning by only warning on a write to an arbitrary register if there hasn't previously been a read of that register (on the assumption that the read might have
saved it).<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">- This triggers on store instructions without having either declared an output memory operand or marked memory as clobbered. But it's valid to store to memory, so long as you aren't writing to memory which is accessible to the compiler.
A special case of that is writing to the stack (as you've noted in your test-case), but it applies just as well to other inaccessible memory as well.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">- Some instructions may not have their effects fully described, when they're not generated by the compiler. In fact, some may not be _able_ to be described (e.g. the syscall/trap/software-interrupt instruction might read or write any registers).
You'll probably need to just give up any dataflow-sensitive analysis if you see one of these.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">- As you noted, as soon as there's any control flow, you're giving incorrect warnings. Again just giving up on any dataflow-sensitive warnings might make sense here.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Also, what happens to the substitution-tracking and analysis when there's a macro call (or even definition)? Is this getting data pre- or post- assembly-macro expansion?<o:p></o:p></p>
</div>
</div>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Mon, Nov 26, 2018 at 5:55 AM Oliver Stannard via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm">
<p class="MsoNormal">GCC-style inline assembly is notoriously hard to write correctly, because it is<br>
the user's responsibility to tell the compiler about the requirements of the<br>
assembly (inputs, output, modified registers, memory access), and getting this<br>
wrong results in silently generating incorrect code. This is also dependent on<br>
register allocation and scheduling decisions made by the compiler, so an inline<br>
assembly statement may appear to work correctly, then silently break when<br>
another change to the code or compiler upgrade causes those decisions to<br>
change.<br>
<br>
I've posted a prototype patch at <a href="https://reviews.llvm.org/D54891" target="_blank">
https://reviews.llvm.org/D54891</a> which tries to<br>
improve this situation by emitting diagnostics when the instructions inside the<br>
inline assembly string do not match the operands to the inline assembly<br>
statement. We can do this because we parse the assembly in the same process as<br>
the compiler, and the MC layer has some knowledge of which registers are read<br>
and written by an assembly instruction.<br>
<br>
For example, this C code, which tries to add 3 integers together, looks OK at<br>
first glance:<br>
<br>
int add3(int a, int b, int c) {<br>
int x;<br>
asm volatile(<br>
"add %0, %1, %2; add %0, %0, %3"<br>
: "=r" (x)<br>
: "r" (a), "r" (b), "r" (c));<br>
return x;<br>
}<br>
<br>
However, the compiler is allowed to allocate x (the output operand) and c (an<br>
input operand) to the same register, as it assumes that c will be read before x<br>
is written. This code might happen to work correctly when first written, but a<br>
later change (either in the source code or compiler) could cause register<br>
allocation to change, and this will start silently generating the "wrong" code.<br>
<br>
With this patch, the compiler emits this warning for the above code:<br>
<br>
test.cpp:4:7: warning: read from an inline assembly operand after first<br>
output operand written, suggest adding early-clobber<br>
modifier to output operand [-Winline-asm]<br>
"add %0, %1, %2; add %0, %0, %3"<br>
^<br>
<inline asm>:1:30: note: instantiated into assembly here<br>
add r0, r0, r1; add r0, r0, r2<br>
^<br>
test.cpp:4:7: note: output operand written here<br>
"add %0, %1, %2; add %0, %0, %3"<br>
^<br>
<inline asm>:1:6: note: instantiated into assembly here<br>
add r0, r0, r1; add r0, r0, r2<br>
^<br>
<br>
The warning is a bit noisy because it prints both the original source code and<br>
final assembly code, and the carets for the original source just point to the<br>
start of the line, but that's a separate issue.<br>
<br>
I've designed this to be independent of register allocation decisions, so the<br>
above diagnostic is always reported, regardless of whether x and c were<br>
allocated in the same register or not. This allows it to catch code which<br>
currently works but might break in future.<br>
<br>
The way this works is:<br>
- When the AsmParser reaches an INLINEASM MachineInstr, it creates an<br>
InlineAsmDataflowChecker object, which tracks all of the information we need<br>
to know about one inline assembly statement.<br>
- The AsmPrinter examines the operands to the MachineInstruction, and calls<br>
functions on the tracking object to record information about each operand to<br>
the inline assembly block, as provided by the user and relied on by the<br>
compiler's optimisations and code-generation.<br>
- While the AsmParser is generating the final assembly string (which involves<br>
expanding operand templates like "$0" into physical register names), it<br>
records the offset from the start of the (output) string at which each<br>
operand expansion appeared.<br>
- The table-generated assembly matcher is modified to record the index of the<br>
MCParsedAsmOperand which resulted in in the creation of each MCOperand. An<br>
MCParsedAsmOperand can create multiple MCOperands (for example, a memory<br>
operand with base and offset), but not the other way round, so this<br>
information is stored in the MCOperand.<br>
- When the AsmParser is running and a tracking object is present (it is only<br>
present if we are parsing inline assembly), it records in each ParsedOperand<br>
the indexes of the inline assembly statement operands which overlap with it.<br>
This is done using the source location of the just-parsed MCParsedAsmOperand<br>
and the string offsets recored in the tracking object by the AsmPrinter.<br>
- Finally, after an instruction has been completely parsed, the AsmParser calls<br>
into the tracking object with the final MCInst and list of<br>
MCParsedAsmOperands. With all of this information, we can match up inline<br>
assembly operands to the MCOperands that were created for them, and check<br>
that they match.<br>
<br>
The reason that I'm posting this as an RFC is that it adds a lot of coupling<br>
between different parts of the backend. Do people think this is an acceptable<br>
cost for making a quite user-hostile feature a bit safer? If people agree that<br>
this is worthwhile, there are some things I still need to do before the patch<br>
is ready for a proper review:<br>
<br>
- Check the memory overhead (during regular compilation) of storing the parsed<br>
operand number in the MCOperand. If this is too high, it could be moved to a<br>
separate data structure only created when parsing inline assmebly.<br>
- There are three different concepts of "operand" here (inline assembly<br>
operand, MCParsedAsmOperand and MCOperand), tidy up the naming so that these<br>
are a bit clearer.<br>
- For checks which depend on the order of instructions (for example, checking<br>
that a clobbered register is only read from if it has previously been written<br>
to), these could give false positives if there is control-flow in the<br>
assembly. We could check if an MCInst affects control flow, buffer these<br>
diagnostics until the end of the assembly block, and only emit them if there<br>
are no control-flow instructions.<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><o:p></o:p></p>
</blockquote>
</div>
</div>
</div>
</body>
</html>