<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Courier;
panose-1:0 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">I work on a compiler that uses LLVM for its back end. I'm interested in setting just the low byte of a register, leaving the other bits alone, for some GC tag bit shenanigans, e.g.:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier">long replace_low_byte_with_37(long* a) {<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> return (*a & ~0xFFL) | 37;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier">}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">x86_64 has a movb instruction that does exactly this, but I can't get clang (or any other compiler) to use movb for this purpose, even at -Os.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Here is the -Os -march=sandybridge compiler output for gcc-10.2, icc-21.1.9, and clang-11.0.1 (all different!), as well as how a simple movb assembles:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier">0000000000000000 <gcc>:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 0: 48 8b 07 mov (%rdi),%rax<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 3: 30 c0 xor %al,%al<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 5: 48 83 c8 25 or $0x25,%rax<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 9: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier">000000000000000a <icc>:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> a: 48 8b 07 mov (%rdi),%rax<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> d: 48 25 00 ff ff ff and $0xffffffffffffff00,%rax<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 13: 48 83 c0 25 add $0x25,%rax<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 17: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier">0000000000000018 <clang>:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 18: 48 c7 c0 00 ff ff ff mov $0xffffffffffffff00,%rax<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 1f: 48 23 07 and (%rdi),%rax<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 22: 48 83 c8 25 or $0x25,%rax<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 26: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier">0000000000000027 <simple_movb_by_hand>:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 27: 48 8b 07 mov (%rdi),%rax<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 2a: b0 25 mov $0x25,%al<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:Courier"> 2c: c3 retq<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">As you can see, the movb version would be the smallest at 6 bytes, versus 10 for gcc, 14 for icc, and 15 for clang, so LLVM's output is the biggest. Size is important for my use case.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">So why don't these compilers generate movb? Perhaps the concern is partial register stalls and how %rax and %al interact with the register renamer. As I understand the
<a href="https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to">
background</a> from Peter Cordes referenced by <a href="https://bugs.llvm.org/show_bug.cgi?id=34707">
#34707</a>, the punchline is that since Sandy Bridge, and especially Skylake, the partial register stall is no big deal for an actual RMW operation like this.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">But even on CPUs where the stall costs more than the extra instructions needed to avoid movb, -Os should still prefer movb, since -Os asks for smaller code.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I'm not advocating this for %ah (etc.), since code using those registers can
<a href="http://gallium.inria.fr/blog/intel-skylake-bug/">famously misbehave</a> on some Skylake and Kaby Lake CPUs without a microcode patch.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Is there a way to get LLVM to generate movb to set just the low byte?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
</div>
</body>
</html>