<div dir="ltr">li a0,-1<div>srli a0,a0,0x20</div><div><br></div><div>... works for me. Both are 16-bit (compressed) instructions, and the same trick works for any other constant that is a run of 0s in the high bits followed by 1s in the low bits.</div><div><br></div><div>And indeed, yes, the divu() as a whole would be better as:</div><div><br></div><div>slli a0,a0,0x20</div><div>slli a1,a1,0x20</div><div>srli a0,a0,0x20</div><div>srli a1,a1,0x20</div><div>divu a0,a0,a1</div><div>sext.w a0,a0</div><div>ret</div><div><br></div><div>(Scheduled for a dual-issue machine; the order would be different for a machine with macro-op fusion.)</div><div><br></div><div>Really looking forward to 64-bit support upstream!</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Oct 4, 2018 at 12:21 AM, Alex Bradbury <span dir="ltr"><<a href="mailto:asb@lowrisc.org" target="_blank">asb@lowrisc.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Thu, 4 Oct 2018 at 08:03, Bruce Hoult via llvm-dev<br>

<<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br>

><br>

> Now rebased to ToT, as of now.<br>

><br>

> All that mess in divu is the same as is generated from:<br>

><br>

> long foo(){<br>

>     return 0x00000000ffffffffl;<br>

> }<br>

><br>

> 0000000000000000 <foo>:<br>

>    0: 00000537          lui a0,0x0<br>

>    4: 0005059b          sext.w a1,a0<br>

>    8: 1582                slli a1,a1,0x20<br>

>    a: 357d                addiw a0,a0,-1<br>

>    c: 1502                slli a0,a0,0x20<br>

>    e: 9101                srli a0,a0,0x20<br>

>   10: 8d4d                or a0,a0,a1<br>

>   12: 8082                ret<br>

><br>

> For sure that's not the best way to generate that constant!<br>

<br>

</span>Definitely not. That pattern was a placeholder just to produce<br>

something correct. The list of changes described in the RFC describes<br>

the work implemented to end up with mostly reasonable-looking codegen.<br>

I'm hoping to start posting these to phabricator later today.<br>

<br>

That constant takes 3 instructions with smarter 64-bit immediate<br>

materialisation. For zext i32 -> i64 you'd prefer to perform two<br>

shifts, unless you can CSE the mask.<br>

<br>

Best,<br>

<br>

Alex<br>

</blockquote></div><br></div>
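<div>For anyone who wants to sanity-check the sequences discussed above, here is a rough C model of what they compute on RV64. The function names (low_ones_mask, zext32_by_shifts, divu32_model) are made up for illustration and are not from the patch or from LLVM. The model assumes a nonzero divisor (the hardware divu returns all ones on division by zero, which C cannot express directly) and relies on the usual two's-complement narrowing that common compilers implement.</div>

```c
#include <assert.h>
#include <stdint.h>

/* Models `li a0,-1 ; srli a0,a0,shamt` on RV64: start from all ones and
   shift right logically, leaving (64 - shamt) one bits at the bottom. */
static uint64_t low_ones_mask(unsigned shamt) {
    assert(shamt < 64);            /* srli on RV64 takes a 6-bit shift amount */
    return UINT64_MAX >> shamt;
}

/* Models `slli a0,a0,32 ; srli a0,a0,32`: zero-extend the low 32 bits
   without needing a mask constant in a register. */
static uint64_t zext32_by_shifts(uint64_t x) {
    return (x << 32) >> 32;
}

/* Models the suggested divu body: zero-extend both operands, do a 64-bit
   unsigned divide, then sign-extend the low 32 bits of the quotient
   (the effect of sext.w). Assumption: the low 32 bits of b are nonzero. */
static int64_t divu32_model(uint64_t a, uint64_t b) {
    uint64_t q = zext32_by_shifts(a) / zext32_by_shifts(b);
    return (int64_t)(int32_t)(uint32_t)q;
}
```

<div>With a shift amount of 32 the first helper gives the 0x00000000ffffffff constant from the foo() example, and other shift amounts give the other "0s then 1s" masks.</div>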