<div dir="ltr">Well, yes. Thumb1 was not clear-cut, but with Thumb2 there are, I think, only two things that can make Thumb very slightly slower than ARM:<div><br></div><div>1) needing an extra IT instruction to predicate the instructions that follow it</div><div>2) on some microarchitectures there might be a penalty for branching to an address that isn't 4-byte aligned (probably not on recent ones)</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 15, 2018 at 4:39 AM, Tim Northover <span dir="ltr"><<a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Thu, 15 Nov 2018 at 12:25, Bruce Hoult <<a href="mailto:brucehoult@sifive.com">brucehoult@sifive.com</a>> wrote:<br>

> OK, I just checked, and -mcpu=cortex-{m3,m4,m7,a7,a9,<wbr>a15,a53} gives Thumb at -O1, -O1, -Os on the following gcc:<br>

<br>

</span>If anything I'd be inclined to just default to Thumb always. I haven't<br>

checked myself, but rumour has it the icache benefits make it faster<br>

than ARM code as well as smaller in most cases. My one worry there is<br>

with reset vectors, which I believe must be implemented in ARM in some<br>

cases; but since GCC itself appears to be inconsistent here, hopefully<br>

those people are already explicit about their needs.<br>

<br>

Cheers.<br>

<span class="HOEnZb"><font color="#888888"><br>

Tim.<br>

</font></span></blockquote></div><br></div>