Hi Steve,<br><br>It is no accident that bitmnp01 is strongly affected by this commit: it was the main target of the patch. If-conversion exposes many more opportunities for good codegen because the benchmark is essentially just moving and copying bit masks around. <br><br>The biggest uplift comes from identifying a bit reversal idiom, which I have patches for and will submit next week. The second biggest uplift comes from identifying bit trickery and emitting good codegen for it; the ARM backend uses the BFI instruction for this.<br><br>Your snippet appears to show a lot more spills and moves, but the code is no worse once those are excluded. In fact, a store, test, jump sequence has become an and + cmov. So it looks like the x86 backend is doing a poor job here. <br><br>James<br><div class="gmail_quote"><div dir="ltr">On Sat, 21 Nov 2015 at 01:23, Steve King <<a href="mailto:steve@metrokings.com">steve@metrokings.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">On Fri, Nov 20, 2015 at 5:15 PM, Steve King <<a href="mailto:steve@metrokings.com" target="_blank">steve@metrokings.com</a>> wrote:<br>><br>> On Fri, Nov 20, 2015 at 5:06 PM, James Molloy <<a href="mailto:james@jamesmolloy.co.uk" target="_blank">james@jamesmolloy.co.uk</a>> wrote:<br>> ><br>> > Hi,<br>> ><br>> > We'd need to look precisely at what's causing the code size bloat. The midend commit pointed out by Steve shouldn't cause bloat in and of itself - it should reduce code size. It removes a load of stores and branches.<br>> ><br>> > I know a backend change I made to ARM isn't behaving as well as it could, and I have patches to fix that. Speculatively reverting midend patches isn't the best way to approach this, in my opinion! :)<br>> ><br>><br>> For i586, the effect of r252152 seems to cause cmoves instead of branches.<br>> Code size increase is +35% for i586. 
<br>> Unfortunately the object files are wildly different in a way that does not seem to occur in other workloads. I tried to clip a concise before and after case.<br>><br>> Before:<br>> As a reference point, I found OR $0x408 and OR $0x810 in close proximity.<br>><br>><br>> 278: 81 ca 10 08 00 00 or $0x810,%edx<br>> 27e: 89 10 mov %edx,(%eax)<br>> 280: f6 c1 40 test $0x40,%cl<br>> 283: 74 08 je 28d <t_run_test+0x28d><br>> 285: 81 ca 08 04 00 00 or $0x408,%edx<br>> 28b: 89 10 mov %edx,(%eax)<br>> 28d: 84 c9 test %cl,%cl<br>> 28f: 0f 89 34 01 00 00 jns 3c9 <t_run_test+0x3c9><br>><br>><br>> After r252152:<br>><br>> Note that the OR $0x408 and OR $0x810 come now in reverse order.<br>><br>><br>> 35d: 81 c9 08 04 00 00 or $0x408,%ecx<br>> 363: 89 4c 24 28 mov %ecx,0x28(%esp)<br>> 367: 89 df mov %ebx,%edi<br>> 369: 83 e7 10 and $0x10,%edi<br>> 36c: 89 7c 24 20 mov %edi,0x20(%esp)<br>> 370: 0f 45 d1 cmovne %ecx,%edx<br>> 373: 89 d7 mov %edx,%edi<br>> 375: 81 cf 10 08 00 00 or $0x810,%edi<br>> 37b: 89 7c 24 14 mov %edi,0x14(%esp)<br>> 37f: 89 d9 mov %ebx,%ecx<br>> 381: 83 e1 20 and $0x20,%ecx<br>> 384: 89 4c 24 1c mov %ecx,0x1c(%esp)<br>> 388: 0f 45 d7 cmovne %edi,%edx<br>> 38b: 89 d7 mov %edx,%edi<br>><br>> HTH,<br>> -steve<br>><br><br></div><div dir="ltr">And the ll source for this snippet:<br><br><font face="monospace, monospace"> %or105 = or i32 %.or83.or94, 1032<br> %.or83.or94.or105 = select i1 %tobool98, i32 %.or83.or94, i32 %or105<br> %and108 = and i32 %12, 32<br> %tobool109 = icmp eq i32 %and108, 0<br> %or116 = or i32 %.or83.or94.or105, 2064<br> %.or83.or94.or105.or116 = select i1 %tobool109, i32 %.or83.or94.or105, i32 %or116<br> %and119 = and i32 %12, 64<br> %tobool120 = icmp eq i32 %and119, 0</font><br><br></div>
</blockquote></div>
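<div dir="ltr"><br>For readers following the thread, the store, test, jump to and + cmov change discussed above can be illustrated with a rough C sketch. This is not the benchmark source: the function names and the mask bits 0x10 and 0x20 are assumptions made for illustration, and only the constants 0x408 (1032) and 0x810 (2064) are taken from the quoted IR. The first function has the branchy shape with conditional stores; the second has the select-style shape that if-conversion would be expected to produce.<br><br></div>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; not taken from the benchmark or from LLVM. */

/* Branchy form: each test guards an OR followed by a store. */
static void set_masks_branchy(uint32_t flags, uint32_t *out) {
    uint32_t v = *out;
    if (flags & 0x10u) { v |= 0x408u; *out = v; }
    if (flags & 0x20u) { v |= 0x810u; *out = v; }
}

/* If-converted form: compute the OR unconditionally, then select.
 * Each ternary corresponds to one `select` in the IR, which a
 * backend may lower to and + cmov. */
static uint32_t set_masks_select(uint32_t flags, uint32_t v) {
    uint32_t or1 = v | 0x408u;
    v = (flags & 0x10u) ? or1 : v;
    uint32_t or2 = v | 0x810u;
    v = (flags & 0x20u) ? or2 : v;
    return v;
}

/* Checks that both forms agree for every combination of the low flag bits. */
static void self_check(void) {
    for (uint32_t flags = 0; flags < 0x80u; ++flags) {
        uint32_t a = 0x1u;
        set_masks_branchy(flags, &a);
        assert(a == set_masks_select(flags, 0x1u));
    }
}
```

<div dir="ltr">The select form has no control flow and no conditional stores, so it needs more values live at once; that is one plausible reason a register-starved target could show extra spills and moves around the cmovs.<br></div>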