[llvm-dev] Recent -Os code size regressions

Fri Nov 20 17:32:08 PST 2015

Hi Steve,

That bitmnp01 is affected strongly by this commit is no accident: it was
the main target of the patch. If-converting exposes many more opportunities
for good codegen, because the benchmark is essentially just moving and
copying bit masks around.
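
As a rough illustration of what if-conversion does to this kind of code,
here is a minimal C sketch. It is not taken from bitmnp01: the function
names and the mapping from flag bits to masks are hypothetical, and only the
constants 0x408 and 0x810 are borrowed from the snippet quoted below.

```c
#include <stdint.h>

/* Hypothetical example, not the benchmark source. */

/* Branchy form: each flag test guards its own conditional store. */
void set_masks_branchy(uint32_t flags, uint32_t *out)
{
    if (flags & 0x10)
        *out |= 0x408;
    if (flags & 0x20)
        *out |= 0x810;
}

/* If-converted form: the value stays in a local, each test feeds a
 * select (a conditional move on targets that have one), and there is
 * a single unconditional store at the end. */
void set_masks_select(uint32_t flags, uint32_t *out)
{
    uint32_t v = *out;
    v = (flags & 0x10) ? (v | 0x408) : v;
    v = (flags & 0x20) ? (v | 0x810) : v;
    *out = v;
}
```

Both functions compute the same result. The second form has no control flow,
which is what lets later passes see the whole chain of mask operations at
once.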

The biggest uplift comes from identifying a bit reversal idiom, for which I
have patches that I will submit next week. The second biggest uplift comes
from identifying bit trickery and emitting good codegen for it - the ARM
backend uses the BFI instruction for this.
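
To make the two idioms concrete, here is a hedged C sketch of the general
shape of each. These are textbook forms, written for illustration only; they
are not the benchmark source, and the exact patterns the patches match may
differ. The function names are made up.

```c
#include <stdint.h>

/* Naive bit reversal loop. On ARM a recognised bit reversal can be
 * lowered to a single RBIT instruction. */
uint32_t reverse_bits(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);   /* shift the lowest bit of x into r */
        x >>= 1;
    }
    return r;
}

/* Bitfield insert: copy the low `width` bits of src into dst starting
 * at bit `lsb`, leaving the other bits of dst alone. This is the
 * operation ARM's BFI performs.
 * Assumes 1 <= width, lsb < 32 and lsb + width <= 32. */
uint32_t bitfield_insert(uint32_t dst, uint32_t src,
                         unsigned lsb, unsigned width)
{
    uint32_t field = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t mask = field << lsb;
    return (dst & ~mask) | ((src << lsb) & mask);
}
```

For example, bitfield_insert(0, 0xAB, 4, 8) yields 0xAB0. Without pattern
matching, each of these expands to a sequence of shifts, ANDs and ORs.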

Your snippet appears to show a lot more spills and moves but not worse code
excluding those - in fact a store, test, jump sequence has become an and +
cmov. So it looks like the x86 backend is doing a poor job here.

James
On Sat, 21 Nov 2015 at 01:23, Steve King <steve at metrokings.com> wrote:

> On Fri, Nov 20, 2015 at 5:15 PM, Steve King <steve at metrokings.com> wrote:
> >
> > On Fri, Nov 20, 2015 at 5:06 PM, James Molloy <james at jamesmolloy.co.uk> wrote:
> > >
> > > Hi,
> > >
> > > We'd need to look precisely at what's causing the code size bloat. The
> > > midend commit pointed out by Steve shouldn't cause bloat in and of itself -
> > > it should reduce code size. It removes a load of stores and branches.
> > >
> > > I know a backend change I made to ARM isn't behaving as well as it
> > > could, and I have patches to fix that. Speculatively reverting midend
> > > patches isn't the best way to approach this, in my opinion! :)
> > >
> >
> > For i586, r252152 seems to produce cmovs instead of branches.
> > The code size increase is +35% for i586.
> > Unfortunately the object files are wildly different, in a way that does
> > not seem to occur in other workloads.  I tried to clip a concise before and
> > after case.
> >
> > Before:
> > As a reference point, I found OR $0x408 and OR $0x810 in close proximity.
> >
> >
> >  278: 81 ca 10 08 00 00     or     $0x810,%edx
> >  27e: 89 10                 mov    %edx,(%eax)
> >  280: f6 c1 40              test   $0x40,%cl
> >  283: 74 08                 je     28d <t_run_test+0x28d>
> >  285: 81 ca 08 04 00 00     or     $0x408,%edx
> >  28b: 89 10                 mov    %edx,(%eax)
> >  28d: 84 c9                 test   %cl,%cl
> >  28f: 0f 89 34 01 00 00     jns    3c9 <t_run_test+0x3c9>
> >
> >
> > After r252152:
> >
> > Note that the OR $0x408 and OR $0x810 now come in reverse order.
> >
> >
> > 35d: 81 c9 08 04 00 00     or     $0x408,%ecx
> > 363: 89 4c 24 28           mov    %ecx,0x28(%esp)
> > 367: 89 df                 mov    %ebx,%edi
> > 369: 83 e7 10              and    $0x10,%edi
> > 36c: 89 7c 24 20           mov    %edi,0x20(%esp)
> > 370: 0f 45 d1              cmovne %ecx,%edx
> > 373: 89 d7                 mov    %edx,%edi
> > 375: 81 cf 10 08 00 00     or     $0x810,%edi
> > 37b: 89 7c 24 14           mov    %edi,0x14(%esp)
> > 37f: 89 d9                 mov    %ebx,%ecx
> > 381: 83 e1 20              and    $0x20,%ecx
> > 384: 89 4c 24 1c           mov    %ecx,0x1c(%esp)
> > 388: 0f 45 d7              cmovne %edi,%edx
> > 38b: 89 d7                 mov    %edx,%edi
> >
> >
> > HTH,
> > -steve
> >
>
> And the ll source for this snippet:
>
>   %or105 = or i32 %.or83.or94, 1032
>   %.or83.or94.or105 = select i1 %tobool98, i32 %.or83.or94, i32 %or105
>   %and108 = and i32 %12, 32
>   %tobool109 = icmp eq i32 %and108, 0
>   %or116 = or i32 %.or83.or94.or105, 2064
>   %.or83.or94.or105.or116 = select i1 %tobool109, i32 %.or83.or94.or105, i32 %or116
>   %and119 = and i32 %12, 64
>   %tobool120 = icmp eq i32 %and119, 0
>
>