[PATCH] D25344: Add a fast path to alignTo.

Thu Oct 6 15:47:05 PDT 2016

Is alignTo actually showing up in profiles of real world code?

On Thu, Oct 6, 2016 at 3:34 PM, Rafael Espíndola via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> It is marginally faster on an "Intel(R) Xeon(R) CPU E5-2697 v2"
>
> firefox
>   master 6.521730037
>   patch  6.53065974 1.00136922304x slower
> chromium
>   master 4.381491021
>   patch  4.372600839 1.00203315654x faster
> chromium fast
>   master 1.847313003
>   patch  1.840066086 1.0039384004x faster
> the gold plugin
>   master 0.326036955
>   patch  0.323885574 1.00664241069x faster
> clang
>   master 0.550480887
>   patch  0.547021193 1.00632460688x faster
> llvm-as
>   master 0.03225211
>   patch  0.031885871 1.01148593369x faster
> the gold plugin fsds
>   master 0.355666359
>   patch  0.353588254 1.00587718901x faster
> clang fsds
>   master 0.633038735
>   patch  0.629498967 1.0056231514x faster
> llvm-as fsds
>   master 0.030099552
>   patch  0.0297617 1.0113519053x faster
> scylla
>   master 2.908191778
>   patch  2.900006405 1.00282253618x faster
>
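The "x faster" / "x slower" column above appears to be the larger of the two times divided by the smaller. A minimal sketch of that arithmetic, assuming the figures are wall-clock seconds; Comparison and compareTimes are hypothetical names used only for illustration:

```cpp
// Result of comparing one benchmark's two timings.
struct Comparison {
  double Ratio;     // larger time divided by smaller time, so always >= 1.0
  bool PatchFaster; // true when the patched build took less time
};

// Assumption: both arguments are positive times in the same unit.
inline Comparison compareTimes(double MasterSeconds, double PatchSeconds) {
  bool PatchFaster = PatchSeconds < MasterSeconds;
  double Ratio = PatchFaster ? MasterSeconds / PatchSeconds
                             : PatchSeconds / MasterSeconds;
  return {Ratio, PatchFaster};
}
```

For the firefox row, compareTimes(6.521730037, 6.53065974) gives a ratio of about 1.00137 with PatchFaster false, matching the table. Every row stays within roughly 1.2% of master.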
> Cheers,
> Rafael
>
>
> On 6 October 2016 at 17:58, Rafael Espíndola <rafael.espindola at gmail.com>
> wrote:
> > The attached patch passes all tests. I will benchmark to see if it
> > makes any difference.
> >
> > I also noticed a missing optimization. It would be nice if we could
> > keep a single function but have the optimizer take care of it, so I
> > tried
> >
> > uint64_t foo(uint64_t Value, uint64_t Align) {
> >   return alignToNonP2(Value, 1 << Align);
> > }
> >
> > but it still produces
> >
> > define i64 @_Z3foomm(i64 %Value, i64 %Align) local_unnamed_addr #0 {
> > entry:
> >   %sh_prom = trunc i64 %Align to i32
> >   %shl = shl i32 1, %sh_prom
> >   %conv = sext i32 %shl to i64
> >   %add.i = add i64 %Value, -1
> >   %sub.i = add i64 %add.i, %conv
> >   %div.i = urem i64 %sub.i, %conv
> >   %add2.i = sub i64 %sub.i, %div.i
> >   ret i64 %add2.i
> > }
> >
> > Changing 1 to 1ULL does cause us to optimize it
> >
> > define i64 @_Z3foomm(i64 %Value, i64 %Align) local_unnamed_addr #0 {
> > entry:
> >   %shl = shl i64 1, %Align
> >   %add.i = add i64 %Value, -1
> >   %sub.i = add i64 %add.i, %shl
> >   %.not = sub i64 0, %shl
> >   %add2.i = and i64 %sub.i, %.not
> >   ret i64 %add2.i
> > }
> >
> >
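The two IR dumps above differ in one step: the first rounds with urem, the second with an and-mask. A minimal C++ sketch of both forms follows. The names roundUpGeneric, roundUpMask, and widenedIntShift31 are hypothetical and not taken from the patch. One plausible reason the `1 << Align` version is not simplified is that the 32-bit result is sign-extended, so a shift amount of 31 widens to 0xFFFFFFFF80000000, which is not a power of two. That explanation is an assumption, not a confirmed diagnosis.

```cpp
#include <cassert>
#include <cstdint>

// General form: round Value up to a multiple of Align, for any nonzero Align.
// Mirrors the add / urem / sub sequence in the first IR dump.
inline uint64_t roundUpGeneric(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "Align must be nonzero");
  uint64_t Sum = Value + Align - 1;
  return Sum - Sum % Align;
}

// Mask form: only correct when Align is a power of two.
// ~(Align - 1) equals 0 - Align in unsigned arithmetic, which is the
// "sub i64 0, %shl" seen in the second IR dump.
inline uint64_t roundUpMask(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Value + Align - 1) & ~(Align - 1);
}

// What a 32-bit value with only the sign bit set (the bit pattern of
// "shl i32 1, 31") becomes after sign extension to 64 bits.
inline uint64_t widenedIntShift31() {
  return static_cast<uint64_t>(static_cast<int64_t>(INT32_MIN));
}
```

For power-of-two alignments the two forms agree, e.g. both map (5, 8) to 8, while only the generic form handles a case like (7, 3), which rounds to 9.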
> > Cheers,
> > Rafael
> >
> > On 6 October 2016 at 17:00, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> >> On 6 October 2016 at 16:39, Davide Italiano <dccitaliano at gmail.com> wrote:
> >>> On Thu, Oct 6, 2016 at 1:37 PM, Rui Ueyama <ruiu at google.com> wrote:
> >>>> Or to make alignTo accept only powers of two and fix code that passes
> >>>> non-powers-of-two.
> >>>>
> >>>
> >>> Do you know how many of these cases are in LLVM and if they are
> >>> legitimate?
> >>
> >> Interesting idea. I added an assert and I am running the tests.
> >>
> >> Cheers,
> >> Rafael
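The idea of restricting alignTo to powers of two, with an assert to find offending callers, could look roughly like the sketch below. This is an illustration under stated assumptions, not the code under review: isPowerOfTwo and alignToStrict are hypothetical names, and the real assert added for the experiment is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when exactly one bit of V is set.
constexpr bool isPowerOfTwo(uint64_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// Hypothetical strict variant: callers must pass a power-of-two alignment.
// In a build with assertions enabled, any other alignment aborts, which is
// one way to discover callers that rely on the general behaviour.
inline uint64_t alignToStrict(uint64_t Value, uint64_t Align) {
  assert(isPowerOfTwo(Align) && "alignToStrict requires a power of two");
  return (Value + Align - 1) & ~(Align - 1);
}
```

With this shape, alignToStrict(13, 16) returns 16, while a call such as alignToStrict(13, 12) would trip the assert in a debug build instead of silently returning a wrong value.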