[PATCH] D25344: Add a fast path to alignTo.

Thu Oct 6 15:34:48 PDT 2016

It is marginally faster on an "Intel(R) Xeon(R) CPU E5-2697 v2" in every benchmark except firefox:

firefox
  master 6.521730037
  patch  6.53065974 1.00136922304x slower
chromium
  master 4.381491021
  patch  4.372600839 1.00203315654x faster
chromium fast
  master 1.847313003
  patch  1.840066086 1.0039384004x faster
the gold plugin
  master 0.326036955
  patch  0.323885574 1.00664241069x faster
clang
  master 0.550480887
  patch  0.547021193 1.00632460688x faster
llvm-as
  master 0.03225211
  patch  0.031885871 1.01148593369x faster
the gold plugin fsds
  master 0.355666359
  patch  0.353588254 1.00587718901x faster
clang fsds
  master 0.633038735
  patch  0.629498967 1.0056231514x faster
llvm-as fsds
  master 0.030099552
  patch  0.0297617 1.0113519053x faster
scylla
  master 2.908191778
  patch  2.900006405 1.00282253618x faster
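
For readers without the attachment, here is a minimal sketch of what a
power-of-two fast path for alignTo could look like. It is not the attached
patch: the names isPowerOfTwo, alignToSlow and alignToSketch are made up for
illustration, and the real change may be structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if V is a nonzero power of two.
constexpr bool isPowerOfTwo(uint64_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// General path: correct for any nonzero alignment, but needs a division.
inline uint64_t alignToSlow(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be nonzero");
  return (Value + Align - 1) / Align * Align;
}

// Sketch of the fast path: when Align is a power of two, rounding up
// reduces to an add and a mask, with no division.
inline uint64_t alignToSketch(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be nonzero");
  if (isPowerOfTwo(Align))
    return (Value + Align - 1) & ~(Align - 1);
  return alignToSlow(Value, Align);
}
```

Both paths return the same result for power-of-two alignments, so a branch
like this only changes how the value is computed, which fits the small
differences in the timings above.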

Cheers,
Rafael


On 6 October 2016 at 17:58, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> The attached patch passes all tests. I will benchmark to see if it
> makes any difference.
>
> I also noticed a missing optimization. It would be nice if we could
> keep a single function but have the optimizer take care of it, so I
> tried
>
> uint64_t foo(uint64_t Value, uint64_t Align) {
>   return alignToNonP2(Value, 1 << Align);
> }
>
> but it still produces
>
> define i64 @_Z3foomm(i64 %Value, i64 %Align) local_unnamed_addr #0 {
> entry:
>   %sh_prom = trunc i64 %Align to i32
>   %shl = shl i32 1, %sh_prom
>   %conv = sext i32 %shl to i64
>   %add.i = add i64 %Value, -1
>   %sub.i = add i64 %add.i, %conv
>   %div.i = urem i64 %sub.i, %conv
>   %add2.i = sub i64 %sub.i, %div.i
>   ret i64 %add2.i
> }
>
> Changing 1 to 1ULL does cause us to optimize it
>
> define i64 @_Z3foomm(i64 %Value, i64 %Align) local_unnamed_addr #0 {
> entry:
>   %shl = shl i64 1, %Align
>   %add.i = add i64 %Value, -1
>   %sub.i = add i64 %add.i, %shl
>   %.not = sub i64 0, %shl
>   %add2.i = and i64 %sub.i, %.not
>   ret i64 %add2.i
> }
>
>
> Cheers,
> Rafael
>
> On 6 October 2016 at 17:00, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> On 6 October 2016 at 16:39, Davide Italiano <dccitaliano at gmail.com> wrote:
>>> On Thu, Oct 6, 2016 at 1:37 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>> Or to make alignTo accept only powers of two and fix code that passes
>>>> non-powers-of-two.
>>>>
>>>
>>> Do you know how many of these cases are in LLVM and whether they are legitimate?
>>
>> Interesting idea. I added an assert and I am running the tests.
>>
>> Cheers,
>> Rafael
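
On the 1 versus 1ULL difference in the quoted IR: a likely explanation (my
reading, not something stated in the thread) is that `1 << Align` is
evaluated as a 32-bit int and then sign-extended to 64 bits, as the
`sext i32 %shl to i64` line shows. If the 32-bit shift sets the sign bit,
the widened value is no longer a power of two, so the optimizer cannot
replace the urem with a mask. The sketch below checks that bit pattern
directly; isPow2, widenIntShift and shift64 are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if V is a nonzero power of two.
constexpr bool isPow2(uint64_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// Mirrors `sext i32 %shl to i64`: the 32-bit shift result is
// sign-extended before being used as a 64-bit alignment.
inline uint64_t widenIntShift(int32_t ShiftResult) {
  return static_cast<uint64_t>(static_cast<int64_t>(ShiftResult));
}

// Mirrors `shl i64 1, %Align`: the shift is done in 64 bits, so every
// in-range shift amount yields a power of two.
inline uint64_t shift64(unsigned Amount) {
  assert(Amount < 64 && "shift amount out of range");
  return 1ULL << Amount;
}
```

Passing INT32_MIN (the bit pattern 0x80000000, which is what a 32-bit shift
of 1 by 31 yields in the IR) to widenIntShift gives 0xFFFFFFFF80000000,
which is not a power of two, whereas shift64(31) is.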