[PATCH] [x86] @llvm.ctpop.v8i32 custom lowering
Bruno Cardoso Lopes
bruno.cardoso at gmail.com
Wed May 20 13:47:28 PDT 2015
Hi Chandler,
I started investigating the http://wm.ite.pl/articles/sse-popcount.html
approach (called sselookup in the numbers below). As expected, it
performs better in most cases.
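
For readers who have not seen the linked article, the idea behind sselookup can be sketched in portable C. This is only a scalar model, not the code from the patch: the SSE version keeps the 16-entry table in a register and uses PSHUFB to look up all 16 bytes at once, while the loop below does one byte per iteration. The names nibble_popcnt, popcount_v16i8_lookup and popcount_v4i32_lookup are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of set bits for every 4-bit value 0..15. */
static const uint8_t nibble_popcnt[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Per-byte popcount of a 16-byte vector (the v16i8 case):
   split each byte into two nibbles, look both up, add the results. */
static void popcount_v16i8_lookup(const uint8_t in[16], uint8_t out[16])
{
    for (size_t i = 0; i < 16; ++i) {
        uint8_t lo = in[i] & 0x0F;   /* low nibble  */
        uint8_t hi = in[i] >> 4;     /* high nibble */
        out[i] = (uint8_t)(nibble_popcnt[lo] + nibble_popcnt[hi]);
    }
}

/* Wider elements (here v4i32) reuse the byte counts and sum them
   horizontally within each element. */
static void popcount_v4i32_lookup(const uint32_t in[4], uint32_t out[4])
{
    for (size_t i = 0; i < 4; ++i) {
        uint32_t sum = 0;
        for (int b = 0; b < 4; ++b) {
            uint8_t byte = (uint8_t)(in[i] >> (8 * b));
            sum += nibble_popcnt[byte & 0x0F] + nibble_popcnt[byte >> 4];
        }
        out[i] = sum;
    }
}
```

The horizontal sum in the second function is the extra work that wider element types pay on top of the byte lookup, which is one plausible reason the i8 and i16 cases benefit the most in the numbers below.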
One unexpected case is v8i32-avx2. Which of sselookup and
parallelbitmath runs faster varies from run to run, but I've seen the
latter yield slightly better results across multiple runs. I would
expect sselookup to always be faster because it has fewer instructions,
but it looks like there's some latency/resource conflict issue going on.
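
For comparison, here is a sketch of what parallelbitmath is assumed to mean here: the classic bit-parallel (SWAR) population count, applied to every 32-bit lane. This is a scalar model under that assumption, not the sequence the backend emits; the vector lowering would use vector shifts, ANDs and adds on all lanes at once. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Classic bit-parallel popcount of one 32-bit value. */
static uint32_t popcount32_bitmath(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4-bit partial sums */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit partial sums */
    return (v * 0x01010101u) >> 24;                   /* add the four bytes */
}

/* Apply it lane by lane to model the v8i32 case. */
static void popcount_v8i32_bitmath(const uint32_t in[8], uint32_t out[8])
{
    for (size_t i = 0; i < 8; ++i)
        out[i] = popcount32_bitmath(in[i]);
}
```

Every step here is a plain shift, mask or add with no table lookup or shuffle, so although the instruction count is higher than for sselookup, the operations may schedule differently on the execution ports. That is only a guess at the kind of effect mentioned above.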
I still need to clean up the patch and add testcases, but if you're
curious in the meantime about the implementation or have any
suggestions, I've made the assembly files available at
https://github.com/bcardosolopes/llvm-vpopcount
== v4i32-avx:
sselookup (v4i32): 1.10211
scalar + ctpop (v4i32): 0.907016 <-- best == ToT
parallelbitmath (v4i32): 1.14124
== v8i32-avx:
sselookup (v8i32): 1.97514 <-- best == patch
scalar + ctpop (v8i32): 2.37118
== v8i32-avx2:
sselookup (v8i32): 1.17823
parallelbitmath (v8i32): 1.15288 <-- best == ToT
== v2i64-avx:
scalar + ctpop (v2i64): 0.589292 <-- best == ToT
sselookup (v2i64): 0.865797
parallelbitmath (v2i64): 1.31027
== v4i64-avx:
scalar + ctpop (v4i64): 0.903523 <-- best == ToT
sselookup (v4i64): 1.11988
== v4i64-avx2:
scalar + ctpop (v4i64): 0.895486
sselookup (v4i64): 0.677801 <-- best == patch
parallelbitmath (v4i64): 1.02711
== v16i8-avx:
scalar + ctpop (v16i8): 4.1569
sselookup (v16i8): 0.508693 <-- best == patch
== v32i8-avx:
scalar + ctpop (v32i8): 8.32336
sselookup (v32i8): 0.961657 <-- best == patch
== v32i8-avx2:
scalar + ctpop (v32i8): 8.79509
sselookup (v32i8): 0.487716 <-- best == patch
== v8i16-avx:
scalar + ctpop (v8i16): 1.86908
sselookup (v8i16): 0.755885 <-- best == patch
== v16i16-avx:
scalar + ctpop (v16i16): 4.08575
sselookup (v16i16): 1.32838 <-- best == patch
== v16i16-avx2:
scalar + ctpop (v16i16): 4.19101
sselookup (v16i16): 1.18095 <-- best == patch
Sorry for taking so long.
Cheers,
On Thu, Apr 30, 2015 at 11:54 AM, Bruno Cardoso Lopes
<bruno.cardoso at gmail.com> wrote:
> I guess this makes it four months later now =T
>
> Really sorry for the late reply. You're totally right, my bad: I
> haven't gotten to this on my priority list despite my promises.
> However, I intend to resume this work in a week or two, but feel
> free to revert it if that sounds like another vague promise :-)
>
> Cheers,
>
> On Sun, Mar 29, 2015 at 6:15 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
>> And three months later, you still haven't implemented the requested changes during code review.
>>
>> Please do so, and quickly. I'm really unhappy about the behavior of promising to make changes requested during code review in order to get past code review, and then failing to follow through on them. I'm very tempted to just revert the patch until you actually have time to address this fully.
>>
>>
>> http://reviews.llvm.org/D6531
>>
>>
>>
>
>
>
> --
> Bruno Cardoso Lopes
> http://www.brunocardoso.cc
--
Bruno Cardoso Lopes
http://www.brunocardoso.cc