[cfe-dev] Difference in generated code between variadic parameter pack and manual version
Arthur O'Dwyer via cfe-dev
cfe-dev at lists.llvm.org
Mon Sep 21 04:42:08 PDT 2020
On Mon, Sep 21, 2020 at 7:02 AM Bart Samwel via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> Hi there folks,
>
> I wonder if anybody can shed some light on this. I'm looking at a function
> with a parameter pack argument and one without, that should do the exact
> same thing.
>
> https://godbolt.org/z/Keqzcj
>
> However, the version with the parameter pack expands (at -O3
> -march=broadwell, on clang 10.0.1, on godbolt) into a loop per 128 bytes,
> plus a loop per 64 bytes, plus nonvectorized instructions to process the
> remaining <=63 bytes. The manual version expands to just a loop per 128
> bytes (256-bit vectors, unrolled 4x), and nonvectorized instructions to
> process the remaining <=127 bytes.
>
It's about the fold expression.
https://godbolt.org/z/EPETj9
With C++17 fold-expressions, (args | ...) doesn't mean (arg1 | arg2 |
arg3); it is a unary right fold and means (arg1 | (arg2 | arg3)). So with
the right-fold you wrote, you're telling the compiler to OR the values
together "right-to-left", whereas the non-template version does it
"left-to-right": ((arg1 | arg2) | arg3). The result is the same either way,
but apparently the grouping makes a huge difference to the codegen (why it
does is still a mystery to me, and out of my depth).
Switch the right-fold to a left-fold, i.e. (... | args), and the codegen
becomes identical, at least to my eyes. (In the above Godbolt, put
-DVARIADIC in one compiler frame and nothing in the other.)
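For readers without the Godbolt link handy, here is a minimal sketch of the
two fold directions next to a hand-written version. It is not the code from
the original post: the function names (or_right_fold, or_left_fold,
or_manual, folds_agree) and the use of std::uint64_t are assumptions made
for illustration only. It shows just the grouping and does not reproduce
the vectorization difference.

```cpp
#include <cassert>
#include <cstdint>

// Unary right fold: (args | ...) groups as (a | (b | c)).
template <typename... Ts>
constexpr std::uint64_t or_right_fold(Ts... args) {
    return (args | ...);
}

// Unary left fold: (... | args) groups as ((a | b) | c),
// the same grouping the parser gives a hand-written a | b | c.
template <typename... Ts>
constexpr std::uint64_t or_left_fold(Ts... args) {
    return (... | args);
}

// Hand-written three-argument version (hypothetical stand-in for the
// "manual" function in the original question).
constexpr std::uint64_t or_manual(std::uint64_t a, std::uint64_t b,
                                  std::uint64_t c) {
    return a | b | c;  // parsed as ((a | b) | c)
}

// Bitwise OR is associative, so all three produce the same value;
// only the shape of the expression tree handed to the optimizer differs.
constexpr bool folds_agree(std::uint64_t a, std::uint64_t b,
                           std::uint64_t c) {
    return or_right_fold(a, b, c) == or_manual(a, b, c) &&
           or_left_fold(a, b, c) == or_manual(a, b, c);
}

static_assert(or_right_fold(1u, 2u, 4u) == 7u);
static_assert(or_left_fold(1u, 2u, 4u) == 7u);
static_assert(folds_agree(0x10u, 0x01u, 0x100u));
```

Since the values are equal, any codegen difference has to come from how the
optimizer handles the differently shaped expression, not from the semantics.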
–Arthur