[libcxx-commits] [PATCH] D63063: Bug 42208: speeding up std::merge

Denis Yaroshevskiy via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Sun Jul 28 14:58:12 PDT 2019


dyaroshev marked 2 inline comments as done.
dyaroshev added inline comments.


================
Comment at: include/algorithm:1725
     if (__n > 0)
         _VSTD::memmove(__result, __first, __n * sizeof(_Up));
     return __result + __n;
----------------
EricWF wrote:
> `__builtin_memmove` is constexpr, so I think using that is a better approach than branching on `is_constant_evaluated`.
Will do, thanks.
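
For reference, a rough sketch of the shape this suggestion leads to, assuming a GCC- or Clang-style compiler that provides `__builtin_memmove`. The helper name `copy_trivial_sketch` is hypothetical and is not the function in `include/algorithm`. The sketch is deliberately not marked `constexpr`, since whether the builtin can be evaluated in a constant expression depends on the compiler and its version.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper, for illustration only: copies a range of trivially
// copyable objects with the compiler builtin instead of the library memmove.
template <class T>
T* copy_trivial_sketch(const T* first, const T* last, T* result) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "the byte-wise copy is only valid for trivially copyable types");
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n > 0)  // skip the call entirely for an empty range
        __builtin_memmove(result, first, n * sizeof(T));
    return result + n;
}
```

The point of the suggestion is that a single code path serves both runtime and compile-time evaluation, with no separate branch on `is_constant_evaluated`.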


================
Comment at: include/algorithm:4394
+        if (__comp(*__first2, *__first1)) goto __takeSecond;
+        *__result = *__first1;
+        ++__first1, (void)++__result;
----------------
EricWF wrote:
> Every time I've seen a Duff's device optimization, it's a win in some cases and a loss in others. That makes me skeptical that it's the library's job to perform the loop unrolling.
> Do you know why LLVM is failing to generate comparable code here?
1) You have a valid concern, but I have benchmarked this code thoroughly and have not seen a pessimization. Do you have something you want me to try?
In the cases I tried, this is at most a 40% increase in binary size (still significantly less than with libstdc++), so I would not expect sudden instruction cache spills or things like that.

I would point out that sometimes it's a 1.7 times win.

2) I do know - see https://bugs.llvm.org/show_bug.cgi?id=42313
This optimization requires jump threading across the loop header, which the optimizer cannot comfortably do at the moment.

>Eli Friedman 2019-06-19 14:27:29 PDT
>> Is this a fundamental llvm problem or it is solvable?
>
>
>The reason we generally avoid jump-threading across a loop header is that irreducible CFGs (like the "goto" version of your function) generally don't optimize very well; we have a bunch of optimizations that only recognize proper loops.  But certain loops really benefit from being transformed into irreducible CFGs; I think we've had similar reports before about state machines before.  So the challenge is figuring out a good heuristic for when the transform is actually profitable.
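
To make the control flow under discussion concrete, here is a minimal sketch of a merge written with explicit jumps. It is not the code from the patch (which also unrolls the loop); the names `merge_goto_sketch` and `matches_std_merge` are hypothetical. The idea it illustrates: after taking an element from one range, only that range can have become empty, so each step needs one end-of-range check instead of two.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical illustration, not the patch: a stable merge where each step
// checks only the range it just advanced.
template <class In1, class In2, class Out, class Compare>
Out merge_goto_sketch(In1 first1, In1 last1, In2 first2, In2 last2,
                      Out result, Compare comp) {
    if (first1 == last1) goto copy_second;
    if (first2 == last2) goto copy_first;

    for (;;) {
        // Both ranges are non-empty here.
        if (comp(*first2, *first1)) goto take_second;
        *result = *first1;
        ++first1, (void)++result;
        if (first1 == last1) goto copy_second;  // only range 1 can be exhausted
        continue;

    take_second:
        *result = *first2;
        ++first2, (void)++result;
        if (first2 == last2) goto copy_first;   // only range 2 can be exhausted
    }

copy_first:
    return std::copy(first1, last1, result);
copy_second:
    return std::copy(first2, last2, result);
}

// Hypothetical helper: compares the sketch against std::merge on int vectors.
inline bool matches_std_merge(const std::vector<int>& a,
                              const std::vector<int>& b) {
    std::vector<int> expected, actual;
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(expected));
    merge_goto_sketch(a.begin(), a.end(), b.begin(), b.end(),
                      std::back_inserter(actual), std::less<int>());
    return expected == actual;
}
```

Written as a plain `while (first1 != last1 && first2 != last2)` loop, the same algorithm re-tests both ranges on every iteration; removing the redundant test is what requires threading the jump across the loop header.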


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63063/new/

https://reviews.llvm.org/D63063




