[libc-commits] [PATCH] D148717: [libc] Improve memcmp latency and codegen
Guillaume Chatelet via Phabricator via libc-commits
libc-commits at lists.llvm.org
Fri Jun 30 05:04:07 PDT 2023
gchatelet marked an inline comment as done.
gchatelet added inline comments.
================
Comment at: libc/src/string/memory_utils/utils.h:198-201
+ // cmp rdi, rsi <- serializing
+ // mov ecx, -5 <- can be done in parallel
+ // mov eax, 5 <- can be done in parallel
+ // cmovb eax, ecx <- serializing
----------------
nafi3000 wrote:
> gchatelet wrote:
> > lntue wrote:
> > > I wonder what the tradeoffs are between this and what is generated for 1 and -1? If this is better, then the compiler should just use this for 1 and -1 also, right?
> >
> > x86 does not have a conditional negate, and the codegen for returning 1 and -1 has higher latency:
> > ```
> > xor eax, eax
> > cmp rdi, rsi <- serializing
> > sbb eax, eax <- dep on previous instruction
> > or eax, 1 <- dep on previous instruction
> > ```
> >
> > I think the tradeoff is around register pressure: in the `-1` / `1` case we only need `eax`, at the expense of a longer dependency chain.
> > In the `-5` / `5` case we need `ecx` on top of `eax`, but the dependency chain is shorter, so latency is reduced. Since latency matters for `memcmp`, it makes more sense to use this construct.
> >
> > Now TBH I haven't measured that the overall generated code is better but I'll run a few tests before landing.
> >
> > https://godbolt.org/z/Gqahv7r7e
> The compiler could have also used `edi` or `esi` instead of `ecx`. Would that cause slightly lower register pressure? E.g. why is it not doing something like:
> ```
> cmp rdi, rsi
> mov edi, -5
> mov eax, 5
> cmovb eax, edi
> ```
Not exactly sure why. The register allocator may first use the registers that are free (a greedy algorithm) and only try hard to reuse registers when it is necessary?
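For reference, a minimal C++ sketch of the two return strategies discussed in this thread. The names `cmp_neq_pm1`, `cmp_neq_pm5` and `load_be64` are hypothetical and are not the actual code in `libc/src/string/memory_utils/utils.h`; the assembly in the comments is the codegen quoted above and will vary with compiler version and flags. Since `memcmp` only guarantees the sign of its result, returning -5 / 5 is as valid as -1 / 1. For unsigned `<` on whole words to agree with `memcmp`, the words must be loaded in big-endian (lexicographic) byte order.

```cpp
#include <cstdint>

// Hypothetical helpers, not the LLVM libc implementation.
// Both assume the caller has already established that a != b.

// Returns -1 or 1. Codegen quoted in the review (x86-64):
//   xor eax, eax
//   cmp rdi, rsi
//   sbb eax, eax    ; depends on cmp
//   or  eax, 1      ; depends on sbb
inline int cmp_neq_pm1(std::uint64_t a, std::uint64_t b) {
  return a < b ? -1 : 1;
}

// Returns -5 or 5. Codegen quoted in the review (x86-64):
//   cmp   rdi, rsi
//   mov   ecx, -5   ; independent of cmp
//   mov   eax, 5    ; independent of cmp
//   cmovb eax, ecx  ; depends on cmp
// Shorter dependency chain, at the cost of one extra register (ecx).
inline int cmp_neq_pm5(std::uint64_t a, std::uint64_t b) {
  return a < b ? -5 : 5;
}

// Loads 8 bytes as a big-endian integer so that unsigned '<' on the result
// matches memcmp's byte-wise lexicographic order, whatever the host endianness.
inline std::uint64_t load_be64(const unsigned char *p) {
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i) v = (v << 8) | p[i];
  return v;
}
```

Both helpers produce the same sign for the same inputs; they differ only in the constants, which is what lets the compiler materialize the -5 / 5 pair off the critical path.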
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D148717/new/
https://reviews.llvm.org/D148717