[libc-commits] [PATCH] D74397: [libc] Adding memcpy implementation for x86_64

Thu Feb 13 01:44:58 PST 2020

gchatelet added a comment.

In D74397#1872961 <https://reviews.llvm.org/D74397#1872961>, @sivachandra wrote:

> > The `rep movsb` instruction performance is highly tied to the targeted microarchitecture.
> >  For one there is the ERMS cpuid flag https://en.wikipedia.org/wiki/CPUID#EAX=7,_ECX=0:_Extended_Features that helps to know if it should be used at all.
> >  Then depending on the microarchitecture the crosspoint between aligned copy and `rep movsb` varies between 512 to a few kilobytes.
> >  Ideally we need to adapt this threshold somehow to provide the best implementation.
> > 
> > Eventually, when `rep movsb` becomes excellent we can replace the function entirely with this single instruction.
> > 
> > understand that `llvm-libc` is to be a pick and choose what you need libc which implies some sort of customization anyways. Am I right?
>
> Would configure time sniffing to detect what is available work? It will not work in case of cross-compilation. But, requiring explicit setting of build params for cross-compilation seems OK to me.

Yes that would work, we can map from micro architecture to parameters and be conservative when we don't know.
The possibility to force the configuration manually would be required for cross compilation as you noticed.

>>> A related question: considering we are using `__builtin_memcpy` and inline assembly, does the code work as is with MSVC?
>> 
>> No it doesn't, I can add a fallback case if you want but the generated code is not good right now.
>>  This means we'd need to specialize the `CopyRepMovsb` function so it uses the correct syntax for msvc `__asm` instead of the provided `LIBC_INLINE_ASM`
> 
> We don't need to provide the specialization in this patch, but pointing out which parts needs to be specialized would help.

I'll document the code then.

> 
> 
>> Quick question @sivachandra , how do we currently build `llvm-libc`?
>>  `utils/build_scripts` https://github.com/llvm/llvm-project/blob/master/libc/docs/source_layout.rst#the-utilsbuild_scripts-directory seems to be missing so  I could only build the entrypoints via transitive dependency through tests.
> 
> Sorry, that file is out of date. Will fix it soon.
>  Add the target of your entrypoint to the list of `DEPENDS` for the `llvmlibc` target here: https://github.com/llvm/llvm-project/blob/master/libc/lib/CMakeLists.txt
>  Running `ninja llvmlibc` after that will produce `libllvmlibc.a` which includes your entrypoint.

Perfect! Thx!

In D74397#1873189 <https://reviews.llvm.org/D74397#1873189>, @abrachet wrote:

> > See above, the generated code is improved (smaller and faster) when targeting specific architecture.
> >  glibc is using IFUNC to that matter and lets the runtime pick the best implementation.
> >  Although feasible in llvm-libc we noticed that the required extra level of indirection is hurting branch prediction and latency for small sizes (which are the most frequent)
>
> The resolver function is only called the first time that ld.so binds the symbol into the plt

Yes

> so I don't think it would hurt branch prediction any more than any other dynamic symbol.

Not more than dynamic symbol but still noticeable for small sizes.
That also mean one branch prediction entry is kept for each runtime dispatched function (at least `memcpy`, `memset`, `memmove`, `memcmp`). These functions are called every now and prevent other code from using them.

> Notably a big goal of llvm libc stated from Siva early on was static linking, so I am sure we will end up having a build time way of determining which specialization to call, as well as runtime with ifunc.

Yes, I agree that static linking is the primary goal, we can always provide an ifunc selected function on top of individual implementations.

>> How do we specify custom compilation flags? (We'd need -fno-builtin-memcpy to be passed in),
> 
> Were you asking how to express this in CMake?
> 
>   set_property(
>     SOURCE memcpy.cpp
>     PROPERTY COMPILE_OPTIONS
>     -fno-builtin-memcpy
>   )
> 
> 
> This worked in my (quick) testing. `ninja -v` will show the commands it's running so you can see if it worked here too.

Thx !

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74397/new/

https://reviews.llvm.org/D74397