[libc-commits] [PATCH] D74397: [libc] Adding memcpy implementation for x86_64

Wed Mar 18 09:47:31 PDT 2020

gchatelet added inline comments.

================
Comment at: libc/src/string/CMakeLists.txt:83
+add_subdirectory(${LIBC_MEMCPY_IMPL_FOLDER})
+add_memcpy(memcpy MARCH native)
----------------
sivachandra wrote:
> gchatelet wrote:
> > sivachandra wrote:
> > > It seems like this will not ensure the best flags. Is that the intention? If so, why?
> > It will, `-march=native` enables all the features available on the host.
> > Why do you think `-march=native` won't get the best flags?
> > 
> > That said if you're on a `skylake-avx512` machine it will not use avx512 instructions. This is because `-mprefer-vector-width` is defaulted to 256 bit width operations (see this [phoronix article](https://www.phoronix.com/scan.php?page=news_item&px=LLVM-Clang-10-AVX512-Change))
> > 
> > Currently, if on a `skylake-avx512` machine the implementation will be the same as the `avx` one. We would have to measure to be sure it's worth forcing `-mprefer-vector-width=512` as well.
> I was of the opinion that the compilers do not have the complete set of capabilities available to them and that is why we have facilities like  `HWCAP`, `cpuid` etc. But, you are the expert, and if you say what you have is enough, I will take it :)
With `-march=native` the compiler will introspect the CPU with [cpuid](https://github.com/llvm/llvm-project/blob/master/clang/lib/Headers/cpuid.h) and detect the available capabilities. We're deferring to the compiler here.

Now for shared libraries and runtime dispatch we'll have to provide such code but we're not there yet.

================
Comment at: libc/test/src/string/CMakeLists.txt:32
+  if(can_run)
+    compute_flags(flags MARCH native)
+    add_libc_unittest(
----------------
sivachandra wrote:
> gchatelet wrote:
> > sivachandra wrote:
> > > sivachandra wrote:
> > > > gchatelet wrote:
> > > > > abrachet wrote:
> > > > > > Would you mind explaining this? It seems like ${flags} will just be -march=native, and the work above to find flags gets ignored.
> > > > > Actually there's no need for flags here, the implementation has already been compiled with the correct flags. The test itself doesn't need them.
> > > > Same question for me as well. I left a related comment at a different place above.
> > > Ah, so remove this line then?
> > This is a two step process:
> >  - the implementations get compiled with specific flags
> >  - when testing we retrieve these flags and check whether the current host supports them
> > 
> > If they are compatible, the already compiled `.o` file can run on the host, the test file itself doesn't need to receive the flags (only the implementation)
> > 
> > This will be the same for benchmarking, the benchmarking code does not need to be compiled with `avx` support, only the code under test needs to be.
> > 
> > Do you think it deserves a comment?
> AFAICT, `compute_flags` does not have any side effects. Also, it doesn't look like `${flags}` is used anywhere here. So, why call it at all?
Yes thank you it was a leftover.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74397/new/

https://reviews.llvm.org/D74397