[libc-commits] [PATCH] D74397: [libc] Adding memcpy implementation for x86_64

Mon Mar 16 12:01:32 PDT 2020

sivachandra accepted this revision.
sivachandra marked an inline comment as done.
sivachandra added a comment.

I think what is outstanding wrt comments is very minor. So, feel free to land and we can iterate if required after landing.

================
Comment at: libc/src/string/CMakeLists.txt:83
+add_subdirectory(${LIBC_MEMCPY_IMPL_FOLDER})
+add_memcpy(memcpy MARCH native)
----------------
gchatelet wrote:
> sivachandra wrote:
> > It seems like this will not ensure the best flags. Is that the intention? If so, why?
> It will, `-march=native` enables all the features available on the host.
> Why do you think `-march=native` won't get the best flags?
> 
> That said if you're on a `skylake-avx512` machine it will not use avx512 instructions. This is because `-mprefer-vector-width` is defaulted to 256 bit width operations (see this [phoronix article](https://www.phoronix.com/scan.php?page=news_item&px=LLVM-Clang-10-AVX512-Change))
> 
> Currently, if on a `skylake-avx512` machine the implementation will be the same as the `avx` one. We would have to measure to be sure it's worth forcing `-mprefer-vector-width=512` as well.
I was of the opinion that the compilers do not have the complete set of capabilities available to them and that is why we have facilities like  `HWCAP`, `cpuid` etc. But, you are the expert, and if you say what you have is enough, I will take it :)

================
Comment at: libc/src/string/x86_64/CMakeLists.txt:1
+add_memcpy("memcpy_${LIBC_TARGET_MACHINE}_opt_none" REJECT "${ALL_CPU_FEATURES}")
+add_memcpy("memcpy_${LIBC_TARGET_MACHINE}_opt_sse" REQUIRE "SSE" REJECT "SSE2")
----------------
gchatelet wrote:
> sivachandra wrote:
> > These listings are already under `x86_64`. So, do we need to use `${LIBC_TARGET_MACHINE}`?
> This is because we redirect both x86 and x86_64 versions to this folder and they ought to have different names. We should probably rename this folder if it's unclear. Maybe `x86_multi` ?
Normal convention is to name it just `x86`. So, name it just `x86` then?

================
Comment at: libc/test/src/string/CMakeLists.txt:32
+  if(can_run)
+    compute_flags(flags MARCH native)
+    add_libc_unittest(
----------------
gchatelet wrote:
> sivachandra wrote:
> > sivachandra wrote:
> > > gchatelet wrote:
> > > > abrachet wrote:
> > > > > Would you mind explaining this? It seems like ${flags} will just be -march=native, and the work above to find flags gets ignored.
> > > > Actually there's no need for flags here, the implementation has already been compiled with the correct flags. The test itself doesn't need them.
> > > Same question for me as well. I left a related comment at a different place above.
> > Ah, so remove this line then?
> This is a two step process:
>  - the implementations get compiled with specific flags
>  - when testing we retrieve these flags and check whether the current host supports them
> 
> If they are compatible, the already compiled `.o` file can run on the host, the test file itself doesn't need to receive the flags (only the implementation)
> 
> This will be the same for benchmarking, the benchmarking code does not need to be compiled with `avx` support, only the code under test needs to be.
> 
> Do you think it deserves a comment?
AFAICT, `compute_flags` does not have any side effects. Also, it doesn't look like `${flags}` is used anywhere here. So, why call it at all?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74397/new/

https://reviews.llvm.org/D74397