[libc-commits] [PATCH] D74397: [libc] Adding memcpy implementation for x86_64

Siva Chandra via Phabricator via libc-commits libc-commits at lists.llvm.org
Thu Feb 20 22:33:08 PST 2020


sivachandra added inline comments.


================
Comment at: libc/src/string/CMakeLists.txt:78
+  compile_memcpy_with_flags(avx512 FLAGS -mavx512f)
+endif()
+
----------------
gchatelet wrote:
> sivachandra wrote:
> > gchatelet wrote:
> > > gchatelet wrote:
> > > > sivachandra wrote:
> > > > > abrachet wrote:
> > > > > > sivachandra wrote:
> > > > > > > gchatelet wrote:
> > > > > > > > This now creates the following memcpy implementations
> > > > > > > >  - `__llvm_libc::memcpy_x86_64_avx512`
> > > > > > > >  - `__llvm_libc::memcpy_x86_64_avx`
> > > > > > > >  - `__llvm_libc::memcpy_x86_64_sse2`
> > > > > > > >  - `__llvm_libc::memcpy_x86_64_sse`
> > > > > > > >  - `__llvm_libc::memcpy_x86_64_unopt`
> > > > > > > > 
> > > > > > > > For shared libc, we need an ifunc like trampoline to select the correct version.
> > > > > > > > For static libc, we need to select an implementation
> > > > > > > > 
> > > > > > > > @sivachandra how do you see this kind if code generation integrate into the more general cmake functions from `libc/cmake/modules/LLVMLibCRules.cmake`?
> > > > > > > > I expect other memory functions to follow the same scheme.
> > > > > > > Instead of building all the possible implementations, could we use the CMake [[ https://cmake.org/cmake/help/v3.14/command/try_compile.html | `try_compile` ]] and/or [[ https://cmake.org/cmake/help/v3.14/command/try_run.html | `try_run` ]] command to sniff out the best flags to use? I think `try_run` is more appropriate as I expect that we need to run the `cpuid` instruction? Also, compilers have a convenience macro `__cpuid` to run this instruction on x86/x86_64?
> > > > > > > 
> > > > > > > BTW, one can have ifuncs in static libraries as well. But, I do understand we want to avoid the overhead of the added indirection, so sniffing out at configure time is the best. If we can setup something for configure time sniffing, I believe we should be able use it (may be with straightforward extension/modification iff required) to use as the ifunc selector as well.
> > > > > > I think this is going to get quickly outside of the scope of this patch. It would probably be better to work on this in a separate patch.
> > > > > If there are any infrastructure pieces required to build a full solution, then we can do them separately as a prerequisite. However, if correct compile options are critical to memcpy implementation, then any code added to deduce them should belong to this patch. I agree that the patch will start to become too big. But, I think the additions to memcpy_utils and their tests can be split out to another prerequisite patch.
> > > > @sivachandra `try_compile` and `try_run` will only work when the host is also the target of the build. For instance it won't work when cross compiling.
> > > > 
> > > > There are a few dimensions here:
> > > > 1. static library
> > > >     - autodetect which implementation to use with `try_compile` / `try_run`
> > > >     - manually choose the implementation for cross compilation
> > > > 2. dynamic library
> > > >     - select a set of implementations to include in the library and the associated dispatch code
> > > > 3. test
> > > >     - test all the implementations that can run on the host
> > > > 
> > > > I agree with both of you that this is getting too big for this patch.
> > > > Let me do my homework and come back with the prerequisite patches.
> > > Let me think about it, I'll get back with the required building blocks.
> > We don't need to worry about #2 for now.
> > 
> > For cross-compilation in case of #1, if we set up the build vars appropriately, I believe it should not be a problem.
> > 
> > #3 is currently a problem because of the way the `add_entrypoint_object` rule works: it expects the entrypoint name and the target name to be the same.  We can add an optional argument to the rule which explicitly provides the entrypoint name. This will allow us to add multiple targets for the same entrypoint. We can setup our targets as follows:
> > 
> > ```
> > src/string/
> >     - x86_64 # x86_64 specific directory.
> >           - CMakeLists.txt # Lists the targets for the various
> >                            # x86_64 flavors which all use the
> >                            # single memcpy.cpp source file
> >     - CMakeLists.txt # Lists the target for the release version
> >                      # of memcpy built using the best compile
> >                      # for the target machine.
> >     - memcpy.cpp # The actual platform independent memcpy
> >                  # implementation.
> > ```
> > 
> > The same structure can be followed for tests as well. WDYT?
> @sivachandra SGTM
> Can you try to amend `add_entrypoint_object` in a separate patch?
> 
> Separately, I came up with a CMake function to test cpu flags on linux/mac this will help for `3` and `2`.
> 
> `x86` and `x86_64`will have the same implementation any suggestion on how to name the directory? `x86_32_64` ?
Done now: https://reviews.llvm.org/D74948

Conventionally, pieces common to x86_64 and other x86 flavors are all put together in a directory named `x86`. I do not have an easy to way to test on non-x86_64 machines. So, may be just start with `x86_64` for now and leave a comment that it works for other x86 flavors also? If you have a better way, feel free to propose.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74397/new/

https://reviews.llvm.org/D74397





More information about the libc-commits mailing list