[libc-commits] [PATCH] D150202: [libc] Add optimized memcpy for RISCV 64

Guillaume Chatelet via Phabricator via libc-commits libc-commits at lists.llvm.org
Tue May 9 07:34:41 PDT 2023


gchatelet created this revision.
gchatelet added a reviewer: sivachandra.
Herald added subscribers: libc-commits, VincentWu, vkmr, ecnelises, evandro, luismarques, sameer.abuasal, tschuett, s.egerton, Jim, benna, psnobl, PkmX, rogfer01, shiva0217, kito-cheng, simoncook, asb, kristof.beyls, arichardson.
Herald added projects: libc-project, All.
gchatelet requested review of this revision.
Herald added subscribers: pcwang-thead, eopXD.

This patch adds two versions of memcpy optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled only for RISCV 64, but it could also be used on RISCV 32 or ARM 32.
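The patch's own code is not included in this message. As a rough illustration of the general technique, and not the patch's implementation, here is a C++ sketch that assumes a 64-bit target. It aligns the destination with byte copies, then builds each 64-bit word from the widest loads the source alignment permits, so every load and store is naturally aligned. `memcpy_aligned_access`, `load_as` and `load64_by` are hypothetical names. `__builtin_assume_aligned` is a GCC/Clang builtin.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Loads a T from `p`. Precondition: `p` is aligned to sizeof(T), so on a
// strict-alignment target the compiler can emit a single aligned load.
template <typename T>
inline T load_as(const unsigned char *p) {
  T value;
  std::memcpy(&value, __builtin_assume_aligned(p, sizeof(T)), sizeof(T));
  return value;
}

// Assembles the 8 bytes at `src` out of naturally aligned Chunk-sized loads.
// Precondition: `src` is aligned to sizeof(Chunk). Going through a byte
// buffer keeps the result independent of endianness.
template <typename Chunk>
inline std::uint64_t load64_by(const unsigned char *src) {
  unsigned char bytes[8];
  for (std::size_t i = 0; i < 8; i += sizeof(Chunk)) {
    const Chunk chunk = load_as<Chunk>(src + i);
    std::memcpy(bytes + i, &chunk, sizeof(Chunk));
  }
  std::uint64_t word;
  std::memcpy(&word, bytes, sizeof(word));
  return word;
}

// Hypothetical memcpy for targets where misaligned accesses trap or are slow.
void *memcpy_aligned_access(void *dst_void, const void *src_void, std::size_t n) {
  auto *dst = static_cast<unsigned char *>(dst_void);
  auto *src = static_cast<const unsigned char *>(src_void);

  // 1. Copy single bytes until the destination is 8-byte aligned.
  while (n > 0 && reinterpret_cast<std::uintptr_t>(dst) % 8 != 0) {
    *dst++ = *src++;
    --n;
  }

  // 2. The source offset modulo 8 is now fixed for the rest of the copy, so
  //    pick the widest load width it allows once, outside the loop.
  using Loader = std::uint64_t (*)(const unsigned char *);
  const std::uintptr_t s = reinterpret_cast<std::uintptr_t>(src);
  const Loader load = s % 8 == 0   ? &load64_by<std::uint64_t>
                      : s % 4 == 0 ? &load64_by<std::uint32_t>
                      : s % 2 == 0 ? &load64_by<std::uint16_t>
                                   : &load64_by<std::uint8_t>;

  // 3. Copy whole 64-bit words: aligned loads, one aligned 8-byte store.
  for (; n >= 8; n -= 8, src += 8, dst += 8) {
    const std::uint64_t word = load(src);
    std::memcpy(__builtin_assume_aligned(dst, 8), &word, sizeof(word));
  }

  // 4. Copy the remaining 0..7 tail bytes.
  while (n-- > 0) *dst++ = *src++;
  return dst_void;
}
```

On a target with fast unaligned accesses this extra dispatch would only cost time. It pays off where misaligned accesses trap or are emulated in firmware. A production version would more likely branch once into specialized loops than call through a function pointer per word.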

Here is the before/after output of `libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Memcpy` on a quad-core Linux StarFive board running at 1.5 GHz.

Before:

  Run on (4 X 1500 MHz CPU s)
  CPU Caches:
    L1 Instruction 32 KiB (x4)
    L1 Data 32 KiB (x4)
    L2 Unified 2048 KiB (x1)
  ------------------------------------------------------------------------
  Benchmark              Time             CPU   Iterations UserCounters...
  ------------------------------------------------------------------------
  BM_Memcpy/0/0        474 ns          474 ns      1483776 bytes_per_cycle=0.243492/s bytes_per_second=348.318M/s items_per_second=2.11097M/s __llvm_libc::memcpy,memcpy Google A
  BM_Memcpy/1/0        210 ns          209 ns      3649536 bytes_per_cycle=0.233819/s bytes_per_second=334.481M/s items_per_second=4.77519M/s __llvm_libc::memcpy,memcpy Google B
  BM_Memcpy/2/0       1814 ns         1814 ns       396288 bytes_per_cycle=0.247899/s bytes_per_second=354.622M/s items_per_second=551.402k/s __llvm_libc::memcpy,memcpy Google D
  BM_Memcpy/3/0       89.3 ns         89.2 ns      7459840 bytes_per_cycle=0.217415/s bytes_per_second=311.014M/s items_per_second=11.2071M/s __llvm_libc::memcpy,memcpy Google L
  BM_Memcpy/4/0        134 ns          134 ns      3815424 bytes_per_cycle=0.226584/s bytes_per_second=324.131M/s items_per_second=7.44567M/s __llvm_libc::memcpy,memcpy Google M
  BM_Memcpy/5/0       52.8 ns         52.6 ns     11001856 bytes_per_cycle=0.194893/s bytes_per_second=278.797M/s items_per_second=19.0284M/s __llvm_libc::memcpy,memcpy Google Q
  BM_Memcpy/6/0        180 ns          180 ns      4101120 bytes_per_cycle=0.231884/s bytes_per_second=331.713M/s items_per_second=5.55957M/s __llvm_libc::memcpy,memcpy Google S
  BM_Memcpy/7/0        195 ns          195 ns      3906560 bytes_per_cycle=0.232951/s bytes_per_second=333.239M/s items_per_second=5.1217M/s __llvm_libc::memcpy,memcpy Google U
  BM_Memcpy/8/0        152 ns          152 ns      4789248 bytes_per_cycle=0.227507/s bytes_per_second=325.452M/s items_per_second=6.58187M/s __llvm_libc::memcpy,memcpy Google W
  BM_Memcpy/9/0       6036 ns         6033 ns       118784 bytes_per_cycle=0.249158/s bytes_per_second=356.423M/s items_per_second=165.75k/s __llvm_libc::memcpy,uniform 384 to 4096

After:

  BM_Memcpy/0/0        126 ns          126 ns      5770240 bytes_per_cycle=1.04707/s bytes_per_second=1.46273G/s items_per_second=7.9385M/s __llvm_libc::memcpy,memcpy Google A
  BM_Memcpy/1/0       75.1 ns         75.0 ns     10204160 bytes_per_cycle=0.691143/s bytes_per_second=988.687M/s items_per_second=13.3289M/s __llvm_libc::memcpy,memcpy Google B
  BM_Memcpy/2/0        333 ns          333 ns      2174976 bytes_per_cycle=1.39297/s bytes_per_second=1.94596G/s items_per_second=3.00002M/s __llvm_libc::memcpy,memcpy Google D
  BM_Memcpy/3/0       49.6 ns         49.5 ns     16092160 bytes_per_cycle=0.710161/s bytes_per_second=1015.89M/s items_per_second=20.1844M/s __llvm_libc::memcpy,memcpy Google L
  BM_Memcpy/4/0       57.7 ns         57.7 ns     11213824 bytes_per_cycle=0.561557/s bytes_per_second=803.314M/s items_per_second=17.3228M/s __llvm_libc::memcpy,memcpy Google M
  BM_Memcpy/5/0       48.0 ns         47.9 ns     16437248 bytes_per_cycle=0.346708/s bytes_per_second=495.97M/s items_per_second=20.8571M/s __llvm_libc::memcpy,memcpy Google Q
  BM_Memcpy/6/0       67.5 ns         67.5 ns     10616832 bytes_per_cycle=0.614173/s bytes_per_second=878.582M/s items_per_second=14.8142M/s __llvm_libc::memcpy,memcpy Google S
  BM_Memcpy/7/0       84.7 ns         84.6 ns     10480640 bytes_per_cycle=0.819077/s bytes_per_second=1.14424G/s items_per_second=11.8174M/s __llvm_libc::memcpy,memcpy Google U
  BM_Memcpy/8/0       61.7 ns         61.6 ns     11191296 bytes_per_cycle=0.550078/s bytes_per_second=786.893M/s items_per_second=16.2279M/s __llvm_libc::memcpy,memcpy Google W
  BM_Memcpy/9/0        981 ns          981 ns       703488 bytes_per_cycle=1.52333/s bytes_per_second=2.12807G/s items_per_second=1019.81k/s __llvm_libc::memcpy,uniform 384 to 4096
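The two tables are easier to compare as ratios. The `bytes_per_cycle` column carries no unit prefix, so the speedup per distribution is simply after divided by before. The sketch below does that arithmetic on the values copied from the runs above; `MemcpyRun`, `kRuns`, `speedup` and `print_speedups` are illustrative names only.

```cpp
#include <cstdio>

// One row of the BM_Memcpy tables: distribution name and bytes_per_cycle
// before and after the patch (values copied from the runs above).
struct MemcpyRun {
  const char *distribution;
  double before_bytes_per_cycle;
  double after_bytes_per_cycle;
};

constexpr MemcpyRun kRuns[] = {
    {"memcpy Google A", 0.243492, 1.04707},
    {"memcpy Google B", 0.233819, 0.691143},
    {"memcpy Google D", 0.247899, 1.39297},
    {"memcpy Google L", 0.217415, 0.710161},
    {"memcpy Google M", 0.226584, 0.561557},
    {"memcpy Google Q", 0.194893, 0.346708},
    {"memcpy Google S", 0.231884, 0.614173},
    {"memcpy Google U", 0.232951, 0.819077},
    {"memcpy Google W", 0.227507, 0.550078},
    {"uniform 384 to 4096", 0.249158, 1.52333},
};

// Speedup of the patched memcpy over the previous one for a given run.
constexpr double speedup(const MemcpyRun &r) {
  return r.after_bytes_per_cycle / r.before_bytes_per_cycle;
}

// Prints one "distribution  N.NNx" line per run.
void print_speedups() {
  for (const MemcpyRun &r : kRuns)
    std::printf("%-24s %.2fx\n", r.distribution, speedup(r));
}
```

For these numbers the speedup ranges from roughly 1.8x (Google Q) to roughly 6.1x (uniform 384 to 4096).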

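It is not yet as fast as glibc, so there is room for improvement. On the larger copies glibc reaches roughly twice our throughput (4.43G/s vs 2.13G/s for "uniform 384 to 4096", 3.71G/s vs 1.95G/s for Google D), which makes me suspect it has a path that copies 16 bytes at a time. For reference, the glibc results: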

  BM_Memcpy/0/1        146 ns         82.5 ns      8576000 bytes_per_cycle=1.35236/s bytes_per_second=1.88922G/s items_per_second=12.1169M/s glibc memcpy,memcpy Google A
  BM_Memcpy/1/1        112 ns         63.7 ns     10634240 bytes_per_cycle=0.628018/s bytes_per_second=898.387M/s items_per_second=15.702M/s glibc memcpy,memcpy Google B
  BM_Memcpy/2/1        315 ns          180 ns      4079616 bytes_per_cycle=2.65229/s bytes_per_second=3.7052G/s items_per_second=5.54764M/s glibc memcpy,memcpy Google D
  BM_Memcpy/3/1       85.3 ns         43.1 ns     15854592 bytes_per_cycle=0.774164/s bytes_per_second=1107.45M/s items_per_second=23.2249M/s glibc memcpy,memcpy Google L
  BM_Memcpy/4/1        105 ns         54.3 ns     13427712 bytes_per_cycle=0.7793/s bytes_per_second=1114.8M/s items_per_second=18.4109M/s glibc memcpy,memcpy Google M
  BM_Memcpy/5/1       77.1 ns         43.2 ns     16476160 bytes_per_cycle=0.279808/s bytes_per_second=400.269M/s items_per_second=23.1428M/s glibc memcpy,memcpy Google Q
  BM_Memcpy/6/1        112 ns         62.7 ns     11236352 bytes_per_cycle=0.676078/s bytes_per_second=967.137M/s items_per_second=15.9387M/s glibc memcpy,memcpy Google S
  BM_Memcpy/7/1        131 ns         65.5 ns     11751424 bytes_per_cycle=0.965616/s bytes_per_second=1.34895G/s items_per_second=15.2762M/s glibc memcpy,memcpy Google U
  BM_Memcpy/8/1        104 ns         55.0 ns     12314624 bytes_per_cycle=0.583336/s bytes_per_second=834.468M/s items_per_second=18.1937M/s glibc memcpy,memcpy Google W
  BM_Memcpy/9/1        932 ns          466 ns      1480704 bytes_per_cycle=3.17342/s bytes_per_second=4.43321G/s items_per_second=2.14679M/s glibc memcpy,uniform 384 to 4096
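To make the "16 bytes at once" hypothesis concrete, here is a hedged C++ sketch of what such an inner loop could look like. It is not glibc's code, and `copy_aligned_16` is a hypothetical name. It assumes both buffers are 8-byte aligned and moves two 64-bit words per iteration, issuing both loads before either store. The intuition is that this halves the loop overhead per byte and lets both loads be in flight before the first store.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical inner loop: copies whole 16-byte blocks between 8-byte-aligned
// buffers as two 64-bit words per iteration. Any tail shorter than 16 bytes
// is left untouched for the caller to handle.
void copy_aligned_16(unsigned char *dst, const unsigned char *src, std::size_t n) {
  for (std::size_t i = 0; i + 16 <= n; i += 16) {
    std::uint64_t a, b;
    // Both loads first...
    std::memcpy(&a, __builtin_assume_aligned(src + i, 8), 8);
    std::memcpy(&b, __builtin_assume_aligned(src + i + 8, 8), 8);
    // ...then both stores.
    std::memcpy(__builtin_assume_aligned(dst + i, 8), &a, 8);
    std::memcpy(__builtin_assume_aligned(dst + i + 8, 8), &b, 8);
  }
}
```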


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D150202

Files:
  libc/src/string/memory_utils/memcpy_implementations.h
  libc/src/string/memory_utils/utils.h
  libc/test/src/string/memory_utils/utils_test.cpp

