[all-commits] [llvm/llvm-project] c5dede: [libc] Add optimized memset for RISCV

Guillaume Chatelet via All-commits all-commits at lists.llvm.org
Mon May 15 00:31:16 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: c5dede880d175f7229c9b2923f4753e12702305d
      https://github.com/llvm/llvm-project/commit/c5dede880d175f7229c9b2923f4753e12702305d
  Author: Guillaume Chatelet <gchatelet at google.com>
  Date:   2023-05-15 (Mon, 15 May 2023)

  Changed paths:
    M libc/src/string/memory_utils/memset_implementations.h

  Log Message:
  -----------
  [libc]  Add optimized memset for RISCV

This patch adds two versions of `memset` optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.

Here is the before / after output of libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Memset on a quad core Linux starfive RISCV 64 board running at 1.5GHz.

Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x4)
  L1 Data 32 KiB (x4)
  L2 Unified 2048 KiB (x1)
------------------------------------------------------------------------
Benchmark              Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------
BM_Memset/0/0        506 ns          252 ns      2883584 bytes_per_cycle=0.238312/s bytes_per_second=340.908M/s items_per_second=3.96043M/s __llvm_libc::memset,memset Google A
BM_Memset/1/0        296 ns          189 ns      2900992 bytes_per_cycle=0.234589/s bytes_per_second=335.583M/s items_per_second=5.29382M/s __llvm_libc::memset,memset Google B
BM_Memset/2/0       2110 ns         1049 ns       678912 bytes_per_cycle=0.24687/s bytes_per_second=353.151M/s items_per_second=953.527k/s __llvm_libc::memset,memset Google D
BM_Memset/3/0        397 ns          254 ns      3055616 bytes_per_cycle=0.238479/s bytes_per_second=341.147M/s items_per_second=3.93224M/s __llvm_libc::memset,memset Google L
BM_Memset/4/0       1119 ns          621 ns      1079296 bytes_per_cycle=0.244925/s bytes_per_second=350.368M/s items_per_second=1.61047M/s __llvm_libc::memset,memset Google M
BM_Memset/5/0        605 ns          349 ns      1644544 bytes_per_cycle=0.241364/s bytes_per_second=345.274M/s items_per_second=2.8614M/s __llvm_libc::memset,memset Google Q
BM_Memset/6/0        472 ns          271 ns      2310144 bytes_per_cycle=0.238615/s bytes_per_second=341.341M/s items_per_second=3.68799M/s __llvm_libc::memset,memset Google S
BM_Memset/7/0        262 ns          143 ns      3956736 bytes_per_cycle=0.225812/s bytes_per_second=323.026M/s items_per_second=7.0087M/s __llvm_libc::memset,memset Google U
BM_Memset/8/0        454 ns          261 ns      2940928 bytes_per_cycle=0.238883/s bytes_per_second=341.725M/s items_per_second=3.82716M/s __llvm_libc::memset,memset Google W
BM_Memset/9/0       8768 ns         5998 ns       115712 bytes_per_cycle=0.249196/s bytes_per_second=356.478M/s items_per_second=166.724k/s __llvm_libc::memset,uniform 384 to 4096
```

After
```
BM_Memset/0/0        117 ns         69.5 ns      9761792 bytes_per_cycle=0.935152/s bytes_per_second=1.30639G/s items_per_second=14.3834M/s __llvm_libc::memset,memset Google A
BM_Memset/1/0       97.8 ns         58.5 ns     13002752 bytes_per_cycle=0.892814/s bytes_per_second=1.24725G/s items_per_second=17.0848M/s __llvm_libc::memset,memset Google B
BM_Memset/2/0        326 ns          163 ns      5156864 bytes_per_cycle=1.54408/s bytes_per_second=2.15706G/s items_per_second=6.1192M/s __llvm_libc::memset,memset Google D
BM_Memset/3/0        132 ns         65.4 ns     11455488 bytes_per_cycle=0.876411/s bytes_per_second=1.22433G/s items_per_second=15.2803M/s __llvm_libc::memset,memset Google L
BM_Memset/4/0        222 ns          120 ns      6405120 bytes_per_cycle=1.44398/s bytes_per_second=2.01722G/s items_per_second=8.30758M/s __llvm_libc::memset,memset Google M
BM_Memset/5/0        119 ns         79.2 ns      8930304 bytes_per_cycle=1.13327/s bytes_per_second=1.58317G/s items_per_second=12.6189M/s __llvm_libc::memset,memset Google Q
BM_Memset/6/0        123 ns         64.0 ns     11609088 bytes_per_cycle=1.008/s bytes_per_second=1.40817G/s items_per_second=15.6365M/s __llvm_libc::memset,memset Google S
BM_Memset/7/0       85.9 ns         52.1 ns     12423168 bytes_per_cycle=0.641164/s bytes_per_second=917.192M/s items_per_second=19.1937M/s __llvm_libc::memset,memset Google U
BM_Memset/8/0        114 ns         67.1 ns     10347520 bytes_per_cycle=0.911968/s bytes_per_second=1.274G/s items_per_second=14.9015M/s __llvm_libc::memset,memset Google W
BM_Memset/9/0       1326 ns          785 ns       907264 bytes_per_cycle=1.89716/s bytes_per_second=2.6503G/s items_per_second=1.27348M/s __llvm_libc::memset,uniform 384 to 4096
```

Again not as good as current glibc but it's a first step in the right direction.
```
BM_Memset/0/0        108 ns         53.6 ns     12894208 bytes_per_cycle=1.02858/s bytes_per_second=1.4369G/s items_per_second=18.668M/s glibc::memset,memset Google A
BM_Memset/1/0       84.6 ns         47.6 ns     14284800 bytes_per_cycle=1.00197/s bytes_per_second=1.39974G/s items_per_second=21.0256M/s glibc::memset,memset Google B
BM_Memset/2/0        160 ns         85.8 ns      8927232 bytes_per_cycle=3.30805/s bytes_per_second=4.62129G/s items_per_second=11.6596M/s glibc::memset,memset Google D
BM_Memset/3/0       78.9 ns         53.6 ns     13326336 bytes_per_cycle=1.14058/s bytes_per_second=1.59338G/s items_per_second=18.674M/s glibc::memset,memset Google L
BM_Memset/4/0       99.2 ns         60.8 ns     11460608 bytes_per_cycle=2.54751/s bytes_per_second=3.55884G/s items_per_second=16.4587M/s glibc::memset,memset Google M
BM_Memset/5/0       93.0 ns         56.1 ns     12219392 bytes_per_cycle=1.73379/s bytes_per_second=2.42207G/s items_per_second=17.8157M/s glibc::memset,memset Google Q
BM_Memset/6/0       89.4 ns         47.2 ns     14692352 bytes_per_cycle=1.34846/s bytes_per_second=1.88377G/s items_per_second=21.1795M/s glibc::memset,memset Google S
BM_Memset/7/0       84.0 ns         50.0 ns     14468096 bytes_per_cycle=0.911198/s bytes_per_second=1.27293G/s items_per_second=19.994M/s glibc::memset,memset Google U
BM_Memset/8/0       93.4 ns         52.8 ns     13063168 bytes_per_cycle=1.06642/s bytes_per_second=1.48977G/s items_per_second=18.9524M/s glibc::memset,memset Google W
BM_Memset/9/0        438 ns          241 ns      2853888 bytes_per_cycle=6.1185/s bytes_per_second=8.54744G/s items_per_second=4.15064M/s glibc::memset,uniform 384 to 4096
```

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D150433




More information about the All-commits mailing list