[llvm] [Support] Import SipHash c reference implementation. (PR #94393)

Ahmed Bougacha via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 4 13:03:47 PDT 2024


https://github.com/ahmedbougacha created https://github.com/llvm/llvm-project/pull/94393

This brings the unmodified SipHash reference implementation:
  https://github.com/veorq/SipHash
which has been very graciously licensed under our llvm license (Apache-2.0 WITH LLVM-exception) by Jean-Philippe Aumasson.

SipHash is a lightweight hash function optimized for speed on short messages.  We use it as part of the AArch64 ptrauth ABI (in arm64e and ELF PAuth) to generate discriminators based on language identifiers and mangled names.

This commit brings the reference implementation (siphash.c at f26d35e) unmodified, as SipHash.cpp.

Next, we will integrate it properly into libSupport, with a wrapping API suited for the ptrauth use-case.

>From 4a89f3578935f76880fccb439a4d7e3bec3f3c37 Mon Sep 17 00:00:00 2001
From: Ahmed Bougacha <ahmed at bougacha.org>
Date: Tue, 4 Jun 2024 09:51:22 -0700
Subject: [PATCH] [Support] Import SipHash c reference implementation.

This brings the unmodified SipHash reference implementation:
  https://github.com/veorq/SipHash
which has been very graciously licensed under our llvm license
(Apache-2.0 WITH LLVM-exception) by Jean-Philippe Aumasson.

SipHash is a lightweight hash function optimized for speed on short
messages.  We use it as part of the AArch64 ptrauth ABI (in arm64e and
ELF PAuth) to generate discriminators based on language identifiers and
mangled names.

This commit brings the reference implementation (siphash.c at f26d35e)
unmodified, as SipHash.cpp.

Next, we will integrate it properly into libSupport, with a wrapping API
suited for the ptrauth use-case.
---
 llvm/lib/Support/CMakeLists.txt    |   3 +
 llvm/lib/Support/README.md.SipHash | 126 ++++++++++++++++++++
 llvm/lib/Support/SipHash.cpp       | 185 +++++++++++++++++++++++++++++
 3 files changed, 314 insertions(+)
 create mode 100644 llvm/lib/Support/README.md.SipHash
 create mode 100644 llvm/lib/Support/SipHash.cpp

diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index 5df36f811efe9..7cc01a5399911 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -127,6 +127,9 @@ endif()
 
 add_subdirectory(BLAKE3)
 
+# Temporarily ignore SipHash.cpp before we fully integrate it into LLVMSupport.
+set(LLVM_OPTIONAL_SOURCES SipHash.cpp)
+
 add_llvm_component_library(LLVMSupport
   ABIBreak.cpp
   AMDGPUMetadata.cpp
diff --git a/llvm/lib/Support/README.md.SipHash b/llvm/lib/Support/README.md.SipHash
new file mode 100644
index 0000000000000..4de3cd1854681
--- /dev/null
+++ b/llvm/lib/Support/README.md.SipHash
@@ -0,0 +1,126 @@
+# SipHash
+
+[![License:
+CC0-1.0](https://licensebuttons.net/l/zero/1.0/80x15.png)](http://creativecommons.org/publicdomain/zero/1.0/)
+
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+
+SipHash is a family of pseudorandom functions (PRFs) optimized for speed on short messages.
+This is the reference C code of SipHash: portable, simple, optimized for clarity and debugging.
+
+SipHash was designed in 2012 by [Jean-Philippe Aumasson](https://aumasson.jp)
+and [Daniel J. Bernstein](https://cr.yp.to) as a defense against [hash-flooding
+DoS attacks](https://aumasson.jp/siphash/siphashdos_29c3_slides.pdf).
+
+SipHash is:
+
+* *Simpler and faster* on short messages than previous cryptographic
+algorithms, such as MACs based on universal hashing.
+
+* *Competitive in performance* with insecure non-cryptographic algorithms, such as [fhhash](https://github.com/cbreeden/fxhash).
+
+* *Cryptographically secure*, with no sign of weakness despite multiple [cryptanalysis](https://eprint.iacr.org/2019/865) [projects](https://eprint.iacr.org/2019/865) by leading cryptographers.
+
+* *Battle-tested*, with successful integration in OSs (Linux kernel, OpenBSD,
+FreeBSD, FreeRTOS), languages (Perl, Python, Ruby, etc.), libraries (OpenSSL libcrypto,
+Sodium, etc.) and applications (Wireguard, Redis, etc.).
+
+As a secure pseudorandom function (a.k.a. keyed hash function), SipHash can also be used as a secure message authentication code (MAC).
+But SipHash is *not a hash* in the sense of general-purpose key-less hash function such as BLAKE3 or SHA-3.
+SipHash should therefore always be used with a secret key in order to be secure.
+
+
+## Variants
+
+The default SipHash is *SipHash-2-4*: it takes a 128-bit key, does 2 compression
+rounds, 4 finalization rounds, and returns a 64-bit tag.
+
+Variants can use a different number of rounds. For example, we proposed *SipHash-4-8* as a conservative version.
+
+The following versions are not described in the paper but were designed and analyzed to fulfill applications' needs:
+
+* *SipHash-128* returns a 128-bit tag instead of 64-bit. Versions with specified number of rounds are SipHash-2-4-128, SipHash4-8-128, and so on.
+
+* *HalfSipHash* works with 32-bit words instead of 64-bit, takes a 64-bit key,
+and returns 32-bit or 64-bit tags. For example, HalfSipHash-2-4-32 has 2
+compression rounds, 4 finalization rounds, and returns a 32-bit tag.
+
+
+## Security
+
+(Half)SipHash-*c*-*d* with *c* ≥ 2 and *d* ≥ 4 is expected to provide the maximum PRF
+security for any function with the same key and output size.
+
+The standard PRF security goal allow the attacker access to the output of SipHash on messages chosen adaptively by the attacker.
+
+Security is limited by the key size (128 bits for SipHash), such that
+attackers searching 2<sup>*s*</sup> keys have chance 2<sup>*s*−128</sup> of finding
+the SipHash key. 
+Security is also limited by the output size. In particular, when
+SipHash is used as a MAC, an attacker who blindly tries 2<sup>*s*</sup> tags will
+succeed with probability 2<sup>*s*-*t*</sup>, if *t* is that tag's bit size.
+
+
+## Research
+
+* [Research paper](https://www.aumasson.jp/siphash/siphash.pdf) "SipHash: a fast short-input PRF" (accepted at INDOCRYPT 2012)
+* [Slides](https://cr.yp.to/talks/2012.12.12/slides.pdf) of the presentation of SipHash at INDOCRYPT 2012 (Bernstein)
+* [Slides](https://www.aumasson.jp/siphash/siphash_slides.pdf) of the presentation of SipHash at the DIAC workshop (Aumasson)
+
+
+## Usage
+
+Running
+
+```sh
+  make
+```
+
+will build tests for 
+
+* SipHash-2-4-64
+* SipHash-2-4-128
+* HalfSipHash-2-4-32
+* HalfSipHash-2-4-64
+
+
+```C
+  ./test
+```
+
+verifies 64 test vectors, and
+
+```C
+  ./debug
+```
+
+does the same and prints intermediate values.
+
+The code can be adapted to implement SipHash-*c*-*d*, the version of SipHash
+with *c* compression rounds and *d* finalization rounds, by defining `cROUNDS`
+or `dROUNDS` when compiling.  This can be done with `-D` command line arguments
+to many compilers such as below.
+
+```sh
+gcc -Wall --std=c99 -DcROUNDS=2 -DdROUNDS=4 siphash.c halfsiphash.c test.c -o test
+```
+
+The `makefile` also takes *c* and *d* rounds values as parameters.
+
+```sh
+make cROUNDS=2 dROUNDS=4
+``` 
+
+Obviously, if the number of rounds is modified then the test vectors
+won't verify.
+
+## Intellectual property
+
+This code is copyright (c) 2014-2023 Jean-Philippe Aumasson, Daniel J.
+Bernstein. It is multi-licensed under
+
+* [CC0](./LICENCE_CC0)
+* [MIT](./LICENSE_MIT).
+* [Apache 2.0 with LLVM exceptions](./LICENSE_A2LLVM).
+
diff --git a/llvm/lib/Support/SipHash.cpp b/llvm/lib/Support/SipHash.cpp
new file mode 100644
index 0000000000000..c6d16e205521d
--- /dev/null
+++ b/llvm/lib/Support/SipHash.cpp
@@ -0,0 +1,185 @@
+/*
+   SipHash reference C implementation
+
+   Copyright (c) 2012-2022 Jean-Philippe Aumasson
+   <jeanphilippe.aumasson at gmail.com>
+   Copyright (c) 2012-2014 Daniel J. Bernstein <djb at cr.yp.to>
+
+   To the extent possible under law, the author(s) have dedicated all copyright
+   and related and neighboring rights to this software to the public domain
+   worldwide. This software is distributed without any warranty.
+
+   You should have received a copy of the CC0 Public Domain Dedication along
+   with
+   this software. If not, see
+   <http://creativecommons.org/publicdomain/zero/1.0/>.
+ */
+
+#include "siphash.h"
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/* default: SipHash-2-4 */
+#ifndef cROUNDS
+#define cROUNDS 2
+#endif
+#ifndef dROUNDS
+#define dROUNDS 4
+#endif
+
+#define ROTL(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))
+
+#define U32TO8_LE(p, v)                                                        \
+    (p)[0] = (uint8_t)((v));                                                   \
+    (p)[1] = (uint8_t)((v) >> 8);                                              \
+    (p)[2] = (uint8_t)((v) >> 16);                                             \
+    (p)[3] = (uint8_t)((v) >> 24);
+
+#define U64TO8_LE(p, v)                                                        \
+    U32TO8_LE((p), (uint32_t)((v)));                                           \
+    U32TO8_LE((p) + 4, (uint32_t)((v) >> 32));
+
+#define U8TO64_LE(p)                                                           \
+    (((uint64_t)((p)[0])) | ((uint64_t)((p)[1]) << 8) |                        \
+     ((uint64_t)((p)[2]) << 16) | ((uint64_t)((p)[3]) << 24) |                 \
+     ((uint64_t)((p)[4]) << 32) | ((uint64_t)((p)[5]) << 40) |                 \
+     ((uint64_t)((p)[6]) << 48) | ((uint64_t)((p)[7]) << 56))
+
+#define SIPROUND                                                               \
+    do {                                                                       \
+        v0 += v1;                                                              \
+        v1 = ROTL(v1, 13);                                                     \
+        v1 ^= v0;                                                              \
+        v0 = ROTL(v0, 32);                                                     \
+        v2 += v3;                                                              \
+        v3 = ROTL(v3, 16);                                                     \
+        v3 ^= v2;                                                              \
+        v0 += v3;                                                              \
+        v3 = ROTL(v3, 21);                                                     \
+        v3 ^= v0;                                                              \
+        v2 += v1;                                                              \
+        v1 = ROTL(v1, 17);                                                     \
+        v1 ^= v2;                                                              \
+        v2 = ROTL(v2, 32);                                                     \
+    } while (0)
+
+#ifdef DEBUG_SIPHASH
+#include <stdio.h>
+
+#define TRACE                                                                  \
+    do {                                                                       \
+        printf("(%3zu) v0 %016" PRIx64 "\n", inlen, v0);                       \
+        printf("(%3zu) v1 %016" PRIx64 "\n", inlen, v1);                       \
+        printf("(%3zu) v2 %016" PRIx64 "\n", inlen, v2);                       \
+        printf("(%3zu) v3 %016" PRIx64 "\n", inlen, v3);                       \
+    } while (0)
+#else
+#define TRACE
+#endif
+
+/*
+    Computes a SipHash value
+    *in: pointer to input data (read-only)
+    inlen: input data length in bytes (any size_t value)
+    *k: pointer to the key data (read-only), must be 16 bytes 
+    *out: pointer to output data (write-only), outlen bytes must be allocated
+    outlen: length of the output in bytes, must be 8 or 16
+*/
+int siphash(const void *in, const size_t inlen, const void *k, uint8_t *out,
+            const size_t outlen) {
+
+    const unsigned char *ni = (const unsigned char *)in;
+    const unsigned char *kk = (const unsigned char *)k;
+
+    assert((outlen == 8) || (outlen == 16));
+    uint64_t v0 = UINT64_C(0x736f6d6570736575);
+    uint64_t v1 = UINT64_C(0x646f72616e646f6d);
+    uint64_t v2 = UINT64_C(0x6c7967656e657261);
+    uint64_t v3 = UINT64_C(0x7465646279746573);
+    uint64_t k0 = U8TO64_LE(kk);
+    uint64_t k1 = U8TO64_LE(kk + 8);
+    uint64_t m;
+    int i;
+    const unsigned char *end = ni + inlen - (inlen % sizeof(uint64_t));
+    const int left = inlen & 7;
+    uint64_t b = ((uint64_t)inlen) << 56;
+    v3 ^= k1;
+    v2 ^= k0;
+    v1 ^= k1;
+    v0 ^= k0;
+
+    if (outlen == 16)
+        v1 ^= 0xee;
+
+    for (; ni != end; ni += 8) {
+        m = U8TO64_LE(ni);
+        v3 ^= m;
+
+        TRACE;
+        for (i = 0; i < cROUNDS; ++i)
+            SIPROUND;
+
+        v0 ^= m;
+    }
+
+    switch (left) {
+    case 7:
+        b |= ((uint64_t)ni[6]) << 48;
+        /* FALLTHRU */
+    case 6:
+        b |= ((uint64_t)ni[5]) << 40;
+        /* FALLTHRU */
+    case 5:
+        b |= ((uint64_t)ni[4]) << 32;
+        /* FALLTHRU */
+    case 4:
+        b |= ((uint64_t)ni[3]) << 24;
+        /* FALLTHRU */
+    case 3:
+        b |= ((uint64_t)ni[2]) << 16;
+        /* FALLTHRU */
+    case 2:
+        b |= ((uint64_t)ni[1]) << 8;
+        /* FALLTHRU */
+    case 1:
+        b |= ((uint64_t)ni[0]);
+        break;
+    case 0:
+        break;
+    }
+
+    v3 ^= b;
+
+    TRACE;
+    for (i = 0; i < cROUNDS; ++i)
+        SIPROUND;
+
+    v0 ^= b;
+
+    if (outlen == 16)
+        v2 ^= 0xee;
+    else
+        v2 ^= 0xff;
+
+    TRACE;
+    for (i = 0; i < dROUNDS; ++i)
+        SIPROUND;
+
+    b = v0 ^ v1 ^ v2 ^ v3;
+    U64TO8_LE(out, b);
+
+    if (outlen == 8)
+        return 0;
+
+    v1 ^= 0xdd;
+
+    TRACE;
+    for (i = 0; i < dROUNDS; ++i)
+        SIPROUND;
+
+    b = v0 ^ v1 ^ v2 ^ v3;
+    U64TO8_LE(out + 8, b);
+
+    return 0;
+}



More information about the llvm-commits mailing list