[libc-commits] [libc] [libc][NFC] Use 16-byte indices for _mmXXX_shuffle_epi8 (PR #77781)

Guillaume Chatelet via libc-commits libc-commits at lists.llvm.org
Thu Jan 11 07:14:01 PST 2024


https://github.com/gchatelet created https://github.com/llvm/llvm-project/pull/77781

This is less confusing, since the shuffle only uses the 4 lower bits of each index byte, i.e. a position within that byte's own 16-byte lane.
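
The claim above can be checked without AVX-512 hardware. The following is a minimal scalar sketch, not code from the patch: it models the per-lane byte shuffle (low 4 bits of each index select a byte within the same 16-byte lane; a set top bit zeroes the output byte) and confirms that the old 0..63 indices and the new 0..15 indices select the same bytes. The names shuffle_epi8_model, old_indices, new_indices and check_indices_equivalent are hypothetical helpers introduced for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes64 = std::array<std::uint8_t, 64>;

// Scalar model of a 64-byte shuffle that works independently on each
// 16-byte lane: only the 4 lower bits of an index are used to pick a byte
// in the same lane, and a set top bit (0x80) zeroes the output byte.
Bytes64 shuffle_epi8_model(const Bytes64 &src, const Bytes64 &idx) {
  Bytes64 out{};
  for (std::size_t i = 0; i < 64; ++i) {
    const std::size_t lane_base = i & ~std::size_t{15};
    out[i] = (idx[i] & 0x80) ? 0 : src[lane_base + (idx[i] & 0x0F)];
  }
  return out;
}

// Indices before the patch, in memory order (byte 0 first): each 8-byte
// group holds its own global byte positions in reverse, so values run up
// to 63.
Bytes64 old_indices() {
  Bytes64 idx{};
  for (std::size_t i = 0; i < 64; ++i) {
    const std::size_t half_base = i & ~std::size_t{7};
    idx[i] = static_cast<std::uint8_t>(half_base + (7 - (i & 7)));
  }
  return idx;
}

// Indices after the patch: the same pattern expressed relative to the
// 16-byte lane, so values stay within 0..15.
Bytes64 new_indices() {
  Bytes64 idx{};
  for (std::size_t i = 0; i < 64; ++i)
    idx[i] = static_cast<std::uint8_t>((i & 8) + (7 - (i & 7)));
  return idx;
}

// Asserts that both index sets produce the same shuffled bytes.
void check_indices_equivalent() {
  Bytes64 src{};
  for (std::size_t i = 0; i < 64; ++i)
    src[i] = static_cast<std::uint8_t>(i * 3 + 1);
  assert(shuffle_epi8_model(src, old_indices()) ==
         shuffle_epi8_model(src, new_indices()));
}
```

Under this model the two index sets differ only in bits 4 and 5, which the shuffle ignores, so the change is consistent with the NFC tag.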


From 1c5950bba7c7477a079aae8abbd267b64fd576b0 Mon Sep 17 00:00:00 2001
From: Guillaume Chatelet <gchatelet at google.com>
Date: Thu, 11 Jan 2024 15:13:39 +0000
Subject: [PATCH] [libc][NFC] Use 16-byte indices for _mmXXX_shuffle_epi8

This is less confusing since the implementation only cares about the 4 lower bits.
---
 libc/src/string/memory_utils/op_x86.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/libc/src/string/memory_utils/op_x86.h b/libc/src/string/memory_utils/op_x86.h
index a6529a6d424a30..6ae9583627bd6d 100644
--- a/libc/src/string/memory_utils/op_x86.h
+++ b/libc/src/string/memory_utils/op_x86.h
@@ -263,13 +263,13 @@ LIBC_INLINE uint64_t big_endian_cmp_mask(__m512i max, __m512i value) {
   // 16-byte lane.
   // zmm = | 16 bytes  | 16 bytes  | 16 bytes  | 16 bytes  |
   // zmm = | <8> | <8> | <8> | <8> | <8> | <8> | <8> | <8> |
-  const __m512i indices = _mm512_set_epi8(56, 57, 58, 59, 60, 61, 62, 63, //
-                                          48, 49, 50, 51, 52, 53, 54, 55, //
-                                          40, 41, 42, 43, 44, 45, 46, 47, //
-                                          32, 33, 34, 35, 36, 37, 38, 39, //
-                                          24, 25, 26, 27, 28, 29, 30, 31, //
-                                          16, 17, 18, 19, 20, 21, 22, 23, //
-                                          8, 9, 10, 11, 12, 13, 14, 15,   //
+  const __m512i indices = _mm512_set_epi8(8, 9, 10, 11, 12, 13, 14, 15, //
+                                          0, 1, 2, 3, 4, 5, 6, 7,       //
+                                          8, 9, 10, 11, 12, 13, 14, 15, //
+                                          0, 1, 2, 3, 4, 5, 6, 7,       //
+                                          8, 9, 10, 11, 12, 13, 14, 15, //
+                                          0, 1, 2, 3, 4, 5, 6, 7,       //
+                                          8, 9, 10, 11, 12, 13, 14, 15, //
                                           0, 1, 2, 3, 4, 5, 6, 7);
   // Then we compute the mask for equal bytes. In this mask the bits of each
   // byte are already reversed but the byte themselves should be reversed, this


