[llvm] [llvm] Improve implementation of StringRef::find_last_of and cie (PR #71865)
Alexandre Ganea via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 16 16:06:56 PST 2023
================
@@ -268,17 +268,47 @@ StringRef::size_type StringRef::find_first_not_of(StringRef Chars,
return npos;
}
+// See https://graphics.stanford.edu/~seander/bithacks.html#ValueInWord
+static inline uint64_t haszero(uint64_t v) {
+ return ((v)-0x0101010101010101UL) & ~(v) & 0x8080808080808080UL;
+}
+static inline uint64_t hasvalue(uint64_t x, char n) {
+ return haszero((x) ^ (~0UL / 255 * (n)));
+}
+
+/// This is a hot spot for some clangd operations, enough to be eligible to
+/// a vectorized implementation.
+static StringRef::size_type
+vectorized_find_last_of_specialized(const char *Data, size_t Sz, char C0,
+ char C1) {
+ do {
+ Sz = Sz < 8 ? 0 : Sz - 8;
+ uint64_t Buffer = 0;
+ std::memcpy((void *)&Buffer, (void *)(Data + Sz), sizeof(Buffer));
----------------
aganea wrote:
This suffers from the same issue as the previous SSE2 intrinsics code: if `strlen(Data) < 7`, this will access invalid, out-of-bounds characters. For example if `Data` is `"HI!\0"`, this will load `"HI!\0XXXX"`, thus accessing the potentially invalid "X" memory slots. ASAN will certainly report this.
One fix is to guard the call site below, at L304, with `if (Chars.size() == 2 && Length > 7)`. Note also that we can't assume the string is null-terminated.
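To make the concern concrete, here is a minimal sketch of one way to keep every 8-byte load in bounds. It is not the code from the PR: the names `kNpos`, `scalar_find_last_of2`, `has_zero_byte`, `has_byte` and `guarded_find_last_of2` are hypothetical, and the scalar tail loop is an assumption about how short inputs could be handled. It relies only on the explicit length and never on a null terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for StringRef::npos.
static const size_t kNpos = static_cast<size_t>(-1);

// Scalar fallback: reads only Data[0..Sz), so it is safe for any length.
static size_t scalar_find_last_of2(const char *Data, size_t Sz, char C0,
                                   char C1) {
  while (Sz != 0) {
    --Sz;
    if (Data[Sz] == C0 || Data[Sz] == C1)
      return Sz;
  }
  return kNpos;
}

// Nonzero iff some byte of V is zero (same bit trick as in the diff, written
// with ULL suffixes so the constants are 64-bit on every platform).
static uint64_t has_zero_byte(uint64_t V) {
  return (V - 0x0101010101010101ULL) & ~V & 0x8080808080808080ULL;
}

// Nonzero iff some byte of V equals C. The cast avoids sign extension.
static uint64_t has_byte(uint64_t V, char C) {
  return has_zero_byte(V ^
                       (0x0101010101010101ULL * static_cast<unsigned char>(C)));
}

static size_t guarded_find_last_of2(const char *Data, size_t Sz, char C0,
                                    char C1) {
  assert(Data != nullptr || Sz == 0);
  // Walk backwards in 8-byte blocks only while a full block is in bounds.
  while (Sz >= 8) {
    uint64_t Buffer;
    std::memcpy(&Buffer, Data + Sz - 8, sizeof(Buffer));
    if (has_byte(Buffer, C0) | has_byte(Buffer, C1)) {
      // A match lies in this block; locate it with a scalar scan of 8 bytes.
      size_t Off = scalar_find_last_of2(Data + Sz - 8, 8, C0, C1);
      if (Off != kNpos)
        return (Sz - 8) + Off;
    }
    Sz -= 8;
  }
  // Fewer than 8 bytes remain: finish without any wide load.
  return scalar_find_last_of2(Data, Sz, C0, C1);
}
```

With this shape, an input such as `"HI!"` with length 3 never reaches the `memcpy`, so nothing past the third byte is read. Whether the scalar tail is acceptable for the hot path is a separate question that would need measuring.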
https://github.com/llvm/llvm-project/pull/71865