[PATCH] D46163: [DO NOT SUBMIT] Do strlen() while computing xxhash
George Rimar via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Apr 29 18:20:59 PDT 2018
grimar added inline comments.
================
Comment at: llvm/lib/Support/xxhash.cpp:148
+ const uint64_t *R = (const uint64_t *)Q;
+ if (uint64_t X = (R[0] - 0x0101010101010101) & ~R[0] & 0x8080808080808080) {
+ Remaining = (__builtin_ffsl(X) >> 3) - 1;
----------------
ruiu wrote:
> grimar wrote:
> > btw, if you pass P=="ab\0" to this method,
> >
> > then `R[0]` will read the rest 5 bytes of memory from out of bounds, no?
> That's intentional.
It would be wrong to keep it in `lib/Support` with that probably I think.
We can probably assume that it is valid to read out of buffer for the particular LLD use case,
but it does not seem fine to do in general.
================
Comment at: llvm/lib/Support/xxhash.cpp:149
+ if (uint64_t X = (R[0] - 0x0101010101010101) & ~R[0] & 0x8080808080808080) {
+ Remaining = (__builtin_ffsl(X) >> 3) - 1;
+ break;
----------------
ruiu wrote:
> grimar wrote:
> > I tried to build this patch with MSVS, but it complains about the absence of this function.
> >
> > I had to use the following instead:
> >
> > ```
> > int psnip_builtin_ffsl(long v) {
> > unsigned long r;
> > if (_BitScanForward(&r, (unsigned long)v)) {
> > return (int)(r + 1);
> > }
> > return 0;
> > }
> > ```
> > (took it from https://github.com/nemequ/portable-snippets/tree/master/builtin).
> >
> > But it does not work correctly it seems. I am testing on `lld-speed-test\lld-speed-test\scylla`,
> > here at some point `Remaining` becomes (-1). And then lines below (185, 186)
> >
> > ```
> > const char *End = Q + Remaining;
> > size_t Len = End - P;
> > ```
> >
> > handle it wrong and code ends up with
> > an infinite loop in `splitStrings1` because `strlen_xxHash64` returns -1 always.
> >
> >
> I'm not very familiar with these intrinsics of MSVC, but I believe you forgot to use the 64 bit version. Use this.
>
> static inline int my_ffsl(uint64_t v) {
> uint64r_t R;
> _BitScanForward64(&R, V);
> return R;
> }
>
> Note that this should be compiled to a single BSF instruction on x86-64.
Yes, thanks. The final code I used was:
```
static inline int psnip_builtin_ffsl(uint64_t v) {
unsigned long R;
_BitScanForward64(&R, v);
return R + 1;
}
```
And I was able to profile the patch finally. It was faster for me than the original code but
results I observe with D45571 are a bit better. I'll post the numbers to the D45571 thread for
completeness.
>> Note that this should be compiled to a single BSF instruction on x86-64.
Yes, it was:
```
...
Remaining = (psnip_builtin_ffsl(X) >> 3) - 1;
00007FF6F219C604 bsf rcx,r8
00007FF6F219C608 inc ecx
00007FF6F219C60A sar ecx,3
00007FF6F219C60D dec ecx
}
```
https://reviews.llvm.org/D46163
More information about the llvm-commits
mailing list