[PATCH] D46163: [DO NOT SUBMIT] Do strlen() while computing xxhash

Sun Apr 29 18:20:59 PDT 2018

grimar added inline comments.

================
Comment at: llvm/lib/Support/xxhash.cpp:148
+    const uint64_t *R = (const uint64_t *)Q;
+    if (uint64_t X = (R[0] - 0x0101010101010101) & ~R[0] & 0x8080808080808080) {
+      Remaining = (__builtin_ffsl(X) >> 3) - 1;
----------------
ruiu wrote:
> grimar wrote:
> > btw, if you pass P=="ab\0" to this method,
> > 
> > then `R[0]` will read the rest 5 bytes of memory from out of bounds, no?
> That's intentional.
It would be wrong to keep it in `lib/Support` with that probably I think. 
We can probably assume that it is valid to read out of buffer for the particular LLD use case,
but it does not seem fine to do in general.

================
Comment at: llvm/lib/Support/xxhash.cpp:149
+    if (uint64_t X = (R[0] - 0x0101010101010101) & ~R[0] & 0x8080808080808080) {
+      Remaining = (__builtin_ffsl(X) >> 3) - 1;
+      break;
----------------
ruiu wrote:
> grimar wrote:
> > I tried to build this patch with MSVS, but it complains about the absence of this function.
> > 
> > I had to use the following instead:
> > 
> > ```
> > int psnip_builtin_ffsl(long v) {
> >   unsigned long r;
> >   if (_BitScanForward(&r, (unsigned long)v)) {
> >     return (int)(r + 1);
> >   }
> >   return 0;
> > }
> > ```
> > (took it from https://github.com/nemequ/portable-snippets/tree/master/builtin).
> > 
> > But it does not work correctly it seems. I am testing on `lld-speed-test\lld-speed-test\scylla`,
> > here at some point `Remaining` becomes (-1). And then lines below (185, 186) 
> > 
> > ```
> >   const char *End = Q + Remaining;
> >   size_t Len = End - P;
> > ```
> > 
> > handle it wrong and code ends up with
> > an infinite loop in `splitStrings1` because `strlen_xxHash64` returns -1 always.
> > 
> > 
> I'm not very familiar with these intrinsics of MSVC, but I believe you forgot to use the 64 bit version. Use this.
> 
>   static inline int my_ffsl(uint64_t v) {
>     uint64r_t R;
>     _BitScanForward64(&R, V);
>     return R;
>   }
> 
> Note that this should be compiled to a single BSF instruction on x86-64.
Yes, thanks. The final code I used was:

```
static inline int psnip_builtin_ffsl(uint64_t v) {
  unsigned long R;
  _BitScanForward64(&R, v);
  return R + 1;
}
```

And I was able to profile the patch finally. It was faster for me than the original code but
results I observe with D45571 are a bit better. I'll post the numbers to the D45571 thread for
completeness.

>> Note that this should be compiled to a single BSF instruction on x86-64.
Yes, it was:
```
...
      Remaining = (psnip_builtin_ffsl(X) >> 3) - 1;
00007FF6F219C604  bsf         rcx,r8  
00007FF6F219C608  inc         ecx  
00007FF6F219C60A  sar         ecx,3  
00007FF6F219C60D  dec         ecx  
  }
```

https://reviews.llvm.org/D46163