[libc-commits] [PATCH] D77949: [libc] Add SIMD strlen implementation .

Mon Apr 13 15:20:43 PDT 2020

sivachandra added a comment.

In D77949#1978924 <https://reviews.llvm.org/D77949#1978924>, @abrachet wrote:

> In D77949#1978750 <https://reviews.llvm.org/D77949#1978750>, @hctim wrote:
>
> > In D77949#1976185 <https://reviews.llvm.org/D77949#1976185>, @abrachet wrote:
> >
> > > As a high level comment, this scheme is technically safe because no mmu that I know of can deal make fault pages on non word boundaries (of course they all do at page boundaries) so readying some extra bytes past the end of the string will never cause SIGBUS or SIGSEGV. However it is still undefined behavior and sanitizers will complain. If you build with `LLVM_USE_SANITIZER=Address` I'm pretty sure that `FourBytes` and `TwoBytes` will fail. And certainly `__llvm_libc::strlen(__llvm_libc::strcpy(::malloc(6), "12345"));` will get the sanitizers to complain.
> > >
> > > I think it is worth discussing if we think it's valuable to turn off sanitizers for this function because we can conclude that it is safe to read past the string in the name of speed. I'd like to hear what the others have to say.
> >
> >
> > Sorry - I fail to see how this is safe. There are no guarantees about the alignment of the string, the OOB read could trivially cross a page boundary (and even then, I don't think we should be relying on UB).
>
>
> This implementation doesn't cross a page boundary (assuming the string doesn't of course) because of this:
>
>   // align char_ptr to multiple of sizeof(uintptr_t)
>   for (charPtr = src; ((uintptr_t)charPtr & (sizeof(uintptr_t) - 1)) != 0;
>          ++charPtr) {
>       if (*charPtr == '\0')
>         return charPtr - src;
>     }
>
>
> This doesn't die:
>
>   char *twoPages = (char *)mmap(nullptr, 0x2000, ...);
>   mprotect(twoPages + 0x1000, 0x1000, PROT_NONE);
>   std::strcpy(twoPages + 0x0FFA, "12345");
>   __llvm_libc::strlen(twoPages + 0x0FFA);
>   

I agree that this implementation does not cross the page boundary unless the string itself crosses the page boundary. But, if I am reading @hctim right, he is probably saying that reading OOB is by itself UB. Even if I understood @hctim right, another aspect to note is that this implementation would read strictly less than word-size worth of bytes OOB. I do not know if sanitizers are sensitive to this. If they are, instead of a blanket disable of sanitizers, are there any finer grain knobs which say reading (wordsize - 1) bytes OOB is OK?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77949/new/

https://reviews.llvm.org/D77949