[libc-commits] [PATCH] D82134: [libc] Add strcmp implementation.

Mon Dec 14 08:19:01 PST 2020

gchatelet added inline comments.

================
Comment at: libc/src/string/strcmp.cpp:15
+
+int LLVM_LIBC_ENTRYPOINT(strcmp)(const char *l, const char *r) {
+  while (*l) {
----------------
sivachandra wrote:
> gchatelet wrote:
> > cgyurgyik wrote:
> > > PaulkaToast wrote:
> > > > sivachandra wrote:
> > > > > Is there benefit in comparing words at a time? I know that it leads to potential OOB reads, so we can handle it separately. But if it can lead to speed up, just a leave a TODO about it and we will get to it in a later round.
> > > > I would suggest to opt for more descriptive variable names over single letter one. i.e ("left" instead of "l").
> > > What is the standard for leaving TODOs? I'm not finding anything in the llvm Coding Standards.
> > > Is there benefit in comparing words at a time? I know that it leads to potential OOB reads, so we can handle it separately. But if it can lead to speed up, just a leave a TODO about it and we will get to it in a later round.
> > 
> > Technically we have to read the buffers byte per byte as we're not supposed to read past the `\0`.
> > I'm not optimistic on the ability to accelerate this routine because -even if a page fault can only occur at page boundaries- some processors are offering [[ https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Providing%20protection%20for%20complex%20software.pdf | hardware pointer authentication ]] which can break reading buffers cache line per cache line.
> > > Is there benefit in comparing words at a time? I know that it leads to potential OOB reads, so we can handle it separately. But if it can lead to speed up, just a leave a TODO about it and we will get to it in a later round.
> > 
> > Technically we have to read the buffers byte per byte as we're not supposed to read past the `\0`.
> > I'm not optimistic on the ability to accelerate this routine because -even if a page fault can only occur at page boundaries- some processors are offering [[ https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Providing%20protection%20for%20complex%20software.pdf | hardware pointer authentication ]] which can break reading buffers cache line per cache line.
> 
> So, going by that, we cannot employ similar techniques (of comparing word lengths at a time) even for functions like `strlen`. Just to note, few other libcs take such an approach with `strlen`.
> So, going by that, we cannot employ similar techniques (of comparing word lengths at a time) even for functions like `strlen`. Just to note, few other libcs take such an approach with `strlen`.

My understanding is that reading memory past the null character is undefined behavior.

7.21.1.1 p325 of the [[ http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf | C standard ]]:
//Various methods are used for determining the lengths of the arrays, but in all cases a char * or void * argument points to the initial (lowest addressed) character of the array. If an array is accessed beyond the end of an object, the behavior is undefined.//

These optimizations work for now on a particular implementation but the standard does not make any guarantees whatsoever.

As a matter of fact [[ https://godbolt.org/z/eKEjbj | neither GCC nor Clang ]] try to vectorize this code with optimization turned on.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82134/new/

https://reviews.llvm.org/D82134