[libc-commits] [PATCH] D112176: [libc] fix strtol returning the wrong length

Michael Jones via Phabricator via libc-commits libc-commits at lists.llvm.org
Thu Oct 21 16:24:00 PDT 2021

michaelrj marked an inline comment as done.
michaelrj added inline comments.

Comment at: libc/src/__support/str_conv_utils.h:83
+    seen_digit = true;
+  }
sivachandra wrote:
> michaelrj wrote:
> > sivachandra wrote:
> > > Do we need this `if` block at all?
> > It's to handle the case where the number is just "0" and the base is set to automatic. Without this, that would be parsed as an octal number with no digits, as opposed to a decimal number with one digit. I've added a test to check this.
> > The other part of the condition was unnecessary though.
> Maybe `infer_base` is incorrect then? Consider this:
> ```
> const char *n = "08";
> char *next;
> long i = ::strtol(n, &next, 0);
> ```
> Since "08" is an invalid number, should the above `strtol` call succeed or fail? If it is a failure, should `next` point to 8 or 0? Should it be the same if the base was explicitly specified to be 8?
> I also think we can ask the same questions when base is 16.
> My reading of the standard says that "08" is an invalid number so no conversion should be performed. Which means, `str_end` should point to `original_src`. Likewise, "0xZ" with base 16 is an invalid number so no conversion should be performed and `str_end` should point to `original_src`.
`infer_base` was incorrect, and I have now fixed it so that it doesn't skip over the leading octal 0, and removed this condition.
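
To illustrate the idea of not skipping the leading octal 0, here is a minimal sketch of automatic base inference. It is not the code from this patch: `sketch_infer_base` and its `prefix_len` parameter are hypothetical names, and the rule that "0x" only counts as a prefix when a hexadecimal digit follows is my reading of the behaviour described in this review.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not the LLVM libc implementation.
   Picks the base for automatic (base 0) parsing and reports how many
   prefix characters the caller should skip before reading digits. */
static int sketch_infer_base(const char *s, size_t *prefix_len) {
  assert(s != NULL && prefix_len != NULL);
  *prefix_len = 0;
  /* "0x" is treated as a prefix only when a hex digit follows it. */
  if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X') &&
      isxdigit((unsigned char)s[2])) {
    *prefix_len = 2;
    return 16;
  }
  /* A leading 0 selects octal but is NOT skipped: it is a digit itself,
     so "0", "08" and "0xZ" all still contain one valid octal digit. */
  if (s[0] == '0')
    return 8;
  return 10;
}
```

With this shape the digit loop sees the 0 as an ordinary octal digit, so no special case is needed for a bare "0".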

From what I can tell, in the case of both "08" and "0xZ", when parsed with an automatic base, what we have is a solo octal 0 (and this matches what I see from other libc implementations).

The C standard, §6.4.4.1, specifies that "A decimal constant begins with a nonzero digit", meaning that those numbers cannot be assumed to be base 10, but that "An octal constant consists of the prefix 0 optionally followed by a sequence of the digits 0 through 7 only". This means that the longest valid interpretation of "08" is that it is an octal 0 followed by some value that is not a valid digit (equivalent to "0Z").

For "0xZ" it is similar: a hexadecimal number has to have a valid hexadecimal digit after the "0x", so this is just 0.
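
The interpretation above can be checked against the host libc with a small sketch like the following. `consumed_len` is a hypothetical helper introduced only for this illustration; it calls the standard `strtol` and reports how far `str_end` moved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: runs the host strtol and returns the number of
   characters it consumed, storing the converted value in *value. */
static size_t consumed_len(const char *s, int base, long *value) {
  char *end = NULL;
  assert(s != NULL && value != NULL);
  *value = strtol(s, &end, base);
  return (size_t)(end - s);
}
```

On an implementation that follows the reading above, `consumed_len("08", 0, &v)` and `consumed_len("0xZ", 0, &v)` should both return 1 with `v == 0`, while `consumed_len("0x1F", 0, &v)` should return 4 with `v == 31`.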

  rG LLVM Github Monorepo


