[PATCH] D129224: [InstCombine] Fold strtoul and strtoull and avoid PR #56293

Martin Sebor via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 12 11:53:23 PDT 2022


msebor added inline comments.


================
Comment at: llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp:117
+      if (Str[0] == '0') {
+        if (toUpper((unsigned char)Str[1]) == 'X') {
+          Base = 16;
----------------
efriedma wrote:
> msebor wrote:
> > efriedma wrote:
> > > The standard requires that a "0x" prefix be followed by a hexadecimal digit.  Similarly, the standard requires that a "0" prefix be followed by an octal digit.  If there isn't an appropriate digit after, the subject sequence (the part that's actually parsed as a number) is just a decimal "0".  And the rest is treated as "unrecognized characters".
> > > 
> > > The standard allows a "0x" prefix if the base is explicitly specified to be 16.
> > Whoops, yes, we need to consume the hex prefix even in base 16.  Thanks for pointing out this mistake!
> > 
> > I'm aware of what the C and POSIX standards require but I'm not sure I understand the first part of your comment, but just for clarity: 1) a lone zero is valid in any base (i.e., it doesn't need to be followed by other digits), and 2) the function fails if the whole subject sequence isn't consumed (i.e., has "unrecognized characters").
> I think strictly interpreting the standard, if you have something like "0xg", or "09", they should also parse to "0".  But if we're rejecting them anyway, I guess that's fine... might want to add a comment somewhere to clarify why we're rejecting strings with unrecognized trailing characters.
That's right.  The rationale for rejecting valid sequences with trailing characters is already in one of the new tests but I can repeat it in the function.

(In practice it would probably be safe to accept sequences terminated by most unrecognizable characters except a comma or a space but I'm not sure it's worth it since it's possible to create a wacky locale with a thousands separator set to pretty much any nonnumeric character.)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129224/new/

https://reviews.llvm.org/D129224



More information about the llvm-commits mailing list