[PATCH] D48271: [llvm-readobj] Fix printing format

Adrian McCarthy via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 11 08:41:20 PDT 2018


amccarth added inline comments.


================
Comment at: tools/llvm-readobj/ELFDumper.cpp:3280
+  for (char C : S)
+    W << (isprint(C) ? C : '.');
+}
----------------
paulsemel wrote:
> amccarth wrote:
> > The behavior of isprint is undefined if the value is not representable as an unsigned char.  Since char on Windows is signed (by default), this means that some bytes could be negative, which is not representable and an unsigned char.
> > 
> > I think the safest thing to do is to cast `C` to `unsigned char`.
> Hmm I'm perplex.. Doesn't `isprint` take an `int` as parameter ? I mean, if it takes an int in parameter, I don't understand why it could lead to undefined behavior if we give it a negative number...
> Can you elaborate on this one, I'm really curious to learn where is the problem :)
Yes, `isprint` takes an `int` so that it can distinguish between a character and EOF, but EOF is the only valid negative value to pass to it.

"The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF."  https://en.cppreference.com/w/cpp/string/byte/isprint

Let's say you have the character value 193 (0xC1).  On Windows, `char` is signed and thus a `char` variable representing that character value is effectively -63.  When passed to `isprint`, it's promoted to an `int`, you'll end up with -63 instead of 193 which puts you into the realm of undefined behavior.

If you first cast to an `unsigned char`, the 193 is preserved when cast to an `int`, and everything should be hunky dory.

Note that EOF is a negative value, usually (always?) -1, so it would alias with an actual character value if we were allowed to use negative character values.

Now, in practice, this may not manifest as a detectable bug because (1) the `isprint` implementation might do something reasonable and/or (2) if `<locale>` has been included, then you get a convenience function templated on the actual character type that does the cast for you (ยง 22.3.3.1).  Then again, this call is `isprint` rather than `std::isprint`, so whether it resolves to the convenience function depends on whether `<locale>` dumps all its symbols into the global namespace as well as into `std`.


Repository:
  rL LLVM

https://reviews.llvm.org/D48271





More information about the llvm-commits mailing list