[compiler-rt] [libc] [lld] [lldb] [clang-tools-extra] [llvm] [flang] [libcxx] [clang] Fix clang to recognize new C23 modifiers %w and %wf when printing (PR #71771)

Aaron Ballman via cfe-commits cfe-commits at lists.llvm.org
Thu Nov 30 05:57:23 PST 2023


================
@@ -286,7 +286,33 @@ clang::analyze_format_string::ParseLengthModifier(FormatSpecifier &FS,
       lmKind = LengthModifier::AsInt3264;
       break;
     case 'w':
-      lmKind = LengthModifier::AsWide; ++I; break;
+      ++I;
+      if (I == E) return false;
+      if (*I == 'f') {
+        lmKind = LengthModifier::AsWideFast;
+        ++I;
+      } else {
+        lmKind = LengthModifier::AsWide;
+      }
+
+      if (I == E) return false;
+      int s = 0;
+      while (unsigned(*I - '0') <= 9) {
+        s = 10 * s + unsigned(*I - '0');
+        ++I;
+      }
+
+      // s == 0 is MSVCRT case, like l but only for c, C, s, S, or Z on windows
+      // s != 0 for b, d, i, o, u, x, or X when a size followed(like 8, 16, 32 or 64)
+      if (s != 0) {
+        std::set<int> supported_list {8, 16, 32, 64};
----------------
AaronBallman wrote:

Thanks for the feedback! I did some research of my own to see what the landscape looks like and here's what I found.

1) Support for these types came in https://github.com/llvm/llvm-project/commit/55c9877b664c1bc6614ad588f376ef41fe6ab4ca and there was no discussion on the patch (https://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20091109/023777.html) before it landed with post-commit review saying it looked good (https://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20091116/024219.html) and no justification beyond wanting a generalized algorithm.

2) There is hardware which supports 48-bit ints (https://en.wikipedia.org/wiki/48-bit_computing), 36-bit ints (https://en.wikipedia.org/wiki/36-bit_computing), 24-bit ints (https://en.wikipedia.org/wiki/24-bit_computing), etc. A lot of it is older hardware, but as examples, x86-64 uses a 48-bit addressing scheme, OpenCL supports intrinsics for 24-bit multiplication, UniSys has a 36-bit system they were selling as of 2015...

3) There's not evidence people are using stdint.h to serve those needs. For example: https://sourcegraph.com/search?q=context:global+int48_t+-file:.*clang.*+-file:.*stdint.*+-file:.*int48_t.*&patternType=standard&sm=1&groupBy=repo shows that most uses are either a custom C++ data type, use of an underlying non-basic int type, or bit-fields. I don't think I spotted a single use of `int48_t` that came from stdint.h. https://sourcegraph.com/search?q=context:global+int24_t+-file:.*clang.*+-file:.*stdint.*+-file:.*int24_t.*&patternType=standard&sm=1&groupBy=repo shows similarly for `int24_t`.

4) The code which defines the underlying macros used by stdint.h is here: https://github.com/llvm/llvm-project/blob/29a0f3ec2b47630ce229953fe7250e741b6c10b6/clang/lib/Frontend/InitPreprocessor.cpp#L220 and it uses `TargetInfo::getTypeWidth()` to determine the width generically. We have no targets (https://github.com/llvm/llvm-project/tree/main/clang/lib/Basic/Targets) which define bit widths other than 8, 16, 32, 64, or 128. So Clang itself doesn't support those odd types yet.

Based on this and your details above, I think we should ignore these odd types for format strings on the assumption they're just not used. As for removing them from `stdint.h`, I think that is reasonable to explore given that we have no targets defining those widths (and so we should never be defining those macros to begin with); the algorithm remains general, we just drop the extra cruft from stdint.h. I don't think this should cause any issues.

CC @jrtc27 @jyknight for some extra opinions.

https://github.com/llvm/llvm-project/pull/71771


More information about the cfe-commits mailing list