[libc-commits] [PATCH] D158774: [libc] Add GPU support for `printf` and `fprintf`

Michael Jones via Phabricator via libc-commits libc-commits at lists.llvm.org
Wed Sep 6 16:49:31 PDT 2023


michaelrj added inline comments.


================
Comment at: libc/src/stdio/gpu/parser.h:136-140
+    if (format[cur_pos] == '*') {
+      specifier.raw_value = static_cast<uintptr_t>(args.next_var<uint32_t>());
+      size_pos = SizeArgument::width;
+      return specifier;
+    }
----------------
this check needs to go before the `isdigit` loop.


================
Comment at: libc/src/stdio/gpu/parser.h:131
+  while (format[cur_pos] != '\0' && internal::isdigit(format[cur_pos]))
+    ++cur_pos;
+  if (format[cur_pos] == '.') {
----------------
jhuber6 wrote:
> michaelrj wrote:
> > jhuber6 wrote:
> > > michaelrj wrote:
> > > > jhuber6 wrote:
> > > > > michaelrj wrote:
> > > > > > jhuber6 wrote:
> > > > > > > michaelrj wrote:
> > > > > > > > jhuber6 wrote:
> > > > > > > > > michaelrj wrote:
> > > > > > > > > > to handle the `*`s you could check if `format[cur_pos]=='*'` after this loop and if true set `unfinished` and return a specifier with the int.
> > > > > > > > > > 
> > > > > > > > > > If you make `unfinished` an enum with three states - `finished`, `width`, and `precision` - then you can set it to `width` here and `precision` after precision. This lets you skip all the loops before the point you're resuming to, which also makes it more accurate by avoiding consuming flags in invalid positions (e.g. the current parser would accept `%-*-d` since it loops through the flags again after each `*`).
> > > > > > > > > I was messing around with something like that, but a lot of complicated control flow really bloats the register count. The first version of this used like 22 and we're already up to 26 so we're well into the bounds of non-trivial. I'm somewhat apprehensive to increase it further, but I can try to see what it costs since I'm not happy with the big `FIXME` up there.
> > > > > > > > is register usage proportional to the number of variables or to the number of branches? I don't know a lot about how GPU code works.
> > > > > > > I replied to the wrong comment.
> > > > > > > 
> > > > > > > Also, would the above description handle a case like `%*.**`? I think we'd skip past the parsing and then just get the next `*`.
> > > > > > the way I'm describing would, I believe, parse `%*.**` as follows:
> > > > > > `%` - start the format specifier
> > > > > > `*` - width is a variable (need to send an int)
> > > > > > `.` - precision exists
> > > > > > `*` - precision is a variable (need to send an int)
> > > > > > `*` - format name is `*`. 
> > > > > > 
> > > > > > The parser treats the first character that doesn't fit into another category as the format name. Since this should be past the width and precision members the only category left that isn't the format name is length modifier, and `*` is not valid as a length modifier. Since `*` isn't valid as a format name this whole format specifier gets marked as raw and is printed literally, though the two variables will be consumed. Needless to say this is very undefined and I wouldn't recommend actually doing it.
> > > > > > 
> > > > > > As a sidenote, I'd be fascinated to know what NVidia's printf does for `printf("%*.** %d %d\n",1,2,3,4);`. I've gotten a different answer from every libc I've tried it on.
> > > > > I'll sate your curiousity, Nvidia's `printf` on CUDA 12.2 returns the following: `%1* 0 2`. My `glibc` version 2.38 prints `%1.2* 3 4`. I have no clue what `**` is even supposed to do since it doesn't seem to be ignored directly.
> > > > `**` should be treated like a `*` followed by any other invalid format name, equivalent to `*?`. Evidently not all libc implementations handle it cleanly though, I tried it with MacOS's libc and it returns `  %d 4`. I have no idea how CUDA manages to find a 0 in there. From previous experience I believe that glibc prints a format specifier with all of the arguments included when it fails to parse, meaning it will deduplicate flags and convert integer arguments to strings. It's actually pretty cool, if a bit overkill for such an unusual situation. Our libc returns `%*.** 3 4` which, is in my opinion, reasonable.
> > > The Nvidia implementation of `printf` is a complete mystery to me. It's proprietary so you can't see how it's implemented, but process of elimination suggests that it's a ring buffer that's flushed by the host runtime at predetermined points. They must do some form of parsing on the GPU similar to here, because it handles non-constant strings and divergent values correctly as far as I'm aware. Pretty much the way it works is if your program has an undefined reference to something called `printf` it will automatically link in a magic implementation that lives somewhere. What's confusing to me is the resource usage. I'm pretty sure that this `printf` support is some opaque binary blob that isn't taken into account when checking things like register usage. This is because I attempted to compare register usage with my implementation and theirs. This implementation uses something like 64 (which used to be the entire register budget back in the sm_20 days, but we have a lot more now) but checking the Nvidia implementation it states it's something like 23 register, which just so happens to be the exact same results you'd get if you called an empty external function with the same signature. That's about all I've been able to glean from it, it's basically magic. I think they've improved it a lot since I last checked, they used to ignore a lot of flags.
> > That's very strange. We know they have to parse the format string on-device or else the %s conversion wouldn't work. Maybe they have some proprietary hardware trickery.
> > 
> > I'm okay with this parser having some strange behavior on aggressively undefined inputs, though I would recommend adding tests for the parser specifically.
> With this most recent patch I get `%*.** 3 4` as the output for your provided most-vexing printf parse.
excellent! The printf parser side looks good to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158774/new/

https://reviews.llvm.org/D158774



More information about the libc-commits mailing list