[libc-commits] [PATCH] D158774: [libc] Add GPU support for `printf` and `fprintf`

Michael Jones via Phabricator via libc-commits libc-commits at lists.llvm.org
Wed Sep 6 16:09:35 PDT 2023


michaelrj added inline comments.


================
Comment at: libc/src/stdio/gpu/parser.h:131
+  while (format[cur_pos] != '\0' && internal::isdigit(format[cur_pos]))
+    ++cur_pos;
+  if (format[cur_pos] == '.') {
----------------
jhuber6 wrote:
> michaelrj wrote:
> > jhuber6 wrote:
> > > michaelrj wrote:
> > > > jhuber6 wrote:
> > > > > michaelrj wrote:
> > > > > > to handle the `*`s you could check if `format[cur_pos]=='*'` after this loop and if true set `unfinished` and return a specifier with the int.
> > > > > > 
> > > > > > If you make `unfinished` an enum with three states - `finished`, `width`, and `precision` - then you can set it to `width` here and `precision` after precision. This lets you skip all the loops before the point you're resuming to, which also makes it more accurate by avoiding consuming flags in invalid positions (e.g. the current parser would accept `%-*-d` since it loops through the flags again after each `*`).
> > > > > I was messing around with something like that, but a lot of complicated control flow really bloats the register count. The first version of this used like 22 and we're already up to 26 so we're well into the bounds of non-trivial. I'm somewhat apprehensive to increase it further, but I can try to see what it costs since I'm not happy with the big `FIXME` up there.
> > > > is register usage proportional to the number of variables or to the number of branches? I don't know a lot about how GPU code works.
> > > I replied to the wrong comment.
> > > 
> > > Also, would the above description handle a case like `%*.**`? I think we'd skip past the parsing and then just get the next `*`.
> > the way I'm describing would, I believe, parse `%*.**` as follows:
> > `%` - start the format specifier
> > `*` - width is a variable (need to send an int)
> > `.` - precision exists
> > `*` - precision is a variable (need to send an int)
> > `*` - format name is `*`. 
> > 
> > The parser treats the first character that doesn't fit into another category as the format name. Since this should be past the width and precision members the only category left that isn't the format name is length modifier, and `*` is not valid as a length modifier. Since `*` isn't valid as a format name this whole format specifier gets marked as raw and is printed literally, though the two variables will be consumed. Needless to say this is very undefined and I wouldn't recommend actually doing it.
> > 
> > As a sidenote, I'd be fascinated to know what NVidia's printf does for `printf("%*.** %d %d\n",1,2,3,4);`. I've gotten a different answer from every libc I've tried it on.
> I'll sate your curiousity, Nvidia's `printf` on CUDA 12.2 returns the following: `%1* 0 2`. My `glibc` version 2.38 prints `%1.2* 3 4`. I have no clue what `**` is even supposed to do since it doesn't seem to be ignored directly.
`**` should be treated like a `*` followed by any other invalid format name, equivalent to `*?`. Evidently not all libc implementations handle it cleanly though, I tried it with MacOS's libc and it returns `  %d 4`. I have no idea how CUDA manages to find a 0 in there. From previous experience I believe that glibc prints a format specifier with all of the arguments included when it fails to parse, meaning it will deduplicate flags and convert integer arguments to strings. It's actually pretty cool, if a bit overkill for such an unusual situation. Our libc returns `%*.** 3 4` which, is in my opinion, reasonable.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158774/new/

https://reviews.llvm.org/D158774



More information about the libc-commits mailing list