[PATCH] D83423: [MC, NVPTX] Add MCAsmPrinter support for unsigned-only data directives.

Tue Jul 14 12:41:44 PDT 2020

ABataev added inline comments.

================
Comment at: llvm/lib/MC/MCExpr.cpp:69
+    else if (MAI && !MAI->supportsSignedData())
+      OS << static_cast<uint64_t>(Value);
     else
----------------
ABataev wrote:
> hfinkel wrote:
> > hfinkel wrote:
> > > tra wrote:
> > > > tra wrote:
> > > > > hfinkel wrote:
> > > > > > ABataev wrote:
> > > > > > > tra wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > tra wrote:
> > > > > > > > > > hfinkel wrote:
> > > > > > > > > > > tra wrote:
> > > > > > > > > > > > hfinkel wrote:
> > > > > > > > > > > > > Will uint64_t always be correct here? Shouldn't this depend on SizeInBytes (like the hex printing does)?
> > > > > > > > > > > > MCConstantExpr::getValue() returns int64_t so casting it to uint64_t should be safe.
> > > > > > > > > > > > 
> > > > > > > > > > > > I guess I can find the matching unsigned type using std::make_unsigned. E.g:
> > > > > > > > > > > > ```
> > > > > > > > > > > >     using unsigned_t = typename std::make_unsigned<decltype(Value)>::type;
> > > > > > > > > > > >     OS <<  static_cast<unsigned_t>(v);
> > > > > > > > > > > > ```
> > > > > > > > > > > I'm not worried about the cast being unsafe, at the C++ level, I'm worried about it printing a number larger than the relevant directive actually accepts. In your test, the directive is .b64, which presumably takes a 64-bit integer, so everything's fine. What if it were .b8 and the printed argument were 18446744073709551613 (or whatever)?
> > > > > > > > > > > 
> > > > > > > > > > Got it. I've masked out the unwanted bits. 
> > > > > > > > > Does cuda gdb correctly handle the bitfields with this fix? Can you read/write values to/from bitfield in the debugger?
> > > > > > > > Sort of. It seems to handle field bits within a byte and knows correct bit field length, but struggles with the field which crosses the byte boundary.
> > > > > > > > 
> > > > > > > > E.g:
> > > > > > > > ```
> > > > > > > >     struct s {
> > > > > > > >       unsigned char a : 3;
> > > > > > > >       unsigned char b : 6;
> > > > > > > >     } __attribute__((__packed__)) b;
> > > > > > > > 
> > > > > > > > (cuda-gdb) set var b.a=7   # This sets the field correctly.
> > > > > > > > (cuda-gdb) p b
> > > > > > > > $7 = {
> > > > > > > >   a = 7 '\a',
> > > > > > > >   b = 0 '\000'
> > > > > > > > }
> > > > > > > > (cuda-gdb) x/2bx &b
> > > > > > > > 0x7fffca800100: 0x07    0x00
> > > > > > > > (cuda-gdb) set var b.a=8
> > > > > > > > warning: Value does not fit in 3 bits. 
> > > > > > > > 
> > > > > > > > (cuda-gdb) set var b.b=0x3f   # This one loses the top bit.
> > > > > > > > (cuda-gdb) p b
> > > > > > > > $8 = {
> > > > > > > >   a = 0 '\000',
> > > > > > > >   b = 31 '\037'   # Only 5 bits are set.
> > > > > > > > }
> > > > > > > > (cuda-gdb) x/2bx &b
> > > > > > > > 0x7fffca800100: 0xf8    0x00  # Should've been 0xf8 0x01
> > > > > > > > 
> > > > > > > > 
> > > > > > > > ```
> > > > > > > Hm, did you try to emit it as hexadecimal?
> > > > > > > Hm, did you try to emit it as hexadecimal?
> > > > > > 
> > > > > > Good point. Maybe we want the option to just force hex printing instead of this new case? Then we can just reuse the existing logic above for that.
> > > > > Makes no difference. I've run ptxas on .ptx with the constant represented as decimal or as hex. ptxas accepted both and produced bit-for-bit identical binaries.
> > > > I'm concerned that the extra `0x` and always-on zero-padding will noticeably increase the size of PTX with the debug info as DWARF output seems to be predominantly `.b8`. Large builds like tensorflow already struggle with their binary size (we already run into ELF reloc overflows in debug builds unless we limit the number of GPUs we target) and this will contribute to the issue.
> > > In that case, I recommend that we go with Alexey's suggestion. Remove this logic and make the condition above `if (PrintInHex || (MAI && !MAI->supportsSignedData()))` (or something like that).
> > > I'm concerned that the extra 0x and always-on zero-padding will noticeably increase the size of PTX with the debug info as DWARF output seems to be predominantly .b8. Large builds like tensorflow already struggle with their binary size (we already run into ELF reloc overflows in debug builds unless we limit the number of GPUs we target) and this will contribute to the issue.
> > 
> > Ah, okay. I can understand that.
> > 
> > Can you make a test case where this affects the printing of a .b8 directive?
> > 
> Maybe just print the value as hexadecimal if it is negative?
Also, I can try to check if another my patch for bitfields works correctly.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83423/new/

https://reviews.llvm.org/D83423