[PATCH] MIR Serialization: Serialize machine instruction names.

Thu Jun 18 14:08:05 PDT 2015

2015-06-17 18:06 GMT-07:00 Ahmed Bougacha <ahmed.bougacha at gmail.com>:

> On Wed, Jun 17, 2015 at 5:51 PM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
> >
> >> On 2015 Jun 17, at 16:45, Ahmed Bougacha <ahmed.bougacha at gmail.com>
> wrote:
> >>
> >> REPOSITORY
> >>  rL LLVM
> >>
> >> ================
> >> Comment at: lib/CodeGen/MIRPrinter.cpp:115
> >> @@ -86,1 +114,3 @@
> >> +
> >> +  OS << StringRef(TII->getName(MI.getOpcode())).lower();
> >> }
> >> ----------------
> >> Why lower-case?
> >
> > I prefer lowercase because the characters are designed to be easier for
> > humans to read than uppercase (and I'm human).
> >
> >> I'd much prefer the exact same as the C++/MI enums/names.
> >
> > LLVM IR has different capitalization in textual format vs. C++ classes.
> I
> > think it works well.
>
> In general, I totally agree, lowercase is easier to read, but here
> we're encoding more than just the opcode in the name, and IMO
> capitalization helps a lot in immediately separating the name into:
> - what's the actual opcode (for IMUL32rri8, that's IMUL)
> - what size does it operate on (32), and
> - what operands does it take (rri8).
>
> >> And if no one else does this lowercasing already (I don't think so),
> this can introduce ambiguity (it will).
> >
> > I was somewhat skeptical this was a valid concern -- that any ambiguity
> is
> > a bug -- and I set out to prove it.  I was thinking we could just add
> some
> > rules to enforce it, and be done with it.
> >
> > Sadly I found ST_Fp32m vs. ST_FP32m and ST_Fp64m vs. ST_FP64m in X86.
> > --
> > $ clang -x c++ -E -DGET_INSTRINFO_ENUM *GenInstrInfo.inc 2>/dev/null |
> > awk '$0 ~ /namespace Sched {/ {exit}' |
> > sed -e 's, *,,g' -e 's,=.*,,' |
> > grep '.' | grep -v '[{}]' |
> > tr '[A-Z]' '[a-z]' | sort | uniq -c | sort -n |
> > awk '$1 > 1 {print}'
> >    2 st_fp32m
> >    2 st_fp64m
> > --
> > (There's also ST_FpP{32,64}m, as it happens.)
> >
> > Same issue in ARM:
> > --
> >    2 tstri
> >    2 tstrr
>
> Hah, I guess one is TSTri, the other is tSTRi:  if so, that (weakly)
> proves my point!  I can immediately tell from the name that TSTri is a
> register-immediate bit test, and tSTRi is perhaps a thumb
> store-immediate?
>

Thanks for pointing those issues out, I will commit this patch without
changing the instruction names to lowercase when printing and with case
sensitivity in place when parsing.

While lowercase won't work for instruction names, I still want to see
physical registers and other unambiguous things in lowercase.

Cheers,
Alex

> But yes, maybe the real fix would be to avoid ambiguity ;)
>
> -Ahmed
>
> > --
> >
> > (As it happens, AArch64 is clean.  I didn't check other targets.)
> >
> > I'd weakly argue it would be worth removing the ambiguity (and enforcing
> its
> > absence), but that's definitely outside the scope of serializing MIR :(.
> >
> > Case-sensitive it is!
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150618/d390ba64/attachment.html>