[PATCH] MIR Serialization: Serialize machine instruction names.

Ahmed Bougacha ahmed.bougacha at gmail.com
Wed Jun 17 18:06:22 PDT 2015


On Wed, Jun 17, 2015 at 5:51 PM, Duncan P. N. Exon Smith
<dexonsmith at apple.com> wrote:
>
>> On 2015 Jun 17, at 16:45, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:
>>
>> REPOSITORY
>>  rL LLVM
>>
>> ================
>> Comment at: lib/CodeGen/MIRPrinter.cpp:115
>> @@ -86,1 +114,3 @@
>> +
>> +  OS << StringRef(TII->getName(MI.getOpcode())).lower();
>> }
>> ----------------
>> Why lower-case?
>
> I prefer lowercase because the characters are designed to be easier for
> humans to read than uppercase (and I'm human).
>
>> I'd much prefer the exact same as the C++/MI enums/names.
>
> LLVM IR has different capitalization in textual format vs. C++ classes.  I
> think it works well.

In general, I totally agree, lowercase is easier to read, but here
we're encoding more than just the opcode in the name, and IMO
capitalization helps a lot in immediately separating the name into:
- what's the actual opcode (for IMUL32rri8, that's IMUL)
- what size does it operate on (32), and
- what operands does it take (rri8).

>> And if no one else does this lowercasing already (I don't think so), this can introduce ambiguity (it will).
>
> I was somewhat skeptical this was a valid concern -- that any ambiguity is
> a bug -- and I set out to prove it.  I was thinking we could just add some
> rules to enforce it, and be done with it.
>
> Sadly I found ST_Fp32m vs. ST_FP32m and ST_Fp64m vs. ST_FP64m in X86.
> --
> $ clang -x c++ -E -DGET_INSTRINFO_ENUM *GenInstrInfo.inc 2>/dev/null |
> awk '$0 ~ /namespace Sched {/ {exit}' |
> sed -e 's, *,,g' -e 's,=.*,,' |
> grep '.' | grep -v '[{}]' |
> tr '[A-Z]' '[a-z]' | sort | uniq -c | sort -n |
> awk '$1 > 1 {print}'
>    2 st_fp32m
>    2 st_fp64m
> --
> (There's also ST_FpP{32,64}m, as it happens.)
>
> Same issue in ARM:
> --
>    2 tstri
>    2 tstrr

Hah, I guess one is TSTri, the other is tSTRi:  if so, that (weakly)
proves my point!  I can immediately tell from the name that TSTri is a
register-immediate bit test, and tSTRi is perhaps a thumb
store-immediate?

But yes, maybe the real fix would be to avoid ambiguity ;)

-Ahmed

> --
>
> (As it happens, AArch64 is clean.  I didn't check other targets.)
>
> I'd weakly argue it would be worth removing the ambiguity (and enforcing its
> absence), but that's definitely outside the scope of serializing MIR :(.
>
> Case-sensitive it is!



More information about the llvm-commits mailing list