<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2015-06-17 18:06 GMT-07:00 Ahmed Bougacha <span dir="ltr"><<a href="mailto:ahmed.bougacha@gmail.com" target="_blank">ahmed.bougacha@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Wed, Jun 17, 2015 at 5:51 PM, Duncan P. N. Exon Smith<br>

<<a href="mailto:dexonsmith@apple.com">dexonsmith@apple.com</a>> wrote:<br>

><br>

>> On 2015 Jun 17, at 16:45, Ahmed Bougacha <<a href="mailto:ahmed.bougacha@gmail.com">ahmed.bougacha@gmail.com</a>> wrote:<br>

>><br>

>> REPOSITORY<br>

>>  rL LLVM<br>

>><br>

>> ================<br>

>> Comment at: lib/CodeGen/MIRPrinter.cpp:115<br>

>> @@ -86,1 +114,3 @@<br>

>> +<br>

>> +  OS << StringRef(TII->getName(MI.getOpcode())).lower();<br>

>> }<br>

>> ----------------<br>

>> Why lower-case?<br>

><br>

> I prefer lowercase because the characters are designed to be easier for<br>

> humans to read than uppercase (and I'm human).<br>

><br>

>> I'd much prefer the exact same as the C++/MI enums/names.<br>

><br>

> LLVM IR has different capitalization in textual format vs. C++ classes.  I<br>

> think it works well.<br>

<br>

</span>In general, I totally agree, lowercase is easier to read, but here<br>

we're encoding more than just the opcode in the name, and IMO<br>

capitalization helps a lot in immediately separating the name into:<br>

- what's the actual opcode (for IMUL32rri8, that's IMUL)<br>

- what size does it operate on (32), and<br>

- what operands does it take (rri8).<br>

<span class=""><br>

>> And if no one else does this lowercasing already (I don't think so), this can introduce ambiguity (it will).<br>

><br>

> I was somewhat skeptical this was a valid concern -- that any ambiguity is<br>

> a bug -- and I set out to prove it.  I was thinking we could just add some<br>

> rules to enforce it, and be done with it.<br>

><br>

> Sadly I found ST_Fp32m vs. ST_FP32m and ST_Fp64m vs. ST_FP64m in X86.<br>

> --<br>

> $ clang -x c++ -E -DGET_INSTRINFO_ENUM *GenInstrInfo.inc 2>/dev/null |<br>

> awk '$0 ~ /namespace Sched {/ {exit}' |<br>

> sed -e 's, *,,g' -e 's,=.*,,' |<br>

> grep '.' | grep -v '[{}]' |<br>

> tr '[A-Z]' '[a-z]' | sort | uniq -c | sort -n |<br>

> awk '$1 > 1 {print}'<br>

>    2 st_fp32m<br>

>    2 st_fp64m<br>

> --<br>

> (There's also ST_FpP{32,64}m, as it happens.)<br>

><br>

> Same issue in ARM:<br>

> --<br>

>    2 tstri<br>

>    2 tstrr<br>

<br>

</span>Hah, I guess one is TSTri, the other is tSTRi:  if so, that (weakly)<br>

proves my point!  I can immediately tell from the name that TSTri is a<br>

register-immediate bit test, and tSTRi is perhaps a thumb<br>

store-immediate?<br></blockquote><div><br></div><div>Thanks for pointing those issues out, I will commit this patch without changing the instruction names to lowercase when printing and with case sensitivity in place when parsing.</div><div><br></div><div>While lowercase won't work for instruction names, I still want to see physical registers and other unambiguous things in lowercase.</div><div><br></div><div>Cheers,</div><div>Alex </div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

But yes, maybe the real fix would be to avoid ambiguity ;)<br>

<span class="HOEnZb"><font color="#888888"><br>

-Ahmed<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

> --<br>

><br>

> (As it happens, AArch64 is clean.  I didn't check other targets.)<br>

><br>

> I'd weakly argue it would be worth removing the ambiguity (and enforcing its<br>

> absence), but that's definitely outside the scope of serializing MIR :(.<br>

><br>

> Case-sensitive it is!<br>

</div></div></blockquote></div><br></div></div>