[PATCH] MIR Serialization: Serialize immediate machine operands.

Fri Jun 19 14:38:46 PDT 2015

> On 2015-Jun-19, at 13:35, Alex Lorenz <arphaman at gmail.com> wrote:
> 
> Hi dexonsmith, bob.wilson, bogner,
> 
> This patch is based on a previous serialization patch that serializes physical register operands (http://reviews.llvm.org/D10525).
> 
> This patch serialized immediate machine operands by using integer literals.
> 
> REPOSITORY
>  rL LLVM
> 
> http://reviews.llvm.org/D10573
> 
> Files:
>  lib/CodeGen/MIRParser/MILexer.cpp
>  lib/CodeGen/MIRParser/MILexer.h
>  lib/CodeGen/MIRParser/MIParser.cpp
>  lib/CodeGen/MIRPrinter.cpp
>  test/CodeGen/MIR/X86/immediate-operands.mir
> 
> EMAIL PREFERENCES
>  http://reviews.llvm.org/settings/panel/emailpreferences/
> <D10573.28043.patch>

> Index: lib/CodeGen/MIRParser/MILexer.cpp
> ===================================================================
> --- lib/CodeGen/MIRParser/MILexer.cpp
> +++ lib/CodeGen/MIRParser/MILexer.cpp
> @@ -35,6 +35,8 @@
>  
>    char peek() const { return isEOF() ? 0 : *Ptr; }
>  
> +  char peekNext() const { return (Ptr + 1) >= End ? 0 : *(Ptr + 1); }

This looks like undefined behaviour.  You can only compare two pointers
if they're within the same allocation.  It looks like `Ptr + 1` might go
past the end of the allocation here (already undefined behaviour IIRC,
even if you don't compare it to anything).

Unless you can guarantee that it's valid to dereference `End` (and even
then, really, for clarity), rearrange the terms:

    return End - Ptr <= 1 ? 0 : Ptr[1];

(Also note the `Ptr[1]` -- IMO, brackets are easier to read than
dereference-and-parentheses.  Up to you though.)

I wonder if this can just be merged with `peek()`?

   char peek(unsigned I = 0) const { return End - Ptr <= I ? 0 : Ptr[I]; }

Then instead of calling `peekNext()`, call `peek(1)`.  I don't really
have an opinion here, it's just an idea; feel free to keep separate
functions.

> +
>    void advance() { ++Ptr; }
>  
>    StringRef remaining() const { return StringRef(Ptr, End - Ptr); }
> @@ -77,6 +79,32 @@
>    return C;
>  }
>  
> +static APSInt toInteger(StringRef Str) {
> +  assert(!Str.empty() && "Integer literal string must not be empty");
> +  unsigned NumBits = ((Str.size() * 64) / 19) + 2;
> +  APInt Tmp(NumBits, Str, /*Radix=*/10);
> +  if (Str[0] == '-') {
> +    unsigned MinBits = Tmp.getMinSignedBits();
> +    if (MinBits > 0 && MinBits < NumBits)
> +      Tmp = Tmp.trunc(MinBits);
> +    return APSInt(Tmp, /*IsUnsigned=*/false);
> +  }
> +  unsigned ActiveBits = Tmp.getActiveBits();
> +  if (ActiveBits > 0 && ActiveBits < NumBits)
> +    Tmp = Tmp.trunc(ActiveBits);
> +  return APSInt(Tmp, /*IsUnsigned=*/true);
> +}

Is this copied directly from `LLLexer`?  If so, is there some way of
sharing the logic?  (Should it go in `APSInt.h`?)

If not, what's different?

> +
> +static Cursor lexIntegerLiteral(Cursor C, MIToken &Token) {
> +  auto Range = C;
> +  C.advance();
> +  while (isdigit(C.peek()))
> +    C.advance();
> +  StringRef StrVal = Range.upto(C);
> +  Token = MIToken(MIToken::IntegerLiteral, StrVal, toInteger(StrVal));
> +  return C;
> +}
> +
>  static MIToken::TokenKind symbolToken(char C) {
>    switch (C) {
>    case ',':