[PATCH] D75306: [ms] [llvm-ml] Add initial MASM STRUCT/UNION support

Mon May 4 12:22:04 PDT 2020

thakis added a comment.

Sorry it took a while. This is a good chunk of work, thanks for doing it. Smaller changes are faster to review :)

As a general point, before sending out a change I'd suggest doing a detailed read by yourself to find things like typos in comments, notice things like needless duplication (sometimes, it's the right tradeoff), etc.

================
Comment at: llvm/include/llvm/MC/MCParser/MCAsmParser.h:173

+  virtual bool LookupFieldOffset(StringRef Base, StringRef Member,
+                                 unsigned &Offset) {
----------------
nit: LookUpFieldOffset. "Lookup" is a noun, "look up" is the verb.

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:167
+  FieldType FT;
+  union {
+    IntFieldInfo IntInfo;
----------------
epastor wrote:
> rnk wrote:
> > I think this union is creating a fair amount of code complexity. The more clang-y code pattern would be to do something like:
> > - Use a BumpPtrAllocator for memory allocation to make these all owned by the asm parser, there should be one already somewhere
> > - Use raw pointers to point between these classes, avoids need for unique_ptr or other ownership management
> > - Set up the usual class hierarchy:
> >     class FieldInitializer { FieldType FT; ... };
> >     class IntFieldInit : public FieldInitializer { ... };
> >     class RealFieldInit : public FieldInitializer { ... };
> >     class StructFieldInit : public FieldInitializer { ... };
> > 
> > Then the parseFieldInitializer variants could return a nullable pointer to indicate parsing success or failure. WDYT?
> > 
> > The drawback of this approach is that objects in BumpPtrAllocators are not destroyed, so they cannot own external resources. Most vectors can just become ArrayRefs pointing to arrays in the same allocator. The main thing I see that this would affect is the StringMap for fields and APInt for int fields. So, I guess I'm not sure it's worth a big rewrite, but it's something to consider.
> I agree it would might be worth the change - but the rewrite looks to be pretty big, and complicated due to several of the members in these objects. Let's move forward in a full review for now, and discuss the right answer as we go.
Sounds good to me, but maybe add a FIXME with rnk's suggestion.

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:116

+enum FieldType { FT_INTEGRAL, FT_REAL, FT_STRUCT };
+struct FieldInfo;
----------------
Can you add comments what each means? Since REAL is an APInt below, I guess it's also an integer type?

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:123
+  size_t Size = 0;
+  std::vector<FieldInfo> Fields;
+  StringMap<size_t> FieldsByName;
----------------
Should this be a SmallVector?

(The clang approach would probably to allocate field infos right after the struct info so that it's in one slab, but probably not worth it here.)

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:211
+    case FT_STRUCT:
+      llvm_unreachable("Cannot initialize a struct field with integral values");
+      break;
----------------
s/integral values/struct values/

...but why pass in a FieldType at all if only one value is valid? Why not just

```
FieldInitializer(SmallVector<const MCExpr *, 1> &&Values) : FT(FT_INTEGRAL) {
  new (&IntInfo) IntFieldInfo(Values);
}
```

?

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:219
+    case FT_INTEGRAL:
+      llvm_unreachable("Cannot initialize an integral field with real values");
+      break;
----------------
same

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:225
+    case FT_STRUCT:
+      llvm_unreachable("Cannot initialize a struct field with real values");
+      break;
----------------
same

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:233
+      : FT(FT) {
+    switch (FT) {
+    case FT_INTEGRAL:
----------------
same

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:357
+    if (Size % Alignment != 0) {
+      Size += Alignment - (Size % Alignment);
+    }
----------------
can you use llvm::alignTo() for this?

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:403
+
+  /// maps struct tags to struct definitions
+  StringMap<StructInfo> Structs;
----------------
sentence case

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:406
+
+  /// maps data location names to user-defined types
+  StringMap<const StructInfo *> KnownType;
----------------
same

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:3527
+
+    return false;
+  }
----------------
This is identical to the else you added in the previous function as far as I can tell. Can you change this to be less copy-pasty? (extract function, or make one of the two dispatch to the other, or what have you)

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:3540
+      Tok.is(AsmToken::LessLess) || Tok.is(AsmToken::LessGreater)) {
+    if (Field.LengthOf == 1 && Field.Type > 1)
+      return Error(Loc, "Cannot initialize scalar field with array value");
----------------
this is very nested. maybe try this structure:

  if lcurly
    return parseFieldInitializerCurly()
  if less
    return parseFieldInitializeLess
  if ...

i.e. add helper functions, with comments. If you find yourself at nesting depth 6, that's often a sign that extracting a function might be a good idea. I find "every part of a function should be at the same level of abstraction" a good rule of thumb fairly often.

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:3605
+  if (Tok.is(AsmToken::LCurly) || Tok.is(AsmToken::Less) ||
+      Tok.is(AsmToken::LessLess) || Tok.is(AsmToken::LessGreater)) {
+    if (Field.LengthOf == 1)
----------------
This looks very similar to the previous function. 
extract the common stuff into helper functions.

================
Comment at: llvm/lib/MC/MCParser/MasmParser.cpp:3669
+    Optional<AsmToken::TokenKind> EndToken;
+    if (parseOptionalToken(AsmToken::LCurly)) {
+      EndToken = AsmToken::RCurly;
----------------
same

================
Comment at: llvm/test/tools/llvm-ml/struct.test:9
+
+FOOBAR STRUCT 2
+  c BYTE 3 DUP (4)
----------------
can you use some variation in keyword and identifier case to add test coverage for this all being case-insensitive?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75306/new/

https://reviews.llvm.org/D75306