PLEASE REVIEW: modularize: new preprocessor conditional directive checking feature

Sean Silva silvas at purdue.edu
Wed Jun 19 20:10:00 PDT 2013


+// Do macro substitutions in a string.
+// FIXME: Note that this is very simple parser, such that it doesn't
+// support things like function-style macros, macros with string
+// or character literals, and defined().
+// FIXME: There probably should be a function in the Preprocessor
+// class that will lex a string, do substitutions, and return either
+// a token list or a converted string.  However, it's somewhat
+// problematic in that it could interfere with the preprocessor
+// and lexor state, and call the MacroExpanded handlers again.
+// So for now, we stick with this simple function, to be extend
+// to support function-style macros.
+std::string PreprocessorCallbacks::getMacroSubstitutedString(std::string
str) {

Again, this function is not appropriate. Please investigate the
implementation of clang's macro-related diagnostic messages for
inspiration, or consult with cfe-dev with a more refined question than your
last question (see below; I think I may have identified what the key
question is); the appropriate solution will almost surely require
significant changes to your patch. Clang gives detailed notes in
diagnostics if the offending code was expanded from a macro, and traces the
expansion many levels deep in some cases.


+  // Return true if this directive matches another.
+  // (It compares the converted conditionals.)
+  bool match(PreprocessorDirective *other);

This needs a more useful name. Also, it's unacceptable that this is based
on a string comparison; walk clang's macro expansion history at each
location and use that to do the comparison.

If I understand your code correctly, what you want to do is basically:

map<FileAndLine, PreprocessorCondition> M;
for each PreprocessorCondition P at FileAndLine FL across the given headers:
  if (M contains an entry with key FL)
    if (M[FL] != P)
      diagnoseError(M[FL], P);
  M[FL] = P;

Where you have an appropriate equivalence relation defined for
PreprocessorCondition which takes into account the entire macro expansion
at that location (this is probably what you want to ask about on cfe-dev,
but do some research on it first by investigating clang's macro-related
diagnostics). This approach is fully general and can easily be expanded to
work on arbitrary macro expansions anywhere in the file, as long as an
appropriate equivalence relation is available. In reality, you probably
want to keep a vector of distinct PreprocessorCondition's (there may be
more than just 2 different expansions), and so the real code would look
more like:

map<FileAndLine, vector<PreprocessorCondition>> M;
for (each PreprocessorCondition P at FileAndLine FL across the given
headers)
  vector<PreprocessorCondition> &V = M[FL];
  if (find(V.begin(), V.end(), C) == V.end())
    V.push_back(C);
for (auto &E : M)
  if (E->second.size() > 1)
    diagnose(E->second);

The first `for` loop probably won't take exactly that form in the final
code, as its body will likely execute inside of a PPCallbacks callback.



Although almost none of your existing code will end up in an updated patch
with the algorithm I suggested above, you might find the following comments
helpful as you dive into writing the new code (these are just various
things I saw while trying to grok your code).

+  HeaderInstanceVectorIterator hiIter = getHeaderInstancesBegin();
+  HeaderInstanceVectorIterator hiEnd = getHeaderInstancesEnd();

Please also address the naming convention issues *in the patch* as well!

+HeaderFile *PreprocessorCallbacks::getHeaderFile(const std::string& name) {

And the trivial formatting issues! Please, save yourself (and me!) the
trouble and just use clang-format.

+  HeaderInstance *headerInstance = new HeaderInstance(
+  headerFile, topHeader);

+      delete headerFile;  // No longer needed.

+  HeaderInstance *newInstance = new HeaderInstance(
+  headerFile, topHeader);

+void HeaderTracker::deleteHeaderInstances() {
+  HeaderInstanceVectorIterator hiIter = getHeaderInstancesBegin();
+  HeaderInstanceVectorIterator hiEnd = getHeaderInstancesEnd();
+  for (; hiIter != hiEnd; ++hiIter)
+    delete (*hiIter);
+  HeaderInstances.clear();
+}

It's still not clear who owns these and what their lifetime semantics are.

+#ifndef HEADER_TRACKER_H
+#define HEADER_TRACKER_H

Please add a MODULARIZE_ prefix here and to any other header guards that
may be missing it.

+/// This class tracks the instances of one particular header.  It stores
the
+/// header name and a vector of HeaderInstanceís.  If all instances

I'm seeing a non-ASCII character in HeaderInstance<<HERE>>s. Please clean
up any other non-ASCII characters as well.

+std::string getIndent(int level) {
+  char buffer[256];
+  int index = 0;
+
+  while (level > 0) {
+    int size = 4;
+    while (size-- > 0)
+      buffer[index++] = ' ';
+    level--;
+  }
+
+  buffer[index] = '\0';
+
+  return std::string(buffer);
+}
+
+}

Just do `std::string(level * 4, ' ')`.

+bool PreprocessorDirective::match(PreprocessorDirective *other) {
+  if (ConvertedConditional == other->getConvertedConditional()) {
+    return true;
+  }
+  return false;
+}
+

Just do `return ConvertedConditional == other->getConvertedConditional();`.

+  }
+  else
+    CurrentHeaderFile = mhf;

Please use our bracing style. I recommend widespread use of clang-format.

+  // Preprocessor directive accessors.
+  PreprocessorDirectiveVector &getDirectives() { return Directives; }
+  int getDirectiveCount() { return Directives.size(); }
+  PreprocessorDirectiveVectorIterator getDirectivesBegin()
+    { return Directives.begin(); }
+  PreprocessorDirectiveVectorIterator getDirectivesEnd()
+    { return Directives.end(); }
+  PreprocessorDirective &getDirective(int index) { return
Directives[index]; }
+  void addDirective(PreprocessorDirective &directive)
+    { Directives.push_back(directive); };

These seem to be pointless boilerplate. Just access `Directives` directly.

+  // Source location string accessors.
+  unsigned getLineNumber() { return LineNumber; }
+  void setLineNumber(unsigned value) { LineNumber = value; };
+
+  // Unconverted conditional accessors.
+  std::string &getUnconvertedConditional()
+    { return UnconvertedConditional; }
+  void setUnconvertedConditional(std::string value)
+    { UnconvertedConditional = value; };
+
+  // Converted conditional accessors.
+  std::string &getConvertedConditional() { return ConvertedConditional; }
+  void setConvertedConditional(std::string value)
+    { ConvertedConditional = value; };

These also seem like pointless boilerplate. Just make this a lightweight
value struct and define it near the class that owns instances of it.

+}  // end namespace clang

This comment is wrong, and so are others. Please fix all your "end
namespace" comments.

+// Collections for users of this typs.
+typedef llvm::StringMap<HeaderFile *> HeaderFileMap;
+typedef HeaderFileMap::iterator HeaderFileMapIterator;

These typedefs don't seem to be buying you much, especially the iterator
one. There are similar typedefs in other places. You can probably just
remove them with little loss of readability.

+  int offset = source.find(':', 2);
+  if (offset < 0)

std::string::find returns an unsigned type (usually size_t). On a platform
with 32-bit int and 64-bit size_t, does the conversion of
size_t(0xFFFFFFFFFFFFFFFFULL) to int result in -1? It's hard to tell
whether this code is right or wrong; please rectify that.

+      returnValue = false;
+      break;
+    }
+  }
+  return returnValue;

just `return false;` instead of `break`ing.

+  if (Directives.size() == 0)

Use .empty()

+  return Directives[Directives.size() - 1].getLineNumber();

Directives.back().getLineNumber()

+// Get directive spelling.
+const char *PreprocessorDirective::getDirectiveSpelling(
+    clang::tok::PPKeywordKind kind) {
+  const char *directiveString;
+  switch (kind) {
+    case clang::tok::pp_if:
+      directiveString = "if";
+      break;
+    case clang::tok::pp_elif:
+      directiveString = "elif";
+      break;
+    case clang::tok::pp_ifdef:
+      directiveString = "ifdef";
+      break;
+    case clang::tok::pp_ifndef:
+      directiveString = "ifndef";
+      break;
+    default:
+      directiveString = "(unknown)";
+      break;
+  }
+  return directiveString;
+}

Use early returns here. I.e. `case clang::tok::pp_elif: return "elif";`

+  HeaderInstanceVectorIterator hiIter = getHeaderInstancesBegin();
+  HeaderInstanceVectorIterator hiEnd = getHeaderInstancesEnd();
+  int directiveCount = 0x7fffffff;  // Big number.
+  // Get the number of instances which is least.
+  for (; hiIter != hiEnd; ++hiIter) {
+    HeaderInstance *headerInstance = (*hiIter);
+    HeaderFile *headerFile = headerInstance->getHeaderFile();
+    if (headerFile->getDirectiveCount() < directiveCount)
+      directiveCount = headerFile->getDirectiveCount();
+  }

This is needlessly verbose. The following seems a lot more idiomatic and
clear to me:

typedef std::vector<HeaderInstance *>::iterator Iter;
size_t MinDirectives = std::numeric_limits<size_t>::max();
for (Iter I = HeaderInstances.begin(), E = HeaderInstances.end(); I != E;
++I)
  MinDirectives = std::min((*I)->getHeaderFile->getDirectiveCount(),
                            MinDirectives);

And it's still not clear who owns all these pointers.

+// Report instances where header snapshots differ.
+// Returns true if no errors, i.e. no differing snapshots.
+bool HeaderTracker::report() {

You really need to clarify what the "algorithm" being done here is.

+    for (int headerInstanceIndex = 1;
+        headerInstanceIndex < headerInstanceCount;
+        headerInstanceIndex++) {

Why does this loop start at `1`? Also, use shorter loop variable names (I,
E, J, JE, etc.)

+    // We had one or more mismatches.  Output diagnostic and exit.
+    errs() << "warning: The instances of header "
+      << Name << " have different contents after preprocessing:\n";
+    for (int headerInstanceIndex = 0;
+        headerInstanceIndex < headerInstanceCount;
+        headerInstanceIndex++) {
+      HeaderInstance *headerInstance =
+        getHeaderInstance(headerInstanceIndex);
+      HeaderFile *headerFile = headerInstance->getHeaderFile();
+      errs() << getIndent(1)
+        << "When included or nested in these top-level headers:\n";
+      TopHeaderVectorIterator thIter =
+        headerInstance->getTopHeadersBegin();
+      TopHeaderVectorIterator thEnd = headerInstance->getTopHeadersEnd();
+      for (; thIter != thEnd; ++thIter) {
+        errs() << getIndent(2) << *thIter << "\n";
+      }
+      errs() << getIndent(1)
+        << "This preprocessor directive is mismatched (the parentheses"
+          " show the condition after macro substitution):\n";
+      PreprocessorDirective *directive =
+        &headerFile->getDirective(directiveIndex);
+      directive->dump(2, Name.c_str());

Factor out this code which produces the diagnostic into a method.

+    break;  // Break out now that we've found directives that don't match,
+            // as the directives list might not match anymore.
+  }
+  return false;   // There was a mismatch.

Use an early return here. Also, it looks like this function returns `false`
regardless of whether a mismatch was found.

-- Sean Silva
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130619/99e1acef/attachment.html>


More information about the cfe-commits mailing list