PLEASE REVIEW: modularize: new preprocessor conditional directive checking feature

Sean Silva silvas at purdue.edu
Mon Jun 17 16:10:37 PDT 2013


On Mon, Jun 17, 2013 at 1:15 PM, Thompson, John <
John_Thompson at playstation.sony.com> wrote:

> Sean,
>
> Thanks again for your help.
>
> Sorry, I overlooked revising the function that was doing some old C-style
> string handling.  I’ve addressed that now (in the new
> ModularizePPDirective::print function).
>
> The only other item I didn’t address was the macro substitutions function,
> which was intentional as I described.  I’ve looked into this further, but
> I’d still like to punt on it for now.  Because there isn’t such a function
> already, it appears the only clean way to do it would be to instantiate a
> new Lexer and Preprocessor object, linking the new preprocessor to the old
> so it can find the macros (there seems to be a pointer for this).  It would
> also require adding at least a new Lexer constructor, so it can set up the
> buffer pointers and correct state.  The constructor for lexing pragmas is
> close, but it sets the lexer in raw mode, which I don’t want.  There’s also
> the problem that it will call the MacroExpanded handler again, and I also
> worry there might be other side-effects due to the preprocessor and lexer
> states being changed inadvertently.  Since this is an external tool, I’m
> kind of thinking that it would be simpler to just use the function I put in
> for now, and I will look into it some more when I try to figure out how to
> do the function-style macro expansion, possibly using the existing code in
> the Preprocessor class.
>

Please ask on cfe-dev about what you are trying to achieve and how best to
achieve it. While this specific functionality (a function that does this)
may not be available, clang may have facilities for producing the end
result in a simpler way. Since clang already gives detailed information
about macro expansion for many diagnostics, it seems likely that your goals
here can be achieved with the current code.


> Regarding breaking the patch up, it might be easier to consider the
> changes in three parts:
>
> 1. The new classes.
>
> 2. The changes to Modularize.cpp.
>
> 3. The changes to the tests.
>
> I’ll provide these in separate patch files, though the associated changes
> will need to go in at the same time.
>
In the LLVM community "breaking the patch up" is understood to mean
breaking into a series of patches that can be applied in series and that
compile and run (and pass all tests) after each patch is applied. Try to
keep changes that need to be applied together together in a single patch
file (this includes tests); this is basically always achievable (the
exception is typically changes that affect multiple projects (e.g. changes
to LLVM that require changes to clang), but that isn't an issue in this
situation).


> The new classes (mod_2013_06_17_new_patch.txt).  I should give some
> description of the new classes added and how they work together:
>
> 1. ModularizePPCallbacks – Derives from the Clang PPCallbacks
> class to track preprocessor actions, such as changing files and handling
> preprocessor directives and macro expansions.  It has to figure out when a
> new header file is entered and left, as the provided handler is not
> particularly clear about it.  It also stores a map of macro expansions
> obtained from the MacroExpands callback, for use by a function that
> effectively preprocesses a conditional.  It handles the top-level aspects
> of collecting header file instance information, and tracking the
> preprocessor conditional directives.
>
> 2. ModularizePPDirective – Stores information about one
> preprocessor directive instance, presently limited to #if, #elif, #ifdef,
> and #ifndef, since that is all modularize needs for now.  It stores the
> source file line number, a directive kind code, and both the unpreprocessed
> and preprocessed conditional source code snippet.
>
> 3. ModularizeHeaderFile – Stores a header file name and a vector
> of ModularizePPDirective instances collected for that header file.
>
> 4. ModularizeHeaderInstance – Stores a pointer to a
> ModularizeHeaderFile for a header, and a vector of header file names for
> the headers from the modularize header list that reference the particular
> header, either directly or indirectly via some nested include.  When
> modularize encounters another instance of the header while processing its
> header list, and the preprocessed directive conditionals stored in the
> ModularizePPDirective vector are the same as those of an existing
> ModularizeHeaderFile object, the top-level header name is added to the
> existing instance, effectively reusing the ModularizeHeaderFile object.  If
> a header is seen for the first time, or if the preprocessed conditionals for
> the stored directives don’t match those of an instance of the header seen
> before, a new ModularizeHeaderInstance object is created and saved.
>
> 5. ModularizeHeaderTracker – Tracks the instances of one
> particular header.  It stores the header name and a vector of
> ModularizeHeaderInstance objects.  If all instances of a header seen have
> the same conditionals after preprocessing, there will only be one
> ModularizeHeaderInstance.  If one or more conditionals differ,
> there will be two or more instance objects saved.
>
> 6. ModularizeMasterHeaderTracker – Stores a map of all the
> ModularizeHeaderTracker objects, and provides an “addHeaderFile” function
> for handling a header file, and a “report” function for outputting the
> warnings about the preprocessor conditional directive mismatches.
>

Please put these descriptions in doxygen comments for their respective
classes.
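
For example, here is a minimal sketch of what that could look like for the
header tracker. The class body is a hypothetical stand-in built only on the
standard library (conditionals flattened to a single string), not the code
from the patch; only the comment style is the point:

```cpp
#include <cstddef>
#include <string>
#include <vector>

/// \brief Tracks the instances of one particular header.
///
/// Stores the header name and one entry per distinct set of preprocessed
/// conditionals seen for that header. If every inclusion of the header
/// produced the same conditionals, there is exactly one entry; two or more
/// entries mean the header's contents depend on how it was included.
class HeaderTracker {
public:
  explicit HeaderTracker(const std::string &Name) : Name(Name) {}

  /// \brief Record one inclusion of the header.
  ///
  /// \param Conditionals the preprocessed conditional text for this
  /// inclusion (a hypothetical flattened representation).
  /// \returns true if this created a new instance, false if it matched an
  /// instance that was already recorded.
  bool addInstance(const std::string &Conditionals) {
    for (std::size_t I = 0, E = Instances.size(); I != E; ++I)
      if (Instances[I] == Conditionals)
        return false;
    Instances.push_back(Conditionals);
    return true;
  }

  /// \brief Get the name of the tracked header.
  const std::string &getName() const { return Name; }

  /// \brief Get the number of distinct instances recorded so far.
  std::size_t getInstanceCount() const { return Instances.size(); }

private:
  std::string Name;
  std::vector<std::string> Instances;
};
```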


> The changes to Modularize.cpp (mod_2013_06_17_modularize_patch.txt):
>
> 1. Add an option for disabling the preprocessing consistency
> checking.  This is a fallback, in case of problems with the mechanism, or
> to reduce warning volume.
>
> 2. Set up a ModularizePPCallbacks object for tracking the
> preprocessor.  This is done in the CollectEntitiesConsumer object.
>
> 3. Set up a ModularizeMasterHeaderTracker for storing the
> header instance data.  This is done in the CollectEntitiesConsumer object.
>
> 4. Call the ModularizeMasterHeaderTracker::report function to
> report any warnings about the preprocessor conditional directive mismatches.
>
> 5. Fix some naming convention issues.
>

At the very least number 5 can be a separate patch (feel free to commit it
directly).


> The changes to the tests (mod_2013_06_17_test_patch.txt):
>
> 1. Some new lines in a couple of files for the new feature.
>
> I’m also including a zip with the changed files.
>
For future reference, including a zip like this is not necessary (nor, I
assume, particularly desirable for most people's workflow).


> I’m hoping I can check this in soon, as it makes me nervous to sit on so
> much, and makes it harder to continue experimenting.  Since this is still
> an experimental tool, I’m hoping we can improve it in incremental steps.
>
At the very least, the stylistic issues (variable naming, line endings,
tabs, formatting, etc.) will need to be cleared up before committing.


> You mentioned you have other suggestions too.  Please do feel free to send
> them.
>
> One thing I’m aware of is that the collections are probably leaking the
> objects’ memory.  I can fix that if necessary.
>

Yes, please fix that; memory leaks are not acceptable.

Some other things:

* Please rename things called `ModularizeFoo` to be just `Foo` if they are
already in the `Modularize` namespace.
* Please name your variables in accordance with LLVM naming conventions and
put `*` and `&` on the right.
* Remove hard tabs.
* Use Unix (LF) line endings throughout (I see many CRLF) and make sure to
have <=80 columns per line.

+bool ModularizeHeaderTracker::report() {
+  int headerInstanceIndex;
+  int headerInstanceCount = getHeaderInstanceCount();
+  ModularizeHeaderInstance *headerInstance;
+  ModularizeHeaderInstance *headerInstance0;
+  ModularizeHeaderFile *headerFile;
+  ModularizeHeaderFile *headerFile0;
+  int directiveIndex;
+  ModularizePPDirective *directive;
+  ModularizePPDirective *directive0;
+  bool mismatch;

Please move these declarations to their point of first use. Also clarify
what the `0` suffix on some of these names means (possibly by renaming to a
more useful name).

+  if (getHeaderInstanceCount() > 1) {
+    int directiveCount = 0x7fffffff;  // Big number.
+    ModularizeHeaderInstanceVectorIterator hiIter = getHeaderInstancesBegin();

Reverse the condition of this `if` and use an early exit to simplify and
reduce indentation:
<http://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code>.
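
As a rough illustration, here is a stand-alone sketch with hypothetical names,
using standard containers in place of the patch's types. Inverting the
condition lets the interesting code sit at the outer indentation level, and
each variable is declared where it is first used:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: each instance is the list of preprocessed
// conditionals recorded for one inclusion of a header.
typedef std::vector<std::string> Instance;

// Returns true if any instance differs from the first one.
bool hasMismatch(const std::vector<Instance> &Instances) {
  // Early exit: with fewer than two instances there is nothing to compare.
  if (Instances.size() < 2)
    return false;

  const Instance &First = Instances[0];
  for (std::size_t I = 1, E = Instances.size(); I != E; ++I) {
    if (Instances[I] == First)
      continue; // Matches the first instance; keep looking.
    return true;
  }
  return false;
}
```

The names here also follow the LLVM convention of capitalized variable names
with `*` and `&` attached to the name rather than the type.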

+      if (mismatch) {
+        errs() << "warning: The instances of header " << Name << " have different contents after preprocessing:\n";

Same for this `if`.

+// Represents a preprocessor directive kind.
+enum ModularizePPDirectiveKind {
+  MPPD_If,
+  MPPD_ElIf,
+  MPPD_IfDef,
+  MPPD_IfNDef
+};

Surely clang already has an enum that can serve this purpose?

+// Get directive spelling.
+const char *ModularizePPDirective::getDirectiveSpelling(
+    ModularizePPDirectiveKind kind) {
+  const char *directiveString;
+  switch (kind) {
+    case MPPD_If:
+      directiveString = "if";

Surely clang already has something that can do this?

+  ModularizeHeaderFile *mhf = new ModularizeHeaderFile(topFile.str(), PP);
+  addHeaderFile(mhf);
+  CurrentHeaderFile = mhf;
+  RootHeaderFile = mhf;

Who owns this? When is it freed?

+  mhf = getHeaderFile(*fileName);
+  if (mhf == NULL) {
+    mhf = new ModularizeHeaderFile(*fileName, PP);

Same here.
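
One way to make the ownership explicit is sketched below with hypothetical
names and C++11's std::unique_ptr. This assumes C++11 is available to you; if
it is not, the owning container would instead need to delete its elements in
its destructor. The idea is that exactly one container owns the objects and
everything else holds non-owning pointers:

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the patch's header file class.
struct HeaderFile {
  explicit HeaderFile(const std::string &Name) : Name(Name) {}
  std::string Name;
};

// The map owns every HeaderFile. Callers only ever see non-owning pointers,
// and everything is freed when the HeaderFileMap is destroyed.
class HeaderFileMap {
public:
  // Returns the existing entry for Name, creating it on first use.
  HeaderFile *getOrCreate(const std::string &Name) {
    std::unique_ptr<HeaderFile> &Slot = Files[Name];
    if (!Slot)
      Slot.reset(new HeaderFile(Name));
    return Slot.get();
  }

  std::size_t size() const { return Files.size(); }

private:
  std::map<std::string, std::unique_ptr<HeaderFile> > Files;
};
```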

+// Retrieve source snippet from file image.
+std::string ModularizePPCallbacks::getSourceSnippet(SourceRange sourceRange) {

Surely clang has this functionality somewhere already?

+  std::string returnValue(bPtr, length);
+
+  return returnValue;

You can directly `return std::string(...)`.

+  // Trim snippet.
+  while ((*bPtr <= ' ') && (length != 0)) {
+    bPtr++;
+    length--;
+  }
+
+  while ((length != 0) && (bPtr[length - 1] <= ' '))
+    length--;

StringRef::trim?
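
That is, something along the lines of `StringRef(bPtr, length).trim()`. To
show the intended behavior without pulling in LLVM headers, here is a
stand-alone sketch of the same idea on std::string. The function name and the
whitespace set are my assumptions, and the set is slightly narrower than the
`<= ' '` test in the patch:

```cpp
#include <string>

// Stand-alone equivalent of trimming with a fixed whitespace set, written
// against std::string so it compiles without LLVM headers.
std::string trimSnippet(const std::string &Snippet) {
  const char *Whitespace = " \t\n\v\f\r";
  std::string::size_type Begin = Snippet.find_first_not_of(Whitespace);
  if (Begin == std::string::npos)
    return std::string(); // Empty or all whitespace.
  std::string::size_type End = Snippet.find_last_not_of(Whitespace);
  return Snippet.substr(Begin, End - Begin + 1);
}
```

Besides being shorter, this avoids a hazard in the first hand-written loop,
which reads `*bPtr` before checking that `length` is nonzero.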

+  using namespace clang;
+  using namespace llvm;

No `using namespace` in headers. (Second sentence of
<http://llvm.org/docs/CodingStandards.html#do-not-use-using-namespace-std>.)

+    return (*iter).second;

Use operator-> (e.g. iter->second), here and in other places.
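
A small stand-alone sketch of the pattern, using a hypothetical lookup helper
over a std::map rather than the patch's own map type:

```cpp
#include <map>
#include <string>

// Hypothetical lookup helper: returns the mapped value, or Default if the
// key is absent.
int lookup(const std::map<std::string, int> &Counts, const std::string &Key,
           int Default) {
  std::map<std::string, int>::const_iterator Iter = Counts.find(Key);
  if (Iter == Counts.end())
    return Default;
  return Iter->second; // Rather than (*Iter).second.
}
```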

+#ifndef MODULARIZEHEADERTRACKER_H
+#define MODULARIZEHEADERTRACKER_H

In LLVM we typically put underscores between "words" in header guard
macros. So e.g. this should be MODULARIZE_HEADER_TRACKER_H.

-- Sean Silva