PLEASE REVIEW: modularize: new preprocessor conditional directive checking feature

Thompson, John John_Thompson at playstation.sony.com
Wed Jun 19 17:10:34 PDT 2013


Sean,

Here's another shot at it that I believe addresses all issues, except for the following:

I didn't get a lead on a function to replace getSourceSnippet, but with your suggestion about StringRef, and Eli's pointer to getCharacterData, I rewrote it to be much simpler.

I didn't get any leads pertaining to getMacroSubstitutedString, so no changes there yet.

I addressed the naming convention changes in the old Modularize.cpp in a separate check-in.

I can't really break it up any smaller in a meaningful way.

-John

From: Sean Silva [mailto:silvas at purdue.edu]
Sent: Monday, June 17, 2013 4:11 PM
To: Thompson, John
Cc: cfe-commits at cs.uiuc.edu; John.Thompson.JTSoftware at gmail.com
Subject: Re: PLEASE REVIEW: modularize: new preprocessor conditional directive checking feature



On Mon, Jun 17, 2013 at 1:15 PM, Thompson, John <John_Thompson at playstation.sony.com> wrote:
Sean,

Thanks again for your help.

Sorry, I overlooked revising the function that was doing some old C-style string handling.  I've addressed that now (in the new ModularizePPDirective::print function).

The only other item I didn't address was the macro substitutions function, which was intentional as I described.  I've looked into this further, but I'd still like to punt on it for now.  Because there isn't such a function already, it appears the only clean way to do it would be to instantiate a new Lexer and Preprocessor object, linking the new preprocessor to the old so it can find the macros (there seems to be a pointer for this).  It would also require adding at least a new Lexer constructor, so it can set up the buffer pointers and correct state.  The constructor for lexing pragmas is close, but it sets the lexer in raw mode, which I don't want.  There's also the problem that it will call the MacroExpanded handler again, and I also worry there might be other side-effects due to the preprocessor and lexer states being changed inadvertently.  Since this is an external tool, I'm kind of thinking that it would be simpler to just use the function I put in for now, and I will look into it some more when I try to figure out how to do the function-style macro expansion, possibly using the existing code in the Preprocessor class.

Please ask on cfe-dev about what you are trying to achieve and how best to achieve it. While this specific functionality (a function that does this) may not be available, clang may have facilities for producing the end result in a simpler way. Since clang already gives detailed information about macro expansion for many diagnostics, it seems likely that your goals here can be achieved with the current code.
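For context, the simple approach John describes can be pictured outside of clang as plain text substitution: walk the conditional's source text and replace each identifier that has a recorded expansion. The standalone sketch below is not the patch's getMacroSubstitutedString. The function name substituteObjectLikeMacros and the name-to-expansion map (standing in for what the MacroExpands callback records) are hypothetical. It handles object-like macros only. It does not rescan results, expand function-style macros, or protect the operand of `defined`, and those gaps are what make reusing clang's own machinery attractive.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: replaces each identifier in a conditional's source
// text with its recorded expansion, if one exists. Object-like macros only;
// the result is not rescanned and `defined(X)` is not special-cased.
std::string substituteObjectLikeMacros(
    const std::string &Text,
    const std::map<std::string, std::string> &Expansions) {
  std::string Result;
  std::string::size_type I = 0;
  while (I < Text.size()) {
    unsigned char C = static_cast<unsigned char>(Text[I]);
    if (std::isalpha(C) || C == '_') {
      // Collect a whole identifier, then look it up.
      std::string::size_type Start = I;
      while (I < Text.size() &&
             (std::isalnum(static_cast<unsigned char>(Text[I])) ||
              Text[I] == '_'))
        ++I;
      std::string Ident = Text.substr(Start, I - Start);
      auto It = Expansions.find(Ident);
      Result += (It != Expansions.end()) ? It->second : Ident;
    } else if (std::isdigit(C)) {
      // Copy a numeric literal (including suffixes such as 10L) unchanged,
      // so the suffix is not mistaken for an identifier.
      while (I < Text.size() &&
             (std::isalnum(static_cast<unsigned char>(Text[I])) ||
              Text[I] == '_' || Text[I] == '.'))
        Result += Text[I++];
    } else {
      // Operators, parentheses and whitespace pass through as-is.
      Result += Text[I++];
    }
  }
  return Result;
}
```

For example, with FOO mapped to 3, the text `FOO > 1` becomes `3 > 1`.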


Regarding breaking the patch up, it might be easier to consider the changes in three parts:


1.       The new classes.

2.       The changes to Modularize.cpp.

3.       The changes to the tests.

I'll provide these in separate patch files, though the associated changes will need to go in at the same time.
In the LLVM community, "breaking the patch up" is understood to mean breaking it into a series of patches that can be applied in order and that compile and run (and pass all tests) after each patch is applied. Keep changes that must be applied together in a single patch file (this includes tests); this is basically always achievable. The usual exception is changes that affect multiple projects (e.g. changes to LLVM that require changes to clang), but that isn't an issue in this situation.


The new classes (mod_2013_06_17_new_patch.txt).  I should give some description of the new classes added and how they work together:


1.       ModularizePPCallbacks - Derives from the Clang PPCallbacks class to track preprocessor actions, such as changing files and handling preprocessor directives and macro expansions.  It has to figure out when a new header file is entered and left, as the provided handler is not particularly clear about it.  It also stores a map of macro expansions obtained from the MacroExpands callback, for use by a function that effectively preprocesses a conditional.  It handles the top-level aspects of collecting header file instance information, and tracking the preprocessor conditional directives.

2.       ModularizePPDirective - Stores information about one preprocessor directive instance, presently limited to #if, #elif, #ifdef, and #ifndef, since that is all modularize needs for now.  It stores the source file line number, a directive kind code, and both the unpreprocessed and preprocessed conditional source code snippet.

3.       ModularizeHeaderFile - Stores a header file name and a vector of ModularizePPDirective instances collected for that header file.

4.       ModularizeHeaderInstance - Stores a pointer to a ModularizeHeaderFile for a header, and a vector of names of the headers from the modularize header list that reference that header, either directly or indirectly via some nested include.  When modularize encounters another instance of the header while processing its header list, and the preprocessed directive conditionals stored in the ModularizePPDirective vector match those of an existing ModularizeHeaderFile object, the top-level header name is added to that instance, effectively reusing the ModularizeHeaderFile object.  If a header is seen for the first time, or if the preprocessed conditionals for its stored directives don't match those of any instance of the header seen before, a new ModularizeHeaderInstance object is created and saved.

5.       ModularizeHeaderTracker - Tracks the instances of one particular header.  It stores the header name and a vector of ModularizeHeaderInstance objects.  If all instances of a header have the same conditionals after preprocessing, there will be only one ModularizeHeaderInstance.  If one or more conditionals differ, there will be two or more instance objects saved.

6.       ModularizeMasterHeaderTracker - Stores a map of all the ModularizeHeaderTracker objects, and provides an "addHeaderFile" function for handling a header file, and a "report" function for outputting the warnings about the preprocessor conditional directive mismatches.

Please put these descriptions in doxygen comments for their respective classes.
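As a rough picture of how these classes fit together, and of the doxygen comment style being requested, here is a standalone C++ sketch. It simplifies under stated assumptions: names drop the Modularize prefix (as the review requests further down), a header's directives are compared directly instead of through a separate ModularizeHeaderFile object, and none of the clang callback plumbing is shown. All type and member names are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

/// One conditional directive (#if, #elif, #ifdef, #ifndef) in a header.
struct PPDirective {
  unsigned Line;                ///< Source line of the directive.
  std::string Kind;             ///< Directive spelling, e.g. "if".
  std::string PreprocessedText; ///< Condition after macro substitution.
  bool operator==(const PPDirective &Other) const {
    return Line == Other.Line && Kind == Other.Kind &&
           PreprocessedText == Other.PreprocessedText;
  }
};

/// One distinct preprocessed form of a header, plus the top-level headers
/// (from the modularize header list) that produced it.
struct HeaderInstance {
  std::vector<PPDirective> Directives;
  std::vector<std::string> TopLevelHeaders;
};

/// All distinct instances seen for one header.
class HeaderTracker {
public:
  /// Records that \p TopLevelHeader reached this header and that the
  /// header's conditionals preprocessed to \p Directives. Reuses a matching
  /// instance if there is one; otherwise saves a new instance.
  void addInstance(const std::string &TopLevelHeader,
                   const std::vector<PPDirective> &Directives) {
    for (HeaderInstance &Instance : Instances) {
      if (Instance.Directives == Directives) {
        Instance.TopLevelHeaders.push_back(TopLevelHeader);
        return;
      }
    }
    Instances.push_back({Directives, {TopLevelHeader}});
  }
  /// More than one instance means some conditional differed.
  bool hasMismatch() const { return Instances.size() > 1; }
  std::size_t getInstanceCount() const { return Instances.size(); }

private:
  std::vector<HeaderInstance> Instances;
};

/// Owns one HeaderTracker per header name.
class MasterHeaderTracker {
public:
  /// Returns the tracker for \p HeaderName, creating it on first use.
  HeaderTracker &getTracker(const std::string &HeaderName) {
    return Trackers[HeaderName];
  }
  /// Returns the names of headers whose instances disagree.
  std::vector<std::string> getMismatchedHeaders() const {
    std::vector<std::string> Result;
    for (const auto &Entry : Trackers)
      if (Entry.second.hasMismatch())
        Result.push_back(Entry.first);
    return Result;
  }

private:
  std::map<std::string, HeaderTracker> Trackers;
};
```

Keeping the trackers by value in the map means the master tracker owns everything it creates, which also sidesteps the leak discussed later in the thread.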


The changes to Modularize.cpp (mod_2013_06_17_modularize_patch.txt):


1.       Add an option for disabling the preprocessing consistency checking.  This is a fallback, in case of problems with the mechanism, or to reduce warnings volume.

2.       Set up a ModularizePPCallbacks object for tracking the preprocessor.  This is done in the CollectEntitiesConsumer object.

3.       Set up a ModularizeMasterHeaderTracker for storing the header instance data.  This is done in the CollectEntitiesConsumer object.

4.       Call the ModularizeMasterHeaderTracker::report function to report any warnings about the preprocessor conditional directive mismatches.

5.       Fixed some naming convention issues.

At the very least, number 5 can be a separate patch (feel free to commit it directly).


The changes to the tests (mod_2013_06_17_test_patch.txt):


1.       Some new lines in a couple of files for the new feature.

I'm also including a zip with the changed files.
For future reference, including a zip like this is not necessary (nor particularly desirable for most people's workflow, I assume).


I'm hoping I can check this in soon, as it makes me nervous to sit on so much, and makes it harder to continue experimenting.  Since this is still an experimental tool, I'm hoping we can improve it in incremental steps.
At the very least, the stylistic issues (variable naming, line endings, tabs, formatting, etc.) will need to be cleared up before committing.


You mentioned you have other suggestions too.  Please do feel free to send them.


One thing I'm aware of is that the collections are probably leaking the objects' memory.  I can fix that if necessary.

Yes, please fix that; memory leaks are not acceptable.

Some other things:

* Please rename things called `ModularizeFoo` to be just `Foo` if they are already in the `Modularize` namespace.
* Please name your variables in accordance with LLVM naming conventions and put `*` and `&` on the right.
* Remove hard tabs.
* Use Unix (LF) line endings throughout (I see many CRLF) and make sure to have <=80 columns per line.

+bool ModularizeHeaderTracker::report() {
+  int headerInstanceIndex;
+  int headerInstanceCount = getHeaderInstanceCount();
+  ModularizeHeaderInstance *headerInstance;
+  ModularizeHeaderInstance *headerInstance0;
+  ModularizeHeaderFile *headerFile;
+  ModularizeHeaderFile *headerFile0;
+  int directiveIndex;
+  ModularizePPDirective *directive;
+  ModularizePPDirective *directive0;
+  bool mismatch;

Please move these declarations to their point of first use. Also clarify what the `0` suffix on some of these names means (possibly by renaming to a more useful name).

+  if (getHeaderInstanceCount() > 1) {
+    int directiveCount = 0x7fffffff;  // Big number.
+    ModularizeHeaderInstanceVectorIterator hiIter = getHeaderInstancesBegin();

Reverse the condition of this `if` and use an early exit to simplify and reduce indentation <http://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code>.

+      if (mismatch) {
+        errs() << "warning: The instances of header " << Name << " have different contents after preprocessing:\n";

Same for this `if`.
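Applied together, the requests above (declarations at first use, a descriptive name instead of the `0` suffix, and early exits) might look like the following standalone sketch. Instance and getMismatchWarning are hypothetical stand-ins for ModularizeHeaderInstance and the report logic. The sketch assumes the `0` suffix marks the first instance used as the comparison baseline, and it returns the warning text instead of writing to errs() so it can be exercised without LLVM.

```cpp
#include <cstddef>
#include <string>
#include <vector>

/// Hypothetical stand-in for one preprocessed form of a header.
struct Instance {
  std::vector<std::string> Conditionals; // Preprocessed #if/#elif text.
};

/// Returns a warning if any instance of header Name differs from the first
/// one, or an empty string if all instances agree.
std::string getMismatchWarning(const std::string &Name,
                               const std::vector<Instance> &Instances) {
  // Early exit: fewer than two instances means nothing to compare.
  if (Instances.size() < 2)
    return std::string();

  // Declared at first use, with a name that says what it is for.
  const Instance &Baseline = Instances.front();
  for (std::size_t I = 1, E = Instances.size(); I != E; ++I) {
    // Early continue keeps the interesting case unindented.
    if (Instances[I].Conditionals == Baseline.Conditionals)
      continue;
    return "warning: The instances of header " + Name +
           " have different contents after preprocessing:\n";
  }
  return std::string();
}
```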

+// Represents a preprocessor directive kind.
+enum ModularizePPDirectiveKind {
+  MPPD_If,
+  MPPD_ElIf,
+  MPPD_IfDef,
+  MPPD_IfNDef
+};

Surely clang already has an enum that can serve this purpose?

+// Get directive spelling.
+const char *ModularizePPDirective::getDirectiveSpelling(
+    ModularizePPDirectiveKind kind) {
+  const char *directiveString;
+  switch (kind) {
+    case MPPD_If:
+      directiveString = "if";

Surely clang already has something that can do this?

+  ModularizeHeaderFile *mhf = new ModularizeHeaderFile(topFile.str(), PP);
+  addHeaderFile(mhf);
+  CurrentHeaderFile = mhf;
+  RootHeaderFile = mhf;

Who owns this? When is it freed?

+  mhf = getHeaderFile(*fileName);
+  if (mhf == NULL) {
+    mhf = new ModularizeHeaderFile(*fileName, PP);

Same here.
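One common way to answer "who owns this?" is to give the container ownership through std::unique_ptr and hand out non-owning raw pointers. The standalone C++14 sketch below assumes a hypothetical HeaderFileRegistry in place of the patch's addHeaderFile/getHeaderFile pair; HeaderFile is a stripped-down stand-in for ModularizeHeaderFile. It also uses `It->second` rather than `(*It).second`.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

/// Hypothetical stand-in for ModularizeHeaderFile.
struct HeaderFile {
  explicit HeaderFile(std::string Name) : Name(std::move(Name)) {}
  std::string Name;
};

/// Owns every HeaderFile it creates; the pointers it returns are
/// non-owning and stay valid for the lifetime of the registry.
class HeaderFileRegistry {
public:
  /// Returns the existing object for \p Name, or creates and stores one.
  HeaderFile *getOrCreate(const std::string &Name) {
    auto It = Files.find(Name);
    if (It != Files.end())
      return It->second.get();
    auto Inserted = Files.emplace(Name, std::make_unique<HeaderFile>(Name));
    return Inserted.first->second.get();
  }
  /// Returns nullptr if \p Name has not been seen.
  HeaderFile *lookup(const std::string &Name) const {
    auto It = Files.find(Name);
    return It == Files.end() ? nullptr : It->second.get();
  }
  std::size_t size() const { return Files.size(); }

private:
  // Destroying the map destroys every HeaderFile; no manual delete needed.
  std::map<std::string, std::unique_ptr<HeaderFile>> Files;
};
```

With this arrangement, members such as CurrentHeaderFile and RootHeaderFile can remain plain pointers, since the registry is the single owner.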

+// Retrieve source snippet from file image.
+std::string ModularizePPCallbacks::getSourceSnippet(SourceRange sourceRange) {

Surely clang has this functionality somewhere already?

+  std::string returnValue(bPtr, length);
+
+  return returnValue;

You can directly `return std::string(...)`.

+  // Trim snippet.
+  while ((*bPtr <= ' ') && (length != 0)) {
+    bPtr++;
+    length--;
+  }
+
+  while ((length != 0) && (bPtr[length - 1] <= ' '))
+    length--;

StringRef::trim?
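In clang itself, the trimming and copying would likely collapse to something like `StringRef(Ptr, Length).trim().str()`. For readers without LLVM at hand, the standalone C++17 sketch below shows the same idea with std::string_view and returns the std::string directly; trimSnippet is a hypothetical name. It strips the character set StringRef::trim uses by default (" \t\n\v\f\r"), which is narrower than the original loop's `<= ' '` test, since that test also strips other control characters.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: trims whitespace from both ends of a source snippet
// given as a pointer and length, mirroring llvm::StringRef::trim().
std::string trimSnippet(const char *Begin, std::size_t Length) {
  std::string_view Snippet(Begin, Length);
  const char *Whitespace = " \t\n\v\f\r";
  std::size_t First = Snippet.find_first_not_of(Whitespace);
  if (First == std::string_view::npos)
    return std::string(); // The snippet is all whitespace.
  std::size_t Last = Snippet.find_last_not_of(Whitespace);
  // Construct the result directly in the return statement.
  return std::string(Snippet.substr(First, Last - First + 1));
}
```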

+  using namespace clang;
+  using namespace llvm;

No `using namespace` in headers. (Second sentence of <http://llvm.org/docs/CodingStandards.html#do-not-use-using-namespace-std>)

+    return (*iter).second;

Use operator-> (e.g. iter->second), here and in other places.

+#ifndef MODULARIZEHEADERTRACKER_H
+#define MODULARIZEHEADERTRACKER_H

In LLVM we typically put underscores between "words" in header guard macros. So e.g. this should be MODULARIZE_HEADER_TRACKER_H.

-- Sean Silva

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mod_2013_06_19_patch.txt
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130620/d9ec023b/attachment.txt>

