[PATCH] D30691: [analyzer] Support for naive cross translational unit analysis

Wed Dec 6 07:17:11 PST 2017

gerazo added a comment.

The code modifications are coming soon (after doing some extensive testing) for the scan-build part.

================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:223
+    ctu_config = get_ctu_config(args)
+    if ctu_config.collect:
+        shutil.rmtree(ctu_config.dir, ignore_errors=True)
----------------
danielmarjamaki wrote:
> danielmarjamaki wrote:
> > not a big deal but I would use early exits in this function
> with "not a big deal" I mean; feel free to ignore my comment if you want to have it this way.
I've checked it through. The only place for an early exit now would be before the else. The 1st and 2nd ifs are in fact non-orthogonal.

================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:145
+            mangled_ast_pairs.append((mangled_name, ast_files.pop()))
+
+    return mangled_ast_pairs
----------------
george.karpenkov wrote:
> Overall, instead of creating a dictionary with multiple elements, and then converting to a list, it's much simper to only add an element to `mangled_to_asts` when it is not already mapped to something (probably logging a message otherwise), and then just return `mangled_to_asts.items()`
The reason for the previous is that we need to count the occurence number ofdifferent mappings only let those pass through which don't have multiple variations.

================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:189
+            # Remove all temporary files
+            shutil.rmtree(fnmap_dir, ignore_errors=True)
+
----------------
gerazo wrote:
> george.karpenkov wrote:
> > Having an analysis tool remove files is scary, what if (maybe not in this revision, but in a future iteration) a bug is introduced, and the tool removes user code instead?
> > Why not just create a temporary directory with `tempfile.mkdtemp`, put all temporary files there, and then simply iterate through them?
> > Then you would be able to get rid of the constant `CPU_TEMP_FNMAP_FOLDER` entirely, and OS would be responsible for cleanup.
> Yes, you are right. We are essentially using a temp dir. Because of the size we first had to put it next to the project (not on tmp drive for instance) and for debugging purposes we gave a name to it. Still it can be done with mkdtemp as well.
Finally, I came to the conclusion that mkdtemp would not be better than the current solution. In order to find our created dir by other threads, we need a designated name. Suffixing it by generated name would further complicate things as we need not to allow multiple concurrent runs here. The current solution is more robust from this point of view.

================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:241
+        run_analyzer_parallel(args)
+        shutil.rmtree(ctu_config.dir, ignore_errors=True)
+    else:
----------------
george.karpenkov wrote:
> Same as the comment above about removing folders. Also it seems like there should be a way to remove redundancy in `if collect / remove tree` block repeated twice.
Th previous call for data removal happens because the user asked for a collect run, so we clean data to do a recollection. This second one happens because the user asked for a full recollection and anaylsis run all in one. So here we destroy the temp data for user's convenience. This happens after, not before like previously. The default behavior is to do this when the user uses the tool the easy way (collect and analyze all in one) and we intentionally keep collection data if the user only asks for a collect or an analyze run. So with this, the user can use a collect run's results for multiple analyze runs. This is written in the command line help. I will definitely put comments here to explain.

================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:296
+        'command': [execution.cmd[0], '-c'] + compilation.flags,
+        'ctu': json.loads(os.getenv('ANALYZE_BUILD_CTU'))
     }
----------------
george.karpenkov wrote:
> Again, is it possible to avoid JSON-over-environment-variables?
There is an other thing against changing this. Currently the interface here using env variables is used by intercept-build, analyze-build and scan-build tool as well. In order to drop json, we need to change those tools too. It would be a separate patch definitely.

================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:561
+            except OSError:
+                pass
+        ast_command = [opts['clang'], '-emit-ast']
----------------
gerazo wrote:
> george.karpenkov wrote:
> > `try/except/pass` is almost always bad.
> > When can the error occur? Why are we ignoring it?
> I think this code is redundant with the if above.
Here the folders are created on demand. Because these are created parallel by multiple processes, there is small chance that an other process already created the folder between the isdir check and the makedirs call. This is why the the pass is needed to make it always run correctly. I will add a comment.

================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:567
+        ast_command.append('-o')
+        ast_command.append(ast_path)
+        logging.debug("Generating AST using '%s'", ast_command)
----------------
george.karpenkov wrote:
> The above can be written more succinctly as:
> `ast_command = [opts['clang'], ...] + args + ['-w', ...]`
After several iterations of the code, I find it easier to version control such multiline constructs. If someone changes a data source, it is clear which one (which line) was modified. The succint notation does not allow clean VCS annotations.

================
Comment at: tools/scan-build-py/tests/unit/test_analyze.py:338
+
+class PrefixWithTest(unittest.TestCase):
+
----------------
george.karpenkov wrote:
> Probably more tests are required for almost 400 lines of functional Python code in this PR.
> Would it be hard to have a full LIT-style integration test? E.g. have a dummy script emulating Clang with a dummy directory structure, which would show how all pieces are meant to fit together?
You are right. The testing infra in scan-build-py is not right anyway (uses nosetests). However, this should be a new patch as you've mentioned earlier.

https://reviews.llvm.org/D30691