[PATCH] D30691: [analyzer] Support for naive cross translational unit analysis

Mon Dec 11 05:30:27 PST 2017

gerazo added inline comments.

================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:44

+CTU_FUNCTION_MAP_FILENAME = 'externalFnMap.txt'
+CTU_TEMP_FNMAP_FOLDER = 'tmpExternalFnMaps'
----------------
george.karpenkov wrote:
> gerazo wrote:
> > george.karpenkov wrote:
> > > What would happen when multiple analyzer runs are launched concurrently in the same directory? Would they race on this file?
> > Yes and no. The 1st, collect part of CTU creates this file by aggregating all data from the build system, while the 2nd part which does the analysis itself only reads it. Multiple analysis can use this file simultaneously without problem. However, multiple collect phases launched on a system does not make sense. In this case, the later one would write over the previous results with the same data.
> > However, multiple collect phases launched on a system does not make sense
> 
> Why not? What about a system with multiple users, where code is in the shared directory? Or even simpler: I've launched a background process, forgot about it, and then launched it again?
> 
> > In this case, the later one would write over the previous results with the same data.
> 
> That is probably fine, I am more worried about racing, where process B would be reading a partially overriden file (not completely sure whether it is possible)
> 
I see your point. In order to create the multiple user scenario you've mentioned, those users need to give the exact same output folders for their jobs. Our original bet was that our users are not doing this as the scan-build tool itself is also cannot be used this way. Still the process left there is something to consider. I will plan some kind of locking mechanism to avoid situations like this.

================
Comment at: tools/scan-build-py/libscanbuild/analyze.py:609
+    # Recover namedtuple from json when coming from analyze_cc
+    if not hasattr(ctu_config, 'collect'):
+        ctu_config = CtuConfig(collect=ctu_config[0],
----------------
george.karpenkov wrote:
> gerazo wrote:
> > george.karpenkov wrote:
> > > In which case is this branch hit? Isn't improperly formed input argument indicative of an internal error at this stage?
> > An other part of scan-build-py, analyze_cc uses namedtuple to json format to communicate. However, the names are not coming back from json, so this code helps in this. This is the case when someone uses the whole toolset with compiler wrapping. All the environment variable hassle is also happening because of this. So these env vars are not for user modification (as you've suggested earlier).
> OK so `opts['ctu']` is a tuple or a named tuple depending on how this function is entered? BTW could you point me to the `analyze_cc` entry point?
> 
> For the purpose of having more uniform code with less cases to care about, do you think we could just use ordinary tuples instead of constructing a named one, since we have to deconstruct an ordinary tuple in any case?
Using a NamedTuple improves readability of the code a lot with less comments. It is unfortunate that serializing it is not solved by Python. I think moving this code to the entry point would make the whole thing much nicer. The entry point is at analyze_compiler_wrapper

================
Comment at: tools/scan-build-py/libscanbuild/arguments.py:376
+            metavar='<ctu-dir>',
+            default='ctu-dir',
+            help="""Defines the temporary directory used between ctu
----------------
george.karpenkov wrote:
> BTW can we also explicitly add `dest='ctu_dir'` here, as otherwise I was initially very confused as to where the variable is set.
Yes, of course.

https://reviews.llvm.org/D30691