<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Dec 31, 2014, at 11:21 AM, James Burgess &lt;<a href="mailto:jamesrburgess@mac.com" class="">jamesrburgess@mac.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html charset=windows-1252" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Thanks Ted for your insight, should be fairly straightforward for me now.<div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Dec 31, 2014, at 12:01 AM, Ted Kremenek &lt;<a href="mailto:kremenek@apple.com" class="">kremenek@apple.com</a>&gt; wrote: There are two design points at play here that I think are worth noting:</div></blockquote><div class=""><blockquote type="cite" class=""><div class=""><br class=""></div><div class="">(1) This interposition is not a great hack. Sometimes the build system doesn't control every compiler invocation. For example, a Makefile build could spin off a shell script that invokes the compiler, and that script may not consult CC or CXX. Also, some Makefiles (or whatever) hardwire the compiler they use, so setting CC or CXX has no effect. Unless the build system has total control of the compilation jobs, the only sure way to interpose on all compiler instances is to do true OS interposition. This can be accomplished in various platform-specific ways, and is the approach taken by several commercial static analysis tools.</div><div class=""><br class=""></div></blockquote></div><div class="">This might be irrelevant but I have spent a *lot* of time wrestling makefile-based systems over the years (~25 years), both my own and others'. 
I personally would consider a Makefile that hardwires CC or CXX to be a broken Makefile. I would not let that influence my thinking on how I’d implement the analyzer. The business of running another script from a Makefile is one of the reasons I'm not using make anymore; while sometimes necessary, it is just a path that leads to a rat's nest, and so I would also disregard that when thinking about making an analyzer.</div></div></div></div></div></blockquote><div><br class=""></div><div>I have similar sentiments about having well-structured Makefiles/build systems. That said, if a goal for the analyzer is to be relatively turnkey for any build system, then the analyzer needs to be able to work without making any assumptions about how the build system is implemented. Thus the build system becomes a black box, and a mixture of Makefiles and scripts aggregates into just something that performs a build. If it is not CC or CXX, perhaps some other piece of information acts as the locus to tie a combination of Makefiles or scripts together? 
Even for well-principled builds, there is enough variation in how people do things that I have found that, without deep integration of the analyzer with the build system, there are really no assumptions one can make about how the build system works.</div><div><br class=""></div><div>That said, I think deep integration of the analyzer in a build system, as we have done with Xcode/xcodebuild, is extremely powerful: it enables new workflows and encourages people to have well-structured projects that are amenable to great tooling.</div><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><div class=""><div class=""><br class=""></div><br class=""><blockquote type="cite" class=""><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class=""></div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">(2) The way we do analysis and gather reports could hypothetically change in the future. Right now we analyze one file at a time, generate reports, and then stitch the reports together. In the future, a reasonable direction for the analyzer would be to do two phases of analysis: first do a pass over the files like we do now, and then do a more global analysis to find bugs that cross file boundaries. This approach would inherently defer the generation of reports to the very end. 
Also, the mechanism by which reports are generated could change. Right now HTML files are generated eagerly, but those really are a static rendering. A more versatile approach would be to generate output that summarizes the bug reports, and have the HTML reports (or whatever) generated from those.</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class=""></div></blockquote><br class=""><div class="">A two-phase system sounds like a huge win if it gets global analysis. As far as the interface between the build system and the analyzer goes, I would think it will come down to how to pass the state from each phase. You could require the build system to pass in some identifier to the analyzer for each pass. The identifier could then be turned into a directory name or db key to do whatever it needed to communicate between passes. Feels like a natural thing for a build system to do (generate a build id). I’m sure when I go look in the scan-build script there’s going to be some “look in /tmp for a set of files named X” type arrangement; this would just be formalizing that idea. </div></div></div></div></div></blockquote><br class=""></div><div>A two-phase approach technically doesn't even require the build system to get involved. The analyzer can interpose on the build, intercepting uses of the compiler, and do the first phase. That interposition can dump results anywhere without needing to thread phase state into the build system. Indeed, the second phase can happen completely asynchronously from the build itself, delayed much later or even run on a different machine. 
I've seen some static analysis tools take this approach, which allows them to integrate within a large nightly build and delay the expensive static analysis part until after the build completes.</div><div><br class=""></div><div>If this were integrated into the build system, I think the build system would just need to know where to dump the phase 1 output and pass that location to the analyzer. The build system could then be responsible for running phase 2 (post-build), or something else could be responsible for initiating that part.</div></body></html>