[PATCH] D123032: [clang][dataflow] Exclude protobuf types from modeling in the environment.

Yitzhak Mandelbaum via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri May 6 06:32:25 PDT 2022


ymandel marked 2 inline comments as done.
ymandel added inline comments.


================
Comment at: clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp:513
+  const RecordDecl *RD = BaseTy->getDecl();
+  if (RD->getIdentifier() == nullptr || RD->getName() != "Message")
+    return false;
----------------
xazax.hun wrote:
> xazax.hun wrote:
> > ymandel wrote:
> > > xazax.hun wrote:
> > > > Not sure how often is this invoked but we could reduce the number of string comparisons by caching the identifier ptr and do a pointer comparison.
> > > Good question. It means an extra comparison for each type until the pointer is cached (to check whether the cache is set), and then, afterwards, 2 comparisons vs. ~10 for the common case where the class name doesn't match. In the matching case, though, it is clearly saving much more.
> > > 
> > > For proto-heavy code it seems a win, and a loss otherwise. But the question is where to put the cache. It seems best to move this to a method on DataflowAnalysisContext (since it is a global property, not a local-environment one) and make the cached pointer a private member of the DAC.
> > > 
> > > Thoughts?
> > I think it might be nice to have a context that is scoped to the translation unit rather than a function. We might have other stuff that we want to cache here.
> > 
> > An example how the static analyzer is dealing with this: https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp#L960
> > 
> > If we are not trying to be lazy and are willing to populate the cache eagerly at the beginning of the analysis, we would not pay for the extra check to see whether the cache is populated.
> I'm also fine with not doing caching for now and having a note to consider this in the future if it becomes measurable in the profiles. It is probably fine for now when we only have a couple of short strings to compare. But I guess the number of special cases we handle this way will only increase in the future. 

(Sorry for the very long delay in responding.) I think holding off on the cache may be best for now, pending performance analysis. I'm hesitant to put too much effort into optimizing this because, ultimately, we want to move to a lazy-initialization model that would obviate the need for this kind of optimization: we would only model the fields that are actually used, making the size of the underlying struct irrelevant.

I've added a FIXME to this effect.
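
For reference, here is a minimal standalone sketch of the cached-identifier idea discussed above. It is illustrative only: the class below is not the real clang::dataflow::DataflowAnalysisContext, and the member and method names (MessageII, isNamedMessage) are invented for this example, not part of the patch.

  #include "clang/AST/ASTContext.h"
  #include "clang/AST/Decl.h"
  #include "clang/Basic/IdentifierTable.h"

  // Sketch of a TU-scoped context that caches the IdentifierInfo* for
  // "Message" so that repeated checks become pointer comparisons rather
  // than string comparisons.
  class CachedNameContext {
  public:
    bool isNamedMessage(const clang::RecordDecl &RD, clang::ASTContext &Ctx) {
      if (MessageII == nullptr)
        MessageII = &Ctx.Idents.get("Message"); // One-time string lookup.
      return RD.getIdentifier() == MessageII;   // Pointer comparison thereafter.
    }

  private:
    const clang::IdentifierInfo *MessageII = nullptr;
  };

As xazax.hun notes, populating the cache eagerly at the start of the analysis would also remove the null check on every call.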


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123032/new/

https://reviews.llvm.org/D123032


