[PATCH] D153366: [dataflow] Add dedicated representation of boolean formulas

Fri Jun 23 16:31:03 PDT 2023

sammccall added a comment.

BTW, since this is a substantial API/concept change, I'll wait on approval from all of @mboehme, @xazax.hun, @gribozavr2 - if you don't plan to review just LMK.
(Not in a hurry, just wanted to mention so we don't deadlock :-)

================
Comment at: clang/include/clang/Analysis/FlowSensitive/Formula.h:71
+  ArrayRef<const Formula *> operands() const {
+    return ArrayRef(reinterpret_cast<Formula *const *>(this + 1),
+                    numOperands(kind()));
----------------
mboehme wrote:
> sammccall wrote:
> > mboehme wrote:
> > > Is this technically legal? (I've taken a look, but the definition of similar types makes my eyes glaze over.)
> > > 
> > > Why not use `TrailingObjects` instead? (I'm not really familiar with `TrailingObjects`, so there may be good reasons against it -- just asking because it seems like an obvious existing mechanism that could be used here.)
> > > Is this technically legal?
> > 
> > Yeah, it's OK: similar types etc is not relevant because these pointers are not pointing to objects of different types: there's an object of type Formula at one address that has a lifetime that begins at the constructor (but it's trivially destructible and never destroyed), and then objects of type `Formula*` at higher addresses - these don't overlap, and are both part of the chunks of memory provided by malloc.
> > 
> > > Why not use TrailingObjects instead? 
> > 
> > TrailingObjects is complicated and ugly - it *can* work for this, but I don't think it's actually easier to understand. (Do a code search for "this + 1" in LLVM - this is often done "by hand" instead of with TrailingObjects for simple cases like this one).
> > 
> > e.g. here we have a little magic in create() and a little in operands(). TrailingObjects equally needs ~2 pieces of magic, they're *slightly* less scary, but they also don't relate directly to the memory layout, and I don't think you can have a meaningful understanding of what this is for without also thinking about the memory layout.
> > 
> > TrailingObjects shines when the pointer arithmetic gets complicated, though...
> > > Is this technically legal?
> > 
> > Yeah, it's OK: similar types etc is not relevant because these pointers are not pointing to objects of different types: there's an object of type Formula at one address that has a lifetime that begins at the constructor (but it's trivially destructible and never destroyed), and then objects of type `Formula*` at higher addresses - these don't overlap, and are both part of the chunks of memory provided by malloc.
> 
> Thanks for the explanation!
> 
> Is it worth putting a (condensed) version of this in a comment, or is this well-known enough that it's not worth it?
> 
> > > Why not use TrailingObjects instead? 
> > 
> > TrailingObjects is complicated and ugly - it *can* work for this, but I don't think it's actually easier to understand. (Do a code search for "this + 1" in LLVM - this is often done "by hand" instead of with TrailingObjects for simple cases like this one).
> > 
> > e.g. here we have a little magic in create() and a little in operands(). TrailingObjects equally needs ~2 pieces of magic, they're *slightly* less scary, but they also don't relate directly to the memory layout, and I don't think you can have a meaningful understanding of what this is for without also thinking about the memory layout.
> > 
> > TrailingObjects shines when the pointer arithmetic gets complicated, though...
> 
> Again, thanks for the explanation. I don't have any experience with `TrailingObjects` except for using classes that are implemented using it, so thanks for filling me in.
> Is it worth putting a (condensed) version of this in a comment, or is this well-known enough that it's not worth it?

I've added a few lines inside create() - putting this in the class comment is just distracting I think.

================
Comment at: clang/include/clang/Analysis/FlowSensitive/Formula.h:82
+                         ArrayRef<const Formula *> Operands,
+                         unsigned Value = 0);
+
----------------
mboehme wrote:
> This looks like an "internal" API -- it's really only intended for use by `Arena`, right?
> 
> Maybe add a comment indicating that it's not intended to be used directly?
Done. We could `friend Arena`, but it's not actually a really specialized or dangerous API, so I think a comment is cleaner.

================
Comment at: clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp:218

-  BooleanFormula Formula(NextVar - 1, std::move(Atomics));
+  BooleanFormula Form(NextVar - 1, std::move(Atomics));
   std::vector<bool> ProcessedSubVals(NextVar, false);
----------------
mboehme wrote:
> sammccall wrote:
> > mboehme wrote:
> > > Now that we've added `Formula`, it's pretty confusing that we also have `BooleanFormula`.
> > > 
> > > I assume this is one of the renamings that you want to get to later? I guess `BooleanFormula` becomes something like `CnfFormula` (would like to do `3CnfFormula`, but you can't start with a digit... :( )
> > Agree this is an unfortunate conflict and we should probably rename this local one.
> > (Just because it's more important that the public type gets the good name)
> > 
> > CNFFormula or just CNF SGTM.
> > I'll leave this open and do it as a followup if that's OK, the patch is noisy as is.
> Yes, please do leave it open. This should happen later, just wanted to clarify that that's the plan.
In fact there are only a handful of references to this class name, and it's pretty confusing, so I've done the rename to `CNFFormula`. I think this actually makes the purpose of this class much clearer!

(I did also update some comments here were "value" was being used in a way that's now misleading)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153366/new/

https://reviews.llvm.org/D153366