[LLVMdev] MergeFunctions: reduce complexity to O(log(N))
Stepan Dyatkovskiy
stpworld at narod.ru
Wed May 7 06:49:57 PDT 2014
Hi Nick,
Thanks you for reviews. I have corrected for of comments from FAQ to
what seems to be regular in LLVM.
Patches has been committed as r208173 and r208189.
Thanks!
-Stepan
Nick Lewycky wrote:
> On 29 April 2014 04:36, Stepan Dyatkovskiy <stpworld at narod.ru
> <mailto:stpworld at narod.ru>> wrote:
>
> Hi Nick.
>
> I have fixed and reattached all the patches from 0001 till 0011.
>
> 0001, 0002:
> I have added detailed order descriptions to the methods declaration.
> I have also fixed everything as you asked.
>
>
> 0001:
>
> + for (uint64_t i = 0; i < NumElementsL; ++i) {
> vs.
> + for (unsigned i = 0, e = NumElementsL; i != e; ++i) {
>
> This difference is necessary to avoid a warning? I don't actually care
> which form you use here given that NumElementsL is already an integer
> there's no recalculation we're avoiding like getting the end iterator
> each time, I just want it to look consistent. If it's working around a
> warning that's more important than consistent looks.
>
> 0001 looks good. Please commit it.
>
> 0002:
>
> One note, you use a writing style of asking a question, followed by
> answering it. For instance, "Will we still get the same numbering? As
> follows from FunctionComparator::compare(), we do CFG walk, ...". I
> don't think it's worth your time to rewrite it now, but we generally
> don't use that construction in llvm's comments, instead preferring to
> explain what the code does an including an argument that explains why
> the approach is correct. This amounts to the same thing, it's just a
> different style.
>
> + /// their functions. If right value was meet first during
> scanning,
>
> "was meet" --> "was met"
>
> Please commit it.
>
> Nick
>
>
> 0003, cmpStrings
> StringRef::compare performs lexicographical comparison, and it isn't
> the fastest one.
> In case, when strings has different length, it still compares their
> prefixes lexicographically.
> We need ordering among strings set, but we don't need to compare
> strings contents lexicographically all the time. So we can compare
> their sizes first, and do lexicographical comparison only when we
> really need it: only when strings are equal in size.
>
> I have added short version of these explanations into cmpStrings. So
> we could get really well documented methods based on our posts ;-)
> Those were good questions and definitely the answers should be in
> documentation somewhere.
>
> 0004: I have added ordering description.
> 0005: Fixed.
> 0007: Fixed.
> 0009:
>
> > Okay, I'm confused. We do all this work to produce a 3-way
> comparison,
> > then we wrap it up in an operator< for a std::set that will only
> look at
> > less-than...
> Actually, on top level we don't need 3-way comparison. But we do
> need it to determine whether functions are equal on some stage, just
> to proceed to the next "less significant" stage.
> Consider two functions (where BB1 and BB11 are equal)
>
> "F1: {BB1, BB2}" and
> "F2: {BB11, BB3}".
>
> First we have to realize that BB1 and BB11 are equal. And then
> determine which one of BB2 and BB3 is less than another.
> So finally we got 3-way comparison: less, equal and greater (in fact
> just neither less nor equal).
>
>
> > When we start detecting equal functions, do we merge them
> immediately? If so, can't that cause functions already in the set to
> have their equality comparisons change
> Yes we merge them immediately. And no, it won't cause situation you
> have mentioned. Once we changed function X, we lookup for functions
> that uses X, remove them from FnTree (former FnSet) and put them
> into Deferred list. Then we fill up worklist with Deferred contents
> and repeat comparison routine for these functions.
>
>
> Thanks for reviews!
>
> -Stepan
>
> Stepan Dyatkovskiy wrote:
>
> ping
> Stepan Dyatkovskiy wrote:
>
> Hi Nick,
> Thank you for reviews! Here is my answers for first two patches.
>
> Though as additional, I think its important answer to this
> question
> first:
> > Okay, I'm confused. We do all this work to produce a
> 3-way comparison,
> > then we wrap it up in an operator< for a std::set that
> will only
> look at
> > less-than...
> Actually, on top level we don't need 3-way comparison. But
> we do need it
> to determine whether functions are equal on some stage just
> to proceed
> to the next "less meaningful" stage.
> Consider two functions
> "F1: {BB1, BB2}" and
> "F2: {BB-equal-to-BB1, BB3}".
> First we have to realize that BB1 and BB-equal-to-BB1 are
> equal. And
> then determine which one of BB2 and BB3 is less.
>
> First patch.
>
> > + /// 1. Bitcastability.
> > + /// Check whether L's type could be losslessly
> bitcasted to R's
> type.
> > + /// On this stage method, in case when lossless
> bitcast is not
> possible
> > + /// method returns -1 or 1, thus also defining which
> type is
> greater in
> > + /// context of bitcastability.
> I have added detailed order description for this method. You
> also can
> find this description in cmpConstantsComment.txt.
>
> > Optional: "Return its pointers comparison result" ->
> "Return the
> result
> > of comparing the types"?
> Oops! Fixed.
>
> > + if (TyL->getTypeID() == Type::X86_MMXTyID)
> > + TyLWidth = 64;
> > + else if (const VectorType *VecTyL =
> dyn_cast<VectorType>(TyL))
> > + TyLWidth = VecTyL->getBitWidth();
> >
> > Yikes! We never merge function A using a vector and
> function B
> using an
> > x86_mmx type together, right?
> Fixed!
>
> > "Nor of TyL and TyR" -> "Neither Tyl nor TyR".
> >
> > Optional: "Return its pointers comparison result" ->
> "Return the
> result
> > of comparing the types"?
> Fixed.
>
> > + case Value::UndefValueVal: return cmpType(L->getType(),
> R->getType());
> >
> > You already have L->getType() and R->getType() in TyL
> and TyR.
> Fixed. Sorry.
>
> Second patch.
>
> > I'm curious whether we guarantee that
> > we get the same numbering when comparing two functions
> if I simply
> > reorder the BasicBlocks in one of the functions, will we
> still assign
> > the same numberings or not?
> ...
> > Why not just start at the beginning and run to
> > the end, relying on the SSA dominance property.
> Good questions!
>
> For first one I have added these notes:
> [quote]
> As follows from FunctionComparator::compare(), we do CFG
> walk, we start
> from entry, and then take each terminator. So it doesn't
> matter how in
> fact BBs are ordered in function. And since cmpValues are
> called during
> this walk, the numbering depends only on how BBs located
> inside the CFG.
> [/quote]
>
> As for second question, I have added next notes:
>
> [quote]
> We also can't use dominance properties. Consider situation:
> If we compare two instruction operands: first is usage of
> local variable
> AL from function FL, and second is usage of local variable
> AR from FR.
> We are still not able to compare operands of PHI, since
> those could be
> operands from further BBs we didn't scan yet.
> [/quote]
>
> You can find comments also in sn_mapLRComment.txt
> I have also described lexicographical ordering for cmpValues
> in method
> comments. I have also put them in separated txt file:
> cmpValuesComment.txt.
>
> > + if (L == R) return 0;
> > + // Compare C1 and the bitcast result.
> > + if (int Res = cmpConstants(ConstL, ConstR))
> > + return Res;
> > [...]
> > + return 0;
> >
> > Isn't that just:
> > return cmpConstants(ConstL, ConstR);
> OOOPS! Fixed.
>
> I'm working on rest of patches.
> Thanks!
>
> -Stepan.
>
> Nick Lewycky wrote:
>
> Stepan Dyatkovskiy wrote:
>
> Hello Nick!
>
> Can we proceed to work on patch series?
>
>
> Just in case, I have reattached them to this post.
> Would you prefer to see them in
> http://reviews.llvm.org ?
>
>
> Oddly, I prefer attachments, but we can use
> reviews.llvm.org <http://reviews.llvm.org> if you
> prefer?
>
> 0001-constants-comparison.__patch
>
> + /// 1. Bitcastability.
> + /// Check whether L's type could be losslessly
> bitcasted to R's type.
> + /// On this stage method, in case when lossless
> bitcast is not
> possible
> + /// method returns -1 or 1, thus also defining which
> type is
> greater in
> + /// context of bitcastability.
>
> Looking at the code, it looks like the order is:
> 1. first class types, by typeid
> 2. vectors (by element size? by number of elements?
> the two
> multiplied together?)
>
> + // Nor of TyL and TyR are values of first class
> type. Return
> + // its pointers comparison result.
>
> "Nor of TyL and TyR" -> "Neither Tyl nor TyR".
>
> Optional: "Return its pointers comparison result" ->
> "Return the result
> of comparing the types"?
>
> + if (TyL->getTypeID() == Type::X86_MMXTyID)
> + TyLWidth = 64;
> + else if (const VectorType *VecTyL =
> dyn_cast<VectorType>(TyL))
> + TyLWidth = VecTyL->getBitWidth();
>
> Yikes! We never merge function A using a vector and
> function B using an
> x86_mmx type together, right? Note what the LangRef says
> about x86_mmx:
> "The operations allowed on it are quite limited:
> parameters and return
> values, load and store, and bitcast."
>
> + case Value::UndefValueVal: return cmpType(L->getType(),
> R->getType());
>
> You already have L->getType() and R->getType() in TyL
> and TyR. You also
> already have cmpType(TyL, TyR) in TypesRes. Please just
> reuse them in
> the rest of this function.
>
> 0002-values-comparison-__replacement-for-enumerate.__patch
>
> + // Assign serial numbers to values from left
> function, and values
> from
> + // right function.
> + // Explanation:
> + // Being comparing functions we need to compare
> values we meet at
> left and
> + // right sides.
> + // Its easy to sort things out for external values.
> It just should be
> + // the same value at left and right.
> + // But for local values (those were introduced inside
> function body)
> + // we have to ensure they were introduced at exactly
> the same place,
> + // and plays the same role. How to do it?
> + // Just assign serial number to each value when we
> meet it first
> time.
> + // Values that were met at same place will be with
> same serial
> numbers.
>
> This explanation is correct, but I'm curious whether we
> guarantee that
> we get the same numbering when comparing two functions
> if I simply
> reorder the BasicBlocks in one of the functions, will we
> still assign
> the same numberings or not?
>
> Also, it's not clear from the comment why we need the
> complexity of
> numbering them anyways. Why not just start at the
> beginning and run to
> the end, relying on the SSA dominance property to
> guarantee that we
> visit definitions before uses, and assume we'll visit
> each value in the
> same order (look at each instruction and operand in
> order and just
> complain about differences)? The answer is that PHI
> nodes mean we will
> visit some uses before some definitions.
>
> Anyways, changing this text is optional in my mind. I
> think it could be
> better, but it's not worth spending much time on it.
>
> +int FunctionComparator::cmpValues(__const Value *L,
> const Value *R) {
>
> I wouldn't mind an explanation of the lexographic
> ordering on this one
> either. Constants before non-constants, then InlineAsm
> (ordered by
> address), then visit order.
>
> + if (L == R) return 0;
> + // Compare C1 and the bitcast result.
> + if (int Res = cmpConstants(ConstL, ConstR))
> + return Res;
> [...]
> + return 0;
>
> Isn't that just:
> return cmpConstants(ConstL, ConstR);
> ?
>
> 0003-attributes-comparison.__patch
>
> +int FunctionComparator::__cmpStrings(StringRef L,
> StringRef R) const {
> + // Prevent heavy comparison, compare sizes first.
> + if (int Res = cmpNumbers(L.size(), R.size()))
> + return Res;
> +
> + return L.compare(R);
> +}
>
> Optional: I'm curious whether that's actually worth it?
> I checked
> StringRef and it does the length comparison last. It
> could be that you
> know that your strings tend to be longer than StringRef
> which tries to
> be efficient in general.
>
> 0004-operations-comparison.__patch
>
> +int FunctionComparator::__cmpOperation(const
> Instruction *L,
>
> Optional: Again, a description of the ordering would be
> interesting.
> Opcode number, then number of operands, then type, then
> 'nsw/nuw/exact/volatile'.
>
> 0005-GEP-comparison.patch
>
> +int FunctionComparator::cmpGEP(__const GEPOperator *GEPL,
> + const GEPOperator *GEPR) {
>
> + unsigned int ASL = GEPL->getPointerAddressSpace()__;
>
> Extra line after {
>
> if (DL) {
> [...]
> + unsigned BitWidth = DL ?
> DL->getPointerSizeInBits(ASL) : 1;
>
> BitWidth doesn't need to test DL for null-ness.
>
> 0006-top-level-compare-method.__patch
>
> Perfect!
>
> 0007-sanity-check.patch
>
> + // Check symmetricity.
>
> "symmetricity" -> "symmetry".
>
> 0008-removed-unused-methods.__patch
>
> Perfect
>
> 0009-FnSet-replaced-with-__FnTree.patch
>
> Okay, I'm confused. We do all this work to produce a
> 3-way comparison,
> then we wrap it up in an operator< for a std::set that
> will only look at
> less-than, but never uses the equals result. We want to
> do one
> comparison between two functions, not two (ie., std::set
> does
> func1<func2 and func2<func1 to see whether they're equal
> or less than,
> and we don't need that since cmp(func1, func2) tells us
> directly). Maybe
> this comes in a future patch?
>
> One other question, I should probably know the answer to
> this but it's
> been a while. :) When we start detecting equal
> functions, do we merge
> them immediately? If so, can't that cause functions
> already in the set
> to have their equality comparisons change (ie., A calls
> X, B calls Y, A
> and B are in the set, then we merge X and Y for being
> equal, now we've
> changed A and B to both called the same merged function,
> so they start
> to compare the same). Sets tend not to like having their
> comparison
> functions change on elements that they contain.
>
> 0010-updated-comments.patch
>
> Okay.
>
> 0011-removed-DenseMap-helpers.__patch
>
> Yay deleting dead code!
>
> Nick
>
>
> Thanks!
> -Stepan.
>
> Stepan Dyatkovskiy wrote:
>
> ping
> Stepan Dyatkovskiy wrote:
>
> Hi Nick,
>
> Please find the fixed patches in attachment.
> Series starts from "constants comparison".
>
> Below small report of what has been fixed,
> and answers on your
> questions.
>
> cmpTypes:
> > Please add a comment for this method.
> Include the meaning of the
> returned value. ("man strcmp" for inspiration.)
> Fixed and committed. So you can look in
> trunk, may be I forgot to do
> something (hope not :-) )
>
> checkForLosslessBitcast, cmpConstants:
> > Why isn't this using cmpType?
> Fixed.
>
> > Please put the else on the same line as
> the closing brace.
> Fixed.
>
> >> + else if (const VectorType *thisPTy =
> dyn_cast<VectorType>(L))
> > Missing initial capital
> Sorry. Fixed. Actually these typos has been
> cloned from
> Type::canLosslesslyBitCastTo.
>
> >> + return cmpNumbers((uint64_t)L,
> (uint64_t)R);
> >>
> > Please assert on unknown constant.
> That's possible. But what if we really got
> unknown constant? New
> comming
> constant types merely depends on
> MergeFunctions implementation. So we
> get crash if it will happen. And we loose
> nothing comparing them
> just as
> pointers. So, do we really need an
> assertion? Currently I kept it
> as it
> was. If your answer is "yes, we need it", it
> would easy to add it:
>
> case Value::FunctionVal:
> case Value::GlobalVariableVal:
> - case Value::GlobalAliasVal:
> - default: // Unknown constant
> - return cmpNumbers((uint64_t)L, (uint64_t)R);
> + case Value::GlobalAliasVal:
> + return cmpNumbers((uint64_t)L, (uint64_t)R);
> + default:
> + llvm_unreachable("Unknown constant.");
>
> About variable names: Var1, Var2 vs VarL,VarR.
> I think its better to use L/R concept. Easer
> to understand what to
> return (-1, or 1) when you see L/R rather
> than Var1/Var2.
> Var1/Var2 has been kept for functions that
> are to be removed till the
> end of patch series.
> F1 and F2 names were also kept till the "top
> level compare method"
> patch.
>
> > Alternatively, any reason not to have
> cmpPointers(const void *L,
> const void *R)?
> I could do it. I just wanted to show that
> pointers are compared
> like a
> numbers in some situations. Actually we do
> it in cases when we have
> absolutely no idea of smarter comparison.
> And back to cmpPonters. Its rather about
> what intuition tells. If
> pointers are equal as numbers, could they be
> different somehow? Could
> they be non-castable to numbers on some
> platforms? The last one is
> big
> trouble for DenseMapInfo implementation..
> If there is any (even very small)
> possibility of such cases - then
> yes,
> I vote for cmpPointers. The last word is up
> to you anyway.
>
> Attributes (0005):
> > Attributes already have operator< and
> operator==. Please reuse
> them.
> Fixed. I used simple "if":
>
> if (LA < RA)
> return -1;
> if (RA < LA)
> return 1;
>
> Though its possible to use:
>
> if (int Res = (LA < RA) ? -1 : (RA < LA) ? 1
> : 0)
> return Res;
>
> Which one is more preferable?
>
> cmpGEP (0007):
> > + int cmpGEP(const GEPOperator *GEP1,
> const GEPOperator *GEP2);
> > + int cmpGEP(const GetElementPtrInst *GEP1,
> > + const GetElementPtrInst *GEP2) {
> >
> > Const members?
> We can't make it constant, since it compares
> values (cmpValues, aka
> enumerate). So, if we see Value first time,
> we have to add it into
> sn_mapL/R.
>
> > I think you should put this under if (DL)
> { /* declare Offset1,
> Offset2, etc. */ }
> Fixed.
>
> About 0008:
> > Did you mean "cmpOperation"?
> Yes. We could keep it in one place. I have
> fixed this case and
> removed
> TODO.
>
> > "compare" still returns "bool". I'm going
> to assume this was
> meant to
> go in 0009.
> Yes.
>
> About 0011:
> > std::set is frowned upon, see
> http://llvm.org/docs/__ProgrammersManual.html#set
> <http://llvm.org/docs/ProgrammersManual.html#set>
> . Do you actually
> rely
> on the stable iterator guarantee? Or would
> another set-like container
> be a
> better fit?
> Huh.. I have looked for alternatives.
> Unfortunately SmallSet is not
> suitable, since we need to lookup existing
> items, and SmallSet::find
> method is not present. May be I have missed
> something.
> Actually we need binary tree implementation.
> Do we have better
> analogue?
> I could implement new one. Think it should
> be just one more patch
> afterwards.
>
> >No. An identifier with underscore then
> capital letter is reserved
> for
> the implementation. You can just call them
> "F" and "DL", then the
> ctor
> initializer list can be "F(F), DL(DL)" and
> that will work fine.
> Fixed.
>
> 0013:
> > Sorry, I'm not sure what this sentence
> means? The thing your
> example
> is talking about is showing is that two
> functions in an SCC may be
> equivalent, and detecting them requires
> "blinding" (ignoring) certain
> details of the function when you do the
> comparison. We blind
> ourselves
> to the pointee types, but not to callees of
> functions in the same
> SCC.
>
> I meant generalising of cross-reference case:
> in old implementation, while comparing F and
> G functions, we treat as
> equal case when F uses G, with case when G
> uses F (at the same
> place).
> It could happen that F uses G1, while G1
> uses G2, while G2 uses F.
> So it
> still the same. But currently we have to
> invent way how to detect
> such
> cases.
>
> Thanks for reviews!
> -Stepan
>
> Stepan Dyatkovskiy wrote:
>
> Hi Nick,
> I have committed 0001 as r203788.
> I'm working on fixes for 0002 - 0014.
>
> > After reading through this patch
> series, I feel like I'm missing
> > something important. Where's the sort
> function? It looks like
> we're
> > still comparing all functions to all
> other functions.
> When you insert functions into std::set
> or its analo g s it does all
> the
> job for you. Since internally it builds
> a binary tree using "less"
> comparison, and each insert/look-up
> operation takes O(log(N)) time.
>
> -Stepan.
>
> Nick Lewycky wrote:
>
> On 27 February 2014 08:23, Stepan
> Dyatkovskiy <stpworld at narod.ru
> <mailto:stpworld at narod.ru>
> <mailto:stpworld at narod.ru
> <mailto:stpworld at narod.ru>>> wrote:
>
> Hi Nick,
>
> I tried to rework changes as you
> requested. One of patches (0004
> with extra assertions) has been removed.
>
>
> > + bool isEquivalentType(Type
> *Ty1, Type *Ty2) const {
> > + return cmpType(Ty1, Ty2) == 0;
> > + }
> >
> > Why do we still need
> isEquivalentType? Can we nuke this?
> Yup. After applying all the patches
> isEquivalentType will be
> totally
> replaced with cmpType. All isEqXXXX
> and friends will be removed in
> 0011 (old 0012). No traces left.
> Old function wasn't removed in 0001
> just for keeping patches
> without
> extra noise like:
>
> - something that uses isEquivalentType
> + something that uses cmpType
>
> The point is, that "something" that
> uses isEquivalentType, will be
> also replaced with one of next
> patches in this series.
>
>
> >
> > +static int cmpNumbers(uint64_t
> L, uint64_t R) {
> > + if (L < R) return -1;
> > + if (L > R) return 1;
> > + return 0;
> > +}
> >
> > At a high level, you don't
> actually need a <=> operator to use a
> sort. A
> > strict ordering ( < operator) is
> sufficient. (Note that for
> mergefunc, a
> > strict weak ordering is not
> sufficient, it must be a total
> ordering.)
> That could be done with int
> FunctionComparator::compare(). We can
> replace it with bool
> FunctionComparator::less(). Though
> for all
> other cmp methods need at least 2
> bits of information as result:
> 1. Whether things are equal.
> 2. Whether left less than right.
>
> As for
> FunctionComparator::compare(),
> conversion into less() will
> increase time of sanity check (patch
> #0010).
> Sanity check is just a sweet bonus.
> It checks that ordering
> implemented properly (checks order
> relation properties).
> Turning compare() into less() mean,
> that we'll have to run
> comparison two times: L.less(R) and
> R.less(L). But may be sanity
> check is not a thing to be published
> at all.
>
>
> >
> > Consider hoisting this inside the
> FunctionComparator class? That
> class
> > should have a bunch of
> implementations of comparisons between
> various
> > different things, which can pass
> down to other methods in the
> same class.
> In new patch series attached to this
> post, I have moved all static
> methods into FunctionComparator.
>
>
> > + // Replacement for
> type::canLosslesslyBitCastTo, that
> > + // establish order relation on
> this kind of properties.
> > + int
> checkForLosslessBitcast(const Type
> *L, const Type *R);
> >
> > Type:: not type:: . Please make
> this comment more descriptive.
> Done.
> [new comment]
> Replacement for
> Type::canLosslesslyBitCastTo, that
>
> establish order relation on this
> kind of properties
> Returns 0, if L and R types could be
> converted to each other
> without
> reinterpretation of bits.
> Otherwise method returns -1 or 1,
> defining total ordering between
> types in context of lossless
> bitcastability trait.
> E.g.: if L is less than R (result is
> -1), than every type that
> could be
> losslessly bitcasted to L is less
> than R.
> [/new comment]
>
>
> >
> > + /// Replacement for C1 ==
> getBitCast(C2, C1Ty)
> > + /// Its more controllable, and
> supposed to be simpler and
> more
> > predictionable.
> > + /// As very important
> advantage: it allows to introduce order
> relation on
> > + /// constants set, and thus use
> it as trait in refinement
> routines.
> >
> > "Its" --> "It's".
> "predictionable" --> "predictable".
> And how is
> it more
> > predictable? I think this comment
> would be better if it
> described the
> > function instead of making
> comparisons between it and other
> functions.
> > Something like, "Compare
> constants under a system where pointer
> to X and
> > pointer to Y are considered
> equal" or whatever is actually true
> here.
> Done.
> [new comment]
>
> Replacement for C1 == getBitCast(C2,
> C1Ty)
> Parses constants contents, assuming
> that types are losslessly
> bitcasted between each other. So
> actually it ignores types and only
> compares bits from L and R.
> Returns 0, if L and R has equivalent
> content.
> -1 or 1 if values are different.
> Maintaining total ordering
> requires
> two values that indicates
> non-equivalence (L less R, L greater R).
> [/new comment]
>
>
> >
> > +enum ConstantType {
> > I'm not sure that this buys you
> much. All the "isa" tests can be
> broken
> > down into a switch on
> getValueID() with the one exception of
> isNullValue().
> Done.
>
>
> > + assert(
> > +
> C1->getType()->____canLosslesslyBitCastTo(C2->____getType())
> &&
> > + "Pass is healthless.
> checkForLosslessBitcast should be
> twin of "
> > + "canLosslesslyBitCastTo method,
> except the case when the
> last one "
> > + "returns false, the first one
> should return -1 or 1");
> ...
>
> > I think we can skip the asserts
> here. They aren't detecting a
> specific
> > bug, they're checking whether the
> new code does a certain task
> relative
> > to the old code. Drop the old
> code, your new code is the new
> sheriff in
> > town.
> Asserts has been removed.
>
>
> >
> > + DenseMap<const Value*, int>
> sn_map1, sn_map2;
> >
> > What is sn short for ("seen")?
> Why are there two of these?
> Serial number :-)
>
> >
> > + std::pair<DenseMap<const Value
> *, int>::iterator, bool>
> > + LeftSN =
> sn_map1.insert(std::make_pair(____V1,
> sn_map1.size())),
> > + RightSN =
> sn_map2.insert(std::make_pair(____V2,
> sn_map2.size()));
> >
> > So I think I get it, this is easy
> to reason about. You number
> each value
> > going down both functions. But I
> think you can eliminate one of
> these
> > maps because you know that if the
> left and right were ever
> different
> > then we terminate the comparison
> immediately. You can at least
> share one
> > map with both V1 and V2, but I
> think you can reduce the number
> of map
> > insertions.
> Not sure. Consider that in left you
> have:
> %A = alloca i32
> %B = alloca i32
> store i32 1, i32* %A
>
> And in right:
> %A = alloca i32
> %B = alloca i32
> store i32 1, i32* %B
>
> When we meet 'store' instruction we
> need to check in which order %A
> was allocated at left and in which
> order %B was allocated at right.
> So for each value in function we
> have to assign serial number. Then
> comparing to local values from
> different functions we can just
> check
> their serial numbers.
> Though, may be I missed something?
> May be there is another way to
> determine "who was first".
>
>
> > High-level question, are
> attributes ordered? Your comparison is
> ordered.
> > So if I have function F with
> string attributes "sse2",
> "gc=hemisphere"
> > and another function G with
> string attributes "gc=hemisphere",
> "sse2"
> > then they would be considered
> different. I don't think that's
> right.
> I didn't check it. But if it is not
> ordered, we could order it
> lexicographically.
>
>
> >
> > + int cmpOperation(const
> Instruction *I1, const Instruction
> *I2)
> const;
> > +
> > bool isEquivalentOperation(const
> Instruction *I1,
> > - const Instruction *I2) const;
> > + const Instruction *I2) const {
> > + return cmpOperation(I1, I2) == 0;
> > + }
> >
> > Hopefully this patch series
> ultimately eliminates calls of
> > isEquivalentOperation?
> Of course. Just like isEquivalentType.
>
>
> > By the way, this is an
> interesting one. Should "add x, y" and
> "add nsw
> > x, y" be equal or not?
> Yeah. Those are interesting
> questions. We could even treat 'mul a,
> 2' and 'shl a,1' as equal in some
> cases. I think it's matter of
> time
> and further optimization.
>
> Please find first reworked patches
> in attachment.
>
>
> I am extremely concerned that you
> may end up with A < B and B < C
> but A
> > C. There are cases where you use
> cmpTypes to compare two types,
> then
> others where you compare them via
> cmpNumbers. The only way to
> ensure
> that the sort function is
> self-consistent is to be entirely
> consistent
> with how we traverse the function.
>
> 0001:
> + int cmpType(Type *Ty1, Type *Ty2)
> const;
> +
>
> Please add a comment for this
> method. Include the meaning of the
> returned value. ("man strcmp" for
> inspiration.)
>
> + static int cmpNumbers(uint64_t L,
> uint64_t R);
>
> Optional: is there any benefit to
> making this static instead of a
> const
> method? It doesn't need access to
> the 'this' pointer, but it seems
> like
> that's an incidental artifact of the
> implementation. The other cmp*
> members are const methods.
>
> + return
> cmpNumbers(FTy1->isVarArg(),
> FTy2->isVarArg());;
>
> Extra semi-colon.
>
> I trust you to apply the fixes
> above, you can choose to commit the
> patch
> without going through another round
> of review with me.
>
> 0002:
>
> +int
> FunctionComparator::__checkForLosslessBitcast(const
> Type *L,
> const
> Type *R) {
> ...
> + int TypesRes =
> cmpNumbers((uint64_t) L, (uint64_t) R);
> ...
> + return TypesRes;
>
> Why isn't this using cmpType?
> Alternatively, any reason not to have
> cmpPointers(const void *L, const
> void *R)?
>
> + }
> + else {
>
> Please put the else on the same line
> as the closing brace.
>
> + else if (const VectorType *thisPTy
> = dyn_cast<VectorType>(L))
>
> Missing initial capital, see
> http://llvm.org/docs/__CodingStandards.html#name-__types-functions-variables-and-__enumerators-properly
> <http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly>
>
>
>
>
>
>
>
> . Also, I find it odd that you
> switch from "L" and "R" to "this"
> and
> "that". Please make those
> consistent, probably on left and right.
>
> 0003:
>
> + int cmpConstants(const Constant
> *L, const Constant *R);
>
> Any reason this method isn't const?
>
> +int
> FunctionComparator::__cmpConstants(const
> Constant *L, const
> Constant
> *R) {
> +
> + // Pack null value as one more
> Value ID
> + unsigned LType = L->isNullValue()
> ? 0 : L->getValueID() + 1;
> + unsigned RType = R->isNullValue()
> ? 0 : R->getValueID() + 1;
> +
> + if (int Res = cmpNumbers(LType,
> RType))
> + return Res;
> +
> + if (LType == 0)
> + return 0;
> +
> + switch (LType-1) {
>
> Optional: how about:
>
> if (L->isNullValue() &&
> R->isNullValue())
> return cmpType(L->getType(),
> R->getType());
> if (L->isNullValue() &&
> !R->isNullValue())
> return 1;
> if (!L->isNullValue() &&
> R->isNullVaue())
> return -1;
>
> if (int Res =
> cmpNumbers(L->getValueID(),
> R->getValueID()))
> return Res;
>
> switch (L->getValueID()) {
>
> ? In particular I'm not a fan of how
> LType/RType are equal to
> valueID+1,
> even with the comment.
>
> + case Value::FunctionVal:
> + case Value::GlobalVariableVal:
> + case Value::GlobalAliasVal:
> + default: // Unknown constant
> + return cmpNumbers((uint64_t)L,
> (uint64_t)R);
>
> Please assert on unknown constant.
>
> How are function, global variable
> and alias reachable here?
>
> 0004:
>
> Thanks for the comment on sn_mapL/R!
>
> +int
> FunctionComparator::__cmpEnumerate(const
> Value *V1, const Value
> *V2) {
>
> "Compare enumerate"? This name
> doesn't make sense to me. "cmpValue"
> perhaps?
>
> + const Constant *C1 =
> dyn_cast<Constant>(V1);
> + const Constant *C2 =
> dyn_cast<Constant>(V2);
> + if (C1 && C2) {
> + if (V1 == V2) return 0;
> // TODO: constant expressions with
> GEP or references to F1 or F2.
> - if (C1->isNullValue() &&
> C2->isNullValue() &&
> - isEquivalentType(C1->getType()__,
> C2->getType()))
> - return true;
> - // Try bitcasting C2 to C1's type.
> If the bitcast is legal and
> returns C1
> - // then they must have equal bit
> patterns.
> - return
> checkForLosslessBitcast(C1->__getType(),
> C2->getType()) ==
> 0 &&
> - cmpConstants(C1, C2) == 0;
> + if (C1->isNullValue() &&
> C2->isNullValue()) {
> + if (int Res =
> cmpType(C1->getType(), C2->getType()))
> + return Res;
> + }
> +
> + // Check whether C2 and C1 has
> equal bit patterns.
> + if (int Res =
> checkForLosslessBitcast(C1->__getType(),
> C2->getType()))
> + return Res;
> +
> + // Compare C1 and the bitcast result.
> + if (int Res = cmpConstants(C1, C2))
> + return Res;
> +
> + return 0;
> }
> + if (C1)
> + return -1;
> + if (C2)
> + return 1;
>
> I'm confused why this isn't just
> using cmpConstants? I think the
> isNullValue to cmpType check is
> already in cmpConstants, why not
> move
> the checkForLosslessBitcast there?
> Is cmpConstants used from
> elsewhere
> and needs to not have that check?
>
> + std::pair<DenseMap<const Value *,
> int>::iterator, bool>
> + LeftSN =
> sn_mapL.insert(std::make_pair(__V1,
> sn_mapL.size())),
> + RightSN =
> sn_mapR.insert(std::make_pair(__V2,
> sn_mapR.size()));
>
> "auto"?
>
> 0005:
>
> Okay, this makes attributes ordered.
>
> + enum AttrType {
> + Enum,
> + Align,
> + Other
> + } LeftAttrType = Other,
> RightAttrType = Other;
> +
> + if (LA.isAlignAttribute())
> LeftAttrType = Align;
> + else if (LA.isEnumAttribute())
> LeftAttrType = Enum;
> + if (RA.isAlignAttribute())
> RightAttrType = Align;
> + else if (RA.isEnumAttribute())
> RightAttrType = Enum;
> +
> + if (int Res =
> cmpNumbers(LeftAttrType, RightAttrType))
> + return Res;
> +
> + switch (LeftAttrType) {
> + case Enum:
> + if (int Res =
> cmpNumbers(LA.getKindAsEnum(),
> RA.getKindAsEnum()))
> + return Res;
> + break;
> + case Align:
> + if (int Res =
> cmpNumbers(LA.getValueAsInt(),
> RA.getValueAsInt()))
> + return Res;
> + break;
> + case Other:
> + if (int Res =
> cmpStrings(LA.getKindAsString(__),
> RA.getKindAsString()))
> + return Res;
> + if (int Res =
> cmpStrings(LA.__getValueAsString(),
> RA.getValueAsString()))
> + return Res;
> + break;
> + }
>
> Attributes already have operator<
> and operator==. Please reuse
> them.
>
> 0006:
>
> This looks fine.
>
> 0007:
>
> + int cmpGEP(const GEPOperator
> *GEP1, const GEPOperator *GEP2);
> + int cmpGEP(const GetElementPtrInst
> *GEP1,
> + const GetElementPtrInst *GEP2) {
>
> Const members?
>
> + unsigned BitWidth = DL ?
> DL->getPointerSizeInBits(AS1) : 1;
> + APInt Offset1(BitWidth, 0),
> Offset2(BitWidth, 0);
> + if (DL &&
> +
> (GEP1->__accumulateConstantOffset(*DL,
> Offset1) &&
> +
> GEP2->__accumulateConstantOffset(*DL, Offset2)))
> + return cmpAPInt(Offset1, Offset2);
>
> I think you should put this under if
> (DL) { /* declare Offset1,
> Offset2,
> etc. */ }
>
> 0008:
>
> + // TODO: Already checked in cmpOps
>
> Did you mean "cmpOperation"?
>
> - if (!enumerate(F1BB, F2BB) ||
> !compare(F1BB, F2BB))
> + if (!enumerate(F1BB, F2BB) ||
> compare(F1BB, F2BB) != 0)
>
> "compare" still returns "bool". I'm
> going to assume this was meant
> to go
> in 0009.
>
> 0009:
>
> Looks fine.
>
> 0010: not reviewing this now.
>
> 0011:
>
> Yay!
>
> 0012:
>
> + typedef std::set<FunctionPtr>
> FnTreeType;
>
> std::set is frowned upon, see
> http://llvm.org/docs/__ProgrammersManual.html#set
> <http://llvm.org/docs/ProgrammersManual.html#set>
> . Do you actually
> rely
> on the stable iterator guarantee? Or
> would another set-like
> container be
> a better fit?
>
> + FunctionPtr(Function *_F, const
> DataLayout *_DL) : F(_F),
> DL(_DL) {}
>
> No. An identifier with underscore
> then capital letter is reserved
> for
> the implementation. You can just
> call them "F" and "DL", then the
> ctor
> initializer list can be "F(F),
> DL(DL)" and that will work fine.
>
> 0013:
>
> +// a ² a (reflexivity)
>
> I'm seeing a "squared" between the
> two 'a's. Could you represent
> this in
> plain ASCII?
>
> +// Comparison iterates through each
> instruction in each basic
> block.
>
> Two spaces before "iterates" should
> be one.
>
> +// While ability to deal with
> complex references could be really
> perspective.
>
> Sorry, I'm not sure what this
> sentence means? The thing your
> example is
> talking about is showing is that two
> functions in an SCC may be
> equivalent, and detecting them
> requires "blinding" (ignoring)
> certain
> details of the function when you do
> the comparison. We blind
> ourselves
> to the pointee types, but not to
> callees of functions in the same
> SCC.
>
> 0014:
>
> Yay again!
>
>
> After reading through this patch
> series, I feel like I'm missing
> something important. Where's the
> sort function? It looks like we're
> still comparing all functions to all
> other functions.
>
> Nick
>
>
> _________________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu
> <mailto:LLVMdev at cs.uiuc.edu>
> http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
> <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>
>
>
> _________________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
> http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/__mailman/listinfo/llvmdev
> <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>
>
>
>
> _________________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> <mailto:llvm-commits at cs.uiuc.edu>
> http://lists.cs.uiuc.edu/__mailman/listinfo/llvm-commits
> <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>
>
>
>
>
> _________________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
> http://lists.cs.uiuc.edu/__mailman/listinfo/llvm-commits
> <http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
More information about the llvm-commits
mailing list