[PATCH] D11240: Add basic #include sorting functionality to clang-format

Tue Sep 22 09:07:42 PDT 2015

klimek added a comment.

Sending another batch of comments.

================
Comment at: lib/Tooling/Core/Replacement.cpp:305-307
@@ +304,5 @@
+  Replacements Result;
+  // Iterate over both sets and always add the next element (smallest total
+  // Offset) from either 'First' or 'Second'. Merge that element with
+  // subsequent replacements as long as they overlap.
+  for (auto FirstI = First.begin(), SecondI = Second.begin();
----------------
Merge the two sets which are each sorted by offset:
Look at the the next replacement from both sets and check whether they overlap:
1. as long as they do not overlap, we add the replacement with the lower offset to the result set
2. when they do overlap, merge into the replacement with the lower offset:
    repeatedly take the next replacement from the other queue and merge it, until the end of the
    next replacement is past the end of the currently merged into replacement;
    in that case, the next replacement becomes the currently merged into replacement;

Invariants:
- we always merge replacements from the first set into a replacement from the second set or vice versa
- an invariant while merging is that the offset of the next replacement to merge is always >= the offset
  of the currently merged-into replacement
- more?

Position projections:
When comparing positions, we calculate the projection of positions in the second (later) set into where they would happen in the original string.
In the above algorithm, we can always project the start position of a replacement from the second set correctly: only a replacement with a smaller offset can influence the start position of a replacement from the second set; due to the invariant, we never try to merge a replacement from the second set before we have merged all replacements with a smaller offset.
We cannot generally project the end position of a replacement from the second set correctly, but:
- we can correctly project it correctly if it is smaller or equal to the end position of the replacement
   from the first set we currently compare to
- that means if we cannot correctly project it, we still know it is larger than the end position of the
   replacement from the first set we currently compare against, so the replacement from the second set
   will stay the element to merge into, and we will continuously adapt the end position's projection
   when we merge further replacements from the first set.

================
Comment at: lib/Tooling/Core/Replacement.cpp:308-309
@@ +307,4 @@
+  // subsequent replacements as long as they overlap.
+  for (auto FirstI = First.begin(), SecondI = Second.begin();
+       FirstI != First.end() || SecondI != Second.end();) {
+    // 'MergeSecond' is true if an element from 'Second' needs to be merged
----------------
Comment:
// We get here on the first run or when we ended a merge sequence (that is, we found that
// we are at the start of a new replacement in the result set).

(not sure whether that comment carries it's weight; I started it because it took me a moment to figure out why we don't compute which one to take next during merging, and it's a bit more complex, but not sure where to put the comment yet; basically, the problem is that if the next replacement at the end doesn't overlap, it might also be after a non-overlapping replacement from the other set, which I think is not entirely obvious).

================
Comment at: lib/Tooling/Core/Replacement.cpp:310-314
@@ +309,7 @@
+       FirstI != First.end() || SecondI != Second.end();) {
+    // 'MergeSecond' is true if an element from 'Second' needs to be merged
+    // next, and false if an element from 'First' should be merged. As the input
+    // replacements are non-overlapping (within each set), we always need to
+    // merge an element from 'Second' into an element from 'First' or vice
+    // versa.
+    bool MergeSecond = SecondI == Second.end() ||
----------------
Ok, I think I now understand what confuses me here:
When we do *not* merge (the replacements don't overlap), we take the other element next into the result set. A less specific name like 'FirstNext', or 'NextFromFirst', or even 'MergedIsFirst' (which would fit the 'Merged' prefix of the attributes later) would probably fit my mental model better.
Note that for me there's also ambiguity (at least on the first read) between 'merge into the result set' and 'merge (subsume) one replacement into the current one'.

For me calling this:
NextIsFirst, NextOffset, NextLength, NextReplacement
would help.

================
Comment at: lib/Tooling/Core/Replacement.cpp:326
@@ +325,3 @@
+    unsigned MergedLength = R.getLength();
+    std::string MergedText = R.getReplacementText();
+
----------------
This I'm less sure about, but I think 'Text' is somewhat ambiguous here - perhaps 'MergedReplaceWith'?

================
Comment at: lib/Tooling/Core/Replacement.cpp:354
@@ +353,3 @@
+        StringRef Head = MergedRef.substr(0, Offset - MergedOffset);
+        StringRef Tail = MergedRef.substr(End - MergedOffset);
+        MergedText = (Twine(Head) + SecondI->getReplacementText() + Tail).str();
----------------
There are various overloads of + that yield Twine?

http://reviews.llvm.org/D11240