[clangd-dev] [monorepo] Much improved downstream zipping tool available

David Greene via clangd-dev clangd-dev at lists.llvm.org
Tue Jan 29 10:33:03 PST 2019


Björn Pettersson A <bjorn.a.pettersson at ericsson.com> writes:

> In the new monorepo UC1 may or may not be a parent to UL1.
> We could actually have something like this:
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1
>
> Our DL1 commit should preferably have UL1 as parent after
> conversion
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1
>                        |
>                  ...->DL1
>
> but since it also includes DC1 (via submodule reference) we
> want to zip in DC1 before DL1, right? 
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1
>                        |
>             ...->DC1->DL1
>
> The problem is that DC1 is based on UC1, so we would get something
> like this
>
>   UL4->UC2->UL3->UL2->UL1->UL0->UC1
>                        |         |
>             ...->DC1->DL1        |
>                   ^              |
>                   |              |
>                    --------------
>
> Which is not correct, since then we also get the UL0 commit
> as predecessor to DL1.

To be clear, is DC1 a commit that updates the clang submodule to UC1 and
DL1 a separate local commit to llvm that merges in UL1?

When zip-downstream-fork.py runs, it *always* uses the exact trees in
use by each downstream commit, whether from submodules or the umbrella
itself.  It tries very hard to maintain the state of the trees as they
appeared in the umbrella repository.

Since in your case llvm isn't a submodule (it's the "umbrella"), DL1
will absolutely have the tree from UL1, not UL0.  This is how
migrate-downstream-fork.py works, and zip-downstream-fork.py won't touch
the llvm tree since it's not a submodule.  The commit DL1 doesn't update
any submodules, so it will just use the clang tree from DC1.

I haven't tested this case explicitly but I would expect the resulting
history graph to look as you diagrammed above (reformatted to make it
clear there isn't a cycle):

   UL4->UC2->UL3->UL2->UL1->UL0->UC1 <- monorepo/master
                        |         |
                        \         |
                         `-----------.        
                                  |   \
                           ... ->DC1->DL1 <- zip/master

The "redundant" edge here indicates that the state of the llvm tree
at DL1 is based on UL1, not UL0.  All other projects will be in the
state at UC1 (assuming you don't have other submodules under llvm).  I
know it looks strange but this is the best I could come up with because
in general there is no guarantee that submodule updates were in any way
correlated with when upstream commits were made (as you discovered!).
There's some discussion of this issue in the documentation I posted [1],
as well as in the header comments in zip-downstream-fork.py.
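
To make the ancestry concrete, here is a minimal Python sketch that
models the graph diagrammed above as a plain parent map.  It is only a
toy model, not code from zip-downstream-fork.py: the commit names are
the placeholders used in this thread, the downstream history before DC1
is elided, and the helper names (PARENTS, ancestors, missing_from) are
hypothetical.

```python
# Toy model of the zipped history; each commit maps to its parents.
# Names are the placeholders from this thread, not real hashes.
PARENTS = {
    "UL4": [],
    "UC2": ["UL4"],
    "UL3": ["UC2"],
    "UL2": ["UL3"],
    "UL1": ["UL2"],
    "UL0": ["UL1"],
    "UC1": ["UL0"],
    "DC1": ["UC1"],          # earlier downstream parents are elided
    "DL1": ["DC1", "UL1"],   # the second parent is the "redundant" edge
}


def ancestors(commit, parents=PARENTS):
    """Return every commit reachable from `commit` via parent links."""
    seen = set()
    stack = list(parents[commit])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def missing_from(branch_tip, upstream_tip, parents=PARENTS):
    """Commits reachable from upstream_tip but not from branch_tip."""
    have = ancestors(branch_tip, parents) | {branch_tip}
    want = ancestors(upstream_tip, parents) | {upstream_tip}
    return want - have


# UL0 is reachable from DL1 through DC1 -> UC1 -> UL0.
print("UL0" in ancestors("DL1"))      # True
# Nothing on monorepo/master (UC1) looks new relative to zip/master (DL1).
print(missing_from("DL1", "UC1"))     # set()
```

In this model UL0 is an ancestor of DL1 even though the llvm tree at
DL1 comes from UL1, so the history and the tree contents disagree about
whether the UL0 change is present.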

The difficulty with this is that going forward, if you merge from
monorepo/master, git will think you already have the changes from UL0.
There are at least two ways to work around this issue.  The first is to
just manually apply the llvm diff from UL1 to UL0 on top of zip/master
and then merge from monorepo/master after that.  The other way is to
freeze your local split repositories and merge from the upstream split
masters for all subprojects before running migrate-downstream-fork.py
and zip-downstream-fork.py.  Then everything will have the most
up-to-date trees and you should be fine going forward.  Doing such a
merge isn't possible for everyone at the time they want to migrate, but
the manual diff/patch method should suffice for those situations.  You
just have to somehow remember to do it before the next merge from
upstream.  Creating an auxiliary branch with the patch applied is one
way to remember.
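
The manual diff/patch method could be scripted roughly as below.  This
is a sketch under assumptions, not something the migration scripts
provide: UL1 and UL0 stand in for the real commit hashes, the branch
name llvm-fixup and the helpers fixup_plan and run_plan are
hypothetical, and it assumes the llvm sources live under llvm/ on
zip/master.  The patch may not apply cleanly if DL1 carries local llvm
changes, in which case conflicts would have to be resolved by hand.

```python
import subprocess


def fixup_plan(older, newer, base="zip/master", aux="llvm-fixup", path="llvm"):
    """Return the git commands (as argv lists) for the manual workaround."""
    return [
        # Create the auxiliary branch on top of the zipped history.
        ["git", "checkout", "-b", aux, base],
        # Produce the llvm-only diff between the two upstream commits.
        ["git", "diff", "--binary", older, newer, "--", path],
        # Apply that diff (read from stdin) to the working tree and index.
        ["git", "apply", "--index"],
        ["git", "commit", "-m", f"Apply {path} changes from {older} to {newer}"],
    ]


def run_plan(plan):
    """Execute the plan, piping the output of `git diff` into `git apply`."""
    branch, diff, apply, commit = plan
    subprocess.run(branch, check=True)
    patch = subprocess.run(diff, check=True, capture_output=True).stdout
    subprocess.run(apply, input=patch, check=True)
    subprocess.run(commit, check=True)


# Dry run: print the commands instead of executing them.
for argv in fixup_plan("UL1", "UL0"):
    print(" ".join(argv))
```

Calling run_plan(fixup_plan(...)) inside the migrated repository would
leave the patch on the auxiliary branch, ready to be merged before the
next merge from monorepo/master.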

I haven't really thought of a better way to handle situations like this
so I'm open to ideas!

                           -David

[1] https://reviews.llvm.org/D56550

