1 ==============================
2 Moving LLVM Projects to GitHub
3 ==============================
8 We are planning to complete the transition to GitHub by Oct 21, 2019. See
9 the GitHub migration `status page <https://llvm.org/GitHubMigrationStatus.html>`_
10 for the latest updates and instructions for how to migrate your workflows.
12 .. contents:: Table of Contents
19 This is a proposal to move our current revision control system from our own
20 hosted Subversion to GitHub. Below are the financial and technical arguments as
21 to why we are proposing such a move and how people (and validation
22 infrastructure) will continue to work with a Git-based LLVM.
24 What This Proposal is *Not* About
25 =================================
27 Changing the development policy.
29 This proposal relates only to moving the hosting of our source-code repository
30 from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
31 using GitHub's issue tracker, pull-requests, or code-review.
Contributors will continue to earn commit access on demand under the Developer
Policy, except that a GitHub account will be required instead of an SVN
username/password hash.
37 Why Git, and Why GitHub?
38 ========================
43 This discussion began because we currently host our own Subversion server
44 and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
45 provides limited support, but there is only so much it can do.
47 Volunteers are not sysadmins themselves, but compiler engineers that happen
48 to know a thing or two about hosting servers. We also don't have 24/7 support,
49 and we sometimes wake up to see that continuous integration is broken because
50 the SVN server is either down or unresponsive.
52 We should take advantage of one of the services out there (GitHub, GitLab,
53 and BitBucket, among others) that offer better service (24/7 stability, disk
54 space, Git server, code browsing, forking facilities, etc) for free.
59 Many new coders nowadays start with Git, and a lot of people have never used
60 SVN, CVS, or anything else. Websites like GitHub have changed the landscape
61 of open source contributions, reducing the cost of first contribution and
62 fostering collaboration.
Git is also the version control system many LLVM developers use. Despite the
sources being stored in an SVN server, these developers are already using Git
through the Git-SVN integration.
70 * Commit, squash, merge, and fork locally without touching the remote server.
71 * Maintain local branches, enabling multiple threads of development.
72 * Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
73 * Inspect the repository history (blame, log, bisect) without Internet access.
74 * Maintain remote forks and branches on Git hosting services and
75 integrate back to the main repository.
77 In addition, because Git seems to be replacing many OSS projects' version
78 control systems, there are many tools that are built over Git.
79 Future tooling may support Git first (if not only).
GitHub, like GitLab and BitBucket, provides free code hosting for open source
projects. Any of these could replace the code-hosting infrastructure that we
maintain today.
88 These services also have a dedicated team to monitor, migrate, improve and
89 distribute the contents of the repositories depending on region and load.
91 GitHub has one important advantage over GitLab and
92 BitBucket: it offers read-write **SVN** access to the repository
93 (https://github.com/blog/626-announcing-svn-support).
94 This would enable people to continue working post-migration as though our code
95 were still canonically in an SVN repository.
97 In addition, there are already multiple LLVM mirrors on GitHub, indicating that
98 part of our community has already settled there.
100 On Managing Revision Numbers with Git
101 -------------------------------------
103 The current SVN repository hosts all the LLVM sub-projects alongside each other.
104 A single revision number (e.g. r123456) thus identifies a consistent version of
105 all LLVM sub-projects.
Git does not use a sequential integer revision number but instead uses a hash to
identify each commit.
110 The loss of a sequential integer revision number has been a sticking point in
111 past discussions about Git:
113 - "The 'branch' I most care about is mainline, and losing the ability to say
114 'fixed in r1234' (with some sort of monotonically increasing number) would
115 be a tragic loss." [LattnerRevNum]_
116 - "I like those results sorted by time and the chronology should be obvious, but
117 timestamps are incredibly cumbersome and make it difficult to verify that a
118 given checkout matches a given set of results." [TrickRevNum]_
119 - "There is still the major regression with unreadable version numbers.
120 Given the amount of Bugzilla traffic with 'Fixed in...', that's a
121 non-trivial issue." [JSonnRevNum]_
122 - "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_.
However, Git can emulate this increasing revision number:
``git rev-list --count <commit-hash>``. This identifier is unique only
within a single branch, but this means the tuple `(num, branch-name)` uniquely
identifies a commit.
129 We can thus use this revision number to ensure that e.g. `clang -v` reports a
130 user-friendly revision number (e.g. `main-12345` or `4.0-5321`), addressing
131 the objections raised above with respect to this aspect of Git.
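As a rough illustration of the emulation described above, the following shell
sketch derives a ``branch-count`` identifier for the current checkout. The
function name ``llvm_revision_id`` and the exact output format are assumptions
made for this example; they are not part of the proposal or of any existing
LLVM tooling.

```shell
#!/bin/sh
# Hypothetical helper: print "<branch>-<count>" for the checked-out commit.
# The count is the number of commits reachable from HEAD, as reported by
# "git rev-list --count".
llvm_revision_id() {
  branch=$(git rev-parse --abbrev-ref HEAD) || return 1
  count=$(git rev-list --count HEAD) || return 1
  printf '%s-%s\n' "$branch" "$count"
}
```

With a linear history (no merge commits), the count grows by exactly one per
commit, which is what makes it usable as a revision number. On a detached
HEAD, ``git rev-parse --abbrev-ref HEAD`` prints ``HEAD`` instead of a branch
name, so a real tool would need to handle that case separately.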
133 What About Branches and Merges?
134 -------------------------------
136 In contrast to SVN, Git makes branching easy. Git's commit history is
represented as a DAG, a departure from SVN's linear history. However, we propose
to forbid merge commits in our canonical Git repository.
140 Unfortunately, GitHub does not support server side hooks to enforce such a
141 policy. We must rely on the community to avoid pushing merge commits.
143 GitHub offers a feature called `Status Checks`: a branch protected by
144 `status checks` requires commits to be explicitly allowed before the push can happen.
We could supply a pre-push hook on the client side that would run and check the
history before allowing the commit to be pushed [statuschecks]_.
However, this solution would be somewhat fragile (how do you update a script
installed on every developer machine?) and would prevent SVN access to the
repository.
152 -------------------------
154 We will need a new bot to send emails for each commit. This proposal leaves the
155 email format unchanged besides the commit URL.
157 Straw Man Migration Plan
158 ========================
160 Step #1 : Before The Move
161 -------------------------
163 1. Update docs to mention the move, so people are aware of what is going on.
2. Set up a read-only version of the GitHub project, mirroring our current SVN
   repository.
166 3. Add the required bots to implement the commit emails, as well as the
167 umbrella repository update (if the multirepo is selected) or the read-only
168 Git views for the sub-projects (if the monorepo is selected).
173 4. Update the buildbots to pick up updates and commits from the GitHub
174 repository. Not all bots have to migrate at this point, but it'll help
175 provide infrastructure testing.
176 5. Update Phabricator to pick up commits from the GitHub repository.
6. LNT and llvmlab have to be updated: they rely on a unique, monotonically
   increasing integer across branches [MatthewsRevNum]_.
7. Instruct downstream integrators to pick up commits from the GitHub
   repository.
181 8. Review and prepare an update for the LLVM documentation.
Until this point nothing has changed for developers; it will just
boil down to a lot of work for buildbot and other infrastructure
owners.
187 The migration will pause here until all dependencies have cleared, and all
188 problems have been solved.
190 Step #3: Write Access Move
191 --------------------------
193 9. Collect developers' GitHub account information, and add them to the project.
194 10. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
195 11. Update the documentation.
196 12. Mirror Git to SVN.
201 13. Archive the SVN repository.
202 14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
203 point to GitHub instead.
205 GitHub Repository Description
206 =============================
211 The LLVM git repository hosted at https://github.com/llvm/llvm-project contains all
212 sub-projects in a single source tree. It is often referred to as a monorepo and
213 mimics an export of the current SVN repository, with each sub-project having its
214 own top-level directory. Not all sub-projects are used for building toolchains.
215 For example, www/ and test-suite/ are not part of the monorepo.
Putting all sub-projects in a single checkout makes cross-project refactoring
simpler:
220 * New sub-projects can be trivially split out for better reuse and/or layering
221 (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
223 * Changing an API in LLVM and upgrading the sub-projects will always be done in
224 a single commit, designing away a common source of temporary build breakage.
* Moving code across sub-projects (during refactoring for instance) in a single
  commit enables accurate `git blame` when tracking code change history.
* Tooling based on `git grep` works natively across sub-projects, making it
  easier to find refactoring opportunities across projects (for example reusing a
  data structure initially in LLDB by moving it into libSupport).
230 * Having all the sources present encourages maintaining the other sub-projects
233 Finally, the monorepo maintains the property of the existing SVN repository that
234 the sub-projects move synchronously, and a single revision number (or commit
235 hash) identifies the state of the development across all projects.
237 .. _build_single_project:
239 Building a single sub-project
240 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
242 Even though there is a single source tree, you are not required to build
243 all sub-projects together. It is trivial to configure builds for a single
248 mkdir build && cd build
249 # Configure only LLVM (default)
250 cmake path/to/monorepo
251 # Configure LLVM and lld
252 cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld
253 # Configure LLVM and clang
254 cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang
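``LLVM_ENABLE_PROJECTS`` also accepts several sub-projects at once, separated
by semicolons. The sketch below builds that flag from a list of names; the
helper name ``enable_projects_flag`` is made up for this example and is not
part of the LLVM build system.

```shell
#!/bin/sh
# Hypothetical helper: join the given sub-project names with ";" and print
# the matching CMake definition.
enable_projects_flag() {
  joined=$(IFS=';'; printf '%s' "$*")
  printf '%s%s\n' '-DLLVM_ENABLE_PROJECTS=' "$joined"
}
```

For instance, ``enable_projects_flag clang lld`` prints
``-DLLVM_ENABLE_PROJECTS=clang;lld``. When passing the result to ``cmake``,
keep it quoted so the shell does not treat the semicolon as a command
separator.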
258 Outstanding Questions
259 ---------------------
261 Read-only sub-project mirrors
262 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
With the Monorepo, it is undecided whether the existing single-subproject
mirrors (e.g. https://git.llvm.org/git/compiler-rt.git) will continue to
be maintained.
268 Read/write SVN bridge
269 ^^^^^^^^^^^^^^^^^^^^^
271 GitHub supports a read/write SVN bridge for its repositories. However,
272 there have been issues with this bridge working correctly in the past,
273 so it's not clear if this is something that will be supported going forward.
278 * Using the monolithic repository may add overhead for those contributing to a
279 standalone sub-project, particularly on runtimes like libcxx and compiler-rt
280 that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs.
281 1GB for the monorepo), and the commit rate of LLVM may cause more frequent
282 `git push` collisions when upstreaming. Affected contributors may be able to
283 use the SVN bridge or the single-subproject Git mirrors. However, it's
284 undecided if these projects will continue to be maintained.
* Using the monolithic repository may add overhead for those *integrating* a
  standalone sub-project, even if they aren't contributing to it, due to the
  same disk space concern as the point above. The availability of the
  sub-project Git mirrors would address this.
289 * Preservation of the existing read/write SVN-based workflows relies on the
290 GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
291 into GitHub and could restrict future workflow changes.
* :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-checkout-commit>`.
297 * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
298 * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
299 * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
300 * :ref:`Bisecting <workflow-mono-bisecting>`.
302 Workflow Before/After
303 =====================
305 This section goes through a few examples of workflows, intended to illustrate
306 how end-users or developers would interact with the repository for
309 .. _workflow-checkout-commit:
311 Checkout/Clone a Single Project, with Commit Access
312 ---------------------------------------------------
319 # direct SVN checkout
320 svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
321 # or using the read-only Git view, with git-svn
322 git clone https://llvm.org/git/llvm.git
324 git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
325 git config svn-remote.svn.fetch :refs/remotes/origin/main
326 git svn rebase -l # -l avoids fetching ahead of the git mirror.
Commits are performed using `svn commit` or with the sequence `git commit` and
`git svn dcommit`.
331 .. _workflow-multicheckout-nocommit:
336 With the monorepo variant, there are a few options, depending on your
337 constraints. First, you could just clone the full repository:
339 git clone https://github.com/llvm/llvm-project.git
341 At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
342 :ref:`doesn't imply you have to build all of them <build_single_project>`. You
343 can still build only compiler-rt for instance. In this way it's not different
344 from someone who would check out all the projects with SVN today.
346 If you want to avoid checking out all the sources, you can hide the other
347 directories using a Git sparse checkout::
349 git config core.sparseCheckout true
350 echo /compiler-rt > .git/info/sparse-checkout
351 git read-tree -mu HEAD
353 The data for all sub-projects is still in your `.git` directory, but in your
354 checkout, you only see `compiler-rt`.
Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
usual.
358 Note that when you fetch you'll likely pull in changes to sub-projects you don't
359 care about. If you are using sparse checkout, the files from other projects
360 won't appear on your disk. The only effect is that your commit hash changes.
362 You can check whether the changes in the last fetch are relevant to your commit
365 git log origin/main@{1}..origin/main -- libcxx
367 This command can be hidden in a script so that `git llvmpush` would perform all
368 these steps, fail only if such a dependent change exists, and show immediately
369 the change that prevented the push. An immediate repeat of the command would
370 (almost) certainly result in a successful push.
371 Note that today with SVN or git-svn, this step is not possible since the
372 "rebase" implicitly happens while committing (unless a conflict occurs).
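One possible shape for such a wrapper is sketched below. Both function names
(``changes_touching`` and ``llvmpush``) are hypothetical, as are the remote
name ``origin`` and the branch name ``main``; no ``git llvmpush`` command
exists today.

```shell
#!/bin/sh
# Hypothetical building block for a "git llvmpush" wrapper.
# Lists the commits in the range $1..$2 that touch the directory $3.
changes_touching() {
  git log --oneline "$1..$2" -- "$3"
}

# Hypothetical wrapper: fetch, rebase, and push only if nothing in the most
# recent update of origin/main touches the sub-project given as $1
# (for example "libcxx").
llvmpush() {
  git fetch origin || return 1
  relevant=$(changes_touching 'origin/main@{1}' origin/main "$1") || return 1
  git rebase origin/main || return 1
  if [ -n "$relevant" ]; then
    echo "upstream changes touch $1; re-test before pushing:" >&2
    echo "$relevant" >&2
    return 1
  fi
  git push origin HEAD:main
}
```

The check relies on the reflog entry ``origin/main@{1}``, so it only sees the
most recent update of the remote-tracking branch. If a fetch brings in nothing
new, it re-reports the previous update, which a production script would want to
detect.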
374 Checkout/Clone Multiple Projects, with Commit Access
375 ----------------------------------------------------
Let's look at how to assemble llvm+clang+libcxx at a given revision.
384 svn co https://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
386 svn co https://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
388 svn co https://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION
392 git clone https://llvm.org/git/llvm.git
394 git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
395 git config svn-remote.svn.fetch :refs/remotes/origin/main
397 git checkout `git svn find-rev -B r258109`
399 git clone https://llvm.org/git/clang.git
401 git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
402 git config svn-remote.svn.fetch :refs/remotes/origin/main
404 git checkout `git svn find-rev -B r258109`
406 git clone https://llvm.org/git/libcxx.git
408 git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
409 git config svn-remote.svn.fetch :refs/remotes/origin/main
411 git checkout `git svn find-rev -B r258109`
413 Note that the list would be longer with more sub-projects.
415 .. _workflow-monocheckout-multicommit:
The repository natively contains the source for every sub-project at the right
revision, which makes this straightforward::
423 git clone https://github.com/llvm/llvm-project.git
425 git checkout $REVISION
427 As before, at this point clang, llvm, and libcxx are stored in directories
428 alongside each other.
430 .. _workflow-cross-repo-commit:
432 Commit an API Change in LLVM and Update the Sub-projects
433 --------------------------------------------------------
435 Today this is possible, even though not common (at least not documented) for
436 subversion users and for git-svn users. For example, few Git users try to update
437 LLD or Clang in the same commit as they change an LLVM API.
439 The multirepo variant does not address this: one would have to commit and push
440 separately in every individual repository. It would be possible to establish a
441 protocol whereby users add a special token to their commit messages that causes
442 the umbrella repo's updater bot to group all of them into a single revision.
444 The monorepo variant handles this natively.
446 Branching/Stashing/Updating for Local Development or Experiments
447 ----------------------------------------------------------------
SVN does not allow this use case, but developers who are currently using
git-svn can do it. Let's look at what this means in practice when dealing with
multiple sub-projects.
456 To update the repository to tip of trunk::
461 cd ../../projects/libcxx
464 To create a new branch::
466 git checkout -b MyBranch
468 git checkout -b MyBranch
469 cd ../../projects/libcxx
470 git checkout -b MyBranch
474 git checkout AnotherBranch
476 git checkout AnotherBranch
477 cd ../../projects/libcxx
478 git checkout AnotherBranch
480 .. _workflow-mono-branching:
485 Regular Git commands are sufficient, because everything is in a single
488 To update the repository to tip of trunk::
492 To create a new branch::
494 git checkout -b MyBranch
498 git checkout AnotherBranch
Assume a developer is looking for a bug in clang (or lld, or lldb, ...).
SVN does not have built-in bisection support, but the single revision number
across sub-projects makes it possible to script one.
511 Using the existing Git read-only view of the repositories, it is possible to use
512 the native Git bisection script over the llvm repository, and use some scripting
513 to synchronize the clang repository to match the llvm revision.
515 .. _workflow-mono-bisecting:
520 Bisecting on the monorepo is straightforward, and very similar to the above,
521 except that the bisection script does not need to include the
522 `git submodule update` step.
The same example, finding which commit introduces a regression where clang-3.9
crashes but clang-3.8 passes, will look like::
527 git bisect start releases/3.9.x releases/3.8.x
528 git bisect run ./bisect_script.sh
530 With the `bisect_script.sh` script being::
535 ninja clang || exit 125 # an exit code of 125 asks "git bisect"
536 # to "skip" the current commit
538 ./bin/clang some_crash_test.cpp
Also, since the monorepo handles commits that update multiple projects at once,
you're less likely to encounter a build failure where a commit changes an API in
LLVM and a later one "fixes" the build in clang.
544 Moving Local Branches to the Monorepo
545 =====================================
547 Suppose you have been developing against the existing LLVM git
548 mirrors. You have one or more git branches that you want to migrate
549 to the "final monorepo".
551 The simplest way to migrate such branches is with the
552 ``migrate-downstream-fork.py`` tool at
553 https://github.com/jyknight/llvm-git-migration.
558 Basic instructions for ``migrate-downstream-fork.py`` are in the
559 Python script and are expanded on below to a more general recipe::
561 # Make a repository which will become your final local mirror of the
564 git -C my-monorepo init
566 # Add a remote to the monorepo.
567 git -C my-monorepo remote add upstream/monorepo https://github.com/llvm/llvm-project.git
569 # Add remotes for each git mirror you use, from upstream as well as
570 # your local mirror. All projects are listed here but you need only
571 # import those for which you have local branches.
for p in "${my_projects[@]}"; do
  git -C my-monorepo remote add upstream/split/${p} https://github.com/llvm-mirror/${p}.git
  git -C my-monorepo remote add local/split/${p} https://my.local.mirror.org/${p}.git
done
589 # Pull in all the commits.
590 git -C my-monorepo fetch --all
592 # Run migrate-downstream-fork to rewrite local branches on top of
593 # the upstream monorepo.
596 migrate-downstream-fork.py \
599 --new-repo-prefix=refs/remotes/upstream/monorepo \
600 --old-repo-prefix=refs/remotes/upstream/split \
601 --source-kind=split \
602 --revmap-out=monorepo-map.txt
605 # Octopus-merge the resulting local split histories to unify them.
607 # Assumes local work on local split mirrors is on main (and
608 # upstream is presumably represented by some other branch like
610 my_local_branch="main"
612 git -C my-monorepo branch --no-track local/octopus/main \
613 $(git -C my-monorepo merge-base refs/remotes/upstream/monorepo/main \
614 refs/remotes/local/split/llvm/${my_local_branch})
615 git -C my-monorepo checkout local/octopus/${my_local_branch}
617 subproject_branches=()
for p in "${my_projects[@]}"; do
  subproject_branch=${p}/local/monorepo/${my_local_branch}
  git -C my-monorepo branch ${subproject_branch} \
    refs/remotes/local/split/${p}/${my_local_branch}
  if [[ "${p}" != "llvm" ]]; then
    subproject_branches+=( ${subproject_branch} )
  fi
done
627 git -C my-monorepo merge ${subproject_branches[@]}
for p in "${my_projects[@]}"; do
  subproject_branch=${p}/local/monorepo/${my_local_branch}
  git -C my-monorepo branch -d ${subproject_branch}
done
634 # Create local branches for upstream monorepo branches.
for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
  refs/remotes/upstream/monorepo); do
  upstream_branch=${ref#refs/remotes/upstream/monorepo/}
  git -C my-monorepo branch upstream/${upstream_branch} ${ref}
done
641 The above gets you to a state like the following::
643 U1 - U2 - U3 <- upstream/main
645 \ \ - Llld1 - Llld2 -
647 \ - Lclang1 - Lclang2-- Lmerge <- local/octopus/main
649 - Lllvm1 - Lllvm2-----
651 Each branched component has its branch rewritten on top of the
652 monorepo and all components are unified by a giant octopus merge.
654 If additional active local branches need to be preserved, the above
655 operations following the assignment to ``my_local_branch`` should be
656 done for each branch. Ref paths will need to be updated to map the
657 local branch to the corresponding upstream branch. If local branches
658 have no corresponding upstream branch, then the creation of
659 ``local/octopus/<local branch>`` need not use ``git-merge-base`` to
660 pinpoint its root commit; it may simply be branched from the
661 appropriate component branch (say, ``llvm/local_release_X``).
663 Zipping local history
664 ---------------------
666 The octopus merge is suboptimal for many cases, because walking back
667 through the history of one component leaves the other components fixed
668 at a history that likely makes things unbuildable.
670 Some downstream users track the order commits were made to subprojects
671 with some kind of "umbrella" project that imports the project git
672 mirrors as submodules, similar to the multirepo umbrella proposed
673 above. Such an umbrella repository looks something like this::
675 UM1 ---- UM2 -- UM3 -- UM4 ---- UM5 ---- UM6 ---- UM7 ---- UM8 <- main
677 Lllvm1 Llld1 Lclang1 Lclang2 Lllvm2 Llld2 Lmyproj1
679 The vertical bars represent submodule updates to a particular local
680 commit in the project mirror. ``UM3`` in this case is a commit of
681 some local umbrella repository state that is not a submodule update,
682 perhaps a ``README`` or project build script update. Commit ``UM8``
683 updates a submodule of local project ``myproj``.
685 The tool ``zip-downstream-fork.py`` at
686 https://github.com/greened/llvm-git-migration/tree/zip can be used to
687 convert the umbrella history into a monorepo-based history with
688 commits in the order implied by submodule updates::
690 U1 - U2 - U3 <- upstream/main
692 \ -----\--------------- local/zip--.
694 - Lllvm1 - Llld1 - UM3 - Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 <-'
697 The ``U*`` commits represent upstream commits to the monorepo main
698 branch. Each submodule update in the local ``UM*`` commits brought in
699 a subproject tree at some local commit. The trees in the ``L*1``
700 commits represent merges from upstream. These result in edges from
701 the ``U*`` commits to their corresponding rewritten ``L*1`` commits.
702 The ``L*2`` commits did not do any merges from upstream.
704 Note that the merge from ``U2`` to ``Lclang1`` appears redundant, but
705 if, say, ``U3`` changed some files in upstream clang, the ``Lclang1``
706 commit appearing after the ``Llld1`` commit would actually represent a
707 clang tree *earlier* in the upstream clang history. We want the
708 ``local/zip`` branch to accurately represent the state of our umbrella
709 history and so the edge ``U2 -> Lclang1`` is a visual reminder of what
710 clang's tree actually looks like in ``Lclang1``.
712 Even so, the edge ``U3 -> Llld1`` could be problematic for future
713 merges from upstream. git will think that we've already merged from
714 ``U3``, and we have, except for the state of the clang tree. One
715 possible mitigation strategy is to manually diff clang between ``U2``
716 and ``U3`` and apply those updates to ``local/zip``. Another,
717 possibly simpler strategy is to freeze local work on downstream
718 branches and merge all submodules from the latest upstream before
719 running ``zip-downstream-fork.py``. If downstream merged each project
720 from upstream in lockstep without any intervening local commits, then
721 things should be fine without any special action. We anticipate this
722 to be the common case.
724 The tree for ``Lclang1`` outside of clang will represent the state of
725 things at ``U3`` since all of the upstream projects not participating
726 in the umbrella history should be in a state respecting the commit
727 ``U3``. The trees for llvm and lld should correctly represent commits
728 ``Lllvm1`` and ``Llld1``, respectively.
730 Commit ``UM3`` changed files not related to submodules and we need
731 somewhere to put them. It is not safe in general to put them in the
732 monorepo root directory because they may conflict with files in the
monorepo. Let's assume we want them in a directory ``local`` in the
monorepo.
736 **Example 1: Umbrella looks like the monorepo**
For this example, we'll assume that each subproject appears in its own
top-level directory in the umbrella, just as they do in the monorepo.
740 Let's also assume that we want the files in directory ``myproj`` to
741 appear in ``local/myproj``.
743 Given the above run of ``migrate-downstream-fork.py``, a recipe to
744 create the zipped history is below::
746 # Import any non-LLVM repositories the umbrella references.
747 git -C my-monorepo remote add localrepo \
748 https://my.local.mirror.org/localrepo.git
751 subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
752 libcxx libcxxabi libunwind lld lldb llgo llvm openmp
753 parallel-libs polly pstl )
755 # Import histories for upstream split projects (this was probably
756 # already done for the ``migrate-downstream-fork.py`` run).
for project in "${subprojects[@]}"; do
  # Use the loop variable for the URL and fetch the remote just added.
  git -C my-monorepo remote add upstream/split/${project} \
    https://github.com/llvm-mirror/${project}.git
  git -C my-monorepo fetch upstream/split/${project}
done
763 # Import histories for downstream split projects (this was probably
764 # already done for the ``migrate-downstream-fork.py`` run).
for project in "${subprojects[@]}"; do
  # Use the loop variable for the URL.
  git -C my-monorepo remote add local/split/${project} \
    https://my.local.mirror.org/${project}.git
  git -C my-monorepo fetch local/split/${project}
done
771 # Import umbrella history.
772 git -C my-monorepo remote add umbrella \
773 https://my.local.mirror.org/umbrella.git
776 # Put myproj in local/myproj
777 echo "myproj local/myproj" > my-monorepo/submodule-map.txt
782 zip-downstream-fork.py \
783 refs/remotes/umbrella \
784 --new-repo-prefix=refs/remotes/upstream/monorepo \
785 --old-repo-prefix=refs/remotes/upstream/split \
786 --revmap-in=monorepo-map.txt \
787 --revmap-out=zip-map.txt \
789 --submodule-map=submodule-map.txt \
793 # Create the zip branch (assuming umbrella main is wanted).
794 git -C my-monorepo branch --no-track local/zip/main refs/remotes/umbrella/main
796 Note that if the umbrella has submodules to non-LLVM repositories,
797 ``zip-downstream-fork.py`` needs to know about them to be able to
798 rewrite commits. That is why the first step above is to fetch commits
799 from such repositories.
801 With ``--update-tags`` the tool will migrate annotated tags pointing
802 to submodule commits that were inlined into the zipped history. If
803 the umbrella pulled in an upstream commit that happened to have a tag
804 pointing to it, that tag will be migrated, which is almost certainly
805 not what is wanted. The tag can always be moved back to its original
commit after rewriting, or the ``--update-tags`` option may be
omitted and any local tags migrated manually.
809 **Example 2: Nested sources layout**
811 The tool handles nested submodules (e.g. llvm is a submodule in
812 umbrella and clang is a submodule in llvm). The file
813 ``submodule-map.txt`` is a list of pairs, one per line. The first
814 pair item describes the path to a submodule in the umbrella
815 repository. The second pair item describes the path where trees for
816 that submodule should be written in the zipped history.
818 Let's say your umbrella repository is actually the llvm repository and
819 it has submodules in the "nested sources" layout (clang in
820 tools/clang, etc.). Let's also say ``projects/myproj`` is a submodule
821 pointing to some downstream repository. The submodule map file should
822 look like this (we still want myproj mapped the same way as
826 tools/clang/tools/extra clang-tools-extra
827 projects/compiler-rt compiler-rt
828 projects/debuginfo-tests debuginfo-tests
829 projects/libclc libclc
830 projects/libcxx libcxx
831 projects/libcxxabi libcxxabi
832 projects/libunwind libunwind
835 projects/openmp openmp
837 projects/myproj local/myproj
If a submodule path does not appear in the map, the tool assumes it
should be placed in the same place in the monorepo. That means if you
use the "nested sources" layout in your umbrella, you *must* provide
map entries for all of the projects in your umbrella (except llvm).
Otherwise trees from submodule updates will appear underneath llvm in
the zipped history.
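The lookup rule described above (an explicit pair wins, otherwise the path is
kept unchanged) can be illustrated with a short shell sketch. The function
``map_submodule_path`` is hypothetical and only mirrors the documented
behaviour; it is not code taken from ``zip-downstream-fork.py``.

```shell
#!/bin/sh
# Hypothetical lookup that mirrors the mapping rule described above.
# $1: map file with "<umbrella path> <monorepo path>" pairs, one per line
# $2: submodule path in the umbrella repository
# Prints the mapped path, or the original path when the map has no entry.
map_submodule_path() {
  awk -v path="$2" '
    $1 == path { print $2; found = 1; exit }
    END { if (!found) print path }
  ' "$1"
}
```

With a map containing only ``tools/clang clang``, looking up ``tools/clang``
yields ``clang``, while an unlisted path such as ``tools/lld`` comes back
unchanged and would therefore end up nested under llvm. This is why the nested
layout needs an entry for every project.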
Because llvm is itself the umbrella, we use ``--subdir`` to write its
content into ``llvm`` in the zipped history::
849 # Import any non-LLVM repositories the umbrella references.
850 git -C my-monorepo remote add localrepo \
851 https://my.local.mirror.org/localrepo.git
854 subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc
855 libcxx libcxxabi libunwind lld lldb llgo llvm openmp
856 parallel-libs polly pstl )
858 # Import histories for upstream split projects (this was probably
859 # already done for the ``migrate-downstream-fork.py`` run).
for project in "${subprojects[@]}"; do
  # Use the loop variable for the URL and fetch the remote just added.
  git -C my-monorepo remote add upstream/split/${project} \
    https://github.com/llvm-mirror/${project}.git
  git -C my-monorepo fetch upstream/split/${project}
done
866 # Import histories for downstream split projects (this was probably
867 # already done for the ``migrate-downstream-fork.py`` run).
868 for project in "${subprojects[@]}"; do
869 git -C my-monorepo remote add local/split/${project} \
870 https://my.local.mirror.org/${project}.git
871 git -C my-monorepo fetch local/split/${project}
872 done
874 # Import umbrella history. We want this under a different refspec
875 # so zip-downstream-fork.py knows what it is.
876 git -C my-monorepo remote add umbrella \
877 https://my.local.mirror.org/llvm.git
880 # Create the submodule map.
881 echo "tools/clang clang" > my-monorepo/submodule-map.txt
882 echo "tools/clang/tools/extra clang-tools-extra" >> my-monorepo/submodule-map.txt
883 echo "projects/compiler-rt compiler-rt" >> my-monorepo/submodule-map.txt
884 echo "projects/debuginfo-tests debuginfo-tests" >> my-monorepo/submodule-map.txt
885 echo "projects/libclc libclc" >> my-monorepo/submodule-map.txt
886 echo "projects/libcxx libcxx" >> my-monorepo/submodule-map.txt
887 echo "projects/libcxxabi libcxxabi" >> my-monorepo/submodule-map.txt
888 echo "projects/libunwind libunwind" >> my-monorepo/submodule-map.txt
889 echo "tools/lld lld" >> my-monorepo/submodule-map.txt
890 echo "tools/lldb lldb" >> my-monorepo/submodule-map.txt
891 echo "projects/openmp openmp" >> my-monorepo/submodule-map.txt
892 echo "tools/polly polly" >> my-monorepo/submodule-map.txt
893 echo "projects/myproj local/myproj" >> my-monorepo/submodule-map.txt
898 zip-downstream-fork.py \
899 refs/remotes/umbrella \
900 --new-repo-prefix=refs/remotes/upstream/monorepo \
901 --old-repo-prefix=refs/remotes/upstream/split \
902 --revmap-in=monorepo-map.txt \
903 --revmap-out=zip-map.txt \
904 --subdir=llvm \
905 --submodule-map=submodule-map.txt \
909 # Create the zip branch (assuming umbrella main is wanted).
910 git -C my-monorepo branch --no-track local/zip/main refs/remotes/umbrella/main
913 Comments at the top of ``zip-downstream-fork.py`` describe in more
914 detail how the tool works and various implications of its operation.
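
The submodule map in the recipe above is built with one ``echo`` per entry. A here-document is an equivalent way to write the same file in a single step. The sketch below wraps it in a hypothetical ``write_submodule_map`` function that takes the output path as its argument; the entries are the ones from the recipe.

```shell
# Hypothetical helper: write the example submodule map to the given file.
# Usage: write_submodule_map my-monorepo/submodule-map.txt
write_submodule_map() {
    # The quoted 'EOF' delimiter keeps the shell from expanding anything
    # inside the here-document.
    cat > "$1" <<'EOF'
tools/clang clang
tools/clang/tools/extra clang-tools-extra
projects/compiler-rt compiler-rt
projects/debuginfo-tests debuginfo-tests
projects/libclc libclc
projects/libcxx libcxx
projects/libcxxabi libcxxabi
projects/libunwind libunwind
tools/lld lld
tools/lldb lldb
projects/openmp openmp
tools/polly polly
projects/myproj local/myproj
EOF
}
```

Keeping the entries in one block makes it easier to compare the file against the list of submodules in your umbrella.
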
916 Importing local repositories
917 ----------------------------
919 You may have additional repositories that integrate with the LLVM
920 ecosystem, essentially extending it with new tools. If such
921 repositories are tightly coupled with LLVM, it may make sense to
922 import them into your local mirror of the monorepo.
924 If such repositories participated in the umbrella repository used
925 during the zipping process above, they will automatically be added to
926 the monorepo. For downstream repositories that don't participate in
927 an umbrella setup, the ``import-downstream-repo.py`` tool at
928 https://github.com/greened/llvm-git-migration/tree/import can help with
929 getting them into the monorepo. A recipe follows::
931 # Import downstream repo history into the monorepo.
932 git -C my-monorepo remote add myrepo https://my.local.mirror.org/myrepo.git
935 my_local_tags=( refs/tags/release
940 import-downstream-repo.py \
941 refs/remotes/myrepo \
942 ${my_local_tags[@]} \
943 --new-repo-prefix=refs/remotes/upstream/monorepo \
945 --tag-prefix="myrepo-"
948 # Preserve release branches.
949 for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
950 refs/remotes/myrepo/release); do
951 branch=${ref#refs/remotes/myrepo/}
952 git -C my-monorepo branch --no-track "myrepo/${branch}" "${ref}"
953 done
956 git -C my-monorepo branch --no-track myrepo/main refs/remotes/myrepo/main
959 git -C my-monorepo checkout local/zip/main # Or local/octopus/main
960 git -C my-monorepo merge myrepo/main
962 You may want to merge other corresponding branches, for example
963 ``myrepo`` release branches if they were in lockstep with LLVM project
964 releases.
966 ``--tag-prefix`` tells ``import-downstream-repo.py`` to rename
967 annotated tags with the given prefix. Due to limitations with
968 ``fast_filter_branch.py``, unannotated tags cannot be renamed
969 (``fast_filter_branch.py`` considers them branches, not tags). Since
970 the upstream monorepo had its tags rewritten with an "llvmorg-"
971 prefix, name conflicts should not be an issue. ``--tag-prefix`` can
972 be used to more clearly indicate which tags correspond to various
973 imported repositories.
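
Before running the import, it may help to see which tags are annotated and which are not. Git reports an annotated tag as an object of type ``tag``, while an unannotated (lightweight) tag points directly at a commit. The sketch below is a hypothetical helper, not part of the migration tools, that labels the output of ``git for-each-ref`` accordingly.

```shell
# Hypothetical helper: label tags as annotated or lightweight.
# Reads "objecttype refname" lines on standard input, as produced by:
#   git -C my-monorepo for-each-ref \
#       --format='%(objecttype) %(refname)' refs/tags
classify_tags() {
    while read -r objtype refname; do
        case $objtype in
            # Annotated tags are stored as tag objects.
            tag) printf 'annotated %s\n' "$refname" ;;
            # Anything else (normally a commit) is a lightweight tag.
            *)   printf 'lightweight %s\n' "$refname" ;;
        esac
    done
}
```

Tags reported as ``lightweight`` are the ones that, per the limitation described above, would keep their original names.
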
975 Given this repository history::
982 The above recipe results in a history like this::
984 U1 - U2 - U3 <- upstream/main
986 \ -----\--------------- local/zip--.
988 - Lllvm1 - Llld1 - UM3 - Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 - M1 <-'
997 Commits ``R1``, ``R2`` and ``R3`` have trees that *only* contain blobs
998 from ``myrepo``. If you require commits from ``myrepo`` to be
999 interleaved with commits on local project branches (for example,
1000 interleaved with ``llvm1``, ``llvm2``, etc. above) and myrepo doesn't
1001 appear in an umbrella repository, a new tool will need to be
1002 developed. Creating such a tool would involve:
1004 1. Modifying ``fast_filter_branch.py`` to optionally take a
1005 revlist directly rather than generating it itself
1007 2. Creating a tool to generate an interleaved ordering of local
1008 commits based on some criteria (``zip-downstream-fork.py`` uses the
1009 umbrella history as its criterion)
1011 3. Generating such an ordering and feeding it to
1012 ``fast_filter_branch.py`` as a revlist
1014 Some care will also likely need to be taken to handle merge commits,
1015 to ensure the parents of such commits migrate correctly.
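
As a very rough sketch of step 2, one possible criterion is the committer timestamp. The hypothetical ``interleave_by_time`` function below merges per-branch commit lists into a single oldest-first ordering. The choice of timestamp as the criterion and the input format are assumptions made for illustration; no such tool exists in the migration scripts, and how ``fast_filter_branch.py`` would consume the result depends on step 1.

```shell
# Hypothetical helper: merge commit lists by committer timestamp.
# Each input file holds "timestamp hash" lines, for example from:
#   git -C my-monorepo log --format='%ct %H' <branch> > branch.txt
# Prints the hashes of all inputs, oldest first.
interleave_by_time() {
    # Concatenate the inputs, sort numerically on the timestamp field,
    # then keep only the hash column.
    cat "$@" | sort -n -k1,1 | cut -d' ' -f2
}
```

Timestamps alone do not guarantee that every commit follows its parents (clocks can be skewed), so a real tool would likely need an extra topological check, especially around merge commits.
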
1017 Scrubbing the Local Monorepo
1018 ----------------------------
1020 Once all of the migrating, zipping and importing is done, it's time to
1021 clean up. The Python tools use ``git-fast-import``, which leaves a lot
1022 of cruft around, and we want to shrink our new monorepo mirror as much
1023 as possible. Here is one way to do it::
1025 git -C my-monorepo checkout main
1027 # Delete branches we no longer need. Do this for any other branches
1029 git -C my-monorepo branch -D local/zip/main || true
1030 git -C my-monorepo branch -D local/octopus/main || true
1033 git -C my-monorepo remote remove upstream/monorepo
1035 for p in "${my_projects[@]}"; do
1036 git -C my-monorepo remote remove upstream/split/${p}
1037 git -C my-monorepo remote remove local/split/${p}
1038 done
1040 git -C my-monorepo remote remove localrepo
1041 git -C my-monorepo remote remove umbrella
1042 git -C my-monorepo remote remove myrepo
1044 # Add anything else here you don't need. refs/tags/release is
1045 # listed below assuming tags have been rewritten with a local prefix.
1046 # If not, remove it from this list.
1054 git -C my-monorepo for-each-ref --format="%(refname)" ${refs_to_clean[@]} |
1055 xargs -n1 --no-run-if-empty git -C my-monorepo update-ref -d
1057 git -C my-monorepo reflog expire --all --expire=now
1059 # fast_filter_branch.py might have gc running in the background.
1060 while ! git -C my-monorepo \
1061 -c gc.reflogExpire=0 \
1062 -c gc.reflogExpireUnreachable=0 \
1063 -c gc.rerereresolved=0 \
1064 -c gc.rerereunresolved=0 \
1065 -c gc.pruneExpire=now \
1066 gc --prune=now; do
1067 sleep 1
1068 done
1070 # Takes a LOOOONG time!
1071 git -C my-monorepo repack -A -d -f --depth=250 --window=250
1073 git -C my-monorepo prune-packed
1074 git -C my-monorepo prune
1076 You should now have a trim monorepo. Upload it to your git server and
1082 .. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
1083 .. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
1084 .. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
1085 .. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
1086 .. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/