您如何捆绑 git 中的子模块更改?

How do you bundle submodule changes in git?

如果我有一个包含子模块的存储库,并且我想为 sneakernet 制作一个捆绑包,我该如何使捆绑包包含更新子模块所需的对象?

例如,假设我有一个名为 parent 的存储库,它包含一个名为 submod 的子模块。我想创建一个包,其中包含自提交 basecommit 以来的所有最新工作,所以很自然地,我会从 parent 根目录中执行以下操作:

git bundle create mybundlefile.bundle basecommit..myworkingbranch

这将创建一个名为 mybundlefile.bundle 的文件,其中包含 parent 存储库中 basecommit..myworkingbranch 范围内的所有提交对象以及引用 myworkingbranch。问题是,如果这些提交中的任何一个更改了子模块,生成的包将不会非常有用,因为此类提交仅作为更改子模块哈希存储在包文件中。所以存储在包文件中的对象只是说 "I'm commit 3ba024b and I change the hash of submodule submod from 2b941cf to 1bb84ec." 但包实际上并不包含对象 2b941cf..1bb84ec necceassary 从包更新子模块并为 myworkingbranch 创建一个干净的工作树。

如何创建捆绑文件,以便还包含子模块存储库中的所有这些对象。也就是说,如果父仓库的基础提交 basecommit 将子模块 submod 指向哈希 A 并且父仓库的工作分支 myworkingbranch 将子模块 submod 指向哈希 B,那么我的包不仅需要包含 basecommit..myworkingbranch,还需要包含 A..B

How do I create the bundle file such that all those objects from the submodule repos are also included.

你不能。捆绑文件特定于 Git 存储库。子模块只是 link 到 另一个 Git 存储库,因此您必须为单独的 Git 存储库创建一个单独的包。 (然后您可以从各种包中制作一个存档。)

很明显,Git 可以下降到每个子模块中,运行 git bundle 可以下降到每个子模块中,为您制作这些不同的文件。不过,传递给 sub-git bundle 命令的参数很棘手。您将必须编写自己的脚本并使用 git submodule foreach 将其添加到每个子模块中的 运行,并让您的脚本找出要使用的参数。

然后您可能希望将每个包打包在一起(大概是 tar 或 rar 或 zip 或任何存档文件)以进行传输和拆分/获取。在解绑期间你会想要另一个 git submodule foreach,这可能会更烦人,因为理想情况下这应该使用一组新的子模块(在解绑顶级模块并选择适当的提交之后)。

可能有人编写了脚本来执行此操作,但如果是这样,我就不知道了。它不包含在 Git 本身中——捆绑包本身有点笨拙而且 non-mainstream-y.

我写 github.com/TamaMcGlinn/submodule_bundler 来做到这一点。有很多角落案例,我怀疑我是否已经全部掌握。请尝试一下,如果您的用例需要任何修复,请在项目上提出问题。

为了后人,我将在上面列出项目中的所有代码;但你应该直接从 github.

克隆

bundle.py

#!/usr/bin/env python3
""" Create bundles for submodules """

import os
import argparse
import subprocess
import tarfile
import submodule_commits
import string
import random
import shutil

parser = argparse.ArgumentParser(description='Create bundles for submodules (recursively), \
                                              to facilitate sneakernet connections. On the online computer, \
                                              a bundle is made for each repository, and then packed into a .tar file. \
                                              On the offline computer, use unbundle.py on the tarfile to unzip and \
                                              pull from the corresponding bundle for each repository.')

parser.add_argument('filename', metavar='filename', type=str, help='file to create e.g. ../my_bundles.tar')
parser.add_argument('commit_range', metavar='[baseline]..[target]', type=str, default='..HEAD', nargs='?',
                    help='commit range of top-level repository to bundle; defaults to everything')

args = parser.parse_args()

class IllegalArgumentError(ValueError):
    pass

try:
    [baseline, target] = args.commit_range.split('..')
except ValueError:
    raise IllegalArgumentError(f"Invalid commit range: '{args.commit_range}': "
                               + "Expected [baseline]..[target]. Baseline and target are optional "
                               + "but the dots are necessary to distinguish between the two.") from None

full_histories = False
from_str = f'from {baseline} '
if baseline == '':
    print("No baseline (all bundles will be complete history bundles)")
    full_histories = True
    from_str = "from scratch "

if target == '':
    target = 'HEAD'

print('Making bundles to update ' + from_str + f'to {target}')

updates_required = {}
new_submodules = {}
bundles = []

for submodule in submodule_commits.submodule_commits('.', target):
    new_submodules[submodule['subdir']] = submodule['commit']

root_dir = os.getcwd()
tar_file_name = os.path.basename(args.filename).split('.')[0]
temp_dir = f'temp_dir_for_{tar_file_name}_bundles'  # note this won't work if that dir already has contents

def create_bundle(submodule_dir, new_commit_sha, baseline_descriptor=''):
    bundle_path_in_temp = f'{submodule_dir}.bundle'
    bundle_path = f'{temp_dir}/{bundle_path_in_temp}'
    if submodule_dir == '.':
        route_to_root = './'
    else:
        route_to_root = (submodule_dir.count('/') + 1) * '../'
    os.makedirs(os.path.dirname(bundle_path), exist_ok=True)
    os.chdir(submodule_dir)
    rev_parse_output = subprocess.check_output(['git', 'rev-parse', '--abbrev-ref', 'HEAD'])
    current_branch = rev_parse_output.decode("utf-8").strip('\n')
    subprocess.run(['git', 'bundle', 'create', route_to_root + bundle_path,
                   f'{baseline_descriptor}{current_branch}', '--tags'])
    bundles.append(bundle_path_in_temp)
    os.chdir(root_dir)

if not full_histories:
    for existing_commit in submodule_commits.submodule_commits('.', baseline):
        baseline_commit = existing_commit['commit']
        submodule_dir = existing_commit['subdir']
        new_commit_sha = new_submodules.pop(submodule_dir, None)
        if new_commit_sha is None:
            # the submodule was removed, don't need to make any bundle
            continue
        if new_commit_sha == baseline_commit:
            # no change, no bundle
            continue
        print(f"Need to update {submodule_dir} from {baseline_commit} to {new_commit_sha}")
        create_bundle(submodule_dir, new_commit_sha, f'{baseline_commit}..')

for submodule_dir, commit_sha in new_submodules.items():
    print(f"New submodule {submodule_dir}")
    bundle_name = f'{submodule_dir}.bundle'
    create_bundle(submodule_dir, commit_sha)

# the bundle of the top-level repository itself is oddly called '..bundle'
# it is impossible to have a submodule that clashes with this
# because you cannot name a directory '.'
baseline_descriptor = ''
if not full_histories:
    baseline_descriptor = f'{baseline}..'
create_bundle('.', target, baseline_descriptor)

print("Packing bundles into tarfile:")
with tarfile.open(args.filename, mode="w:") as tar:  # no compression; git already does that
    os.chdir(temp_dir)
    for bundle in bundles:
        print(bundle)
        tar.add(bundle)
    os.chdir(root_dir)

print("Removing temp directory")
shutil.rmtree(temp_dir)

unbundle.py

#!/usr/bin/env python3
""" Extract bundles for submodules """

import os
import argparse
import shutil
import tarfile
import pullbundle
import submodule_commits
import subprocess

parser = argparse.ArgumentParser(description='Create bundles for submodules (recursively), \
                                              to facilitate sneakernet connections. On the online computer, \
                                              a bundle is made for each repository, and then packed into a .tar file. \
                                              On the offline computer, use unbundle.py on the tarfile to unzip and \
                                              pull from the corresponding bundle for each repository.')

parser.add_argument('filename', metavar='filename', type=str, help='file to create e.g. ../my_bundles.tar')

args = parser.parse_args()

tar_file_name = os.path.basename(args.filename).split('.')[0]
temp_dir = f'temp_dir_for_{tar_file_name}_extraction'

with tarfile.open(args.filename, 'r:') as tar:
    tar.extractall(temp_dir)

root_dir = os.getcwd()

def is_git_repository(dir):
    """ Return true iff dir exists and is a git repository (by checking git rev-parse --show-toplevel) """
    if not os.path.exists(dir):
        return False
    previous_dir = os.getcwd()
    os.chdir(dir)
    rev_parse_toplevel = subprocess.check_output(['git', 'rev-parse', '--show-toplevel'])
    git_dir = rev_parse_toplevel.decode("utf-8").strip('\n')
    current_dir = os.getcwd().replace('\', '/')
    os.chdir(previous_dir)
    return current_dir == git_dir

pullbundle.pullbundle(f'{temp_dir}/..bundle', True)
for submodule in submodule_commits.submodule_commits():
    subdir = submodule["subdir"]
    commit = submodule["commit"]
    print(f'{subdir} -> {commit}')
    bundle_file_from_root = f'{temp_dir}/{subdir}.bundle'
    if not os.path.isfile(bundle_file_from_root):
        print(f'Skipping submodule {subdir} because there is no bundle')
    else:
        if not is_git_repository(subdir):
            # clone first if the subdir doesn't exist or isn't a git repository yet
            subprocess.run(['git', 'clone', bundle_file_from_root, subdir])
        route_to_root = (subdir.count('/') + 1) * '../'
        bundle_file = f'{route_to_root}{bundle_file_from_root}'
        os.chdir(subdir)
        pullbundle.pullbundle(bundle_file)
    os.chdir(root_dir)

print("Removing temp directory")
shutil.rmtree(temp_dir)

subprocess.run(['git', 'submodule', 'update', '--recursive'])

拉bundle.py

#!/usr/bin/env python3
""" Pull from bundles """

import argparse
import subprocess
import re
import os

ref_head_regex = 'refs/heads/(.*)'
head_commit = None

class UnableToFastForwardError(RuntimeError):
    pass

def iterate_branches(bundle_refs):
    """ Given lines of output from 'git bundle unbundle' this writes the HEAD commit to the head_commit global
        and yields each branch, commit pair """
    global head_commit
    for bundle_ref in bundle_refs:
        ref_split = bundle_ref.split()
        commit = ref_split[0]
        ref_name = ref_split[1]
        if ref_name == 'HEAD':
            head_commit = commit
        else:
            match = re.search(ref_head_regex, ref_name)
            if match:
                branch_name = match.group(1)
                yield (branch_name, commit)

def update_branch(branch, commit, check_divergence=False):
    """ Update branch to commit if possible by fast-forward """
    rev_parse_branch_output = subprocess.check_output(['git', 'rev-parse', branch])
    old_commit = rev_parse_branch_output.decode("utf-8").strip('\n')
    if old_commit == commit:
        print(f'Skipping {branch} which is up-to-date at {commit}')
    else:
        rev_parse_current_output = subprocess.check_output(['git', 'rev-parse', '--abbrev-ref', 'HEAD'])
        current_branch = rev_parse_current_output.decode("utf-8").strip('\n')

        returncode = subprocess.call(['git', 'merge-base', '--is-ancestor', branch, commit])
        branch_is_behind_commit = returncode == 0
        if branch_is_behind_commit:
            print(f'Fast-forwarding {branch} from {old_commit} to {commit}')
            if current_branch == branch:
                subprocess.call(['git', 'reset', '--hard', '-q', commit])
            else:
                subprocess.call(['git', 'branch', '-Dq', branch])
                subprocess.run(['git', 'branch', '-q', branch, commit])
        else:
            returncode = subprocess.call(['git', 'merge-base', '--is-ancestor', commit, branch])
            branch_is_ahead_of_commit = returncode == 0
            if branch_is_ahead_of_commit:
                print(f'Skipping {branch} which is at {old_commit}, ahead of bundle version {commit}')
                if current_branch == branch and check_divergence:
                    raise UnableToFastForwardError("Unable to update branch: already ahead of bundle") from None
            else:
                print(f'Error: {branch} already exists, at {old_commit} which diverges from '
                      + f'bundle version at {commit}')
                print('You could switch to the bundle version as follows, but you might lose work.')
                print(f'git checkout -B {branch} {commit}')
                if current_branch == branch and check_divergence:
                    raise UnableToFastForwardError("Unable to update branch: diverged from bundle") from None

def checkout(commit):
    subprocess.run(['git', 'checkout', '-q', '-f', commit])

def pullbundle(bundle_file, check_divergence=False):
    """ Main function; update all branches from given bundle file """
    global head_commit
    head_commit = None

    subprocess.run(['git', 'fetch', bundle_file, '+refs/tags/*:refs/tags/*'], stderr=subprocess.DEVNULL)
    unbundle_output = subprocess.check_output(['git', 'bundle', 'unbundle', bundle_file])
    bundle_refs = filter(None, unbundle_output.decode("utf-8").split('\n'))

    for branch, commit in iterate_branches(bundle_refs):
        returncode = subprocess.call(['git', 'show-ref', '-q', '--heads', branch])
        branch_exists = returncode == 0
        if branch_exists:
            update_branch(branch, commit, check_divergence)
        else:
            print(f'Created {branch} pointing at {commit}')
            subprocess.run(['git', 'branch', branch, commit])
            checkout(commit)

    if head_commit is not None:
        # checkout as detached head without branch
        # note this might not happen; if the bundle updates a bunch of branches
        # then whichever one we were already on is updated already and we don't need to do anything here
        checkout(head_commit)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Update all branches and tags contained in a bundle file')
    parser.add_argument('filename', metavar='filename', help='git bundle file to pull e.g. ../foo.bundle')
    parser.add_argument('-c', '--check_divergence', help="return an errorcode if the current branch was not updated "
                        + "because of already being ahead or having diverged from the bundle version of that branch",
                        action='store_true')
    args = parser.parse_args()
    pullbundle(args.filename, args.check_divergence)

submodule_commits.py

#!/usr/bin/env python3
""" Print the commit of each submodule (recursively) at some commit"""

import os
import argparse
import subprocess
import re

def print_submodule_commits(root_subdir, root_commit):
    for result in submodule_commits(root_subdir, root_commit):
        print(f'{result["subdir"]} {result["commit"]}')

def submodule_commits(subdir='.', commit='HEAD', prefix=''):
    is_subdir = subdir != '.'
    if is_subdir:
        previous_dir = os.getcwd()
        os.chdir(subdir)
    git_ls_tree = subprocess.check_output(['git', 'ls-tree', '-r', commit])
    ls_tree_lines = filter(None, git_ls_tree.decode("utf-8").split("\n"))
    submodule_regex = re.compile(r'^[0-9]+\s+commit')
    for line in ls_tree_lines:
        if submodule_regex.match(line):
            line_split = line.split()
            commit_hash = line_split[2]
            subdirectory = line_split[3]
            submodule_prefix = subdirectory
            if prefix != '':
                submodule_prefix = f'{prefix}/{subdirectory}'
            yield {'subdir': submodule_prefix, 'commit': commit_hash}
            yield from submodule_commits(subdirectory, commit_hash, submodule_prefix)
    if is_subdir:
        os.chdir(previous_dir)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Print the commit of each submodule (recursively) at some commit')
    parser.add_argument('commit', metavar='commit_hash', type=str, default='HEAD', nargs='?',
                        help='commit to examine; defaults to HEAD')
    args = parser.parse_args()
    print_submodule_commits('.', args.commit)