-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script to move files to the framework #145
base: main
Are you sure you want to change the base?
Add script to move files to the framework #145
Conversation
Signed-off-by: David Horstmann <[email protected]>
This is useful for building file-moves on top of other not-yet-merged file moves. Signed-off-by: David Horstmann <[email protected]>
When there are no dirs to make we try anyway. This is bad and causes an error. Fix this. Signed-off-by: David Horstmann <[email protected]>
To allow branches to be built on other branches, resolve conflicts 'manually' when the 2 branches delete each others' files. Signed-off-by: David Horstmann <[email protected]>
Allow passing the --except flag, which stops a file from being moved (and perform the necessary path manipulation acrobatics to make that work). Created to be able to move tests/include except for tests/include/alt-dummy, like so: --path tests/include:include --except tests/include/alt-dummy Signed-off-by: David Horstmann <[email protected]>
This was failing if no exceptions were supplied. Signed-off-by: David Horstmann <[email protected]>
This causes the script to not perform the operation, but simply print the self._file_map dictionary of files to rename (this is useful for debugging the file-moving logic, especially with combinations of --path and --except options). Signed-off-by: David Horstmann <[email protected]>
When files are moved to the exact same location in the framework repo as they were in Mbed TLS, there is a problem. Since no git change has taken place on the Mbed TLS side, the most recent change to the file is its deletion when a previous set of files were moved to the framework (since all other files are deleted in this kind of move). As a result the moved files are deleted and do not appear in the final branch. Deal with this edge case explicitly by taking files which are not renamed from the file map and checking them out to the state in the temporary branch, before amending the resulting commit. Signed-off-by: David Horstmann <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a mostly full code review, more than we usually do in mbedtls-docs
. Arguably this script could live in the framework repository; for that level of quality, I'd want most of my comments to be handled. For the level of quality in mbedtls-docs
, all my comments are optional except one: this script seems to mess up if the destination repo is the framework
subdirectory of the source.
def add_file_move(self, src_path: str, dst_path: str) -> None: | ||
"""Move file at relative path src_path in the source repo to dst_path | ||
in the destination repo""" | ||
self._file_map[src_path] = dst_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why no normalization like there is for the initialization of _file_map
?
cmd = [self.GIT_EXE] + git_cmd | ||
return subprocess.check_output(cmd, **kwargs) | ||
|
||
def add_file_move(self, src_path: str, dst_path: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method isn't used anywhere. __init__
should probably call it instead of setting _file_map
on its own.
def __init__(self, source_repo: str, dest_repo: str, | ||
source_branch_name: str, dest_branch_name: str, | ||
file_map: Dict[str, str], | ||
dest_base_branch_name: Optional[str] = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to document the parameters. It's not obvious what you can put in file_map
, or the difference between dest_branch_name
and dest_base_branch_name
.
def _direct_children(self, dir_path: str) -> List[str]: | ||
"""Get the direct children of a subdirectory (at least the ones | ||
that matter to git)""" | ||
leaf_files = self.run_git(['ls-files', dir_path]).split() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we ever have a file name containing a space, this will split on the space. (Control characters in file names would also not work properly.) The robust way would be
leaf_files = self.run_git(['ls-files', dir_path]).split() | |
leaf_files = self.run_git(['ls-files', '-z', '--', dir_path]).rstrip('\0').split('\0') |
This applies to the other uses of ls-files
.
It would be good to make an auxiliary method that calls ls-files
and returns a list.
leaf_files = self.run_git(['ls-files', dir_path]).split() | ||
path_len = len(self._path_parts(dir_path)) + 1 | ||
direct_children = set([ | ||
os.path.join( | ||
*self._path_parts(f)[:path_len] | ||
) | ||
for f in leaf_files | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be equivalent but simpler: run git -C dir_path ls-files
and strip anything after a slash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though looking at how this method is used, it seems that you're stripping the paths down to base names here, only to reconstruct the full path later in _expand_dir_move
?
def main() -> int: | ||
"""Command line entry point.""" | ||
parser = argparse.ArgumentParser() | ||
parser.add_argument('--path', action='append', nargs='?', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than require --path
to be repeated for each path, I think this should be the positional arguments. In particular it would allow calls like mbedtls-move-to-framework … subdir/*.py
.
GIT_EXE = 'git' | ||
FRAMEWORK_DEFAULT_BASE_BRANCH = 'main' | ||
|
||
def __init__(self, source_repo: str, dest_repo: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be consistent when abbreviating “destination”: either dst
or dest
, not both.
parser.add_argument('--dst-base-branch', default=None, required=False, | ||
help='Base branch in the framework repo to make changes from') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't actually need to be a branch name, it can be any revision. In practice I'd want whatever is the current head of the framework submodule, and I think that should be the default, rather than main
.
In fact main
is likely not to work in practice, because it'll typically be some old reference made when the working directory was originally cloned I delete the default branch name in my local clones to avoid keeping that old reference around, so when I didn't pass --dst-branch-name
, I got the error
fatal: 'main' matched multiple (4) remote tracking branches
self.run_git(['checkout', self._framework_base_branch_name]) | ||
|
||
# Create a new branch with checkout -b | ||
self.run_git(['checkout', '-b', self._framework_branch_name]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add checks for common errors before starting to do anything. At least check that the branches don't already exist, which is likely to be the case the second time the same person uses the script.
# Delete all files except the ones we are moving | ||
for f in repo_files: | ||
if os.path.normpath(f) not in files_to_retain: | ||
self.run_git(['rm', f]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BLOCKER: As far as I understand it (more info in Slack thread), this will in particular remove framework
. So from that point, when checking out the original head again, the framework submodule is no longer initialized. Then later the framework directory itself disappears, which I don't understand.
I don't fully understand what's going on, but as far as I can tell, you can't have the destination be the framework
directory of the source, it has to be a separate worktree. This is a very unintuitive limitation that needs to be at least documented.
I'm marking this as priority-high because we need this to move files to the framework with the history preserved. Arguably this should be in the framework repository as an essential maintainer tool (even if it'll only be essential for a few months). |
tools/bin/mbedtls-move-to-framework
Outdated
self.run_git(['mv', f, self._file_map[f]]) | ||
|
||
# Commit the result | ||
commit_message = "Move some files to framework repository" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like the ability to specify a commit message. I can do it afterwards by rebasing, and that's not really a problem in the source branch, but in the destination branch, the commit message is behind a merge and git is reluctant to edit history beyond a merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, if I want to edit the commit message, I need to redo the conflict resolution which is implemented here as resolve_deletion_conflicts
. That's not easy!
I've now used the script successfully. As far as I can't tell it just can't work if the destination is the So this script should come at least with a warning, if not enforcement, that both the source and destination should be independent worktrees, freshly checked out, that are not used for doing anything else. Preferably, the script should create these worktrees itself (as |
I've not yet used this script myself, but looking at its result there seems to be extra commits that I wouldn't expect to be present in the history, see Mbed-TLS/mbedtls-framework#55 (comment) |
Agreed. I've created Mbed-TLS/mbedtls-framework#57 and added it to the first framework epic. (I don't care much about where the script lives, as long as it meets quality requirements.) |
We're preserving the commit history, not just the patch history. So each import branch must have the full history of the project up to the last commit that modifies one of the imported files. The first time we did this in a pull request, GitHub showed basically all the mbedtls history as part of the pull request, because these were all new commits in the There are tools that can rewrite a git branch to only retain commits that modify specific files, and construct a commit history with the same messages and the same patches for this subset of files. But you lose commit IDs (obviously), renamed files, content moved between files, etc. And these tools are harder to use. So I think we've made the right choice. |
Ah OK, I had misunderstood what the script does.
Yeah, that's what I thought the script was doing, and indeed now that you mention it that wouldn't be the right thing to do. Thanks for clearing that up. |
This allows the user to write a message that they prefer. While we're at it, we can supply a helpful template so that it's a bit clearer what each commit message is for. Signed-off-by: David Horstmann <[email protected]>
Signed-off-by: David Horstmann <[email protected]>
5c5b953
to
7847e77
Compare
Pushing a small script that moves files to the
framework
repo from Mbed TLS, while retaining history.This script works but may not be in a fully-polished state.