Mirroring svn-forks

Our SVN monorepo is centered primarily on our proprietary game engine.

When they started their second game, they naturally enough branched /svn/Game1 → /svn/Game2 and continued merging engine and tool changes between the two. The same went for Game3, and so on.

When I set up SubGit mirroring, though, I had to split these into discrete Git repositories, so they all have disconnected Git histories.

It occurs to me that this is essentially svn-forking.

Would it be possible for the SubGit config to allow specifying forks?

    forks = true
    parent_fork = /svn/Game1
    child_forks = /svn/Game[0-9]+ /svn/Game/* /svn/well/why_be_consistent_eh

This would be equivalent to SubGit converting /svn/Game1 → Game1.git, except that on reaching the svn copy that creates /svn/Game2, it would simply clone Game1.git to create Game2.git, and likewise for Game1 → Game3 or Game2 → Game3.
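To make the proposed keys concrete, here is a minimal sketch of how they might be interpreted. It assumes each `child_forks` entry is a regular expression matched against the whole SVN path (the glob-style `/svn/Game/*` entry is left out), and that the mirror sees each svn copy as a `(revision, source, destination)` event. `plan_forks` and `repo_name` are hypothetical helpers; SubGit has no such options.

```python
import re

# Hypothetical values for the proposed keys (SubGit has no such options).
# The glob-style "/svn/Game/*" entry is left out; the rest are read as regexes.
PARENT_FORK = "/svn/Game1"
CHILD_FORK_PATTERNS = [r"/svn/Game[0-9]+", r"/svn/well/why_be_consistent_eh"]


def repo_name(svn_path: str) -> str:
    """Map an SVN project path to a Git repository name, e.g. Game2.git."""
    return svn_path.rstrip("/").rsplit("/", 1)[-1] + ".git"


def is_child_fork(path: str) -> bool:
    """True if the whole path matches one of the child_forks patterns."""
    return any(re.fullmatch(p, path) for p in CHILD_FORK_PATTERNS)


def plan_forks(copies):
    """Turn (revision, src, dst) SVN copy events into Git clone steps.

    A copy starts a new fork only when its source is the parent fork or a
    fork that was already started, so Game3 may come from Game1 or Game2.
    """
    known = {PARENT_FORK}
    steps = []
    for rev, src, dst in sorted(copies):
        if dst not in known and is_child_fork(dst) and src in known:
            steps.append((rev, repo_name(src), repo_name(dst)))
            known.add(dst)
    return steps


copies = [(900, "/svn/Game2", "/svn/Game3"), (500, "/svn/Game1", "/svn/Game2")]
for rev, src, dst in plan_forks(copies):
    print(f"r{rev}: git clone {src} {dst}")
```

Processing the copies in revision order matters: Game3 can only be cloned from Game2 once Game2 itself exists.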

It seems like the tricky part would be merges, and a potential real blocker would be: what if someone did a merge of /svn/GameB → /svn/GameA + /svn/GameC? It would be hard to do that, but never underestimate smart people’s ability to do really stupid things :)
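One way a fork-aware mirror could at least detect that case is to look at each svn commit's changed paths: a commit that lands in more than one fork has no single Git repository to go to. A minimal sketch, assuming the fork roots are known; `forks_touched` is a hypothetical helper, and what to do with such a commit (split it, reject it) is left open.

```python
def forks_touched(changed_paths, fork_roots):
    """Return the fork roots that contain at least one changed path."""
    touched = set()
    for path in changed_paths:
        for root in fork_roots:
            # Match the root itself or anything below it, but not "/svn/GameAB".
            if path == root or path.startswith(root + "/"):
                touched.add(root)
    return touched


FORKS = ["/svn/GameA", "/svn/GameB", "/svn/GameC"]

# One svn commit that merged GameB into both GameA and GameC at once.
commit = ["/svn/GameA/Trunk/Engine/engine.cpp", "/svn/GameC/Trunk/Engine/engine.cpp"]
touched = forks_touched(commit, FORKS)
if len(touched) > 1:
    print("commit spans several forks:", sorted(touched))
```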

Hi Oliver,

I’m afraid SubGit has no such feature for making SVN forks, and it’s unlikely that such a feature could be implemented in the near future. May I ask, however, what you would use this feature for in your case? Is it to gather the complete history of all the repositories? If so, I think we might try to fetch the history into each particular Git repository; it may be possible, depending on the configuration.

It would accurately reflect the SVN history in an SCM like Bitbucket or GitLab, but, critically, it would also allow merges between repositories.

Imagine this history:

+ = add, c = change, m = merge
+ /svn/Game01/Trunk/Engine/engine.cpp
+ /svn/Game01/Branches/Release_01/Engine/engine.cpp
c /svn/Game01/Branches/Release_01/Engine/engine.cpp
m /svn/Game01/Branches/Release_01 -> Trunk
+ /svn/Game01/Trunk -> /svn/Game02/Trunk  # fork to start new game
c /svn/Game01/Trunk/Engine/engine.cpp
c /svn/Game02/Trunk/Engine/engine.cpp
+ /svn/Game01/Trunk -> /svn/Game03/Trunk  # start 3rd game
c /svn/Game03/Trunk/Engine/engine.cpp  # make games run at 1200 fps
m /svn/Game03/Trunk/Engine/engine.cpp -> /svn/Game01/Trunk/Engine/engine.cpp
m /svn/Game01/Trunk/Engine/engine.cpp -> /svn/Game02/Trunk/Engine/engine.cpp

Those last two operations (partial history merges) are possible because it’s all really a single repository. If you are running SubGit for an SCM like Bitbucket or GitLab, you would do the same thing with forks, but under the hood those are just implemented as local clones.
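In Git terms, cross-game merges like the last two only work smoothly when the repositories share ancestry. A minimal sketch with a toy commit graph (commit names are made up) showing the difference between forks created by cloning and repositories each started with its own `git init`; `ancestors` and `have_common_ancestor` are illustrative helpers, not Git's implementation.

```python
def ancestors(graph, commit):
    """All commits reachable from `commit` (inclusive) via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph.get(c, []))
    return seen


def have_common_ancestor(graph, a, b):
    """True if a merge base exists, i.e. the two histories are related."""
    return bool(ancestors(graph, a) & ancestors(graph, b))


# Cloned forks: Game02 and Game03 both start from Game01's commit "g1".
cloned = {"g1": [], "g2": ["g1"], "g3": ["g1"], "g3fast": ["g3"]}
# Separate `git init` per game: each repository has its own root commit.
separate = {"g1": [], "g2root": [], "g3root": [], "g3fast": ["g3root"]}

print(have_common_ancestor(cloned, "g2", "g3fast"))
print(have_common_ancestor(separate, "g2root", "g3fast"))
```

This is the property that cloning preserves. Since Git 2.9, `git merge` refuses to merge unrelated histories unless given `--allow-unrelated-histories`, and even then there is no merge base to work from.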

Hello Oliver,

I’ve got your point; thank you for the explanation, and sorry for the delayed response.
As I mentioned, however, SubGit has no such feature, so for now such operations are only possible when all the games are in the same repository, I’m afraid :(

Thank you for your patience with me, Ildar.

I realize SubGit cannot implement forks itself; that is a matter for the SCM, such as Bitbucket or GitLab.

But when someone has been doing this by keeping multiple projects in a single SVN repository, SubGit could help by recognizing the pattern and handling the translation slightly differently.

Version 1 would be similar to the way SubGit already translates multiple projects from a single SVN repository into separate .git repositories; it would just use “git clone” to initialize each repository instead of creating its first commit from scratch:

  • find the first commit under the path given by a new “parent_fork” configuration parameter, and begin translating this “Fork1”,
  • record the first commit that goes to each “child_fork”,
  • continue until end of history,
  • fail if any child_fork was not svn-branched from set(parent) | set(child_forks),
  • for each child_fork: instead of git init, create the .git with git clone at the recorded svn branch revision, then continue appending to its history from there.
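The first four steps amount to a planning pass over the history. A minimal sketch, assuming each svn commit is reduced to its changed paths plus any directory copies (destination root → source root); `Commit`, `root_of` and `plan` are hypothetical names, not SubGit internals, and the actual cloning is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    rev: int
    paths: list                                  # changed svn paths
    copies: dict = field(default_factory=dict)   # dst root -> src root


def root_of(path, roots):
    """Return the fork root containing `path`, or None."""
    for r in roots:
        if path == r or path.startswith(r + "/"):
            return r
    return None


def plan(history, parent, children):
    """Record where the parent and each child fork start; fail on bad origins."""
    roots = [parent, *children]
    first_parent_rev = None
    starts = {}  # child fork -> (first revision, fork it was copied from)
    for c in history:  # continue until the end of history
        for p in c.paths:
            r = root_of(p, roots)
            if r == parent and first_parent_rev is None:
                first_parent_rev = c.rev
            elif r in children and r not in starts:
                starts[r] = (c.rev, c.copies.get(r))
    # Fail if any child fork was not svn-copied from the parent or another child.
    for child, (rev, src) in starts.items():
        if src not in roots:
            raise ValueError(f"{child} (r{rev}) was not svn-copied from {roots}")
    return first_parent_rev, starts


history = [
    Commit(300, ["/Proj1/Trunk/engine.cpp"]),
    Commit(500, ["/Proj2"], {"/Proj2": "/Proj1"}),
    Commit(700, ["/Proj1/Trunk/engine.cpp", "/Proj2/Trunk/game.cpp"]),
    Commit(900, ["/Proj3"], {"/Proj3": "/Proj2"}),
]
first, starts = plan(history, "/Proj1", ["/Proj2", "/Proj3"])
print(f"translate /Proj1 from r{first}")
for child, (rev, src) in starts.items():
    print(f"{child}: start as a clone of {src} at r{rev}")
```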

SubGit users would then need to approach their SCM providers to find out how to make, say, GitLab understand that “Fork2” is a fork, but they would now have a basis for doing that.

Hi Oliver,

Thank you for your valuable suggestions and thorough workup on the feature!

We discussed it internally with our developers, and they found it an interesting approach for the future, but they also pointed out that such a feature is not easy to implement: it’s more of a new mirror mode than just an extension, and it would require plenty of changes in SubGit, especially to support these inter-repository merges. I hate to say this, but we are unlikely to manage to add this feature to our plan for the near future.
For the time being, the only way to have the complete history along with the possibility of merging between projects is to translate such an SVN repository as a whole, rather than dividing it into separate Git repositories for every project. Of course, this is not as convenient in Git as the per-project setup, but it preserves both the history and the merges. Besides, it’s possible to additionally separate the projects inside the Git repository by placing their branches in separate namespaces, so it wouldn’t be a big mess :) and it would not be too big to clone, while still providing both the history and the merges.
What would you say about such a setup?

For the moment, forget the merging; that is just an added advantage.

The real guts of this feature, then, would be:

Ability to point subgit at an svn repository with a history like this:

You would declare “Trunk” the main fork, configure SubGit to start at r300, and specify the “Proj2” and “Proj3” paths as child forks in the config.

On reaching r500, SubGit notes a child fork by its path and excludes that fork’s commits and branches from Trunk.

On reaching r900, SubGit notes a second child fork and excludes it too.

If you were mirroring this repository today (as I am with SubGit and Bitbucket), you would have to configure and install three separate configurations.

With this change, SubGit would be able to do this for me. On its first pass it would detect where “Proj2” and “Proj3” began, and instead of fetching all of the SVN history again to start them, it could simply “git clone Trunk”.

Before:

cd git-data/repositories
subgit configure https://svn/projects/Proj1 --trunk Trunk --minimal-revision 300
subgit install Proj1
subgit configure https://svn/projects/Proj2 --trunk Trunk --minimal-revision 500
subgit install Proj2
subgit configure https://svn/projects/Proj3 --trunk Trunk --minimal-revision 900
subgit install Proj3

After:

cd git-data/repositories
subgit configure https://svn/projects --main-fork Proj1 --child Proj2 --child Proj3 --minimal-revision 300 --trunk Trunk
subgit install Proj1 
subgit install Proj2
subgit install Proj3

At the end of “subgit install Proj1”, instead of a completely empty repository in Proj2, you would have a clone of Proj1.

The ‘install’ for Proj2 doesn’t have to start by fetching the initial state: it git clones Proj1 at hash ae89c410, then collects the SVN commits made to Proj2 and adds those. This makes me think of the way “rebuild from” works.

At a glance, these repositories won’t seem so different from doing this the old way. Git still won’t let you merge between them.

Except that, because Proj2 and Proj3 are derived from Proj1 by cloning, you can use another layer of software, like GitLab, to recognize them as forks, and use tools like pull requests to merge between them with Git workflows once you transition from SVN mirroring to Git-centric operation.

I figured a pseudocode example might help, with the “before” showing my mental model of how SubGit currently works:

//// BEFORE

fn subgit_install_path(svn_repos, path, git_repos) {
	// Iterate over each commit in the svn repos
	foreach commit in svn_repos.history() {  // fetch diffs from current rev on
		branch, changes = commit.changes_below(path)
		NEXT if changes is empty

		git_repos.add(branch, changes)
	}
}

fn subgit_install(svn_repos, min_rev, svn_root, paths, git_config) {
	// e.g.:
	//  svn_repos = svn://svn.serv/svn
	//  svn_root  = Projects
	//  paths     = Proj01, Proj02
	// -> svn://svn.serv/svn/Projects/Proj01  map to Proj01.git

	// Iterate over each path->git mapping
	foreach path in paths {
		svn_repos.checkout(min_rev)

		subgit_install_path(svn_repos, svn_root + "/" + path, git_config.repos(path))
	}
}


//// AFTER
// Old subgit_install renamed as subgit_install_paths.

// New subgit_install
fn subgit_install(svn_repos, min_rev, svn_root, main_path, other_paths, git_config, use_forks) {
	// essentially
	//  main_path = paths[0], other_paths = paths[1:]
	// where paths[0] is Trunk when not using forks or "main_fork" when using forks.
	forked  = map()
	pending = copy(other_paths)  // forks not seen yet; other_paths stays intact for the loop below

	main_svn_path = svn_root + "/" + main_path
	main_repos    = git_config.repos(main_path)

	svn_repos.checkout(min_rev)

	foreach commit in svn_repos.history() {  // fetch diffs from min_rev on
		if use_forks and pending is not empty {
			forks = matching_paths(commit, svn_root, pending)
			// record the revision and the main repository's hash we are at
			foreach fork in forks {
				forked.add(fork, commit.revn, main_repos.hash)
			}
			// stop looking for those forks (they remain in other_paths)
			pending.remove(forks)
		}

		branch, changes = commit.changes_below(main_svn_path)
		if changes is not empty {
			main_repos.add(branch, changes)
		}
	}

	if not use_forks {
		subgit_install_paths(svn_repos, min_rev, svn_root, other_paths, git_config)
	} else {
		foreach fork in other_paths {
			fork_rev, fork_hash = forked[fork] else {
				warning("No commits to {fork} found.")
				NEXT
			}

			fork_paths = [fork]  // single entry list

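			// start the fork as a clone of the main repository at the recorded hash
			git_config.clone(main_repos, fork, fork_hash)
			// subgit_install_paths checks out its min_rev itself, so pass fork_rev
			subgit_install_paths(svn_repos, fork_rev, svn_root, fork_paths, git_config)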
		}
	}
}
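To check the control flow of the AFTER pseudocode, here is a runnable toy version. It assumes SVN commits are reduced to `{path: content}` additions (no deletes, no branches) and that a Git repository is a linear list of snapshots whose index stands in for the commit hash. `GitRepo`, `install_path` and `install_with_forks` are hypothetical stand-ins, not SubGit's API.

```python
class GitRepo:
    """Toy stand-in for a Git repository: a linear list of (svn_rev, snapshot)."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])

    @property
    def hash(self):
        # The index of the newest commit plays the role of its hash (-1 if empty).
        return len(self.commits) - 1

    def add(self, rev, changes):
        snapshot = dict(self.commits[-1][1]) if self.commits else {}
        snapshot.update(changes)
        self.commits.append((rev, snapshot))

    def clone(self, upto):
        # Snapshots are never mutated after creation, so sharing them is safe.
        return GitRepo(self.commits[: upto + 1])


def changes_below(changes, root):
    """The changes under `root`, with paths made relative to it."""
    prefix = root + "/"
    return {p[len(prefix):]: v for p, v in changes.items() if p.startswith(prefix)}


def install_path(history, min_rev, root, repo):
    """Roughly subgit_install_path: replay commits under `root` from min_rev on."""
    for rev, changes in history:
        sub = changes_below(changes, root) if rev >= min_rev else {}
        if sub:
            repo.add(rev, sub)


def install_with_forks(history, min_rev, main, others):
    """Roughly the AFTER subgit_install with use_forks = true."""
    repos = {main: GitRepo()}
    forked, pending = {}, list(others)
    for rev, changes in history:
        if rev < min_rev:
            continue
        # Record where each fork first appears and main's hash at that point.
        for fork in [f for f in pending if changes_below(changes, f)]:
            forked[fork] = (rev, repos[main].hash)
            pending.remove(fork)
        sub = changes_below(changes, main)
        if sub:
            repos[main].add(rev, sub)
    for fork in others:
        if fork not in forked:
            print(f"warning: no commits to {fork} found")
            continue
        fork_rev, fork_hash = forked[fork]
        repos[fork] = repos[main].clone(fork_hash)
        install_path(history, fork_rev, fork, repos[fork])
    return repos


history = [
    (1, {"/P1/engine.cpp": "v1"}),
    (2, {"/P1/engine.cpp": "v2"}),
    (3, {"/P2/engine.cpp": "v2"}),  # svn copy /P1 -> /P2
    (4, {"/P1/engine.cpp": "v3", "/P2/game.cpp": "g"}),
]
repos = install_with_forks(history, 1, "/P1", ["/P2"])
print([rev for rev, _ in repos["/P2"].commits])
```

P2 ends up with P1's r1 and r2 as its own history, followed by its own r3 and r4. In this toy, the copy revision (r3) still becomes a commit in P2 even though it changes nothing relative to the clone; a real implementation would probably skip it.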

Hello Oliver!

Thank you again for the very thorough explanation; I really appreciate the schema and the code, which is indeed a great way to explain the idea!
And the better your explanation is, the more disappointing it is that I must again answer that we are unlikely to add this feature to our plan for the near future. We discussed it with the dev team; the feature does not look easy to implement, especially taking GitLab specifics into account, so we simply have no way to implement it now.


Completely ignoring the gitlab/bitbucket/scm stuff, I know I’m doing a terrible job of describing the core feature I’m asking for, sorry :(

But there’d be no possibility of adding a feature to enable something like the following, even via paid contract work?

/svn
	/monorepo
		/OpsStuff
			/Trunk
			/Branches
				/Experiment
		/UserRepos
			/Trunk
		/Project01
			/Trunk
			/Branches
				/Release
				/Dev
		/ProjectFork02		# svn copied from /Project01 at r500
			/Trunk
			/Branches
				/Release
				/Dev
		/ProjectFork03		# svn copied from /Project02 at r900
			/Trunk
			/Branches
				/Release
				/Dev

subgit configure svn://server/svn/monorepo/Project01  # create project01
git clone --bare Project01 ProjectFork02  # then point its branch at <hash of commit 500>, or use some subgit command that copies the other files too
subgit configure --rebuild-from-revision 501 svn://server/svn/monorepo/ProjectFork02
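The manual recipe needs to know where each fork was copied from and at which revision. Subversion records this: `svn log --xml -v --stop-on-copy URL` lists the changed paths of each revision, and the copy appears as a `path` element with `copyfrom-path` and `copyfrom-rev` attributes. A minimal sketch that reads that information from a made-up sample log; `copy_origin` is a hypothetical helper, and mapping the source revision to a Git commit hash is not shown.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed example of what
#   svn log --xml -v --stop-on-copy svn://server/svn/monorepo/ProjectFork02
# might print (newest revision first).
SAMPLE_LOG = """<log>
<logentry revision="510">
<author>oliver</author>
<paths>
<path action="M" kind="file">/ProjectFork02/Trunk/engine.cpp</path>
</paths>
<msg>tweak</msg>
</logentry>
<logentry revision="501">
<author>oliver</author>
<paths>
<path action="A" kind="dir" copyfrom-path="/Project01" copyfrom-rev="500">/ProjectFork02</path>
</paths>
<msg>Start ProjectFork02</msg>
</logentry>
</log>"""


def copy_origin(log_xml, target):
    """Return (copyfrom_path, copyfrom_rev, copy_rev) for the copy that created target."""
    root = ET.fromstring(log_xml)
    for entry in root.iter("logentry"):
        for path in entry.iter("path"):
            if path.text == target and path.get("copyfrom-path"):
                return (path.get("copyfrom-path"),
                        int(path.get("copyfrom-rev")),
                        int(entry.get("revision")))
    return None


print(copy_origin(SAMPLE_LOG, "/ProjectFork02"))
```

With this sample, the fork would start from Project01 as of r500 and be continued from r501, which matches the revisions in the commands above.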

My previous attempts to describe it were referring to more advanced features built on top of something like this, in case I was trying to do something you already had a solution for :)

I guess, in a nutshell, this is the ability to have subgit “install” behave like a rebuild and continue a history from somewhere else or, at least, from a previous SubGit run on a related path.

Everything else aside, this would allow us to have the full original history in each of our SubGit repositories: the additional projects were all created with “svn copy /svn/repos/Project01 /svn/repos/Project02”, but when we import them with SubGit, only the history starting at the copy is imported.

Hi Oliver,

I suggest we have a live call to discuss all the details. Would that work for you? If so, could you please suggest a convenient time for the meeting?

Sounds like a plan – what’s your timezone?

Hi Oliver,

My timezone is GMT+5.