Pre-use/pre-licensing technical questions

Hello, we’re considering using SubGit to power our SVN-to-Git migration. This is a large set of very large repositories and would be an enterprise license for 25-50 users for 12-24 months. We have an unusual Subversion layout, so I’d like to run the rough details by you and make sure your product can support it before we start trying it out.

We have a single Subversion server containing 132 repositories. Each Subversion repository contains anywhere from 1,000 to 100,000 commits (over a 20+ year time span) and encompasses anywhere from 1 to 300 “components.” We would migrate just one of these 132 repositories for now (more in the future). However, that single Subversion repository contains over 200 “components” (and about 70,000 commits), and each “component” will become its own Git repository on Bitbucket (so a single Subversion repository will become ~200 Git/Bitbucket repositories). Paths to components could take different routes, such as “foo/bar/Component1,” “foo/bar/Component2,” “baz/qux/Component3,” etc., and the corresponding Git repository names would be “Component1-bar,” “Component2-bar,” “Component3-qux,” etc.
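
As a rough illustration of the naming rule described above (component name, a hyphen, then the name of the parent directory), here is a minimal shell sketch. The function name `repo_name_for` is made up for this example, and it assumes every component path has at least two segments.

```shell
# Derive a Git repository name from a component's SVN path:
# last path segment, a hyphen, then the parent directory name.
# Assumes the path contains at least one slash.
repo_name_for() {
    path=${1%/}              # drop a trailing slash, if any
    component=${path##*/}    # e.g. "Component1"
    parent=${path%/*}        # e.g. "foo/bar"
    parent=${parent##*/}     # e.g. "bar"
    printf '%s-%s\n' "$component" "$parent"
}

# Prints Component1-bar, Component2-bar, Component3-qux (one per line).
for p in foo/bar/Component1 foo/bar/Component2 baz/qux/Component3; do
    repo_name_for "$p"
done
```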

In addition to this non-standard layout of hundreds of repositories in a single repository, none of these components/repositories have trunks. Each component consists of a set of branches as the top-level folders in that component folder. The branches all have intermingled merge history, but none of them derive from or merge back into a trunk/master. We can, as is likely necessary, select a “default” branch for each component/repository, but it would be a different default branch name for each, not a standard name across all of them.

So, the first/simplest question is: Can SubGit do this? The next question is: What kind of time frame are we looking at for a 70,000-commit migration here, and will the “subgit install” command show any kind of progress meter while it migrates history? As a first step while I researched other options (like SubGit), I began a “git svn clone” process for a single component. It’s been going for about 6 hours now, and there’s no indication of how much longer it has left. Initial online inquiries show reports of 1 hour to 2 weeks to do a “git svn clone,” depending on Subversion repository size.

Thanks in advance for providing some technical guidance while we get started.

Nick Williams

Hello Nick,

thank you for your interest in our products!

To answer your questions: yes, SubGit can definitely drive such a migration, and so can our other product, the SVN Mirror add-on for Bitbucket Server, which would be the better choice if you intend to use Bitbucket as the target Git server. A trunk is not mandatory in every SVN project, and branches need not derive from one, so the setup you’ve described can be migrated. A “default” branch must still be set, only to satisfy the mandatory trunk mapping in the SubGit configuration, but it can be any branch of your choice.
It’s very hard to say for sure how long a particular repository migration will take, since it depends on a number of parameters: the protocol used to access the repository, network speed, SVN and Git server performance, the migration settings, and even the specifics of the SVN project history. On average, translating a regular data-changing revision takes about 1 to 1.5 seconds, but translation takes noticeably longer for revisions that add big chunks of data. There are also some SubGit/SVN Mirror add-on settings that may increase the translation time; details about those settings are here:

TMate SubGit: Frequently asked questions

So for a 70,000-revision repository I would estimate the initial import time at about 2 days or so, provided there are no revisions that add big data and no erroneous revisions (such as a copy of the whole repository instead of a particular directory).
SubGit does show a progress bar in the console where subgit install was invoked, and so does the SVN Mirror add-on (in the web UI). It is also possible to track the progress in the logs, which record what the tool is doing at the moment. I must note, however, that all those sources only report the revision currently being imported, not the time left for the import; for the reasons I mentioned above, it’s very hard to predict the time needed.
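
For reference, the per-revision figure quoted above can be turned into a back-of-the-envelope range. This is only the arithmetic (revisions times seconds per revision), not a prediction; the helper name `estimate_hours` is invented for the example, and the 2-day estimate leaves headroom above the range it prints.

```shell
# Rough import-time estimate in hours: revisions x seconds per revision / 3600.
estimate_hours() {
    awk -v revs="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", revs * secs / 3600 }'
}

estimate_hours 70000 1      # lower bound: prints 19.4
estimate_hours 70000 1.5    # upper bound: prints 29.2
```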

Hope I managed to clarify the situation a bit for you, but of course don’t hesitate to reach out to us should you have any more questions.

Thanks for the quick reply, Ildar.

The experimental git svn clone import I was attempting took approximately 20 hours, so maybe that’s a guide for how long subgit install should take. However…

Here’s what I did:

First, I created a new configuration:

$ subgit configure --layout directory http://svn.example.org/svn/REPO/path/to/Component1/trunk-6.1.0.beta2 Component1.git

Where “trunk-6.1.0.beta2” is one of many branches (some of which also start with “trunk-,” some of which don’t).

Then I edited the config file and changed it as follows:

url = http://svn.example.org/svn/REPO/path/to/Component1

trunk = trunk-6.1.0.beta2:refs/heads/trunk-6.1.0.beta2
branches = :refs/heads/

By my understanding, this should treat “trunk-6.1.0.beta2” as the required default branch mapping, and all other branches as regular branches.

Then I configured the passwd file and the authors.txt file. Finally, I ran the install command:

$ subgit install Component1.git

Here’s where things got weird. After a few seconds, the install command began showing a progress bar that started already at about 73%, and already about 60,000 commits into the 70,000-commit repo. Over the course of about 10 minutes, it moved to 100% and completed, and then immediately failed with an error:

error: Failed to launch background translation process: timeout waiting for pid file ‘/Users/williamsn/Desktop/scratch/git-tests/Component1.git/subgit/daemon.pid’.
error: Unexpected error has occurred; please report along with the logs

To make sure that the process had enough time to start, I changed the config launchTimeout = 3600 (1 hour) and tried again. Same result, only I had to wait an hour for it. I’m unsure from the log what is happening. It looks like the daemon process launches (no errors reported), but then just disappears. If I ps ax, there is no matching, running process, and no PID file is ever created.
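
The manual check described above (look for the pid file, then look for the process) can be wrapped in a small helper. This is a generic shell sketch, not a SubGit command: the function name `daemon_status` is made up, and the only SubGit-specific assumption is the pid file location taken from the error message, `<repo>/subgit/daemon.pid`.

```shell
# Report whether a SubGit daemon appears to be running for a repository.
# Assumes the pid file lives at <repo>/subgit/daemon.pid, as in the error above.
# Note: kill -0 also fails if the process belongs to another user.
daemon_status() {
    pidfile="$1/subgit/daemon.pid"
    if [ ! -f "$pidfile" ]; then
        echo "no pid file"
        return 1
    fi
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then
        echo "running (pid $pid)"
    else
        echo "stale pid file (pid $pid)"
        return 1
    fi
}

# Example call (repository path as used in this thread):
#   daemon_status Component1.git
```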

I’ve attached the log, which I had to sanitize and truncate the beginning of for security purposes. I think everything that’s relevant is present in the attached log. If you do not see the relevant information and need the full, unsanitized log, let me know, because I’ll have to request permission to send you that.

Thanks,

Nick Williams

subgit-install.0.log (11.7 KB)

Hello Nicholas,

I’m afraid this mapping configuration is not completely correct:

trunk = trunk-6.1.0.beta2:refs/heads/trunk-6.1.0.beta2
branches = :refs/heads/

The trunk mapping is OK, but the branches mapping would not work this way, and that is what is happening during the translation: as far as I understand, the trunk-6.1.0.beta2 branch was created in SVN somewhere around revision 77000, so SubGit moves directly to it, which is why the progress jumps to 73% immediately.
I haven’t completely understood the branch layout: are the branches present as directories at the same level as the trunk-6.1.0.beta2 directory? If so, I would rather use the following configuration for this SVN project:

url = http://svn.example.org/svn/REPO/path/to

trunk = Component1/trunk-6.1.0.beta2:refs/heads/trunk-6.1.0.beta2
branches = Component1/*:refs/heads/*

Alternatively, if only a few branches are to be imported (and the others excluded), they can be listed explicitly:

url = http://svn.example.org/svn/REPO/path/to

trunk = Component1/trunk-6.1.0.beta2:refs/heads/trunk-6.1.0.beta2
branches = Component1/branch1:refs/heads/branch1
branches = Component1/branch2:refs/heads/branch2
...
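
If the explicit list gets long, the `branches` lines can be generated rather than typed. The sketch below is a plain shell helper (the name `branch_mappings` is invented here) that reads branch names on standard input and prints one mapping line each, in the same form as the lines above; the output still has to be pasted into the configuration by hand.

```shell
# Print one explicit "branches" line per branch name read from stdin.
# $1 is the component directory relative to the configured url.
branch_mappings() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue    # skip blank lines
        printf 'branches = %s/%s:refs/heads/%s\n' "$1" "$name" "$name"
    done
}

# Prints:
#   branches = Component1/branch1:refs/heads/branch1
#   branches = Component1/branch2:refs/heads/branch2
printf '%s\n' branch1 branch2 | branch_mappings Component1
```

The names could also come from `svn ls` on the component URL (with trailing slashes stripped and the branch used for the trunk mapping removed), but that step is an assumption about the workflow and is left out of the sketch.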

Hope this will help.

Yes, all branches are in the same level directory: all/source/Library/Component1/6.0.5 (a released branch), all/source/Library/Component1/6.0.8 (a released branch), all/source/Library/Component1/trunk-6.0.9 (a working branch), all/source/Library/Component1/trunk-6.1.0.beta2 (a working branch), etc.

So, I actually tried something similar to what you’re suggesting yesterday:

url = http://svn.example.org/svn/REPO

trunk = all/source/Library/Component1/trunk-6.1.0.beta2:refs/heads/trunk-6.1.0.beta2
branches = all/source/Library/Component1/:refs/heads/

This installed starting at around r63000 and completed seemingly without error, but then the background daemon did not start and did not create a PID file (same log contents).

As an experiment, I also tried getting some older branches that were in a different location within the repo:

trunk = all/source/Library/Component1/trunk-6.1.0.beta2:refs/heads/trunk-6.1.0.beta2
branches = all/source/Library/Component1/:refs/heads/
branches = 6./source/Library/Component1/:refs/heads/old_6._

This installed starting at around r54000 and, again, completed seemingly without error, but then the background daemon did not start and did not create a PID file (same log contents).

I don’t see any indication that my configuration is creating an issue here. The daemon just isn’t starting and isn’t spitting out any error indicating why it’s not starting.

By the way, if it matters, I’m testing this on macOS Big Sur 11.6.2 on a 2019 MBP 8-core i9 with 32 GB RAM. I guess the next thing I might try is a different operating system.

Thanks,

Nick

Update since my last email sent 27 minutes ago…

The last configuration I showed you below (with the old branches mapped, too) appears to have worked perfectly on RedHat Enterprise Linux 8 instead of macOS:

SubGit version 3.3.14 ('Bobique') build #4433

Translating Subversion revisions to Git commits...

Subversion revisions translated: 77235.
Total time: 533 seconds.

INSTALLATION SUCCESSFUL

You are using SubGit in evaluation mode.
Your evaluation period expires on April 28, 2022 (in 7 days).

Extend your trial or purchase a license key at https://subgit.com/pricing

The daemon is running in the background:

$ ps ax|grep subgit
1534802 ? Ssl 0:01 /usr/local/java/jdk1.8.0_321/jre/bin/java -noverify -client -Djava.awt.headless=true -Djna.nosys=true -Dsvnkit.http.methods=Digest,Basic,NTLM,Negotiate -Dsvnkit.ssh.client=apache -cp /home/williamsn/scratch/git-testing/Component1-Library.git/subgit/lib/subgit-3.3.14_4433_fat.jar org.tmatesoft.translator.SubGitDaemon test --svn /home/williamsn/scratch/git-testing/Component1-Library.git --limit 1650541600927

I can clone it:

$ git clone Component1-Library.git Component1-Library-Clone
Cloning into 'Component1-Library-Clone'...
done.

The default branch is correct:

$ git branch
* trunk-6.1.0.beta2

I can check out other branches:

$ git checkout trunk-6.0.9
Branch 'trunk-6.0.9' set up to track remote branch 'trunk-6.0.9' from 'origin'.
Switched to a new branch 'trunk-6.0.9'

I can commit and push:

$ git push
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 32 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 1.03 KiB | 1.03 MiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
remote: Fetching revisions from SVN repository:
remote: up to date
remote: Sending commits to SVN repository:
remote: dc42dce => r77236 all/source/Library/Component1/trunk-6.1.0.beta2
remote: Sync completed successfully
To /home/williamsn/scratch/git-testing/Component1-Library.git
b6f04d82..dc42dce9 trunk-6.1.0.beta2 -> trunk-6.1.0.beta2

And, finally, after a commit is made in Subversion, I can pull/rebase that down:

$ git pull --rebase
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), 1.01 KiB | 172.00 KiB/s, done.
From /home/williamsn/scratch/git-testing/Component1-Library
dc42dce9..4eb47082 trunk-6.1.0.beta2 -> origin/trunk-6.1.0.beta2
Updating dc42dce9..4eb47082
Fast-forward
component.xml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

So, all that’s to say … it works! It seems to have no problem with our complicated layout. However, there does seem to be a macOS-specific bug where the daemon will not start.

I’d like to jump ahead and ask you, “How does this end?” In other words … once the migration is completely done and we have locked the Subversion repository against commits, how do we make this repository a “regular” Git repository (both with SubGit directly and with the BitBucket plugin)?

Thanks!

Nick

Hello Nicholas,

that’s great, glad to know it is working! And thank you for the macOS issue report, we will investigate it.

As for the “How does this end?” question: in fact, it does not have to end with the migration; the initial import is just the first stage. SubGit and the SVN Mirror add-on can maintain a two-way mirror between SVN and Git. After the initial import finishes, the repositories can be used as regular SVN and Git repositories, and commits from each side are synchronized to the other, so that a new commit sent to SVN appears in Git and vice versa. In this setup the SVN side may as well be locked against commits yet still be kept synchronized, so that all Git commits appear in SVN. In that case the SVN user configured in SubGit must still be allowed to commit; otherwise SubGit will not be able to send the data to SVN.
If, on the other hand, the plan is to migrate to Git at once and sunset SVN completely, then the mirror can be stopped after the initial import:

subgit uninstall <REPO PATH>

or in this way:

subgit uninstall --purge <REPO PATH>

The second command also removes all SubGit metadata from the repository. Once SubGit is uninstalled, the repository is just a regular Git repository and can be used like any other.
For this setup you can also use the subgit import command instead of subgit install: the former only performs the initial import and does not start the daemon afterwards, while the latter is the one intended for mirroring.
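
For the one-time cutover with standalone SubGit, the sequence could look roughly like the sketch below. It only prints the commands rather than running them. The function name `cutover_plan`, the repository path, and the Bitbucket URL are placeholders, and publishing the result with `git push --all` and `git push --tags` is an assumption about the next step, not something stated in this thread.

```shell
# Print (do not run) the cutover steps for one converted repository.
# $1: path to the Git repository SubGit was installed into
# $2: URL of the target Git remote (hypothetical)
cutover_plan() {
    repo=$1
    remote=$2
    echo "subgit uninstall --purge $repo"
    echo "git -C $repo push --all $remote"
    echo "git -C $repo push --tags $remote"
}

cutover_plan Component1.git ssh://git@bitbucket.example.org/proj/component1.git
```

Printing the plan first makes it easy to review the steps for a couple of hundred repositories before running anything.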

Ahah, I see. What we would actually do is run it as a mirror for some months while the migration takes place, and then once the migration is complete, lock SVN (except for the SubGit user) and then subgit uninstall (or whatever the BitBucket equivalent is).

Thanks for all your help. I’m going to strongly advocate for licensing SubGit for our migration in my meeting later today.

Thanks,

Nick

I came up with two more questions after experimenting with mappings and using SubGit a bit more…

Question 1: Is there a way to tell SubGit that, when it encounters a Subversion commit that deletes a Subversion branch, it creates a “backup” Git tag of that deleted branch at the commit before it was deleted? This is the default behavior of the KDE svn2git migration tool, and we’ve found that having those tags is a great way of exploring the history of the repository a bit more and also verifying that the migration was successful and accurate.

Question 2: Is there a way to scope a particular “branches” mapping directive to a maximum revision? One of our components was renamed early on from “Component” to “ComponentRuntime” (for example), and then later a new component was created named “Component” in the same location that “ComponentRuntime” used to be. Right now, I can’t include a mapping directive that covers “Component,” because it also picks up commits from the other “Component” that was created after “ComponentRuntime” was renamed from “Component.” Again citing KDE svn2git here, I can define a match for “Component” with “max revision 36646,” which gives us the very oldest history for “ComponentRuntime.” However, in SubGit we have to have a shorter history that omits all history prior to “ComponentRuntime’s” rename from “Component.” Does this make sense?

Thanks,

Nick

Hello Nicholas,

I’m afraid SubGit has no feature for creating additional Git references during the migration; it precisely reflects the SVN project structure and does not add anything that is not present in SVN.

SubGit has the svn.minimalRevision setting, which makes it possible to start the import not from the very beginning but from a later revision, importing only a number of recent revisions and thereby reducing the migration time and the repository size. It does not, however, allow limiting the import to a maximum revision: SubGit always imports data up to the latest revision in the repository, which is one of the conditions required for the mirroring capabilities.
So, if you just need to import part of an SVN project’s history with an upper limit, that would require a workaround on the SVN side: for example, exporting the repository up to the needed revision, creating a new SVN repository from the exported data, and then importing the new repository with SubGit.
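
One way to do the SVN-side workaround is with `svnadmin dump` and `svnadmin load`, sketched below. This assumes filesystem access to the source repository on the SVN server; the function name `truncated_copy` and the paths in the usage comment are hypothetical, and 36646 is the revision mentioned in the question.

```shell
# Copy revisions 0..max_rev of an existing repository into a fresh one.
# Needs filesystem access to the source repository and the svnadmin tool.
truncated_copy() {
    src=$1 dst=$2 max_rev=$3
    case $max_rev in
        ''|*[!0-9]*) echo "max revision must be a number" >&2; return 2 ;;
    esac
    svnadmin create "$dst" &&
    svnadmin dump "$src" -r "0:$max_rev" | svnadmin load "$dst"
}

# Example call (paths are placeholders):
#   truncated_copy /srv/svn/REPO /srv/svn/REPO-r36646 36646
```

The new repository would then be served and imported with SubGit like any other; since it is a frozen copy, a one-time import is probably a better fit for it than a mirror.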
However, if the main issue is the history, then it can be handled by SubGit during the migration, or, in complex cases, by Git commands. For the case you described it is possible to get the history with the following mapping configuration:

[svn]
    ...
    branches = Component:refs/heads/Component
    branches = ComponentRuntime:refs/heads/ComponentRuntime

If the new Component has been created by SVN move facility, then SubGit will recognise it and import the history in the proper way.

Hope it will help.