Tech Support for subgit error

Hi, I downloaded your SubGit tool (version 3.3.15 (‘Bobique’) build #4442) to experiment with one-time Subversion-to-Git migrations. On smaller Subversion repositories the tool works fine and imports the complete history. On a “larger” repository, however, I keep hitting the following error after the process has run for about a day:

Fetching SVN history… Done.
Growing trees… Done.
Project origin detected: r1 este/trunk
Building branches layouts… Done.
Combing beards…
beard 305111 of 305111 combed | 100% [||||||||||]
CONFIGURATION FAILED

error: svn: E175002: chunked stream ended unexpectedly
error: svn: E175002: REPORT request failed on ‘/svn/este-src/!svn/vcc/default’
error: chunked stream ended unexpectedly

I googled and found this recommendation to try:

In the subgit config file, add the following:

[svn]
httpSpooling = true

But I still get the same error.

Is there a way to get a quote for tech support so we can resolve the error above? So far I have had to add memory to the server running SubGit to get past earlier issues where garbage collection could no longer keep up. Now that more memory is in place, the error above is the next obstacle. Thank you.

Hello Robert!

Thank you for reaching out to us on that matter!

The error “chunked stream ended unexpectedly” is reported when the SVN server closes the connection because of a server-side timeout. In practice, this means the response to SubGit’s request is large and SubGit takes too long to process it.

The svn.httpSpooling setting can indeed help in such cases: it tells SubGit to spool the data locally, which reduces both the time needed to handle the response and memory consumption. However, it only takes effect during the actual import or mirroring, not during the configure stage, so it is of little help here.

One possible workaround is to raise the timeout settings on the SVN server so that it waits longer and SubGit can finish handling the response. This may also help during the actual import if there are big files or revisions that take a long time to translate.

Another possible workaround is to access the SVN server over a protocol other than http: the svn, svn+ssh, and file protocols are not affected by this issue. It is also possible to use one of these protocols for the configuration stage and switch back to http for the actual import if needed.

Finally, it is possible to avoid the auto layout during configuration. When subgit configure is invoked with the --layout auto option, SubGit connects to the SVN server and scans its history, and that is when the issue occurs. If configure is invoked with --layout std, or without a layout option at all (std is the implicit default), SubGit just creates a new repository with a mapping configuration for the “standard” SVN layout (trunk, branches, and tags) and does not connect to SVN, so it does not hit the issue. The drawback is that the branch mapping then has to be created and adjusted manually, whereas --layout auto generates it automatically.

I hope this helps; don’t hesitate to contact us if you run into any issues.

Hi,

I ended up trying the svn+ssh approach to connect to the SVN server, and that worked. I did need to enter my ssh key location manually, as it would not work from the config file, but that was only a minor inconvenience.

At this point, I keep hitting a garbage collection issue: the process runs until garbage collection can no longer keep up. I tried this two or three times in a row just to be sure. Here are some results:

. . .

Fetching SVN history… Done.

Growing trees… Done.

Project origin detected: r1 este/trunk

Building branches layouts…

layout 263797 of 305396 built | 86% [|||||||| ]

CONFIGURATION FAILED

error: GC overhead limit exceeded

error: Unexpected error has occurred; please report along with the logs (‘/home/rgomes/subgit/oltp09/subgit-configure-20220910-231727.zip’)

error: to https://support.tmatesoft.com/, thank you!

80735.73s real 131488.70s user 909.66s system

$ time subgit configure --layout auto --trunk branches/ric_este_05_03_dev_br svn+ssh://xxsubver/srv/svn/este-src/este oltp09.git

SubGit version 3.3.13 (‘Bobique’) build #4426

Configuring writable Git mirror of remote Subversion repository:

Subversion repository URL : svn+ssh://xxsubver/srv/svn/este-src/este

Git repository location : oltp09.git

Detecting peg location…

Authentication realm: svn+ssh://xxsubver

Username: rgomes

Password for ‘xxsubver’ (leave blank if you are going to use private key):

Private key for ‘xxsubver’ (OpenSSH format): /home/rgomes/.ssh/id_rsa

Private key passphrase [none]:

Port number for ‘xxsubver’ [22]:

The ‘xxsubver’ server’s key fingerprint is:

33:2c:e5:16:ff:8a:ff:25:7d:0b:be:a8:24:a6:09:a2

If you want to carry on connecting just once, without adding the key to the cache, type ‘t’. If you do not trust this host, type ‘R’ to abandon the connection.

Peg location detected: r305397 este/branches/ric_este_05_03_dev_br

Fetching SVN history… Done.

Project origin detected: r1 este/trunk

Building branches layouts…

layout 263752 of 305396 built | 86% [|||||||| ]

CONFIGURATION FAILED

error: GC overhead limit exceeded

error: Unexpected error has occurred; please report along with the logs (‘/home/rgomes/subgit/oltp09/subgit-configure-20220911-152810.zip’)

error: to https://support.tmatesoft.com/, thank you!

As you can see, it gets to about 86% completion before it runs out of resources each time. I did try changing the garbage collector being used; here is the difference between the subgit launcher script I am using and the original:

diff subgit subgit_orig

162c162

< EXTRA_JVM_ARGUMENTS="-Dsun.io.useCanonCaches=false -Djava.awt.headless=true -Djna.nosys=true -Dsvnkit.http.methods=Digest,Basic,NTLM,Negotiate -Dsvnkit.ssh.client=apache -Xmx32g -XX:+UseG1GC -XX:+UseStringDeduplication"

Hi Robert,

During the configure stage, authentication data does indeed have to be entered manually: no configuration file exists yet at this stage, since the subgit configure command is what generates it.
As for the issue itself, it looks like some tricky history change somewhere around r263797 requires so many resources for parsing and configuration that even 32 GB of heap is not enough. This was probably also the reason the configuration command failed with the timeout issue: SubGit spent so long collecting all that data that the web server closed the connection.
One possible way to overcome this is to dedicate more memory to SubGit so that it has enough for the configuration. The drawback is that it is not clear exactly how much memory it would need, so it probably makes sense to give it twice as much or even more to avoid hitting this issue again.
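As a rough illustration of the point above: the heap limit is the -Xmx value in the EXTRA_JVM_ARGUMENTS line of the subgit launcher script, as shown in the diff earlier in this thread. The sketch below only rewrites that value inside an arguments string. The raise_heap helper is hypothetical (not a SubGit feature), and 64g is simply twice the 32g already tried, not a measured requirement.

```shell
#!/bin/sh
# Hypothetical helper (not part of SubGit): replace the -Xmx value in a
# JVM arguments string. $1 = current arguments, $2 = new heap size (e.g. 64g).
raise_heap() {
  printf '%s\n' "$1" | sed -E "s/-Xmx[0-9]+[kKmMgG]?/-Xmx$2/"
}

# Shortened copy of the arguments from the diff earlier in the thread.
current='-Djava.awt.headless=true -Xmx32g -XX:+UseG1GC -XX:+UseStringDeduplication'

# Prints the same arguments with -Xmx64g in place of -Xmx32g.
raise_heap "$current" 64g
```

The resulting string would then be placed back into the EXTRA_JVM_ARGUMENTS line by hand, provided the machine actually has that much memory available.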
It would also make sense to change the command itself, since in general the longer the trunk path is, the more memory the configuration needs. I’m not sure what the intent is, but if the trunk and all the intended branches are in the branches directory, it may make sense to change the command like this:

subgit configure --layout auto --trunk ric_este_05_03_dev_br svn+ssh://xxsubver/srv/svn/este-src/este/branches oltp09.git

Alternatively, the configuration can be set up manually without the --layout auto option. That option generates the mapping configuration by parsing SVN history, but it is not mandatory. Here is our article that describes the mapping in detail:

TMate SubGit: Branches and tags mapping

Of course, we would be glad to help if you have trouble creating it. We would need to know the SVN repository layout and which branches and tags are supposed to be imported into the resulting Git repository.

Hi,

Thank you for your response. I did try changing the command as you suggested, but then ran into another error where a tag was not found:

$ time subgit configure --layout auto --trunk ric_este_05_03_dev_br svn+ssh://xxsubver/srv/svn/este-src/este/branches oltp09.git

SubGit version 3.3.13 (‘Bobique’) build #4426

Configuring writable Git mirror of remote Subversion repository:

Subversion repository URL : svn+ssh://xxsubver/srv/svn/este-src/este/branches

Git repository location : oltp09.git

Detecting peg location…

Authentication realm: svn+ssh://xxsubver

Username: rgomes

Password for ‘xxsubver’ (leave blank if you are going to use private key):

Private key for ‘xxsubver’ (OpenSSH format): /home/rgomes/.ssh/id_rsa

Private key passphrase [none]:

Port number for ‘xxsubver’ [22]:

The ‘xxsubver’ server’s key fingerprint is:

33:2c:e5:16:ff:8a:ff:25:7d:0b:be:a8:24:a6:09:a2

If you want to carry on connecting just once, without adding the key to the cache, type ‘t’. If you do not trust this host, type ‘R’ to abandon the connection.

Peg location detected: r305491 este/branches/ric_este_05_03_dev_br

Fetching SVN history…

r296817 fetched | 2% [ ]

CONFIGURATION FAILED

error: svn: E160013: File not found: revision 296816, path ‘/este/tags/fla_este_02_21_00_03’

error: Unexpected error has occurred; please report along with the logs (‘/home/rgomes/subgit/oltp09/subgit-configure-20220913-085753.zip’)

error: to https://support.tmatesoft.com/, thank you!

375.20s real 43.58s user 18.51s system

Hi Robert,

OK, it looks like that tag is involved in the history and is needed for the translation, so setting the URL and trunk this way won’t work. I think the only reliable way is to stick with the manual configuration approach. This can be done by first creating a configuration for the “standard” layout:

subgit configure svn+ssh://xxsubver/srv/svn/este-src/este oltp09.git

and then editing the SubGit configuration file oltp09.git/subgit/config to set the correct mapping in the [svn] section.

Hi,

I tried as you suggested by doing a brief configure, then editing the terms in the subgit/config file to indicate what sort of trunk and branches to import. Here are a few of the directives I set in the config file:

authorsFile = /home/rgomes/subgit/authors.txt

url = svn+ssh://xxsubver/srv/svn/este-src/este (this was already set by the config step)

trunk = ric_este_05_03_dev_br:refs/heads/master

branches = ric_:refs/heads/ric_

Then I tried the import, and ended up getting this error:

$ time subgit import oltp10.git

SubGit version 3.3.13 (‘Bobique’) build #4426

Translating Subversion revisions to Git commits…

IMPORT FAILED

error: svn: E210004: Malformed network data

4008.16s real 8.08s user 1.60s system

I have not seen this error before, so not sure what it relates to. Thoughts?

Thanks,

Rob

Hi,

Here is an update – after running the command several times, it appeared to finally complete:

Hi Robert,

sorry for the delay with the response.

Glad to know the import eventually worked! The error E210004: Malformed network data you mentioned earlier means that SubGit was unable to parse the SVN server’s response. In our experience, this is most often caused by network errors, which seems to be the case here too, as subsequent restarts resolved the problem.
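Since restarting helped here, the reruns could be automated with a small retry wrapper. This is a minimal sketch. It assumes that subgit import exits with a non-zero status on failure and can safely be re-run against the same repository, as the repeated manual runs in this thread suggest. The retry function and the RETRY_DELAY variable are hypothetical names, not SubGit features.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, pausing between attempts.
# RETRY_DELAY (seconds, default 30) is an assumed knob for this sketch.
retry() {
  max=$1
  shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      return 1            # give up after the last allowed attempt
    fi
    n=$((n + 1))
    sleep "${RETRY_DELAY:-30}"
  done
}

# Intended usage (commented out so the sketch has no side effects):
# retry 5 subgit import oltp10.git
```

If the failures persist across many attempts, the cause is probably not a transient network error, and the logs are worth a closer look.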

Please read further down, where I said:

However, I don’t see anything under the “objects” subdirectory nor under “refs/heads” as expected for GIT structuring.

It is as if the command ran but didn’t really pull anything from Subversion – I am not clear why – I did specify the following for trunk and branches:

trunk = ric_este_05_03_dev_br:refs/heads/master

branches = ric_:refs/heads/ric_

And there are a number of branches that begin with “ric_” and have many checkins, etc. – am I missing something in how this is structured to run?

Hello Robert,

It looks like that part was missing from your previous message; I only saw the note that the import had finally worked.

If there are no references in the Git repository after the import, then most probably the mapping configuration does not reflect the actual SVN repository layout, so SubGit cannot find the configured paths in the SVN repository and imports no data. As I understood from the previous messages, the configuration was set as follows:

url = svn+ssh://xxsubver/srv/svn/este-src/este

trunk = ric_este_05_03_dev_br:refs/heads/master
branches = ric_:refs/heads/ric_

but a little earlier you were running the configure command with the same URL and the trunk set to branches/ric_este_05_03_dev_br, so my impression is that the ric_este_05_03_dev_br branch actually resides in the branches directory. Is that correct?
If so, this is the cause: SubGit tries to find the trunk at the following path:

svn+ssh://xxsubver/srv/svn/este-src/este/ric_este_05_03_dev_br

but there’s no such directory in SVN, so SubGit imports nothing. As I understand it, the mapping should be set either like this:

url = svn+ssh://xxsubver/srv/svn/este-src/este/branches

trunk = ric_este_05_03_dev_br:refs/heads/master
branches = ric_*:refs/heads/ric_*

or like this:

url = svn+ssh://xxsubver/srv/svn/este-src/este

trunk = branches/ric_este_05_03_dev_br:refs/heads/master
branches = branches/ric_*:refs/heads/ric_*

The latter variant, with multi-segment paths, will not cause the difficulties for import that we saw with configure: the configure command works differently from the import algorithms, so both configurations are equivalent for import or install.
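To double-check a mapping before running the import, it can help to spell out which SVN path an entry points at: as described above, the part before the colon is appended to url. The sketch below performs only that string join. resolve_mapping is a hypothetical helper, not a SubGit command, and it neither contacts the server nor expands wildcards.

```shell
#!/bin/sh
# Hypothetical helper: show the SVN URL a mapping entry resolves to.
# $1 = the url setting, $2 = a mapping value such as "path:refs/heads/name".
resolve_mapping() {
  url=$1
  svn_path=${2%%:*}                 # keep the part before the first colon
  printf '%s/%s\n' "${url%/}" "$svn_path"
}

resolve_mapping svn+ssh://xxsubver/srv/svn/este-src/este \
  'branches/ric_este_05_03_dev_br:refs/heads/master'
# prints svn+ssh://xxsubver/srv/svn/este-src/este/branches/ric_este_05_03_dev_br
```

The printed URL can then be passed to svn ls or svn info to confirm that the directory really exists in the repository.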

Hi again,

After fixing the branches reference, the import worked. However, I was expecting deeper history on module changes according to all the branches I specified. How can I control the DEPTH of history that is imported?

Thanks,

Rob

Hi Robert,

SubGit always imports all the history it can get, but on some occasions it is unable to trace the whole history. This usually happens when a branch was created or renamed without using the svn move facility, for example by renaming the directory with the operating system’s regular mv and then adding it to SVN. In such a case SubGit has no means to determine the relationship and traces history only back to the moment of the move. This can be worked around, however. Suppose there is a branch named branch that used to be called old_branch and was renamed without svn mv. To import the whole history, both branches must be added to the mapping configuration as follows:

[svn]
    …
    trunk = …
    branches = branch:refs/heads/branch
    branches = old_branch:refs/heads/old_branch

Note, however, that this only works if old_branch is no longer present in the repository; if it is, it will just be imported like any other regular SVN branch.

Hi,

I worked with the branch mappings a bit further and I believe I have finally achieved a successful import with the intended history. It took a few more trials and tuning, but I think I am there. Thank you for your support and guidance.

Rob