Very Slow initial translation

Hi,
I am in the process of moving our SVN and Git repos to a new server and am getting stuck on the initial translation.
The move is from a virtual machine to a physical machine, and lets us use a newer version of the operating system and SVN.

The old system was CentOS 6 with SVN 1.6 and subgit-3.3.5.
The new system is CentOS 8 with SVN 1.10 and subgit-3.3.9.

I have done a svn dump and load onto the new system, and no one is currently using or committing to the new repos.
I am trying to translate one repo with 208452 revisions and about 155 branches

To begin with, the translation was fairly quick and got through about 50% in about a day.
It has now been running for 4 or 5 days, and the translation has seemingly stalled at around 84%.
For some reason, after about a day of running (or after a certain number of revisions), the translation slows to a crawl.
Running top shows the process using 100% CPU, but with very little reading from or writing to disk. We are translating a local repo.

Here is a copy of the config I am using, copied from the old server (I have already confirmed some of the details against https://subgit.com/documentation/faq.html):
[core]
shared = false
logs = subgit/logs
authorsFile = conf/authors.txt
defaultDomain = lmax.com
gitPath = /usr/bin/git
logLevel = finer
[svn]
triggerGitGC = 1000
useGlueFetch = true
[translate]
eols = false
[git "buck-all"]
translationRoot = buck-all
repository = /var/www/git/tfx/buck-all.git
pathEncoding = UTF-8
trunk = trunk:refs/heads/master
branches = branches/:refs/heads/
shelves = shelves/:refs/shelves/
tags = tags/:refs/tags/
keepGitCommitTime = false
[daemon]
pidFile = subgit/daemon.pid
idleTimeout = 0
classpath = subgit/lib

Is there anything you can suggest that might help?
I do have a subgit.key if you need the details.

Thanks and Kind regards.
Dom

Hi Dom,

From the description, it looks like an issue in the SVN repository history. It sometimes happens that somebody creates a new branch not by copying ‘trunk’ or another directory, but by copying the whole root directory. Even if that erroneous branch is removed in the next revision, the copy of the full repository is still there in the history. As a result, when SubGit meets such a revision, it starts downloading the whole repository as that revision’s data, which may of course be extremely big.
The pattern you described is consistent with that: SubGit was importing regular revisions quickly, but then it hit the erroneous revision and is now busy downloading that revision’s data.

A workaround for such cases is to avoid translating such revisions. This can be done either by starting the import from a revision higher than the erroneous one (by setting svn.minimalRevision) or by excluding the erroneous branch (by setting svn.excludeBranches). SubGit usually recognises such revisions in the history and adds the appropriate settings to the mapping configuration, but it can only do so if the ‘subgit configure’ command was called with the ‘--layout auto’ option. In that case SubGit connects to the SVN repository and scans its history to find and work around such issues.
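
One way to look for such revisions by hand is to scan ‘svn log -v’ output for copies whose source lies above the trunk/branches/tags layout. Below is a minimal Python sketch of that idea. It is not part of SubGit: the function names, the list of layout directories, and the heuristic itself are assumptions made for illustration, and it may flag harmless copies or miss unusual ones.

```python
import re

# Changed-path line from `svn log -v` that records a copy, for example:
#    A /project/branches/b1 (from /project/trunk:123)
COPY_LINE = re.compile(r"^\s+[AR] (?P<dest>/.*?) \(from (?P<src>/.*?):(?P<rev>\d+)\)$")
# Revision header line, for example: r123 | user | date | 1 line
REV_LINE = re.compile(r"^r(?P<rev>\d+) \| ")

# Assumed layout; adjust to the directories used in the mapping configuration.
LAYOUT_DIRS = {"trunk", "branches", "tags", "shelves"}
CONTAINER_DIRS = {"branches", "tags", "shelves"}


def is_suspicious_source(src):
    """Heuristic: the copy source sits above a branch, so it drags in many branches."""
    parts = [p for p in src.split("/") if p]
    if not any(p in LAYOUT_DIRS for p in parts):
        return True  # e.g. "/" or "/project": above trunk/branches/tags
    return parts[-1] in CONTAINER_DIRS  # e.g. "/project/branches" as a whole


def suspicious_copies(log_text):
    """Return a list of (revision, destination, source) for suspicious copies."""
    found = []
    current_rev = None
    for line in log_text.splitlines():
        header = REV_LINE.match(line)
        if header:
            current_rev = int(header.group("rev"))
            continue
        copy = COPY_LINE.match(line)
        if copy and current_rev is not None and is_suspicious_source(copy.group("src")):
            found.append((current_rev, copy.group("dest"), copy.group("src")))
    return found
```

Feeding it the text produced by ‘svn log -v’ for the repository would list candidate revisions, which can then be checked manually before deciding what to exclude or where to start the import.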

Hi,

Wanted to get a bit more help, as I am continuing to have issues.

From what I can see, SubGit is working hard trying to mirror a single branch, and this has been ongoing for days (4 days on this one branch or revision, about 7 to 8 days in total).

Here is some information about the revisions around where we are experiencing difficulties.

This one was a botched branch: someone managed to branch into another branch. It gets deleted two revisions later, so it could probably be skipped.

$ svn log -c 181380 https://svn.co.tradefair/repos/tfx/


r181380 | wisa | 2018-10-01 10:17:27 +0100 (Mon, 01 Oct 2018) | 1 line

#5554 Creating end of iteration branch for iteration it245 from revision 181362 of trunk.


$ svn log -c 181382 https://svn.co.tradefair/repos/tfx/


r181382 | byattj | 2018-10-01 11:11:18 +0100 (Mon, 01 Oct 2018) | 2 lines

[James] #6789 Remove broken it245 branch

We really need to keep all the branches in our SVN repo as far as possible.

Are you able to explain what SubGit is doing? We are seeing many events like these in the install log:

[2020-06-07 16:48:20.862][subgit-install][1] close file branches/it245/trunk/lib/npm-cache/_cacache/content-v2/sha512/3b/19/309c32902c8bb29ce197292c4a630775aea40e0de4c0a1055ce305cc9835b6660d840d60df91e36b9ede2b971f9142894c6a55c7787d9eb77f13ebced73d

[2020-06-07 16:48:20.862][subgit-install][1] successfully received ‘branches/it245/trunk/lib/npm-cache/_cacache/content-v2/sha512/3b/19/309c32902c8bb29ce197292c4a630775aea40e0de4c0a1055ce305cc9835b6660d840d60df91e36b9ede2b971f9142894c6a55c7787d9eb77f13ebced73d’
with size=10995

[2020-06-07 16:48:20.862][subgit-install][1] close dir

[2020-06-07 16:48:20.862][subgit-install][1] Attributes query started

[2020-06-07 16:48:21.039][subgit-install][1] Attributes query ended

[2020-06-07 16:48:21.039][subgit-install][1] Attributes query started

[2020-06-07 16:48:21.299][subgit-install][1] Attributes query ended

[2020-06-07 16:48:21.299][subgit-install][1] Attributes query started

[2020-06-07 16:48:21.559][subgit-install][1] Attributes query ended

What are these ‘Attributes query’ events?

Does subgit have a way to query what it is doing or to get more information of the subgit process from an open JMX port?

Do you have any suggestions on what we could do? I’ve also attached a copy of a section of the logs.

As this is a migration of our SVN repo to a new server, people are still using and committing to the old SVN repo. If we load the incremental changes into the new SVN repo, will SubGit mirror or translate those changes, or will I have to run subgit install again? If so, will it continue from where it left off?

Would it only update the mirror if we ran svnsync?

Just some details again of our setup:

Old SVN

CentOS 6 Kernel 2.6.32-642.3.1.el6.x86_64

SVN 1.6.11

java-1.8.0-openjdk-headless-1.8.0

New SVN

CentOS 8 Kernel 4.18.0-147.3.1.el8_1.x86_64

SVN 1.10.2

openjdk-1.8.0.232

Thanks and Kind regards,

Dom

subgit-install.log.gz (22.9 KB)

Hi Dom,

It’s perfectly possible for SubGit to get stuck translating a single revision for a long time, especially in the case of erroneous revisions like the one I described. It’s not clear, however, what was happening in the revisions you suspect. Could we have the ‘svn log -v’ output for those suspicious revisions or, if possible, the full log?

Similarly, the install log portion does not provide much information for analysis: it contains only about 5 minutes of translation, most of which is a normal translation process. There are several exceptions, though, that mention a “non-existent child node”, which most probably indicates some problems in the SVN repository, but they don’t affect performance. To investigate the issue we would need all the install logs. Would it be possible to provide them?

Actually, SubGit logs describe in detail what SubGit is doing, so they are the primary source of information. The logs can be made even more detailed with the core.logLevel setting, which can be set to info (default), finer, or finest. With the latter, SubGit logs virtually all network exchange, so the logs are very detailed.

I’m a bit confused by the ‘old’ and ‘new’ SVN servers: which one is SubGit working with? If it works with the new one, how was the new SVN repository migrated from the old one? And how do you intend to load the latest changes from the old SVN server to the new one?
If you intend to use svnsync to get data from the old repository to the new one, then it should work. That is, if the mirror is established between Git and the new SVN repo, and changes from the old repo are loaded into the new one with svnsync, then they should reach Git as well: svnsync replays revisions, so to SubGit they look just like regular revisions and are sent to Git in the regular way.

Thanks Ildar,

I have attempted to attach a compressed version of the logs; please let me know if you are able to access it: https://files.lmax.com/m106ik

Let’s forget about old and new SVN servers - I guess for this it really isn’t pertinent.

If the logs do not provide enough detail, I will raise the log level.

Thanks

Hi Dom,

Thank you for the SubGit logs, but I have a question about the SVN logs: would it be possible to provide us with the ‘svn log -v’ output, at least for the suspect revision range? It would be very helpful for the investigation.

Thanks for getting back to me so quickly, hopefully this helps?

Sure, it is helpful, but it is most helpful when we have the SVN logs to understand the initial structure and the history. Would it be possible to provide us with the ‘svn log -v’ output?

Perhaps I wasn’t clear. I have attached the logs as a file instead.

svnlog.txt (3.61 KB)

Hi Dom,

thank you for the logs!

It looks like you were right: SubGit spends most of the time on those attributes queries. That is strange, however, since you have set translate.eols to false, so we need to investigate this further. Could you please check whether there is a .gitattributes file in the Git repository? Also, could you please try to collect thread dumps with jstack (jstack -l PID) while those ‘Attributes query’ operations are ongoing? Those operations seem to be the most time-consuming, so at least one dump should catch them if you run jstack a few times.
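
The time going to these queries can be estimated directly from the install log by pairing each ‘Attributes query started’ line with the next ‘Attributes query ended’ line on the same thread. Below is a minimal Python sketch, assuming the timestamp and line format shown in the log excerpt earlier in this thread. The function name and the log file path in the usage comment are made up for illustration and are not part of SubGit.

```python
import re
from datetime import datetime

# Matches lines such as:
# [2020-06-07 16:48:20.862][subgit-install][1] Attributes query started
QUERY_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]"
    r"\[[^\]]*\]\[(?P<thread>[^\]]*)\] Attributes query (?P<event>started|ended)"
)


def attribute_query_stats(lines):
    """Return (number of completed queries, total seconds spent in them)."""
    started_at = {}  # thread id -> timestamp of the query currently open
    count = 0
    total_seconds = 0.0
    for line in lines:
        match = QUERY_LINE.match(line)
        if not match:
            continue  # any other log line is ignored
        timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        thread = match.group("thread")
        if match.group("event") == "started":
            started_at[thread] = timestamp
        elif thread in started_at:
            total_seconds += (timestamp - started_at.pop(thread)).total_seconds()
            count += 1
    return count, total_seconds


# Usage (the path is an assumption; use the actual install log location):
# with open("subgit/logs/subgit-install.log") as log:
#     count, total = attribute_query_stats(log)
#     print(count, "queries,", round(total, 3), "seconds")
```

For the three queries in the excerpt above this gives roughly 0.7 seconds in total. Comparing the total for a longer stretch of the log against the wall-clock time that stretch covers shows how much of the import the queries account for.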

Hi Ildar,

There does not appear to be any .gitattributes file present in the directory.

I have attached an archive of several jstack dumps, taken during the Attributes queries.

Thanks.

jstack.tar.gz (2.99 KB)

Thank you Dom, we’re checking the dumps.

Hi Dom,

It turned out that all those attributes queries relate to svn:ignore properties, and they take up most of the time spent on the import. It’s not completely clear why revision 181380 introduced such a high number of ignores; probably the properties had been set previously in the /buck-all/trunk directory, and it seems there were plenty of them.
I must note that this is the first time we have seen attributes translation take so much time. We will investigate this on our side and try to rework the code that handles attributes, and ignores in particular, but it will definitely take some time.
As a workaround, I would suggest switching off the translation of ignores with the translate.ignores setting:

[translate]
   ignores = false

This should speed up the translation several times. The drawback is that ignores will not be translated to .gitignore.

Thank you for finding this.

I will check whether setting this translate option will cause any issues for us, and I will also confirm whether it works around the problem.

Thanks Ildar,

That option has worked for us and the translation finished very quickly afterwards.

Thanks, and Kind Regards.

Dom

Hi Dom,
glad to hear it worked, thanks for letting us know!