Subgit is stuck

anton.kiryushkin · May 19, 2022, 2:53pm

Hello dear developers.

We have a problem with Subgit. Two problems.

Last week, we noticed that the Subgit process could silently be stuck or wrote a message like:
java.lang.OutOfMemoryError: Java heap space
We use SubGit version 3.3.11 ('Bobique') build #4408. We haven’t changed anything. Our repository hasn’t been dramatically grown. However, it is huge.
Do we have an opportunity to check if the process is alive? What was the last translated commit according to the Subgit log or status? This process is critical for us, and I want to know when it is broken as soon as possible.

Thank you.

ildar.khusainov · May 19, 2022, 3:55pm

Hello Anton,

thank you for reaching out to us on that matter!

SubGit is written in Java and as any other Java program it may suffer from the memory starvation if the amount of data it handles is too big and that is the exact reason for those OutOfMemory errors. By default, SubGit does not set memory size explicitly that means that the upper limit of memory is the default value for the given platform which not always big enough. This default value can be checked out with the following command:

 java -XX:+PrintFlagsFinal -version | grep MaxHeapSize

or on Windows:

 java -XX:+PrintFlagsFinal -version | findstr MaxHeapSize

It will print the maximum heap size in bytes; and if the value is small, it may worth setting higher memory limits by setting the Xmx JVM parameter. It can be set right in the SubGit startup script which reside in bin subdirectory in SubGit installation directory. There is a EXTRA_JVM_ARGUMENTS line in the script and the Xmx parameter can be added right in the end of that line like this:

EXTRA_JVM_ARGUMENTS="-Dsun.io.useCanonCaches=false -Djava.awt.headless=true -Djna.nosys=true -Dsvnkit.http.methods=Digest,Basic,NTLM,Negotiate -Dsvnkit.ssh.client=apache -Xmx4G"

As mentioned, the root cause of this issue usually is that SubGit tries to put all the data in memory during a revision translation and in some cases the data amount is too high. It usually does not happen when SubGit is configured to use svn, ssh, or file protocol to access the SVN repository, just because of those protocols using specifics, but often happens when the SVN repository is being accessed over http, especially with default SubGit setting – in this case SubGit tries to put all the revision data in RAM. However, it’s possible to change this behaviour by setting svn.httpSpooling configuration option in SubGit configuration file to true (default is false):

[svn]
    url = …
    …
    httpSpooling = true

Don’t forget to run subgit install <REPO_PATH> after changing the setting to apply the changes.
When it’s set like this, SubGit stores the revision data in a disk cache instead of RAM thus reducing memory consumption. It may lead to the disk space issues, on the other hand, but helps overcoming the OOMEs errors.

From the SVN side those errors may happen not only due to drastic repository grow but also due to erroneous or not correct SVN operations. A very often example is copying repository’s root instead of a particular folder – like, in a try to create a new branch one copies the repository root instead of the trunk. Such an action does not add much data in the SVN repository, but it’s a big difference during the translation to Git since SubGit is forced to copy the whole repository data to reflect that action and this, of course, can lead to those OOME errors.

As for the process checking – I’m afraid there are no embedded facilities for that, but it’s possible whit a regular operating system’s tools and features. When a new commit come to a mirrored SubGit repository or when subgit install or subgit fetch commands are invoked, SubGit starts a daemon process which is visible in the list of system processes and thus can be monitored. The latest translated commit can be found in the daemon log or, for the Git-originated commit also in the pre-receive log, but it’s also present in the daemon log, so probably it’s betted to check the daemon log all the time.

Hope the above will help.

anton.kiryushkin · May 19, 2022, 6:22pm

In our case the problem is that Subgit process don’t write anything in it logfiles:
/subgit/logs$ ls -lt | head
total 421644
-rw-r–r-- 1 user user 273186 2022-05-19 10:35 subgit-fetch.0.log
-rw-r–r-- 1 user user 70406 2022-05-19 07:39 subgit-install-20220519-073926.zip
-rw-r–r-- 1 user user 0 2022-05-19 06:53 daemon.0.log.lck
-rw-r–r-- 1 root root 2855588 2022-05-19 05:31 daemon.0.log
-rw-r–r-- 1 root root 20971551 2022-05-18 13:45 daemon.1.log
-rw-r–r-- 1 user user 20971569 2022-05-17 20:36 daemon.2.log
-rw-r–r-- 1 root root 94374 2022-05-17 08:03 subgit-install-20220517-080320.zip
-rw-r–r-- 1 root root 2804 2022-05-17 08:02 subgit-shutdown-20220517-080204.zip
-rw-r–r-- 1 user user 111995 2022-05-16 07:40 test-hook.0.log

subgit/logs$ date
Thu May 19 11:19:47 PDT 2022

However, the process is running:
user 7880 1 99 07:39 pts/3 03:53:03 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -noverify -Djava.awt.headless=true -Djna.nosys=true -Xmx16g -cp /mnt/gerrit/review_site/git/manage.git/subgit/lib/subgit-3.3.14_4433_fat.jar org.tmatesoft.translator.SubGitDaemon --svn /mnt/gerrit/review_site/git/manage.git --limit 1652971766321

The process is running but looks stuck and doing nothing.

anton.kiryushkin · May 19, 2022, 9:02pm

I also tried to update Subgit to the latest available version 3.3.14. With this version I see quite strange messages in the install logfile:

[2022-05-19 13:58:13.066][subgit-install][1] Executing daemon shutdown request, the reason is UNKNOWN.
[2022-05-19 13:58:13.069][subgit-install][1] svn: E210004: Handshake failed, data stream ended unexpectedly
org.tmatesoft.translator.util.f: svn: E210004: Handshake failed, data stream ended unexpectedly
	at org.tmatesoft.translator.util.f.c(SourceFile:109)
	at org.tmatesoft.translator.util.f.b(SourceFile:75)
	at org.tmatesoft.translator.k.aj.a(SourceFile:2243)
	at org.tmatesoft.translator.k.aj.a(SourceFile:2124)
	at org.tmatesoft.translator.k.aj.a(SourceFile:749)
	at org.tmatesoft.translator.k.aj.a(SourceFile:650)
	at org.tmatesoft.translator.k.d.g.a(SourceFile:241)
	at org.tmatesoft.translator.k.d.g.d(SourceFile:119)
	at org.tmatesoft.translator.a.O.b(SourceFile:55)
	at org.tmatesoft.translator.a.t.c(SourceFile:23)
	at org.tmatesoft.translator.b.a(SourceFile:193)
	at org.tmatesoft.translator.b.a(SourceFile:144)
	at org.tmatesoft.translator.b.a(SourceFile:54)
	at org.tmatesoft.translator.process.f.d(SourceFile:136)
	at org.tmatesoft.translator.process.f.c(SourceFile:106)
	at org.tmatesoft.translator.b.a(SourceFile:79)
	at org.tmatesoft.translator.b.a(SourceFile:70)
	at org.tmatesoft.translator.SubGit.main(SourceFile:10)
Caused by: com.syntevo.svngitkit.core.b.i: svn: E210004: Handshake failed, data stream ended unexpectedly
	at com.syntevo.svngitkit.core.b.i.a(SourceFile:29)
	at com.syntevo.svngitkit.core.c.l.a(SourceFile:890)
	at com.syntevo.svngitkit.core.c.l.a(SourceFile:649)
	at com.syntevo.svngitkit.core.c.l.a(SourceFile:601)
	at com.syntevo.svngitkit.core.c.l.a(SourceFile:249)
	at com.syntevo.svngitkit.core.c.l.a(SourceFile:189)
	at com.syntevo.svngitkit.core.c.O.c(SourceFile:43)
	at org.tmatesoft.translator.j.b.b.a(SourceFile:387)
	at org.tmatesoft.translator.j.b.b.a(SourceFile:300)
	at org.tmatesoft.translator.j.b.b.a(SourceFile:184)
	at org.tmatesoft.translator.j.F.a(SourceFile:383)
	at com.syntevo.svngitkit.core.c.O.c(SourceFile:43)
	at com.syntevo.svngitkit.core.c.O.b(SourceFile:36)
	at org.tmatesoft.translator.k.aj.a(SourceFile:2227)
	... 15 more
Caused by: org.tmatesoft.svn.core.SVNException: svn: E210004: Handshake failed, data stream ended unexpectedly
	at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:70)
	at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:57)
	at org.tmatesoft.svn.core.internal.io.svn.SVNConnection.skipLeadingGrabage(SVNConnection.java:139)
	at org.tmatesoft.svn.core.internal.io.svn.SVNConnection.handshake(SVNConnection.java:148)
	at org.tmatesoft.svn.core.internal.io.svn.SVNConnection.open(SVNConnection.java:83)
	at org.tmatesoft.svn.core.internal.io.svn.SVNRepositoryImpl.openConnection(SVNRepositoryImpl.java:1282)
	at org.tmatesoft.svn.core.internal.io.svn.SVNRepositoryImpl.hasCapability(SVNRepositoryImpl.java:1590)
	at org.tmatesoft.translator.j.b.d.b(SourceFile:1205)
	at org.tmatesoft.translator.j.b.d.a(SourceFile:1173)
	at org.tmatesoft.translator.j.b.d.d(SourceFile:1230)
	at com.syntevo.svngitkit.core.c.l.a(SourceFile:866)
	... 27 more

However, the SVN server is perfectly available. Does it is possible not to fail the process but do a re-try?

ildar.khusainov · May 20, 2022, 4:10am

Hello Anton,

one thing that caught my eyes is that the files belong to different users, user and root, probably because SubGit was started on behalf of different users at some time. This may lead to a situation when SubGit is unable to access/write files, so I would recommend stopping the daemon:

subgit shutdown /mnt/gerrit/review_site/git/manage.git

adjusting files ownership in the repository:

chown -R user /mnt/gerrit/review_site/git/manage.git

and starting the daemon again running the following command on behalf of user:

subgit install /mnt/gerrit/review_site/git/manage.git

The E210004 error conditions are not clear enough, looks like SubGit experiences issues during the connection, but the cause is not clear. May I ask you try to run the install once again after adjusting the file permissions and, if the error persists, sending us the install, daemon, and hooks logs?

anton.kiryushkin · May 20, 2022, 3:32pm

Hi Ildar. Thank you for your answer.

The traces that I had sent here yesterday disappeared after the daemon finally started.
The traces were in the install log on each svn revision. It means that I should restart the process after each commit that was transferred to SVN. However, after Subgit started, it has been working fine since midnight.

I have a question here:
How to start Subgit in this case? We have a lot of commits that should be transferred to SVN;

Since the process is working now, I am not able to restart it because it would cause some problems with our working circle.

ildar.khusainov · May 20, 2022, 4:01pm

Hi Anton,

do I understand correctly that for now all the commits have been translated to SVN, SubGit daemon have started and is running now and the mirror is working? It’s not clear to the very end how had it happened that there were many commits on the Git side (as I understood) since every commit pushed to a mirrored SubGit repository should normally start the daemon automatically, it’s strange you had to do it manually.
Anyway, regarding your question on how to start SubGit in this case – I’m afraid we don’t have an answer here, just because those errors are not supposed to happen during translation and SubGit is not designed to overcome such and error condition. The Handshake failed, data stream ended unexpectedly error definitely means something went wrong during the connection, it must be investigated and the error conditions eliminated, but I’m afraid the exception you sent is not enough for the troubleshooting, that’s why I asked for the logs hoping to get more information and find the root cause. The install logs normally are not cleared out automatically, is that any chance they still present somewhere? They may as well reside in a directory where you started subgit install command. The daemon and hooks logs are also of a particular interest here, would appreciate you sent them all, too.

anton.kiryushkin · May 20, 2022, 5:31pm

Hi Ildar.

Where could I send the logs privately? I don’t want to share it here.

ildar.khusainov · May 21, 2022, 7:34am

Hi Anton,

I’m afraid we don’t have a resource where it could be possible to upload logs privately. Could it be possible to share them using some cloud service or sending them over email if they are not too big?

anton.kiryushkin · May 23, 2022, 6:01am

Hi Ildar.

Yes, email would be fine.

ildar.khusainov · May 23, 2022, 6:02am

Hi Anton,

the files can be sent to suport@subgit.com email address then.

anton.kiryushkin · May 24, 2022, 12:14pm

I got an automatic answer from your mail server that this address does not exist.

ildar.khusainov · May 24, 2022, 12:19pm

My apologies, Anton, it looks like I sent incorrect address, the right one is support@subgit.com

anton.kiryushkin · May 24, 2022, 12:37pm

Yes, I recognise the mistake. I’ve sent our logs via email.

ildar.khusainov · May 25, 2022, 9:14am

Hello Anton,

thank you for the files, we have received them.

Judging from the logs, the install process is being interrupted by two main reasons: a file permissions error like java.nio.file.AccessDeniedException: /mnt/gerrit/review_site/git/manage.git/refs/subgit/unsynced/heads/master/56.lock and a communication problem com.syntevo.svngitkit.core.b.i: svn: E210004: Handshake failed, data stream ended unexpectedly. The first one is most probably caused by wrong files ownerships, as we discussed above, so it would worth changing the files ownerships and permissions in the repository.
As for the second issue – it looks to be coming from the SVN server side, such an error usually appears when the server closes/breaks the connection unexpectedly. However, it’s not clear from the logs why exactly the SVN server did that, so it would probably worth checking logs on the SVN side to find out the cause.
As for the daemon issue – it looks to be caused by the OOME errors. The daemon does not look to hang, it fails with out of memory error and restarts the process over and over again, but every attempt fails with the OOME. It looks that heap size has already been increased and almost all relevant SubGit configuration options have already been set, but one that allows SubGit to use Git binary to stream data that can significantly decrease memory and disk space consumption. Here is how it can be set:

[core]
    ...
    gitPath = /path/to/git/binary

As usual, subgit install must be invoked to apply the changes. I suggest applying this change and check out if it eliminates the OOMEs/GC overhead limit errors during the installation and daemon operations.

anton.kiryushkin · May 25, 2022, 11:47am

Hi Ildar.

Thank you for your answer.
I’ve added the option that you suggested in the configuration file. Will see if it helps.

However, I don’t understand one moment. Those failures that you found appear only during the installation stage after each new SVN revision. It works perfectly as soon as the Subgit finishes the installation and demonizes himself.

ildar.khusainov · May 25, 2022, 1:33pm

Hi Anton,

the errors that I found in the install logs didn’t happen during all the install attempts, some of them eventually finished with success starting the daemon; but the fact the errors arose indicates that some problems are present, at least sporadically, and they affect the install process.
As for the daemon – it does not seem to be working perfectly, the log shows it fails with OOME over and over again trying to get SVN log.

anton.kiryushkin · June 13, 2022, 2:03pm

Hi Ildar.

We increased the memory limit to 32g, and the process seems more stable. However, since we increased the limit at least once, we got the same situation as we had with a lower memory limit.

Could you please tell is there any way to understand why the process consumes so much memory? Probably, we could make a memory dump of the process and provide it to you for future investigation?

ildar.khusainov · June 14, 2022, 8:06am

Hi Anton,

I’m afraid there’s no other way but to dig into the situation and analyse the state and the data. Memory dump would be very helpful, indeed, but it’s not all we would need, all the SubGit logs are required, too, and also it would be great to have part of the SVN repository log that would include svn log for the revision SubGit goes OOM on, and, like, 100 revisions before and after this revision. If the OOME happens on a new Git commit translation, then it would be great to have similar part of Git log.

anton.kiryushkin · June 14, 2022, 11:16pm

Hi Ildar.

Just to specify, do you mean a result of git log -100 and svn log -l 100 commands?