Configure --layout auto takes forever and then runs out of heap space without generating a config

subgit configure --layout auto http://**** ****
SubGit version 3.3.3 (‘Bobique’) build #3877

Configuring writable Git mirror of remote Subversion repository:
Subversion repository URL : ****
Git repository location : ****

Peg location detected: r653809 trunk
Fetching SVN history… Done.
Growing trees… Done.
Project origin detected: r1 trunk
Building branches layouts…
layout 387502 of 653808 built | 59% [||||| ]
CONFIGURATION FAILED

error: Java heap space
error: Unexpected error has occurred; please report along with the logs (‘D:***\subgit-configure-20180612-020203.zip’)

Hello Carson

Memory usage depends on the SVN repository layout and history, so I think the best solution in this case is to increase the heap size. To do this, edit the SubGit launch script (\bin\subgit.bat) and add ‘-Xmx1024m’ to the ‘EXTRA_JVM_ARGUMENTS’ line:

set EXTRA_JVM_ARGUMENTS=-Djava.util.logging.config.file="%BASEDIR%\conf\logging.properties" -Dsun.io.useCanonCaches=false -Djava.awt.headless=true -Djna.nosys=true -Dsvnkit.http.methods=Digest,Basic,NTLM,Negotiate -Xmx1024m

It would also be worth adding the ‘--trunk PATH’ option to the ‘configure’ command to set the trunk path explicitly.

It’s also possible to set the mapping configuration manually. To do this, run ‘configure’ without ‘--layout auto’:

subgit configure --svn-url http://{redacted} <REPO_PATH>

Then edit <REPO_PATH>\subgit\config and set the ‘trunk’, ‘branches’ and ‘tags’ options to reflect the SVN repository layout. Let me know if you need any help creating the mapping configuration.
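
For illustration only, here is a sketch of what the mapping section of that file might look like. It assumes a single project with the standard trunk/branches/tags directories at the repository root, which is not confirmed for the repository in this thread. The SVN paths on the left-hand side are placeholders and need to be adjusted to the real layout.

```
[svn]
	# SVN path on the left, Git ref on the right (paths are assumed)
	trunk = trunk:refs/heads/master
	branches = branches/*:refs/heads/*
	tags = tags/*:refs/tags/*
```

A repository that holds many projects, each with its own branches and tags, would need different paths on the left-hand side.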

Hello again, Carson

I felt that I probably didn’t emphasize some of my points enough in my answer, so I’d like to do that now. The absence of the “--trunk” option in the command may lead to issues like this one, so I’d recommend adding it as the first step toward resolving the issue. The second point I’d like to emphasize is that auto-configuration is not the only way to create the config; it can be done manually, too. If you don’t intend to investigate the ‘configure’ problem and just want to proceed with the import/mirror, let me know your SVN repository layout and which branches/tags from SVN you want to get in Git, and I will develop a suitable configuration for you.

My two cents:

I’m trying a subgit configure on a similar repo (400000+ commits, with --trunk specified, I might add), and I’m seeing the same thing.

For me, giving SubGit more memory (I needed 12GB to get past the “Building branches layouts…” stage) did help.

With less memory, I could see in Java Mission Control that it would start to thrash the GC. Eventually it would throw an OOME or a “GC overhead limit exceeded” exception.

Hello Joachim

Thank you for reporting the issue! Would it be possible to provide us with the ‘configure’ and import/install logs from that repository?

Hi Ildar,

I would provide the logs, but the process is not yet finished.
It’s been “Generating SVN to Git mapping…” for two days now. The repository contains well over 100 projects, that each have 1…400+ branches and tags. I can see this takes a while. :-)

I’ll get back to you, and file an issue if it fails.

As an update, the configuration eventually did complete after specifying a memory limit.

@carsonlee.blizzard Any indication as to how long this took?
Mine is still “Generating SVN to Git mapping…”, almost a week now (but I did suspend my machine during the weekend).

Hello Joachim

Besides logs, there is another way to find out what’s happening and why the process takes so long: collecting thread dumps. Could you run:

 jstack -l PID > threads-X.txt

several times, e.g. once per second, to produce several thread dumps? The thread dumps may help if SubGit is spending most of its time on unexpected activities.
You can get the process ID (PID) using the ‘jps’ command. Both the ‘jstack’ and ‘jps’ commands are included in the JDK.
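
If it helps, the repeated runs could be scripted. The following is a minimal sketch for a POSIX shell, not something taken from the SubGit tooling: the function name `collect_dumps`, its arguments and the `DUMP_CMD` override are made up for this example, and it assumes `jstack` is on the PATH. On Windows the same loop would need a batch or PowerShell equivalent.

```shell
#!/bin/sh
# Sketch: write COUNT thread dumps of the JVM with the given PID to
# threads-1.txt, threads-2.txt, ... pausing INTERVAL seconds between dumps.
# DUMP_CMD is a hypothetical override for dry runs; it defaults to jstack.
collect_dumps() {
    pid=$1
    count=${2:-10}      # number of dumps to take (assumed default)
    interval=${3:-1}    # seconds between dumps (assumed default)
    i=1
    while [ "$i" -le "$count" ]; do
        ${DUMP_CMD:-jstack} -l "$pid" > "threads-$i.txt"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, `collect_dumps 12345 10` would take ten dumps one second apart, where 12345 stands for the PID reported by ‘jps’.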

I’m sorry, I gave up for now.
I could see in JProfiler that SubGit was performing a lot of IO during mapping, but only took 10% CPU or so.
I’ll try again in a couple of weeks using a ramdisk. That might make things a bit quicker.

Any other suggestions are welcome.

Hello Joachim

We are going to release the next SubGit version, 3.3.4, which contains some changes to this algorithm, so my suggestion is to try that version. I will let you know as soon as it is released.

Hi,

I’m trying again, using SubGit 3.3.4. It still takes a long time, but I can see CPU usage is anywhere between 30% and 100% consistently and memory usage is down significantly. I’ll let this run for a while and update this issue.

The OOME should be fixed at r3726 of SGK trunk.