Branch mapping with wildcard and excludeBranches not working

May I again ask you to attach the output of the ‘jstack’ command?

jstack -l $PROCESS_PID > threads-1.txt
# wait several seconds
jstack -l $PROCESS_PID > threads-2.txt
# wait several seconds
jstack -l $PROCESS_PID > threads-3.txt
...

You can obtain the SubGit PID ($PROCESS_PID) using the ‘jps’ command. Both the ‘jstack’ and ‘jps’ utilities are included in any JDK distribution.

By analyzing several thread dumps we can find out where most of the time is spent.

Could you also provide a [maybe obfuscated] example of

Checking path "some/path" for changes within layout

I mean an example of such a path and the timestamp difference between those “Checking…” messages. I’m trying to understand which of these scenarios takes place:

  • the path is actually inside the layout (i.e. matches some rule) but is incorrectly classified as “not inside”;
  • all the paths are correctly classified but the path analysis takes significant time because some SVN operations are performed (this should be fixed in 3.3.17-rc2 but there’s a chance that the fix was incorrect for some reason); in this case ‘jstack’ will show SVN operations and probably each “Checking…” would take significant time;
  • all the paths are correctly classified and no SVN operations are performed but the whole process is slow because the revision being processed has millions of changed paths; in this case ‘jstack’ will show no SVN operations and each “Checking…” would take little time, but because of the number of checks the total time is significant.
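To tell the second and third scenarios apart from the log alone, the time between consecutive “Checking…” messages can be measured. Below is a minimal sketch, assuming every log line starts with a "[YYYY-MM-DD HH:MM:SS.mmm]" timestamp as in the excerpts in this thread; the function name checking_gaps is hypothetical, and the midnight handling assumes two neighbouring messages are never more than a day apart.

```shell
# Print the time (in seconds) elapsed between each "Checking path" message
# and the previous one, followed by the log line itself.
# Assumption: lines start with "[YYYY-MM-DD HH:MM:SS.mmm]".
checking_gaps() {
    grep 'Checking path' "$1" | awk '
    {
        # $2 looks like "16:30:27.023][daemon][20]"; keep only the time part.
        split($2, a, "]")
        split(a[1], t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (NR > 1) {
            gap = now - prev
            # Assumption: at most one midnight between two neighbouring lines.
            if (gap < 0) gap += 86400
            printf "%.3f\t%s\n", gap, $0
        }
        prev = now
    }'
}
```

Something like `checking_gaps daemon.0.log | sort -rn | head` would then list the slowest checks first. Many large gaps point to the second scenario, while a very long list of tiny gaps points to the third.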

Also please make sure SubGit was upgraded to 3.3.17-rc2 correctly. You can do that by looking at

GIT_REPO/subgit/lib

directory and at the log daemon.0.log.

Because of the “socket output stream requested” line, I think that the second scenario is taking place, but that depends on the moment when the line appears. ‘jstack’ should give the answer.

I’d also add that there are 2 phases. Is it true that most of the time is still spent in the “preparation phase” even after upgrading to 3.3.17-rc2? A typical log splits into phases like the following:

# Preparation phase

[2022-12-13 16:30:27.003][daemon][20] Fetching file:///tmp/test up to revision=HEAD
[2022-12-13 16:30:27.007][daemon][20] Running fetch: baseRevision=3, targetRevision=4, latestFetchedRevision=3, minimalRevision=1, firstFetch=false
[2022-12-13 16:30:27.017][daemon][20] Getting log for [4,4]; limit=8193
[2022-12-13 16:30:27.023][daemon][20] Processing log entry for r4
[2022-12-13 16:30:27.023][daemon][20] Checking path "trunk/file" for changes within layout
[2022-12-13 16:30:27.023][daemon][20] It is inside the layout

# Getting data ("fetch") phase

[2022-12-13 16:30:27.024][daemon][20] SET_PATH '' 4 not empty depth=infinity
[2022-12-13 16:30:27.025][daemon][20] SET_PATH 'trunk' 3 not empty depth=infinity
[2022-12-13 16:30:27.026][daemon][20] svn: E140000: Can't read length line from file /tmp/test/db/format
[2022-12-13 16:30:27.031][daemon][20] root
[2022-12-13 16:30:27.032][daemon][20] change dir prop svn:entry:committed-rev = 4
[2022-12-13 16:30:27.032][daemon][20] change dir prop svn:entry:committed-date = 2022-12-13T15:30:08.851729Z
[2022-12-13 16:30:27.032][daemon][20] change dir prop svn:entry:last-author = dmit10
[2022-12-13 16:30:27.032][daemon][20] change dir prop svn:entry:uuid = 33542dd2-da12-424f-87fe-28ee309a287f
[2022-12-13 16:30:27.041][daemon][20] open dir trunk
[2022-12-13 16:30:27.048][daemon][20] Attributes loading started properties manager
[2022-12-13 16:30:27.049][daemon][20] Attributes loading finished properties manager
[2022-12-13 16:30:27.050][daemon][20] Attributes query started
[2022-12-13 16:30:27.050][daemon][20] Attributes query ended
[2022-12-13 16:30:27.051][daemon][20] Attributes query started
[2022-12-13 16:30:27.051][daemon][20] Attributes query ended
[2022-12-13 16:30:27.052][daemon][20] fetching: branch = refs/svn/root/trunk, revision = 4, receivedFileCount=0
[2022-12-13 16:30:27.053][daemon][20] change dir prop svn:entry:committed-rev = 4
[2022-12-13 16:30:27.053][daemon][20] change dir prop svn:entry:committed-date = 2022-12-13T15:30:08.851729Z
[2022-12-13 16:30:27.053][daemon][20] change dir prop svn:entry:last-author = dmit10
[2022-12-13 16:30:27.053][daemon][20] change dir prop svn:entry:uuid = 33542dd2-da12-424f-87fe-28ee309a287f
[2022-12-13 16:30:27.054][daemon][20] open file trunk/file
[2022-12-13 16:30:27.054][daemon][20] Attributes query started
[2022-12-13 16:30:27.054][daemon][20] Attributes query ended
[2022-12-13 16:30:27.056][daemon][20] Attributes query started
[2022-12-13 16:30:27.056][daemon][20] Attributes query ended
[2022-12-13 16:30:27.057][daemon][20] change file prop svn:entry:committed-rev = 4
[2022-12-13 16:30:27.057][daemon][20] change file prop svn:entry:committed-date = 2022-12-13T15:30:08.851729Z
[2022-12-13 16:30:27.057][daemon][20] change file prop svn:entry:last-author = dmit10
[2022-12-13 16:30:27.057][daemon][20] change file prop svn:entry:uuid = 33542dd2-da12-424f-87fe-28ee309a287f
[2022-12-13 16:30:27.058][daemon][20] apply delta trunk/file
[2022-12-13 16:30:27.060][daemon][20] delta chunk trunk/file
[2022-12-13 16:30:27.060][daemon][20] delta end trunk/file
[2022-12-13 16:30:27.060][daemon][20] close file trunk/file
[2022-12-13 16:30:27.060][daemon][20] successfully received 'trunk/file' with size=10
[2022-12-13 16:30:27.062][daemon][20] close dir

I have attached 3 thread dumps and the obfuscated log:

threads-1.txt (68.1 KB)
threads-2.txt (51.7 KB)
threads-3.txt (51.3 KB)
unrelated-changes-log-obfuscated.log (502.8 KB)

Somehow jstack did not redirect its output directly, but I saw the thread dump output where SubGit was running, so I just copied the information and put it into a file.

The output of checking subgit/lib:

subgit/lib# ls -la
total 69180
drwxrwsr-x 2 git git       95 Dec  6 08:17 .
drwxrwsrwx 7 git git      253 Dec 16 15:30 ..
-rw-rw-r-- 1 git git      159 Dec  6 08:17 .files
-rw-rw-r-- 1 git git      300 Dec  6 08:17 .version
-rw-rw-r-- 1 git git   112848 Dec  6 08:17 libjnidispatch.so
-rw-rw-r-- 1 git git 70716067 Dec  6 08:17 subgit-3.3.17_4455_fat.jar

And the daemon.0.log also shows the following:

...
[2022-12-16 15:30:13.387][daemon][12] Sent '(version (3.3.17 4455 ))'.
...

The first “successfully received” log message I can see is from 2022-12-12 08:26:12.735, so after 66 hours of processing:

...
[2022-12-12 08:26:12.735][subgit-install][1] successfully received
...

Thank you in advance

Hello Patrick,
thanks for the information, it was very helpful. The problem seems to be in a completely different place than I expected; without ‘jstack’ it would have been impossible to identify it.

I would start with the fact that SubGit mostly doesn’t allow one to change the trunk/branches/tags/shelves/excludeBranches options on the fly, with some rare exceptions. This is because after such a change the already translated part of the SVN repository could contradict the config. For instance, imagine a situation where I have ‘branches/branch’ and ‘trunk’ in SVN. ‘branches/branch’ was created from ‘trunk’ and then merged back into it. In Git this would result in a merge commit. Now suppose that I’ve removed the ‘branches=’ option or (equivalently in this case) added the ‘excludeBranches=branches/*’ option. Then the Git commits corresponding to ‘branches/branch’ should cease to exist, and this would affect ‘trunk’ via the merge commit.

To prevent such problems we decided to forbid adding or removing branches on the fly. There’s a notable exception. Suppose you want to introduce ‘releases/*’ to your SVN structure. Then you can add the ‘branches=releases/*:refs/releases/*’ rule to SubGit, run “subgit install” to apply the change, and only then add ‘releases/*’ to your SVN. This is allowed.

To distinguish between the two cases, “subgit install” checks whether it is safe to change the trunk/branches/tags/excludeBranches rules. To do that, it scans the SVN history to check whether the newly added or removed rule could apply to any branch (or trunk, or tag) that ever existed in SVN.

And this check is what takes so much time in your case.

There are some work-arounds:

  1. If you know for sure that none of the branches translated to Git so far match the rule you add or remove, you can trick SubGit into skipping the check. For instance, if you add ‘excludeBranches=tags/nightly-build/2014*/*/product’, you must be sure that no tag matching the pattern tags/nightly-build/2014*/*/product has been translated to Git so far. If this is so, edit the GIT_REPO/subgit/.run/config file and add the same 'excludeBranches=' option there as well.

How does this work? At every moment the GIT_REPO/subgit/.run/config file contains the config actually used by SubGit. Whenever you change GIT_REPO/subgit/config, SubGit compares it to GIT_REPO/subgit/.run/config, finds the changed options, and checks whether these options are allowed to change (this is the step that takes too much time in your case). If everything is OK, it applies the changes to GIT_REPO/subgit/.run/config. By applying the changes manually, you bypass the checks at the risk of creating a potentially unsafe situation.
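Before editing anything by hand, it may help to see which options currently differ between the two files. Here is a minimal sketch using plain ‘diff’; the function name show_pending_config_changes is hypothetical and its argument is the path to the Git repository (GIT_REPO above).

```shell
# Show the options that differ between the config you edit and the
# config SubGit is actually using.
show_pending_config_changes() {
    repo=$1
    # Lines prefixed with "<" exist only in subgit/config (your edits),
    # lines prefixed with ">" exist only in subgit/.run/config.
    # Note: diff exits with status 1 when the files differ.
    diff "$repo/subgit/config" "$repo/subgit/.run/config"
}
```

For example, `show_pending_config_changes GIT_REPO` printing a single ‘<’ line with the new 'excludeBranches=' option would mean that this is the only change SubGit has not applied yet.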

  2. If some of the excluded tags are already translated to Git, you can find the revision number where these tags appeared for the first time. Then subtract 1 and run

subgit install --rebuild-from-revision REV_MINUS_ONE GIT_REPO

E.g. if you add an exclude for tags/nightly-build/2014*/*/product and the first tag matching this rule was created in SVN at r12345, then you can subtract one (12345 - 1 = 12344) and run

subgit install --rebuild-from-revision 12344 GIT_REPO

Note that unlike plain “subgit install”, the --rebuild-from-revision option does not check whether the config can be changed in that way; it just silently applies all changes from GIT_REPO/subgit/config to GIT_REPO/subgit/.run/config. So use it with care. The option also reverts the current state to the specified revision and re-fetches everything with the new config. I must note that this revert may itself take a long time if the new config still covers many branches and tags (it lists all of them and reverts each one to its older state). So I’m not sure that it would be better for you in the current situation.
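The revision arithmetic from the second work-around can be sketched as follows. This is only an illustration: rebuild_command is a hypothetical helper that prints the command instead of running it, so the revision can be double-checked first.

```shell
# Build the rebuild command from the revision where the first matching
# tag appeared. Nothing is executed; the command is only printed.
rebuild_command() {
    first_rev=$1   # revision where the first matching tag was created
    repo=$2        # path to the Git repository (GIT_REPO in this thread)
    echo "subgit install --rebuild-from-revision $((first_rev - 1)) $repo"
}

rebuild_command 12345 GIT_REPO
# prints: subgit install --rebuild-from-revision 12344 GIT_REPO
```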

Normally, one can’t change trunk/branches/tags/excludeBranches on the fly, which is why I can only propose work-arounds.