Reduce repository size

We’re currently evaluating SubGit, and it looks really promising so far!

One issue we stumbled upon is the repository size: After importing our SVN repository (approx. 10’000 commits), the Git repository size is around 3 GB. For comparison, the same repository imported using git svn was less than 100 MB.

Is there any way to reduce the repository size? As it is, cloning is painfully slow.

What we tried:

In a local repository, we were able to get the size down to below 100 MB using git repack:

$ du -h -d 1 .git
2.8G	.git/objects
4.0K	.git/info
232K	.git/logs
 44K	.git/hooks
228K	.git/refs
  0B	.git/branches
2.8G	.git

$ git repack -A -d -F
Counting objects: 142310, done.
Delta compression using up to 24 threads.
Compressing objects: 100% (132481/132481), done.
Writing objects: 100% (142310/142310), done.
Total 142310 (delta 94702), reused 0 (delta 0)

$ du -h -d 1 .git
 58M	.git/objects
 16K	.git/info
232K	.git/logs
 44K	.git/hooks
228K	.git/refs
  0B	.git/branches
 59M	.git

However, doing the same in the server-side repository (GitLab) had no effect, although git repack appeared to succeed.
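To repeat the before/after comparison on any copy of the repository, for example directly on the server, a small sketch like the following may help. It sums the loose and packed object sizes reported by `git count-objects -v`, so it measures only the object store rather than the whole .git directory. It assumes Git 1.8.5 or later (for `git -C`); `object_store_kib` and `full_repack` are hypothetical helper names, not part of Git or SubGit.

```shell
#!/bin/sh
# Sketch: measure a repository's object store and rebuild its packs.
# object_store_kib and full_repack are hypothetical helper names.

# Print the object store size in KiB: loose objects ("size") plus
# packed objects ("size-pack"), as reported by `git count-objects -v`.
object_store_kib() {
  git -C "$1" count-objects -v |
    awk '/^size:/ {s += $2} /^size-pack:/ {s += $2} END {print s + 0}'
}

# Full repack with the same options as in the post:
#   -A  pack all reachable objects into one pack; unreachable objects
#       from old packs are kept as loose objects instead of being dropped
#   -d  remove packs and loose objects made redundant by the new pack
#   -F  do not reuse existing packed data, so deltas are recomputed
#   -q  suppress progress output
full_repack() {
  git -C "$1" repack -A -d -F -q
}
```

Usage, with REPO as a placeholder path: `before=$(object_store_kib REPO); full_repack REPO; after=$(object_store_kib REPO)`.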

Any hints are greatly appreciated.

Hi Daniel,

thank you for reaching out to us with this issue!

This looks pretty unusual; we have never encountered such an issue before, as a Git repository generally does not grow in size during the import. It might be specific to this particular SVN repository or to the configuration used for the import, so it is worth investigating further. Could you share all SubGit logs from the affected repository? It would also be great to have the logs and import configuration of the ‘git svn’ attempt, if possible. Also, could you please let us know whether you have run git gc on the repository?

Thanks for your quick reply!

I attached the SubGit logs. Note that I ran subgit shutdown before doing the git repack in the repository.

Unfortunately, I can’t find any logs from the git-svn attempt. It was done using

git svn clone --stdlayout

However, I could repeat the command and send you the output if that would help.

No, we did not run git gc. git repack was the only thing we did so far, after noting the high repository size.

subgit-logs.zip (2.8 MB)

Meanwhile we were able to narrow it down to the core.bigFileThreshold setting:

$ cat 83f814f7a92e365cbd79f9addceed185761a8d38a06a2d4350bb1fe4b7632b34.git/config 
[core]
	repositoryformatversion = 0
	filemode = true
	bare = true
	bigFileThreshold = 1m
	logallrefupdates = true
	autocrlf = false
	eol = lf
	symlinks = true
[gitlab]
	fullpath = indel/inos2
[gc]
	autodetach = false
	auto = 0
[svn]
	pathnameencoding = UTF-8

If we change the 1m here to the default value of 512m and do git repack -A -d -F, the repository becomes conveniently small.
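Some background from the Git documentation: files larger than core.bigFileThreshold are stored deflated, without attempting delta compression, and the default is 512 MiB. With a 1m threshold, every revision of a file above 1 MB is therefore stored in full, which would plausibly account for the 3 GB. It most likely also explains why the repack on GitLab changed nothing: the server-side repository still had bigFileThreshold = 1m in its config, while the local copy (presumably a clone, which does not copy the server’s config) used Git’s default. A minimal sketch of the fix described above, with `restore_default_threshold` as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: restore Git's default delta threshold in an existing repository
# and rebuild all packs so that large files can be delta-compressed.
# restore_default_threshold is a hypothetical helper name.

restore_default_threshold() {
  repo=$1
  # Files above core.bigFileThreshold are never delta-compressed.
  # 512m is Git's default value; unsetting the key has the same effect.
  git -C "$repo" config core.bigFileThreshold 512m
  # -F: do not reuse the existing non-delta copies; recompute deltas.
  git -C "$repo" repack -A -d -F -q
}
```

Run it against the bare repository on the server, e.g. `restore_default_threshold /path/to/repo.git` (the path is a placeholder).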

As I understand from SGT-604, SubGit sets this value during the configure step, so as a workaround we could probably reset it between subgit configure and subgit install:

git config --unset core.bigFileThreshold
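The workaround above could be scripted roughly as follows. This is a sketch under assumptions: the `subgit configure --svn-url SVN_URL GIT_REPO` and `subgit install GIT_REPO` invocations follow the usual SubGit command-line form, but should be checked against the documentation for your SubGit version; `configure_and_install`, the SVN URL and the repository path are placeholders. According to the last reply in this thread, SubGit 3.3.11 and later should not need this step.

```shell
#!/bin/sh
# Sketch: drop the core.bigFileThreshold value SubGit writes, between
# `subgit configure` and `subgit install`.
# configure_and_install is a hypothetical helper; check the SubGit
# documentation for the exact options your version's commands accept.

configure_and_install() {
  svn_url=$1
  git_repo=$2

  subgit configure --svn-url "$svn_url" "$git_repo" || return 1

  # Only unset the key if it is present: `git config --unset` exits
  # with a non-zero status when the key does not exist.
  if git --git-dir="$git_repo" config --get core.bigFileThreshold >/dev/null; then
    git --git-dir="$git_repo" config --unset core.bigFileThreshold || return 1
  fi

  subgit install "$git_repo"
}
```

Usage, with placeholder arguments: `configure_and_install https://svn.example.com/repos/project project.git`.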

Do you have any background info on why SubGit sets this value?
Is there another way to configure this option through SubGit?

Hi Daniel,

thank you for sharing your findings!
I hadn’t yet got to the point of digging in that direction.
As for the setting: yes, it is set by SubGit, but I’m not sure why it overrides the default. I will discuss it with the developers.

Hi Daniel,

I’d like to let you know that we have just released SubGit 3.3.11, which includes fixes for this issue, so it should no longer occur with the new version.