Git grep performance on large mirrored repository

Several users have noticed that git grep runs much slower on a large repository synced with SubGit. For instance, on the same SVN project, git grep takes about 5 seconds on the git-svn mirror but almost 3 minutes on the SubGit mirror!

git-svn mirror without any files excluded:

> time git grep 'hithere'
real    0m5.666s
user    0m12.555s
sys     0m15.041s
> du -sh .
21G     .
> du -sh .git
4.2G    .git

subgit mirror of same svn repo with many large directories excluded:

> time git grep 'hithere'
real    2m52.859s
user    2m52.920s
sys     0m3.251s
> du -sh .
1.8G    .
> du -sh .git
684M    .git

That’s a 30x slowdown! When I run ‘git cat-file -p master^{tree}’ in each mirror, the top-level directory tree objects appear to have the same hash. I’m at a bit of a loss. What could be causing this issue?
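As a rough sanity check on the figures above, the sketch below converts the `time` output into seconds and computes two ratios: the wall-clock slowdown, and the CPU time consumed per second of wall-clock time. The helper names (`parse_time`, `cpu_per_wall`) are made up for illustration; the numbers are copied from the two runs in this post.

```python
import re

def parse_time(value: str) -> float:
    """Convert a shell `time` field such as '2m52.859s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Timings copied verbatim from the two `time git grep` runs above.
runs = {
    "git-svn": {"real": "0m5.666s", "user": "0m12.555s", "sys": "0m15.041s"},
    "subgit": {"real": "2m52.859s", "user": "2m52.920s", "sys": "0m3.251s"},
}

def cpu_per_wall(run: dict) -> float:
    """CPU seconds (user + sys) spent per second of wall-clock time."""
    cpu = parse_time(run["user"]) + parse_time(run["sys"])
    return cpu / parse_time(run["real"])

# Wall-clock slowdown of the SubGit mirror relative to the git-svn mirror.
slowdown = parse_time(runs["subgit"]["real"]) / parse_time(runs["git-svn"]["real"])

print(f"slowdown: {slowdown:.1f}x")
for name, run in runs.items():
    print(f"{name}: {cpu_per_wall(run):.2f} CPU seconds per wall second")
```

The slowdown comes out at about 30.5x. The second ratio is roughly 4.9 for the git-svn mirror and roughly 1.0 for the SubGit mirror, which would be consistent with (though it does not prove) the slow run being CPU-bound on a single core rather than waiting on disk.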

It seems the problem is the 82,000-line .gitattributes file. Is that file strictly necessary? Can it be removed somehow?

Hi Jonah,

SubGit synchronizes SVN properties with Git and stores them in the .gitattributes file. If many properties are set, the file can indeed grow large. This synchronization can be turned off if needed. Two SubGit settings control this behavior: translate.eols and translate.otherProperties. To stop synchronizing properties with Git, set both to false by adding a [translate] section to the SubGit configuration file:

[translate]
    eols = false
    otherProperties = false

and then run the 'subgit install' command against the repository to apply those settings.
This stops the synchronization between SVN properties and the .gitattributes file, so the file can safely be removed afterwards.

Hi Ildar,

Subgit doesn’t appear to want to let me disable eols:

SubGit version 3.3.9 ('Bobique') build #4351
...
INSTALLATION FAILED

error: Unable to activate configuration file '/var/opt/gitlab/git-data/repositories/@hashed/4b/22/4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a.git/subgit/config':
error: 'translate.eols' option can't be changed on the fly.
error: Revert the changes and rerun the command, or force rebuild with '--rebuild' option.

If I disable only otherProperties, will SubGit automatically remove those files’ entries from .gitattributes, or do I have to do it manually? Is there any workaround to disable eols without rebuilding?

Hi Jonah,

I forgot to mention that; my apologies. The ‘eols’ setting cannot be changed on the fly because it affects file contents and therefore commit hashes, so it can only be set before the initial import and can only be changed with a full repository rebuild. I’m afraid there is no way to change it without a rebuild.
The .gitattributes file itself won’t be removed when the settings are ‘false’; the settings only stop the synchronization of properties into the file. The file will still be there and will be treated as a regular file.

Hi Ildar,

I’m not sure we want to disable the eol property anyway. So is there anything we can do to reduce the size of the file? Most of the lines look like this:

* text=auto !eol
path/to/file/1 -text
path/to/file/2 -text svneol=unset#text/plain
path/to/file/3 eol=lf

The file/3 rule is caused by the svn:eol-style property in SVN.
The file/2 rule is caused by the svn:mime-type property in SVN.

The majority of the rules (~70k) are of the file/1 variety. What causes these rules to appear? Is there anything we can do on the SVN side to get rid of them?
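To put exact numbers on a breakdown like this, a small script can group the rules in .gitattributes by their attribute list. The following is only a sketch: `tally_rules` is a made-up helper, and it assumes each rule is a single line whose pattern contains no whitespace (quoted patterns would need extra handling). The sample input is the four example lines quoted above.

```python
from collections import Counter

def tally_rules(lines):
    """Count .gitattributes rules, grouped by their attribute list.

    Assumption: the pattern is the first whitespace-separated token,
    so patterns containing spaces are not handled correctly.
    """
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        # Drop the pattern; keep only the attributes that follow it.
        _pattern, *attrs = stripped.split()
        counts[" ".join(attrs)] += 1
    return counts

# The four example lines from this post; for a real repository, pass the
# lines of the .gitattributes file instead.
sample = [
    "* text=auto !eol",
    "path/to/file/1 -text",
    "path/to/file/2 -text svneol=unset#text/plain",
    "path/to/file/3 eol=lf",
]

counts = tally_rules(sample)
for attrs, n in counts.most_common():
    print(f"{n:6d}  {attrs}")
```

On the real file, the `-text` group should account for roughly the 70k lines mentioned above, with the remaining groups showing how many rules come from svn:eol-style and svn:mime-type.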

Hi Jonah,

The file is populated from information translated from SVN properties, so to shrink it either the properties must not be translated (that is, translate.eols is false) or there must be fewer properties in SVN. One option is to set the svn:eol-style property to “native” on a file in SVN, which removes that file’s line from .gitattributes; otherwise, the eols translation has to be switched off.

Is there no way to configure SubGit to use a different default rule, such as

* -text

so that SubGit only generates rules for files that deviate from that default? Either ignoring the eol property or changing it for so many files could be very disruptive.

I’m afraid not; SubGit does not have such a feature at the moment.

Ok thanks for clarifying the available options!