Storing binary files

I have a self-hosted GitLab server that is actively being used to build software from multiple source code repositories. There is a new team starting that would prefer to use Subversion as they are unfamiliar with Git. I’m considering using SubGit to allow them to work with a Subversion repo but mirror it as a Git repo to integrate it with GitLab’s built-in CI/CD features. My concern is that this team will be committing binary files, which will rapidly increase the size of the Git repo. I read in your support forum that LFS is not supported but is it possible to use a hook to store these files in an artifact repository such as Nexus or Maven if the assumption is that commits will only be made to the Subversion repo?

Hello!

Indeed, SubGit does not support LFS. Hooks, on the other hand, are fully supported; SubGit itself relies on hooks for part of its functionality, so Git hooks can certainly be used to add features. Honestly, I haven’t fully understood your idea for handling those big files with hooks. Could you please describe it in a little more detail?

Hi, thanks for the quick response! I want to avoid a scenario where a Git repo grows to tens of gigabytes in size and becomes unmanageable. I suppose I could exclude the binary files from Git, but I was hoping to find a place to store everything, for completeness. Since source code repositories are not the ideal place for archives or executables, I was trying to think of an alternative. For example, when a developer commits to Subversion, a text file could get synchronized with a particular Git repo via SubGit, but a dll could be stored somewhere else by a script, for instance on a Nexus server. I don’t anticipate anyone committing directly to Git, so there should be no need to synchronize files in the other direction. Basically, I already have multiple workloads running in GitLab, but now a new group of developers has joined and I’m trying to determine if there’s a way to let them use their preferred tool, Subversion, while taking advantage of the existing CI/CD infrastructure, without generating a huge 50 GB Git repo as a side effect. I think that most of the binary files in question will be Word documents, archives, etc. that aren’t related to CI/CD (and therefore I could add them to excludePath), but if the team ends up fully migrating to Git in the future, I would like all the data to be present.

Hi!

Frankly, I’m still not sure what exactly the script is supposed to do, but let me try to describe the possible situations.
The main point about a SubGit-driven mirror is that the SVN and Git sides must contain exactly the same data. SubGit does not support setups where a file is present on one side but replaced with something else on the other; in such a situation, either the file or the “something else” will be synchronized so that both sides are the same. That is why LFS would not work: once it replaces a big file in the Git repository with a pointer file, that pointer file is sent to SVN.
Thus, if a binary file is committed to the SVN repository, it must appear in the Git repository too. I’m afraid SubGit has no feature that would allow sending those files somewhere else, and there is no way to change this with a script.
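To make the LFS point concrete: a Git LFS pointer is a small text file with `version`, `oid` and `size` lines, and that text is what the mirror would carry over to SVN in place of the binary. A rough Python sketch that builds such a pointer (the function name `lfs_pointer` is made up for illustration and is not part of SubGit or Git LFS):

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return the text of a Git LFS pointer file for the given content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# A stand-in for a large binary; the pointer stays three short lines
# regardless of how big the real file is.
print(lfs_pointer(b"\x00" * 1024), end="")
```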
On the other hand, it is possible to set up a script (say, a pre-commit hook in SVN) that detects big files on the fly before they are written to the SVN repository and sends them to Maven, replacing them with something else in the repository. The Git counterpart repository will then contain those “something else” files instead of the original binaries, of course. If the intent is to then pick up those binaries from Maven in the CI/CD pipeline, this is indeed a possible setup that will reduce the repository size.
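As a rough illustration, here is a minimal Python sketch of only the detection half of such a pre-commit hook. It assumes `svnlook` is on the PATH and supports the `filesize` subcommand; the 10 MiB threshold and all function names are hypothetical. It only rejects oversized commits. Uploading to an artifact repository and substituting a placeholder is deliberately left out, partly because the Subversion book advises against modifying a transaction from a hook, so that step may be better done on the client before committing.

```python
#!/usr/bin/env python3
"""Sketch: SVN pre-commit hook that rejects oversized files."""
import subprocess
import sys

MAX_BYTES = 10 * 1024 * 1024  # hypothetical threshold: 10 MiB


def added_or_modified_files(changed_output):
    """Parse `svnlook changed` output; return added/updated file paths."""
    paths = []
    for line in changed_output.splitlines():
        if len(line) < 5:
            continue
        status, path = line[:4], line[4:]
        # First column: A = added, U = contents updated, D = deleted,
        # _ = only properties changed. Directory paths end in "/".
        if status[0] in ("A", "U") and not path.endswith("/"):
            paths.append(path)
    return paths


def oversized(paths, size_of, limit=MAX_BYTES):
    """Return the paths whose size, as reported by size_of, exceeds limit."""
    return [p for p in paths if size_of(p) > limit]


def svnlook(subcommand, repos, txn, *args):
    """Run an svnlook subcommand against the pending transaction."""
    result = subprocess.run(
        ["svnlook", subcommand, "-t", txn, repos, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def main(argv):
    repos, txn = argv[1], argv[2]  # arguments SVN passes to pre-commit
    paths = added_or_modified_files(svnlook("changed", repos, txn))
    too_big = oversized(
        paths, lambda p: int(svnlook("filesize", repos, txn, p))
    )
    if too_big:
        sys.stderr.write(
            "Files over the size limit; please publish them to the "
            "artifact repository instead:\n  " + "\n  ".join(too_big) + "\n"
        )
        return 1  # non-zero exit status rejects the commit
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Saved as `hooks/pre-commit` in the SVN repository and made executable, a script like this would block the commit and print the offending paths to the committer.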
Hope I managed to address your concern.