Checksum mismatch when fetching from SVN

Failed to fetch new revisions from SVN repository

First of all, we checked similar topics on the forum, but it’s not our case.
For some reason, we lost sync between the Git and SVN repos. The suspected reason is an edited commit message in SVN, but we are not sure, because about a week passed before we noticed the loss of sync.

When we tried to fetch new revisions from SVN, we got the following message:

sh-4.4$ /opt/subgit/bin/subgit fetch ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d.git
SubGit version 3.3.10 ('Bobique') build #4368
Fetching revisions from SVN repository:
error: svn: E204900: Checksum mismatch in release/5.1.115.128/order/ors/src/server/FrontToBack.cpp: expected e7bbba38f18f9a714b3c0e7a22f99319 but found d00d633847e032877269645e2c9864f7
error: Checksum mismatch in release/5.1.115.128/order/ors/src/server/FrontToBack.cpp: expected e7bbba38f18f9a714b3c0e7a22f99319 but found d00d633847e032877269645e2c9864f7
error: Unexpected error has occurred; please report along with the logs ('/var/opt/gitlab/git-data/repositories/@hashed/ef/2d/subgit-fetch-20201026-153330.zip')
error:   to http://issues.tmatesoft.com/, thank you!

So we hope that you can clarify the reason for the lack of synchronization and help us avoid such things in the future!

Log/zip file:
subgit-fetch-20201026-153330.zip (4.6 KB)

Hello Nikita,

thank you for the log.

Could you please describe the alleged reason you mentioned: which revision was it, which branch and which files did that revision change, and how was the commit message changed?

Hello Ildar!

Thanks for your attention.

We had a merge of branches in GitLab with an inappropriate auto-generated commit message, which was synchronized to SVN. Then our developer manually corrected the message in SVN only.

branch - trunk
revision - 160099
files - none, only commit msg
commit msg - via svn propset -r 160099 --revprop svn:log "New log message" http://example/path/to/trunk

Hello Nikita,

OK, thanks for the clarification. Initially, I had the impression that the change might have been made in some disruptive way, but changing a revision property is definitely not a change of that kind and should not cause such a problem. So that does not look like the cause, and the “Checksum mismatch” issue requires a deeper investigation. Could you please describe what happened to the mirror and why the synchronization was lost? What exactly did you do to restore the synchronization? And which changes were made (if any) to the SVN and SubGit repositories while the mirror was off? Also, could you please share all the SubGit logs from the affected repositories: not only the fetch log, but all the others, including old ones?

Okay, in order:

1. No, I can’t describe what happened to the mirror because, as I already said, about a week passed before we noticed the loss of sync. Nor can I say why the synchronization was lost; I hoped to find that out here.
2. We did nothing to restore it because SubGit doesn’t offer a lot of commands. Is there something we should try?
3. There were many commits in the SVN repo while the SubGit repo was “dead”.
4. There is only one similar log file, from another fetch. I’m not sure it will be helpful, but I can upload it if you think otherwise.

Hello Nikita,

thanks for the explanation, it makes the situation a little clearer.
SubGit has a feature that lets you rebuild a mirror when, for example, the Git and SVN sides differ. That is the ‘--rebuild’ option of the ‘install’ command:

subgit install --rebuild <path to SubGit repository>

It can also rebuild the repository from a certain revision:

subgit install --rebuild-from-revision <revision number> <path to SubGit repository>

The former command backs up the current repository (actually, it just renames it) and translates the repository from scratch from the SVN repository.
The latter command resets the repository state to the given revision and then retranslates the subsequent revisions from SVN to Git.

The rebuild definitely resolves issues like the “Checksum mismatch” we are facing here, but note that commit hashes, especially those of commits made in Git, may change after the rebuild.

And, of course, just rebuilding the repository does not reveal the reason for the synchronization fault, so it may be worth investigating this to prevent such an issue from happening in the future. We would definitely need SubGit logs for the investigation: not the second fetch log, but the other SubGit logs, such as the hooks and daemon logs. They reside in the SubGit repository directory, in the ‘subgit/logs’ subdirectory. It would also be worth having the ‘svn log -v’ output for the revisions made after the synchronization failed.
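
In case it helps, here is a minimal shell sketch for gathering those two items. It only assumes what is said above, namely that the logs live under ‘subgit/logs’ inside the SubGit repository directory. The function name collect_subgit_logs and the output file names are hypothetical, and the revision and URL in the comment are placeholders to fill in.

```shell
#!/bin/sh
# collect_subgit_logs REPO_DIR OUT_TGZ   (hypothetical helper name)
# Packs REPO_DIR/subgit/logs into a gzip-compressed tar archive.
collect_subgit_logs() {
    repo_dir=$1
    out=$2
    if [ ! -d "$repo_dir/subgit/logs" ]; then
        echo "no subgit/logs directory under $repo_dir" >&2
        return 1
    fi
    # -C keeps the paths inside the archive relative to the repository root.
    tar -czf "$out" -C "$repo_dir" subgit/logs
}

# Usage (placeholders, not real values):
#   collect_subgit_logs "<path to SubGit repository>" subgit-logs.tgz
#   svn log -v -r "<first revision after the failure>:HEAD" "<SVN URL>" > svn-log.txt
```

The archive contains only the log directory, so it can be reviewed for sensitive paths before it is shared.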

Hello again!

We can’t afford to rebuild the repository from scratch every time, because our project is really big and it takes like 48h to do this.

We tried subgit install --rebuild-from-revision <revision number> <path to SubGit repository>, but it also failed instantly with the same checksum mismatch error in the same file (even though neither this file nor this branch existed at <revision number>).

Archive with daemon and hooks logs: subgit-logs.zip (4.1 MB). We would really appreciate it if you could find something useful for our investigation there; if not, we will have to rebuild the repo from scratch.

Hello Nikita,

thank you for the logs, we’ll try to find out what happened there. May we also have the SVN repository log (svn log -v output)?

Hello.

Sorry, but for security reasons, we can’t share the SVN logs. What exactly do you want to check there? Maybe we can check it ourselves.

Hello Nikita,

I’m afraid I cannot say for sure what to look for in the logs. The “Checksum mismatch” error itself means that SubGit expected to find a specific hash for a file, but the actual hash differs from the expected one. It’s not clear which of the two is correct or how the other one came to be changed; it can be caused by SVN database corruption, for example, but of course that’s not the only possible reason. It’s also hard to tell for sure where exactly (at which revision) the change happened, and thus it’s hard to tell which part of the log is of particular interest and which revision to rebuild the repository from (if any). This issue is pretty complex: there are a number of possible causes, and it requires a thorough investigation to find the actual one. That is what we were hoping to start once we had the logs.
Another possible approach is to have a meeting with our devs to check both repositories together. I’m not sure whether that is acceptable from the security point of view, but I’m afraid we would hardly be able to find the cause with only the SubGit logs.
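
If you would like to check something on your side first, one thing that might narrow it down is comparing the file content SVN actually serves against the two hashes from the error. The sketch below assumes the values are MD5 digests of the full file content, which their 32-hex-digit length suggests but nothing in the logs confirms; it also assumes GNU md5sum is available. The function names are hypothetical, and the URL and revision in the comment are placeholders.

```shell
#!/bin/sh
# md5_of_stdin: prints the MD5 hex digest of whatever arrives on stdin.
# Assumes GNU coreutils md5sum (on macOS the tool is called md5).
md5_of_stdin() {
    md5sum | cut -d' ' -f1
}

# which_checksum EXPECTED FOUND   (hypothetical helper name)
# Reads file content on stdin and reports which of the two hashes it matches.
which_checksum() {
    actual=$(md5_of_stdin)
    case $actual in
        "$1") echo expected ;;
        "$2") echo found ;;
        *)    echo neither ;;
    esac
}

# Usage (placeholders, not real values):
#   svn cat "<SVN URL of FrontToBack.cpp>@<revision>" \
#     | which_checksum e7bbba38f18f9a714b3c0e7a22f99319 d00d633847e032877269645e2c9864f7
```

Running this for a few revisions of the file may show at which revision the content stops matching the expected hash. A "neither" result would not prove anything by itself, since the mismatch could relate to an intermediate text rather than a committed revision.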