We have a three-server architecture because we cannot get access to run SubGit on our organization’s enterprise GitLab instance:
Gitlab EE <-> Subgit Server <-> SVN
The SubGit server is hidden from users. They commit either to SVN or to GitLab EE.
When commits are pushed to GitLab EE, a pre-receive hook pushes the commit to the SubGit server first. We’ve been using this setup for about 6 months and it generally works quite well.
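The forwarding step in the pre-receive hook can be sketched roughly as follows. This is only an illustration, not the actual hook: the remote name `subgit` and the function names are hypothetical, and it assumes the hook receives the standard `<old> <new> <ref>` lines on stdin.

```python
import subprocess

# Hypothetical remote name for the hidden SubGit server.
SUBGIT_REMOTE = "subgit"


def parse_updates(lines):
    """Parse pre-receive stdin lines of the form '<old> <new> <ref>'."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:  # ignore anything that is not a ref update
            updates.append(tuple(parts))
    return updates


def push_command(new_sha, ref, remote=SUBGIT_REMOTE):
    """Build the git command that forwards one ref update to the remote."""
    return ["git", "push", remote, f"{new_sha}:{ref}"]


def forward_all(lines):
    """Forward every update; return the first non-zero exit code, else 0.

    Ref deletions (all-zero new SHA) are not handled in this sketch.
    """
    for _old, new, ref in parse_updates(lines):
        result = subprocess.run(push_command(new, ref))
        if result.returncode != 0:
            return result.returncode
    return 0
```

A real hook would end with something like `sys.exit(forward_all(sys.stdin))`, so that a failed upstream push rejects the push on the GitLab side as well.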
In order to keep the GitLab EE master branch up to date, we have a user-post-receive hook on the SubGit server, which only acts in the svn->git direction. It determines the direction by checking whether SVN_LAST_FETCHED_REVISION is defined.
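The direction check can be sketched as follows. This is a simplified illustration under assumptions, not the actual hook: only the variable name SVN_LAST_FETCHED_REVISION comes from the setup described here, the function names are hypothetical, and treating "defined" as "present in the environment, even if empty" is a guess.

```python
import os

# Variable name as described above; everything else here is hypothetical.
DIRECTION_VAR = "SVN_LAST_FETCHED_REVISION"


def sync_direction(env=None):
    """Return 'svn->git' if the variable is defined, otherwise 'git->svn'."""
    env = os.environ if env is None else env
    return "svn->git" if DIRECTION_VAR in env else "git->svn"


def should_push_to_gitlab(env=None):
    """Push back to GitLab only for changes that originated in SVN."""
    return sync_direction(env) == "svn->git"
```

For example, `should_push_to_gitlab({"SVN_LAST_FETCHED_REVISION": "922062"})` is true, while `should_push_to_gitlab({})` is false.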
Sometimes, when a user pushes to Gitlab EE, the whole thing works, except that our user-post-receive hook shows SVN_LAST_FETCHED_REVISION is defined, so it thinks it’s an svn commit and tries to push the commit back to Gitlab EE.
The git client that performs the original push gets an error code and our CI pipeline fails, because there is a race to rewrite the master ref, e.g.:
remote: error: cannot lock ref 'refs/heads/master': is at 630fd1a174e4094318a70d3def527399fa949b6f but expected 1189e52b643da166946bacb7f5b3f225469d87d2
It’s a minor issue because the end result is correct: all 3 repositories are at the correct ref. However, I’d like some help getting to the bottom of why SVN_LAST_FETCHED_REVISION is not always a reliable indicator of the sync direction.
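One possible defensive measure, independent of the root cause, would be for the hook to compare the local and remote refs before pushing back, and to skip the push when they already match. The sketch below is an assumption-laden illustration, not something the hook currently does: the function names are hypothetical, and it only narrows the race window rather than closing it.

```python
import subprocess


def parse_ls_remote(output, ref):
    """Extract the SHA for `ref` from `git ls-remote` output ('<sha>\\t<ref>' lines)."""
    for line in output.splitlines():
        sha, _, name = line.partition("\t")
        if name == ref:
            return sha
    return None


def local_sha(ref="refs/heads/master"):
    """SHA the local repository has for `ref`."""
    result = subprocess.run(["git", "rev-parse", ref],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def remote_sha(remote, ref="refs/heads/master"):
    """SHA the remote has for `ref`, or None if the ref does not exist there."""
    result = subprocess.run(["git", "ls-remote", remote, ref],
                            capture_output=True, text=True, check=True)
    return parse_ls_remote(result.stdout, ref)


def needs_push(local, remote):
    """Push back only when the remote ref differs from the local one."""
    return local != remote
```

With such a guard, a misdetected direction would result in a skipped push instead of a competing update of refs/heads/master, provided the original push has already landed by the time the check runs.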
It does look like an error, indeed, and finding the cause would require an investigation. May I ask you to provide us with SubGit logs that contain an example of the issue? That would help us a lot.
It turns out this latest one looks similar but may not be the same issue. I need to do some more work on my end for this case. GitLab all by itself can be a bit flaky. However, here is one of the older examples of the issue I described above:
Since the examples are older, I only have the SubGit pre-receive-hook.0.log; the others appear to have been flushed. The sync appears to succeed, and SubGit shows it in the git->svn direction:
[2020-08-18 15:14:21.376][pre-receive] Received '(message (29: a8e1c6b => r922062 mainline ))'.
[2020-08-18 15:14:21.376][pre-receive] Received '(message (27:Sync completed successfully ))'.
yet somehow the user-post-receive hook is going down the svn->git path.
I’d like to revive this thread and send over some logs. The issue was intermittent and didn’t come up for a while, but (un)luckily we have seen it with almost every commit today.
I have received the logs over email, thanks for that. We will investigate and come back when we find something. I suggest we don’t stick with email, however, and continue the conversation here.
We reviewed the logs and found that the r933037 translation went well and finished successfully, so the push that was made to GitLab should have finished correctly, too. Judging from the timestamps, the post-receive hook was triggered not during the push operation but a minute later, during the regular SVN changes check, so even if the hook pushed changes back to GitLab, it should not have blocked the original push.
In principle, the hook should push nothing back to GitLab even if SVN_LAST_FETCHED_REVISION is set: the SVN synchronization delivered no new commits to the SubGit repository, and since the latest commit came from GitLab, the two should already be in sync. Could you please tell us what happens in the GitLab repository: do any new commits arrive in such situations, and was anything pushed for r933037? Also, regarding this line in the hook script:
Without this line, GitLab sometimes doesn’t pick up the changes in the UI until you perform a rake task on the cache. It’s just a general issue with updating GitLab repositories from the server side.
My expectation is that the post-receive hook should also run in the git-push direction. It does most of the time, but whenever I see this problem it is conspicuously skipped.
Our Jenkins job that merges the change gets an error indicating that SubGit rejected the commit, even though it went through. We have a pre-receive hook on the GitLab server, based on https://gitlab.rim.net/qnx/devops/gitlab-subgit-docker/-/blob/master/pre-receive.hook, which unambiguously reports that the failure is due to SubGit rejecting the commit upstream (even though it went through).
Something is silently failing; otherwise, how do we explain that the user-post-receive hook only sometimes fails to execute? That failure eventually gets propagated to the client, despite not seeming to show up in the SubGit logs.
Thank you for the explanation. It seems a bit strange to me that GitLab would require such a workaround to get the UI updated; I have never come across that before. But it looks like it should not affect the workflow.
The user-post-receive hook should be triggered during the push, that’s right. Judging from the logs, it indeed was not triggered on some occasions; I will consult with our dev team about what the reason might be.
I’m afraid we cannot check the URL you mentioned: gitlab.rim.net appears to be a private name that is not resolved by public DNS. As I understand it, though, the main problem is that the Jenkins job is failing? For instance, when the 5fed0a54f commit was pushed, the Jenkins job failed even though the commit was translated into r933037 successfully, is that correct? If so, could you please share the error reports where the hook or Jenkins states that SubGit rejected the commit? Also, could you please confirm whether there are any extra commits in the GitLab repository?