Cannot commit: timeout waiting for pid file

Hello, one SVN repo synchronized with Git is consistently failing with an error on commit requests.

[2020-06-01 16:19:44.907][pre-commit] Launching daemon, classpath: /home/subversion/svndata/fw-sw2/subgit/lib/subgit-3.3.9_4351_fat.jar                                      
java options: -noverify -client -Djava.awt.headless=true -Djna.nosys=true                                                                                                
java command: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b07-1.el6_10.x86_64/jre/bin/java                                                                                 
launch timelimit: 1591021194907                                                                                                                                              
 repository path: /home/subversion/svndata/fw-sw2                                                                                                                            
[2020-06-01 16:19:44.907][pre-commit] Waiting for pid file '/home/subversion/svndata/fw-sw2/subgit/daemon.pid' creation.                                                     
[2020-06-01 16:19:56.408][pre-commit] Failed to launch background translation process: timeout waiting for pid file '/home/subversion/svndata/fw-sw2/subgit/daemon.pid'.     
[2020-06-01 16:19:56.408][pre-commit]   at launch_daemon (daemon.c:226)                                                                                                      
[2020-06-01 16:19:56.408][pre-commit]   at open_connection (daemon.c:359)                                                                                                    
[2020-06-01 16:19:56.408][pre-commit]   at hook_daemon_client_send_packet (daemon.c:460)                                                                                     
[2020-06-01 16:19:56.408][pre-commit]   at hook_execute (hook.c:284)                                                                                                         
[2020-06-01 16:19:56.408][pre-commit]   at internal_pre_commit (pre-commit.c:36)                                                                                             
[2020-06-01 16:19:56.408][pre-commit]                                                                                                                                        

It does not appear to be permission-related, as we can create files in the subgit subfolder both as the subversion user and as the users who attempt the commits.
We are accessing SVN via the svn+ssh protocol, so I expect the SubGit pre-commit hook to be executed as the same user who is committing.

The system has been working for months and was working yesterday. We do not understand what could possibly have changed to break the commits today.

Any idea on how to troubleshoot the issue?

Is there any cleanup command that we can run to clear the leftovers of a possibly failed command, like svn cleanup?
Thanks
Aldo

Hello Aldo,

When a commit arrives, SubGit starts the daemon and waits until the daemon creates a pid file containing the port number it listens on (the daemon listens on a random unoccupied port). SubGit then connects to the daemon on that port and sends a command to it.
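The wait described above can be pictured with a minimal Python sketch. This is not SubGit's code (the hook is written in C, judging by the stack trace); the function name `wait_for_pid_file`, the polling interval, and the assumption that the file holds only a decimal port number are all hypothetical and serve purely as illustration.

```python
import time


def wait_for_pid_file(path, timeout_s, poll_s=0.1):
    """Poll until `path` exists and holds a port number, or give up.

    Assumption: the file contains only the port as decimal text.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with open(path) as fh:
                text = fh.read().strip()
            if text.isdigit():
                # The daemon is up; the caller can now connect to this port.
                return int(text)
        except FileNotFoundError:
            pass  # The daemon has not created the file yet.
        time.sleep(poll_s)
    raise TimeoutError(f"timeout waiting for pid file '{path}'")
```

If the daemon needs longer than `timeout_s` to start, the caller gives up even though the daemon may be perfectly healthy, which is the kind of failure shown in the log.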

In your case, it looks like the daemon doesn’t manage to create the pid file within the given period of time. A possible reason is that the daemon was too slow to start. To resolve this, increase the daemon launch timeout in the SubGit configuration file:

[daemon]
    launchTimeout = 5

The default value is 5; try increasing it to, say, 60 seconds.
If that doesn’t help, please collect SubGit logs from the repository for analysis.
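One quick check before sending logs is to compare the `launch timelimit` value in the log with the timestamp of the same entry, to see how long the hook was actually prepared to wait. The sketch below is only an illustration: it assumes the timelimit is milliseconds since the Unix epoch and that the log timestamps are server-local time at UTC+2, neither of which is confirmed in this thread. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the log excerpt in the original post.
LOG_TIMESTAMP = "2020-06-01 16:19:44.907"
LAUNCH_TIMELIMIT_MS = 1591021194907

# Assumption: log timestamps are server-local time and the server is at UTC+2.
ASSUMED_SERVER_TZ = timezone(timedelta(hours=2))


def launch_window_seconds(log_timestamp, timelimit_ms, tz):
    """Seconds between the log entry and the launch deadline."""
    started = datetime.strptime(log_timestamp, "%Y-%m-%d %H:%M:%S.%f")
    started = started.replace(tzinfo=tz)
    # Integer millisecond arithmetic avoids float rounding on the epoch value.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    deadline = epoch + timedelta(milliseconds=timelimit_ms)
    return (deadline - started).total_seconds()


if __name__ == "__main__":
    window = launch_window_seconds(
        LOG_TIMESTAMP, LAUNCH_TIMELIMIT_MS, ASSUMED_SERVER_TZ
    )
    print(f"launch window: {window} s")
```

Under these assumptions the quoted log gives a window of 10 seconds. If this number does not change after editing the configuration, the new value has probably not been picked up.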

Thanks Ildar,
just before reading your reply, I tried a desperate subgit uninstall / subgit install. This apparently has fixed the issue.
Is it possible that, other than uninstalling/reinstalling the hooks, the command performs some other cleanups that may have helped?

I have currently inserted the timeout config line you suggested, commented out, in my config file, ready to try in case of further problems.

I’ll let you know in case.

Aldo

PS Just to be clear, the problem had been occurring consistently for hours; there was no way to commit anything to the repo.

Well, ‘uninstall’ actually leaves most of the settings in the repository unless the ‘purge’ option is used, so the settings are hardly to blame. It does remove the hooks and the SubGit binary from the repository, so maybe the issue came from that side, yet I don’t recall any issues caused by hooks or binaries.
Glad to know the issue is resolved! We’re always here to help, so don’t hesitate to contact us if any problems appear.

Hi Ildar,
we have seen the same issue again today on another repo on the same server. I suspect some sort of delay in the server, but our IT has not found anything out of order yet.
I tried increasing the timeout at first, but it did not help.
Actually, I did not notice a longer wait before the error: I assumed changes to the config file were effective immediately. Do I need to do anything to apply the new configuration?

Then I tried the uninstall / install sequence and, this time after a long wait, the commit succeeded.

Can you please have a deeper look into this? I am attaching the zip of the logs folder, do you need anything else?

subgit-logs.tgz (4.1 MB)

Hi Aldo,

it looks like I forgot to mention this before, sorry for that: changes to the configuration file are not effective immediately. ‘subgit install’ must be invoked against the repository to apply changes in the configuration. It looks like you found the right way by yourself, yet I’d like to note that ‘uninstall’ is not mandatory; ‘subgit install’ alone is enough to apply the settings.
So it looks like increasing the timeout helped (once it was applied), which means that the daemon start does indeed take a long time on your SubGit server. I’m not sure what the reason for that is (it may be a high CPU load, for example); the machine and circumstances should be investigated further. A possible workaround from the SubGit side is to increase the daemon idle timeout:

[daemon]
    idleTimeout = <timeout in seconds>

The larger the number set here, the longer the daemon waits before exiting, so there will be no delay if a commit arrives while the daemon is still running. It is also possible to set infinity there so that the daemon never exits.
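The effect of the idle timeout can be illustrated with a small Python model. It is a simplification and not SubGit's logic: it assumes the idle timer restarts on every commit and ignores the time the launch itself takes. The function name and the sample commit times are made up for the example.

```python
def commits_paying_launch_delay(commit_times_s, idle_timeout_s):
    """Return the commit times that arrive while no daemon is running.

    Assumption: the daemon exits once `idle_timeout_s` seconds pass
    without a commit, and each commit restarts that idle timer.
    """
    cold_starts = []
    last_activity = None
    for t in sorted(commit_times_s):
        if last_activity is None or t - last_activity > idle_timeout_s:
            cold_starts.append(t)  # The daemon has to be launched for this commit.
        last_activity = t
    return cold_starts


if __name__ == "__main__":
    commits = [0, 30, 400, 450, 5000]  # seconds, hypothetical
    for timeout in (60, 3600, float("inf")):
        print(timeout, commits_paying_launch_delay(commits, timeout))
```

In this model, raising the idle timeout reduces the number of commits that wait for a daemon launch, and an infinite timeout leaves only the very first commit exposed to the delay.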