Replacing trilead-ssh2 with Apache sshd

Hello Flemming,
you’re awesome! Sorry for not responding for a while.
I’ve added your changes to SVNKit locally.

Now I would like to ask you to confirm explicitly that you’re willing to give up your ownership of the code you’ve written and thus to allow us to put our usual header to the Java files you’ve added or changed:

/*
 * ====================================================================
 * Copyright (c) 2004-2022 TMate Software Ltd.  All rights reserved.
 *
 * This software is licensed as described in the file COPYING, which
 * you should have received as part of this distribution.  The terms
 * are also available at http://svnkit.com/license.html.
 * If newer versions of this license are posted there, you may use a
 * newer version instead, at your option.
 * ====================================================================
 */

Once (and if) you confirm that, I’ll commit your changes to SVNKit trunk.

Also I see that you’ve raised Java requirements from 1.6 to 1.8 but it’s ok because we use the same requirements as JavaHL interface and the latest JavaHL requires 1.8.

Sure, have at it, you’re welcome to the code, if that means that we can get rid of trilead-ssh2 at some point.

Note that I have improved the code yesterday, so you need the latest version.

I also upgraded to gradle 3.0, which is the oldest version of gradle that IDEA supports, but that’s also ancient, so perhaps you can take the opportunity to upgrade to gradle 7.3.

Note that I have just added ed25519 key type support by adding net.i2p.crypto:eddsa:0.3.0

We’re currently running the apache-sshd based fork in production and it seems stable, so I hope the change makes its way into a release.

Thank you very much. I’ll apply all your changes and once you don’t have changes anymore, we’ll release a new SVNKit version.

@dren.dk We recently implemented this on our servers. We are seeing some threading issues, potentially from connections not closing properly.

This stuck thread actually prevents the server from shutting down:

"Piping consumer" #2519 prio=5 os_prio=0 tid=0x00007f6a20270800 nid=0x65c waiting on condition [0x00007f6a104e8000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.tmatesoft.svn.core.internal.io.svn.StreamLogger.lambda$new$0(StreamLogger.java:56)
        at org.tmatesoft.svn.core.internal.io.svn.StreamLogger$$Lambda$559/1479132319.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:748)

There are a few more that indicate connections not closing:

"sshd-SshClient[9ced0fc]-nio2-thread-3" #15032 daemon prio=5 os_prio=0 tid=0x000055f6be15b000 nid=0x6761 waiting on condition [0x00007f770fe70000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000781c0f7d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

We only see these issues when using the new ssh implementation.

Thank you for the feedback, I’ll try to take a stab at it, I suspect it’s a relatively simple fix.

No problem. Let me know what you find

I have just committed what I think will fix your problem: 18a6ea050f9b4f27563f6eb27dfb66f47c727a4f

For some reason this forum doesn’t allow me to post a direct link to the commit on github???

/dren-dk/svnkit/commit/18a6ea050f9b4f27563f6eb27dfb66f47c727a4f

Please let me know if it helps.

@dren.dk Applied and tested that change and it does look like that resolves the issues

If @dren.dk agrees, I’ll include the fix into SVNKit, so it will become a part of the next release (1.10.6).

@dren.dk All good for the next svnkit release (1.10.6) ?

@dmitry.pavlenko We have a customer who is pushing hard to resolve this issue. The change has been confirmed on our side for two weeks. I realize that @dren.dk contributed this change but we are paying for support from TMate. I do not expect that this sits idle until the community blesses it for release. Please confirm when version 1.10.6 will be released with this change so that we can communicate appropriately with our customer.

Erick

Hello Erick,
We will release 1.10.6 with this fix on the next week.

Thanks Dmitry. We appreciate that!

Do you know if it will be first half or second half of the week? The reason why I am asking is that we are releasing the next version of our product on 5/21, so would really like 2-3 days to run it through our regression tests.

Thanks,
Erick

Taking that into account, we’ll release it during the first half of week, as soon as possible.
Sorry for that delay, I just had an impression that you had your the problem solved so it wasn’t so urgent.

Dmitry,

Perfect! Thanks for accommodating.

It appears to be resolved in the version we have built from source, we just want to ship the released version to the customer, and run the regression tests knowing there could be other changes in there as well.

Hello Erick,
I would like to let you know that we’ve uploaded 1.10.6 to Maven Central. Once it passes their review process, it’ll become publicly available. Meanwhile you can use our mirror of Maven Central https://maven.tmatesoft.com/ for testing purposes (the new SVNKit is already there). Later in the day we will announce 1.10.6 release on svnkit.com website.

SVNKIT-764: a work-around for the case if the channel fails to be opened (the original SshConnection#reOpen wasn’t called in that case). On my tests discarding the whole pool with SVNSSHConnector#recreateSessionPoolForTest makes things faster though.

I tried to extract SSH code to tests it alone (without SVNKit):

        final File privateKey = new File("/path/to/id_rsa");

        try (final SshClient sshClient = SshClient.setUpDefaultClient()) {
            sshClient.setIoServiceFactoryFactory(new Nio2ServiceFactoryFactory());

            final byte[] privateKeyBytes = SVNFileUtil.readFully(privateKey);
            assert privateKeyBytes != null;
            final Iterable<KeyPair> keyPairs = SecurityUtils.loadKeyPairIdentities(null, () -> "private key",
                    new ByteArrayInputStream(privateKeyBytes), FilePasswordProvider.EMPTY);
            if (!keyPairs.iterator().hasNext()) {
                throw new RuntimeException("Did not find a keypair in the provided private key string: " + privateKey);
            }
            sshClient.addPublicKeyIdentity(keyPairs.iterator().next());
            sshClient.setServerKeyVerifier((clientSession, remoteAddress, serverKey) -> true);
            sshClient.setHostConfigEntryResolver(DefaultConfigFileHostEntryResolver.INSTANCE);

            sshClient.start();

            ConnectFuture connectFuture = sshClient.connect("username", "hostname", 22);
            connectFuture.await();

            try (final ClientSession clientSession = connectFuture.getClientSession()) {
                clientSession.auth().verify(10000);

                for (int i = 0; i < 20; i++) {
                    System.out.println("i = " + i);

                    try (final ChannelExec channel = clientSession.createExecChannel("svnserve -t")) {
                        OpenFuture openFuture = channel.open();
                        openFuture.await();
                        if (openFuture.isOpened()) {
                            System.out.println("opened");
                        } else {
                            System.out.println("not opened");
                        }
                        
                        final InputStream in = new ByteArrayInputStream(new byte[0]);
                        final OutputStream out = new ByteArrayOutputStream();
                        final OutputStream err = new ByteArrayOutputStream();

                        channel.setIn(in);
                        channel.setOut(out);
                        channel.setErr(err);
                        
                        SVNFileUtil.closeFile(err);
                        SVNFileUtil.closeFile(out);
                        SVNFileUtil.closeFile(in);
                        
                        channel.close(true);
                        channel.waitFor(Collections.singletonList(ClientChannelEvent.CLOSED), Duration.ofDays(30));
                    }
                }
            }
        }

it outputs

i = 0
opened
i = 1
opened
i = 2
opened
i = 3
opened
i = 4
opened
i = 5
opened
i = 6
opened
i = 7
opened
i = 8
opened
i = 9
opened
i = 10
not opened
i = 11
not opened
i = 12
not opened
i = 13
not opened
i = 14
not opened
i = 15
not opened
i = 16
not opened
i = 17
not opened
i = 18
not opened
i = 19
not opened

This can be perfectly explained by the fact that by default in OpenSSH MaxSessions option is 10. If I understand coerrectly, this corresponds to 10 Apache SSHD channels. (And the TCP connection corresponds to ClientSession class).

The code above shows that either:

  • the counter doesn’t reset when closing the channel;
  • the code above doesn’t close the channel correctly.

In SVNKit there’s a sessions counter SshConnection#sessionCount which is incremented when the channel is opened and decremented when it is closed. But for now it seems that after several channel open/close cycles exhaust the connection and another one should be opened in the pool. Or there should be a pool of open channels and they shouldn’t be closed (though it’s suspicious from authentication point of view). Or maybe there’s a way to close the channel to decrement the counter on the server side.

Another difficulty is to detect "MaxSessions limit hit" because openFuture.getException() contains generic “open failed” error with no further details.