Add Idle property to SSHClientConnection #711

theely · 2024-11-13T12:17:36Z

Hi there,
I would love to discuss the following idea or alternative solutions.

Idea

Add an Idle property to SSHClientConnection to indicate when a connection has gone idle.

Motivation

We are using asyncssh to power a REST API for HPC resource access. To cope with high-throughput regimes (> 250 requests/s), we are implementing an SSH connection pool that reuses existing connections for subsequent command execution.

In order to maintain and manage the connections within the pool, we need to evict stale connections. As of today, there is no way to tell whether an SSHClientConnection has gone idle.

Implementation Design

Leverage the existing keep_alive mechanism: set the idle flag as soon as the first keep_alive timer triggers, and reset the flag whenever data is received or sent.

ronf (Maintainer) · 2024-11-13T13:15:28Z

Are you looking for "idle" connections which are still perfectly healthy but just haven't been used in a while, or "stale" connections, where the connection has broken at something like the TCP level but something in the network failed to notify you about that break (like NAT state timing out)?

The existing keep alive mechanism should be enough to detect the latter, and make sure that connections get cleaned up when this happens. Also, you can potentially catch errors when trying to open a new session on a connection and retry with a new connection (or a different connection in the pool) when a failure occurs.
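
The retry idea above can be sketched roughly as follows. This is a minimal, library-agnostic sketch: `run_with_retry`, its parameters, and the default `retry_on=(OSError,)` are hypothetical names chosen for illustration and are not part of AsyncSSH. With AsyncSSH you would presumably pass whichever exception types your version raises on a broken connection (for example `asyncssh.ChannelOpenError` or `asyncssh.ConnectionLost`, to be checked against your installed release).

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple, Type, TypeVar

C = TypeVar("C")  # connection type
T = TypeVar("T")  # result type


async def run_with_retry(
    pool: Dict[str, C],
    key: str,
    connect: Callable[[], Awaitable[C]],
    use: Callable[[C], Awaitable[T]],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Try the pooled connection first; on failure, reconnect once and retry."""
    conn = pool.get(key)
    if conn is not None:
        try:
            return await use(conn)
        except retry_on:
            # The pooled connection looks broken: drop it and fall through.
            # Real code would probably also close it here.
            pool.pop(key, None)

    # No usable pooled connection: open a new one and store it in the pool.
    conn = await connect()
    pool[key] = conn
    return await use(conn)


async def demo():
    # Strings stand in for connections; "stale" simulates a broken one.
    pool = {"alice": "stale"}
    attempts = []

    async def connect() -> str:
        return "fresh"

    async def use(conn: str) -> str:
        attempts.append(conn)
        if conn == "stale":
            raise OSError("connection lost")
        return f"ran on {conn}"

    result = await run_with_retry(pool, "alice", connect, use)
    return result, attempts, pool
```

The sketch retries only once, so a failure on the fresh connection propagates to the caller instead of looping.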

If you're truly looking to close connections which haven't been used in a while, why not do that at the application level? Whenever you send a new request on a connection in the pool you could cancel and restart an idle timer. If that timer goes off at any point, you could remove the idle connection from the pool and close it. I'm assuming here that requests are processed quickly enough that you wouldn't have to worry about the idle timer going off while some existing request is still being processed, but you could guard against that by checking if there are active sessions on a connection before trying to close it, and instead reset the idle timeout if it goes off with a request still in progress.
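
A minimal sketch of such an application-level idle timer, assuming an asyncio event loop is running. `IdleTimer`, `touch`, and the `on_idle` callback are hypothetical names introduced here for illustration. The callback is assumed to return `False` when requests are still in progress, in which case the timer restarts instead of the connection being closed.

```python
import asyncio
from typing import Callable, Optional


class IdleTimer:
    """Hypothetical helper: calls on_idle() after `timeout` seconds without touch()."""

    def __init__(self, timeout: float, on_idle: Callable[[], bool]) -> None:
        self._timeout = timeout
        self._on_idle = on_idle
        self._handle: Optional[asyncio.TimerHandle] = None

    def touch(self) -> None:
        """Cancel any pending timer and restart it; call this on every new request."""
        self.cancel()
        loop = asyncio.get_running_loop()
        self._handle = loop.call_later(self._timeout, self._fire)

    def cancel(self) -> None:
        if self._handle is not None:
            self._handle.cancel()
            self._handle = None

    def _fire(self) -> None:
        self._handle = None
        # on_idle() returns False while requests are still in progress;
        # in that case restart the timer rather than closing the connection.
        if not self._on_idle():
            self.touch()


async def demo() -> list:
    state = {"active_sessions": 1}
    events = []

    def on_idle() -> bool:
        if state["active_sessions"] > 0:
            events.append("busy")
            return False
        # Real code would remove the connection from the pool and close it.
        events.append("closed")
        return True

    timer = IdleTimer(0.2, on_idle)
    timer.touch()
    await asyncio.sleep(0.3)       # first expiry happens while a request is active
    state["active_sessions"] = 0   # the request finishes
    await asyncio.sleep(0.3)       # second expiry finds the connection idle
    return events
```

One timer per pooled connection keeps the logic local to the pool; the connection object itself does not need to know about it.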


theely (Author) · 2024-11-13T13:31:15Z

I am considering the first scenario, a healthy connection that has not been used for a while and is considered "idle".

Keeping track of whether a connection is idle on the application side would indeed be possible. But is it the best design?
The SSHClientConnection already has mechanisms in place to track whether the connection is being used (e.g. keep_alive); extending them to expose the idle state is straightforward and would reduce code duplication. Furthermore, knowing whether a healthy connection has not been used might enable other use cases.

Wrapping this logic around SSHClientConnection might be more error-prone and, overall, a less elegant solution.


ronf (Maintainer) · 2024-11-14T02:55:45Z

I'm not against adding some kind of idle timeout -- there are already other timeouts such as connect_timeout and login_timeout implemented today in the AsyncSSH connection options. However, if I understand your use case, this really isn't an "idle timeout".
The client connections you have open in the connection pool would NEVER receive data while the connection remains in the pool. In order for any reads or writes to happen on it, it would have to be triggered by a new REST API request which decides it needs to pull a connection out of the pool. Once the request is finished, the connection would go back in the pool (with a reset timer) and would not get any new reads or writes on it until another REST API request comes in. Do I have that right?

As for reducing code duplication - I'm not sure that's the case. To support both this timeout and the existing keep-alive, two different timers would need to be scheduled (independent of one another), since each is reset under different conditions. So, different member variables would be needed to keep track of the new timeout, and new functions would be needed to set/clear it, called from the appropriate places.

Also, note that reads and writes of data happen at the session level, not at the connection level. If an idle timeout were implemented, I would expect it to be at the session level, so that you could have an open SSH connection with multiple sessions open in parallel, and one of those sessions could time out without affecting the others.

If I understand your use case, you're really looking for a way to clean up your connection pool, which is implemented at the application level. So, having the timeout implemented at the application level seems like the best option, especially if the conditions which reset the timeout are tied to adding and removing connections from the connection pool.


theely (Author) · 2024-11-15T08:09:59Z

Yes, your analysis is mostly correct.
The connection pool design is simpler: for each user, one connection is established and shared by every REST request from that user.
A connection can be used by multiple REST requests in parallel; each request has its own channel. Any time a connection is "taken" from the pool, the "timer" is reset.
There is no concept of releasing a connection, hence no precise way to know when a connection is no longer in use. By setting a sufficiently large timeout, we can only hope that no channel is still in use when the connection is pruned (this is the weak part of the design).

Having such information provided by the connection would make for a much more robust design. Regardless of the use case and the connection pool design, I believe that knowing whether a connection is idle should be the responsibility of the connection object itself, not of an external entity.

For additional context, here is the code that "borrows" the connection and the code that prunes the connection pool:


from time import time
from typing import Dict

from asyncssh import SSHClientConnection


class SSHConnectionPool:
    # one shared connection per username
    connections: Dict[str, SSHClientConnection] = {}
    # idle_timeout (in seconds) is defined elsewhere in the class

    @classmethod
    def prune_connection_pool(cls):
        # close connections that have not been borrowed for idle_timeout seconds
        for connection in cls.connections.values():
            last_used = connection.get_extra_info("last_used")
            if time() - last_used > cls.idle_timeout:
                connection.close()

        # remove closed connections
        cls.connections = {
            k: conn for k, conn in cls.connections.items() if not conn.is_closed()
        }

    @classmethod
    def get_connection(cls, username: str):
        # reuse the pooled connection unless it has already been closed
        conn = cls.connections.get(username)
        if conn is not None and conn.is_closed():
            del cls.connections[username]
            conn = None

        # if conn is None a new connection is established

        # record the borrow time; this is the "timer" reset
        conn.set_extra_info(last_used=time())
        return conn


ronf (Maintainer) · 2024-11-16T01:42:53Z

Does each use of a connection involve creating a new SSH session on the connection pulled from the pool? I'm guessing it does, to allow multiple requests to be simultaneously using a connection without stepping on one another.

If that's the case, is this session closed when the request is complete, even though the connection remains open (and usable for additional requests)? If sessions are closed when a request is done with them, you should be able to track how many open sessions there are on each connection. When doing your prune check, you could skip any connections which have a non-zero session count, avoiding the risk of pruning a connection which is in use. You'd just need to add something that increments the session count when a new session is opened and decrements it when the session is closed. You could still keep an idle timeout before you pruned connections, but it would only prune those with a session count of 0. This might allow your timeout to be lower, since there'd be no risk of pruning a connection with sessions still active.

This seems like a pretty clean solution to me, with nearly all of the logic in your connection pool class.
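
The session-count approach described above could look roughly like this. It is only a sketch under stated assumptions: `CountingPool`, `PoolEntry`, and the `session` context manager are hypothetical names, and the connection is duck-typed to need only `close()` and `is_closed()`, the two methods the thread's own code already calls. `FakeConn` is a stand-in used for trying the sketch without a real SSH server.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Protocol


class Closable(Protocol):
    def close(self) -> None: ...
    def is_closed(self) -> bool: ...


@dataclass
class PoolEntry:
    conn: Closable
    sessions: int = 0  # number of sessions currently open on conn
    last_used: float = field(default_factory=time.monotonic)


class CountingPool:
    """Hypothetical pool that never prunes a connection with open sessions."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.entries: Dict[str, PoolEntry] = {}

    def add(self, username: str, conn: Closable) -> None:
        self.entries[username] = PoolEntry(conn)

    @contextmanager
    def session(self, username: str) -> Iterator[Closable]:
        """Increment the session count on entry and decrement it on exit."""
        entry = self.entries[username]
        entry.sessions += 1
        try:
            yield entry.conn
        finally:
            entry.sessions -= 1
            entry.last_used = time.monotonic()

    def prune(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        for username, entry in list(self.entries.items()):
            if entry.conn.is_closed():
                del self.entries[username]
                continue
            idle_for = now - entry.last_used
            # Only connections with a session count of 0 are eligible.
            if entry.sessions == 0 and idle_for > self.idle_timeout:
                entry.conn.close()
                del self.entries[username]


class FakeConn:
    """Stand-in connection for experimenting with the pool."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def is_closed(self) -> bool:
        return self.closed
```

Because `session` is an ordinary context manager, it can also wrap awaited calls inside a coroutine, so a request handler would open its SSH session inside the `with` block and the count would drop again when the block exits, even on error.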

