Discussion:
curl / libssh2 sftp write performance (with patch)
Daniel Jeliński
2018-08-25 22:55:55 UTC
Permalink
Hello all,
I took a look at curl performance when uploading data to sftp servers;
curl is internally using a fixed-size buffer, which is only refilled
when completely acknowledged by sftp write function. This creates a
performance bottleneck on high-latency links, as there's a pause in
upload every time curl is waiting for acknowledgements.

I propose a change implemented in
https://github.com/libssh2/libssh2/pull/264, where sftp write
acknowledges data as soon as it is sent to the server, and checks
server response when it becomes available, which may happen during
sftp close.

I realize that this is a breaking change - many code examples do not
even check the result of sftp close, which means that a lot of code in
the wild would probably break if the patch was accepted in its current
form.
Would a patch like that be acceptable? What needs to happen to get it accepted?
Thanks,
Daniel
_______________________________________________
libssh2-devel https://c
Daniel Stenberg
2018-08-26 08:56:27 UTC
Permalink
Post by Daniel Jeliński
I propose a change implemented in
https://github.com/libssh2/libssh2/pull/264, where sftp write acknowledges
data as soon as it is sent to the server, and checks server response when it
becomes available, which may happen during sftp close.
So will it not care for any ACKs? If you send a 10GB file and the first packet
is never acked? Maybe a limit for amount of outstanding un-acked data?
Post by Daniel Jeliński
I realize that this is a breaking change - many code examples do not even
check the result of sftp close, which means that a lot of code in the wild
would probably break if the patch was accepted in its current form.
Can we make users opt-in to this and if not, do like before?
--
/ daniel.haxx.se
Daniel Jeliński
2018-08-26 19:24:26 UTC
Permalink
Post by Daniel Stenberg
So will it not care for any ACKs? If you send a 10GB file and the first packet
is never acked? Maybe a limit for amount of outstanding un-acked data?
No, not like that; available acks are processed after every successful
send. We just don't wait for acks that are not available yet, at least
in nonblocking mode. I don't know how the code would behave in
blocking mode, I should probably check that.

Side note, I had a coding bug that caused just the behavior you
describe, and I sent 1GB file before checking for any acks. This
resulted in a dramatic slowdown when sftp_close was looking for its
ack in the unprocessed list of received packets.

Limiting the number of outstanding packets sounds reasonable; it would
protect us against evil ssh servers. However, I don't see an easy way
to figure out a number that would never limit our transfer rates. Need
to think it through.
Post by Daniel Stenberg
Can we make users opt-in to this and if not, do like before?
How would you suggest to implement that? I don't see any existing
mechanism that could be used for this; there's no libssh2_sftp_setopt,
no version argument in libssh2_sftp_init, no extra parameter to
libssh2_sftp_open that could be used for this.
I could implement this as a new set of functions duplicating the
existing functionality (like posix defines write and fwrite, I could
add libssh2_sftp_fwrite), if you think that's the right direction to
take.

_______________________________________________
libssh2-devel https://cool.haxx.se/cgi-bin/mailman/listinfo/libs
Will Cosgrove
2018-08-27 21:54:06 UTC
Permalink
Post by Daniel Jeliński
Post by Daniel Stenberg
So will it not care for any ACKs? If you send a 10GB file and the first packet
is never acked? Maybe a limit for amount of outstanding un-acked data?
No, not like that; available acks are processed after every successful
send. We just don't wait for acks that are not available yet, at least
in nonblocking mode. I don't know how the code would behave in
blocking mode, I should probably check that.
I’ve written custom pipelining code for upload/download using some of the libssh2 primitive functions. I also recommend having a max outstanding packet number, I’ve saturated lesser servers by flooding them with concurrent writes. Currently our value is 64 outstanding writes which, I think, mirrors OpenSSH.

The approach I took is slowly increasing the concurrent requests. I send an initial request and wait to see if it fails due of a write error or other transient error and then start sending concurrent requests if the first synchronous write was successful.

I then ramp the requests slowly by first sending 2 writes, drain the ready replies, then increase the outstanding requests by one and drain and so on until I reach the max of 64 outstanding requests. This gradual increase allows small files to upload without immediately doing a bunch of concurrent writes to the server, seems to work well in practice.

Hope this helps,
Will


_______________________________________________
libssh2-devel https://cool.haxx.se/cgi-bin/mailman/listinfo/lib
Peter Stuge
2018-08-28 10:08:31 UTC
Permalink
Post by Daniel Jeliński
Limiting the number of outstanding packets sounds reasonable
Can you use channel windowing in the transport protocol here? If not, why not?
Post by Daniel Jeliński
Post by Daniel Stenberg
Can we make users opt-in to this and if not, do like before?
How would you suggest to implement that? I don't see any existing
mechanism that could be used for this
Add a new API which would be used to enable the new behavior. The API
should probably not be too specific to this behavior, but should also
not be too generic.

Also consider whether a new API really belongs on SFTP level or perhaps
rather on channel level.
Post by Daniel Jeliński
I could implement this as a new set of functions duplicating the
existing functionality
No, don't do that.


//Peter
_______________________________________________
libssh2-devel https://
Daniel Jeliński
2018-08-28 11:57:28 UTC
Permalink
Post by Peter Stuge
Can you use channel windowing in the transport protocol here? If not, why not?
Channel windowing is used, and under normal circumstances should limit
the number of outstanding packets without further intervention.
However, an evil server could send window increases without sending
status responses, which would bloat our number of outstanding buffers,
as we only remove buffers from the list when we receive corresponding
status response. Is that what you're referring to?
Post by Peter Stuge
Add a new API which would be used to enable the new behavior. The API
should probably not be too specific to this behavior, but should also
not be too generic.
Something like:
typedef enum {
SFTPOPT_USESENDBUFFER = 1
} SftpOption;

libssh2_sftp_setopt(LIBSSH2_SFTP_HANDLE *handle, SftpOption option, ...);

used like:
libssh2_sftp_setopt(handle, SFTPOPT_USESENDBUFFER , 1L) to enable the
new behavior?
Post by Peter Stuge
Also consider whether a new API really belongs on SFTP level or perhaps
rather on channel level.
Wouldn't we want to enable this behavior on every file handle independently?

Daniel

_______________________________________________
libssh2-devel https://cool.haxx.se/cgi-bin/mailman/listinfo

Loading...