Discussion:
Debugging SCP failure: "Timed out waiting for SCP response"
Tim Martin
2010-10-11 07:51:18 UTC
Permalink
Hi,

I am trying to transfer a file from an SSH server using libssh2. The SSH server in question is running under RouterOS (http://www.mikrotik.com) and seems to be misbehaving, but I don't know enough to narrow the problem down further.

The client can connect to the server and authenticate OK. I can also execute commands on the remote shell OK. However, when I try and call libssh2_scp_recv() I always get a null result. After the failure, libssh2_session_lasterror returns an empty string, and libssh2_session_lasterrno returns zero.

Test code and debug logs are available here:

http://gist.github.com/620189

The most noticeable log line is

[libssh2] 1.001111 Failure Event: -28 - Timed out waiting for SCP response

I believe on one occasion I got this error message returned by libssh2_session_lasterror() but I can't reproduce that. I don't know what the timeout should be, but it certainly isn't waiting very long before timing out (a fraction of a second at most).

I'd be grateful for any pointers as to how to proceed with this. I don't mind dipping into the libssh2 code, but I don't have the time to grok it at the moment. I'm fairly sure the problem ultimately lies on the RouterOS side, but I don't want to give up if there's a simple workaround.

Tim
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Uli Zappe
2010-10-11 09:49:33 UTC
Permalink
Post by Tim Martin
The client can connect to the server and authenticate OK. I can also execute commands on the remote shell OK. However, when I try and call libssh2_scp_recv() I always get a null result.
[...]
The most noticeable log line is
[libssh2] 1.001111 Failure Event: -28 - Timed out waiting for SCP response
[...]
I'd be grateful for any pointers as to how to proceed with this. I don't mind dipping into the libssh2 code, but I don't have the time to grok it at the moment. I'm fairly sure the problem ultimately lies on the RouterOS side
Could it possibly be (due to an error on the RouterOS side or whatever) that the file you're trying to receive has a size of 0 Byte? I encountered the issue that in this case libssh2 (at least libssh2-1.2.7) immediately returns but logs a timeout error. (But Daniel Stenberg could not reproduce this with the current github build.)
Post by Tim Martin
I don't know what the timeout should be, but it certainly isn't waiting very long before timing out (a fraction of a second at most).
The (fixed) timeout of libssh2 is 60 seconds. You should never get a timeout error message before this timespan, if I understand this correctly.

Bye
Uli
________________________________________________________

Uli Zappe, Solmsstraße 5, D-65189 Wiesbaden, Germany
http://www.ritual.org
Fon: +49-700-ULIZAPPE
Fax: +49-700-ZAPPEFAX
________________________________________________________

_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Tim Martin
2010-10-11 09:58:39 UTC
Permalink
Post by Tim Martin
Post by Tim Martin
The client can connect to the server and authenticate OK. I can also
execute commands on the remote shell OK. However, when I try and call
libssh2_scp_recv() I always get a null result.
Post by Tim Martin
[...]
The most noticeable log line is
[libssh2] 1.001111 Failure Event: -28 - Timed out waiting for SCP
response
Could it possibly be (due to an error on the RouterOS side or
whatever) that the file you're trying to receive has a size of 0 Byte?
I don't think so. If I do

$ scp admin-Q0ErXNX1RubBXf62G/***@public.gmane.org:firewall.rsc ./

then it successfully downloads as a 27Kb file.

Thanks for the suggestions and info on timeouts,

Tim
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Daniel Stenberg
2010-10-11 12:27:06 UTC
Permalink
Post by Tim Martin
[libssh2] 1.001111 Failure Event: -28 - Timed out waiting for SCP response
Okay, the human readable text there is not really helpful and even downright
wrong. (I'll change that in a minute.)

The -28 is however more interesting. -28 equals LIBSSH2_ERROR_SCP_PROTOCOL,
and if you check where that pair (error code + text) is return you find two
spots in src/scp.c and both returns due to an error in
_libssh2_channel_read().

I would suggest you start by checking which of those two that fails and what
the actual return code is from _libssh2_channel_read() in that case.

With that info at hand, we can dig deeper...
--
/ daniel.haxx.se
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Tim Martin
2010-10-11 12:51:58 UTC
Permalink
Post by Tim Martin
Post by Tim Martin
[libssh2] 1.001111 Failure Event: -28 - Timed out waiting for SCP
response
Okay, the human readable text there is not really helpful and even
downright
wrong. (I'll change that in a minute.)
The -28 is however more interesting. -28 equals
LIBSSH2_ERROR_SCP_PROTOCOL,
and if you check where that pair (error code + text) is return you
find two
spots in src/scp.c and both returns due to an error in
_libssh2_channel_read().
I would suggest you start by checking which of those two that fails
and what
the actual return code is from _libssh2_channel_read() in that case.
The call that is triggering the error is the one on line 385, and the return code from _libssh2_channel_read() is zero. I'll do a little more digging myself to see if I can trace any further what's happening immediately before this, but I don't konw if this makes any sense on its own?

Tim
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Daniel Stenberg
2010-10-11 13:04:27 UTC
Permalink
Post by Tim Martin
The call that is triggering the error is the one on line 385, and the return
code from _libssh2_channel_read() is zero. I'll do a little more digging
myself to see if I can trace any further what's happening immediately before
this, but I don't konw if this makes any sense on its own?
I was just reading that code and returning error due to 0 there is a bug. It
just means that _libssh2_channel_read() didn't return any payload data in that
call.

Try changing the error codition to (rc < 0 and add a

else if(rc == 0)
continue;

... and see if that makes anything different. I think that's more appropriate
and I'm working in my end to try out such a modfication.

The second place has the same flaw but is harder to fix...
--
/ daniel.haxx.se
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Daniel Stenberg
2010-10-11 13:10:41 UTC
Permalink
Post by Daniel Stenberg
The second place has the same flaw but is harder to fix...
Sorry, I confused myself. The second spot should be possible to fix the same
way, I mixed it up with another piece of code.
--
/ daniel.haxx.se
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Uli Zappe
2010-10-11 13:14:39 UTC
Permalink
Post by Daniel Stenberg
Post by Tim Martin
The call that is triggering the error is the one on line 385, and the return code from _libssh2_channel_read() is zero. I'll do a little more digging myself to see if I can trace any further what's happening immediately before this, but I don't konw if this makes any sense on its own?
I was just reading that code and returning error due to 0 there is a bug.
Hm, but isn't this exactly the issue I reported in "BUG: libssh2_channel_read() does not handle files with size = 0 correctly" where you replied that the issue does not occur anymore in current builds?

Bye
Uli
________________________________________________________

Uli Zappe, Solmsstraße 5, D-65189 Wiesbaden, Germany
http://www.ritual.org
Fon: +49-700-ULIZAPPE
Fax: +49-700-ZAPPEFAX
________________________________________________________

_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Uli Zappe
2010-10-11 13:17:49 UTC
Permalink
Post by Uli Zappe
Hm, but isn't this exactly the issue I reported in "BUG: libssh2_channel_read() does not handle files with size = 0 correctly" where you replied that the issue does not occur anymore in current builds?
Uhm, forget that - I'm afraid I mixed something up.

Bye
Uli
________________________________________________________

Uli Zappe, Solmsstraße 5, D-65189 Wiesbaden, Germany
http://www.ritual.org
Fon: +49-700-ULIZAPPE
Fax: +49-700-ZAPPEFAX
________________________________________________________

_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Tim Martin
2010-10-11 13:19:48 UTC
Permalink
Post by Tim Martin
Post by Tim Martin
The call that is triggering the error is the one on line 385, and
the return
Post by Tim Martin
code from _libssh2_channel_read() is zero. I'll do a little more
digging
Post by Tim Martin
myself to see if I can trace any further what's happening
immediately before
Post by Tim Martin
this, but I don't konw if this makes any sense on its own?
I was just reading that code and returning error due to 0 there is a bug. It
just means that _libssh2_channel_read() didn't return any payload data in that
call.
Try changing the error codition to (rc < 0 and add a
else if(rc == 0)
continue;
... and see if that makes anything different. I think that's more appropriate
and I'm working in my end to try out such a modfication.
If I do that, I get what looks like an infinite loop (I gave up after 60 seconds).

All I've been able to figure out so far is that within _libssh2_channel_read it gets a NULL pointer from the call to _libssh2_list_first on line ~1708. Changing the condition as described above just causes it to hit that line again and again, getting NULL each time. I'm assuming it's to do with the call to _libssh2_transport_read() above not getting any data.

Tim
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Daniel Stenberg
2010-10-11 13:32:09 UTC
Permalink
Post by Tim Martin
Post by Daniel Stenberg
Try changing the error codition to (rc < 0 and add a
else if(rc == 0)
continue;
If I do that, I get what looks like an infinite loop (I gave up after 60 seconds).
Hm, it shouldn't get into an infinite loop. _libssh2_channel_read() should
return EAGAIN if called again without there being any data to return.
Post by Tim Martin
All I've been able to figure out so far is that within _libssh2_channel_read
it gets a NULL pointer from the call to _libssh2_list_first on line ~1708.
Changing the condition as described above just causes it to hit that line
again and again, getting NULL each time. I'm assuming it's to do with the
call to _libssh2_transport_read() above not getting any data.
Right, that code reads data off the socket, puts packets in queues for each
channel and the _libssh2_list_first() then returns the first packet for your
channel and when that is NULL there's nothing there to take care of.

In your log you posted there's an interesting EOF just before the -28:

[libssh2] 1.001004 Transport: Packet type 96 received, length=5
[libssh2] 1.001055 Conn: EOF received for channel 0/0
[libssh2] 1.001111 Failure Event: -28 - Timed out waiting for SCP response
[libssh2] 1.001164 Conn: Freeing channel 0/0 resources

... something makes the remote end close the channel!
--
/ daniel.haxx.se
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Tim Martin
2010-10-11 14:01:09 UTC
Permalink
Post by Tim Martin
Post by Tim Martin
Post by Daniel Stenberg
Try changing the error codition to (rc < 0 and add a
else if(rc == 0)
continue;
If I do that, I get what looks like an infinite loop (I gave up
after 60
Post by Tim Martin
seconds).
Hm, it shouldn't get into an infinite loop. _libssh2_channel_read()
should
return EAGAIN if called again without there being any data to return.
Post by Tim Martin
All I've been able to figure out so far is that within
_libssh2_channel_read
Post by Tim Martin
it gets a NULL pointer from the call to _libssh2_list_first on line
~1708.
Post by Tim Martin
Changing the condition as described above just causes it to hit that
line
Post by Tim Martin
again and again, getting NULL each time. I'm assuming it's to do
with the
Post by Tim Martin
call to _libssh2_transport_read() above not getting any data.
Right, that code reads data off the socket, puts packets in queues for
each
channel and the _libssh2_list_first() then returns the first packet
for your
channel and when that is NULL there's nothing there to take care of.
Just to summarise what I've found, the chain of return codes is:

_libssh2_recv() returns EAGAIN
_libssh2_transport_read() returns LIBSSH2_ERROR_EAGAIN
_libssh2_channel_read() returns 0
Post by Tim Martin
In your log you posted there's an interesting EOF just before the
[libssh2] 1.001004 Transport: Packet type 96 received, length=5
[libssh2] 1.001055 Conn: EOF received for channel 0/0
[libssh2] 1.001111 Failure Event: -28 - Timed out waiting for SCP
response
[libssh2] 1.001164 Conn: Freeing channel 0/0 resources
... something makes the remote end close the channel!
FWIW, the backtrace to the point where it logs "EOF received" is

#0 _libssh2_packet_add (session=0x804b008, data=0x8054288 "`", datalen=5, macstate=0) at packet.c:720
#1 0x0014f3ae in fullpacket (session=0x804b008, encrypted=1) at transport.c:253
#2 0x0014fc2c in _libssh2_transport_read (session=0x804b008) at transport.c:587
#3 0x00134fda in _libssh2_channel_read (channel=0x8052fd8, stream_id=0, buf=0x804f664 "", buflen=1) at channel.c:1693
#4 0x00141df1 in scp_recv (session=0x804b008, path=0x8048e18 "firewall.rsc", sb=0xbffff38c) at scp.c:385
#5 0x00142b03 in libssh2_scp_recv (session=0x804b008, path=0x8048e18 "firewall.rsc", sb=0xbffff38c) at scp.c:777
#6 0x08048b3b in main (argc=2, argv=0xbffff4c4) at test.c:74

Are we likely to be able to get much further with this without being able to peer inside the server code? The RouterOS guys are not unhelpful, but it's closed source and since this bug has been fixed in the (yet to be released) new version, I can't imagine I'd get a lot of support from them.

If it seems to be a server error with no simple workaround I'll probably have to give up on it. At least I know where I stand.

Tim
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Daniel Stenberg
2010-10-11 14:08:55 UTC
Permalink
Post by Tim Martin
Are we likely to be able to get much further with this without being able to
peer inside the server code? The RouterOS guys are not unhelpful, but it's
closed source and since this bug has been fixed in the (yet to be released)
new version, I can't imagine I'd get a lot of support from them.
If it seems to be a server error with no simple workaround I'll probably
have to give up on it. At least I know where I stand.
Right, I agree with that. It looks like the server closes the channel at this
early point - for some reason that certainly isn't clear to me.

So no, I can't think of any easy work-around off the top of my head.
--
/ daniel.haxx.se
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Loading...