diff mbox

PROBLEM: network data corruption (bisected to e5a4b0bb803b)

Message ID 20160724190237.GP2356@ZenIV.linux.org.uk
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Al Viro July 24, 2016, 7:02 p.m. UTC
On Sun, Jul 24, 2016 at 07:45:13PM +0200, Christian Lamparter wrote:

> > The symptom is that downloaded files (http, ftp, and probably other
> > protocols) have small corrupted segments (about 1-2 kilobytes long) in
> > random locations. Only downloads that sustain a high speed for at least a
> > few seconds are corrupted. Anything small enough to be received in less
> > than about 5 seconds is not affected.

Can that sucker be reproduced with netcat?  That would eliminate all issues
with multi-iovec recvmsg(2), narrowing the things down quite bit.

Another thing (and if that works, it's *NOT* a proper fix - it would be
papering over the problem, but at least it would show where to look for
it) - try (on top of mainline) the following delta:

Comments

Alan Curry July 26, 2016, 4:57 a.m. UTC | #1
Al Viro wrote:
> On Sun, Jul 24, 2016 at 07:45:13PM +0200, Christian Lamparter wrote:
> 
> > > The symptom is that downloaded files (http, ftp, and probably other
> > > protocols) have small corrupted segments (about 1-2 kilobytes long) in
> > > random locations. Only downloads that sustain a high speed for at least a
> > > few seconds are corrupted. Anything small enough to be received in less
> > > than about 5 seconds is not affected.
> 
> Can that sucker be reproduced with netcat?  That would eliminate all issues
> with multi-iovec recvmsg(2), narrowing the things down quite bit.

netcat seems to be immune. Comparing strace results, I didn't see any
recvmsg() calls in the other programs that have had the problem, but there
is an interesting difference: netcat calls select() to wait for the socket
to be ready for reading, where my other test programs just call read() and
let it block until ready.

So I wrote a small test program to isolate that difference. It downloads
a file using only read() and write() and a hardcoded HTTP request. It has
a select mode (main loop alternates read() and select() on the TCP socket)
and a noselect mode (main loop just read()s the TCP socket).

The program is included at the bottom of this message.

I ran it several times in both modes and got corruption if and only if the
noselect mode was used.

> 
> Another thing (and if that works, it's *NOT* a proper fix - it would be
> papering over the problem, but at least it would show where to look for
> it) - try (on top of mainline) the following delta:
> 
> diff --git a/net/core/datagram.c b/net/core/datagram.c

Will try that patch soon. Meanwhile, here's my test:

/* Demonstration program "dlbug".
   Usage: dlbug select > outfile
          or
          dlbug noselect > outfile
   outfile will contain the full HTTP response. Edit out the HTTP headers
   and what's left should be a valid gzip if the download worked. */

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <sys/select.h>

int main(int argc, char **argv)
{
  const char *request =
    "GET /debian/dists/stable/main/Contents-amd64.gz HTTP/1.0\r\n"
    "Host: ftp.us.debian.org\r\n"
    "\r\n";
  ssize_t request_len = strlen(request), w, r, copied;
  struct addrinfo hints, *host;
  int sock, err, doselect;
  char buf[10240];

  if(argc!=2 || (!strcmp(argv[1], "select") && !strcmp(argv[1], "noselect"))) {
    fprintf(stderr, "Usage: %s {select|noselect}\n", argv[0]);
    return 1;
  }

  doselect = !strcmp(argv[1], "select");

  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;

  err = getaddrinfo("ftp.us.debian.org", 0, &hints, &host);
  if(err) {
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
    return 1;
  }

  sock = socket(host->ai_family, host->ai_socktype, host->ai_protocol);
  if(sock < 0) {
    perror("socket");
    return 1;
  }

  ((struct sockaddr_in *)host->ai_addr)->sin_port = htons(80);

  if(connect(sock, host->ai_addr, host->ai_addrlen) < 0) {
    perror("connect");
    return 1;
  }

  while(request_len) {
    w = write(sock, request, request_len);
    if(w < 0) {
      perror("write to socket");
      return 1;
    }
    request += w;
    request_len -= w;
  }

  while((r = read(sock, buf, sizeof buf))) {
    if(r < 0) {
      perror("read from socket");
      return 1;
    }

    copied = 0;
    while(copied < r) {
      w = write(1, buf+copied, r-copied);
      if(w < 0) {
        perror("write to stdout");
        return 1;
      }
      copied += w;
    }

    if(doselect) {
      fd_set rfds;
      FD_ZERO(&rfds);
      FD_SET(sock, &rfds);
      select(sock+1, &rfds, 0, 0, 0);
    }
  }

  return 0;
}
Christian Lamparter July 26, 2016, 1:59 p.m. UTC | #2
On Tuesday, July 26, 2016 4:57:03 AM CEST Alan Curry wrote:
> Al Viro wrote:
> > On Sun, Jul 24, 2016 at 07:45:13PM +0200, Christian Lamparter wrote:
> > 
> > > > The symptom is that downloaded files (http, ftp, and probably other
> > > > protocols) have small corrupted segments (about 1-2 kilobytes long) in
> > > > random locations. Only downloads that sustain a high speed for at least a
> > > > few seconds are corrupted. Anything small enough to be received in less
> > > > than about 5 seconds is not affected.
> > 
> > Can that sucker be reproduced with netcat?  That would eliminate all issues
> > with multi-iovec recvmsg(2), narrowing the things down quite bit.
> 
> netcat seems to be immune. Comparing strace results, I didn't see any
> recvmsg() calls in the other programs that have had the problem, but there
> is an interesting difference: netcat calls select() to wait for the socket
> to be ready for reading, where my other test programs just call read() and
> let it block until ready.
> 
> So I wrote a small test program to isolate that difference. It downloads
> a file using only read() and write() and a hardcoded HTTP request. It has
> a select mode (main loop alternates read() and select() on the TCP socket)
> and a noselect mode (main loop just read()s the TCP socket).
> 
> The program is included at the bottom of this message.
> 
> I ran it several times in both modes and got corruption if and only if the
> noselect mode was used.
> 
> > 
> > Another thing (and if that works, it's *NOT* a proper fix - it would be
> > papering over the problem, but at least it would show where to look for
> > it) - try (on top of mainline) the following delta:
> > 
> > diff --git a/net/core/datagram.c b/net/core/datagram.c
> 
> Will try that patch soon. Meanwhile, here's my test:
> 
> /* Demonstration program "dlbug".
>    Usage: dlbug select > outfile
>           or
>           dlbug noselect > outfile
>    outfile will contain the full HTTP response. Edit out the HTTP headers
>    and what's left should be a valid gzip if the download worked. */
> [...]
Thanks, I gave the program a try with my WNDA3100 and a WN821N v2 devices.
I did not see any corruptions in any of the tests though. Can you tell me
something about your wireless network too? I would like to know what router
and firmware are you using? Also important: what's your wireless configuration?
(WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)

Probably the quickest and easiest way to get that information is by running
the following commands as root, when you are connected to your wifi network
and post the results:
# iw dev wlan0 link
# iw dev wlan0 scan dump

(You can of course remove your device's MACs, but please do it consistently).

Regards,
Christian
alexmcwhirter@triadic.us July 26, 2016, 6:15 p.m. UTC | #3
On 2016-07-26 09:59, Christian Lamparter wrote:
> Thanks, I gave the program a try with my WNDA3100 and a WN821N v2 
> devices.
> I did not see any corruptions in any of the tests though. Can you tell 
> me
> something about your wireless network too? I would like to know what 
> router
> and firmware are you using? Also important: what's your wireless 
> configuration?
> (WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)
> 
> Probably the quickest and easiest way to get that information is by 
> running
> the following commands as root, when you are connected to your wifi 
> network
> and post the results:
> # iw dev wlan0 link
> # iw dev wlan0 scan dump
> 
> (You can of course remove your device's MACs, but please do it 
> consistently).
> 
> Regards,
> Christian

I wonder if this is something that is only affecting certain NIC's? For 
example, it see this issue on sun4u boxes with tigon3 and hme 
interfaces. But on my sun4v boxes that have e1000 interfaces everything 
works fine.
Alan Curry July 27, 2016, 1:14 a.m. UTC | #4
Christian Lamparter wrote:
> Thanks, I gave the program a try with my WNDA3100 and a WN821N v2 devices.
> I did not see any corruptions in any of the tests though. Can you tell me
> something about your wireless network too? I would like to know what router
> and firmware are you using? Also important: what's your wireless configuration?

The router/access-point is a Comcast-issued Technicolor cable modem, model
TC8305C. The only thing I can find on it that looks like it might identify
a firmware version is this:

    System Software Version

    eMTA & DOCSIS Software Version: 01.E6.01.22.25
    Packet Cable: 2.0

I assume Comcast pushes firmware updates whenever they feel like it.

There is possibly another clue. I get this message from the kernel sometimes:

    ieee80211 phy0: invalid plcp cck rate (0).

I had this message appearing long before the data corruption bug started.
It never correlated with any actual problems, so I turned down the priority
level of the message to get it off the console, and forgot about it. I
was unable to discover what a "plcp" or "cck" is so the message means
nothing to me.

> (WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)
> 
> Probably the quickest and easiest way to get that information is by running
> the following commands as root, when you are connected to your wifi network
> and post the results:
> # iw dev wlan0 link
> # iw dev wlan0 scan dump

Connected to cc:03:fa:bf:e9:ea (on wlan0)
	SSID: HOME-E9EA
	freq: 2462
	RX: 20726719 bytes (106483 packets)
	TX: 5902478 bytes (44707 packets)
	signal: -43 dBm
	tx bitrate: 54.0 MBit/s

	bss flags:	short-slot-time
	dtim period:	1
	beacon int:	100

BSS cc:03:fa:bf:e9:ea(on wlan0) -- associated
	TSF: 236407205748 usec (2d, 17:40:07)
	freq: 2462
	beacon interval: 100 TUs
	capability: ESS Privacy ShortSlotTime (0x0411)
	signal: -33.00 dBm
	last seen: 634452 ms ago
	Information elements from Probe Response frame:
	SSID: HOME-E9EA
	Supported rates: 1.0* 2.0* 5.5* 11.0* 18.0 24.0* 36.0 54.0 
	DS Parameter set: channel 11
	ERP: <no flags>
	ERP D4.0: <no flags>
	RSN:	 * Version: 1
		 * Group cipher: TKIP
		 * Pairwise ciphers: CCMP TKIP
		 * Authentication suites: PSK
		 * Capabilities: 16-PTKSA-RC 1-GTKSA-RC (0x000c)
	Extended supported rates: 6.0* 9.0 12.0* 48.0 
	HT capabilities:
		Capabilities: 0x18bd
			RX LDPC
			HT20
			SM Power Save disabled
			RX Greenfield
			RX HT20 SGI
			TX STBC
			No RX STBC
			Max AMSDU length: 7935 bytes
			DSSS/CCK HT40
		Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
		Minimum RX AMPDU time spacing: 8 usec (0x06)
		HT RX MCS rate indexes supported: 0-23
		HT TX MCS rate indexes are undefined
	HT operation:
		 * primary channel: 11
		 * secondary channel offset: no secondary
		 * STA channel width: 20 MHz
		 * RIFS: 1
		 * HT protection: nonmember
		 * non-GF present: 1
		 * OBSS non-GF present: 1
		 * dual beacon: 0
		 * dual CTS protection: 0
		 * STBC beacon: 0
		 * L-SIG TXOP Prot: 0
		 * PCO active: 0
		 * PCO phase: 0
	WPS:	 * Version: 1.0
		 * Wi-Fi Protected Setup State: 2 (Configured)
		 * Response Type: 3 (AP)
		 * UUID: 6d1b1911-14a9-391c-cdee-89850a5aa1ef
		 * Manufacturer: Technicolor
		 * Model: Technicolor
		 * Model Number: 123456
		 * Serial Number: 0000001
		 * Primary Device Type: 6-0050f204-1
		 * Device name: TechnicolorAP
		 * Config methods: Display
		 * RF Bands: 0x1
		 * Unknown TLV (0x1049, 6 bytes): 00 37 2a 00 01 20
	WPA:	 * Version: 1
		 * Group cipher: TKIP
		 * Pairwise ciphers: CCMP TKIP
		 * Authentication suites: PSK
		 * Capabilities: 16-PTKSA-RC 1-GTKSA-RC (0x000c)
	WMM:	 * Parameter version 1
		 * u-APSD
		 * BE: CW 15-1023, AIFSN 3
		 * BK: CW 15-1023, AIFSN 7
		 * VI: CW 7-15, AIFSN 2, TXOP 3008 usec
		 * VO: CW 3-7, AIFSN 2, TXOP 1504 usec
Kalle Valo July 27, 2016, 6:39 a.m. UTC | #5
alexmcwhirter@triadic.us writes:

> On 2016-07-26 09:59, Christian Lamparter wrote:
>> Thanks, I gave the program a try with my WNDA3100 and a WN821N v2
>> devices.
>> I did not see any corruptions in any of the tests though. Can you
>> tell me
>> something about your wireless network too? I would like to know what
>> router
>> and firmware are you using? Also important: what's your wireless
>> configuration?
>> (WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)
>>
>> Probably the quickest and easiest way to get that information is by
>> running
>> the following commands as root, when you are connected to your wifi
>> network
>> and post the results:
>> # iw dev wlan0 link
>> # iw dev wlan0 scan dump
>>
>> (You can of course remove your device's MACs, but please do it
>> consistently).
>>
>> Regards,
>> Christian
>
> I wonder if this is something that is only affecting certain NIC's?
> For example, it see this issue on sun4u boxes with tigon3 and hme
> interfaces. But on my sun4v boxes that have e1000 interfaces
> everything works fine.

Kernel configuration might also make a difference. Trying to find an
Kconfig option which affects this could be useful.
Alan Curry July 27, 2016, 10:32 a.m. UTC | #6
Al Viro wrote:
> 
> Another thing (and if that works, it's *NOT* a proper fix - it would be
> papering over the problem, but at least it would show where to look for
> it) - try (on top of mainline) the following delta:

I tried it on top of v4.6.4 and on top of the very recent v4.7-2509-g59ebc44
from Linus, and still got corruption.

> 
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index b7de71f..0ee5995 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -734,7 +734,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
>  	if (!chunk)
>  		return 0;
>  
> -	if (msg_data_left(msg) < chunk) {
> +	if (iov_iter_single_seg_count(&msg->msg_iter) < chunk) {
>  		if (__skb_checksum_complete(skb))
>  			goto csum_error;
>  		if (skb_copy_datagram_msg(skb, hlen, msg, chunk))
>
alexmcwhirter@triadic.us July 27, 2016, 6:04 p.m. UTC | #7
Just to add some more information to this, the corruption seems to 
effect ssh as well.

Using a sun hme interface, occasionally upon an ssh connection it will 
refuse to authenticate a client with either password or cert 
authentication. Using wireshark to capture and decrypt the packets 
between the two machines, the data coming from the server seems good, 
but the data received by the server from the client is essentially 
garbage. Note that the client is sending valid data, but the server is 
corrupting it upon receipt. Closing the connection and starting a new 
one will remedy the login issue, but you do occasionally see corruption 
on the server side sporadically.

So far this only seems to occur on incoming data, outgoing data seems 
fine.
alexmcwhirter@triadic.us July 27, 2016, 11:02 p.m. UTC | #8
On 2016-07-27 14:04, alexmcwhirter@triadic.us wrote:
> Just to add some more information to this, the corruption seems to
> effect ssh as well.
> 
> Using a sun hme interface, occasionally upon an ssh connection it will
> refuse to authenticate a client with either password or cert
> authentication. Using wireshark to capture and decrypt the packets
> between the two machines, the data coming from the server seems good,
> but the data received by the server from the client is essentially
> garbage. Note that the client is sending valid data, but the server is
> corrupting it upon receipt. Closing the connection and starting a new
> one will remedy the login issue, but you do occasionally see
> corruption on the server side sporadically.
> 
> So far this only seems to occur on incoming data, outgoing data seems 
> fine.

Also, there is another patch the references this commit on sparc64 at 
least.

https://patchwork.kernel.org/patch/9221895/

I highly expect both my issue and OP's issue to revolve not around 
commit e5a4b0bb803b specifically, but around other code that no longer 
behaves as expected because of it.
David Miller July 27, 2016, 11:45 p.m. UTC | #9
From: alexmcwhirter@triadic.us
Date: Wed, 27 Jul 2016 19:02:40 -0400

> On 2016-07-27 14:04, alexmcwhirter@triadic.us wrote:
>> Just to add some more information to this, the corruption seems to
>> effect ssh as well.
>> Using a sun hme interface, occasionally upon an ssh connection it will
>> refuse to authenticate a client with either password or cert
>> authentication. Using wireshark to capture and decrypt the packets
>> between the two machines, the data coming from the server seems good,
>> but the data received by the server from the client is essentially
>> garbage. Note that the client is sending valid data, but the server is
>> corrupting it upon receipt. Closing the connection and starting a new
>> one will remedy the login issue, but you do occasionally see
>> corruption on the server side sporadically.
>> So far this only seems to occur on incoming data, outgoing data seems
>> fine.
> 
> Also, there is another patch the references this commit on sparc64 at
> least.
> 
> https://patchwork.kernel.org/patch/9221895/
> 
> I highly expect both my issue and OP's issue to revolve not around
> commit e5a4b0bb803b specifically, but around other code that no longer
> behaves as expected because of it.

Indeed, and that fault address rounding bug occurs two other times
in arch/sparc/lib/user_fixup.c

The mentioned patchwork patch should fix the bug and I'll get that
into my sparc tree, merged, and queued up for -stable ASAP.
alexmcwhirter@triadic.us July 28, 2016, 12:26 a.m. UTC | #10
On 2016-07-27 20:31, Al Viro wrote:
> On Wed, Jul 27, 2016 at 04:45:43PM -0700, David Miller wrote:
> 
>> > I highly expect both my issue and OP's issue to revolve not around
>> > commit e5a4b0bb803b specifically, but around other code that no longer
>> > behaves as expected because of it.
>> 
>> Indeed, and that fault address rounding bug occurs two other times
>> in arch/sparc/lib/user_fixup.c
>> 
>> The mentioned patchwork patch should fix the bug and I'll get that
>> into my sparc tree, merged, and queued up for -stable ASAP.
> 
> Plausible for sparc, but I don't see similar __copy_to_user_inatomic()
> bugs in case of x86_64...

I'm going to go ahead and say this is where my issue and the op's issue 
begin to branch apart from one another. He's seeing this on all incoming 
data, whereas i am only seeing it on ssl data and not on sun4v.

At this point i would say data from my issue is only going to cloud this 
issue as they seem to be two completely different issues revolving 
around the same commit. If i come across any relevant data for x86_64 
ill be sure to post it if this isn't resolved by then, but for now i'm 
going to refrain from submitting anything sparc related.
Al Viro July 28, 2016, 12:31 a.m. UTC | #11
On Wed, Jul 27, 2016 at 04:45:43PM -0700, David Miller wrote:

> > I highly expect both my issue and OP's issue to revolve not around
> > commit e5a4b0bb803b specifically, but around other code that no longer
> > behaves as expected because of it.
> 
> Indeed, and that fault address rounding bug occurs two other times
> in arch/sparc/lib/user_fixup.c
> 
> The mentioned patchwork patch should fix the bug and I'll get that
> into my sparc tree, merged, and queued up for -stable ASAP.

Plausible for sparc, but I don't see similar __copy_to_user_inatomic()
bugs in case of x86_64...
Al Viro July 28, 2016, 1:22 a.m. UTC | #12
On Wed, Jul 27, 2016 at 08:26:48PM -0400, alexmcwhirter@triadic.us wrote:

> I'm going to go ahead and say this is where my issue and the op's issue
> begin to branch apart from one another. He's seeing this on all incoming
> data, whereas i am only seeing it on ssl data and not on sun4v.
> 
> At this point i would say data from my issue is only going to cloud this
> issue as they seem to be two completely different issues revolving around
> the same commit. If i come across any relevant data for x86_64 ill be sure
> to post it if this isn't resolved by then, but for now i'm going to refrain
> from submitting anything sparc related.

Which just might mean that we have *three* issues here -
	(1) buggered __copy_to_user_inatomic() (and friends) on some sparcs
	(2) your ssl-only corruption
	(3) Alan's x86_64 corruption on plain TCP read - no ssl *or* sparc
anywhere, and no multi-segment recvmsg().  Which would strongly argue in
favour of some kind of copy_page_to_iter() breakage triggered when handling
a fragmented skb, as in (1).  Except that I don't see anything similar in
x86_64 uaccess primitives...
diff mbox

Patch

diff --git a/net/core/datagram.c b/net/core/datagram.c
index b7de71f..0ee5995 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -734,7 +734,7 @@  int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 	if (!chunk)
 		return 0;
 
-	if (msg_data_left(msg) < chunk) {
+	if (iov_iter_single_seg_count(&msg->msg_iter) < chunk) {
 		if (__skb_checksum_complete(skb))
 			goto csum_error;
 		if (skb_copy_datagram_msg(skb, hlen, msg, chunk))