ipv6: udp packets following an UFO enqueued packet need also be handled by UFO

Message ID 20131001232534.GM10771@order.stressinduktion.org
State RFC, archived
Delegated to: David Miller

Commit Message

Hannes Frederic Sowa Oct. 1, 2013, 11:25 p.m. UTC
On Tue, Oct 01, 2013 at 11:47:21PM +0200, Hannes Frederic Sowa wrote:
> The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
> get an oops.

This is incorrect, it just depends on the size of the writes and on the
interface mtu.

> IPv4 seems to work without problems, too.

I also get kernel oopses from IPv4 now, too.


So, skb_is_gso is not accurate enough in the output and we have to check if we
already started to append to skb frags. The following diff does resolve this
issue in both the IPv4 and IPv6 non-page-append output path but I am not
confident if it is correct:


Greetings,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jiri Pirko Oct. 2, 2013, 8:58 a.m. UTC | #1
Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
>On Tue, Oct 01, 2013 at 11:47:21PM +0200, Hannes Frederic Sowa wrote:
>> The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
>> get an oops.
>
>This is incorrect, it just depends on the size of the writes and on the
>interface mtu.
>
>> IPv4 seems to work without problems, too.
>
>I also get kernel oopses from IPv4 now, too.
>
>
>So, skb_is_gso is not accurate enough in the output and we have to check if we
>already started to append to skb frags. The following diff does resolve this
>issue in both the IPv4 and IPv6 non-page-append output path but I am not
>confident if it is correct:
>
>diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>index 6d56840..3565450 100644
>--- a/include/linux/skbuff.h
>+++ b/include/linux/skbuff.h
>@@ -1308,6 +1308,11 @@ static inline int skb_pagelen(const struct sk_buff *skb)
> 	return len + skb_headlen(skb);
> }
> 
>+static inline bool skb_has_frags(const struct sk_buff *skb)
>+{
>+	return skb_shinfo(skb)->nr_frags;
>+}
>+
> /**
>  * __skb_fill_page_desc - initialise a paged fragment in an skb
>  * @skb: buffer containing fragment to be initialised
>diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>index 7d8357b..8dc3d8d 100644
>--- a/net/ipv4/ip_output.c
>+++ b/net/ipv4/ip_output.c
>@@ -836,7 +836,7 @@ static int __ip_append_data(struct sock *sk,
> 		csummode = CHECKSUM_PARTIAL;
> 
> 	cork->length += length;
>-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
>+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> 	    (sk->sk_protocol == IPPROTO_UDP) &&
> 	    (rt->dst.dev->features & NETIF_F_UFO) && !rt->dst.header_len) {
> 		err = ip_ufo_append_data(sk, queue, getfrag, from, length,
>diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>index a54c45c..ded4f6f 100644
>--- a/net/ipv6/ip6_output.c
>+++ b/net/ipv6/ip6_output.c
>@@ -1227,7 +1227,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
> 	skb = skb_peek_tail(&sk->sk_write_queue);
> 	cork->length += length;
> 	if (((length > mtu) ||
>-	     (skb && skb_is_gso(skb))) &&
>+	     (skb && skb_has_frags(skb))) &&
> 	    (sk->sk_protocol == IPPROTO_UDP) &&
> 	    (rt->dst.dev->features & NETIF_F_UFO)) {
> 		err = ip6_ufo_append_data(sk, getfrag, from, length,
>
>Greetings,
>
>  Hannes
>


This seems correct to me. skb_is_gso would work as well if you apply my
patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
well", which does the setting of gso_size.

Jiri
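
To make the difference between the two predicates concrete, here is a minimal user-space sketch. It is not kernel code: `struct shinfo_model` and the `model_*` functions are hypothetical stand-ins for `skb_shinfo(skb)`, `skb_is_gso()` and the proposed `skb_has_frags()`. It only models the state described above, a queued skb that already carries page fragments but whose gso_size was never set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two skb_shared_info fields discussed here. */
struct shinfo_model {
	unsigned short gso_size;	/* 0 means "not marked as GSO/UFO" */
	unsigned char nr_frags;		/* number of page fragments attached */
};

/* Models skb_is_gso(): true only when gso_size is non-zero. */
static bool model_is_gso(const struct shinfo_model *si)
{
	return si->gso_size != 0;
}

/* Models the proposed skb_has_frags(): true once frags were appended. */
static bool model_has_frags(const struct shinfo_model *si)
{
	return si->nr_frags != 0;
}

/*
 * Models only the first half of the patched condition:
 * (length > mtu) || (skb && pred(skb)).
 * The protocol and NETIF_F_UFO checks are left out on purpose.
 */
static bool model_wants_ufo(size_t length, size_t mtu,
			    const struct shinfo_model *tail,
			    bool (*pred)(const struct shinfo_model *))
{
	return length > mtu || (tail && pred(tail));
}
```

With a tail skb of `{ .gso_size = 0, .nr_frags = 3 }`, a 1-byte write at MTU 1280 is routed away from the UFO path by the `model_is_gso` predicate and into it by `model_has_frags`, which is the disagreement the diff addresses.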
Jiri Pirko Oct. 2, 2013, 10:33 a.m. UTC | #2
Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
>On Tue, Oct 01, 2013 at 11:47:21PM +0200, Hannes Frederic Sowa wrote:
>> The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
>> get an oops.
>
>This is incorrect, it just depends on the size of the writes and on the
>interface mtu.
>
>> IPv4 seems to work without problems, too.
>
>I also get kernel oopses from IPv4 now, too.

I'm not able to trigger this with ipv4. Can you please send strace
output for this as well?

Thanks


Eric Dumazet Oct. 2, 2013, 10:41 a.m. UTC | #3
On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&

> 
> This seems correct to me. sk_is_gso would work as well is you apply my
> patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> well" which does the setting of gso_size.

Well, whether the skb has frags or not should not be a concern:
That's an allocation choice (let's say, to avoid high-order allocations).

Setting gso_size is probably better.
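
A rough user-space sketch of the "set gso_size" alternative follows. Everything in it is an assumption made for illustration: `struct shinfo_model`, the `model_*` helpers, the 40-byte IPv6 header and 8-byte fragment header, and the rounding to a multiple of 8 are modeled on how IPv6 fragments are sized, and the exact kernel expression may differ. The point is only that once the UFO append path marks a reused skb, the existing skb_is_gso()-style test becomes sufficient.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the skb_shared_info fields involved. */
struct shinfo_model {
	unsigned short gso_size;
	unsigned char nr_frags;
};

/* Assumed sizes, for the illustration only. */
enum { MODEL_IP6_HDR_LEN = 40, MODEL_FRAG_HDR_LEN = 8 };

/*
 * Per-fragment payload size, rounded down to a multiple of 8 bytes
 * because fragment offsets are expressed in 8-byte units.
 */
static unsigned short model_ufo_gso_size(unsigned int mtu)
{
	return (unsigned short)((mtu - MODEL_IP6_HDR_LEN - MODEL_FRAG_HDR_LEN) & ~7u);
}

/* UFO append in the model: mark the skb even when it was peeked from
 * the queue rather than freshly allocated, then attach one fragment. */
static void model_ufo_append(struct shinfo_model *tail, unsigned int mtu)
{
	tail->gso_size = model_ufo_gso_size(mtu);
	tail->nr_frags++;
}

/* Models skb_is_gso(). */
static bool model_is_gso(const struct shinfo_model *si)
{
	return si->gso_size != 0;
}
```

Under these assumptions an MTU of 1280 gives a gso_size of 1232, and a later small write sees `model_is_gso()` return true for the reused skb.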


Hannes Frederic Sowa Oct. 2, 2013, 12:01 p.m. UTC | #4
On Wed, Oct 02, 2013 at 12:33:33PM +0200, Jiri Pirko wrote:
> Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> >On Tue, Oct 01, 2013 at 11:47:21PM +0200, Hannes Frederic Sowa wrote:
> >> The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
> >> get an oops.
> >
> >This is incorrect, it just depends on the size of the writes and on the
> >interface mtu.
> >
> >> IPv4 seems to work without problems, too.
> >
> >I also get kernel oopses from IPv4 now, too.
> 
> I'm not able to trigger this with ipv4. Can you please send strace
> output for this as well?

I used this snippet on loopback with UFO enabled and lo mtu 1280.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/udp.h>
#include <stdio.h>
#include <unistd.h>	/* write(), close() */

/*
 * Cork a UDP socket, then queue a small, a large and another small write.
 * The mtu argument is unused; the interface MTU is configured externally.
 */
int test(int mtu)
{
        int fd;
        const int one = 1;
        const int off = 0;
        struct sockaddr_in addr = {.sin_family = AF_INET, .sin_port = htons(53) };
        unsigned char buffer[3701] = { 0 };	/* larger than the lo MTU of 1280 */

        (void)mtu;

        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        connect(fd, (struct sockaddr *) &addr, sizeof(addr));

        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &one, sizeof(one));

        write(fd, " ", 1);
        write(fd, buffer, sizeof(buffer));
        write(fd, " ", 1);

        /* uncorking pushes the queued datagram out */
        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &off, sizeof(off));

        close(fd);
        return 0;
}

int main(void) {
        test(1280);
        return 0;
}

Greetings,

  Hannes
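
The three corked writes above (1, 3701 and 1 bytes at MTU 1280) can be traced through a small user-space model of the path selection. This is a hedged sketch, not kernel code: `struct cork_model`, `enum model_path` and `model_write()` are hypothetical names, and the model assumes the pre-fix behaviour discussed in this thread, where the UFO append path sets gso_size only on a freshly allocated skb and not on one it found on the queue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the corked socket's single queued skb. */
struct cork_model {
	bool have_skb;			/* is an skb already on the write queue? */
	unsigned short gso_size;	/* 0 = not marked as UFO */
	unsigned char nr_frags;		/* page fragments attached so far */
};

enum model_path { MODEL_PATH_NORMAL, MODEL_PATH_UFO };

/*
 * One corked write. check_frags selects the skb_has_frags()-style test;
 * otherwise the skb_is_gso()-style test is used.
 */
static enum model_path model_write(struct cork_model *c, size_t length,
				   size_t mtu, bool check_frags)
{
	bool tail_flag = c->have_skb &&
		(check_frags ? c->nr_frags != 0 : c->gso_size != 0);

	if (length > mtu || tail_flag) {
		/* UFO path: only a freshly allocated skb gets gso_size (pre-fix). */
		if (!c->have_skb) {
			c->have_skb = true;
			c->gso_size = 1;	/* any non-zero value will do here */
		}
		c->nr_frags++;
		return MODEL_PATH_UFO;
	}

	/* Normal path: data is copied into the linear area. */
	c->have_skb = true;
	return MODEL_PATH_NORMAL;
}
```

In this model the skb_is_gso()-style test yields normal, UFO, normal for the three writes, so the last byte is appended through the non-UFO path to an skb that already has frags. The skb_has_frags()-style test yields normal, UFO, UFO.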

Hannes Frederic Sowa Oct. 2, 2013, 12:12 p.m. UTC | #5
Hi Eric!

On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> 
> > 
> > This seems correct to me. sk_is_gso would work as well is you apply my
> > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > well" which does the setting of gso_size.
> 
> Well, skb having frags or not should not be a concern :
> Thats an allocation choice (lets say to avoid high order allocations). 
> 
> Setting gso_size is probably better.

e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
approach") states:

"
skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
indicating that hardware has to do checksum calculation. Hardware should
compute the UDP checksum of complete datagram and also ip header checksum of
each fragmented IP packet.
"

This is the reason why I tried not to update the gso_size. If it is ok, I am
fine with that.

Greetings,

  Hannes

Hannes Frederic Sowa Oct. 2, 2013, 1:03 p.m. UTC | #6
On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> Hi Eric!
> 
> On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > 
> > > 
> > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > well" which does the setting of gso_size.
> > 
> > Well, skb having frags or not should not be a concern :
> > Thats an allocation choice (lets say to avoid high order allocations). 
> > 
> > Setting gso_size is probably better.
> 
> e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> approach") states:
> 
> "
> skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> indicating that hardware has to do checksum calculation. Hardware should
> compute the UDP checksum of complete datagram and also ip header checksum of
> each fragmented IP packet.
> "
> 
> This is the reason why I tried not to update the gso_size. If it is ok, I am
> fine with that.

Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
mapping (skb->data with skb_headlen, which is fine) is used as the inband
header:

        if (offload_type == SKB_GSO_UDP)
                frg_cnt++; /* as Txd0 was used for inband header */

That is my only other hint that we maybe should not update gso_size and
gso_type. I guess software fallback does not have this problem, but I won't
have time to check until this evening.

I am really not sure if just setting gso_size does not break neterion UFO
offloading. :/

Greetings,

  Hannes

Eric Dumazet Oct. 2, 2013, 3:14 p.m. UTC | #7
On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
> On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> > Hi Eric!
> > 
> > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > > 
> > > > 
> > > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > > well" which does the setting of gso_size.
> > > 
> > > Well, skb having frags or not should not be a concern :
> > > Thats an allocation choice (lets say to avoid high order allocations). 
> > > 
> > > Setting gso_size is probably better.
> > 
> > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> > approach") states:
> > 
> > "
> > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> > indicating that hardware has to do checksum calculation. Hardware should
> > compute the UDP checksum of complete datagram and also ip header checksum of
> > each fragmented IP packet.
> > "
> > 
> > This is the reason why I tried not to update the gso_size. If it is ok, I am
> > fine with that.
> 
> Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
> mapping (skb->data with skb_headlen, which is fine) is used as the inband
> header:
> 
>         if (offload_type == SKB_GSO_UDP)
>                 frg_cnt++; /* as Txd0 was used for inband header */
> 
> That is my only other hint that we maybe should not update gso_size and
> gso_type. I guess software fallback does not have this problem, but I won't
> have time to check until this evening.
> 
> I am really not sure if just setting gso_size does not break neterion UFO
> offloading. :/

Well, just ask Jon Mason to double check ;)

I think the commit intent was to set gso_size :

   skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
    fragment going out of the adapter after IP fragmentation by hardware.

The fact that it states "skb->data will contain MAC/IP/UDP header and
skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.

If Neterion driver mandates that skb->head *only* contains the
MAC/IP/UDP header, that should be handled in the driver itself.



Hannes Frederic Sowa Oct. 2, 2013, 4:27 p.m. UTC | #8
Hi!

I have a question regarding UFO and the neterion driver, which is the only
driver that advertises hardware UFO support:

The patch discussed in this thread
http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
the semantics of how packets are constructed before they are submitted to the
driver.

We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
payload is attached in the skb's frags. With the changes discussed in this
thread it is possible that we also append to skb->data some amount of data
which is not targeted for the header. From reading the driver sources it seems
the hardware interprets the skb->data to skb_headlen as the header, so we
could include some data in the fragments more than once.
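
The concern can be phrased as a simple check on the linear area. The sketch below is a user-space illustration only: the `model_*` helpers are hypothetical and the header sizes (untagged Ethernet, IPv4 without options, UDP) are assumptions chosen for the example. It is not taken from s2io.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed header sizes: untagged Ethernet, IPv4 without options, UDP. */
enum { MODEL_ETH_HLEN = 14, MODEL_IPV4_HLEN = 20, MODEL_UDP_HLEN = 8 };

/* Bytes of payload sitting in the linear area (skb->data up to skb_headlen). */
static size_t model_linear_payload(size_t headlen, size_t hdr_len)
{
	return headlen > hdr_len ? headlen - hdr_len : 0;
}

/* True when the linear area holds the headers and nothing else. */
static bool model_linear_is_headers_only(size_t headlen, size_t hdr_len)
{
	return headlen == hdr_len;
}
```

With these sizes the headers add up to 42 bytes. A linear area of 43 bytes, as after an initial 1-byte write, already carries one byte of payload that hardware treating the whole linear area as a header would misinterpret.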

Do you think this change is safe? Otherwise I would suggest that the UFO
capability is switched off until the driver signals the hardware the start and
end of the headers correctly?

I left the mail below intact which points to the specific place in s2io.c
where I think the problem is.

On Wed, Oct 02, 2013 at 08:14:27AM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
> > On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> > > Hi Eric!
> > > 
> > > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > > > 
> > > > > 
> > > > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > > > well" which does the setting of gso_size.
> > > > 
> > > > Well, skb having frags or not should not be a concern :
> > > > Thats an allocation choice (lets say to avoid high order allocations). 
> > > > 
> > > > Setting gso_size is probably better.
> > > 
> > > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> > > approach") states:
> > > 
> > > "
> > > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> > > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> > > indicating that hardware has to do checksum calculation. Hardware should
> > > compute the UDP checksum of complete datagram and also ip header checksum of
> > > each fragmented IP packet.
> > > "
> > > 
> > > This is the reason why I tried not to update the gso_size. If it is ok, I am
> > > fine with that.
> > 
> > Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
> > mapping (skb->data with skb_headlen, which is fine) is used as the inband
> > header:
> > 
> >         if (offload_type == SKB_GSO_UDP)
> >                 frg_cnt++; /* as Txd0 was used for inband header */
> > 
> > That is my only other hint that we maybe should not update gso_size and
> > gso_type. I guess software fallback does not have this problem, but I won't
> > have time to check until this evening.
> > 
> > I am really not sure if just setting gso_size does not break neterion UFO
> > offloading. :/
> 
> Well, just ask Jon Mason to double check ;)
> 
> I think the commit intent was to set gso_size :
> 
>    skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
>     fragment going out of the adapter after IP fragmentation by hardware.
> 
> The fact that it states "skb->data will contain MAC/IP/UDP header and
> skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.
> 
> If Neterion driver mandates that skb->head *only* contains the
> MAC/IP/UDP header, that should be handled in the driver itself.

Thanks Eric for clearing this up.

I really thought it would be the common pattern for UFO to have only headers
in skb->data, so I didn't bother to ask in the first place.

Thanks,

  Hannes

Hannes Frederic Sowa Oct. 7, 2013, 4:53 p.m. UTC | #9
Hi Jon!

Maybe I got the wrong email address for the neterion driver from the
MAINTAINERS file? If you are (still) affiliated with the neterion driver,
maybe you could have a short look at the quoted mail below?

Thanks,

  Hannes

On Wed, Oct 02, 2013 at 06:27:30PM +0200, Hannes Frederic Sowa wrote:
> Hi!
> 
> I have a question regarding UFO and the neterion driver, which as the only one
> advertises hardware UFO support:
> 
> The patch discusses in this thread
> http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> some semantics how packets are constructed before submitted to the driver.
> 
> We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> payload is attached in the skb's frags. With the changes discussed in this
> thread it is possible that we also append to skb->data some amount of data
> which is not targeted for the header. From reading the driver sources it seems
> the hardware interprets the skb->data to skb_headlen as the header, so we
> could include some data in the fragments more than once.
> 
> Do you think this change is safe? Otherwise I would suggest that the UFO
> capability is switched off until the driver signals the hardware the start and
> end of the headers correctly?
> 
> I left the mail below intact which points to the specific place in s2io.c
> where I think the problem is.
> 
> On Wed, Oct 02, 2013 at 08:14:27AM -0700, Eric Dumazet wrote:
> > On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
> > > On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> > > > Hi Eric!
> > > > 
> > > > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > > > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > > > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > > > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > > > > 
> > > > > > 
> > > > > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > > > > well" which does the setting of gso_size.
> > > > > 
> > > > > Well, skb having frags or not should not be a concern :
> > > > > Thats an allocation choice (lets say to avoid high order allocations). 
> > > > > 
> > > > > Setting gso_size is probably better.
> > > > 
> > > > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> > > > approach") states:
> > > > 
> > > > "
> > > > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> > > > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> > > > indicating that hardware has to do checksum calculation. Hardware should
> > > > compute the UDP checksum of complete datagram and also ip header checksum of
> > > > each fragmented IP packet.
> > > > "
> > > > 
> > > > This is the reason why I tried not to update the gso_size. If it is ok, I am
> > > > fine with that.
> > > 
> > > Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
> > > mapping (skb->data with skb_headlen, which is fine) is used as the inband
> > > header:
> > > 
> > >         if (offload_type == SKB_GSO_UDP)
> > >                 frg_cnt++; /* as Txd0 was used for inband header */
> > > 
> > > That is my only other hint that we maybe should not update gso_size and
> > > gso_type. I guess software fallback does not have this problem, but I won't
> > > have time to check until this evening.
> > > 
> > > I am really not sure if just setting gso_size does not break neterion UFO
> > > offloading. :/
> > 
> > Well, just ask Jon Mason to double check ;)
> > 
> > I think the commit intent was to set gso_size :
> > 
> >    skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
> >     fragment going out of the adapter after IP fragmentation by hardware.
> > 
> > The fact that it states "skb->data will contain MAC/IP/UDP header and
> > skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.
> > 
> > If Neterion driver mandates that skb->head *only* contains the
> > MAC/IP/UDP header, that should be handled in the driver itself.
> 
> Thanks Eric for clearing this up.
> 
> I really thought it would be the common pattern for UFO to have only headers
> in skb->data, so I didn't bother to ask in the first place.
> 
> Thanks,
> 
>   Hannes
> 

Jon Mason Oct. 7, 2013, 5:19 p.m. UTC | #10
On Mon, Oct 07, 2013 at 06:53:43PM +0200, Hannes Frederic Sowa wrote:
> Hi Jon!
> 
> Maybe I got the wrong email address for the neterion driver from the
> maintainers filer? If you are (still) affiliated with the neterion driver,
> maybe you could have a short look at the quoted mail below?

Both are valid email addresses, but I prefer to address non-Intel
issues with my kudzu.us email account.

I apologize for not addressing your question yet.  What you are saying
makes sense, but I want to dig through the documentation and verify.
However, I haven't had the time.  I'll brew up a pot of coffee when I
get home and I'll get an answer to you before I go to bed tonight :)

Thanks,
Jon

Hannes Frederic Sowa Oct. 7, 2013, 5:27 p.m. UTC | #11
On Mon, Oct 07, 2013 at 10:19:41AM -0700, Jon Mason wrote:
> Both are valid email addresses, but I prefer to address non-Intel
> issues with my kudzu.us email account.

Ok.

> I apologize for not addressing your question yet.  What you are saying
> makes sense, but I want to dig through the documentation and verify.
> However, I haven't had the time.  I'll brew up a pot of coffee when I
> get home and I'll get an answer to you before I go to bed tonight :)

No need to hurry, I do not want to induce stress. ;)

I just wanted to make sure this does not get forgotten so we can apply Jiri's
patches anytime soon.

Enjoy your coffee,

  Hannes

Jon Mason Oct. 8, 2013, 8:07 a.m. UTC | #12
On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi!
>
> I have a question regarding UFO and the neterion driver, which as the only one
> advertises hardware UFO support:
>
> The patch discusses in this thread
> http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> some semantics how packets are constructed before submitted to the driver.
>
> We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> payload is attached in the skb's frags. With the changes discussed in this
> thread it is possible that we also append to skb->data some amount of data
> which is not targeted for the header. From reading the driver sources it seems
> the hardware interprets the skb->data to skb_headlen as the header, so we
> could include some data in the fragments more than once.

From my reading of the HW spec and a quick look at the driver, it
appears that the driver is using one entry in the TX ring for the
header and another for the body of the packet to be fragmented (which
is what the hardware wants).  I don't understand what you are saying,
but if you are asking whether simply appending a new header & data to the
end of skb->data will get it out on the wire correctly, I don't believe
it will.

I do have hardware that I can try the patch on, if you can walk me
through the use case (unless it is as easy as setting up an IPv6 connection
and ping).

Time for sleep now....

Thanks,
Jon

>
> Do you think this change is safe? Otherwise I would suggest that the UFO
> capability is switched off until the driver signals the hardware the start and
> end of the headers correctly?
>
> I left the mail below intact which points to the specific place in s2io.c
> where I think the problem is.
>
> On Wed, Oct 02, 2013 at 08:14:27AM -0700, Eric Dumazet wrote:
>> On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
>> > On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
>> > > Hi Eric!
>> > >
>> > > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
>> > > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
>> > > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
>> > > > > >-    if (((length > mtu) || (skb && skb_is_gso(skb))) &&
>> > > > > >+    if (((length > mtu) || (skb && skb_has_frags(skb))) &&
>> > > >
>> > > > >
>> > > > > This seems correct to me. sk_is_gso would work as well is you apply my
>> > > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
>> > > > > well" which does the setting of gso_size.
>> > > >
>> > > > Well, skb having frags or not should not be a concern :
>> > > > Thats an allocation choice (lets say to avoid high order allocations).
>> > > >
>> > > > Setting gso_size is probably better.
>> > >
>> > > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
>> > > approach") states:
>> > >
>> > > "
>> > > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
>> > > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
>> > > indicating that hardware has to do checksum calculation. Hardware should
>> > > compute the UDP checksum of complete datagram and also ip header checksum of
>> > > each fragmented IP packet.
>> > > "
>> > >
>> > > This is the reason why I tried not to update the gso_size. If it is ok, I am
>> > > fine with that.
>> >
>> > Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
>> > mapping (skb->data with skb_headlen, which is fine) is used as the inband
>> > header:
>> >
>> >         if (offload_type == SKB_GSO_UDP)
>> >                 frg_cnt++; /* as Txd0 was used for inband header */
>> >
>> > That is my only other hint that we maybe should not update gso_size and
>> > gso_type. I guess software fallback does not have this problem, but I won't
>> > have time to check until this evening.
>> >
>> > I am really not sure if just setting gso_size does not break neterion UFO
>> > offloading. :/
>>
>> Well, just ask Jon Mason to double check ;)
>>
>> I think the commit intent was to set gso_size :
>>
>>    skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
>>     fragment going out of the adapter after IP fragmentation by hardware.
>>
>> The fact that it states "skb->data will contain MAC/IP/UDP header and
>> skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.
>>
>> If Neterion driver mandates that skb->head *only* contains the
>> MAC/IP/UDP header, that should be handled in the driver itself.
>
> Thanks Eric for clearing this up.
>
> I really thought it would be the common pattern for UFO to have only headers
> in skb->data, so I didn't bother to ask in the first place.
>
> Thanks,
>
>   Hannes
>
Eric Dumazet Oct. 8, 2013, 1 p.m. UTC | #13
On Tue, 2013-10-08 at 01:07 -0700, Jon Mason wrote:
> On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Hi!
> >
> > I have a question regarding UFO and the neterion driver, which as the only one
> > advertises hardware UFO support:
> >
> > The patch discusses in this thread
> > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> > some semantics how packets are constructed before submitted to the driver.
> >
> > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> > payload is attached in the skb's frags. With the changes discussed in this
> > thread it is possible that we also append to skb->data some amount of data
> > which is not targeted for the header. From reading the driver sources it seems
> > the hardware interprets the skb->data to skb_headlen as the header, so we
> > could include some data in the fragments more than once.
> 
> From my reading of the HW Spec and a quick look at the driver, it
> appears that the driver is using one entry in the TX ring for the
> header and another for the body of the packet to be fragmented (which
> is what the hardware wants).  I don't understand what you are saying,
> but if you are asking if simply appending a new header & data to the
> end of skb->data will get it out on the wire correct, I don't believe
> it will.
> 
> I do have hardware that I can try the patch on, if you can walk me
> through the use case (unless it is as easy as setup an IPv6 connection
> and ping).

I think this behavior is quite common. Driver should certainly already
do the right thing, as TCP frames can have the same property.

bnx2x for example splits skb->head if it contains payload after headers.

drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c

                if (unlikely(skb_headlen(skb) > hlen)) {
                        nbd++;
                        bd_prod = bnx2x_tx_split(bp, txdata, tx_buf,
                                                 &tx_start_bd, hlen,
                                                 bd_prod);
                }
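
To make the split concrete, here is a minimal userspace sketch of the
decision shown in the bnx2x snippet above. It is not driver code:
toy_tx_plan and plan_tx_head are hypothetical names, and the only
assumption carried over is that the hardware wants the protocol headers
(hlen bytes) in a descriptor of their own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: how the linear area is spread over descriptors. */
struct toy_tx_plan {
	size_t header_bd_len;   /* bytes placed in the header descriptor */
	size_t payload_bd_len;  /* bytes placed in the extra descriptor, 0 if no split */
	unsigned int nbd;       /* descriptors used for the linear area */
};

/*
 * headlen models skb_headlen(skb), hlen the length of the protocol headers.
 * If the linear area holds more than the headers, the excess gets a
 * descriptor of its own instead of being treated as header bytes.
 */
static struct toy_tx_plan plan_tx_head(size_t headlen, size_t hlen)
{
	struct toy_tx_plan plan;

	assert(headlen >= hlen); /* headers must sit completely in the linear area */

	plan.header_bd_len = hlen;
	plan.payload_bd_len = headlen - hlen;
	plan.nbd = plan.payload_bd_len ? 2 : 1;
	return plan;
}
```

For example, 42 bytes of Ethernet/IPv4/UDP headers followed by 4 bytes of
payload in the linear area give plan_tx_head(46, 42): a 42-byte header
descriptor plus a 4-byte payload descriptor.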


Hannes Frederic Sowa Oct. 8, 2013, 2:53 p.m. UTC | #14
On Tue, Oct 08, 2013 at 01:07:29AM -0700, Jon Mason wrote:
> On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Hi!
> >
> > I have a question regarding UFO and the neterion driver, which as the only one
> > advertises hardware UFO support:
> >
> > The patch discusses in this thread
> > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> > some semantics how packets are constructed before submitted to the driver.
> >
> > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> > payload is attached in the skb's frags. With the changes discussed in this
> > thread it is possible that we also append to skb->data some amount of data
> > which is not targeted for the header. From reading the driver sources it seems
> > the hardware interprets the skb->data to skb_headlen as the header, so we
> > could include some data in the fragments more than once.
> 
> From my reading of the HW Spec and a quick look at the driver, it
> appears that the driver is using one entry in the TX ring for the
> header and another for the body of the packet to be fragmented (which
> is what the hardware wants).  I don't understand what you are saying,
> but if you are asking if simply appending a new header & data to the
> end of skb->data will get it out on the wire correct, I don't believe
> it will.

No, this is not what I tried to say. I'll try to be more clear this
time. ;)

We start with a UDP socket which is corked. As soon as we write the
first few bytes (smaller than the mtu) onto this socket we put the
header in place and the rest of the data is just appended behind the
header directly in skb->data via plain ip_append_data.

Now a second write with a length > mtu happens: The ip(6)_append_data
will branch to ufo_append. This will fetch the first skb and append
to skb->frags.  gso_type and gso_size will be updated on this skb (this
currently does not happen but will with the patches discussed in this
thread).

If this packet is transmitted down to the device driver we have the udp
header in skb->data *and* also the payload from the first write. The
payload from the second write is appended as a frag and gso_type and
gso_size are set. This header+payload seem to be mapped just after the
ufo_in_band_v descriptor as the header in the first tx descriptor:

        txdp->Buffer_Pointer = pci_map_single(sp->pdev, skb->data,
                                              frg_len, PCI_DMA_TODEVICE);

frg_len is set to skb_headlen(skb). This happens right after setting up
the descriptor for the in-band ufo data.

My guess is that this data isn't split currently by the neterion driver
(at least I could not find it in the driver as Eric showed it for bnx2x)
so it might reappear in the packets when the hardware fragments the
packet and places the first tx ring in front of every packet.

Before these changes we never updated the gso_type and gso_size even when
we did append via UFO. So we never had payload in a UFO-marked skb->data,
only the headers. Now we could also end up with some payload in the
first TX ring entry, which you said is only for the header.
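
The sequence described above can be followed with a small userspace
model. This is only a sketch: toy_skb, toy_init and toy_append are made-up
names that track a handful of lengths, not the kernel's sk_buff, and the
gso_size formula is an assumption about what the UFO append path would
set once the patches discussed in this thread are applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_HDR_LEN 42u  /* Ethernet (14) + IPv4 (20) + UDP (8), as a driver would see it */
#define TOY_IP_HLEN 20u

struct toy_skb {
	size_t linear_len;     /* models skb_headlen(): headers plus payload behind them */
	size_t frag_len;       /* payload bytes held in paged frags */
	bool has_frags;        /* models nr_frags != 0 */
	unsigned int gso_size; /* 0 means "not marked for UFO" */
};

/* First write on a corked socket: only the headers are in place so far. */
static void toy_init(struct toy_skb *skb)
{
	skb->linear_len = TOY_HDR_LEN;
	skb->frag_len = 0;
	skb->has_frags = false;
	skb->gso_size = 0;
}

/* One write(): a large write, or any write once frags exist, goes to frags. */
static void toy_append(struct toy_skb *skb, size_t len, size_t mtu)
{
	assert(mtu > TOY_IP_HLEN);

	if (len > mtu || skb->has_frags) {
		skb->frag_len += len;
		skb->has_frags = true;
		/* assumed: payload per fragment, rounded down to a multiple of 8 */
		skb->gso_size = (unsigned int)((mtu - TOY_IP_HLEN) & ~(size_t)7);
	} else {
		skb->linear_len += len; /* plain append behind the headers */
	}
}

/* Payload bytes that share the linear area with the headers. */
static size_t toy_payload_in_linear(const struct toy_skb *skb)
{
	return skb->linear_len - TOY_HDR_LEN;
}
```

For example, appends of 4, 3701 and 1 bytes with an mtu of 1500 leave 4
payload bytes in the linear area and 3702 bytes in frags, with gso_size
set. Those 4 bytes are the ones a driver mapping all of skb_headlen(skb)
as "header" would hand to the hardware along with the real headers.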

> I do have hardware that I can try the patch on, if you can walk me
> through the use case (unless it is as easy as setup an IPv6 connection
> and ping).

Ok, testing this should not be that complicated:

We can test this with plain IPv4/UDP sockets. I would suggest a net-next kernel
with this patch from Jiri applied: http://patchwork.ozlabs.org/patch/279691/

--- >8 ---
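#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/udp.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int test(int mtu)
{
        int fd;
        const int one = 1;
        const int off = 0;
        struct sockaddr_in addr = {.sin_family = AF_INET, .sin_port = htons(53) };
        unsigned char buffer[3701];

        (void) mtu; /* not used here */
        memset(buffer, 'A', sizeof(buffer)); /* recognizable payload instead of stack garbage */

        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        connect(fd, (struct sockaddr *) &addr, sizeof(addr));

        /* cork the socket so the following writes are queued as one datagram */
        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &one, sizeof(one));

        write(fd, "    ", 4);              /* small write: lands behind the headers in skb->data */
        write(fd, buffer, sizeof(buffer)); /* larger than a typical mtu: candidate for the UFO path */
        write(fd, " ", 1);                 /* small write after the large one */

        /* uncork: the queued datagram is sent */
        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &off, sizeof(off));

        close(fd);
        return 0;
}

int main(void) {
        test(1280);
        return 0;
}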
--- >8 ---

I left out error handling so it is better observed with strace if
something went wrong.

You should change the port number and ip address to something reasonable
for your network. My guess would be that the spaces (0x20) of the first
write are now placed between the UDP header and payload of every packet
fragmented by the hardware. Would be nice to hear that I am wrong. ;)

Be aware that the above program can cause memory corruption in the kernel
if you did not apply Jiri's patch.
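
On the receiving side, a check along these lines could tell an intact
datagram from a mangled one. It is only a sketch: payload_looks_intact is
a hypothetical helper, and it assumes the test program's three writes of
4, 3701 and 1 bytes arrive as a single 3706-byte datagram whose first four
bytes and last byte are spaces. The middle bytes are not inspected, since
their content depends on how the sender fills its buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FIRST_WRITE_LEN  4u
#define SECOND_WRITE_LEN 3701u
#define THIRD_WRITE_LEN  1u
#define EXPECTED_LEN (FIRST_WRITE_LEN + SECOND_WRITE_LEN + THIRD_WRITE_LEN)

/*
 * Returns true if the received UDP payload has the length and the
 * leading/trailing spaces that the three corked writes should produce.
 */
static bool payload_looks_intact(const unsigned char *payload, size_t len)
{
	assert(payload != NULL);

	if (len != EXPECTED_LEN)
		return false;
	if (memcmp(payload, "    ", FIRST_WRITE_LEN) != 0)
		return false;
	return payload[EXPECTED_LEN - 1] == ' ';
}
```

The payload would come from recv() on a UDP socket bound to the
destination port. A packet capture on the wire is probably still the
better tool for seeing what each hardware-generated fragment contains,
since a mangled datagram may simply fail reassembly and never reach the
socket.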

Thanks for helping!

  Hannes

Hannes Frederic Sowa Oct. 17, 2013, 4:45 a.m. UTC | #15
Hi Jon and Jiri!

Just a reminder: could you have a look at this?

If you don't have time to test this, may I know your assessment of the
situation? I could send a compile-time tested patch to disable UFO, or if you
say so we could leave this as is.

Jiri, I would suggest you resend your patches then.

Thanks,

  Hannes

[top-posted by intention]

Jiri Pirko Oct. 18, 2013, 7:52 a.m. UTC | #16
Thu, Oct 17, 2013 at 06:45:52AM CEST, hannes@stressinduktion.org wrote:
>Hi Jon and Jiri!
>
>Just wanted to remind you if you could have a look at this?
>
>If you don't have time to test this may I know your assessment of the
>situation? I could send a compile-time tested patch to disable UFO or if you
>say so we could leave this as is.
>
>Jiri, I would suggest you resend your patches then.


Okay. I will.

>
>Thanks,
>
>  Hannes
>
Jon Mason Oct. 23, 2013, 4:35 p.m. UTC | #17
On Wed, Oct 16, 2013 at 9:45 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi Jon and Jiri!
>
> Just wanted to remind you if you could have a look at this?
>
> If you don't have time to test this may I know your assessment of the
> situation? I could send a compile-time tested patch to disable UFO or if you
> say so we could leave this as is.

So, bad news.  My Xframe 2 adapter (the only variety that does UFO
offload) won't fit in a standard PCI(32) slot.  Since my PCI-X system
at home is faulty (I'm trying to fix it, but it won't be in the time
frame you want), there is no way for me to test it on the hardware.
Terribly sorry.

I am fine with this patch going out, since UFO is off by default.
I'll handle any issues once they are discovered.  Alternatively, we
could just kill UFO and make everyone's lives easier.

Thanks,
Jon

> Jiri, I would suggest you resend your patches then.
>
> Thanks,
>
>   Hannes
>
Hannes Frederic Sowa Oct. 23, 2013, 6:15 p.m. UTC | #18
On Wed, Oct 23, 2013 at 09:35:43AM -0700, Jon Mason wrote:
> On Wed, Oct 16, 2013 at 9:45 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Hi Jon and Jiri!
> >
> > Just wanted to remind you if you could have a look at this?
> >
> > If you don't have time to test this may I know your assessment of the
> > situation? I could send a compile-time tested patch to disable UFO or if you
> > say so we could leave this as is.
> 
> So, bad news.  My Xframe 2 adapter (the only variety that does UFO
> offload) won't fit in a standard PCI(32) slot.  Since my PCI-X system
> at home is faulty (I'm trying to fix it , but it won't be in the time
> frame you want), there is no way for me to test it on the hardware.
> Terribly sorry.
> 
> I am fine with this patch going out, since UFO is off by default.
> I'll handle any issues once they are discovered.  Alternatively, we
> could just kill UFO and make everyone's lives easier.

Oh, I missed that UFO is off by default.

If it turns out that UFO is causing broken frames it should be either
killed or (if you have the time for that) fixed. Because there shouldn't
be regressions in stable kernels I am fine with this resolution. Maybe
you can have a look at this problem once your hardware is fixed.

Thank you,

  Hannes

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6d56840..3565450 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1308,6 +1308,11 @@  static inline int skb_pagelen(const struct sk_buff *skb)
 	return len + skb_headlen(skb);
 }
 
+static inline bool skb_has_frags(const struct sk_buff *skb)
+{
+	return skb_shinfo(skb)->nr_frags;
+}
+
 /**
  * __skb_fill_page_desc - initialise a paged fragment in an skb
  * @skb: buffer containing fragment to be initialised
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 7d8357b..8dc3d8d 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -836,7 +836,7 @@  static int __ip_append_data(struct sock *sk,
 		csummode = CHECKSUM_PARTIAL;
 
 	cork->length += length;
-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
 	    (sk->sk_protocol == IPPROTO_UDP) &&
 	    (rt->dst.dev->features & NETIF_F_UFO) && !rt->dst.header_len) {
 		err = ip_ufo_append_data(sk, queue, getfrag, from, length,
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a54c45c..ded4f6f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1227,7 +1227,7 @@  int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 	skb = skb_peek_tail(&sk->sk_write_queue);
 	cork->length += length;
 	if (((length > mtu) ||
-	     (skb && skb_is_gso(skb))) &&
+	     (skb && skb_has_frags(skb))) &&
 	    (sk->sk_protocol == IPPROTO_UDP) &&
 	    (rt->dst.dev->features & NETIF_F_UFO)) {
 		err = ip6_ufo_append_data(sk, getfrag, from, length,