diff mbox

pktgen and spin_lock_bh in xmit path

Message ID 4ADE0770.8060708@gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Oct. 20, 2009, 6:54 p.m. UTC
Eric Dumazet a écrit :

> David, we are changing skb->mapping while transmitting it...
> 
> So yes, it can break pktgen if its skbs are cloned.
> 
> Maybe pktgen should call skb_record_rx_queue() instead of skb_set_queue_mapping()
> 
> Or maybe we should avoid this +/- 1 thing we do in skb_record_rx_queue(), since
> we use same skb field for recording rx_queue or tx_queue :(
> 

A possible fix would be :




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ben Greear Oct. 20, 2009, 8:16 p.m. UTC | #1
On 10/20/2009 11:54 AM, Eric Dumazet wrote:
> Eric Dumazet a écrit :
>
>> David, we are changing skb->mapping while transmitting it...
>>
>> So yes, it can break pktgen if its skbs are cloned.
>>
>> Maybe pktgen should call skb_record_rx_queue() instead of skb_set_queue_mapping()
>>
>> Or maybe we should avoid this +/- 1 thing we do in skb_record_rx_queue(), since
>> we use same skb field for recording rx_queue or tx_queue :(
>>
>
> A possible fix would be :

I tried your patch.  It's not crashing, but the link is bouncing around:

Oct 20 13:13:34 localhost kernel: ixgbe 0000:05:00.0: master disable timed out
Oct 20 13:13:35 localhost kernel: ixgbe 0000:05:00.0: master disable timed out
Oct 20 13:13:35 localhost kernel: ixgbe 0000:05:00.0: master disable timed out
...

Oct 20 13:14:06 localhost kernel: ixgbe 0000:05:00.0: master disable timed out
Oct 20 13:14:06 localhost kernel: ixgbe: eth9 NIC Link is Down
Oct 20 13:14:06 localhost kernel: ixgbe: eth9 NIC Link is Up 10 Gbps, Flow Control: RX/TX
Oct 20 13:14:09 localhost kernel: ixgbe: eth8 NIC Link is Up 10 Gbps, Flow Control: RX/TX
Oct 20 13:14:12 localhost kernel: ixgbe 0000:05:00.1: master disable timed out
Oct 20 13:14:15 localhost kernel: ixgbe: eth9 NIC Link is Up 10 Gbps, Flow Control: RX/TX
Oct 20 13:14:19 localhost kernel: ixgbe: eth9 NIC Link is Down
Oct 20 13:14:19 localhost kernel: ixgbe 0000:05:00.0: master disable timed out


I'm not sure if this is another weirdness in my system or something
else...

Ben
Ben Greear Oct. 20, 2009, 9:10 p.m. UTC | #2
On 10/20/2009 11:54 AM, Eric Dumazet wrote:

> -	queue_map = skb_get_queue_mapping(pkt_dev->skb);
> +	queue_map = pkt_dev->cur_queue_map;
> +	/*
> +	 * tells skb_tx_hash() to use this tx queue.
> +	 * We should reset skb->mapping before each xmit() because
> +	 * xmit() might change it.
> +	 */
> +	skb_record_rx_queue(pkt_dev->skb, queue_map);
>   	txq = netdev_get_tx_queue(odev, queue_map);

I think that must be wrong.  The record_rx_queue sets it to queue_map + 1,
but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes the
skb->queue_map and uses it as an index with no subtraction.

This causes watchdog timeouts because we are calling txq_trans_update in pktgen on
queue 0, for instance, but sending pkts on queue 1.  If queue 1 is ever
busy when the WD fires, link is reset.

Thanks,
Ben
Eric Dumazet Oct. 20, 2009, 9:22 p.m. UTC | #3
Ben Greear a écrit :
> On 10/20/2009 11:54 AM, Eric Dumazet wrote:
> 
>> -    queue_map = skb_get_queue_mapping(pkt_dev->skb);
>> +    queue_map = pkt_dev->cur_queue_map;
>> +    /*
>> +     * tells skb_tx_hash() to use this tx queue.
>> +     * We should reset skb->mapping before each xmit() because
>> +     * xmit() might change it.
>> +     */
>> +    skb_record_rx_queue(pkt_dev->skb, queue_map);
>>       txq = netdev_get_tx_queue(odev, queue_map);
> 
> I think that must be wrong.  The record_rx_queue sets it to queue_map + 1,
> but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes the
> skb->queue_map and uses it as an index with no subtraction.


Yes but check net/core/dev.c I quoted in my previous mail :
We change queue_map if skb goes through dev_queue_xmit()
(as done by macvlan)

u16 skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb)
{
        u32 hash;

        if (skb_rx_queue_recorded(skb)) {
                hash = skb_get_rx_queue(skb);
                while (unlikely(hash >= dev->real_num_tx_queues))
                        hash -= dev->real_num_tx_queues;
                return hash;
        }

        if (skb->sk && skb->sk->sk_hash)
                hash = skb->sk->sk_hash;
        else
                hash = skb->protocol;

        hash = jhash_1word(hash, skb_tx_hashrnd);

        return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
}
EXPORT_SYMBOL(skb_tx_hash);

static struct netdev_queue *dev_pick_tx(struct net_device *dev,
                                        struct sk_buff *skb)
{
        const struct net_device_ops *ops = dev->netdev_ops;
        u16 queue_index = 0;

        if (ops->ndo_select_queue)
                queue_index = ops->ndo_select_queue(dev, skb);
        else if (dev->real_num_tx_queues > 1)
                queue_index = skb_tx_hash(dev, skb);

        skb_set_queue_mapping(skb, queue_index);
        return netdev_get_tx_queue(dev, queue_index);
}

So if skb->queue_mapping was X+1 before entering dev_pick_tx(), it is X when
leaving dev_pick_tx()

> 
> This causes watchdog timeouts because we are calling txq_trans_update in
> pktgen on
> queue 0, for instance, but sending pkts on queue 1.  If queue 1 is ever
> busy when the WD fires, link is reset.
> 

Problem is not pktgen IMHO. 

Problem is skb->queue_mapping has different meaning if skb is directly given to a real device -> start_xmit()

( In this case skb->queue_mapping should be between [O ... real_num_tx_queues-1])

But if it goes through dev_queue_xmit(), it should be set between [1 .. real_num_tx_queues], because
dev_pick_tx() will decrement skb->queue_mapping

In fact skb->queue_mapping only works for forwarded packets, not locally generated ones.

I am too tired to cook a fix at this moment, sorry :(
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Greear Oct. 20, 2009, 9:30 p.m. UTC | #4
On 10/20/2009 02:22 PM, Eric Dumazet wrote:

> Problem is skb->queue_mapping has different meaning if skb is directly given to a real device ->  start_xmit()
>
> ( In this case skb->queue_mapping should be between [O ... real_num_tx_queues-1])
>
> But if it goes through dev_queue_xmit(), it should be set between [1 .. real_num_tx_queues], because
> dev_pick_tx() will decrement skb->queue_mapping
>
> In fact skb->queue_mapping only works for forwarded packets, not locally generated ones.
>
> I am too tired to cook a fix at this moment, sorry :(

That's definitely a nasty little issue.  Using skb_set_queue_mapping
in pktgen makes it run for me, but may just be getting lucky with the
mac-vlan interfaces which will do the dev_queue_xmit (but, I don't so much
care exactly what queue is used as long as things don't crash and the
link doesn't reset).

Don't worry about a quick patch on my account.  I seem to have it working
to at least some degree (no funny crashes, no link watchdog timeouts).

Thanks,
Ben
Krishna Kumar Oct. 21, 2009, 5:12 a.m. UTC | #5
Ben Greear <greearb@candelatech.com>  wrote on 10/21/2009 02:40:13 AM:

Coming back a bit to this post:

> > -   queue_map = skb_get_queue_mapping(pkt_dev->skb);
> > +   queue_map = pkt_dev->cur_queue_map;
> > +   /*
> > +    * tells skb_tx_hash() to use this tx queue.
> > +    * We should reset skb->mapping before each xmit() because
> > +    * xmit() might change it.
> > +    */
> > +   skb_record_rx_queue(pkt_dev->skb, queue_map);
> >      txq = netdev_get_tx_queue(odev, queue_map);
>
> I think that must be wrong.  The record_rx_queue sets it to queue_map +
1,
> but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes
the
> skb->queue_map and uses it as an index with no subtraction.

But that should work fine. record_rx_q sets queue_mapping to +1,
but skb_tx_hash calls skb_get_rx_queue, which does a -1 on this
value, and updates that value into queue_mapping. Hence it will
not cross the txq boundary. Drivers can use the queue_map value
directly without requiring to subtract.

Thanks,

- KK

>
> This causes watchdog timeouts because we are calling txq_trans_update in
pktgen on
> queue 0, for instance, but sending pkts on queue 1.  If queue 1 is ever
> busy when the WD fires, link is reset.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Greear Oct. 21, 2009, 5:32 a.m. UTC | #6
Krishna Kumar2 wrote:
> Ben Greear <greearb@candelatech.com>  wrote on 10/21/2009 02:40:13 AM:
>
> Coming back a bit to this post:
>
>   
>>> -   queue_map = skb_get_queue_mapping(pkt_dev->skb);
>>> +   queue_map = pkt_dev->cur_queue_map;
>>> +   /*
>>> +    * tells skb_tx_hash() to use this tx queue.
>>> +    * We should reset skb->mapping before each xmit() because
>>> +    * xmit() might change it.
>>> +    */
>>> +   skb_record_rx_queue(pkt_dev->skb, queue_map);
>>>      txq = netdev_get_tx_queue(odev, queue_map);
>>>       
>> I think that must be wrong.  The record_rx_queue sets it to queue_map + 1,
>>     
>> but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes the
>> skb->queue_map and uses it as an index with no subtraction.
>>     
>
> But that should work fine. record_rx_q sets queue_mapping to +1,
> but skb_tx_hash calls skb_get_rx_queue, which does a -1 on this
> value, and updates that value into queue_mapping. Hence it will
> not cross the txq boundary. Drivers can use the queue_map value
> directly without requiring to subtract.
>   
When using pktgen on real physical hardware, there is none of the 
skb_tx_hash or dev_queue_xmit
logic called, just the hard-start-xmit.  That is why it fails to update 
the proper queue with
his first patch. 

On virtual devices like mac-vlans, the logic probably worked ok since it 
goes
through dev_queue_xmit.

Thanks,
Ben
diff mbox

Patch

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 86acdba..bbe4b2d 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2554,7 +2554,6 @@  static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	__be16 *vlan_encapsulated_proto = NULL;  /* packet type ID field (or len) for VLAN tag */
 	__be16 *svlan_tci = NULL;                /* Encapsulates priority and SVLAN ID */
 	__be16 *svlan_encapsulated_proto = NULL; /* packet type ID field (or len) for SVLAN tag */
-	u16 queue_map;
 
 	if (pkt_dev->nr_labels)
 		protocol = htons(ETH_P_MPLS_UC);
@@ -2565,7 +2564,6 @@  static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	/* Update any of the values, used when we're incrementing various
 	 * fields.
 	 */
-	queue_map = pkt_dev->cur_queue_map;
 	mod_cur_headers(pkt_dev);
 
 	datalen = (odev->hard_header_len + 16) & ~0xf;
@@ -2605,7 +2603,6 @@  static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	skb->network_header = skb->tail;
 	skb->transport_header = skb->network_header + sizeof(struct iphdr);
 	skb_put(skb, sizeof(struct iphdr) + sizeof(struct udphdr));
-	skb_set_queue_mapping(skb, queue_map);
 	iph = ip_hdr(skb);
 	udph = udp_hdr(skb);
 
@@ -2896,7 +2893,6 @@  static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	__be16 *vlan_encapsulated_proto = NULL;  /* packet type ID field (or len) for VLAN tag */
 	__be16 *svlan_tci = NULL;                /* Encapsulates priority and SVLAN ID */
 	__be16 *svlan_encapsulated_proto = NULL; /* packet type ID field (or len) for SVLAN tag */
-	u16 queue_map;
 
 	if (pkt_dev->nr_labels)
 		protocol = htons(ETH_P_MPLS_UC);
@@ -2907,7 +2903,6 @@  static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	/* Update any of the values, used when we're incrementing various
 	 * fields.
 	 */
-	queue_map = pkt_dev->cur_queue_map;
 	mod_cur_headers(pkt_dev);
 
 	skb = __netdev_alloc_skb(odev,
@@ -2946,7 +2941,6 @@  static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	skb->network_header = skb->tail;
 	skb->transport_header = skb->network_header + sizeof(struct ipv6hdr);
 	skb_put(skb, sizeof(struct ipv6hdr) + sizeof(struct udphdr));
-	skb_set_queue_mapping(skb, queue_map);
 	iph = ipv6_hdr(skb);
 	udph = udp_hdr(skb);
 
@@ -3437,7 +3431,13 @@  static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 	if (pkt_dev->delay && pkt_dev->last_ok)
 		spin(pkt_dev, pkt_dev->next_tx);
 
-	queue_map = skb_get_queue_mapping(pkt_dev->skb);
+	queue_map = pkt_dev->cur_queue_map;
+	/*
+	 * tells skb_tx_hash() to use this tx queue.
+	 * We should reset skb->mapping before each xmit() because
+	 * xmit() might change it.
+	 */
+	skb_record_rx_queue(pkt_dev->skb, queue_map);
 	txq = netdev_get_tx_queue(odev, queue_map);
 
 	__netif_tx_lock_bh(txq);