pktgen and spin_lock_bh in xmit path

Message ID	4ADE0770.8060708@gmail.com
State	RFC, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> Message-ID: <4ADE0770.8060708@gmail.com> Date: Tue, 20 Oct 2009 20:54:40 +0200 From: Eric Dumazet <eric.dumazet@gmail.com> User-Agent: Thunderbird 2.0.0.23 (Windows/20090812) MIME-Version: 1.0 CC: Ben Greear <greearb@candelatech.com>, NetDev <netdev@vger.kernel.org>, robert@herjulf.net, "David S. Miller" <davem@davemloft.net> Subject: Re: pktgen and spin_lock_bh in xmit path References: <4ADD309B.1040505@candelatech.com> <4ADD32FA.6030409@gmail.com> <4ADD41F5.5080707@candelatech.com> <4ADDF560.1020509@candelatech.com> <4ADDF6E5.4070509@gmail.com> <4ADDF948.1050208@candelatech.com> <4ADE0306.6060101@gmail.com> In-Reply-To: <4ADE0306.6060101@gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit To: unlisted-recipients:; (no To-header on input) Sender: netdev-owner@vger.kernel.org Precedence: bulk

Eric Dumazet Oct. 20, 2009, 6:54 p.m. UTC

Eric Dumazet a écrit :

> David, we are changing skb->mapping while transmitting it...
> 
> So yes, it can break pktgen if its skbs are cloned.
> 
> Maybe pktgen should call skb_record_rx_queue() instead of skb_set_queue_mapping()
> 
> Or maybe we should avoid this +/- 1 thing we do in skb_record_rx_queue(), since
> we use same skb field for recording rx_queue or tx_queue :(
> 

A possible fix would be :




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ben Greear Oct. 20, 2009, 8:16 p.m. UTC | #1

On 10/20/2009 11:54 AM, Eric Dumazet wrote:
> Eric Dumazet a écrit :
>
>> David, we are changing skb->mapping while transmitting it...
>>
>> So yes, it can break pktgen if its skbs are cloned.
>>
>> Maybe pktgen should call skb_record_rx_queue() instead of skb_set_queue_mapping()
>>
>> Or maybe we should avoid this +/- 1 thing we do in skb_record_rx_queue(), since
>> we use same skb field for recording rx_queue or tx_queue :(
>>
>
> A possible fix would be :

I tried your patch.  It's not crashing, but the link is bouncing around:

Oct 20 13:13:34 localhost kernel: ixgbe 0000:05:00.0: master disable timed out
Oct 20 13:13:35 localhost kernel: ixgbe 0000:05:00.0: master disable timed out
Oct 20 13:13:35 localhost kernel: ixgbe 0000:05:00.0: master disable timed out
...

Oct 20 13:14:06 localhost kernel: ixgbe 0000:05:00.0: master disable timed out
Oct 20 13:14:06 localhost kernel: ixgbe: eth9 NIC Link is Down
Oct 20 13:14:06 localhost kernel: ixgbe: eth9 NIC Link is Up 10 Gbps, Flow Control: RX/TX
Oct 20 13:14:09 localhost kernel: ixgbe: eth8 NIC Link is Up 10 Gbps, Flow Control: RX/TX
Oct 20 13:14:12 localhost kernel: ixgbe 0000:05:00.1: master disable timed out
Oct 20 13:14:15 localhost kernel: ixgbe: eth9 NIC Link is Up 10 Gbps, Flow Control: RX/TX
Oct 20 13:14:19 localhost kernel: ixgbe: eth9 NIC Link is Down
Oct 20 13:14:19 localhost kernel: ixgbe 0000:05:00.0: master disable timed out


I'm not sure if this is another weirdness in my system or something
else...

Ben

Ben Greear Oct. 20, 2009, 9:10 p.m. UTC | #2

On 10/20/2009 11:54 AM, Eric Dumazet wrote:

> -	queue_map = skb_get_queue_mapping(pkt_dev->skb);
> +	queue_map = pkt_dev->cur_queue_map;
> +	/*
> +	 * tells skb_tx_hash() to use this tx queue.
> +	 * We should reset skb->mapping before each xmit() because
> +	 * xmit() might change it.
> +	 */
> +	skb_record_rx_queue(pkt_dev->skb, queue_map);
>   	txq = netdev_get_tx_queue(odev, queue_map);

I think that must be wrong.  The record_rx_queue sets it to queue_map + 1,
but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes the
skb->queue_map and uses it as an index with no subtraction.

This causes watchdog timeouts because we are calling txq_trans_update in pktgen on
queue 0, for instance, but sending pkts on queue 1.  If queue 1 is ever
busy when the WD fires, link is reset.

Thanks,
Ben

Eric Dumazet Oct. 20, 2009, 9:22 p.m. UTC | #3

Ben Greear a écrit :
> On 10/20/2009 11:54 AM, Eric Dumazet wrote:
> 
>> -    queue_map = skb_get_queue_mapping(pkt_dev->skb);
>> +    queue_map = pkt_dev->cur_queue_map;
>> +    /*
>> +     * tells skb_tx_hash() to use this tx queue.
>> +     * We should reset skb->mapping before each xmit() because
>> +     * xmit() might change it.
>> +     */
>> +    skb_record_rx_queue(pkt_dev->skb, queue_map);
>>       txq = netdev_get_tx_queue(odev, queue_map);
> 
> I think that must be wrong.  The record_rx_queue sets it to queue_map + 1,
> but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes the
> skb->queue_map and uses it as an index with no subtraction.


Yes but check net/core/dev.c I quoted in my previous mail :
We change queue_map if skb goes through dev_queue_xmit()
(as done by macvlan)

u16 skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb)
{
        u32 hash;

        if (skb_rx_queue_recorded(skb)) {
                hash = skb_get_rx_queue(skb);
                while (unlikely(hash >= dev->real_num_tx_queues))
                        hash -= dev->real_num_tx_queues;
                return hash;
        }

        if (skb->sk && skb->sk->sk_hash)
                hash = skb->sk->sk_hash;
        else
                hash = skb->protocol;

        hash = jhash_1word(hash, skb_tx_hashrnd);

        return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
}
EXPORT_SYMBOL(skb_tx_hash);

static struct netdev_queue *dev_pick_tx(struct net_device *dev,
                                        struct sk_buff *skb)
{
        const struct net_device_ops *ops = dev->netdev_ops;
        u16 queue_index = 0;

        if (ops->ndo_select_queue)
                queue_index = ops->ndo_select_queue(dev, skb);
        else if (dev->real_num_tx_queues > 1)
                queue_index = skb_tx_hash(dev, skb);

        skb_set_queue_mapping(skb, queue_index);
        return netdev_get_tx_queue(dev, queue_index);
}

So if skb->queue_mapping was X+1 before entering dev_pick_tx(), it is X when
leaving dev_pick_tx()

> 
> This causes watchdog timeouts because we are calling txq_trans_update in
> pktgen on
> queue 0, for instance, but sending pkts on queue 1.  If queue 1 is ever
> busy when the WD fires, link is reset.
> 

Problem is not pktgen IMHO. 

Problem is skb->queue_mapping has different meaning if skb is directly given to a real device -> start_xmit()

( In this case skb->queue_mapping should be between [O ... real_num_tx_queues-1])

But if it goes through dev_queue_xmit(), it should be set between [1 .. real_num_tx_queues], because
dev_pick_tx() will decrement skb->queue_mapping

In fact skb->queue_mapping only works for forwarded packets, not locally generated ones.

I am too tired to cook a fix at this moment, sorry :(
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ben Greear Oct. 20, 2009, 9:30 p.m. UTC | #4

On 10/20/2009 02:22 PM, Eric Dumazet wrote:

> Problem is skb->queue_mapping has different meaning if skb is directly given to a real device ->  start_xmit()
>
> ( In this case skb->queue_mapping should be between [O ... real_num_tx_queues-1])
>
> But if it goes through dev_queue_xmit(), it should be set between [1 .. real_num_tx_queues], because
> dev_pick_tx() will decrement skb->queue_mapping
>
> In fact skb->queue_mapping only works for forwarded packets, not locally generated ones.
>
> I am too tired to cook a fix at this moment, sorry :(

That's definitely a nasty little issue.  Using skb_set_queue_mapping
in pktgen makes it run for me, but may just be getting lucky with the
mac-vlan interfaces which will do the dev_queue_xmit (but, I don't so much
care exactly what queue is used as long as things don't crash and the
link doesn't reset).

Don't worry about a quick patch on my account.  I seem to have it working
to at least some degree (no funny crashes, no link watchdog timeouts).

Thanks,
Ben

Krishna Kumar Oct. 21, 2009, 5:12 a.m. UTC | #5

Ben Greear <greearb@candelatech.com>  wrote on 10/21/2009 02:40:13 AM:

Coming back a bit to this post:

> > -   queue_map = skb_get_queue_mapping(pkt_dev->skb);
> > +   queue_map = pkt_dev->cur_queue_map;
> > +   /*
> > +    * tells skb_tx_hash() to use this tx queue.
> > +    * We should reset skb->mapping before each xmit() because
> > +    * xmit() might change it.
> > +    */
> > +   skb_record_rx_queue(pkt_dev->skb, queue_map);
> >      txq = netdev_get_tx_queue(odev, queue_map);
>
> I think that must be wrong.  The record_rx_queue sets it to queue_map +
1,
> but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes
the
> skb->queue_map and uses it as an index with no subtraction.

But that should work fine. record_rx_q sets queue_mapping to +1,
but skb_tx_hash calls skb_get_rx_queue, which does a -1 on this
value, and updates that value into queue_mapping. Hence it will
not cross the txq boundary. Drivers can use the queue_map value
directly without requiring to subtract.

Thanks,

- KK

>
> This causes watchdog timeouts because we are calling txq_trans_update in
pktgen on
> queue 0, for instance, but sending pkts on queue 1.  If queue 1 is ever
> busy when the WD fires, link is reset.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ben Greear Oct. 21, 2009, 5:32 a.m. UTC | #6

Krishna Kumar2 wrote:
> Ben Greear <greearb@candelatech.com>  wrote on 10/21/2009 02:40:13 AM:
>
> Coming back a bit to this post:
>
>   
>>> -   queue_map = skb_get_queue_mapping(pkt_dev->skb);
>>> +   queue_map = pkt_dev->cur_queue_map;
>>> +   /*
>>> +    * tells skb_tx_hash() to use this tx queue.
>>> +    * We should reset skb->mapping before each xmit() because
>>> +    * xmit() might change it.
>>> +    */
>>> +   skb_record_rx_queue(pkt_dev->skb, queue_map);
>>>      txq = netdev_get_tx_queue(odev, queue_map);
>>>       
>> I think that must be wrong.  The record_rx_queue sets it to queue_map + 1,
>>     
>> but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes the
>> skb->queue_map and uses it as an index with no subtraction.
>>     
>
> But that should work fine. record_rx_q sets queue_mapping to +1,
> but skb_tx_hash calls skb_get_rx_queue, which does a -1 on this
> value, and updates that value into queue_mapping. Hence it will
> not cross the txq boundary. Drivers can use the queue_map value
> directly without requiring to subtract.
>   
When using pktgen on real physical hardware, there is none of the 
skb_tx_hash or dev_queue_xmit
logic called, just the hard-start-xmit.  That is why it fails to update 
the proper queue with
his first patch. 

On virtual devices like mac-vlans, the logic probably worked ok since it 
goes
through dev_queue_xmit.

Thanks,
Ben

pktgen and spin_lock_bh in xmit path

Commit Message

Comments

Patch