Message ID | 4ADE0770.8060708@gmail.com
---|---
State | RFC, archived
Delegated to | David Miller
On 10/20/2009 11:54 AM, Eric Dumazet wrote:
> Eric Dumazet wrote:
>
>> David, we are changing skb->mapping while transmitting it...
>>
>> So yes, it can break pktgen if its skbs are cloned.
>>
>> Maybe pktgen should call skb_record_rx_queue() instead of skb_set_queue_mapping()
>>
>> Or maybe we should avoid this +/- 1 thing we do in skb_record_rx_queue(), since
>> we use same skb field for recording rx_queue or tx_queue :(
>
> A possible fix would be :

I tried your patch.  It's not crashing, but the link is bouncing around:

Oct 20 13:13:34 localhost kernel: ixgbe 0000:05:00.0: master disable timed out
Oct 20 13:13:35 localhost kernel: ixgbe 0000:05:00.0: master disable timed out
Oct 20 13:13:35 localhost kernel: ixgbe 0000:05:00.0: master disable timed out
...
Oct 20 13:14:06 localhost kernel: ixgbe 0000:05:00.0: master disable timed out
Oct 20 13:14:06 localhost kernel: ixgbe: eth9 NIC Link is Down
Oct 20 13:14:06 localhost kernel: ixgbe: eth9 NIC Link is Up 10 Gbps, Flow Control: RX/TX
Oct 20 13:14:09 localhost kernel: ixgbe: eth8 NIC Link is Up 10 Gbps, Flow Control: RX/TX
Oct 20 13:14:12 localhost kernel: ixgbe 0000:05:00.1: master disable timed out
Oct 20 13:14:15 localhost kernel: ixgbe: eth9 NIC Link is Up 10 Gbps, Flow Control: RX/TX
Oct 20 13:14:19 localhost kernel: ixgbe: eth9 NIC Link is Down
Oct 20 13:14:19 localhost kernel: ixgbe 0000:05:00.0: master disable timed out

I'm not sure if this is another weirdness in my system or something else...

Ben
On 10/20/2009 11:54 AM, Eric Dumazet wrote:
> -	queue_map = skb_get_queue_mapping(pkt_dev->skb);
> +	queue_map = pkt_dev->cur_queue_map;
> +	/*
> +	 * tells skb_tx_hash() to use this tx queue.
> +	 * We should reset skb->mapping before each xmit() because
> +	 * xmit() might change it.
> +	 */
> +	skb_record_rx_queue(pkt_dev->skb, queue_map);
> 	txq = netdev_get_tx_queue(odev, queue_map);

I think that must be wrong.  skb_record_rx_queue() sets it to queue_map + 1,
but the hard_start_xmit method (in ixgbe/ixgbe_main.c, at least) takes
skb->queue_mapping and uses it as an index with no subtraction.

This causes watchdog timeouts because we call txq_trans_update() in pktgen
on queue 0, for instance, but send packets on queue 1.  If queue 1 is ever
busy when the WD fires, the link is reset.

Thanks,
Ben
Ben Greear wrote:
> On 10/20/2009 11:54 AM, Eric Dumazet wrote:
>
>> -	queue_map = skb_get_queue_mapping(pkt_dev->skb);
>> +	queue_map = pkt_dev->cur_queue_map;
>> +	/*
>> +	 * tells skb_tx_hash() to use this tx queue.
>> +	 * We should reset skb->mapping before each xmit() because
>> +	 * xmit() might change it.
>> +	 */
>> +	skb_record_rx_queue(pkt_dev->skb, queue_map);
>> 	txq = netdev_get_tx_queue(odev, queue_map);
>
> I think that must be wrong.  The record_rx_queue sets it to queue_map + 1,
> but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes the
> skb->queue_map and uses it as an index with no subtraction.

Yes, but check net/core/dev.c, which I quoted in my previous mail: we change
queue_map if the skb goes through dev_queue_xmit() (as done by macvlan).

u16 skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb)
{
	u32 hash;

	if (skb_rx_queue_recorded(skb)) {
		hash = skb_get_rx_queue(skb);
		while (unlikely(hash >= dev->real_num_tx_queues))
			hash -= dev->real_num_tx_queues;
		return hash;
	}

	if (skb->sk && skb->sk->sk_hash)
		hash = skb->sk->sk_hash;
	else
		hash = skb->protocol;

	hash = jhash_1word(hash, skb_tx_hashrnd);

	return (u16) (((u64) hash * dev->real_num_tx_queues) >> 32);
}
EXPORT_SYMBOL(skb_tx_hash);

static struct netdev_queue *dev_pick_tx(struct net_device *dev,
					struct sk_buff *skb)
{
	const struct net_device_ops *ops = dev->netdev_ops;
	u16 queue_index = 0;

	if (ops->ndo_select_queue)
		queue_index = ops->ndo_select_queue(dev, skb);
	else if (dev->real_num_tx_queues > 1)
		queue_index = skb_tx_hash(dev, skb);

	skb_set_queue_mapping(skb, queue_index);
	return netdev_get_tx_queue(dev, queue_index);
}

So if skb->queue_mapping was X+1 before entering dev_pick_tx(), it is X when
leaving dev_pick_tx().

> This causes watchdog timeouts because we are calling txq_trans_update in
> pktgen on queue 0, for instance, but sending pkts on queue 1.  If queue 1
> is ever busy when the WD fires, link is reset.

Problem is not pktgen IMHO.
Problem is skb->queue_mapping has a different meaning if the skb is directly
given to a real device -> start_xmit().

(In this case skb->queue_mapping should be between [0 ... real_num_tx_queues-1])

But if it goes through dev_queue_xmit(), it should be set between
[1 ... real_num_tx_queues], because dev_pick_tx() will decrement
skb->queue_mapping.

In fact skb->queue_mapping only works for forwarded packets, not locally
generated ones.

I am too tired to cook a fix at this moment, sorry :(

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/20/2009 02:22 PM, Eric Dumazet wrote:
> Problem is skb->queue_mapping has different meaning if skb is directly
> given to a real device -> start_xmit()
>
> (In this case skb->queue_mapping should be between [0 ... real_num_tx_queues-1])
>
> But if it goes through dev_queue_xmit(), it should be set between
> [1 ... real_num_tx_queues], because dev_pick_tx() will decrement
> skb->queue_mapping
>
> In fact skb->queue_mapping only works for forwarded packets, not locally
> generated ones.
>
> I am too tired to cook a fix at this moment, sorry :(

That's definitely a nasty little issue.  Using skb_set_queue_mapping() in
pktgen makes it run for me, but I may just be getting lucky with the
mac-vlan interfaces, which do go through dev_queue_xmit().  (I don't much
care exactly which queue is used, as long as things don't crash and the
link doesn't reset.)

Don't worry about a quick patch on my account.  I seem to have it working
to at least some degree (no funny crashes, no link watchdog timeouts).

Thanks,
Ben
Ben Greear <greearb@candelatech.com> wrote on 10/21/2009 02:40:13 AM:

Coming back a bit to this post:

>> -	queue_map = skb_get_queue_mapping(pkt_dev->skb);
>> +	queue_map = pkt_dev->cur_queue_map;
>> +	/*
>> +	 * tells skb_tx_hash() to use this tx queue.
>> +	 * We should reset skb->mapping before each xmit() because
>> +	 * xmit() might change it.
>> +	 */
>> +	skb_record_rx_queue(pkt_dev->skb, queue_map);
>> 	txq = netdev_get_tx_queue(odev, queue_map);
>
> I think that must be wrong.  The record_rx_queue sets it to queue_map + 1,
> but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes the
> skb->queue_map and uses it as an index with no subtraction.

But that should work fine.  skb_record_rx_queue() sets queue_mapping to
queue_map + 1, but skb_tx_hash() calls skb_get_rx_queue(), which does a -1
on this value, and the result is written back into queue_mapping.  Hence it
will not cross the txq boundary, and drivers can use the queue_map value
directly without having to subtract.

Thanks,

- KK

> This causes watchdog timeouts because we are calling txq_trans_update in
> pktgen on queue 0, for instance, but sending pkts on queue 1.  If queue 1
> is ever busy when the WD fires, link is reset.
Krishna Kumar2 wrote:
> Ben Greear <greearb@candelatech.com> wrote on 10/21/2009 02:40:13 AM:
>
> Coming back a bit to this post:
>
>>> -	queue_map = skb_get_queue_mapping(pkt_dev->skb);
>>> +	queue_map = pkt_dev->cur_queue_map;
>>> +	/*
>>> +	 * tells skb_tx_hash() to use this tx queue.
>>> +	 * We should reset skb->mapping before each xmit() because
>>> +	 * xmit() might change it.
>>> +	 */
>>> +	skb_record_rx_queue(pkt_dev->skb, queue_map);
>>> 	txq = netdev_get_tx_queue(odev, queue_map);
>>
>> I think that must be wrong.  The record_rx_queue sets it to queue_map + 1,
>> but the hard-start-xmit method (in ixgbe/ixgbe_main.c, at least), takes the
>> skb->queue_map and uses it as an index with no subtraction.
>
> But that should work fine.  record_rx_q sets queue_mapping to +1,
> but skb_tx_hash calls skb_get_rx_queue, which does a -1 on this
> value, and updates that value into queue_mapping.  Hence it will
> not cross the txq boundary.  Drivers can use the queue_map value
> directly without requiring to subtract.

When using pktgen on real physical hardware, none of the skb_tx_hash() or
dev_queue_xmit() logic is called, just the hard_start_xmit method.  That is
why it fails to update the proper queue with his first patch.  On virtual
devices like mac-vlans, the logic probably worked ok, since they go through
dev_queue_xmit().

Thanks,
Ben
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 86acdba..bbe4b2d 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2554,7 +2554,6 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	__be16 *vlan_encapsulated_proto = NULL;  /* packet type ID field (or len) for VLAN tag */
 	__be16 *svlan_tci = NULL;                /* Encapsulates priority and SVLAN ID */
 	__be16 *svlan_encapsulated_proto = NULL; /* packet type ID field (or len) for SVLAN tag */
-	u16 queue_map;

 	if (pkt_dev->nr_labels)
 		protocol = htons(ETH_P_MPLS_UC);
@@ -2565,7 +2564,6 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	/* Update any of the values, used when we're incrementing various
 	 * fields.
 	 */
-	queue_map = pkt_dev->cur_queue_map;
 	mod_cur_headers(pkt_dev);

 	datalen = (odev->hard_header_len + 16) & ~0xf;
@@ -2605,7 +2603,6 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	skb->network_header = skb->tail;
 	skb->transport_header = skb->network_header + sizeof(struct iphdr);
 	skb_put(skb, sizeof(struct iphdr) + sizeof(struct udphdr));
-	skb_set_queue_mapping(skb, queue_map);
 	iph = ip_hdr(skb);
 	udph = udp_hdr(skb);

@@ -2896,7 +2893,6 @@ static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	__be16 *vlan_encapsulated_proto = NULL;  /* packet type ID field (or len) for VLAN tag */
 	__be16 *svlan_tci = NULL;                /* Encapsulates priority and SVLAN ID */
 	__be16 *svlan_encapsulated_proto = NULL; /* packet type ID field (or len) for SVLAN tag */
-	u16 queue_map;

 	if (pkt_dev->nr_labels)
 		protocol = htons(ETH_P_MPLS_UC);
@@ -2907,7 +2903,6 @@ static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	/* Update any of the values, used when we're incrementing various
 	 * fields.
 	 */
-	queue_map = pkt_dev->cur_queue_map;
 	mod_cur_headers(pkt_dev);

 	skb = __netdev_alloc_skb(odev,
@@ -2946,7 +2941,6 @@ static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	skb->network_header = skb->tail;
 	skb->transport_header = skb->network_header + sizeof(struct ipv6hdr);
 	skb_put(skb, sizeof(struct ipv6hdr) + sizeof(struct udphdr));
-	skb_set_queue_mapping(skb, queue_map);
 	iph = ipv6_hdr(skb);
 	udph = udp_hdr(skb);

@@ -3437,7 +3431,13 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
 	if (pkt_dev->delay && pkt_dev->last_ok)
 		spin(pkt_dev, pkt_dev->next_tx);

-	queue_map = skb_get_queue_mapping(pkt_dev->skb);
+	queue_map = pkt_dev->cur_queue_map;
+	/*
+	 * tells skb_tx_hash() to use this tx queue.
+	 * We should reset skb->mapping before each xmit() because
+	 * xmit() might change it.
+	 */
+	skb_record_rx_queue(pkt_dev->skb, queue_map);
 	txq = netdev_get_tx_queue(odev, queue_map);

 	__netif_tx_lock_bh(txq);