Message ID | 20220117155715.29431-2-tim.gardner@canonical.com |
---|---|
State | New |
Headers | show |
Series | [hirsute/linux] veth: Do not record rx queue hint in veth_xmit | expand |
On 17.01.22 16:57, Tim Gardner wrote: > From: Daniel Borkmann <daniel@iogearbox.net> > > BugLink: https://bugs.launchpad.net/bugs/1958155 > > commit 710ad98c363a66a0cd8526465426c5c5f8377ee0 upstream. > > Laurent reported that they have seen a significant amount of TCP retransmissions > at high throughput from applications residing in network namespaces talking to > the outside world via veths. The drops were seen on the qdisc layer (fq_codel, > as per systemd default) of the phys device such as ena or virtio_net due to all > traffic hitting a _single_ TX queue _despite_ multi-queue device. (Note that the > setup was _not_ using XDP on veths as the issue is generic.) > > More specifically, after edbea9220251 ("veth: Store queue_mapping independently > of XDP prog presence") which made it all the way back to v4.19.184+, > skb_record_rx_queue() would set skb->queue_mapping to 1 (given 1 RX and 1 TX > queue by default for veths) instead of leaving at 0. > > This is eventually retained and callbacks like ena_select_queue() will also pick > single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic > is forwarded to that device via upper stack or other means. Similarly, for others > not implementing ndo_select_queue() if XPS is disabled, netdev_pick_tx() might > call into the skb_tx_hash() and check for prior skb_rx_queue_recorded() as well. > > In general, it is a _bad_ idea for virtual devices like veth to mess around with > queue selection [by default]. Given dev->real_num_tx_queues is by default 1, > the skb->queue_mapping was left untouched, and so prior to edbea9220251 the > netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device. > > Unbreak this and restore prior behavior by removing the skb_record_rx_queue() > from veth_xmit() altogether. > > If the veth peer has an XDP program attached, then it would return the first RX > queue index in xdp_md->rx_queue_index (unless configured in non-default manner). > However, this is still better than breaking the generic case. > > Fixes: edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence") > Fixes: 638264dc9022 ("veth: Support per queue XDP ring") > Reported-by: Laurent Bernaille <laurent.bernaille@datadoghq.com> > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > Cc: Toshiaki Makita <toshiaki.makita1@gmail.com> > Cc: Eric Dumazet <eric.dumazet@gmail.com> > Cc: Paolo Abeni <pabeni@redhat.com> > Cc: John Fastabend <john.fastabend@gmail.com> > Cc: Willem de Bruijn <willemb@google.com> > Acked-by: John Fastabend <john.fastabend@gmail.com> > Reviewed-by: Eric Dumazet <edumazet@google.com> > Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > (cherry picked from commit 5f76445a31b79be85b337cce5f35affb77cc18fc linux-stable linux-5.10.y) > Signed-off-by: Tim Gardner <tim.gardner@canonical.com> > --- Applied to focal:linux-hwe-5.11/hwe-5.11-next (Hirsute EOL). Thanks. -Stefan > drivers/net/veth.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > index 36abe756282e..0fc3210ca69c 100644 > --- a/drivers/net/veth.c > +++ b/drivers/net/veth.c > @@ -301,7 +301,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) > if (rxq < rcv->real_num_rx_queues) { > rq = &rcv_priv->rq[rxq]; > rcv_xdp = rcu_access_pointer(rq->xdp_prog); > - skb_record_rx_queue(skb, rxq); > } > > skb_tx_timestamp(skb);
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 36abe756282e..0fc3210ca69c 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -301,7 +301,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev) if (rxq < rcv->real_num_rx_queues) { rq = &rcv_priv->rq[rxq]; rcv_xdp = rcu_access_pointer(rq->xdp_prog); - skb_record_rx_queue(skb, rxq); } skb_tx_timestamp(skb);