Message ID: 1469841772-119013-7-git-send-email-diproiettod@vmware.com
State: Superseded
> Hi Daniele. Thanks for posting this.

Hi Ilya,

I actually implemented this patch as part of Daniele's MTU patchset, based on
my earlier patch - Daniele mainly rebased it to head of master :)

Thanks for your feedback - I've responded inline.

Cheers,
Mark

> I have almost the same patch in my local branch.
>
> I didn't test this with physical DPDK NICs yet, but I have a few
> high-level comments:
>
> 1. Have you thought about renaming 'mtu_request' inside netdev-dpdk
>    to 'requested_mtu'? I think this would be clearer and more
>    consistent with other configurable parameters (n_rxq, n_txq, ...).

'mtu_request' was the name suggested by Daniele, following a discussion with
colleagues. I don't have strong feelings either way, so I'll leave Daniele to
comment.

> 2. I'd prefer not to fail reconfiguration if there is not enough memory
>    for the new mempool. I think it'll be a common situation that we
>    request more memory than we have. Failure leads to destruction
>    of the port and inability to reconnect to a vhost-user port after
>    re-creation if vhost is in server mode. We can just keep the old
>    mempool and inform the user via VLOG_ERR.

Agreed - I'll modify V2 accordingly.

> 3. Minor issues inline.

Comments on these inline also.

> What do you think?
>
> Best regards, Ilya Maximets.
>
> On 30.07.2016 04:22, Daniele Di Proietto wrote:
>> From: Mark Kavanagh <mark.b.kavanagh@intel.com>
>>
>> Add support for Jumbo Frames to DPDK-enabled port types,
>> using single-segment-mbufs.
>>
>> Using this approach, the amount of memory allocated to each mbuf
>> to store frame data is increased to a value greater than 1518B
>> (typical Ethernet maximum frame length). The increased space
>> available in the mbuf means that an entire Jumbo Frame of a specific
>> size can be carried in a single mbuf, as opposed to partitioning
>> it across multiple mbuf segments.
>> >> The amount of space allocated to each mbuf to hold frame data is >> defined dynamically by the user with ovs-vsctl, via the 'mtu_request' >> parameter. >> >> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> >> [diproiettod@vmware.com rebased] >> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> >> --- >> INSTALL.DPDK-ADVANCED.md | 59 +++++++++++++++++- >> INSTALL.DPDK.md | 1 - >> NEWS | 1 + >> lib/netdev-dpdk.c | 151 +++++++++++++++++++++++++++++++++++++++-------- >> 4 files changed, 185 insertions(+), 27 deletions(-) >> >> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md >> index 191e69e..5cd64bf 100755 >> --- a/INSTALL.DPDK-ADVANCED.md >> +++ b/INSTALL.DPDK-ADVANCED.md >> @@ -1,5 +1,5 @@ >> OVS DPDK ADVANCED INSTALL GUIDE >> -================================= >> +=============================== >> >> ## Contents >> >> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE >> 7. [QOS](#qos) >> 8. [Rate Limiting](#rl) >> 9. [Flow Control](#fc) >> -10. [Vsperf](#vsperf) >> +10. [Jumbo Frames](#jumbo) >> +11. [Vsperf](#vsperf) >> >> ## <a name="overview"></a> 1. Overview >> >> @@ -862,7 +863,59 @@ respective parameter. To disable the flow control at tx side, >> >> `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false` >> >> -## <a name="vsperf"></a> 10. Vsperf >> +## <a name="jumbo"></a> 10. Jumbo Frames >> + >> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To >> +enable Jumbo Frames support for a DPDK port, change the Interface's `mtu_request` >> +attribute to a sufficiently large value. >> + >> +e.g. Add a DPDK Phy port with MTU of 9000: >> + >> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 >mtu_request=9000` >> + >> +e.g. 
Change the MTU of an existing port to 6200: >> + >> +`ovs-vsctl set Interface dpdk0 mtu_request=6200` >> + >> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are >> +increased, such that a full Jumbo Frame of a specific size may be accommodated >> +within a single mbuf segment. >> + >> +Jumbo frame support has been validated against 9728B frames (largest frame size >> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames >> +(particularly in use cases involving East-West traffic only), and other DPDK NIC >> +drivers may be supported. >> + >> +### 9.1 vHost Ports and Jumbo Frames >> + >> +Some additional configuration is needed to take advantage of jumbo frames with >> +vhost ports: >> + >> + 1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in >> + the QEMU command line snippet below: >> + >> + ``` >> + '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \' >> + '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on' >> + ``` >> + >> + 2. Where virtio devices are bound to the Linux kernel driver in a guest >> + environment (i.e. interfaces are not bound to an in-guest DPDK driver), >> + the MTU of those logical network interfaces must also be increased to a >> + sufficiently large value. This avoids segmentation of Jumbo Frames >> + received in the guest. Note that 'MTU' refers to the length of the IP >> + packet only, and not that of the entire frame. >> + >> + To calculate the exact MTU of a standard IPv4 frame, subtract the L2 >> + header and CRC lengths (i.e. 18B) from the max supported frame size. >> + So, to set the MTU for a 9018B Jumbo Frame: >> + >> + ``` >> + ifconfig eth1 mtu 9000 >> + ``` >> +>>>>>>> 5ec921d... netdev-dpdk: add support for Jumbo Frames > >Looks like rebasing artefact. Yup - I'll remove in V2. > >> + >> +## <a name="vsperf"></a> 11. 
Vsperf >> >> Vsperf project goal is to develop vSwitch test framework that can be used to >> validate the suitability of different vSwitch implementations in a Telco deployment >> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md >> index 7609aa7..25c79de 100644 >> --- a/INSTALL.DPDK.md >> +++ b/INSTALL.DPDK.md >> @@ -590,7 +590,6 @@ can be found in [Vhost Walkthrough]. >> >> ## <a name="ovslimits"></a> 6. Limitations >> >> - - Supports MTU size 1500, MTU setting for DPDK netdevs will be in future OVS release. >> - Currently DPDK ports does not use HW offload functionality. >> - Network Interface Firmware requirements: >> Each release of DPDK is validated against a specific firmware version for >> diff --git a/NEWS b/NEWS >> index 0ff5616..c004e5f 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -68,6 +68,7 @@ Post-v2.5.0 >> is enabled in DPDK. >> * Basic connection tracking for the userspace datapath (no ALG, >> fragmentation or NAT support yet) >> + * Jumbo frame support >> - Increase number of registers to 16. >> - ovs-benchmark: This utility has been removed due to lack of use and >> bitrot. >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >> index 0b6e410..68639ae 100644 >> --- a/lib/netdev-dpdk.c >> +++ b/lib/netdev-dpdk.c >> @@ -82,6 +82,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); >> + sizeof(struct dp_packet) \ >> + RTE_PKTMBUF_HEADROOM) >> #define NETDEV_DPDK_MBUF_ALIGN 1024 >> +#define NETDEV_DPDK_MAX_PKT_LEN 9728 >> >> /* Max and min number of packets in the mempool. 
OVS tries to allocate a >> * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have >> @@ -336,6 +337,7 @@ struct netdev_dpdk { >> struct ovs_mutex mutex OVS_ACQ_AFTER(dpdk_mutex); >> >> struct dpdk_mp *dpdk_mp; >> + int mtu_request; >> int mtu; >> int socket_id; >> int buf_size; >> @@ -474,10 +476,19 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex) >> dmp->mtu = mtu; >> dmp->refcount = 1; >> mbp_priv.mbuf_data_room_size = MBUF_SIZE(mtu) - sizeof(struct dp_packet); >> - mbp_priv.mbuf_priv_size = sizeof (struct dp_packet) - >> - sizeof (struct rte_mbuf); >> + mbp_priv.mbuf_priv_size = sizeof (struct dp_packet) >> + - sizeof (struct rte_mbuf); >> + /* XXX: this is a really rough method of provisioning memory. >> + * It's impossible to determine what the exact memory requirements are when >> + * the number of ports and rxqs that utilize a particular mempool can change >> + * dynamically at runtime. For the moment, use this rough heurisitic. >> + */ >> + if (mtu >= ETHER_MTU) { >> + mp_size = MAX_NB_MBUF; >> + } else { >> + mp_size = MIN_NB_MBUF; >> + } >> >> - mp_size = MAX_NB_MBUF; >> do { >> if (snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_mp_%d_%d_%u", >> dmp->mtu, dmp->socket_id, mp_size) < 0) { >> @@ -522,6 +533,32 @@ dpdk_mp_put(struct dpdk_mp *dmp) >> #endif >> } >> >> +static int >> +dpdk_mp_configure(struct netdev_dpdk *dev) >> + OVS_REQUIRES(dpdk_mutex) >> + OVS_REQUIRES(dev->mutex) >> +{ >> + uint32_t buf_size = dpdk_buf_size(dev->mtu); >> + struct dpdk_mp *mp_old, *mp; >> + >> + mp_old = dev->dpdk_mp; > >Do we need this variable? We can just put old mempool before >assigning the new one. Agreed - will remove in V2. > >> + >> + mp = dpdk_mp_get(dev->socket_id, FRAME_LEN_TO_MTU(buf_size)); >> + if (!mp) { >> + VLOG_ERR("Insufficient memory to create memory pool for netdev %s\n", >> + dev->up.name); > >+1 space character. Sure - will fix. 
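For reference, the provisioning heuristic quoted above can be sketched in
isolation as below. The pool-size constants are illustrative stand-ins for the
`MAX_NB_MBUF`/`MIN_NB_MBUF` values in `lib/netdev-dpdk.c`, and `next_mp_size()`
models the halving retry that the surrounding do/while performs on `ENOMEM` -
it is a sketch of the idea, not the patch's exact code:

```c
#include <assert.h>

#define ETHER_MTU   1500
#define MAX_NB_MBUF (4096 * 64)  /* illustrative pool sizes; the real   */
#define MIN_NB_MBUF (4096 * 4)   /* constants live in lib/netdev-dpdk.c */

/* Initial mempool size per the heuristic above: any MTU at or above the
 * standard Ethernet MTU gets the large pool. */
static unsigned
initial_mp_size(int mtu)
{
    return mtu >= ETHER_MTU ? MAX_NB_MBUF : MIN_NB_MBUF;
}

/* On allocation failure the caller retries with a smaller pool; a halving
 * schedule like this one bottoms out at MIN_NB_MBUF. */
static unsigned
next_mp_size(unsigned mp_size)
{
    return mp_size / 2 >= MIN_NB_MBUF ? mp_size / 2 : MIN_NB_MBUF;
}
```

Note that with `mtu >= ETHER_MTU` the default 1500B MTU also selects the large
pool; only sub-standard MTUs start from the small one.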
> >> + return ENOMEM; >> + } >> + >> + dev->dpdk_mp = mp; >> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >> + >> + dpdk_mp_put(mp_old); >> + >> + return 0; >> +} >> + >> + >> static void >> check_link_status(struct netdev_dpdk *dev) >> { >> @@ -573,7 +610,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int >n_txq) >> { >> int diag = 0; >> int i; >> + struct rte_eth_conf conf = port_conf; >> >> + if (dev->mtu > ETHER_MTU) { >> + conf.rxmode.jumbo_frame = 1; >> + conf.rxmode.max_rx_pkt_len = dev->max_packet_len; >> + } else { >> + conf.rxmode.jumbo_frame = 0; >> + conf.rxmode.max_rx_pkt_len = 0; > >I know, it was implemented this way in the original patch, but I'm >not sure that all DPDK drivers will handle zero value of >'max_rx_pkt_len' in a right way. Out of interest, can you point to any examples of DPDK drivers handling max_rx_pkt_len = 0 with unintended behaviour? This field should only be of relevance if rxmode.jumbo_frame = 1, and 0 is its default value; if it is an issue though, I can just default instead to ETHER_MAX_LEN. 
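The `ETHER_MAX_LEN` fallback floated above could look like the sketch below.
The struct is a stand-in for the `rxmode` fields of DPDK's `struct
rte_eth_conf` (names mirror DPDK's, but this is not the real struct), and the
else-branch is the proposed alternative, not what the posted patch does:

```c
#include <assert.h>

#define ETHER_MTU     1500
#define ETHER_MAX_LEN 1518  /* 1500B MTU + 14B Ethernet header + 4B CRC */

/* Sketch of the rxmode fields touched by the hunk above. */
struct rxmode {
    unsigned short jumbo_frame;
    unsigned int   max_rx_pkt_len;
};

/* Rather than leaving max_rx_pkt_len at 0 for standard-MTU ports, default
 * it to ETHER_MAX_LEN so a driver that reads the field even when
 * jumbo_frame == 0 still sees a sane bound. */
static void
configure_rxmode(struct rxmode *rx, int mtu, unsigned int max_packet_len)
{
    if (mtu > ETHER_MTU) {
        rx->jumbo_frame = 1;
        rx->max_rx_pkt_len = max_packet_len;
    } else {
        rx->jumbo_frame = 0;
        rx->max_rx_pkt_len = ETHER_MAX_LEN;
    }
}
```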
> >> + } >> /* A device may report more queues than it makes available (this has >> * been observed for Intel xl710, which reserves some of them for >> * SRIOV): rte_eth_*_queue_setup will fail if a queue is not >> @@ -584,8 +629,10 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int >n_txq) >> VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq); >> } >> >> - diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf); >> + diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf); >> if (diag) { >> + VLOG_WARN("Interface %s eth_dev setup error %s\n", >> + dev->up.name, rte_strerror(-diag)); >> break; >> } >> >> @@ -738,7 +785,6 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no, >> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> int sid; >> int err = 0; >> - uint32_t buf_size; >> >> ovs_mutex_init(&dev->mutex); >> ovs_mutex_lock(&dev->mutex); >> @@ -759,13 +805,11 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no, >> dev->port_id = port_no; >> dev->type = type; >> dev->flags = 0; >> - dev->mtu = ETHER_MTU; >> + dev->mtu_request = dev->mtu = ETHER_MTU; >> dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >> >> - buf_size = dpdk_buf_size(dev->mtu); >> - dev->dpdk_mp = dpdk_mp_get(dev->socket_id, FRAME_LEN_TO_MTU(buf_size)); >> - if (!dev->dpdk_mp) { >> - err = ENOMEM; >> + err = dpdk_mp_configure(dev); >> + if (err) { >> goto unlock; >> } >> >> @@ -986,6 +1030,7 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args) >> smap_add_format(args, "configured_rx_queues", "%d", netdev->n_rxq); >> smap_add_format(args, "requested_tx_queues", "%d", dev->requested_n_txq); >> smap_add_format(args, "configured_tx_queues", "%d", netdev->n_txq); >> + smap_add_format(args, "mtu", "%d", dev->mtu); >> ovs_mutex_unlock(&dev->mutex); >> >> return 0; >> @@ -1362,6 +1407,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid, >> struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts; >> 
unsigned int total_pkts = cnt; >> unsigned int qos_pkts = cnt; >> + unsigned int mtu_dropped = 0; >> int retries = 0; >> >> qid = dev->tx_q[qid % netdev->n_txq].map; >> @@ -1383,25 +1429,41 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid, >> do { >> int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ; >> unsigned int tx_pkts; >> + unsigned int try_tx_pkts = cnt; >> >> + for (unsigned int i = 0; i < cnt; i++) { >> + if (cur_pkts[i]->pkt_len > dev->max_packet_len) { >> + try_tx_pkts = i; >> + break; >> + } >> + } >> + if (!try_tx_pkts) { >> + cur_pkts++; >> + mtu_dropped++; >> + cnt--; >> + continue; >> + } >> tx_pkts = rte_vhost_enqueue_burst(virtio_dev, vhost_qid, >> - cur_pkts, cnt); >> + cur_pkts, try_tx_pkts); >> if (OVS_LIKELY(tx_pkts)) { >> /* Packets have been sent.*/ >> cnt -= tx_pkts; >> /* Prepare for possible retry.*/ >> cur_pkts = &cur_pkts[tx_pkts]; >> + if (tx_pkts != try_tx_pkts) { >> + retries++; >> + } >> } else { >> /* No packets sent - do not retry.*/ >> break; >> } >> - } while (cnt && (retries++ < VHOST_ENQ_RETRY_NUM)); >> + } while (cnt && (retries <= VHOST_ENQ_RETRY_NUM)); >> >> rte_spinlock_unlock(&dev->tx_q[qid].tx_lock); >> >> rte_spinlock_lock(&dev->stats_lock); >> - cnt += qos_pkts; >> - netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts, cnt); >> + netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts, >> + cnt + mtu_dropped + qos_pkts); >> rte_spinlock_unlock(&dev->stats_lock); >> >> out: >> @@ -1635,6 +1697,26 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup) >> } >> >> static int >> +netdev_dpdk_set_mtu(struct netdev *netdev, int mtu) >> +{ >> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> + >> + if (MTU_TO_FRAME_LEN(mtu) > NETDEV_DPDK_MAX_PKT_LEN) { >> + VLOG_WARN("Unsupported MTU (%d)\n", mtu); >> + return EINVAL; >> + } >> + >> + ovs_mutex_lock(&dev->mutex); >> + if (dev->mtu_request != mtu) { >> + dev->mtu_request = mtu; >> + netdev_request_reconfigure(netdev); >> + } >> + 
ovs_mutex_unlock(&dev->mutex); >> + >> + return 0; >> +} >> + >> +static int >> netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier); >> >> static int >> @@ -2787,7 +2869,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev) >> ovs_mutex_lock(&dev->mutex); >> >> if (netdev->n_txq == dev->requested_n_txq >> - && netdev->n_rxq == dev->requested_n_rxq) { >> + && netdev->n_rxq == dev->requested_n_rxq >> + && dev->mtu == dev->mtu_request) { >> /* Reconfiguration is unnecessary */ >> >> goto out; >> @@ -2795,6 +2878,14 @@ netdev_dpdk_reconfigure(struct netdev *netdev) >> >> rte_eth_dev_stop(dev->port_id); >> >> + if (dev->mtu != dev->mtu_request) { >> + dev->mtu = dev->mtu_request; >> + err = dpdk_mp_configure(dev); >> + if (err) { >> + goto out; >> + } >> + } >> + >> netdev->n_txq = dev->requested_n_txq; >> netdev->n_rxq = dev->requested_n_rxq; >> >> @@ -2802,6 +2893,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev) >> err = dpdk_eth_dev_init(dev); >> netdev_dpdk_alloc_txq(dev, netdev->n_txq); >> >> + netdev_change_seq_changed(netdev); >> + >> out: >> >> ovs_mutex_unlock(&dev->mutex); >> @@ -2830,20 +2923,23 @@ netdev_dpdk_vhost_user_reconfigure(struct netdev *netdev) >> >> netdev_dpdk_remap_txqs(dev); >> >> - if (dev->requested_socket_id != dev->socket_id) { >> + if (dev->requested_socket_id != dev->socket_id >> + || dev->mtu_request != dev->mtu) { >> dev->socket_id = dev->requested_socket_id; >> - /* Change mempool to new NUMA Node */ >> - dpdk_mp_put(dev->dpdk_mp); >> - dev->dpdk_mp = dpdk_mp_get(dev->socket_id, dev->mtu); >> - if (!dev->dpdk_mp) { >> - err = ENOMEM; >> + dev->mtu = dev->mtu_request; >> + /* Change mempool to new NUMA Node and to new MTU. 
*/ >> + err = dpdk_mp_configure(dev); >> + if (err) { >> + goto out; >> } >> + netdev_change_seq_changed(netdev); >> } >> >> if (virtio_dev) { >> virtio_dev->flags |= VIRTIO_DEV_RUNNING; >> } >> >> +out: >> ovs_mutex_unlock(&dev->mutex); >> ovs_mutex_unlock(&dpdk_mutex); >> >> @@ -2854,6 +2950,7 @@ static int >> netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev) >> { >> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >> + int err = 0; >> >> ovs_mutex_lock(&dpdk_mutex); >> ovs_mutex_lock(&dev->mutex); >> @@ -2861,10 +2958,18 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev) >> netdev->n_txq = dev->requested_n_txq; >> netdev->n_rxq = 1; >> >> + if (dev->mtu_request != dev->mtu) { >> + /* Change mempool to new MTU. */ >> + err = dpdk_mp_configure(dev); >> + if (!err) { >> + netdev_change_seq_changed(netdev); >> + } >> + } >> + >> ovs_mutex_unlock(&dev->mutex); >> ovs_mutex_unlock(&dpdk_mutex); >> >> - return 0; >> + return err; >> } >> >> #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \ >> @@ -2898,7 +3003,7 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev) >> netdev_dpdk_set_etheraddr, \ >> netdev_dpdk_get_etheraddr, \ >> netdev_dpdk_get_mtu, \ >> - NULL, /* set_mtu */ \ >> + netdev_dpdk_set_mtu, \ >> netdev_dpdk_get_ifindex, \ >> GET_CARRIER, \ >> netdev_dpdk_get_carrier_resets, \ >>
Hi, Mark.

On 03.08.2016 15:14, Kavanagh, Mark B wrote:
>> Hi Daniele. Thanks for posting this.
>
> Hi Ilya,
>
> I actually implemented this patch as part of Daniele's MTU patchset, based
> on my earlier patch - Daniele mainly rebased it to head of master :)
>
> Thanks for your feedback - I've responded inline.
>
> Cheers,
> Mark
>
>> I have almost the same patch in my local branch.
>>
>> I didn't test this with physical DPDK NICs yet, but I have a few
>> high-level comments:
>>
>> 1. Have you thought about renaming 'mtu_request' inside netdev-dpdk
>>    to 'requested_mtu'? I think this would be clearer and more
>>    consistent with other configurable parameters (n_rxq, n_txq, ...).
>
> 'mtu_request' was the name suggested by Daniele, following a discussion
> with colleagues. I don't have strong feelings either way, so I'll leave
> Daniele to comment.

I meant only renaming 'netdev_dpdk->mtu_request' to
'netdev_dpdk->requested_mtu'. The database column should be 'mtu_request'
as it is now.

>> 2. I'd prefer not to fail reconfiguration if there is not enough memory
>>    for the new mempool. I think it'll be a common situation that we
>>    request more memory than we have. Failure leads to destruction
>>    of the port and inability to reconnect to a vhost-user port after
>>    re-creation if vhost is in server mode. We can just keep the old
>>    mempool and inform the user via VLOG_ERR.
>>
> Agreed - I'll modify V2 accordingly.
>
>> 3. Minor issues inline.
>
> Comments on these inline also.
>
>> What do you think?
>>
>> Best regards, Ilya Maximets.
>>
>> On 30.07.2016 04:22, Daniele Di Proietto wrote:
>>> From: Mark Kavanagh <mark.b.kavanagh@intel.com>
>>>
>>> Add support for Jumbo Frames to DPDK-enabled port types,
>>> using single-segment-mbufs.
>>>
>>> Using this approach, the amount of memory allocated to each mbuf
>>> to store frame data is increased to a value greater than 1518B
>>> (typical Ethernet maximum frame length).
The increased space >>> available in the mbuf means that an entire Jumbo Frame of a specific >>> size can be carried in a single mbuf, as opposed to partitioning >>> it across multiple mbuf segments. >>> >>> The amount of space allocated to each mbuf to hold frame data is >>> defined dynamically by the user with ovs-vsctl, via the 'mtu_request' >>> parameter. >>> >>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com> >>> [diproiettod@vmware.com rebased] >>> Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> >>> --- >>> INSTALL.DPDK-ADVANCED.md | 59 +++++++++++++++++- >>> INSTALL.DPDK.md | 1 - >>> NEWS | 1 + >>> lib/netdev-dpdk.c | 151 +++++++++++++++++++++++++++++++++++++++-------- >>> 4 files changed, 185 insertions(+), 27 deletions(-) >>> >>> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md >>> index 191e69e..5cd64bf 100755 >>> --- a/INSTALL.DPDK-ADVANCED.md >>> +++ b/INSTALL.DPDK-ADVANCED.md >>> @@ -1,5 +1,5 @@ >>> OVS DPDK ADVANCED INSTALL GUIDE >>> -================================= >>> +=============================== >>> >>> ## Contents >>> >>> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE >>> 7. [QOS](#qos) >>> 8. [Rate Limiting](#rl) >>> 9. [Flow Control](#fc) >>> -10. [Vsperf](#vsperf) >>> +10. [Jumbo Frames](#jumbo) >>> +11. [Vsperf](#vsperf) >>> >>> ## <a name="overview"></a> 1. Overview >>> >>> @@ -862,7 +863,59 @@ respective parameter. To disable the flow control at tx side, >>> >>> `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false` >>> >>> -## <a name="vsperf"></a> 10. Vsperf >>> +## <a name="jumbo"></a> 10. Jumbo Frames >>> + >>> +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To >>> +enable Jumbo Frames support for a DPDK port, change the Interface's `mtu_request` >>> +attribute to a sufficiently large value. >>> + >>> +e.g. 
Add a DPDK Phy port with MTU of 9000: >>> + >>> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 >> mtu_request=9000` >>> + >>> +e.g. Change the MTU of an existing port to 6200: >>> + >>> +`ovs-vsctl set Interface dpdk0 mtu_request=6200` >>> + >>> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are >>> +increased, such that a full Jumbo Frame of a specific size may be accommodated >>> +within a single mbuf segment. >>> + >>> +Jumbo frame support has been validated against 9728B frames (largest frame size >>> +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames >>> +(particularly in use cases involving East-West traffic only), and other DPDK NIC >>> +drivers may be supported. >>> + >>> +### 9.1 vHost Ports and Jumbo Frames >>> + >>> +Some additional configuration is needed to take advantage of jumbo frames with >>> +vhost ports: >>> + >>> + 1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in >>> + the QEMU command line snippet below: >>> + >>> + ``` >>> + '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \' >>> + '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on' >>> + ``` >>> + >>> + 2. Where virtio devices are bound to the Linux kernel driver in a guest >>> + environment (i.e. interfaces are not bound to an in-guest DPDK driver), >>> + the MTU of those logical network interfaces must also be increased to a >>> + sufficiently large value. This avoids segmentation of Jumbo Frames >>> + received in the guest. Note that 'MTU' refers to the length of the IP >>> + packet only, and not that of the entire frame. >>> + >>> + To calculate the exact MTU of a standard IPv4 frame, subtract the L2 >>> + header and CRC lengths (i.e. 18B) from the max supported frame size. >>> + So, to set the MTU for a 9018B Jumbo Frame: >>> + >>> + ``` >>> + ifconfig eth1 mtu 9000 >>> + ``` >>> +>>>>>>> 5ec921d... 
netdev-dpdk: add support for Jumbo Frames >> >> Looks like rebasing artefact. > > Yup - I'll remove in V2. > >> >>> + >>> +## <a name="vsperf"></a> 11. Vsperf >>> >>> Vsperf project goal is to develop vSwitch test framework that can be used to >>> validate the suitability of different vSwitch implementations in a Telco deployment >>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md >>> index 7609aa7..25c79de 100644 >>> --- a/INSTALL.DPDK.md >>> +++ b/INSTALL.DPDK.md >>> @@ -590,7 +590,6 @@ can be found in [Vhost Walkthrough]. >>> >>> ## <a name="ovslimits"></a> 6. Limitations >>> >>> - - Supports MTU size 1500, MTU setting for DPDK netdevs will be in future OVS release. >>> - Currently DPDK ports does not use HW offload functionality. >>> - Network Interface Firmware requirements: >>> Each release of DPDK is validated against a specific firmware version for >>> diff --git a/NEWS b/NEWS >>> index 0ff5616..c004e5f 100644 >>> --- a/NEWS >>> +++ b/NEWS >>> @@ -68,6 +68,7 @@ Post-v2.5.0 >>> is enabled in DPDK. >>> * Basic connection tracking for the userspace datapath (no ALG, >>> fragmentation or NAT support yet) >>> + * Jumbo frame support >>> - Increase number of registers to 16. >>> - ovs-benchmark: This utility has been removed due to lack of use and >>> bitrot. >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >>> index 0b6e410..68639ae 100644 >>> --- a/lib/netdev-dpdk.c >>> +++ b/lib/netdev-dpdk.c >>> @@ -82,6 +82,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); >>> + sizeof(struct dp_packet) \ >>> + RTE_PKTMBUF_HEADROOM) >>> #define NETDEV_DPDK_MBUF_ALIGN 1024 >>> +#define NETDEV_DPDK_MAX_PKT_LEN 9728 >>> >>> /* Max and min number of packets in the mempool. 
OVS tries to allocate a >>> * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have >>> @@ -336,6 +337,7 @@ struct netdev_dpdk { >>> struct ovs_mutex mutex OVS_ACQ_AFTER(dpdk_mutex); >>> >>> struct dpdk_mp *dpdk_mp; >>> + int mtu_request; >>> int mtu; >>> int socket_id; >>> int buf_size; >>> @@ -474,10 +476,19 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex) >>> dmp->mtu = mtu; >>> dmp->refcount = 1; >>> mbp_priv.mbuf_data_room_size = MBUF_SIZE(mtu) - sizeof(struct dp_packet); >>> - mbp_priv.mbuf_priv_size = sizeof (struct dp_packet) - >>> - sizeof (struct rte_mbuf); >>> + mbp_priv.mbuf_priv_size = sizeof (struct dp_packet) >>> + - sizeof (struct rte_mbuf); >>> + /* XXX: this is a really rough method of provisioning memory. >>> + * It's impossible to determine what the exact memory requirements are when >>> + * the number of ports and rxqs that utilize a particular mempool can change >>> + * dynamically at runtime. For the moment, use this rough heurisitic. >>> + */ >>> + if (mtu >= ETHER_MTU) { >>> + mp_size = MAX_NB_MBUF; >>> + } else { >>> + mp_size = MIN_NB_MBUF; >>> + } >>> >>> - mp_size = MAX_NB_MBUF; >>> do { >>> if (snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_mp_%d_%d_%u", >>> dmp->mtu, dmp->socket_id, mp_size) < 0) { >>> @@ -522,6 +533,32 @@ dpdk_mp_put(struct dpdk_mp *dmp) >>> #endif >>> } >>> >>> +static int >>> +dpdk_mp_configure(struct netdev_dpdk *dev) >>> + OVS_REQUIRES(dpdk_mutex) >>> + OVS_REQUIRES(dev->mutex) >>> +{ >>> + uint32_t buf_size = dpdk_buf_size(dev->mtu); >>> + struct dpdk_mp *mp_old, *mp; >>> + >>> + mp_old = dev->dpdk_mp; >> >> Do we need this variable? We can just put old mempool before >> assigning the new one. > > Agreed - will remove in V2. > >> >>> + >>> + mp = dpdk_mp_get(dev->socket_id, FRAME_LEN_TO_MTU(buf_size)); >>> + if (!mp) { >>> + VLOG_ERR("Insufficient memory to create memory pool for netdev %s\n", >>> + dev->up.name); >> >> +1 space character. > > Sure - will fix. 
> >> >>> + return ENOMEM; >>> + } >>> + >>> + dev->dpdk_mp = mp; >>> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >>> + >>> + dpdk_mp_put(mp_old); >>> + >>> + return 0; >>> +} >>> + >>> + >>> static void >>> check_link_status(struct netdev_dpdk *dev) >>> { >>> @@ -573,7 +610,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int >> n_txq) >>> { >>> int diag = 0; >>> int i; >>> + struct rte_eth_conf conf = port_conf; >>> >>> + if (dev->mtu > ETHER_MTU) { >>> + conf.rxmode.jumbo_frame = 1; >>> + conf.rxmode.max_rx_pkt_len = dev->max_packet_len; >>> + } else { >>> + conf.rxmode.jumbo_frame = 0; >>> + conf.rxmode.max_rx_pkt_len = 0; >> >> I know, it was implemented this way in the original patch, but I'm >> not sure that all DPDK drivers will handle zero value of >> 'max_rx_pkt_len' in a right way. > > Out of interest, can you point to any examples of DPDK drivers handling max_rx_pkt_len = 0 with unintended behaviour? > This field should only be of relevance if rxmode.jumbo_frame = 1, and 0 is its default value; if it is an issue though, I don't know exactly. One place in DPDK i40e driver confuses me a little. It's function 'i40evf_rxq_init()' in drivers/net/i40e/i40e_ethdev_vf.c . It checks the value of 'rxq->max_pkt_len' if 'dev_conf.rxmode.jumbo_frame == 0'. > I can just default instead to ETHER_MAX_LEN. Will VLANs be handled properly in this case?
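The arithmetic behind the VLAN question above can be sanity-checked as
follows. `frame_len()` is purely illustrative - whether a given PMD counts the
802.1Q tag against `max_rx_pkt_len` when `jumbo_frame == 0` is
driver-specific, which is exactly the concern being raised:

```c
#include <assert.h>

#define ETHER_HDR_LEN 14
#define ETHER_CRC_LEN 4
#define VLAN_TAG_LEN  4     /* one 802.1Q tag */
#define ETHER_MAX_LEN 1518  /* 1500B MTU + header + CRC, untagged */

/* On-wire frame length for a given MTU and number of VLAN tags. */
static int
frame_len(int mtu, int n_vlan_tags)
{
    return mtu + ETHER_HDR_LEN + n_vlan_tags * VLAN_TAG_LEN + ETHER_CRC_LEN;
}
```

A single-tagged standard frame is 1522B - 4B over `ETHER_MAX_LEN` - so a
driver that enforces the limit strictly could drop tagged traffic; DPDK
defines a separate `ETHER_MAX_VLAN_FRAME_LEN` (1522) for this reason.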
diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md index 191e69e..5cd64bf 100755 --- a/INSTALL.DPDK-ADVANCED.md +++ b/INSTALL.DPDK-ADVANCED.md @@ -1,5 +1,5 @@ OVS DPDK ADVANCED INSTALL GUIDE -================================= +=============================== ## Contents @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE 7. [QOS](#qos) 8. [Rate Limiting](#rl) 9. [Flow Control](#fc) -10. [Vsperf](#vsperf) +10. [Jumbo Frames](#jumbo) +11. [Vsperf](#vsperf) ## <a name="overview"></a> 1. Overview @@ -862,7 +863,59 @@ respective parameter. To disable the flow control at tx side, `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false` -## <a name="vsperf"></a> 10. Vsperf +## <a name="jumbo"></a> 10. Jumbo Frames + +By default, DPDK ports are configured with standard Ethernet MTU (1500B). To +enable Jumbo Frames support for a DPDK port, change the Interface's `mtu_request` +attribute to a sufficiently large value. + +e.g. Add a DPDK Phy port with MTU of 9000: + +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0 mtu_request=9000` + +e.g. Change the MTU of an existing port to 6200: + +`ovs-vsctl set Interface dpdk0 mtu_request=6200` + +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are +increased, such that a full Jumbo Frame of a specific size may be accommodated +within a single mbuf segment. + +Jumbo frame support has been validated against 9728B frames (largest frame size +supported by Fortville NIC), using the DPDK `i40e` driver, but larger frames +(particularly in use cases involving East-West traffic only), and other DPDK NIC +drivers may be supported. + +### 9.1 vHost Ports and Jumbo Frames + +Some additional configuration is needed to take advantage of jumbo frames with +vhost ports: + + 1. 
`mergeable buffers` must be enabled for vHost ports, as demonstrated in + the QEMU command line snippet below: + + ``` + '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \' + '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on' + ``` + + 2. Where virtio devices are bound to the Linux kernel driver in a guest + environment (i.e. interfaces are not bound to an in-guest DPDK driver), + the MTU of those logical network interfaces must also be increased to a + sufficiently large value. This avoids segmentation of Jumbo Frames + received in the guest. Note that 'MTU' refers to the length of the IP + packet only, and not that of the entire frame. + + To calculate the exact MTU of a standard IPv4 frame, subtract the L2 + header and CRC lengths (i.e. 18B) from the max supported frame size. + So, to set the MTU for a 9018B Jumbo Frame: + + ``` + ifconfig eth1 mtu 9000 + ``` +>>>>>>> 5ec921d... netdev-dpdk: add support for Jumbo Frames + +## <a name="vsperf"></a> 11. Vsperf Vsperf project goal is to develop vSwitch test framework that can be used to validate the suitability of different vSwitch implementations in a Telco deployment diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index 7609aa7..25c79de 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -590,7 +590,6 @@ can be found in [Vhost Walkthrough]. ## <a name="ovslimits"></a> 6. Limitations - - Supports MTU size 1500, MTU setting for DPDK netdevs will be in future OVS release. - Currently DPDK ports does not use HW offload functionality. - Network Interface Firmware requirements: Each release of DPDK is validated against a specific firmware version for diff --git a/NEWS b/NEWS index 0ff5616..c004e5f 100644 --- a/NEWS +++ b/NEWS @@ -68,6 +68,7 @@ Post-v2.5.0 is enabled in DPDK. * Basic connection tracking for the userspace datapath (no ALG, fragmentation or NAT support yet) + * Jumbo frame support - Increase number of registers to 16. 
- ovs-benchmark: This utility has been removed due to lack of use and bitrot. diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 0b6e410..68639ae 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -82,6 +82,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); + sizeof(struct dp_packet) \ + RTE_PKTMBUF_HEADROOM) #define NETDEV_DPDK_MBUF_ALIGN 1024 +#define NETDEV_DPDK_MAX_PKT_LEN 9728 /* Max and min number of packets in the mempool. OVS tries to allocate a * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have @@ -336,6 +337,7 @@ struct netdev_dpdk { struct ovs_mutex mutex OVS_ACQ_AFTER(dpdk_mutex); struct dpdk_mp *dpdk_mp; + int mtu_request; int mtu; int socket_id; int buf_size; @@ -474,10 +476,19 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex) dmp->mtu = mtu; dmp->refcount = 1; mbp_priv.mbuf_data_room_size = MBUF_SIZE(mtu) - sizeof(struct dp_packet); - mbp_priv.mbuf_priv_size = sizeof (struct dp_packet) - - sizeof (struct rte_mbuf); + mbp_priv.mbuf_priv_size = sizeof (struct dp_packet) + - sizeof (struct rte_mbuf); + /* XXX: this is a really rough method of provisioning memory. + * It's impossible to determine what the exact memory requirements are when + * the number of ports and rxqs that utilize a particular mempool can change + * dynamically at runtime. For the moment, use this rough heurisitic. 
+     */
+    if (mtu >= ETHER_MTU) {
+        mp_size = MAX_NB_MBUF;
+    } else {
+        mp_size = MIN_NB_MBUF;
+    }
 
-    mp_size = MAX_NB_MBUF;
     do {
         if (snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_mp_%d_%d_%u",
                      dmp->mtu, dmp->socket_id, mp_size) < 0) {
@@ -522,6 +533,32 @@ dpdk_mp_put(struct dpdk_mp *dmp)
 #endif
 }
 
+static int
+dpdk_mp_configure(struct netdev_dpdk *dev)
+    OVS_REQUIRES(dpdk_mutex)
+    OVS_REQUIRES(dev->mutex)
+{
+    uint32_t buf_size = dpdk_buf_size(dev->mtu);
+    struct dpdk_mp *mp_old, *mp;
+
+    mp_old = dev->dpdk_mp;
+
+    mp = dpdk_mp_get(dev->socket_id, FRAME_LEN_TO_MTU(buf_size));
+    if (!mp) {
+        VLOG_ERR("Insufficient memory to create memory pool for netdev %s\n",
+                 dev->up.name);
+        return ENOMEM;
+    }
+
+    dev->dpdk_mp = mp;
+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+
+    dpdk_mp_put(mp_old);
+
+    return 0;
+}
+
+
 static void
 check_link_status(struct netdev_dpdk *dev)
 {
@@ -573,7 +610,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 {
     int diag = 0;
     int i;
+    struct rte_eth_conf conf = port_conf;
 
+    if (dev->mtu > ETHER_MTU) {
+        conf.rxmode.jumbo_frame = 1;
+        conf.rxmode.max_rx_pkt_len = dev->max_packet_len;
+    } else {
+        conf.rxmode.jumbo_frame = 0;
+        conf.rxmode.max_rx_pkt_len = 0;
+    }
     /* A device may report more queues than it makes available (this has
      * been observed for Intel xl710, which reserves some of them for
      * SRIOV):  rte_eth_*_queue_setup will fail if a queue is not
@@ -584,8 +629,10 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
             VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
         }
 
-        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
+        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
         if (diag) {
+            VLOG_WARN("Interface %s eth_dev setup error %s\n",
+                      dev->up.name, rte_strerror(-diag));
             break;
         }
 
@@ -738,7 +785,6 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
     int sid;
     int err = 0;
-    uint32_t buf_size;
 
     ovs_mutex_init(&dev->mutex);
     ovs_mutex_lock(&dev->mutex);
@@ -759,13 +805,11 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
     dev->port_id = port_no;
     dev->type = type;
     dev->flags = 0;
-    dev->mtu = ETHER_MTU;
+    dev->mtu_request = dev->mtu = ETHER_MTU;
     dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
 
-    buf_size = dpdk_buf_size(dev->mtu);
-    dev->dpdk_mp = dpdk_mp_get(dev->socket_id, FRAME_LEN_TO_MTU(buf_size));
-    if (!dev->dpdk_mp) {
-        err = ENOMEM;
+    err = dpdk_mp_configure(dev);
+    if (err) {
         goto unlock;
     }
 
@@ -986,6 +1030,7 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
     smap_add_format(args, "configured_rx_queues", "%d", netdev->n_rxq);
     smap_add_format(args, "requested_tx_queues", "%d", dev->requested_n_txq);
     smap_add_format(args, "configured_tx_queues", "%d", netdev->n_txq);
+    smap_add_format(args, "mtu", "%d", dev->mtu);
     ovs_mutex_unlock(&dev->mutex);
 
     return 0;
@@ -1362,6 +1407,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
     struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
     unsigned int total_pkts = cnt;
     unsigned int qos_pkts = cnt;
+    unsigned int mtu_dropped = 0;
     int retries = 0;
 
     qid = dev->tx_q[qid % netdev->n_txq].map;
@@ -1383,25 +1429,41 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
     do {
         int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
         unsigned int tx_pkts;
+        unsigned int try_tx_pkts = cnt;
 
+        for (unsigned int i = 0; i < cnt; i++) {
+            if (cur_pkts[i]->pkt_len > dev->max_packet_len) {
+                try_tx_pkts = i;
+                break;
+            }
+        }
+        if (!try_tx_pkts) {
+            cur_pkts++;
+            mtu_dropped++;
+            cnt--;
+            continue;
+        }
         tx_pkts = rte_vhost_enqueue_burst(virtio_dev, vhost_qid,
-                                          cur_pkts, cnt);
+                                          cur_pkts, try_tx_pkts);
         if (OVS_LIKELY(tx_pkts)) {
             /* Packets have been sent.*/
             cnt -= tx_pkts;
             /* Prepare for possible retry.*/
             cur_pkts = &cur_pkts[tx_pkts];
+            if (tx_pkts != try_tx_pkts) {
+                retries++;
+            }
         } else {
             /* No packets sent - do not retry.*/
             break;
         }
-    } while (cnt && (retries++ < VHOST_ENQ_RETRY_NUM));
+    } while (cnt && (retries <= VHOST_ENQ_RETRY_NUM));
 
     rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
 
     rte_spinlock_lock(&dev->stats_lock);
-    cnt += qos_pkts;
-    netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts, cnt);
+    netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
+                                         cnt + mtu_dropped + qos_pkts);
     rte_spinlock_unlock(&dev->stats_lock);
 
 out:
@@ -1635,6 +1697,26 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
 }
 
 static int
+netdev_dpdk_set_mtu(struct netdev *netdev, int mtu)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+
+    if (MTU_TO_FRAME_LEN(mtu) > NETDEV_DPDK_MAX_PKT_LEN) {
+        VLOG_WARN("Unsupported MTU (%d)\n", mtu);
+        return EINVAL;
+    }
+
+    ovs_mutex_lock(&dev->mutex);
+    if (dev->mtu_request != mtu) {
+        dev->mtu_request = mtu;
+        netdev_request_reconfigure(netdev);
+    }
+    ovs_mutex_unlock(&dev->mutex);
+
+    return 0;
+}
+
+static int
 netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier);
 
 static int
@@ -2787,7 +2869,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
     ovs_mutex_lock(&dev->mutex);
 
     if (netdev->n_txq == dev->requested_n_txq
-        && netdev->n_rxq == dev->requested_n_rxq) {
+        && netdev->n_rxq == dev->requested_n_rxq
+        && dev->mtu == dev->mtu_request) {
         /* Reconfiguration is unnecessary */
 
         goto out;
@@ -2795,6 +2878,14 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
 
     rte_eth_dev_stop(dev->port_id);
 
+    if (dev->mtu != dev->mtu_request) {
+        dev->mtu = dev->mtu_request;
+        err = dpdk_mp_configure(dev);
+        if (err) {
+            goto out;
+        }
+    }
+
     netdev->n_txq = dev->requested_n_txq;
     netdev->n_rxq = dev->requested_n_rxq;
 
@@ -2802,6 +2893,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
     err = dpdk_eth_dev_init(dev);
     netdev_dpdk_alloc_txq(dev, netdev->n_txq);
 
+    netdev_change_seq_changed(netdev);
+
 out:
 
     ovs_mutex_unlock(&dev->mutex);
@@ -2830,20 +2923,23 @@ netdev_dpdk_vhost_user_reconfigure(struct netdev *netdev)
 
     netdev_dpdk_remap_txqs(dev);
 
-    if (dev->requested_socket_id != dev->socket_id) {
+    if (dev->requested_socket_id != dev->socket_id
+        || dev->mtu_request != dev->mtu) {
         dev->socket_id = dev->requested_socket_id;
-        /* Change mempool to new NUMA Node */
-        dpdk_mp_put(dev->dpdk_mp);
-        dev->dpdk_mp = dpdk_mp_get(dev->socket_id, dev->mtu);
-        if (!dev->dpdk_mp) {
-            err = ENOMEM;
+        dev->mtu = dev->mtu_request;
+        /* Change mempool to new NUMA Node and to new MTU. */
+        err = dpdk_mp_configure(dev);
+        if (err) {
+            goto out;
         }
+        netdev_change_seq_changed(netdev);
     }
 
     if (virtio_dev) {
         virtio_dev->flags |= VIRTIO_DEV_RUNNING;
     }
 
+out:
     ovs_mutex_unlock(&dev->mutex);
     ovs_mutex_unlock(&dpdk_mutex);
 
@@ -2854,6 +2950,7 @@ static int
 netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err = 0;
 
     ovs_mutex_lock(&dpdk_mutex);
     ovs_mutex_lock(&dev->mutex);
@@ -2861,10 +2958,18 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev)
     netdev->n_txq = dev->requested_n_txq;
     netdev->n_rxq = 1;
 
+    if (dev->mtu_request != dev->mtu) {
+        /* Change mempool to new MTU. */
+        err = dpdk_mp_configure(dev);
+        if (!err) {
+            netdev_change_seq_changed(netdev);
+        }
+    }
+
     ovs_mutex_unlock(&dev->mutex);
     ovs_mutex_unlock(&dpdk_mutex);
 
-    return 0;
+    return err;
 }
 
 #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT,    \
@@ -2898,7 +3003,7 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev)
     netdev_dpdk_set_etheraddr,                    \
     netdev_dpdk_get_etheraddr,                    \
     netdev_dpdk_get_mtu,                          \
-    NULL,                       /* set_mtu */     \
+    netdev_dpdk_set_mtu,                          \
     netdev_dpdk_get_ifindex,                      \
     GET_CARRIER,                                  \
     netdev_dpdk_get_carrier_resets,               \