
[ovs-dev] netdev-dpdk: Add Jumbo Frame Support.

Message ID 1447254362-17271-1-git-send-email-mark.b.kavanagh@intel.com
State Changes Requested
Headers show

Commit Message

Mark Kavanagh Nov. 11, 2015, 3:06 p.m. UTC
Add support for Jumbo Frames to DPDK-enabled port types,
using single-segment-mbufs.

Using this approach, the amount of memory allocated for each mbuf
to store frame data is increased to a value greater than 1518B
(typical Ethernet maximum frame length). The increased space
available in the mbuf means that an entire Jumbo Frame can be carried
in a single mbuf, as opposed to partitioning it across multiple mbuf
segments.

The amount of space allocated to each mbuf to hold frame data is
defined by the user at compile time; if this frame length is not a
multiple of the DPDK NIC driver's minimum Rx buffer length, the frame
length is rounded up to the closest value that is.

Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
---
 INSTALL.DPDK.md   |   67 ++++++++++++++++++++-
 lib/netdev-dpdk.c |  176 ++++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 207 insertions(+), 36 deletions(-)
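For illustration, the rounding behaviour described in the commit message can be sketched in C (a minimal sketch, not the patch's actual code; a `min_rx_bufsize` of 1024B, as for `igb_uio`, is assumed in the examples):

```c
#include <stdint.h>

/* Round 'frame_len' up to the nearest multiple of the NIC driver's
 * minimum Rx buffer size, so that an entire frame fits in a single
 * mbuf segment. */
static uint32_t
round_frame_len(uint32_t frame_len, uint32_t min_rx_bufsize)
{
    if (frame_len % min_rx_bufsize == 0) {
        return frame_len;
    }
    return min_rx_bufsize * (frame_len / min_rx_bufsize + 1);
}
```

For example, with a 1024B buffer granularity, a requested frame length of 13000B would be rounded up to 13312B.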

Comments

Flavio Leitner Nov. 17, 2015, 5:25 p.m. UTC | #1
On Wed, Nov 11, 2015 at 03:06:02PM +0000, Mark Kavanagh wrote:
> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
> 
> Using this approach, the amount of memory allocated for each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame can be carried
> in a single mbuf, as opposed to partitioning it across multiple mbuf
> segments.
> 
> The amount of space allocated to each mbuf to hold frame data is
> defined by the user at compile time; if this frame length is not a
> multiple of the DPDK NIC driver's minimum Rx buffer length, the frame
> length is rounded up to the closest value that is.
> 
> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
> ---
>  INSTALL.DPDK.md   |   67 ++++++++++++++++++++-
>  lib/netdev-dpdk.c |  176 ++++++++++++++++++++++++++++++++++++++++++-----------
>  2 files changed, 207 insertions(+), 36 deletions(-)
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 96b686c..9a30f88 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -859,10 +859,70 @@ by adding the following string:
>  to <interface> sections of all network devices used by DPDK. Parameter 'N'
>  determines how many queues can be used by the guest.
>  
> +
> +Jumbo Frames
> +------------
> +
> +Support for Jumbo Frames may be enabled at compile-time for DPDK-type ports.

It seems this could be dynamic and proportional to the MTU being used
by the port and not a compile-time option which depends on the NIC
hardware specs. Perhaps I am missing something.

Thanks,
fbl
Qiu, Michael Nov. 18, 2015, 1:41 a.m. UTC | #2
On 2015/11/18 1:25, Flavio Leitner wrote:
> On Wed, Nov 11, 2015 at 03:06:02PM +0000, Mark Kavanagh wrote:
>> Add support for Jumbo Frames to DPDK-enabled port types,
>> using single-segment-mbufs.
>>
>> Using this approach, the amount of memory allocated for each mbuf
>> to store frame data is increased to a value greater than 1518B
>> (typical Ethernet maximum frame length). The increased space
>> available in the mbuf means that an entire Jumbo Frame can be carried
>> in a single mbuf, as opposed to partitioning it across multiple mbuf
>> segments.
>>
>> The amount of space allocated to each mbuf to hold frame data is
>> defined by the user at compile time; if this frame length is not a
>> multiple of the DPDK NIC driver's minimum Rx buffer length, the frame
>> length is rounded up to the closest value that is.
>>
>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>> ---
>>  INSTALL.DPDK.md   |   67 ++++++++++++++++++++-
>>  lib/netdev-dpdk.c |  176 ++++++++++++++++++++++++++++++++++++++++++-----------
>>  2 files changed, 207 insertions(+), 36 deletions(-)
>>
>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>> index 96b686c..9a30f88 100644
>> --- a/INSTALL.DPDK.md
>> +++ b/INSTALL.DPDK.md
>> @@ -859,10 +859,70 @@ by adding the following string:
>>  to <interface> sections of all network devices used by DPDK. Parameter 'N'
>>  determines how many queues can be used by the guest.
>>  
>> +
>> +Jumbo Frames
>> +------------
>> +
>> +Support for Jumbo Frames may be enabled at compile-time for DPDK-type ports.
> It seems this could be dynamic and proportional to the MTU being used
> by the port and not a compile-time option which depends on the NIC
> hardware specs. Perhaps I am missing something.

It makes sense.

And this patch actually solves a big issue: when I transmit packets of
size 1400B, ovs-vswitchd returns a "Bus error", although 1400B should
not be a Jumbo Frame.

Mark, could this be an option when starting vswitchd with DPDK? That
way, users who want to use Jumbo Frames would not need to recompile
OVS - just a suggestion :)

What's more, we could make it configurable at run time, just as we
dynamically configure the queue numbers; however, that would be a lot
of work, so it could be a future feature.

Thanks,
Michael
> Thanks,
> fbl
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
Mark Kavanagh Nov. 18, 2015, 3:03 p.m. UTC | #3
>On Wed, Nov 11, 2015 at 03:06:02PM +0000, Mark Kavanagh wrote:
>> Add support for Jumbo Frames to DPDK-enabled port types,
>> using single-segment-mbufs.
>>
>> Using this approach, the amount of memory allocated for each mbuf
>> to store frame data is increased to a value greater than 1518B
>> (typical Ethernet maximum frame length). The increased space
>> available in the mbuf means that an entire Jumbo Frame can be carried
>> in a single mbuf, as opposed to partitioning it across multiple mbuf
>> segments.
>>
>> The amount of space allocated to each mbuf to hold frame data is
>> defined by the user at compile time; if this frame length is not a
>> multiple of the DPDK NIC driver's minimum Rx buffer length, the frame
>> length is rounded up to the closest value that is.
>>
>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>> ---
>>  INSTALL.DPDK.md   |   67 ++++++++++++++++++++-
>>  lib/netdev-dpdk.c |  176 ++++++++++++++++++++++++++++++++++++++++++-----------
>>  2 files changed, 207 insertions(+), 36 deletions(-)
>>
>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>> index 96b686c..9a30f88 100644
>> --- a/INSTALL.DPDK.md
>> +++ b/INSTALL.DPDK.md
>> @@ -859,10 +859,70 @@ by adding the following string:
>>  to <interface> sections of all network devices used by DPDK. Parameter 'N'
>>  determines how many queues can be used by the guest.
>>
>> +
>> +Jumbo Frames
>> +------------
>> +
>> +Support for Jumbo Frames may be enabled at compile-time for DPDK-type ports.
>
>It seems this could be dynamic and proportional to the MTU being used
>by the port and not a compile-time option which depends on the NIC
>hardware specs. Perhaps I am missing something.
>

Hi Flavio - thanks for your feedback.

Just to clarify, when you say 'dynamic', I presume that you mean that the MTU can be specified at runtime. Bearing this in mind, is the behavior that you have in mind that:
	- the MTU for each DPDK port should be specified when the port is added to a bridge?
	- and the granularity at which MTUs are assigned to DPDK-type ports should be on a per-port basis, rather than the (admittedly) coarse-grained 'one-for-all' approach implemented here?

It's worth noting that, because DPDK ports are outside the reach of the Linux kernel, their MTU can't be changed using standard tools such as 'ifconfig', 'ip link', etc. Furthermore, the OVS 'Interface' table's 'MTU' attribute cannot currently be set programmatically outside of the aforementioned tools. As such, the MTU of DPDK ports is always limited in how dynamic it can be; once set, the MTU for a DPDK port is (currently) immutable.

I'm not quite sure if I follow the second part of your comment regarding proportionality to the port's MTU - could you elaborate a bit more on this?

Thanks in advance,
Mark

>Thanks,
>fbl
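As background to the MTU discussion above: 'MTU' refers to the length of the IP packet only, so converting between an MTU and the corresponding maximum frame length is a fixed 18B of L2 overhead (14B Ethernet header + 4B CRC). A small sketch, mirroring the patch's MTU_TO_FRAME_LEN/FRAME_LEN_TO_MTU macros:

```c
#include <stdint.h>

#define ETHER_HDR_LEN 14  /* dst MAC + src MAC + ethertype */
#define ETHER_CRC_LEN 4   /* frame check sequence */

/* Maximum on-wire frame length for a given MTU. */
static uint32_t
mtu_to_frame_len(uint32_t mtu)
{
    return mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;
}

/* MTU implied by a given maximum frame length. */
static uint32_t
frame_len_to_mtu(uint32_t frame_len)
{
    return frame_len - ETHER_HDR_LEN - ETHER_CRC_LEN;
}
```

So a standard 1500B MTU corresponds to a 1518B frame, and a 13312B Jumbo Frame corresponds to a 13294B MTU.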
Mark Kavanagh Nov. 18, 2015, 3:13 p.m. UTC | #4
>
>On 2015/11/18 1:25, Flavio Leitner wrote:
>> On Wed, Nov 11, 2015 at 03:06:02PM +0000, Mark Kavanagh wrote:
>>> Add support for Jumbo Frames to DPDK-enabled port types,
>>> using single-segment-mbufs.
>>>
>>> Using this approach, the amount of memory allocated for each mbuf
>>> to store frame data is increased to a value greater than 1518B
>>> (typical Ethernet maximum frame length). The increased space
>>> available in the mbuf means that an entire Jumbo Frame can be carried
>>> in a single mbuf, as opposed to partitioning it across multiple mbuf
>>> segments.
>>>
>>> The amount of space allocated to each mbuf to hold frame data is
>>> defined by the user at compile time; if this frame length is not a
>>> multiple of the DPDK NIC driver's minimum Rx buffer length, the frame
>>> length is rounded up to the closest value that is.
>>>
>>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>>> ---
>>>  INSTALL.DPDK.md   |   67 ++++++++++++++++++++-
>>>  lib/netdev-dpdk.c |  176 ++++++++++++++++++++++++++++++++++++++++++-----------
>>>  2 files changed, 207 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>>> index 96b686c..9a30f88 100644
>>> --- a/INSTALL.DPDK.md
>>> +++ b/INSTALL.DPDK.md
>>> @@ -859,10 +859,70 @@ by adding the following string:
>>>  to <interface> sections of all network devices used by DPDK. Parameter 'N'
>>>  determines how many queues can be used by the guest.
>>>
>>> +
>>> +Jumbo Frames
>>> +------------
>>> +
>>> +Support for Jumbo Frames may be enabled at compile-time for DPDK-type ports.
>> It seems this could be dynamic and proportional to the MTU being used
>> by the port and not a compile-time option which depends on the NIC
>> hardware specs. Perhaps I am missing something.
>
>It makes sense.
>
>And this patch actually solves a big issue: when I transmit packets of
>size 1400B, ovs-vswitchd returns a "Bus error", although 1400B should
>not be a Jumbo Frame.
>

Hi Michael,

Thanks for your feedback; I'm glad to hear that the patch solved that issue for you - I'm curious as to why you experienced the Bus error for a 1400B packet, though.

>Mark, could this be an option when starting vswitchd with DPDK? That
>way, users who want to use Jumbo Frames would not need to recompile
>OVS - just a suggestion :)
>

Agreed - I had intended to implement this functionality further down the line, depending on how this patch was received. I'll add a runtime flag in v2.

>What's more, we could make it configurable at run time, just as we
>dynamically configure the queue numbers; however, that would be a lot
>of work, so it could be a future feature.

Do you mean the 'other_config: n-dpdk-rxqs' field in the Open_vSwitch table?

I'm not sure how amenable a solution like this would be to the maintainers, seeing as there is already an 'MTU' field present in the 'Interfaces' table. Ben/Pravin - any thoughts as to supporting a separate MTU field for DPDK interfaces?

Thanks,
Mark

>
>Thanks,
>Michael
>> Thanks,
>> fbl
>>
Qiu, Michael Nov. 19, 2015, 5:11 a.m. UTC | #5
On 2015/11/18 23:13, Kavanagh, Mark B wrote:
>> On 2015/11/18 1:25, Flavio Leitner wrote:
>>> On Wed, Nov 11, 2015 at 03:06:02PM +0000, Mark Kavanagh wrote:
>>>> Add support for Jumbo Frames to DPDK-enabled port types,
>>>> using single-segment-mbufs.
>>>>
>>>> Using this approach, the amount of memory allocated for each mbuf
>>>> to store frame data is increased to a value greater than 1518B
>>>> (typical Ethernet maximum frame length). The increased space
>>>> available in the mbuf means that an entire Jumbo Frame can be carried
>>>> in a single mbuf, as opposed to partitioning it across multiple mbuf
>>>> segments.
>>>>
>>>> The amount of space allocated to each mbuf to hold frame data is
>>>> defined by the user at compile time; if this frame length is not a
>>>> multiple of the DPDK NIC driver's minimum Rx buffer length, the frame
>>>> length is rounded up to the closest value that is.
>>>>
>>>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>>>> ---
>>>>  INSTALL.DPDK.md   |   67 ++++++++++++++++++++-
>>>>  lib/netdev-dpdk.c |  176 ++++++++++++++++++++++++++++++++++++++++++-----------
>>>>  2 files changed, 207 insertions(+), 36 deletions(-)
>>>>
>>>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>>>> index 96b686c..9a30f88 100644
>>>> --- a/INSTALL.DPDK.md
>>>> +++ b/INSTALL.DPDK.md
>>>> @@ -859,10 +859,70 @@ by adding the following string:
>>>>  to <interface> sections of all network devices used by DPDK. Parameter 'N'
>>>>  determines how many queues can be used by the guest.
>>>>
>>>> +
>>>> +Jumbo Frames
>>>> +------------
>>>> +
>>>> +Support for Jumbo Frames may be enabled at compile-time for DPDK-type ports.
>>> It seems this could be dynamic and proportional to the MTU being used
>>> by the port and not a compile-time option which depends on the NIC
>>> hardware specs. Perhaps I am missing something.
>> It makes sense.
>>
>> And this patch actually solves a big issue: when I transmit packets of
>> size 1400B, ovs-vswitchd returns a "Bus error", although 1400B should
>> not be a Jumbo Frame.
>>
> Hi Michael,
>
> Thanks for your feedback; I'm glad to hear that the patch solved that issue for you - I'm curious as to why you experienced the Bus error for a 1400B packet, though.


I'm debugging it now; it seems DPDK has a bug. I will find out what
modified the mbuf address.


>
>> Mark, could this be an option when starting vswitchd with DPDK? That
>> way, users who want to use Jumbo Frames would not need to recompile
>> OVS - just a suggestion :)
>>
> Agreed - I had intended to implement this functionality further down the line, depending on how this patch was received. I'll add a runtime flag in v2.

OK, Thanks

>
>> What's more, we could make it configurable at run time, just as we
>> dynamically configure the queue numbers; however, that would be a lot
>> of work, so it could be a future feature.
> Do you mean the 'other_config: n-dpdk-rxqs' field in the Open_vSwitch table?
>
> I'm not sure how amenable a solution like this would be to the maintainers, seeing as there is already an 'MTU' field present in the 'Interfaces' table. Ben/Pravin - any thoughts as to supporting a separate MTU field for DPDK interfaces?

If there is already an MTU field in the Interface table, I think that's
OK, and there's no need for a separate one. Just passing the MTU to
DPDK is fine.


Thanks,
Michael
> Thanks,
> Mark
>
>> Thanks,
>> Michael
>>> Thanks,
>>> fbl
>>>
>
Qiu, Michael Nov. 20, 2015, 3:22 a.m. UTC | #6
On 2015/11/18 23:13, Kavanagh, Mark B wrote:
>> On 2015/11/18 1:25, Flavio Leitner wrote:
>>> On Wed, Nov 11, 2015 at 03:06:02PM +0000, Mark Kavanagh wrote:
>>>> Add support for Jumbo Frames to DPDK-enabled port types,
>>>> using single-segment-mbufs.
>>>>
>>>> Using this approach, the amount of memory allocated for each mbuf
>>>> to store frame data is increased to a value greater than 1518B
>>>> (typical Ethernet maximum frame length). The increased space
>>>> available in the mbuf means that an entire Jumbo Frame can be carried
>>>> in a single mbuf, as opposed to partitioning it across multiple mbuf
>>>> segments.
>>>>
>>>> The amount of space allocated to each mbuf to hold frame data is
>>>> defined by the user at compile time; if this frame length is not a
>>>> multiple of the DPDK NIC driver's minimum Rx buffer length, the frame
>>>> length is rounded up to the closest value that is.
>>>>
>>>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>>>> ---
>>>>  INSTALL.DPDK.md   |   67 ++++++++++++++++++++-
>>>>  lib/netdev-dpdk.c |  176 ++++++++++++++++++++++++++++++++++++++++++-----------
>>>>  2 files changed, 207 insertions(+), 36 deletions(-)
>>>>
>>>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>>>> index 96b686c..9a30f88 100644
>>>> --- a/INSTALL.DPDK.md
>>>> +++ b/INSTALL.DPDK.md
>>>> @@ -859,10 +859,70 @@ by adding the following string:
>>>>  to <interface> sections of all network devices used by DPDK. Parameter 'N'
>>>>  determines how many queues can be used by the guest.
>>>>
>>>> +
>>>> +Jumbo Frames
>>>> +------------
>>>> +
>>>> +Support for Jumbo Frames may be enabled at compile-time for DPDK-type ports.
>>> It seems this could be dynamic and proportional to the MTU being used
>>> by the port and not a compile-time option which depends on the NIC
>>> hardware specs. Perhaps I am missing something.
>> It makes sense.
>>
>> And this patch actually solves a big issue: when I transmit packets of
>> size 1400B, ovs-vswitchd returns a "Bus error", although 1400B should
>> not be a Jumbo Frame.
>>
> Hi Michael,
>
> Thanks for your feedback; I'm glad to hear that the patch solved that issue for you - I'm curious as to why you experienced the Bus error for a 1400B packet, though.

Finally, I found that the root cause is that OVS has an issue with mbuf
initialisation: OVS requires a buf_len of 0x700, but only sets it in
the mbuf itself. DPDK also needs to know it in rte_pktmbuf_pool_init(),
where OVS does nothing, and your patch happens to fix that.
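The bookkeeping Michael describes - DPDK learning each mbuf's data room size from the mempool's private area, rather than from the mbuf alone - can be sketched as follows (an illustration with hypothetical structure and function names, not the actual DPDK/OVS code; the real fields live in struct rte_pktmbuf_pool_private):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the pool's private area: DPDK derives the
 * per-mbuf data room from the pool element size minus the per-element
 * metadata (struct dp_packet, in the OVS case). */
struct fake_pool_priv {
    uint16_t mbuf_data_room_size;
    uint16_t mbuf_priv_size;
};

/* Populate the pool private area from the element/metadata sizes, as a
 * pool-init callback would. */
static void
init_pool_priv(struct fake_pool_priv *priv, size_t elt_size,
               size_t meta_size)
{
    priv->mbuf_priv_size = 0;
    priv->mbuf_data_room_size =
        elt_size > meta_size ? (uint16_t)(elt_size - meta_size) : 0;
}
```

The point is that the data room size must be recorded where the PMD can find it (the pool private area), not only in each mbuf's buf_len field.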

Thanks,
Michael
>
>> Mark, could this be an option when starting vswitchd with DPDK? That
>> way, users who want to use Jumbo Frames would not need to recompile
>> OVS - just a suggestion :)
>>
> Agreed - I had intended to implement this functionality further down the line, depending on how this patch was received. I'll add a runtime flag in v2.
>
>> What's more, we could make it configurable at run time, just as we
>> dynamically configure the queue numbers; however, that would be a lot
>> of work, so it could be a future feature.
> Do you mean the 'other_config: n-dpdk-rxqs' field in the Open_vSwitch table?
>
> I'm not sure how amenable a solution like this would be to the maintainers, seeing as there is already an 'MTU' field present in the 'Interfaces' table. Ben/Pravin - any thoughts as to supporting a separate MTU field for DPDK interfaces?
>
> Thanks,
> Mark
>
>> Thanks,
>> Michael
>>> Thanks,
>>> fbl
>>>
>
Flavio Leitner Nov. 23, 2015, 12:34 p.m. UTC | #7
On Wed, Nov 18, 2015 at 03:03:42PM +0000, Kavanagh, Mark B wrote:
> 
> >On Wed, Nov 11, 2015 at 03:06:02PM +0000, Mark Kavanagh wrote:
> >> Add support for Jumbo Frames to DPDK-enabled port types,
> >> using single-segment-mbufs.
> >>
> >> Using this approach, the amount of memory allocated for each mbuf
> >> to store frame data is increased to a value greater than 1518B
> >> (typical Ethernet maximum frame length). The increased space
> >> available in the mbuf means that an entire Jumbo Frame can be carried
> >> in a single mbuf, as opposed to partitioning it across multiple mbuf
> >> segments.
> >>
> >> The amount of space allocated to each mbuf to hold frame data is
> >> defined by the user at compile time; if this frame length is not a
> >> multiple of the DPDK NIC driver's minimum Rx buffer length, the frame
> >> length is rounded up to the closest value that is.
> >>
> >> Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
> >> ---
> >>  INSTALL.DPDK.md   |   67 ++++++++++++++++++++-
> >>  lib/netdev-dpdk.c |  176 ++++++++++++++++++++++++++++++++++++++++++-----------
> >>  2 files changed, 207 insertions(+), 36 deletions(-)
> >>
> >> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> >> index 96b686c..9a30f88 100644
> >> --- a/INSTALL.DPDK.md
> >> +++ b/INSTALL.DPDK.md
> >> @@ -859,10 +859,70 @@ by adding the following string:
> >>  to <interface> sections of all network devices used by DPDK. Parameter 'N'
> >>  determines how many queues can be used by the guest.
> >>
> >> +
> >> +Jumbo Frames
> >> +------------
> >> +
> >> +Support for Jumbo Frames may be enabled at compile-time for DPDK-type ports.
> >
> >It seems this could be dynamic and proportional to the MTU being used
> >by the port and not a compile-time option which depends on the NIC
> >hardware specs. Perhaps I am missing something.
> >
> 
> Hi Flavio - thanks for your feedback.
> 
> Just to clarify, when you say 'dynamic', I presume that you mean
> that the MTU can be specified at runtime. Bearing this in mind, is
> the behavior that you have in mind that:
> 	- the MTU for each DPDK port should be specified when the port is added to a bridge?
> 	- and the granularity at which MTUs are assigned to DPDK-type
> 	ports should be on a per-port basis, rather than the (admittedly)
> 	coarse-grained 'one-for-all' approach implemented here?

It should be on a per-port basis, like we have for the non-DPDK case,
and the bridge might reconfigure itself to maintain the lowest common
value across all ports.  Look at the update_mtu() function.
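The idea Flavio refers to - the bridge converging on the lowest MTU configured across its ports - can be sketched as (illustrative only; not the actual update_mtu() implementation):

```c
#include <stddef.h>

/* Return the lowest MTU across a bridge's ports, starting from a
 * default upper bound, as a bridge reconfiguration pass might. */
static int
lowest_common_mtu(const int *port_mtus, size_t n_ports, int dflt)
{
    int mtu = dflt;
    for (size_t i = 0; i < n_ports; i++) {
        if (port_mtus[i] < mtu) {
            mtu = port_mtus[i];
        }
    }
    return mtu;
}
```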

> It's worth noting that, because DPDK ports are outside the reach of
> the Linux kernel, their MTU can't be changed using standard tools,
> such as 'ifconfig', 'ip link', etc. Furthermore, the OVS 'Interface'
> table's 'MTU' attribute cannot currently be set programmatically,
> outside of the aforementioned tools. As such, the MTU of DPDK ports
> is always limited in how dynamic it can be; once set, the MTU for a
> DPDK port is (currently) immutable.

I have the same understanding.  However, a hardcoded option isn't a
good solution if we want to support other MTU sizes.  Think about
Debian/Ubuntu/Fedora distros having to provide a few different packages
with special values that might not even be the ones needed.


> I'm not quite sure if I follow the second part of your comment
> regarding proportionality to the port's MTU - could you elaborate a
> bit more on this?

Dynamic in terms of hardcoded versus run-time option.  The
proportional part refers to the size of the allocated buffers.
We should be able to change those at run-time.

Does that make sense?

fbl
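The "proportional" sizing Flavio describes - deriving the mbuf segment size from the port's MTU at run time, rather than from a hardcoded maximum - can be sketched as follows (a minimal sketch; the headroom and per-mbuf metadata sizes below are assumed illustration values, not the real RTE_PKTMBUF_HEADROOM or sizeof(struct dp_packet)):

```c
#include <stdint.h>

#define ETHER_HDR_LEN     14
#define ETHER_CRC_LEN     4
#define PKTMBUF_HEADROOM  128   /* assumed; stands in for RTE_PKTMBUF_HEADROOM */
#define MBUF_METADATA     704   /* assumed; stands in for sizeof(struct dp_packet) */

/* Size each mbuf segment in proportion to the port MTU, in the spirit
 * of the patch's MBUF_SEGMENT_SIZE(mtu) macro: frame data plus
 * per-mbuf metadata plus headroom. */
static uint32_t
mbuf_segment_size(uint32_t mtu)
{
    return (mtu + ETHER_HDR_LEN + ETHER_CRC_LEN)
           + MBUF_METADATA + PKTMBUF_HEADROOM;
}
```

Recomputing this value whenever the MTU changes, and reallocating the mempool accordingly, is what would make the buffer sizing a run-time property rather than a compile-time one.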

Patch

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 96b686c..9a30f88 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -859,10 +859,70 @@  by adding the following string:
 to <interface> sections of all network devices used by DPDK. Parameter 'N'
 determines how many queues can be used by the guest.
 
+
+Jumbo Frames
+------------
+
+Support for Jumbo Frames may be enabled at compile-time for DPDK-type ports.
+Note that if enabled, the mbuf segment size for all DPDK ports is increased, in
+order to accommodate a full Jumbo Frame inside a single mbuf segment. This value
+is also immutable. Note that if non-datapath ports are added to a bridge, the
+value of their MTU will not affect that of the DPDK ports; this is in keeping
+with the current functionality of DPDK-enabled ports.
+
+To avail of Jumbo Frame support, some source code modifications are
+required, specifically to `lib/netdev-dpdk.c`:
+
+  1. Uncomment the following line to enable JF support:
+
+     ```
+     #define NETDEV_DPDK_JUMBO
+     ```
+
+  2. Adjust the value of `NETDEV_DPDK_MAX_FRAME_LEN` to the required Jumbo
+     Frame size. Consult the datasheet for the NIC in use to determine the max
+     frame size supported by your hardware. Also take into consideration that
+     the DPDK NIC driver allocates RX buffers at a particular granularity
+     (currently 1024B, i.e. NETDEV_DPDK_DEFAULT_RX_BUFSIZE, for both the
+     `igb_uio` and `i40e` drivers). Consequently, the value assigned to
+     NETDEV_DPDK_MAX_FRAME_LEN at compile time should be a multiple of the
+     driver's buffer size. If not, the value used to configure the 'dpdk' ports
+     is rounded up to the next compatible value. Jumbo frame support has been
+     validated against 13312B frames, using the DPDK `igb_uio` driver, but
+     larger frames and other DPDK NIC drivers may theoretically be supported.
+
+NOTE: The use of Jumbo Frames may affect throughput of lower-sized packets; if
+throughput for small-packet workloads is critical, then do not enable this
+feature.
+
+vHost Ports and Jumbo Frames
+----------------------------
+vHost ports require additional configuration to enable Jumbo Frame support.
+
+  1. `mergeable buffers` must be enabled for all vHost port types,
+      as demonstrated in the QEMU command line snippet, below:
+
+      ```
+      '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
+      '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
+      ```
+
+  2. Guests utilizing vHost ports with `virtio-net` backend (as opposed to
+     `virtio-pmd`) must also increase the MTU of their network interfaces,
+     to avoid segmentation of Jumbo Frames in the guest. Note that 'MTU' refers
+     to the length of the IP packet only, and not that of the entire frame. To
+     calculate the exact MTU, subtract the L2 header and trailer lengths
+     (i.e. 18B) from the max supported frame size.
+     e.g. set the MTU for a 13312B Jumbo Frame:
+
+      ```
+      ifconfig eth1 mtu 13294
+      ```
+
+
 Restrictions:
 -------------
 
-  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
   - Currently DPDK port does not make use any offload functionality.
   - DPDK-vHost support works with 1G huge pages.
 
@@ -903,6 +963,11 @@  Restrictions:
     the next release of DPDK (which includes the above patch) is available and
     integrated into OVS.
 
+  Jumbo Frames:
+  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. The source of
+  this issue is currently being investigated.
+
+
 Bug Reporting:
 --------------
 
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4658416..c835303 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -62,20 +62,30 @@  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
 #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
 #define OVS_VPORT_DPDK "ovs_dpdk"
 
+/* Uncomment to enable Jumbo Frame support */
+/* #define NETDEV_DPDK_JUMBO */
+
+#define NETDEV_DPDK_JUMBO_DISABLE      0
+#define NETDEV_DPDK_JUMBO_ENABLE       1
+#define NETDEV_DPDK_DEFAULT_RX_BUFSIZE 1024
+
 /*
  * need to reserve tons of extra space in the mbufs so we can align the
  * DMA addresses to 4KB.
  * The minimum mbuf size is limited to avoid scatter behaviour and drop in
  * performance for standard Ethernet MTU.
  */
-#define MTU_TO_MAX_LEN(mtu)  ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
-#define MBUF_SIZE_MTU(mtu)   (MTU_TO_MAX_LEN(mtu)        \
-                              + sizeof(struct dp_packet) \
-                              + RTE_PKTMBUF_HEADROOM)
-#define MBUF_SIZE_DRIVER     (2048                       \
-                              + sizeof (struct rte_mbuf) \
-                              + RTE_PKTMBUF_HEADROOM)
-#define MBUF_SIZE(mtu)       MAX(MBUF_SIZE_MTU(mtu), MBUF_SIZE_DRIVER)
+#define MTU_TO_FRAME_LEN(mtu)       ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
+#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)- ETHER_HDR_LEN - ETHER_CRC_LEN)
+#define MBUF_SEGMENT_SIZE(mtu)      ( MTU_TO_FRAME_LEN(mtu)      \
+                                    + sizeof(struct dp_packet)   \
+                                    + RTE_PKTMBUF_HEADROOM)
+/* This value should be specified as a multiple of the DPDK NIC driver's
+ * 'min_rx_bufsize' attribute (currently 1024B for 'igb_uio'). If the value
+ * specified is not such a multiple, the value used to configure the netdev
+ * will be rounded up to the next compatible value, via the
+ * 'dpdk_frame_len' function; in that case, this value will be ignored. */
+#define NETDEV_DPDK_MAX_FRAME_LEN    13312
 
 /* Max and min number of packets in the mempool.  OVS tries to allocate a
  * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
@@ -114,7 +124,13 @@  static const struct rte_eth_conf port_conf = {
         .header_split   = 0, /* Header Split disabled */
         .hw_ip_checksum = 0, /* IP checksum offload disabled */
         .hw_vlan_filter = 0, /* VLAN filtering disabled */
-        .jumbo_frame    = 0, /* Jumbo Frame Support disabled */
+#ifdef NETDEV_DPDK_JUMBO
+        .jumbo_frame    = NETDEV_DPDK_JUMBO_ENABLE, /* Jumbo Frame Support enabled */
+        .max_rx_pkt_len = UINT32_MAX, /* Set value in a copy of \
+                                this struct later, based on netdev's MTU */
+#else
+        .jumbo_frame    = NETDEV_DPDK_JUMBO_DISABLE, /* Jumbo Frame Support disabled */
+#endif
         .hw_strip_crc   = 0,
     },
     .rx_adv_conf = {
@@ -254,6 +270,43 @@  is_dpdk_class(const struct netdev_class *class)
     return class->construct == netdev_dpdk_construct;
 }
 
+/* DPDK NIC drivers allocate RX buffers at a particular granularity
+ * (specified by rte_eth_dev_info.min_rx_bufsize - currently 1K for igb_uio).
+ * If 'frame_len' is not a multiple of this value, insufficient
+ * buffers will be allocated to accommodate the packet in its entirety.
+ * Return the value closest to 'frame_len' that is a multiple of the
+ * driver's 'min_rx_bufsize' which enables the driver to receive the
+ * entire packet.
+ */
+static uint32_t
+dpdk_frame_len(struct netdev_dpdk *netdev, int frame_len)
+{
+    struct rte_eth_dev_info info;
+    uint32_t buf_size;
+    int len = 0;
+
+    /* All VHost ports currently use '-1' as their port_id */
+    if (netdev->type != DPDK_DEV_VHOST) {
+        rte_eth_dev_info_get(netdev->port_id, &info);
+        buf_size = info.min_rx_bufsize;
+    } else {
+        buf_size = NETDEV_DPDK_DEFAULT_RX_BUFSIZE;
+    }
+
+    if (frame_len % buf_size != 0) {
+        len = buf_size * ((frame_len/buf_size) + 1);
+#ifdef NETDEV_DPDK_JUMBO
+        VLOG_WARN("User-specified frame length %d is not compatible with "
+                  "minimum DPDK RX buffer length, and will be increased to "
+                  "%d\n", frame_len, len);
+#endif
+    } else {
+        len = frame_len;
+    }
+
+    return len;
+}
+
 /* XXX: use dpdk malloc for entire OVS. in fact huge page should be used
  * for all other segments data, bss and text. */
 
@@ -280,31 +333,70 @@  free_dpdk_buf(struct dp_packet *p)
 }
 
 static void
-__rte_pktmbuf_init(struct rte_mempool *mp,
-                   void *opaque_arg OVS_UNUSED,
-                   void *_m,
-                   unsigned i OVS_UNUSED)
+ovs_rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg)
 {
-    struct rte_mbuf *m = _m;
-    uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet);
+    struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv;
+    struct rte_pktmbuf_pool_private default_mbp_priv;
+    uint16_t roomsz;
 
     RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet));
 
-    memset(m, 0, mp->elt_size);
+    /* if no structure is provided, assume no mbuf private area */
 
-    /* start of buffer is just after mbuf structure */
-    m->buf_addr = (char *)m + sizeof(struct dp_packet);
-    m->buf_physaddr = rte_mempool_virt2phy(mp, m) +
-                    sizeof(struct dp_packet);
-    m->buf_len = (uint16_t)buf_len;
+    user_mbp_priv = opaque_arg;
+    if (user_mbp_priv == NULL) {
+        default_mbp_priv.mbuf_priv_size = 0;
+        if (mp->elt_size > sizeof(struct dp_packet)) {
+            roomsz = mp->elt_size - sizeof(struct dp_packet);
+        } else {
+            roomsz = 0;
+        }
+        default_mbp_priv.mbuf_data_room_size = roomsz;
+        user_mbp_priv = &default_mbp_priv;
+    }
 
-    /* keep some headroom between start of buffer and data */
-    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);
+    RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet) +
+        user_mbp_priv->mbuf_data_room_size +
+        user_mbp_priv->mbuf_priv_size);
 
-    /* init some constant fields */
-    m->pool = mp;
-    m->nb_segs = 1;
-    m->port = 0xff;
+    mbp_priv = rte_mempool_get_priv(mp);
+    memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv));
+}
+
+/* Initialise some fields in the mbuf structure that are not modified by the
+ * user once created (origin pool, buffer start address, etc.). */
+static void
+__ovs_rte_pktmbuf_init(struct rte_mempool *mp,
+                       void *opaque_arg OVS_UNUSED,
+                       void *_m,
+                       unsigned i OVS_UNUSED)
+{
+    struct rte_mbuf *m = _m;
+    uint32_t buf_size, buf_len, priv_size;
+
+    priv_size = rte_pktmbuf_priv_size(mp);
+    buf_size = sizeof(struct dp_packet) + priv_size;
+    buf_len = rte_pktmbuf_data_room_size(mp);
+
+    RTE_MBUF_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == priv_size);
+    RTE_MBUF_ASSERT(mp->elt_size >= buf_size);
+    RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);
+
+    memset(m, 0, mp->elt_size);
+
+    /* Start of buffer is after the dp_packet structure and priv data. */
+    m->priv_size = priv_size;
+    m->buf_addr = (char *)m + buf_size;
+    m->buf_physaddr = rte_mempool_virt2phy(mp, m) + buf_size;
+    m->buf_len = (uint16_t)buf_len;
+
+    /* Keep some headroom between start of buffer and data. */
+    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, (uint16_t)m->buf_len);
+
+    /* Init some constant fields. */
+    m->pool = mp;
+    m->nb_segs = 1;
+    m->port = 0xff;
 }
 
 static void
@@ -315,7 +407,7 @@  ovs_rte_pktmbuf_init(struct rte_mempool *mp,
 {
     struct rte_mbuf *m = _m;
 
-    __rte_pktmbuf_init(mp, opaque_arg, _m, i);
+    __ovs_rte_pktmbuf_init(mp, opaque_arg, m, i);
 
     dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len);
 }
@@ -326,6 +418,7 @@  dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
     struct dpdk_mp *dmp = NULL;
     char mp_name[RTE_MEMPOOL_NAMESIZE];
     unsigned mp_size;
+    struct rte_pktmbuf_pool_private mbp_priv;
 
     LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) {
         if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
@@ -338,6 +431,8 @@  dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
     dmp->socket_id = socket_id;
     dmp->mtu = mtu;
     dmp->refcount = 1;
+    mbp_priv.mbuf_data_room_size = MTU_TO_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM;
+    mbp_priv.mbuf_priv_size = 0;
 
     mp_size = MAX_NB_MBUF;
     do {
@@ -346,10 +441,10 @@  dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
             return NULL;
         }
 
-        dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu),
+        dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SEGMENT_SIZE(mtu),
                                      MP_CACHE_SZ,
                                      sizeof(struct rte_pktmbuf_pool_private),
-                                     rte_pktmbuf_pool_init, NULL,
+                                     ovs_rte_pktmbuf_pool_init, &mbp_priv,
                                      ovs_rte_pktmbuf_init, NULL,
                                      socket_id, 0);
     } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >= MIN_NB_MBUF);
@@ -433,6 +528,7 @@  dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 {
     int diag = 0;
     int i;
+    struct rte_eth_conf conf = port_conf;
 
     /* A device may report more queues than it makes available (this has
      * been observed for Intel xl710, which reserves some of them for
@@ -444,7 +540,11 @@  dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
             VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
         }
 
-        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
+#ifdef NETDEV_DPDK_JUMBO
+        conf.rxmode.max_rx_pkt_len = dpdk_frame_len(dev,
+                                                    NETDEV_DPDK_MAX_FRAME_LEN);
+#endif
+        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
         if (diag) {
             break;
         }
@@ -586,6 +686,7 @@  netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
     struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
     int sid;
     int err = 0;
+    uint32_t max_frame_len;
 
     ovs_mutex_init(&netdev->mutex);
     ovs_mutex_lock(&netdev->mutex);
@@ -605,8 +706,13 @@  netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
     netdev->port_id = port_no;
     netdev->type = type;
     netdev->flags = 0;
-    netdev->mtu = ETHER_MTU;
-    netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu);
+#ifdef NETDEV_DPDK_JUMBO
+    max_frame_len = dpdk_frame_len(netdev, NETDEV_DPDK_MAX_FRAME_LEN);
+#else
+    max_frame_len = dpdk_frame_len(netdev, ETHER_MAX_LEN);
+#endif
+    netdev->mtu = FRAME_LEN_TO_MTU(max_frame_len);
+    netdev->max_packet_len = max_frame_len;
 
     netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu);
     if (!netdev->dpdk_mp) {
@@ -1386,14 +1492,14 @@  netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
     old_mp = dev->dpdk_mp;
     dev->dpdk_mp = mp;
     dev->mtu = mtu;
-    dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
 
     err = dpdk_eth_dev_init(dev);
     if (err) {
         dpdk_mp_put(mp);
         dev->mtu = old_mtu;
         dev->dpdk_mp = old_mp;
-        dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
         dpdk_eth_dev_init(dev);
         goto out;
     }