[ovs-dev,v5] Use TPACKET_V3 to accelerate veth for userspace datapath
diff mbox series

Message ID 20200224125842.16620-1-yang_y_yi@126.com
State New
Headers show
Series
  • [ovs-dev,v5] Use TPACKET_V3 to accelerate veth for userspace datapath
Related show

Commit Message

yang_y_yi@126.com Feb. 24, 2020, 12:58 p.m. UTC
From: Yi Yang <yangyi01@inspur.com>

We can avoid high system call overhead by using TPACKET_V3
and using DPDK-like poll to receive and send packets (Note: send
still needs to call sendto to trigger final packet transmission).

From Linux kernel 3.10 on, TPACKET_V3 has been supported,
so all the Linux kernels current OVS supports can run
TPACKET_V3 without any problem.

I can see about 30% performance improvement for veth compared to
last recvmmsg optimization if I use TPACKET_V3, it is about 1.98
Gbps, but it was 1.47 Gbps before.

TPACKET_V3 can support TSO, it can work only if your kernel can
support, this has been verified on Ubuntu 18.04 5.3.0-40-generic
, if you find the performance is very poor, please turn off tso
for veth interfces in case userspace-tso-enable is set to true.

Signed-off-by: Yi Yang <yangyi01@inspur.com>
Co-authored-by: William Tu <u9012063@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
---
 acinclude.m4                     |  12 ++
 configure.ac                     |   1 +
 include/linux/automake.mk        |   1 +
 include/linux/if_packet.h        | 128 +++++++++++++
 include/sparse/linux/if_packet.h | 111 ++++++++++++
 lib/netdev-linux-private.h       |  22 +++
 lib/netdev-linux.c               | 377 ++++++++++++++++++++++++++++++++++++++-
 7 files changed, 646 insertions(+), 6 deletions(-)
 create mode 100644 include/linux/if_packet.h

Changelog:
- v4->v5
 * Fix travis build issues
 * Fix comments issues (capitalize the first letter)
 * Verify TSO on Ubuntu 18.04 3.5.0-40-generic

- v3->v4
 * Fix sparse check errors

- v2->v3
 * Fix build issues in case HAVE_TPACKET_V3 is not defined
 * Add tso-related support code
 * make sure it can work normally in case userspace-tso-enable is true

- v1->v2
 * Remove TPACKET_V1 and TPACKET_V2 which is obsolete
 * Add include/linux/if_packet.h
 * Change include/sparse/linux/if_packet.h

Comments

William Tu Feb. 25, 2020, 10:32 p.m. UTC | #1
On Mon, Feb 24, 2020 at 5:01 AM <yang_y_yi@126.com> wrote:
>
> From: Yi Yang <yangyi01@inspur.com>
>
> We can avoid high system call overhead by using TPACKET_V3
> and using DPDK-like poll to receive and send packets (Note: send
> still needs to call sendto to trigger final packet transmission).
>
> From Linux kernel 3.10 on, TPACKET_V3 has been supported,
> so all the Linux kernels current OVS supports can run
> TPACKET_V3 without any problem.
>
> I can see about 30% performance improvement for veth compared to
> last recvmmsg optimization if I use TPACKET_V3, it is about 1.98
> Gbps, but it was 1.47 Gbps before.
>
> TPACKET_V3 can support TSO, it can work only if your kernel can
> support, this has been verified on Ubuntu 18.04 5.3.0-40-generic
> , if you find the performance is very poor, please turn off tso
> for veth interfces in case userspace-tso-enable is set to true.

Do you test the performance of enabling TSO?

Using veth (like your run-iperf3.sh) and with kernel 5.3.
Without your patch, with TSO enabled, I can get around 6Gbps
But
with this patch, with TSO enabled,
the performance drops to 1.9Gbps.

Regards,
William
Yi Yang (杨燚)-云服务集团 Feb. 26, 2020, 12:42 a.m. UTC | #2
William, I tested it in VM because I doesn't have physical machine with new
kernel, the number is similar, mine is about 2 Gbps, can you show me your
iperf output for the case without this patch, I saw a very strange
phenomenon, with this patch, retransmit is almost 0, but without this patch,
retransmit is extremely high, maybe this can explain why somehow, I will
investigate this further, I doubt two consecutive packets have overlap for
the case without this patch, you can check source code, from source code,
this patch is more efficient, maybe this is an issue on kernel side
otherwise. I'll check it further.

-----邮件原件----- 
发件人: dev [mailto:ovs-dev-bounces@openvswitch.org] 代表 William Tu
发送时间: 2020年2月26日 6:32
收件人: yang_y_yi@126.com
抄送: yang_y_yi <yang_y_yi@163.com>; ovs-dev <ovs-dev@openvswitch.org>
主题: Re: [ovs-dev] [PATCH v5] Use TPACKET_V3 to accelerate veth for
userspace datapath

On Mon, Feb 24, 2020 at 5:01 AM <yang_y_yi@126.com> wrote:
>
> From: Yi Yang <yangyi01@inspur.com>
>
> We can avoid high system call overhead by using TPACKET_V3 and using 
> DPDK-like poll to receive and send packets (Note: send still needs to 
> call sendto to trigger final packet transmission).
>
> From Linux kernel 3.10 on, TPACKET_V3 has been supported, so all the 
> Linux kernels current OVS supports can run
> TPACKET_V3 without any problem.
>
> I can see about 30% performance improvement for veth compared to last 
> recvmmsg optimization if I use TPACKET_V3, it is about 1.98 Gbps, but 
> it was 1.47 Gbps before.
>
> TPACKET_V3 can support TSO, it can work only if your kernel can 
> support, this has been verified on Ubuntu 18.04 5.3.0-40-generic , if 
> you find the performance is very poor, please turn off tso for veth 
> interfces in case userspace-tso-enable is set to true.

Do you test the performance of enabling TSO?

Using veth (like your run-iperf3.sh) and with kernel 5.3.
Without your patch, with TSO enabled, I can get around 6Gbps But with this
patch, with TSO enabled, the performance drops to 1.9Gbps.

Regards,
William
Yi Yang (杨燚)-云服务集团 Feb. 26, 2020, 1:06 a.m. UTC | #3
This is the result in my VM for the case without this patch. Retr number is
really very very high, this is super abnormal. Physical machine has same
phenomenon. 

vagrant@ubuntu1804:~$ sudo ./run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 54566 connected to 10.15.1.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-10.00  sec  4.07 GBytes  3.50 Gbits/sec  37926    170 KBytes
[  4]  10.00-20.00  sec  3.62 GBytes  3.11 Gbits/sec  32138    170 KBytes
[  4]  20.00-30.00  sec  3.81 GBytes  3.27 Gbits/sec  36448    235 KBytes
[  4]  30.00-40.00  sec  4.01 GBytes  3.45 Gbits/sec  38133    153 KBytes
[  4]  40.00-50.00  sec  3.94 GBytes  3.39 Gbits/sec  39605    184 KBytes
[  4]  50.00-60.00  sec  3.74 GBytes  3.21 Gbits/sec  33365    185 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-60.00  sec  23.2 GBytes  3.32 Gbits/sec  217615
sender
[  4]   0.00-60.00  sec  23.2 GBytes  3.32 Gbits/sec
receiver

Server output:
Accepted connection from 10.15.1.2, port 54564
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 54566
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.00  sec  4.07 GBytes  3.49 Gbits/sec
[  5]  10.00-20.00  sec  3.62 GBytes  3.11 Gbits/sec
[  5]  20.00-30.00  sec  3.81 GBytes  3.27 Gbits/sec
[  5]  30.00-40.00  sec  4.01 GBytes  3.45 Gbits/sec
[  5]  40.00-50.00  sec  3.94 GBytes  3.39 Gbits/sec
[  5]  50.00-60.00  sec  3.74 GBytes  3.21 Gbits/sec
[  5]  60.00-60.00  sec   127 KBytes  1.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-60.00  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-60.00  sec  23.2 GBytes  3.32 Gbits/sec
receiver


iperf Done.
vagrant@ubuntu1804:~$

-----邮件原件-----
发件人: dev [mailto:ovs-dev-bounces@openvswitch.org] 代表 William Tu
发送时间: 2020年2月26日 6:32
收件人: yang_y_yi@126.com
抄送: yang_y_yi <yang_y_yi@163.com>; ovs-dev <ovs-dev@openvswitch.org>
主题: Re: [ovs-dev] [PATCH v5] Use TPACKET_V3 to accelerate veth for
userspace datapath

On Mon, Feb 24, 2020 at 5:01 AM <yang_y_yi@126.com> wrote:
>
> From: Yi Yang <yangyi01@inspur.com>
>
> We can avoid high system call overhead by using TPACKET_V3 and using 
> DPDK-like poll to receive and send packets (Note: send still needs to 
> call sendto to trigger final packet transmission).
>
> From Linux kernel 3.10 on, TPACKET_V3 has been supported, so all the 
> Linux kernels current OVS supports can run
> TPACKET_V3 without any problem.
>
> I can see about 30% performance improvement for veth compared to last 
> recvmmsg optimization if I use TPACKET_V3, it is about 1.98 Gbps, but 
> it was 1.47 Gbps before.
>
> TPACKET_V3 can support TSO, it can work only if your kernel can 
> support, this has been verified on Ubuntu 18.04 5.3.0-40-generic , if 
> you find the performance is very poor, please turn off tso for veth 
> interfces in case userspace-tso-enable is set to true.

Do you test the performance of enabling TSO?

Using veth (like your run-iperf3.sh) and with kernel 5.3.
Without your patch, with TSO enabled, I can get around 6Gbps But with this
patch, with TSO enabled, the performance drops to 1.9Gbps.

Regards,
William
Yi Yang (杨燚)-云服务集团 Feb. 26, 2020, 1:41 a.m. UTC | #4
In the same environment, but I used tap but not veth, retr number is 0 for
the case without this patch (of course, I applied Flavio's tap enable patch)

vagrant@ubuntu1804:~$ sudo ./run-iperf3.sh
Connecting to host 10.15.1.3, port 5201
[  4] local 10.15.1.2 port 54572 connected to 10.15.1.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-10.00  sec  12.6 GBytes  10.9 Gbits/sec    0   3.14 MBytes
[  4]  10.00-20.00  sec  12.8 GBytes  11.0 Gbits/sec    0   3.14 MBytes
[  4]  20.00-30.00  sec  10.2 GBytes  8.76 Gbits/sec    0   3.14 MBytes
[  4]  30.00-40.00  sec  10.0 GBytes  8.63 Gbits/sec    0   3.14 MBytes
[  4]  40.00-50.00  sec  10.4 GBytes  8.94 Gbits/sec    0   3.14 MBytes
[  4]  50.00-60.00  sec  10.8 GBytes  9.31 Gbits/sec    0   3.14 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec    0             sender
[  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec
receiver

Server output:
Accepted connection from 10.15.1.2, port 54570
[  5] local 10.15.1.3 port 5201 connected to 10.15.1.2 port 54572
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.00  sec  12.6 GBytes  10.9 Gbits/sec
[  5]  10.00-20.00  sec  12.8 GBytes  11.0 Gbits/sec
[  5]  20.00-30.00  sec  10.2 GBytes  8.76 Gbits/sec
[  5]  30.00-40.00  sec  10.0 GBytes  8.63 Gbits/sec
[  5]  40.00-50.00  sec  10.4 GBytes  8.94 Gbits/sec
[  5]  50.00-60.00  sec  10.8 GBytes  9.31 Gbits/sec
[  5]  60.00-60.00  sec  1.75 MBytes  9.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-60.00  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec
receiver


iperf Done.
vagrant@ubuntu1804:~$

-----邮件原件-----
发件人: dev [mailto:ovs-dev-bounces@openvswitch.org] 代表 William Tu
发送时间: 2020年2月26日 6:32
收件人: yang_y_yi@126.com
抄送: yang_y_yi <yang_y_yi@163.com>; ovs-dev <ovs-dev@openvswitch.org>
主题: Re: [ovs-dev] [PATCH v5] Use TPACKET_V3 to accelerate veth for
userspace datapath

On Mon, Feb 24, 2020 at 5:01 AM <yang_y_yi@126.com> wrote:
>
> From: Yi Yang <yangyi01@inspur.com>
>
> We can avoid high system call overhead by using TPACKET_V3 and using 
> DPDK-like poll to receive and send packets (Note: send still needs to 
> call sendto to trigger final packet transmission).
>
> From Linux kernel 3.10 on, TPACKET_V3 has been supported, so all the 
> Linux kernels current OVS supports can run
> TPACKET_V3 without any problem.
>
> I can see about 30% performance improvement for veth compared to last 
> recvmmsg optimization if I use TPACKET_V3, it is about 1.98 Gbps, but 
> it was 1.47 Gbps before.
>
> TPACKET_V3 can support TSO, it can work only if your kernel can 
> support, this has been verified on Ubuntu 18.04 5.3.0-40-generic , if 
> you find the performance is very poor, please turn off tso for veth 
> interfces in case userspace-tso-enable is set to true.

Do you test the performance of enabling TSO?

Using veth (like your run-iperf3.sh) and with kernel 5.3.
Without your patch, with TSO enabled, I can get around 6Gbps But with this
patch, with TSO enabled, the performance drops to 1.9Gbps.

Regards,
William
William Tu Feb. 27, 2020, 3:30 p.m. UTC | #5
On Tue, Feb 25, 2020 at 5:41 PM Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com> wrote:
>
> In the same environment, but I used tap but not veth, retr number is 0 for
> the case without this patch (of course, I applied Flavio's tap enable patch)
>

Right, because tap does not use the tpacket_v3 mmap packet, so it works fine.

> vagrant@ubuntu1804:~$ sudo ./run-iperf3.sh
> Connecting to host 10.15.1.3, port 5201
> [  4] local 10.15.1.2 port 54572 connected to 10.15.1.3 port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-10.00  sec  12.6 GBytes  10.9 Gbits/sec    0   3.14 MBytes
> [  4]  10.00-20.00  sec  12.8 GBytes  11.0 Gbits/sec    0   3.14 MBytes
> [  4]  20.00-30.00  sec  10.2 GBytes  8.76 Gbits/sec    0   3.14 MBytes
> [  4]  30.00-40.00  sec  10.0 GBytes  8.63 Gbits/sec    0   3.14 MBytes
> [  4]  40.00-50.00  sec  10.4 GBytes  8.94 Gbits/sec    0   3.14 MBytes
> [  4]  50.00-60.00  sec  10.8 GBytes  9.31 Gbits/sec    0   3.14 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec    0             sender
> [  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec
> receiver
>
<snip>
> >
> > I can see about 30% performance improvement for veth compared to last
> > recvmmsg optimization if I use TPACKET_V3, it is about 1.98 Gbps, but
> > it was 1.47 Gbps before.
> >
> > TPACKET_V3 can support TSO, it can work only if your kernel can
> > support, this has been verified on Ubuntu 18.04 5.3.0-40-generic , if
> > you find the performance is very poor, please turn off tso for veth
> > interfces in case userspace-tso-enable is set to true.
>
> Do you test the performance of enabling TSO?
>
> Using veth (like your run-iperf3.sh) and with kernel 5.3.
> Without your patch, with TSO enabled, I can get around 6Gbps But with this
> patch, with TSO enabled, the performance drops to 1.9Gbps.
>

Are you investigating this issue?
William
Yi Yang (杨燚)-云服务集团 Feb. 28, 2020, 12:46 a.m. UTC | #6
William. here I don't use my patch, I just showed you tap is ok, veth is not ok, by capturing packets, I'm very sure the packets are truncated, veth's packets are different from tap's, during big packets, all the packet sizes are about 64K, but not such pattern, 1514 followed a big packet, so I think the code for veth is wrong. Yes, I'm verifying it, will send out a patch to fix this issue once it is verified.

-----邮件原件-----
发件人: William Tu [mailto:u9012063@gmail.com] 
发送时间: 2020年2月27日 23:30
收件人: Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com>
抄送: yang_y_yi@126.com; yang_y_yi@163.com; ovs-dev@openvswitch.org
主题: Re: [ovs-dev] [PATCH v5] Use TPACKET_V3 to accelerate veth for userspace datapath

On Tue, Feb 25, 2020 at 5:41 PM Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com> wrote:
>
> In the same environment, but I used tap but not veth, retr number is 0 
> for the case without this patch (of course, I applied Flavio's tap 
> enable patch)
>

Right, because tap does not use the tpacket_v3 mmap packet, so it works fine.

> vagrant@ubuntu1804:~$ sudo ./run-iperf3.sh Connecting to host 
> 10.15.1.3, port 5201 [  4] local 10.15.1.2 port 54572 connected to 
> 10.15.1.3 port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-10.00  sec  12.6 GBytes  10.9 Gbits/sec    0   3.14 MBytes
> [  4]  10.00-20.00  sec  12.8 GBytes  11.0 Gbits/sec    0   3.14 MBytes
> [  4]  20.00-30.00  sec  10.2 GBytes  8.76 Gbits/sec    0   3.14 MBytes
> [  4]  30.00-40.00  sec  10.0 GBytes  8.63 Gbits/sec    0   3.14 MBytes
> [  4]  40.00-50.00  sec  10.4 GBytes  8.94 Gbits/sec    0   3.14 MBytes
> [  4]  50.00-60.00  sec  10.8 GBytes  9.31 Gbits/sec    0   3.14 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec    0             sender
> [  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec
> receiver
>
<snip>
> >
> > I can see about 30% performance improvement for veth compared to 
> > last recvmmsg optimization if I use TPACKET_V3, it is about 1.98 
> > Gbps, but it was 1.47 Gbps before.
> >
> > TPACKET_V3 can support TSO, it can work only if your kernel can 
> > support, this has been verified on Ubuntu 18.04 5.3.0-40-generic , 
> > if you find the performance is very poor, please turn off tso for 
> > veth interfces in case userspace-tso-enable is set to true.
>
> Do you test the performance of enabling TSO?
>
> Using veth (like your run-iperf3.sh) and with kernel 5.3.
> Without your patch, with TSO enabled, I can get around 6Gbps But with 
> this patch, with TSO enabled, the performance drops to 1.9Gbps.
>

Are you investigating this issue?
William
Yi Yang (杨燚)-云服务集团 March 6, 2020, 2:40 p.m. UTC | #7
Hi, William

After I checked some doubtable points, I think performance issue lies in tpacket_v3 kernel side implementation, so I fall back it to original one in case userspace-tso-enabled is set to true. V6 has been sent out, please review v6.

-----邮件原件-----
发件人: William Tu [mailto:u9012063@gmail.com] 
发送时间: 2020年2月27日 23:30
收件人: Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com>
抄送: yang_y_yi@126.com; yang_y_yi@163.com; ovs-dev@openvswitch.org
主题: Re: [ovs-dev] [PATCH v5] Use TPACKET_V3 to accelerate veth for userspace datapath

On Tue, Feb 25, 2020 at 5:41 PM Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com> wrote:
>
> In the same environment, but I used tap but not veth, retr number is 0 
> for the case without this patch (of course, I applied Flavio's tap 
> enable patch)
>

Right, because tap does not use the tpacket_v3 mmap packet, so it works fine.

> vagrant@ubuntu1804:~$ sudo ./run-iperf3.sh Connecting to host 
> 10.15.1.3, port 5201 [  4] local 10.15.1.2 port 54572 connected to 
> 10.15.1.3 port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-10.00  sec  12.6 GBytes  10.9 Gbits/sec    0   3.14 MBytes
> [  4]  10.00-20.00  sec  12.8 GBytes  11.0 Gbits/sec    0   3.14 MBytes
> [  4]  20.00-30.00  sec  10.2 GBytes  8.76 Gbits/sec    0   3.14 MBytes
> [  4]  30.00-40.00  sec  10.0 GBytes  8.63 Gbits/sec    0   3.14 MBytes
> [  4]  40.00-50.00  sec  10.4 GBytes  8.94 Gbits/sec    0   3.14 MBytes
> [  4]  50.00-60.00  sec  10.8 GBytes  9.31 Gbits/sec    0   3.14 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec    0             sender
> [  4]   0.00-60.00  sec  67.0 GBytes  9.59 Gbits/sec
> receiver
>
<snip>
> >
> > I can see about 30% performance improvement for veth compared to 
> > last recvmmsg optimization if I use TPACKET_V3, it is about 1.98 
> > Gbps, but it was 1.47 Gbps before.
> >
> > TPACKET_V3 can support TSO, it can work only if your kernel can 
> > support, this has been verified on Ubuntu 18.04 5.3.0-40-generic , 
> > if you find the performance is very poor, please turn off tso for 
> > veth interfces in case userspace-tso-enable is set to true.
>
> Do you test the performance of enabling TSO?
>
> Using veth (like your run-iperf3.sh) and with kernel 5.3.
> Without your patch, with TSO enabled, I can get around 6Gbps But with 
> this patch, with TSO enabled, the performance drops to 1.9Gbps.
>

Are you investigating this issue?
William
William Tu March 9, 2020, 10:46 p.m. UTC | #8
On Fri, Mar 6, 2020 at 6:42 AM Yi Yang (杨燚)-云服务集团 <yangyi01@inspur.com> wrote:
>
> Hi, William
>
> After I checked some doubtable points, I think performance issue lies in tpacket_v3 kernel side implementation, so I fall back it to original one in case userspace-tso-enabled is set to true. V6 has been sent out, please review v6.
>
Thanks!
I will take a look later this week!
William

Patch
diff mbox series

diff --git a/acinclude.m4 b/acinclude.m4
index 1212a46..b39bbb9 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -1093,6 +1093,18 @@  AC_DEFUN([OVS_CHECK_IF_DL],
       AC_SEARCH_LIBS([pcap_open_live], [pcap])
    fi])
 
+dnl OVS_CHECK_LINUX_TPACKET
+dnl
+dnl Configure Linux TPACKET.
+AC_DEFUN([OVS_CHECK_LINUX_TPACKET], [
+  AC_COMPILE_IFELSE([
+    AC_LANG_PROGRAM([#include <linux/if_packet.h>], [
+        struct tpacket3_hdr x =  { 0 };
+    ])],
+    [AC_DEFINE([HAVE_TPACKET_V3], [1],
+    [Define to 1 if struct tpacket3_hdr is available.])])
+])
+
 dnl Checks for buggy strtok_r.
 dnl
 dnl Some versions of glibc 2.7 has a bug in strtok_r when compiling
diff --git a/configure.ac b/configure.ac
index 1877aae..b61a1f4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -89,6 +89,7 @@  OVS_CHECK_VISUAL_STUDIO_DDK
 OVS_CHECK_COVERAGE
 OVS_CHECK_NDEBUG
 OVS_CHECK_NETLINK
+OVS_CHECK_LINUX_TPACKET
 OVS_CHECK_OPENSSL
 OVS_CHECK_LIBCAPNG
 OVS_CHECK_LOGDIR
diff --git a/include/linux/automake.mk b/include/linux/automake.mk
index 8f063f4..a659e65 100644
--- a/include/linux/automake.mk
+++ b/include/linux/automake.mk
@@ -1,4 +1,5 @@ 
 noinst_HEADERS += \
+	include/linux/if_packet.h \
 	include/linux/netlink.h \
 	include/linux/netfilter/nf_conntrack_sctp.h \
 	include/linux/pkt_cls.h \
diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h
new file mode 100644
index 0000000..e20aacc
--- /dev/null
+++ b/include/linux/if_packet.h
@@ -0,0 +1,128 @@ 
+#ifndef __LINUX_IF_PACKET_WRAPPER_H
+#define __LINUX_IF_PACKET_WRAPPER_H 1
+
+#ifdef HAVE_TPACKET_V3
+#include_next <linux/if_packet.h>
+#else
+#define HAVE_TPACKET_V3 1
+
+struct sockaddr_pkt {
+        unsigned short  spkt_family;
+        unsigned char   spkt_device[14];
+        uint16_t        spkt_protocol;
+};
+
+struct sockaddr_ll {
+        unsigned short  sll_family;
+        uint16_t        sll_protocol;
+        int             sll_ifindex;
+        unsigned short  sll_hatype;
+        unsigned char   sll_pkttype;
+        unsigned char   sll_halen;
+        unsigned char   sll_addr[8];
+};
+
+/* Packet types */
+#define PACKET_HOST                     0 /* To us                */
+#define PACKET_OTHERHOST                3 /* To someone else    */
+#define PACKET_LOOPBACK                 5 /* MC/BRD frame looped back */
+
+/* Packet socket options */
+#define PACKET_RX_RING                  5
+#define PACKET_VERSION                 10
+#define PACKET_TX_RING                 13
+#define PACKET_VNET_HDR                15
+
+/* Rx ring - header status */
+#define TP_STATUS_KERNEL                0
+#define TP_STATUS_USER            (1 << 0)
+#define TP_STATUS_VLAN_VALID      (1 << 4) /* auxdata has valid tp_vlan_tci */
+#define TP_STATUS_VLAN_TPID_VALID (1 << 6) /* auxdata has valid tp_vlan_tpid */
+
+/* Tx ring - header status */
+#define TP_STATUS_SEND_REQUEST    (1 << 0)
+#define TP_STATUS_SENDING         (1 << 1)
+
+struct tpacket_hdr {
+    unsigned long tp_status;
+    unsigned int tp_len;
+    unsigned int tp_snaplen;
+    unsigned short tp_mac;
+    unsigned short tp_net;
+    unsigned int tp_sec;
+    unsigned int tp_usec;
+};
+
+#define TPACKET_ALIGNMENT 16
+#define TPACKET_ALIGN(x) (((x)+TPACKET_ALIGNMENT-1)&~(TPACKET_ALIGNMENT-1))
+
+struct tpacket_hdr_variant1 {
+    uint32_t tp_rxhash;
+    uint32_t tp_vlan_tci;
+    uint16_t tp_vlan_tpid;
+    uint16_t tp_padding;
+};
+
+struct tpacket3_hdr {
+    uint32_t  tp_next_offset;
+    uint32_t  tp_sec;
+    uint32_t  tp_nsec;
+    uint32_t  tp_snaplen;
+    uint32_t  tp_len;
+    uint32_t  tp_status;
+    uint16_t  tp_mac;
+    uint16_t  tp_net;
+    /* pkt_hdr variants */
+    union {
+        struct tpacket_hdr_variant1 hv1;
+    };
+    uint8_t  tp_padding[8];
+};
+
+struct tpacket_bd_ts {
+    unsigned int ts_sec;
+    union {
+        unsigned int ts_usec;
+        unsigned int ts_nsec;
+    };
+};
+
+struct tpacket_hdr_v1 {
+    uint32_t block_status;
+    uint32_t num_pkts;
+    uint32_t offset_to_first_pkt;
+    uint32_t blk_len;
+    uint64_t __attribute__((aligned(8))) seq_num;
+    struct tpacket_bd_ts ts_first_pkt, ts_last_pkt;
+};
+
+union tpacket_bd_header_u {
+    struct tpacket_hdr_v1 bh1;
+};
+
+struct tpacket_block_desc {
+    uint32_t version;
+    uint32_t offset_to_priv;
+    union tpacket_bd_header_u hdr;
+};
+
+#define TPACKET3_HDRLEN \
+    (TPACKET_ALIGN(sizeof(struct tpacket3_hdr)) + sizeof(struct sockaddr_ll))
+
+enum tpacket_versions {
+    TPACKET_V1,
+    TPACKET_V2,
+    TPACKET_V3
+};
+
+struct tpacket_req3 {
+    unsigned int tp_block_size; /* Minimal size of contiguous block */
+    unsigned int tp_block_nr; /* Number of blocks */
+    unsigned int tp_frame_size; /* Size of frame */
+    unsigned int tp_frame_nr; /* Total number of frames */
+    unsigned int tp_retire_blk_tov; /* Timeout in msecs */
+    unsigned int tp_sizeof_priv; /* Offset to private data area */
+    unsigned int tp_feature_req_word;
+};
+#endif /* HAVE_TPACKET_V3 */
+#endif /* __LINUX_IF_PACKET_WRAPPER_H */
diff --git a/include/sparse/linux/if_packet.h b/include/sparse/linux/if_packet.h
index 5ff6d47..0ac3fce 100644
--- a/include/sparse/linux/if_packet.h
+++ b/include/sparse/linux/if_packet.h
@@ -5,6 +5,7 @@ 
 #error "Use this header only with sparse.  It is not a correct implementation."
 #endif
 
+#include <openvswitch/types.h>
 #include_next <linux/if_packet.h>
 
 /* Fix endianness of 'spkt_protocol' and 'sll_protocol' members. */
@@ -27,4 +28,114 @@  struct sockaddr_ll {
         unsigned char   sll_addr[8];
 };
 
+/* Packet types */
+#define PACKET_HOST                     0 /* To us                */
+#define PACKET_OTHERHOST                3 /* To someone else 	*/
+#define PACKET_LOOPBACK                 5 /* MC/BRD frame looped back */
+
+/* Packet socket options */
+#define PACKET_RX_RING                  5
+#define PACKET_VERSION                 10
+#define PACKET_TX_RING                 13
+#define PACKET_VNET_HDR                15
+
+/* Rx ring - header status */
+#define TP_STATUS_KERNEL                0
+#define TP_STATUS_USER            (1 << 0)
+#define TP_STATUS_VLAN_VALID      (1 << 4) /* auxdata has valid tp_vlan_tci */
+#define TP_STATUS_VLAN_TPID_VALID (1 << 6) /* auxdata has valid tp_vlan_tpid */
+
+/* Tx ring - header status */
+#define TP_STATUS_SEND_REQUEST    (1 << 0)
+#define TP_STATUS_SENDING         (1 << 1)
+
+#define tpacket_hdr rpl_tpacket_hdr
+struct tpacket_hdr {
+    unsigned long tp_status;
+    unsigned int tp_len;
+    unsigned int tp_snaplen;
+    unsigned short tp_mac;
+    unsigned short tp_net;
+    unsigned int tp_sec;
+    unsigned int tp_usec;
+};
+
+#define TPACKET_ALIGNMENT 16
+#define TPACKET_ALIGN(x) (((x)+TPACKET_ALIGNMENT-1)&~(TPACKET_ALIGNMENT-1))
+
+#define tpacket_hdr_variant1 rpl_tpacket_hdr_variant1
+struct tpacket_hdr_variant1 {
+    uint32_t tp_rxhash;
+    uint32_t tp_vlan_tci;
+    uint16_t tp_vlan_tpid;
+    uint16_t tp_padding;
+};
+
+#define tpacket3_hdr rpl_tpacket3_hdr
+struct tpacket3_hdr {
+    uint32_t  tp_next_offset;
+    uint32_t  tp_sec;
+    uint32_t  tp_nsec;
+    uint32_t  tp_snaplen;
+    uint32_t  tp_len;
+    uint32_t  tp_status;
+    uint16_t  tp_mac;
+    uint16_t  tp_net;
+    /* pkt_hdr variants */
+    union {
+        struct tpacket_hdr_variant1 hv1;
+    };
+    uint8_t  tp_padding[8];
+};
+
+#define tpacket_bd_ts rpl_tpacket_bd_ts
+struct tpacket_bd_ts {
+    unsigned int ts_sec;
+    union {
+        unsigned int ts_usec;
+        unsigned int ts_nsec;
+    };
+};
+
+#define tpacket_hdr_v1 rpl_tpacket_hdr_v1
+struct tpacket_hdr_v1 {
+    uint32_t block_status;
+    uint32_t num_pkts;
+    uint32_t offset_to_first_pkt;
+    uint32_t blk_len;
+    uint64_t __attribute__((aligned(8))) seq_num;
+    struct tpacket_bd_ts ts_first_pkt, ts_last_pkt;
+};
+
+#define tpacket_bd_header_u rpl_tpacket_bd_header_u
+union tpacket_bd_header_u {
+    struct tpacket_hdr_v1 bh1;
+};
+
+#define tpacket_block_desc rpl_tpacket_block_desc
+struct tpacket_block_desc {
+    uint32_t version;
+    uint32_t offset_to_priv;
+    union tpacket_bd_header_u hdr;
+};
+
+#define TPACKET3_HDRLEN \
+    (TPACKET_ALIGN(sizeof(struct tpacket3_hdr)) + sizeof(struct sockaddr_ll))
+
+enum rpl_tpacket_versions {
+    TPACKET_V1,
+    TPACKET_V2,
+    TPACKET_V3
+};
+
+#define tpacket_req3 rpl_tpacket_req3
+struct tpacket_req3 {
+    unsigned int tp_block_size; /* Minimal size of contiguous block */
+    unsigned int tp_block_nr; /* Number of blocks */
+    unsigned int tp_frame_size; /* Size of frame */
+    unsigned int tp_frame_nr; /* Total number of frames */
+    unsigned int tp_retire_blk_tov; /* Timeout in msecs */
+    unsigned int tp_sizeof_priv; /* Offset to private data area */
+    unsigned int tp_feature_req_word;
+};
 #endif
diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
index c7c515f..ccd58f4 100644
--- a/lib/netdev-linux-private.h
+++ b/lib/netdev-linux-private.h
@@ -26,6 +26,7 @@ 
 #include <linux/mii.h>
 #include <stdint.h>
 #include <stdbool.h>
+#include <linux/if_packet.h>
 
 #include "dp-packet.h"
 #include "netdev-afxdp.h"
@@ -41,6 +42,22 @@  struct netdev;
 /* The maximum packet length is 16 bits */
 #define LINUX_RXQ_TSO_MAX_LEN 65535
 
+#ifdef HAVE_TPACKET_V3
+struct tpacket_ring {
+    int sockfd;
+    struct iovec *rd;
+    uint8_t *mm_space;
+    size_t mm_len, rd_len;
+    struct sockaddr_ll ll;
+    int type, rd_num, flen;
+    struct tpacket_req3 req;
+    uint32_t block_num;
+    uint32_t frame_num;
+    uint32_t frame_num_in_block;
+    void * ppd;
+};
+#endif /* HAVE_TPACKET_V3 */
+
 struct netdev_rxq_linux {
     struct netdev_rxq up;
     bool is_tap;
@@ -105,6 +122,11 @@  struct netdev_linux {
 
     int numa_id;                /* NUMA node id. */
 
+#ifdef HAVE_TPACKET_V3
+    struct tpacket_ring *tp_rx_ring;
+    struct tpacket_ring *tp_tx_ring;
+#endif
+
 #ifdef HAVE_AF_XDP
     /* AF_XDP information. */
     struct xsk_socket_info **xsks;
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index c6f3d27..b0ca4b0 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -48,6 +48,9 @@ 
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#ifdef HAVE_TPACKET_V3
+#include <sys/mman.h>
+#endif
 
 #include "coverage.h"
 #include "dp-packet.h"
@@ -1055,6 +1058,111 @@  netdev_linux_rxq_alloc(void)
     return &rx->up;
 }
 
+#ifdef HAVE_TPACKET_V3
+static inline struct tpacket3_hdr *
+tpacket_get_next_frame(struct tpacket_ring *ring, uint32_t frame_num)
+{
+    uint8_t *f0 = ring->rd[0].iov_base;
+
+    return ALIGNED_CAST(struct tpacket3_hdr *,
+               f0 + (frame_num * ring->req.tp_frame_size));
+}
+
+/*
+ * ring->rd_num is tp_block_nr, ring->flen is tp_block_size
+ */
+static inline void
+tpacket_fill_ring(struct tpacket_ring *ring, unsigned int blocks, int type)
+{
+    if (type == PACKET_RX_RING) {
+        ring->req.tp_retire_blk_tov = 0;
+        ring->req.tp_sizeof_priv = 0;
+        ring->req.tp_feature_req_word = 0;
+    }
+
+    if (userspace_tso_enabled()) {
+        /* For TX ring, the whole packet must be in one frame
+         * so tp_frame_size must big enough to accommodate
+         * 64K packet, tpacket3_hdr will occupy some bytes,
+         * the final frame size is 64K + 4K = 68K.
+         */
+        ring->req.tp_frame_size = (getpagesize() << 4) + getpagesize();
+        ring->req.tp_block_size = ring->req.tp_frame_size;
+    } else {
+        ring->req.tp_block_size = getpagesize() << 2;
+        ring->req.tp_frame_size = TPACKET_ALIGNMENT << 7;
+    }
+
+    ring->req.tp_block_nr = blocks;
+
+    ring->req.tp_frame_nr = ring->req.tp_block_size /
+                             ring->req.tp_frame_size *
+                             ring->req.tp_block_nr;
+
+    ring->mm_len = ring->req.tp_block_size * ring->req.tp_block_nr;
+    ring->rd_num = ring->req.tp_block_nr;
+    ring->flen = ring->req.tp_block_size;
+}
+
+static int
+tpacket_setup_ring(int sock, struct tpacket_ring *ring, int type)
+{
+    int ret = 0;
+    unsigned int blocks;
+
+    if (userspace_tso_enabled()) {
+        blocks = 128;
+    } else {
+        blocks = 256;
+    }
+    ring->type = type;
+    tpacket_fill_ring(ring, blocks, type);
+    ret = setsockopt(sock, SOL_PACKET, type, &ring->req,
+                     sizeof(ring->req));
+
+    if (ret == -1) {
+        return -1;
+    }
+
+    ring->rd_len = ring->rd_num * sizeof(*ring->rd);
+    ring->rd = xmalloc(ring->rd_len);
+    if (ring->rd == NULL) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static inline int
+tpacket_mmap_rx_tx_ring(int sock, struct tpacket_ring *rx_ring,
+                struct tpacket_ring *tx_ring)
+{
+    int i;
+
+    rx_ring->mm_space = mmap(NULL, rx_ring->mm_len + tx_ring->mm_len,
+                          PROT_READ | PROT_WRITE,
+                          MAP_SHARED | MAP_LOCKED | MAP_POPULATE, sock, 0);
+    if (rx_ring->mm_space == MAP_FAILED) {
+        return -1;
+    }
+
+    memset(rx_ring->rd, 0, rx_ring->rd_len);
+    for (i = 0; i < rx_ring->rd_num; ++i) {
+            rx_ring->rd[i].iov_base = rx_ring->mm_space + (i * rx_ring->flen);
+            rx_ring->rd[i].iov_len = rx_ring->flen;
+    }
+
+    tx_ring->mm_space = rx_ring->mm_space + rx_ring->mm_len;
+    memset(tx_ring->rd, 0, tx_ring->rd_len);
+    for (i = 0; i < tx_ring->rd_num; ++i) {
+            tx_ring->rd[i].iov_base = tx_ring->mm_space + (i * tx_ring->flen);
+            tx_ring->rd[i].iov_len = tx_ring->flen;
+    }
+
+    return 0;
+}
+#endif
+
 static int
 netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
 {
@@ -1062,6 +1170,9 @@  netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
     struct netdev *netdev_ = rx->up.netdev;
     struct netdev_linux *netdev = netdev_linux_cast(netdev_);
     int error;
+#ifdef HAVE_TPACKET_V3
+    int ver = TPACKET_V3;
+#endif
 
     ovs_mutex_lock(&netdev->mutex);
     rx->is_tap = is_tap_netdev(netdev_);
@@ -1070,6 +1181,7 @@  netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
     } else {
         struct sockaddr_ll sll;
         int ifindex, val;
+
         /* Result of tcpdump -dd inbound */
         static const struct sock_filter filt[] = {
             { 0x28, 0, 0, 0xfffff004 }, /* ldh [0] */
@@ -1082,7 +1194,7 @@  netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
         };
 
         /* Create file descriptor. */
-        rx->fd = socket(PF_PACKET, SOCK_RAW, 0);
+        rx->fd = socket(PF_PACKET, SOCK_RAW, (OVS_FORCE int) htons(ETH_P_ALL));
         if (rx->fd < 0) {
             error = errno;
             VLOG_ERR("failed to create raw socket (%s)", ovs_strerror(error));
@@ -1106,6 +1218,49 @@  netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
             goto error;
         }
 
+#ifdef HAVE_TPACKET_V3
+        /* TPACKET_V3 ring setup must be after setsockopt
+         * PACKET_VNET_HDR because PACKET_VNET_HDR will return error
+         * (EBUSY) if ring is set up
+         */
+        error = setsockopt(rx->fd, SOL_PACKET, PACKET_VERSION, &ver,
+                           sizeof(ver));
+        if (error != 0) {
+            error = errno;
+            VLOG_ERR("%s: failed to set tpacket version (%s)",
+                     netdev_get_name(netdev_), ovs_strerror(error));
+            goto error;
+        }
+        netdev->tp_rx_ring = xzalloc(sizeof(struct tpacket_ring));
+        netdev->tp_tx_ring = xzalloc(sizeof(struct tpacket_ring));
+        netdev->tp_rx_ring->sockfd = rx->fd;
+        netdev->tp_tx_ring->sockfd = rx->fd;
+        error = tpacket_setup_ring(rx->fd, netdev->tp_rx_ring,
+                                   PACKET_RX_RING);
+        if (error != 0) {
+            error = errno;
+            VLOG_ERR("%s: failed to set tpacket rx ring (%s)",
+                     netdev_get_name(netdev_), ovs_strerror(error));
+            goto error;
+        }
+        error = tpacket_setup_ring(rx->fd, netdev->tp_tx_ring,
+                                   PACKET_TX_RING);
+        if (error != 0) {
+            error = errno;
+            VLOG_ERR("%s: failed to set tpacket tx ring (%s)",
+                     netdev_get_name(netdev_), ovs_strerror(error));
+            goto error;
+        }
+        error = tpacket_mmap_rx_tx_ring(rx->fd, netdev->tp_rx_ring,
+                                       netdev->tp_tx_ring);
+        if (error != 0) {
+            error = errno;
+            VLOG_ERR("%s: failed to mmap tpacket rx & tx ring (%s)",
+                     netdev_get_name(netdev_), ovs_strerror(error));
+            goto error;
+        }
+#endif
+
         /* Set non-blocking mode. */
         error = set_nonblocking(rx->fd);
         if (error) {
@@ -1120,9 +1275,14 @@  netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
 
         /* Bind to specific ethernet device. */
         memset(&sll, 0, sizeof sll);
-        sll.sll_family = AF_PACKET;
+        sll.sll_family = PF_PACKET;
+#ifdef HAVE_TPACKET_V3
+        sll.sll_hatype = 0;
+        sll.sll_pkttype = 0;
+        sll.sll_halen = 0;
+#endif
         sll.sll_ifindex = ifindex;
-        sll.sll_protocol = htons(ETH_P_ALL);
+        sll.sll_protocol = (OVS_FORCE ovs_be16) htons(ETH_P_ALL);
         if (bind(rx->fd, (struct sockaddr *) &sll, sizeof sll) < 0) {
             error = errno;
             VLOG_ERR("%s: failed to bind raw socket (%s)",
@@ -1159,6 +1319,17 @@  netdev_linux_rxq_destruct(struct netdev_rxq *rxq_)
     int i;
 
     if (!rx->is_tap) {
+#ifdef HAVE_TPACKET_V3
+        struct netdev_linux *netdev = netdev_linux_cast(rx->up.netdev);
+
+        if (netdev->tp_rx_ring) {
+            munmap(netdev->tp_rx_ring->mm_space,
+                   2 * netdev->tp_rx_ring->mm_len);
+            free(netdev->tp_rx_ring->rd);
+            free(netdev->tp_tx_ring->rd);
+        }
+#endif
+
         close(rx->fd);
     }
 
@@ -1175,6 +1346,7 @@  netdev_linux_rxq_dealloc(struct netdev_rxq *rxq_)
     free(rx);
 }
 
+#ifndef HAVE_TPACKET_V3
 static ovs_be16
 auxdata_to_vlan_tpid(const struct tpacket_auxdata *aux, bool double_tagged)
 {
@@ -1342,6 +1514,7 @@  netdev_linux_batch_rxq_recv_sock(struct netdev_rxq_linux *rx, int mtu,
 
     return 0;
 }
+#endif /* HAVE_TPACKET_V3 */
 
 /*
  * Receive packets from tap by batch process for better performance,
@@ -1435,6 +1608,109 @@  netdev_linux_batch_rxq_recv_tap(struct netdev_rxq_linux *rx, int mtu,
     return 0;
 }
 
+#ifdef HAVE_TPACKET_V3
+static int
+netdev_linux_batch_recv_tpacket(struct netdev_rxq_linux *rx, int mtu,
+                                struct dp_packet_batch *batch)
+{
+    struct netdev *netdev_ = netdev_rxq_get_netdev(&rx->up);
+    struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+    struct dp_packet *buffer;
+    int i = 0;
+    unsigned int block_num;
+    unsigned int fn_in_block;
+    struct tpacket_block_desc *pbd;
+    struct tpacket3_hdr *ppd;
+    int virtio_net_hdr_size;
+    size_t buffer_len;
+
+    if (userspace_tso_enabled()) {
+        virtio_net_hdr_size = sizeof(struct virtio_net_hdr);
+    } else {
+        virtio_net_hdr_size = 0;
+    }
+    buffer_len = virtio_net_hdr_size + VLAN_ETH_HEADER_LEN + mtu;
+
+    ppd = ALIGNED_CAST(struct tpacket3_hdr *, netdev->tp_rx_ring->ppd);
+    block_num = netdev->tp_rx_ring->block_num;
+    fn_in_block = netdev->tp_rx_ring->frame_num_in_block;
+    pbd = ALIGNED_CAST(struct tpacket_block_desc *,
+              netdev->tp_rx_ring->rd[block_num].iov_base);
+
+    while (i < NETDEV_MAX_BURST) {
+        if ((pbd->hdr.bh1.block_status & TP_STATUS_USER) == 0) {
+            break;
+        }
+        if (fn_in_block == 0) {
+            ppd = ALIGNED_CAST(struct tpacket3_hdr *, (uint8_t *) pbd +
+                                   pbd->hdr.bh1.offset_to_first_pkt);
+        }
+
+        if (ppd->tp_snaplen > (mtu + VLAN_ETH_HEADER_LEN)) {
+            buffer_len = virtio_net_hdr_size + VLAN_ETH_HEADER_LEN
+                         + ppd->tp_snaplen;
+        }
+
+        buffer = dp_packet_new_with_headroom(buffer_len, DP_NETDEV_HEADROOM);
+        memcpy(dp_packet_data(buffer),
+               (uint8_t *) ppd + ppd->tp_mac - virtio_net_hdr_size,
+               ppd->tp_snaplen + virtio_net_hdr_size);
+        dp_packet_set_size(buffer,
+                           dp_packet_size(buffer) + ppd->tp_snaplen
+                               + virtio_net_hdr_size);
+
+        if (virtio_net_hdr_size && netdev_linux_parse_vnet_hdr(buffer)) {
+            /* Unexpected error situation: the virtio header is not present
+             * or corrupted. Drop the packet but continue in case next ones
+             * are correct. */
+            dp_packet_delete(buffer);
+            netdev->rx_dropped += 1;
+            VLOG_WARN_RL(&rl, "%s: Dropped packet: Invalid virtio net header",
+                         netdev_get_name(netdev_));
+        } else {
+            if (ppd->tp_status & TP_STATUS_VLAN_VALID) {
+                struct eth_header *eth;
+                bool double_tagged;
+                ovs_be16 vlan_tpid;
+
+                eth = dp_packet_data(buffer);
+                double_tagged = eth->eth_type == htons(ETH_TYPE_VLAN_8021Q);
+                if (ppd->tp_status & TP_STATUS_VLAN_TPID_VALID) {
+                    vlan_tpid = htons(ppd->hv1.tp_vlan_tpid);
+                } else if (double_tagged) {
+                    vlan_tpid = htons(ETH_TYPE_VLAN_8021AD);
+                } else {
+                    vlan_tpid = htons(ETH_TYPE_VLAN_8021Q);
+                }
+                eth_push_vlan(buffer, vlan_tpid, htons(ppd->hv1.tp_vlan_tci));
+            }
+            dp_packet_batch_add(batch, buffer);
+        }
+
+        fn_in_block++;
+        if (fn_in_block >= pbd->hdr.bh1.num_pkts) {
+            pbd->hdr.bh1.block_status = TP_STATUS_KERNEL;
+            block_num = (block_num + 1) %
+                            netdev->tp_rx_ring->req.tp_block_nr;
+            pbd = (struct tpacket_block_desc *)
+                     netdev->tp_rx_ring->rd[block_num].iov_base;
+            fn_in_block = 0;
+            ppd = NULL;
+        } else {
+            ppd = ALIGNED_CAST(struct tpacket3_hdr *,
+                   (uint8_t *) ppd + ppd->tp_next_offset);
+        }
+        i++;
+    }
+
+    netdev->tp_rx_ring->block_num = block_num;
+    netdev->tp_rx_ring->frame_num_in_block = fn_in_block;
+    netdev->tp_rx_ring->ppd = ppd;
+
+    return 0;
+}
+#endif /* HAVE_TPACKET_V3 */
+
 static int
 netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
                       int *qfill)
@@ -1466,9 +1742,15 @@  netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
     }
 
     dp_packet_batch_init(batch);
-    retval = (rx->is_tap
-              ? netdev_linux_batch_rxq_recv_tap(rx, mtu, batch)
-              : netdev_linux_batch_rxq_recv_sock(rx, mtu, batch));
+    if (rx->is_tap) {
+        retval = netdev_linux_batch_rxq_recv_tap(rx, mtu, batch);
+    } else {
+#ifndef HAVE_TPACKET_V3
+        retval = netdev_linux_batch_rxq_recv_sock(rx, mtu, batch);
+#else
+        retval = netdev_linux_batch_recv_tpacket(rx, mtu, batch);
+#endif
+    }
 
     if (retval) {
         if (retval != EAGAIN && retval != EMSGSIZE) {
@@ -1509,6 +1791,7 @@  netdev_linux_rxq_drain(struct netdev_rxq *rxq_)
     }
 }
 
+#ifndef HAVE_TPACKET_V3
 static int
 netdev_linux_sock_batch_send(int sock, int ifindex, bool tso, int mtu,
                              struct dp_packet_batch *batch)
@@ -1553,6 +1836,7 @@  netdev_linux_sock_batch_send(int sock, int ifindex, bool tso, int mtu,
     free(iov);
     return error;
 }
+#endif /* HAVE_TPACKET_V3 */
 
 /* Use the tap fd to send 'batch' to tap device 'netdev'.  Using the tap fd is
  * essential, because packets sent to a tap device with an AF_PACKET socket
@@ -1673,6 +1957,83 @@  netdev_linux_get_numa_id(const struct netdev *netdev_)
     return numa_id;
 }
 
+#ifdef HAVE_TPACKET_V3
+static inline int
+tpacket_tx_is_ready(void * next_frame)
+{
+    struct tpacket3_hdr *hdr = ALIGNED_CAST(struct tpacket3_hdr *, next_frame);
+
+    return !(hdr->tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING));
+}
+
+static int
+netdev_linux_tpacket_batch_send(struct netdev *netdev_, bool tso, int mtu,
+                            struct dp_packet_batch *batch)
+{
+    struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+    struct dp_packet *packet;
+    int sockfd;
+    ssize_t bytes_sent;
+    int total_pkts = 0;
+
+    unsigned int frame_nr = netdev->tp_tx_ring->req.tp_frame_nr;
+    unsigned int frame_num = netdev->tp_tx_ring->frame_num;
+
+    /* The Linux tap driver returns EIO if the device is not up,
+     * so if the device is not up, don't waste time sending it.
+     * However, if the device is in another network namespace
+     * then OVS can't retrieve the state. In that case, send the
+     * packets anyway. */
+    if (netdev->present && !(netdev->ifi_flags & IFF_UP)) {
+        netdev->tx_dropped += dp_packet_batch_size(batch);
+        return 0;
+    }
+
+    DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+        size_t size;
+        struct tpacket3_hdr *ppd;
+
+        if (tso) {
+            netdev_linux_prepend_vnet_hdr(packet, mtu);
+        }
+
+        size = dp_packet_size(packet);
+        ppd = tpacket_get_next_frame(netdev->tp_tx_ring, frame_num);
+
+        if (!tpacket_tx_is_ready(ppd)) {
+            break;
+        }
+        ppd->tp_snaplen = size;
+        ppd->tp_len = size;
+        ppd->tp_next_offset = 0;
+
+        memcpy((uint8_t *)ppd + TPACKET3_HDRLEN - sizeof(struct sockaddr_ll),
+               dp_packet_data(packet),
+               size);
+        ppd->tp_status = TP_STATUS_SEND_REQUEST;
+        frame_num = (frame_num + 1) % frame_nr;
+        total_pkts++;
+    }
+    netdev->tp_tx_ring->frame_num = frame_num;
+
+    /* Kick-off transmits */
+    if (total_pkts != 0) {
+        sockfd = netdev->tp_tx_ring->sockfd;
+        bytes_sent = sendto(sockfd, NULL, 0, MSG_DONTWAIT, NULL, 0);
+        if (bytes_sent == -1 &&
+                errno != ENOBUFS && errno != EAGAIN) {
+            /*
+             * In case of an ENOBUFS/EAGAIN error all of the enqueued
+             * packets will be considered successful even though only some
+             * are sent.
+             */
+            netdev->tx_dropped += dp_packet_batch_size(batch);
+        }
+    }
+    return 0;
+}
+#endif
+
 /* Sends 'batch' on 'netdev'.  Returns 0 if successful, otherwise a positive
  * errno value.  Returns EAGAIN without blocking if the packet cannot be queued
  * immediately.  Returns EMSGSIZE if a partial packet was transmitted or if
@@ -1712,7 +2073,11 @@  netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
             goto free_batch;
         }
 
+#ifndef HAVE_TPACKET_V3
         error = netdev_linux_sock_batch_send(sock, ifindex, tso, mtu, batch);
+#else
+        error = netdev_linux_tpacket_batch_send(netdev_, tso, mtu, batch);
+#endif
     } else {
         error = netdev_linux_tap_batch_send(netdev_, tso, mtu, batch);
     }