diff mbox

[ovs-dev,v4,1/4] Add support for 802.1ad (QinQ tunneling)

Message ID 20160712153857.3559-2-shaw.leon@gmail.com
State Changes Requested
Headers show

Commit Message

Xiao Liang July 12, 2016, 3:38 p.m. UTC
Flow key handleing changes:
- Add VLAN header array in struct flow, to record multiple 802.1q VLAN
  headers.
- Add dpif multi-VLAN capability probing. If datapath supports multi-VLAN,
  increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.

Refacter VLAN handling in dpif-xlate:
- Introduce 'xvlan' to track VLAN stack during flow processing.
- Input and output VLAN translation according to the xbundle type.

Push VLAN action support:
- Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
- Support push_vlan on dot1q packets.

Add new port VLAN mode "dot1q-tunnel":
- Example:
    ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100
  Pushes another VLAN 100 header on packets (tagged and untagged) on ingress,
  and pops it on egress.
- Customer VLAN check:
    ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100 cvlans=10,20
  Only customer VLAN of 10 and 20 are allowed.

Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
---
 include/openvswitch/flow.h        |  13 +-
 include/openvswitch/ofp-actions.h |  10 +-
 include/openvswitch/packets.h     |   5 +
 lib/dpctl.c                       |  29 ++-
 lib/dpif-netdev.c                 |   7 +-
 lib/flow.c                        | 110 ++++++----
 lib/flow.h                        |  19 +-
 lib/match.c                       |  47 ++--
 lib/meta-flow.c                   |  22 +-
 lib/nx-match.c                    |  14 +-
 lib/odp-util.c                    | 234 +++++++++++++-------
 lib/odp-util.h                    |   4 +-
 lib/ofp-actions.c                 |  61 +++---
 lib/ofp-util.c                    |  56 ++---
 lib/tnl-ports.c                   |   2 +-
 ofproto/bond.c                    |   2 +-
 ofproto/ofproto-dpif-ipfix.c      |   6 +-
 ofproto/ofproto-dpif-rid.h        |   2 +-
 ofproto/ofproto-dpif-sflow.c      |   4 +-
 ofproto/ofproto-dpif-xlate.c      | 436 ++++++++++++++++++++++++++------------
 ofproto/ofproto-dpif-xlate.h      |   6 +-
 ofproto/ofproto-dpif.c            |  74 ++++++-
 ofproto/ofproto.h                 |   8 +-
 ovn/controller/pinctrl.c          |   5 +-
 tests/test-classifier.c           |  17 +-
 utilities/ovs-ofctl.c             |  29 +--
 vswitchd/bridge.c                 |  27 ++-
 vswitchd/vswitch.ovsschema        |  14 +-
 vswitchd/vswitch.xml              |  31 +++
 29 files changed, 884 insertions(+), 410 deletions(-)

Comments

Xiao Liang July 27, 2016, 3:51 a.m. UTC | #1
On Tue, Jul 26, 2016 at 12:52 AM, Ben Pfaff <blp@ovn.org> wrote:
> On Tue, Jul 12, 2016 at 11:38:54PM +0800, Xiao Liang wrote:
>> Flow key handleing changes:
>> - Add VLAN header array in struct flow, to record multiple 802.1q VLAN
>>   headers.
>> - Add dpif multi-VLAN capability probing. If datapath supports multi-VLAN,
>>   increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.
>>
>> Refacter VLAN handling in dpif-xlate:
>> - Introduce 'xvlan' to track VLAN stack during flow processing.
>> - Input and output VLAN translation according to the xbundle type.
>>
>> Push VLAN action support:
>> - Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
>> - Support push_vlan on dot1q packets.
>>
>> Add new port VLAN mode "dot1q-tunnel":
>> - Example:
>>     ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100
>>   Pushes another VLAN 100 header on packets (tagged and untagged) on ingress,
>>   and pops it on egress.
>> - Customer VLAN check:
>>     ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100 cvlans=10,20
>>   Only customer VLAN of 10 and 20 are allowed.
>>
>> Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
>
> The following incremental fixes some warnings from "sparse".  The one
> from odp-util.c seems petty, but the others correct real conceptual
> errors even if they would not be bugs in practice.
>
> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index 46ff6de..56a6145 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> @@ -5047,7 +5047,7 @@ parse_8021q_onward(const struct nlattr *attrs[OVS_KEY_ATTR_MAX + 1],
>
>      while (encaps < FLOW_MAX_VLAN_HEADERS &&
>             (is_mask?
> -            (src_flow->vlan[encaps].tci & htons(VLAN_CFI)) :
> +            (src_flow->vlan[encaps].tci & htons(VLAN_CFI)) != 0 :
>              eth_type_vlan(flow->dl_type))) {
>          /* Calculate fitness of outer attributes. */
>          encap  = (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_ENCAP)
> diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
> index c4b656e..7184184 100644
> --- a/lib/ofp-actions.c
> +++ b/lib/ofp-actions.c
> @@ -1666,7 +1666,7 @@ static void
>  format_PUSH_VLAN(const struct ofpact_push_vlan *push_vlan, struct ds *s)
>  {
>      ds_put_format(s, "%spush_vlan:%s%#"PRIx16,
> -                  colors.param, colors.end, htons(push_vlan->ethertype));
> +                  colors.param, colors.end, ntohs(push_vlan->ethertype));
>  }
>
>  /* Action structure for OFPAT10_SET_DL_SRC/DST and OFPAT11_SET_DL_SRC/DST. */
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index fd41ac8..90cf74a 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -1920,7 +1920,7 @@ xvlan_extract(const struct flow *flow, struct xvlan *xvlan)
>                  !(flow->vlan[i].tci & htons(VLAN_CFI))) {
>              break;
>          }
> -        xvlan[i].tpid = htons(flow->vlan[i].tpid);
> +        xvlan[i].tpid = ntohs(flow->vlan[i].tpid);
>          xvlan[i].vid = vlan_tci_to_vid(flow->vlan[i].tci);
>          xvlan[i].pcp = flow->vlan[i].tci & htons(VLAN_PCP_MASK);
>      }

Thanks for pointing it out.
Ben Pfaff July 27, 2016, 4:50 a.m. UTC | #2
On Wed, Jul 27, 2016 at 11:51:25AM +0800, Xiao Liang wrote:
> On Tue, Jul 26, 2016 at 12:52 AM, Ben Pfaff <blp@ovn.org> wrote:
> > On Tue, Jul 12, 2016 at 11:38:54PM +0800, Xiao Liang wrote:
> >> Flow key handleing changes:
> >> - Add VLAN header array in struct flow, to record multiple 802.1q VLAN
> >>   headers.
> >> - Add dpif multi-VLAN capability probing. If datapath supports multi-VLAN,
> >>   increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.
> >>
> >> Refacter VLAN handling in dpif-xlate:
> >> - Introduce 'xvlan' to track VLAN stack during flow processing.
> >> - Input and output VLAN translation according to the xbundle type.
> >>
> >> Push VLAN action support:
> >> - Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
> >> - Support push_vlan on dot1q packets.
> >>
> >> Add new port VLAN mode "dot1q-tunnel":
> >> - Example:
> >>     ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100
> >>   Pushes another VLAN 100 header on packets (tagged and untagged) on ingress,
> >>   and pops it on egress.
> >> - Customer VLAN check:
> >>     ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100 cvlans=10,20
> >>   Only customer VLAN of 10 and 20 are allowed.
> >>
> >> Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
> >
> > The following incremental fixes some warnings from "sparse".  The one
> > from odp-util.c seems petty, but the others correct real conceptual
> > errors even if they would not be bugs in practice.
> >
> > diff --git a/lib/odp-util.c b/lib/odp-util.c
> > index 46ff6de..56a6145 100644
> > --- a/lib/odp-util.c
> > +++ b/lib/odp-util.c
> > @@ -5047,7 +5047,7 @@ parse_8021q_onward(const struct nlattr *attrs[OVS_KEY_ATTR_MAX + 1],
> >
> >      while (encaps < FLOW_MAX_VLAN_HEADERS &&
> >             (is_mask?
> > -            (src_flow->vlan[encaps].tci & htons(VLAN_CFI)) :
> > +            (src_flow->vlan[encaps].tci & htons(VLAN_CFI)) != 0 :
> >              eth_type_vlan(flow->dl_type))) {
> >          /* Calculate fitness of outer attributes. */
> >          encap  = (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_ENCAP)
> > diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
> > index c4b656e..7184184 100644
> > --- a/lib/ofp-actions.c
> > +++ b/lib/ofp-actions.c
> > @@ -1666,7 +1666,7 @@ static void
> >  format_PUSH_VLAN(const struct ofpact_push_vlan *push_vlan, struct ds *s)
> >  {
> >      ds_put_format(s, "%spush_vlan:%s%#"PRIx16,
> > -                  colors.param, colors.end, htons(push_vlan->ethertype));
> > +                  colors.param, colors.end, ntohs(push_vlan->ethertype));
> >  }
> >
> >  /* Action structure for OFPAT10_SET_DL_SRC/DST and OFPAT11_SET_DL_SRC/DST. */
> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > index fd41ac8..90cf74a 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -1920,7 +1920,7 @@ xvlan_extract(const struct flow *flow, struct xvlan *xvlan)
> >                  !(flow->vlan[i].tci & htons(VLAN_CFI))) {
> >              break;
> >          }
> > -        xvlan[i].tpid = htons(flow->vlan[i].tpid);
> > +        xvlan[i].tpid = ntohs(flow->vlan[i].tpid);
> >          xvlan[i].vid = vlan_tci_to_vid(flow->vlan[i].tci);
> >          xvlan[i].pcp = flow->vlan[i].tci & htons(VLAN_PCP_MASK);
> >      }
> 
> Thanks for pointing it out.

I'm working on a thorough review of this series and hope to provide it
tomorro.w
Xiao Liang July 31, 2016, 12:22 a.m. UTC | #3
Thanks! I'm working on code changes according to your comments. I
think we need more discussion about the ethertype matching. Please see
inline.

Thanks,
Xiao

On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:
> On Tue, Jul 12, 2016 at 11:38:54PM +0800, Xiao Liang wrote:
>> Flow key handleing changes:
>> - Add VLAN header array in struct flow, to record multiple 802.1q VLAN
>>   headers.
>> - Add dpif multi-VLAN capability probing. If datapath supports multi-VLAN,
>>   increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.
>>
>> Refacter VLAN handling in dpif-xlate:
>> - Introduce 'xvlan' to track VLAN stack during flow processing.
>> - Input and output VLAN translation according to the xbundle type.
>>
>> Push VLAN action support:
>> - Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
>> - Support push_vlan on dot1q packets.
>>
>> Add new port VLAN mode "dot1q-tunnel":
>> - Example:
>>     ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100
>>   Pushes another VLAN 100 header on packets (tagged and untagged) on ingress,
>>   and pops it on egress.
>> - Customer VLAN check:
>>     ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100 cvlans=10,20
>>   Only customer VLAN of 10 and 20 are allowed.
>>
>> Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
>
> Thanks for working on this!  I can see that it was a great deal of
> work.  In addition to the "sparse" related fixes from a few days ago,
> I have some more detailed comments.
>
> It looks like this patch causes some tests to fail and then later
> patches fix them.  This is not our practice: a patch should never make
> tests fails (at least not intentionally).  Instead, if a patch
> requires changes to tests, then the same patch should update the
> tests.  If the later patches are required to fix tests, they should be
> folded into this one.

I will squash the commits.

>
> Our practice is to give arrays names that are plural, like 'vlans'.
>
> Let's talk about VLANs in the flow struct.  The use of an array of
> TPID+TCI seems fine.  I think, however, that we need to define and
> enforce an invariant: if vlan[i] matches on anything, for i > 0, then
> every vlan[j], for j < i, needs to match on (tci & VLAN_CFI) != 0.
> Otherwise we can end up with nonsensical matches like one that matches
> on vlan[0] being not present (i.e. match on (vlan[0] & VLAN_CFI) == 0)
> but vlan[1] being present (i.e. match on (vlan[1] & VLAN_CFI) != 0).
> We enforce a similar invariant for MPLS.
>

You are right, thanks.

> I'm concerned about backward compatibility.  Consider some application
> built on Open vSwitch using OpenFlow.  Today, it can distinguish
> single-tagged and double-tagged packets (that use outer Ethertype
> 0x8100), as follows:
>
>     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
>       dl_type.
>
>     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
>
> With this patch, this won't work, because neither kind of packet has a
> VLAN dl_type.  Instead, applications need to first match on the outer
> VLAN, then pop it off, then match on the inner VLAN.  This difference
> could lead to security problems in applications.  An application
> might, for example, want to pop an outer VLAN and forward the packet,
> but only if there is no inner VLAN.  If it is implemented according to
> the previous rules, then it will not notice the inner VLAN.

Maybe some applications are implemented this way, but they are
probably wrong. OpenFlow says eth_type is "ethernet type of the
OpenFlow packet payload, after VLAN tags", so it is the payload
ethtype for a double-tagged packet. It's the same for single-tagged
packet: application must explicitly match vlan_tci to decide whether
it has VLAN tag.
The problem is that there is currently no way to peek inner VLAN
without popping the outer one. I heard from Tom that you have plan to
support TPID matching. Is it possible to add extension match fields
like TPID1, TPID2 to match both TPIDs? This may also provide a way to
differentiate 0x8100 and 0x88a8.

>
> There are probably multiple ways to deal with this problem.  Let me
> propose one.  It is somewhat awkward, so there might be a more
> graceful way.  Basically the idea is to retain the current view from
> an OpenFlow perspective:
>
>     - Packet without tags: vlan_tci == 0, dl_type is non-VLAN
>     - Packet with 1 tag:   vlan_tci != 0, dl_type is non-VLAN
>     - Packet with 2+ tags: vlan_tci != 0, dl_type is 2nd VLAN
>
> So, when a packet with 2 tags pops off the outermost one, dl_type
> becomes the inner Ethertype (such as IPv4) and vlan_tci becomes the
> single remaining VLAN.
>
> I think there's another backward compatibility risk.  This patch adds
> support for TPID 0x88a8 without adding any way for OpenFlow
> applications to distinguish 88a8 from 8100.  This means that
> applications that previously handled 0x8100 VLANs will now handle
> 0x88a8 VLANs whereas previously they were able to filter these out.  I
> don't have a wonderful idea on how to handle this but I think we need
> some way.  (The OpenFlow spec is totally unhelpful here.)
>
> The tests seem incomplete in that they do not seem to add much in the
> way of tests for OpenFlow handling of multiple VLANs.  I'd feel more
> confident if it added some.
>

I found some tests added in Tom's patches. I'm trying to include them
and other tests.

> In a few places it would be more convenient to make struct
> flow_vlan_hdr into a union, like this:
>
>     union flow_vlan_hdr {
>         ovs_be32 qtag;
>         struct {
>             ovs_be16 tpid;  /* ETH_TYPE_VLAN_DOT1Q or ETH_TYPE_DOT1AD */
>             ovs_be16 tci;
>         };
>     };

That's a good idea.

>
> We could take advantage of it like this, for example:
>
> @@ -326,16 +326,12 @@ parse_mpls(const void **datap, size_t *sizep)
>  }
>
>  static inline bool
> -parse_vlan(const void **datap, size_t *sizep, struct flow_vlan_hdr *vlan_hdrs)
> +parse_vlan(const void **datap, size_t *sizep, union flow_vlan_hdr *vlan_hdrs)
>  {
>      int encaps;
>      const ovs_be16 *eth_type;
> -    struct qtag_prefix {
> -        ovs_be16 eth_type;
> -        ovs_be16 tci;
> -    };
>
> -    memset(vlan_hdrs, 0, sizeof(struct flow_vlan_hdr) * FLOW_MAX_VLAN_HEADERS);
> +    memset(vlan_hdrs, 0, sizeof(union flow_vlan_hdr) * FLOW_MAX_VLAN_HEADERS);
>      data_pull(datap, sizep, ETH_ADDR_LEN * 2);
>
>      eth_type = *datap;
> @@ -343,11 +339,11 @@ parse_vlan(const void **datap, size_t *sizep, struct flow_vlan_hdr *vlan_hdrs)
>      for (encaps = 0;
>           eth_type_vlan(*eth_type) && encaps < FLOW_MAX_VLAN_HEADERS;
>           encaps++) {
> -        if (OVS_LIKELY(*sizep
> -                       >= sizeof(struct qtag_prefix) + sizeof(ovs_be16))) {
> -            const struct qtag_prefix *qp = data_pull(datap, sizep, sizeof *qp);
> -            vlan_hdrs[encaps].tpid = qp->eth_type;
> -            vlan_hdrs[encaps].tci = qp->tci | htons(VLAN_CFI);
> +        if (OVS_LIKELY(*sizep >= sizeof(ovs_be32) + sizeof(ovs_be16))) {
> +            const ovs_16aligned_be32 *qp = data_pull(datap, sizep, sizeof *qp);
> +            vlan_hdrs[encaps].qtag = get_16aligned_be32(qp);
> +            vlan_hdrs[encaps].tci |= htons(VLAN_CFI);
> +
>              eth_type = *datap;
>          } else {
>              return false;
> diff --git a/lib/flow.h b/lib/flow.h
> index 4882e26..4851a32 100644
> --- a/lib/flow.h
> +++ b/lib/flow.h
> @@ -735,8 +735,8 @@ static inline uint16_t
>  miniflow_get_vid(const struct miniflow *flow, size_t n)
>  {
>      if (n < FLOW_MAX_VLAN_HEADERS) {
> -        uint32_t u32 = MINIFLOW_GET_U32(flow, vlan[n]);
> -        return vlan_tci_to_vid(((struct flow_vlan_hdr *)&u32)->tci);
> +        union flow_vlan_hdr hdr = { .qtag = MINIFLOW_GET_BE32(flow, vlan[n]) };
> +        return vlan_tci_to_vid(hdr.tci);
>      }
>      return 0;
>  }
> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index 56a6145..6838d7d 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> @@ -5560,8 +5560,7 @@ commit_vlan_action(const struct flow* flow, struct flow *base,
>      for (base_n--, flow_n--;
>           base_n >= 0 && flow_n >= 0;
>           base_n--, flow_n--) {
> -        if (memcmp(&base->vlan[base_n], &flow->vlan[flow_n],
> -                   sizeof(struct flow_vlan_hdr))) {
> +        if (base->vlan[base_n].qtag != flow->vlan[flow_n].qtag) {
>              break;
>          }
>      }
> @@ -5569,7 +5568,7 @@ commit_vlan_action(const struct flow* flow, struct flow *base,
>      /* Pop all mismatching vlan of base, push thoses of flow */
>      for (; base_n >= 0; base_n--) {
>          nl_msg_put_flag(odp_actions, OVS_ACTION_ATTR_POP_VLAN);
> -        memset(&wc->masks.vlan[base_n], 0xff, sizeof(wc->masks.vlan[base_n]));
> +        wc->masks.vlan[base_n] = OVS_BE32_MAX;
>      }
>
>      for (; flow_n >= 0; flow_n--) {
>
> It seems risky to fail to add a way for match_format() to print out
> matches past the first VLAN.  It is likely to cause great confusion at
> some point.
>
Yes, I need to think carefully about this.

> This code uses the term "shift" for what is usually termed "push".  A
> "shift" can go in either direction.  I'd use "push".
>
Yes, "push" looks symmetric. I used "shift" because it leaves room for
a header rather than push data.

> I suggest that flow_{shift,push}_vlan() should take an optional 'wc'
> argument like so many other functions that modify flows.  This would
> make it more like the MPLS code and thus easier to understand.
>
> I'm not confident about the change to handle_packet_upcall().
>
> struct xvlan xvlan might be easier to use as just a flow_vlan_hdr.

flow_vlan_hdr is in big-endian and in which vlan-id/pcp are encoded. I
defined struct xvlan to avoid conversion each time.

>
> The commit_vlan_action() function would be better if it was more like
> commit_mpls_action(), to make it easier to verify correctness.  The
> main difference is that at the datapath level VLAN only has push and
> pop, not "set", but that's a straightforward change.  I'd introduce
> functions to count and find common VLAN tags, like we have for MPLS.

Agree, extracting a function makes it more clear.

>
> I'll probably have further comments on later versions of this patch.
>
> Thanks again for working on this!
Eric Garver Aug. 3, 2016, 2:25 p.m. UTC | #4
On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang wrote:
> Thanks! I'm working on code changes according to your comments. I
> think we need more discussion about the ethertype matching. Please see
> inline.

Can we reach an agreement on the ethertype matching? It has implications
on the kernel piece as well.

> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:
> > On Tue, Jul 12, 2016 at 11:38:54PM +0800, Xiao Liang wrote:
.. snip ..
> > I'm concerned about backward compatibility.  Consider some application
> > built on Open vSwitch using OpenFlow.  Today, it can distinguish
> > single-tagged and double-tagged packets (that use outer Ethertype
> > 0x8100), as follows:
> >
> >     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
> >       dl_type.
> >
> >     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
> >
> > With this patch, this won't work, because neither kind of packet has a
> > VLAN dl_type.  Instead, applications need to first match on the outer
> > VLAN, then pop it off, then match on the inner VLAN.  This difference
> > could lead to security problems in applications.  An application
> > might, for example, want to pop an outer VLAN and forward the packet,
> > but only if there is no inner VLAN.  If it is implemented according to
> > the previous rules, then it will not notice the inner VLAN.
> 
> Maybe some applications are implemented this way, but they are
> probably wrong. OpenFlow says eth_type is "ethernet type of the
> OpenFlow packet payload, after VLAN tags", so it is the payload
> ethtype for a double-tagged packet. It's the same for single-tagged
> packet: application must explicitly match vlan_tci to decide whether
> it has VLAN tag.
> The problem is that there is currently no way to peek inner VLAN
> without popping the outer one. I heard from Tom that you have plan to
> support TPID matching. Is it possible to add extension match fields
> like TPID1, TPID2 to match both TPIDs? This may also provide a way to
> differentiate 0x8100 and 0x88a8.

I'm in agreement with Xiao here.

> > There are probably multiple ways to deal with this problem.  Let me
> > propose one.  It is somewhat awkward, so there might be a more
> > graceful way.  Basically the idea is to retain the current view from
> > an OpenFlow perspective:
> >
> >     - Packet without tags: vlan_tci == 0, dl_type is non-VLAN
> >     - Packet with 1 tag:   vlan_tci != 0, dl_type is non-VLAN
> >     - Packet with 2+ tags: vlan_tci != 0, dl_type is 2nd VLAN
> >
> > So, when a packet with 2 tags pops off the outermost one, dl_type
> > becomes the inner Ethertype (such as IPv4) and vlan_tci becomes the
> > single remaining VLAN.
> >
> > I think there's another backward compatibility risk.  This patch adds
> > support for TPID 0x88a8 without adding any way for OpenFlow
> > applications to distinguish 88a8 from 8100.  This means that
> > applications that previously handled 0x8100 VLANs will now handle
> > 0x88a8 VLANs whereas previously they were able to filter these out.  I
> > don't have a wonderful idea on how to handle this but I think we need
> > some way.  (The OpenFlow spec is totally unhelpful here.)

The OF 1.1 spec reads "the outermost VLAN tag", which is ambiguous with
802.1ad.

However, OF 1.5 clarifies to say the "outermost _802.1q_ tag". See pages
77 and 205 of the spec. To me this implies that OF does not specify
matching the 802.1ad tag at all (i.e vlan_tci=.. should match frames
with outer TPID 0x8100, but not 0x88a8). It does specify pushing/popping
802.1ad though.

I believe if we follow the OF 1.5 definition it removes most (all?)
backwards compatibility issues raised by Ben. But we also can't match on
0x88a8 VLANs. Maybe that can be solved with out-of-spec explicit TPID
matching like Xiao mentioned above.

> >
> > The tests seem incomplete in that they do not seem to add much in the
> > way of tests for OpenFlow handling of multiple VLANs.  I'd feel more
> > confident if it added some.
> >
> 
> I found some tests added in Tom's patches. I'm trying to include them
> and other tests.
.. snip ..
Ben Pfaff Aug. 3, 2016, 10:07 p.m. UTC | #5
Thanks for the replies, I have some further responses below.

On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang wrote:
> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:
> > I'm concerned about backward compatibility.  Consider some application
> > built on Open vSwitch using OpenFlow.  Today, it can distinguish
> > single-tagged and double-tagged packets (that use outer Ethertype
> > 0x8100), as follows:
> >
> >     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
> >       dl_type.
> >
> >     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
> >
> > With this patch, this won't work, because neither kind of packet has a
> > VLAN dl_type.  Instead, applications need to first match on the outer
> > VLAN, then pop it off, then match on the inner VLAN.  This difference
> > could lead to security problems in applications.  An application
> > might, for example, want to pop an outer VLAN and forward the packet,
> > but only if there is no inner VLAN.  If it is implemented according to
> > the previous rules, then it will not notice the inner VLAN.
> 
> Maybe some applications are implemented this way, but they are
> probably wrong. OpenFlow says eth_type is "ethernet type of the
> OpenFlow packet payload, after VLAN tags", so it is the payload
> ethtype for a double-tagged packet. It's the same for single-tagged
> packet: application must explicitly match vlan_tci to decide whether
> it has VLAN tag.

OpenFlow does say that, but it's inconsistent with long-standing Open
vSwitch practice and will cause backward incompatibility and, worse,
security problems.  If we need the official OpenFlow behavior then I
think we'll need to add a feature switch to turn on that behavior.

> > This code uses the term "shift" for what is usually termed "push".  A
> > "shift" can go in either direction.  I'd use "push".
> >
> Yes, "push" looks symmetric. I used "shift" because it leaves room for
> a header rather than push data.

Sometimes we use the longer name "push_uninit" in Open vSwitch to make
it clear that what is being pushed is not initialized, for example see
dp_packet_push_uninit(), nl_msg_push_uninit(), ofpbuf_push_uninit() and
the related ds_put_uninit().

However, when I look at your calls to the "shift" function, it looks
like most of them could easily be written to provide the new header
contents as an argument.
Ben Pfaff Aug. 3, 2016, 10:18 p.m. UTC | #6
On Wed, Aug 03, 2016 at 10:25:21AM -0400, Eric Garver wrote:
> On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang wrote:
> > On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:
> > > On Tue, Jul 12, 2016 at 11:38:54PM +0800, Xiao Liang wrote:
> .. snip ..
> > > I'm concerned about backward compatibility.  Consider some application
> > > built on Open vSwitch using OpenFlow.  Today, it can distinguish
> > > single-tagged and double-tagged packets (that use outer Ethertype
> > > 0x8100), as follows:
> > >
> > >     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
> > >       dl_type.
> > >
> > >     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
> > >
> > > With this patch, this won't work, because neither kind of packet has a
> > > VLAN dl_type.  Instead, applications need to first match on the outer
> > > VLAN, then pop it off, then match on the inner VLAN.  This difference
> > > could lead to security problems in applications.  An application
> > > might, for example, want to pop an outer VLAN and forward the packet,
> > > but only if there is no inner VLAN.  If it is implemented according to
> > > the previous rules, then it will not notice the inner VLAN.
> > 
> > Maybe some applications are implemented this way, but they are
> > probably wrong. OpenFlow says eth_type is "ethernet type of the
> > OpenFlow packet payload, after VLAN tags", so it is the payload
> > ethtype for a double-tagged packet. It's the same for single-tagged
> > packet: application must explicitly match vlan_tci to decide whether
> > it has VLAN tag.
> > The problem is that there is currently no way to peek inner VLAN
> > without popping the outer one. I heard from Tom that you have plan to
> > support TPID matching. Is it possible to add extension match fields
> > like TPID1, TPID2 to match both TPIDs? This may also provide a way to
> > differentiate 0x8100 and 0x88a8.
> 
> I'm in agreement with Xiao here.

I gave a response directly to Xiao.  Backward incompatibility is one
thing but adding gratuitous security vulnerabilities to existing
applications that have working since day one is not acceptable.

> > > There are probably multiple ways to deal with this problem.  Let me
> > > propose one.  It is somewhat awkward, so there might be a more
> > > graceful way.  Basically the idea is to retain the current view from
> > > an OpenFlow perspective:
> > >
> > >     - Packet without tags: vlan_tci == 0, dl_type is non-VLAN
> > >     - Packet with 1 tag:   vlan_tci != 0, dl_type is non-VLAN
> > >     - Packet with 2+ tags: vlan_tci != 0, dl_type is 2nd VLAN
> > >
> > > So, when a packet with 2 tags pops off the outermost one, dl_type
> > > becomes the inner Ethertype (such as IPv4) and vlan_tci becomes the
> > > single remaining VLAN.
> > >
> > > I think there's another backward compatibility risk.  This patch adds
> > > support for TPID 0x88a8 without adding any way for OpenFlow
> > > applications to distinguish 88a8 from 8100.  This means that
> > > applications that previously handled 0x8100 VLANs will now handle
> > > 0x88a8 VLANs whereas previously they were able to filter these out.  I
> > > don't have a wonderful idea on how to handle this but I think we need
> > > some way.  (The OpenFlow spec is totally unhelpful here.)
> 
> The OF 1.1 spec reads "the outermost VLAN tag", which is ambiguous with
> 802.1ad.
> 
> However, OF 1.5 clarifies to say the "outermost _802.1q_ tag". See pages
> 77 and 205 of the spec. To me this implies that OF does not specify
> matching the 802.1ad tag at all (i.e vlan_tci=.. should match frames
> with outer TPID 0x8100, but not 0x88a8). It does specify pushing/popping
> 802.1ad though.
> 
> I believe if we follow the OF 1.5 definition it removes most (all?)
> backwards compatibility issues raised by Ben. But we also can't match on
> 0x88a8 VLANs. Maybe that can be solved with out-of-spec explicit TPID
> matching like Xiao mentioned above.

The OpenFlow specs aren't written from this kind of language-lawyer
standpoint.  When they say 802.1Q they just mean VLAN.
Xiao Liang Aug. 7, 2016, 2:54 a.m. UTC | #7
On Thu, Aug 4, 2016 at 6:07 AM, Ben Pfaff <blp@ovn.org> wrote:
> Thanks for the replies, I have some further responses below.
>
> On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang wrote:
>> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:
>> > I'm concerned about backward compatibility.  Consider some application
>> > built on Open vSwitch using OpenFlow.  Today, it can distinguish
>> > single-tagged and double-tagged packets (that use outer Ethertype
>> > 0x8100), as follows:
>> >
>> >     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
>> >       dl_type.
>> >
>> >     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
>> >
>> > With this patch, this won't work, because neither kind of packet has a
>> > VLAN dl_type.  Instead, applications need to first match on the outer
>> > VLAN, then pop it off, then match on the inner VLAN.  This difference
>> > could lead to security problems in applications.  An application
>> > might, for example, want to pop an outer VLAN and forward the packet,
>> > but only if there is no inner VLAN.  If it is implemented according to
>> > the previous rules, then it will not notice the inner VLAN.
>>
>> Maybe some applications are implemented this way, but they are
>> probably wrong. OpenFlow says eth_type is "ethernet type of the
>> OpenFlow packet payload, after VLAN tags", so it is the payload
>> ethtype for a double-tagged packet. It's the same for single-tagged
>> packet: application must explicitly match vlan_tci to decide whether
>> it has VLAN tag.
>
> OpenFlow does say that, but it's inconsistent with long-standing Open
> vSwitch practice and will cause backward incompatibility and, worse,
> security problems.  If we need the official OpenFlow behavior then I
> think we'll need to add a feature switch to turn on that behavior.

It's a good idea to add a switch. I think QinQ can be disabled and
fallback to current behavior if the switch is off, since these legacy
applications are not written for QinQ.

>
>> > This code uses the term "shift" for what is usually termed "push".  A
>> > "shift" can go in either direction.  I'd use "push".
>> >
>> Yes, "push" looks symmetric. I used "shift" because it leaves room for
>> a header rather than push data.
>
> Sometimes we use the longer name "push_uninit" in Open vSwitch to make
> it clear that what is being pushed is not initialized, for example see
> dp_packet_push_uninit(), nl_msg_push_uninit(), ofpbuf_push_uninit() and
> the related ds_put_uninit().
>
> However, when I look at your calls to the "shift" function, it looks
> like most of them could easily be written to provide the new header
> contents as an argument.

Constructing and passing a new struct is a bit redundant. I think
push_uninit is good and clear.

Thanks,
Xiao
Ben Pfaff Aug. 7, 2016, 3:04 a.m. UTC | #8
On Sun, Aug 07, 2016 at 10:54:00AM +0800, Xiao Liang wrote:
> On Thu, Aug 4, 2016 at 6:07 AM, Ben Pfaff <blp@ovn.org> wrote:
> > Thanks for the replies, I have some further responses below.
> >
> > On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang wrote:
> >> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:
> >> > I'm concerned about backward compatibility.  Consider some application
> >> > built on Open vSwitch using OpenFlow.  Today, it can distinguish
> >> > single-tagged and double-tagged packets (that use outer Ethertype
> >> > 0x8100), as follows:
> >> >
> >> >     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
> >> >       dl_type.
> >> >
> >> >     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
> >> >
> >> > With this patch, this won't work, because neither kind of packet has a
> >> > VLAN dl_type.  Instead, applications need to first match on the outer
> >> > VLAN, then pop it off, then match on the inner VLAN.  This difference
> >> > could lead to security problems in applications.  An application
> >> > might, for example, want to pop an outer VLAN and forward the packet,
> >> > but only if there is no inner VLAN.  If it is implemented according to
> >> > the previous rules, then it will not notice the inner VLAN.
> >>
> >> Maybe some applications are implemented this way, but they are
> >> probably wrong. OpenFlow says eth_type is "ethernet type of the
> >> OpenFlow packet payload, after VLAN tags", so it is the payload
> >> ethtype for a double-tagged packet. It's the same for single-tagged
> >> packet: application must explicitly match vlan_tci to decide whether
> >> it has VLAN tag.
> >
> > OpenFlow does say that, but it's inconsistent with long-standing Open
> > vSwitch practice and will cause backward incompatibility and, worse,
> > security problems.  If we need the official OpenFlow behavior then I
> > think we'll need to add a feature switch to turn on that behavior.
> 
> It's a good idea to add a switch. I think QinQ can be disabled and
> fallback to current behavior if the switch is off, since these legacy
> applications are not written for QinQ.

OK.  I'm happy with that solution, as long as the implementation is
clean.

> >> > This code uses the term "shift" for what is usually termed "push".  A
> >> > "shift" can go in either direction.  I'd use "push".
> >> >
> >> Yes, "push" looks symmetric. I used "shift" because it leaves room for
> >> a header rather than push data.
> >
> > Sometimes we use the longer name "push_uninit" in Open vSwitch to make
> > it clear that what is being pushed is not initialized, for example see
> > dp_packet_push_uninit(), nl_msg_push_uninit(), ofpbuf_push_uninit() and
> > the related ds_put_uninit().
> >
> > However, when I look at your calls to the "shift" function, it looks
> > like most of them could easily be written to provide the new header
> > contents as an argument.
> 
> Constructing and passing a new struct is a bit redundant. I think
> push_uninit is good and clear.

OK, but passing a pair of ovs_be16s to a function isn't so unusual.
Eric Garver Aug. 19, 2016, 8:19 p.m. UTC | #9
On Sat, Aug 06, 2016 at 08:04:44PM -0700, Ben Pfaff wrote:
> On Sun, Aug 07, 2016 at 10:54:00AM +0800, Xiao Liang wrote:
> > On Thu, Aug 4, 2016 at 6:07 AM, Ben Pfaff <blp@ovn.org> wrote:
> > > Thanks for the replies, I have some further responses below.
> > >
> > > On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang wrote:
> > >> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:
> > >> > I'm concerned about backward compatibility.  Consider some application
> > >> > built on Open vSwitch using OpenFlow.  Today, it can distinguish
> > >> > single-tagged and double-tagged packets (that use outer Ethertype
> > >> > 0x8100), as follows:
> > >> >
> > >> >     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
> > >> >       dl_type.
> > >> >
> > >> >     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
> > >> >
> > >> > With this patch, this won't work, because neither kind of packet has a
> > >> > VLAN dl_type.  Instead, applications need to first match on the outer
> > >> > VLAN, then pop it off, then match on the inner VLAN.  This difference
> > >> > could lead to security problems in applications.  An application
> > >> > might, for example, want to pop an outer VLAN and forward the packet,
> > >> > but only if there is no inner VLAN.  If it is implemented according to
> > >> > the previous rules, then it will not notice the inner VLAN.
> > >>
> > >> Maybe some applications are implemented this way, but they are
> > >> probably wrong. OpenFlow says eth_type is "ethernet type of the
> > >> OpenFlow packet payload, after VLAN tags", so it is the payload
> > >> ethtype for a double-tagged packet. It's the same for single-tagged
> > >> packet: application must explicitly match vlan_tci to decide whether
> > >> it has VLAN tag.
> > >
> > > OpenFlow does say that, but it's inconsistent with long-standing Open
> > > vSwitch practice and will cause backward incompatibility and, worse,
> > > security problems.  If we need the official OpenFlow behavior then I
> > > think we'll need to add a feature switch to turn on that behavior.
> > 
> > It's a good idea to add a switch. I think QinQ can be disabled and
> > fallback to current behavior if the switch is off, since these legacy
> > applications are not written for QinQ.
> 
> OK.  I'm happy with that solution, as long as the implementation is
> clean.

Is a new flag, i.e. OVS_DP_F_8021AD, passed via
OVS_DP_ATTR_USER_FEATURES an appropriate way to communicate this to the
kernel?

> > >> > This code uses the term "shift" for what is usually termed "push".  A
> > >> > "shift" can go in either direction.  I'd use "push".
> > >> >
> > >> Yes, "push" looks symmetric. I used "shift" because it leaves room for
> > >> a header rather than push data.
> > >
> > > Sometimes we use the longer name "push_uninit" in Open vSwitch to make
> > > it clear that what is being pushed is not initialized, for example see
> > > dp_packet_push_uninit(), nl_msg_push_uninit(), ofpbuf_push_uninit() and
> > > the related ds_put_uninit().
> > >
> > > However, when I look at your calls to the "shift" function, it looks
> > > like most of them could easily be written to provide the new header
> > > contents as an argument.
> > 
> > Constructing and passing a new struct is a bit redundant. I think
> > push_uninit is good and clear.
> 
> OK, but passing a pair of ovs_be16s to a function isn't so unusual.
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
Ben Pfaff Aug. 19, 2016, 8:24 p.m. UTC | #10
On Fri, Aug 19, 2016 at 04:19:31PM -0400, Eric Garver wrote:
> On Sat, Aug 06, 2016 at 08:04:44PM -0700, Ben Pfaff wrote:
> > On Sun, Aug 07, 2016 at 10:54:00AM +0800, Xiao Liang wrote:
> > > On Thu, Aug 4, 2016 at 6:07 AM, Ben Pfaff <blp@ovn.org> wrote:
> > > > Thanks for the replies, I have some further responses below.
> > > >
> > > > On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang wrote:
> > > >> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:
> > > >> > I'm concerned about backward compatibility.  Consider some application
> > > >> > built on Open vSwitch using OpenFlow.  Today, it can distinguish
> > > >> > single-tagged and double-tagged packets (that use outer Ethertype
> > > >> > 0x8100), as follows:
> > > >> >
> > > >> >     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
> > > >> >       dl_type.
> > > >> >
> > > >> >     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
> > > >> >
> > > >> > With this patch, this won't work, because neither kind of packet has a
> > > >> > VLAN dl_type.  Instead, applications need to first match on the outer
> > > >> > VLAN, then pop it off, then match on the inner VLAN.  This difference
> > > >> > could lead to security problems in applications.  An application
> > > >> > might, for example, want to pop an outer VLAN and forward the packet,
> > > >> > but only if there is no inner VLAN.  If it is implemented according to
> > > >> > the previous rules, then it will not notice the inner VLAN.
> > > >>
> > > >> Maybe some applications are implemented this way, but they are
> > > >> probably wrong. OpenFlow says eth_type is "ethernet type of the
> > > >> OpenFlow packet payload, after VLAN tags", so it is the payload
> > > >> ethtype for a double-tagged packet. It's the same for single-tagged
> > > >> packet: application must explicitly match vlan_tci to decide whether
> > > >> it has VLAN tag.
> > > >
> > > > OpenFlow does say that, but it's inconsistent with long-standing Open
> > > > vSwitch practice and will cause backward incompatibility and, worse,
> > > > security problems.  If we need the official OpenFlow behavior then I
> > > > think we'll need to add a feature switch to turn on that behavior.
> > > 
> > > It's a good idea to add a switch. I think QinQ can be disabled and
> > > fallback to current behavior if the switch is off, since these legacy
> > > applications are not written for QinQ.
> > 
> > OK.  I'm happy with that solution, as long as the implementation is
> > clean.
> 
> Is a new flag, i.e. OVS_DP_F_8021AD, passed via
> OVS_DP_ATTR_USER_FEATURES an appropriate way to communicate this to the
> kernel?

Why does the kernel need to know?
Eric Garver Aug. 19, 2016, 8:42 p.m. UTC | #11
On Fri, Aug 19, 2016 at 01:24:10PM -0700, Ben Pfaff wrote:
> On Fri, Aug 19, 2016 at 04:19:31PM -0400, Eric Garver wrote:
> > On Sat, Aug 06, 2016 at 08:04:44PM -0700, Ben Pfaff wrote:
> > > On Sun, Aug 07, 2016 at 10:54:00AM +0800, Xiao Liang wrote:
> > > > On Thu, Aug 4, 2016 at 6:07 AM, Ben Pfaff <blp@ovn.org> wrote:
> > > > > Thanks for the replies, I have some further responses below.
> > > > >
> > > > > On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang wrote:
> > > > >> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:
> > > > >> > I'm concerned about backward compatibility.  Consider some application
> > > > >> > built on Open vSwitch using OpenFlow.  Today, it can distinguish
> > > > >> > single-tagged and double-tagged packets (that use outer Ethertype
> > > > >> > 0x8100), as follows:
> > > > >> >
> > > > >> >     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
> > > > >> >       dl_type.
> > > > >> >
> > > > >> >     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
> > > > >> >
> > > > >> > With this patch, this won't work, because neither kind of packet has a
> > > > >> > VLAN dl_type.  Instead, applications need to first match on the outer
> > > > >> > VLAN, then pop it off, then match on the inner VLAN.  This difference
> > > > >> > could lead to security problems in applications.  An application
> > > > >> > might, for example, want to pop an outer VLAN and forward the packet,
> > > > >> > but only if there is no inner VLAN.  If it is implemented according to
> > > > >> > the previous rules, then it will not notice the inner VLAN.
> > > > >>
> > > > >> Maybe some applications are implemented this way, but they are
> > > > >> probably wrong. OpenFlow says eth_type is "ethernet type of the
> > > > >> OpenFlow packet payload, after VLAN tags", so it is the payload
> > > > >> ethtype for a double-tagged packet. It's the same for single-tagged
> > > > >> packet: application must explicitly match vlan_tci to decide whether
> > > > >> it has VLAN tag.
> > > > >
> > > > > OpenFlow does say that, but it's inconsistent with long-standing Open
> > > > > vSwitch practice and will cause backward incompatibility and, worse,
> > > > > security problems.  If we need the official OpenFlow behavior then I
> > > > > think we'll need to add a feature switch to turn on that behavior.
> > > > 
> > > > It's a good idea to add a switch. I think QinQ can be disabled and
> > > > fallback to current behavior if the switch is off, since these legacy
> > > > applications are not written for QinQ.
> > > 
> > > OK.  I'm happy with that solution, as long as the implementation is
> > > clean.
> > 
> > Is a new flag, i.e. OVS_DP_F_8021AD, passed via
> > OVS_DP_ATTR_USER_FEATURES an appropriate way to communicate this to the
> > kernel?
> 
> Why does the kernel need to know?

When extracting the key from a double tagged frame how does the kernel
know whether the second tag should be classified as second VLAN or an
Ethertype?
Ben Pfaff Aug. 19, 2016, 8:56 p.m. UTC | #12
On Fri, Aug 19, 2016 at 04:42:18PM -0400, Eric Garver wrote:
> On Fri, Aug 19, 2016 at 01:24:10PM -0700, Ben Pfaff wrote:
> > On Fri, Aug 19, 2016 at 04:19:31PM -0400, Eric Garver wrote:
> > > On Sat, Aug 06, 2016 at 08:04:44PM -0700, Ben Pfaff wrote:
> > > > On Sun, Aug 07, 2016 at 10:54:00AM +0800, Xiao Liang wrote:
> > > > > On Thu, Aug 4, 2016 at 6:07 AM, Ben Pfaff <blp@ovn.org> wrote:
> > > > > > Thanks for the replies, I have some further responses below.
> > > > > >
> > > > > > On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang wrote:
> > > > > >> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:
> > > > > >> > I'm concerned about backward compatibility.  Consider some application
> > > > > >> > built on Open vSwitch using OpenFlow.  Today, it can distinguish
> > > > > >> > single-tagged and double-tagged packets (that use outer Ethertype
> > > > > >> > 0x8100), as follows:
> > > > > >> >
> > > > > >> >     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
> > > > > >> >       dl_type.
> > > > > >> >
> > > > > >> >     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
> > > > > >> >
> > > > > >> > With this patch, this won't work, because neither kind of packet has a
> > > > > >> > VLAN dl_type.  Instead, applications need to first match on the outer
> > > > > >> > VLAN, then pop it off, then match on the inner VLAN.  This difference
> > > > > >> > could lead to security problems in applications.  An application
> > > > > >> > might, for example, want to pop an outer VLAN and forward the packet,
> > > > > >> > but only if there is no inner VLAN.  If it is implemented according to
> > > > > >> > the previous rules, then it will not notice the inner VLAN.
> > > > > >>
> > > > > >> Maybe some applications are implemented this way, but they are
> > > > > >> probably wrong. OpenFlow says eth_type is "ethernet type of the
> > > > > >> OpenFlow packet payload, after VLAN tags", so it is the payload
> > > > > >> ethtype for a double-tagged packet. It's the same for single-tagged
> > > > > >> packet: application must explicitly match vlan_tci to decide whether
> > > > > >> it has VLAN tag.
> > > > > >
> > > > > > OpenFlow does say that, but it's inconsistent with long-standing Open
> > > > > > vSwitch practice and will cause backward incompatibility and, worse,
> > > > > > security problems.  If we need the official OpenFlow behavior then I
> > > > > > think we'll need to add a feature switch to turn on that behavior.
> > > > > 
> > > > > It's a good idea to add a switch. I think QinQ can be disabled and
> > > > > fallback to current behavior if the switch is off, since these legacy
> > > > > applications are not written for QinQ.
> > > > 
> > > > OK.  I'm happy with that solution, as long as the implementation is
> > > > clean.
> > > 
> > > Is a new flag, i.e. OVS_DP_F_8021AD, passed via
> > > OVS_DP_ATTR_USER_FEATURES an appropriate way to communicate this to the
> > > kernel?
> > 
> > Why does the kernel need to know?
> 
> When extracting the key from a double tagged frame how does the kernel
> know whether the second tag should be classified as second VLAN or an
> Ethertype?

The kernel should always classify it as a second VLAN.
datapath/README.md explains this in detail, under "Basic rule for
evolving flow keys".
Mooney, Sean K Aug. 20, 2016, 1:41 a.m. UTC | #13
> -----Original Message-----

> From: dev [mailto:dev-bounces@openvswitch.org] On Behalf Of Ben Pfaff

> Sent: Friday, August 19, 2016 9:57 PM

> To: Xiao Liang <shaw.leon@gmail.com>; dev@openvswitch.org

> Subject: Re: [ovs-dev] [PATCH v4 1/4] Add support for 802.1ad (QinQ tunneling)

> 

> On Fri, Aug 19, 2016 at 04:42:18PM -0400, Eric Garver wrote:

> > On Fri, Aug 19, 2016 at 01:24:10PM -0700, Ben Pfaff wrote:

> > > On Fri, Aug 19, 2016 at 04:19:31PM -0400, Eric Garver wrote:

> > > > On Sat, Aug 06, 2016 at 08:04:44PM -0700, Ben Pfaff wrote:

> > > > > On Sun, Aug 07, 2016 at 10:54:00AM +0800, Xiao Liang wrote:

> > > > > > On Thu, Aug 4, 2016 at 6:07 AM, Ben Pfaff <blp@ovn.org> wrote:

> > > > > > > Thanks for the replies, I have some further responses below.

> > > > > > >

> > > > > > > On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang wrote:

> > > > > > >> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:

> > > > > > >> > I'm concerned about backward compatibility.  Consider

> > > > > > >> > some application built on Open vSwitch using OpenFlow.

> > > > > > >> > Today, it can distinguish single-tagged and double-tagged

> > > > > > >> > packets (that use outer Ethertype 0x8100), as follows:

> > > > > > >> >

> > > > > > >> >     - A single-tagged packet has vlan_tci != 0 and some non-VLAN

> > > > > > >> >       dl_type.

> > > > > > >> >

> > > > > > >> >     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.

> > > > > > >> >

> > > > > > >> > With this patch, this won't work, because neither kind of

> > > > > > >> > packet has a VLAN dl_type.  Instead, applications need to

> > > > > > >> > first match on the outer VLAN, then pop it off, then

> > > > > > >> > match on the inner VLAN.  This difference could lead to

> > > > > > >> > security problems in applications.  An application might,

> > > > > > >> > for example, want to pop an outer VLAN and forward the

> > > > > > >> > packet, but only if there is no inner VLAN.  If it is implemented

> according to the previous rules, then it will not notice the inner VLAN.

> > > > > > >>

> > > > > > >> Maybe some applications are implemented this way, but they

> > > > > > >> are probably wrong. OpenFlow says eth_type is "ethernet

> > > > > > >> type of the OpenFlow packet payload, after VLAN tags", so

> > > > > > >> it is the payload ethtype for a double-tagged packet. It's

> > > > > > >> the same for single-tagged

> > > > > > >> packet: application must explicitly match vlan_tci to

> > > > > > >> decide whether it has VLAN tag.

> > > > > > >

> > > > > > > OpenFlow does say that, but it's inconsistent with

> > > > > > > long-standing Open vSwitch practice and will cause backward

> > > > > > > incompatibility and, worse, security problems.  If we need

> > > > > > > the official OpenFlow behavior then I think we'll need to add a feature

> switch to turn on that behavior.

> > > > > >

> > > > > > It's a good idea to add a switch. I think QinQ can be disabled

> > > > > > and fallback to current behavior if the switch is off, since

> > > > > > these legacy applications are not written for QinQ.

> > > > >

> > > > > OK.  I'm happy with that solution, as long as the implementation

> > > > > is clean.

> > > >

> > > > Is a new flag, i.e. OVS_DP_F_8021AD, passed via

[Mooney, Sean K]  am I correct in assuming that this new flag will be set in the ovsdb
Either on the bridge or prereably globally in the other_config section fo the Open_vSwitch table.
Both will allow easy detection of the feature form openstack so we can detect if qinq can be used.
The openstack ovs neutron agent currently uses vlans for tenant isolatation so this would enable
Vlan transparency when qinq is available.
> > > > OVS_DP_ATTR_USER_FEATURES an appropriate way to communicate this

> > > > to the kernel?

> > >

> > > Why does the kernel need to know?

> >

> > When extracting the key from a double tagged frame how does the kernel

> > know whether the second tag should be classified as second VLAN or an

> > Ethertype?

> 

> The kernel should always classify it as a second VLAN.

> datapath/README.md explains this in detail, under "Basic rule for evolving flow

> keys".

[Mooney, Sean K] out of interest would it be possible to support 3 laryers of vlans and still be able to match on the inner packet headers.
As I said about neutron currently uses on level of vlans for tenant isolation
Having qinq would allow the tenant to send one level of vlan and neutron to use one
Layer of vlans for isolation. Support 3 layares fo vlans in ovs would allow the guest to use qinq 
Neutron to use vlans for isolation at the same time. 
> _______________________________________________

> dev mailing list

> dev@openvswitch.org

> http://openvswitch.org/mailman/listinfo/dev
Eric Garver Aug. 20, 2016, 2:02 p.m. UTC | #14
On Sat, Aug 20, 2016 at 01:41:26AM +0000, Mooney, Sean K wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@openvswitch.org] On Behalf Of Ben Pfaff
> > Sent: Friday, August 19, 2016 9:57 PM
> > To: Xiao Liang <shaw.leon@gmail.com>; dev@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH v4 1/4] Add support for 802.1ad (QinQ tunneling)
> > 
> > On Fri, Aug 19, 2016 at 04:42:18PM -0400, Eric Garver wrote:
> > > On Fri, Aug 19, 2016 at 01:24:10PM -0700, Ben Pfaff wrote:
> > > > On Fri, Aug 19, 2016 at 04:19:31PM -0400, Eric Garver wrote:
> > > > > On Sat, Aug 06, 2016 at 08:04:44PM -0700, Ben Pfaff wrote:
> > > > > > On Sun, Aug 07, 2016 at 10:54:00AM +0800, Xiao Liang wrote:
> > > > > > > On Thu, Aug 4, 2016 at 6:07 AM, Ben Pfaff <blp@ovn.org> wrote:
> > > > > > > > Thanks for the replies, I have some further responses below.
> > > > > > > >
> > > > > > > > On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang wrote:
> > > > > > > >> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:
> > > > > > > >> > I'm concerned about backward compatibility.  Consider
> > > > > > > >> > some application built on Open vSwitch using OpenFlow.
> > > > > > > >> > Today, it can distinguish single-tagged and double-tagged
> > > > > > > >> > packets (that use outer Ethertype 0x8100), as follows:
> > > > > > > >> >
> > > > > > > >> >     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
> > > > > > > >> >       dl_type.
> > > > > > > >> >
> > > > > > > >> >     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
> > > > > > > >> >
> > > > > > > >> > With this patch, this won't work, because neither kind of
> > > > > > > >> > packet has a VLAN dl_type.  Instead, applications need to
> > > > > > > >> > first match on the outer VLAN, then pop it off, then
> > > > > > > >> > match on the inner VLAN.  This difference could lead to
> > > > > > > >> > security problems in applications.  An application might,
> > > > > > > >> > for example, want to pop an outer VLAN and forward the
> > > > > > > >> > packet, but only if there is no inner VLAN.  If it is implemented
> > according to the previous rules, then it will not notice the inner VLAN.
> > > > > > > >>
> > > > > > > >> Maybe some applications are implemented this way, but they
> > > > > > > >> are probably wrong. OpenFlow says eth_type is "ethernet
> > > > > > > >> type of the OpenFlow packet payload, after VLAN tags", so
> > > > > > > >> it is the payload ethtype for a double-tagged packet. It's
> > > > > > > >> the same for single-tagged
> > > > > > > >> packet: application must explicitly match vlan_tci to
> > > > > > > >> decide whether it has VLAN tag.
> > > > > > > >
> > > > > > > > OpenFlow does say that, but it's inconsistent with
> > > > > > > > long-standing Open vSwitch practice and will cause backward
> > > > > > > > incompatibility and, worse, security problems.  If we need
> > > > > > > > the official OpenFlow behavior then I think we'll need to add a feature
> > switch to turn on that behavior.
> > > > > > >
> > > > > > > It's a good idea to add a switch. I think QinQ can be disabled
> > > > > > > and fallback to current behavior if the switch is off, since
> > > > > > > these legacy applications are not written for QinQ.
> > > > > >
> > > > > > OK.  I'm happy with that solution, as long as the implementation
> > > > > > is clean.
> > > > >
> > > > > Is a new flag, i.e. OVS_DP_F_8021AD, passed via
> [Mooney, Sean K]  am I correct in assuming that this new flag will be set in the ovsdb
> Either on the bridge or prereably globally in the other_config section fo the Open_vSwitch table.
> Both will allow easy detection of the feature form openstack so we can detect if qinq can be used.
> The openstack ovs neutron agent currently uses vlans for tenant isolatation so this would enable
> Vlan transparency when qinq is available.

No flag is needed. It was a miss understanding on my part.

> > > > > OVS_DP_ATTR_USER_FEATURES an appropriate way to communicate this
> > > > > to the kernel?
> > > >
> > > > Why does the kernel need to know?
> > >
> > > When extracting the key from a double tagged frame how does the kernel
> > > know whether the second tag should be classified as second VLAN or an
> > > Ethertype?
> > 
> > The kernel should always classify it as a second VLAN.
> > datapath/README.md explains this in detail, under "Basic rule for evolving flow
> > keys".

Thanks Ben. I understand now.

> [Mooney, Sean K] out of interest would it be possible to support 3 laryers of vlans and still be able to match on the inner packet headers.
> As I said about neutron currently uses on level of vlans for tenant isolation
> Having qinq would allow the tenant to send one level of vlan and neutron to use one
> Layer of vlans for isolation. Support 3 layares fo vlans in ovs would allow the guest to use qinq 
> Neutron to use vlans for isolation at the same time. 

Not with this series. It only supports two VLAN tags.
Xiao Liang Sept. 9, 2016, 4:26 a.m. UTC | #15
On Sat, Aug 20, 2016 at 9:41 AM, Mooney, Sean K <sean.k.mooney@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@openvswitch.org] On Behalf Of Ben Pfaff
>> Sent: Friday, August 19, 2016 9:57 PM
>> To: Xiao Liang <shaw.leon@gmail.com>; dev@openvswitch.org
>> Subject: Re: [ovs-dev] [PATCH v4 1/4] Add support for 802.1ad (QinQ tunneling)
>>
>> On Fri, Aug 19, 2016 at 04:42:18PM -0400, Eric Garver wrote:
>> > On Fri, Aug 19, 2016 at 01:24:10PM -0700, Ben Pfaff wrote:
>> > > On Fri, Aug 19, 2016 at 04:19:31PM -0400, Eric Garver wrote:
>> > > > On Sat, Aug 06, 2016 at 08:04:44PM -0700, Ben Pfaff wrote:
>> > > > > On Sun, Aug 07, 2016 at 10:54:00AM +0800, Xiao Liang wrote:
>> > > > > > On Thu, Aug 4, 2016 at 6:07 AM, Ben Pfaff <blp@ovn.org> wrote:
>> > > > > > > Thanks for the replies, I have some further responses below.
>> > > > > > >
>> > > > > > > On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang wrote:
>> > > > > > >> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp@ovn.org> wrote:
>> > > > > > >> > I'm concerned about backward compatibility.  Consider
>> > > > > > >> > some application built on Open vSwitch using OpenFlow.
>> > > > > > >> > Today, it can distinguish single-tagged and double-tagged
>> > > > > > >> > packets (that use outer Ethertype 0x8100), as follows:
>> > > > > > >> >
>> > > > > > >> >     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
>> > > > > > >> >       dl_type.
>> > > > > > >> >
>> > > > > > >> >     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
>> > > > > > >> >
>> > > > > > >> > With this patch, this won't work, because neither kind of
>> > > > > > >> > packet has a VLAN dl_type.  Instead, applications need to
>> > > > > > >> > first match on the outer VLAN, then pop it off, then
>> > > > > > >> > match on the inner VLAN.  This difference could lead to
>> > > > > > >> > security problems in applications.  An application might,
>> > > > > > >> > for example, want to pop an outer VLAN and forward the
>> > > > > > >> > packet, but only if there is no inner VLAN.  If it is implemented
>> according to the previous rules, then it will not notice the inner VLAN.
>> > > > > > >>
>> > > > > > >> Maybe some applications are implemented this way, but they
>> > > > > > >> are probably wrong. OpenFlow says eth_type is "ethernet
>> > > > > > >> type of the OpenFlow packet payload, after VLAN tags", so
>> > > > > > >> it is the payload ethtype for a double-tagged packet. It's
>> > > > > > >> the same for single-tagged
>> > > > > > >> packet: application must explicitly match vlan_tci to
>> > > > > > >> decide whether it has VLAN tag.
>> > > > > > >
>> > > > > > > OpenFlow does say that, but it's inconsistent with
>> > > > > > > long-standing Open vSwitch practice and will cause backward
>> > > > > > > incompatibility and, worse, security problems.  If we need
>> > > > > > > the official OpenFlow behavior then I think we'll need to add a feature
>> switch to turn on that behavior.
>> > > > > >
>> > > > > > It's a good idea to add a switch. I think QinQ can be disabled
>> > > > > > and fallback to current behavior if the switch is off, since
>> > > > > > these legacy applications are not written for QinQ.
>> > > > >
>> > > > > OK.  I'm happy with that solution, as long as the implementation
>> > > > > is clean.
>> > > >
>> > > > Is a new flag, i.e. OVS_DP_F_8021AD, passed via
> [Mooney, Sean K]  am I correct in assuming that this new flag will be set in the ovsdb
> Either on the bridge or prereably globally in the other_config section fo the Open_vSwitch table.
> Both will allow easy detection of the feature form openstack so we can detect if qinq can be used.
> The openstack ovs neutron agent currently uses vlans for tenant isolatation so this would enable
> Vlan transparency when qinq is available.
>> > > > OVS_DP_ATTR_USER_FEATURES an appropriate way to communicate this
>> > > > to the kernel?
>> > >
>> > > Why does the kernel need to know?
>> >
>> > When extracting the key from a double tagged frame how does the kernel
>> > know whether the second tag should be classified as second VLAN or an
>> > Ethertype?
>>
>> The kernel should always classify it as a second VLAN.
>> datapath/README.md explains this in detail, under "Basic rule for evolving flow
>> keys".
> [Mooney, Sean K] out of interest would it be possible to support 3 laryers of vlans and still be able to match on the inner packet headers.
> As I said about neutron currently uses on level of vlans for tenant isolation
> Having qinq would allow the tenant to send one level of vlan and neutron to use one
> Layer of vlans for isolation. Support 3 layares fo vlans in ovs would allow the guest to use qinq
> Neutron to use vlans for isolation at the same time.

I believe OVS doesn't have to be aware of the third VLAN in this case.
It can be treated as payload.

Thanks,
Xiao
Mooney, Sean K Sept. 9, 2016, 10:18 a.m. UTC | #16
> -----Original Message-----

> From: Xiao Liang [mailto:shaw.leon@gmail.com]

> Sent: Friday, September 9, 2016 5:27 AM

> To: Mooney, Sean K <sean.k.mooney@intel.com>

> Cc: Ben Pfaff <blp@ovn.org>; dev@openvswitch.org

> Subject: Re: [ovs-dev] [PATCH v4 1/4] Add support for 802.1ad (QinQ

> tunneling)

> 

> On Sat, Aug 20, 2016 at 9:41 AM, Mooney, Sean K

> <sean.k.mooney@intel.com> wrote:

> >

> >

> >> -----Original Message-----

> >> From: dev [mailto:dev-bounces@openvswitch.org] On Behalf Of Ben

> Pfaff

> >> Sent: Friday, August 19, 2016 9:57 PM

> >> To: Xiao Liang <shaw.leon@gmail.com>; dev@openvswitch.org

> >> Subject: Re: [ovs-dev] [PATCH v4 1/4] Add support for 802.1ad (QinQ

> >> tunneling)

> >>

> >> On Fri, Aug 19, 2016 at 04:42:18PM -0400, Eric Garver wrote:

> >> > On Fri, Aug 19, 2016 at 01:24:10PM -0700, Ben Pfaff wrote:

> >> > > On Fri, Aug 19, 2016 at 04:19:31PM -0400, Eric Garver wrote:

> >> > > > On Sat, Aug 06, 2016 at 08:04:44PM -0700, Ben Pfaff wrote:

> >> > > > > On Sun, Aug 07, 2016 at 10:54:00AM +0800, Xiao Liang wrote:

> >> > > > > > On Thu, Aug 4, 2016 at 6:07 AM, Ben Pfaff <blp@ovn.org>

> wrote:

> >> > > > > > > Thanks for the replies, I have some further responses

> below.

> >> > > > > > >

> >> > > > > > > On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang

> wrote:

> >> > > > > > >> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff

> <blp@ovn.org> wrote:

> >> > > > > > >> > I'm concerned about backward compatibility.  Consider

> >> > > > > > >> > some application built on Open vSwitch using

> OpenFlow.

> >> > > > > > >> > Today, it can distinguish single-tagged and

> >> > > > > > >> > double-tagged packets (that use outer Ethertype

> 0x8100), as follows:

> >> > > > > > >> >

> >> > > > > > >> >     - A single-tagged packet has vlan_tci != 0 and

> some non-VLAN

> >> > > > > > >> >       dl_type.

> >> > > > > > >> >

> >> > > > > > >> >     - A double-tagged packet has vlan_tci != 0 and a

> VLAN dl_type.

> >> > > > > > >> >

> >> > > > > > >> > With this patch, this won't work, because neither

> kind

> >> > > > > > >> > of packet has a VLAN dl_type.  Instead, applications

> >> > > > > > >> > need to first match on the outer VLAN, then pop it

> >> > > > > > >> > off, then match on the inner VLAN.  This difference

> >> > > > > > >> > could lead to security problems in applications.  An

> >> > > > > > >> > application might, for example, want to pop an outer

> >> > > > > > >> > VLAN and forward the packet, but only if there is no

> >> > > > > > >> > inner VLAN.  If it is implemented

> >> according to the previous rules, then it will not notice the inner

> VLAN.

> >> > > > > > >>

> >> > > > > > >> Maybe some applications are implemented this way, but

> >> > > > > > >> they are probably wrong. OpenFlow says eth_type is

> >> > > > > > >> "ethernet type of the OpenFlow packet payload, after

> >> > > > > > >> VLAN tags", so it is the payload ethtype for a

> >> > > > > > >> double-tagged packet. It's the same for single-tagged

> >> > > > > > >> packet: application must explicitly match vlan_tci to

> >> > > > > > >> decide whether it has VLAN tag.

> >> > > > > > >

> >> > > > > > > OpenFlow does say that, but it's inconsistent with

> >> > > > > > > long-standing Open vSwitch practice and will cause

> >> > > > > > > backward incompatibility and, worse, security problems.

> >> > > > > > > If we need the official OpenFlow behavior then I think

> >> > > > > > > we'll need to add a feature

> >> switch to turn on that behavior.

> >> > > > > >

> >> > > > > > It's a good idea to add a switch. I think QinQ can be

> >> > > > > > disabled and fallback to current behavior if the switch is

> >> > > > > > off, since these legacy applications are not written for

> QinQ.

> >> > > > >

> >> > > > > OK.  I'm happy with that solution, as long as the

> >> > > > > implementation is clean.

> >> > > >

> >> > > > Is a new flag, i.e. OVS_DP_F_8021AD, passed via

> > [Mooney, Sean K]  am I correct in assuming that this new flag will be

> > set in the ovsdb Either on the bridge or prereably globally in the

> other_config section fo the Open_vSwitch table.

> > Both will allow easy detection of the feature form openstack so we

> can detect if qinq can be used.

> > The openstack ovs neutron agent currently uses vlans for tenant

> > isolatation so this would enable Vlan transparency when qinq is

> available.

> >> > > > OVS_DP_ATTR_USER_FEATURES an appropriate way to communicate

> >> > > > this to the kernel?

> >> > >

> >> > > Why does the kernel need to know?

> >> >

> >> > When extracting the key from a double tagged frame how does the

> >> > kernel know whether the second tag should be classified as second

> >> > VLAN or an Ethertype?

> >>

> >> The kernel should always classify it as a second VLAN.

> >> datapath/README.md explains this in detail, under "Basic rule for

> >> evolving flow keys".

> > [Mooney, Sean K] out of interest would it be possible to support 3

> laryers of vlans and still be able to match on the inner packet

> headers.

> > As I said about neutron currently uses on level of vlans for tenant

> > isolation Having qinq would allow the tenant to send one level of

> vlan

> > and neutron to use one Layer of vlans for isolation. Support 3

> layares

> > fo vlans in ovs would allow the guest to use qinq Neutron to use

> vlans for isolation at the same time.

> 

> I believe OVS doesn't have to be aware of the third VLAN in this case.

> It can be treated as payload.

[Mooney, Sean K] correct ovs would not have to be aware of the third vlan in this case but
It would have to be able to look past it and read the l2 heardse of the packet for the normal action to work.
> 

> Thanks,

> Xiao
Xiao Liang Sept. 9, 2016, 3:06 p.m. UTC | #17
On Fri, Sep 9, 2016 at 6:18 PM, Mooney, Sean K <sean.k.mooney@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Xiao Liang [mailto:shaw.leon@gmail.com]
>> Sent: Friday, September 9, 2016 5:27 AM
>> To: Mooney, Sean K <sean.k.mooney@intel.com>
>> Cc: Ben Pfaff <blp@ovn.org>; dev@openvswitch.org
>> Subject: Re: [ovs-dev] [PATCH v4 1/4] Add support for 802.1ad (QinQ
>> tunneling)
>>
>> On Sat, Aug 20, 2016 at 9:41 AM, Mooney, Sean K
>> <sean.k.mooney@intel.com> wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: dev [mailto:dev-bounces@openvswitch.org] On Behalf Of Ben
>> Pfaff
>> >> Sent: Friday, August 19, 2016 9:57 PM
>> >> To: Xiao Liang <shaw.leon@gmail.com>; dev@openvswitch.org
>> >> Subject: Re: [ovs-dev] [PATCH v4 1/4] Add support for 802.1ad (QinQ
>> >> tunneling)
>> >>
>> >> On Fri, Aug 19, 2016 at 04:42:18PM -0400, Eric Garver wrote:
>> >> > On Fri, Aug 19, 2016 at 01:24:10PM -0700, Ben Pfaff wrote:
>> >> > > On Fri, Aug 19, 2016 at 04:19:31PM -0400, Eric Garver wrote:
>> >> > > > On Sat, Aug 06, 2016 at 08:04:44PM -0700, Ben Pfaff wrote:
>> >> > > > > On Sun, Aug 07, 2016 at 10:54:00AM +0800, Xiao Liang wrote:
>> >> > > > > > On Thu, Aug 4, 2016 at 6:07 AM, Ben Pfaff <blp@ovn.org>
>> wrote:
>> >> > > > > > > Thanks for the replies, I have some further responses
>> below.
>> >> > > > > > >
>> >> > > > > > > On Sun, Jul 31, 2016 at 08:22:47AM +0800, Xiao Liang
>> wrote:
>> >> > > > > > >> On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff
>> <blp@ovn.org> wrote:
>> >> > > > > > >> > I'm concerned about backward compatibility.  Consider
>> >> > > > > > >> > some application built on Open vSwitch using
>> OpenFlow.
>> >> > > > > > >> > Today, it can distinguish single-tagged and
>> >> > > > > > >> > double-tagged packets (that use outer Ethertype
>> 0x8100), as follows:
>> >> > > > > > >> >
>> >> > > > > > >> >     - A single-tagged packet has vlan_tci != 0 and
>> some non-VLAN
>> >> > > > > > >> >       dl_type.
>> >> > > > > > >> >
>> >> > > > > > >> >     - A double-tagged packet has vlan_tci != 0 and a
>> VLAN dl_type.
>> >> > > > > > >> >
>> >> > > > > > >> > With this patch, this won't work, because neither
>> kind
>> >> > > > > > >> > of packet has a VLAN dl_type.  Instead, applications
>> >> > > > > > >> > need to first match on the outer VLAN, then pop it
>> >> > > > > > >> > off, then match on the inner VLAN.  This difference
>> >> > > > > > >> > could lead to security problems in applications.  An
>> >> > > > > > >> > application might, for example, want to pop an outer
>> >> > > > > > >> > VLAN and forward the packet, but only if there is no
>> >> > > > > > >> > inner VLAN.  If it is implemented
>> >> according to the previous rules, then it will not notice the inner
>> VLAN.
>> >> > > > > > >>
>> >> > > > > > >> Maybe some applications are implemented this way, but
>> >> > > > > > >> they are probably wrong. OpenFlow says eth_type is
>> >> > > > > > >> "ethernet type of the OpenFlow packet payload, after
>> >> > > > > > >> VLAN tags", so it is the payload ethtype for a
>> >> > > > > > >> double-tagged packet. It's the same for single-tagged
>> >> > > > > > >> packet: application must explicitly match vlan_tci to
>> >> > > > > > >> decide whether it has VLAN tag.
>> >> > > > > > >
>> >> > > > > > > OpenFlow does say that, but it's inconsistent with
>> >> > > > > > > long-standing Open vSwitch practice and will cause
>> >> > > > > > > backward incompatibility and, worse, security problems.
>> >> > > > > > > If we need the official OpenFlow behavior then I think
>> >> > > > > > > we'll need to add a feature
>> >> switch to turn on that behavior.
>> >> > > > > >
>> >> > > > > > It's a good idea to add a switch. I think QinQ can be
>> >> > > > > > disabled and fallback to current behavior if the switch is
>> >> > > > > > off, since these legacy applications are not written for
>> QinQ.
>> >> > > > >
>> >> > > > > OK.  I'm happy with that solution, as long as the
>> >> > > > > implementation is clean.
>> >> > > >
>> >> > > > Is a new flag, i.e. OVS_DP_F_8021AD, passed via
>> > [Mooney, Sean K]  am I correct in assuming that this new flag will be
>> > set in the ovsdb Either on the bridge or prereably globally in the
>> other_config section fo the Open_vSwitch table.
>> > Both will allow easy detection of the feature form openstack so we
>> can detect if qinq can be used.
>> > The openstack ovs neutron agent currently uses vlans for tenant
>> > isolatation so this would enable Vlan transparency when qinq is
>> available.
>> >> > > > OVS_DP_ATTR_USER_FEATURES an appropriate way to communicate
>> >> > > > this to the kernel?
>> >> > >
>> >> > > Why does the kernel need to know?
>> >> >
>> >> > When extracting the key from a double tagged frame how does the
>> >> > kernel know whether the second tag should be classified as second
>> >> > VLAN or an Ethertype?
>> >>
>> >> The kernel should always classify it as a second VLAN.
>> >> datapath/README.md explains this in detail, under "Basic rule for
>> >> evolving flow keys".
>> > [Mooney, Sean K] out of interest would it be possible to support 3
>> laryers of vlans and still be able to match on the inner packet
>> headers.
>> > As I said about neutron currently uses on level of vlans for tenant
>> > isolation Having qinq would allow the tenant to send one level of
>> vlan
>> > and neutron to use one Layer of vlans for isolation. Support 3
>> layares
>> > fo vlans in ovs would allow the guest to use qinq Neutron to use
>> vlans for isolation at the same time.
>>
>> I believe OVS doesn't have to be aware of the third VLAN in this case.
>> It can be treated as payload.
> [Mooney, Sean K] correct ovs would not have to be aware of the third vlan in this case but
> It would have to be able to look past it and read the l2 heardse of the packet for the normal action to work.

I think MAC addresses and top two VLANs should be enough for normal action.
diff mbox

Patch

diff --git a/include/openvswitch/flow.h b/include/openvswitch/flow.h
index 03d406b..0677a37 100644
--- a/include/openvswitch/flow.h
+++ b/include/openvswitch/flow.h
@@ -23,7 +23,7 @@ 
 /* This sequence number should be incremented whenever anything involving flows
  * or the wildcarding of flows changes.  This will cause build assertion
  * failures in places which likely need to be updated. */
-#define FLOW_WC_SEQ 35
+#define FLOW_WC_SEQ 36
 
 /* Number of Open vSwitch extension 32-bit registers. */
 #define FLOW_N_REGS 8
@@ -55,6 +55,10 @@  const char *flow_tun_flag_to_string(uint32_t flags);
 /* Maximum number of supported MPLS labels. */
 #define FLOW_MAX_MPLS_LABELS 3
 
+/* Maximum number of supported VLAN headers.
+ * Multiple of 2 to satisfy 64-bit alignment */
+#define FLOW_MAX_VLAN_HEADERS 2
+
 /*
  * A flow in the network.
  *
@@ -97,7 +101,8 @@  struct flow {
     struct eth_addr dl_dst;     /* Ethernet destination address. */
     struct eth_addr dl_src;     /* Ethernet source address. */
     ovs_be16 dl_type;           /* Ethernet frame type. */
-    ovs_be16 vlan_tci;          /* If 802.1Q, TCI | VLAN_CFI; otherwise 0. */
+    uint8_t pad2[2];            /* Pad to 64 bits. */
+    struct flow_vlan_hdr vlan[FLOW_MAX_VLAN_HEADERS];     /* VLAN headers */
     ovs_be32 mpls_lse[ROUND_UP(FLOW_MAX_MPLS_LABELS, 2)]; /* MPLS label stack
                                                              (with padding). */
     /* L3 (64-bit aligned) */
@@ -129,8 +134,8 @@  BUILD_ASSERT_DECL(sizeof(struct flow_tnl) % sizeof(uint64_t) == 0);
 
 /* Remember to update FLOW_WC_SEQ when changing 'struct flow'. */
 BUILD_ASSERT_DECL(offsetof(struct flow, igmp_group_ip4) + sizeof(uint32_t)
-                  == sizeof(struct flow_tnl) + 216
-                  && FLOW_WC_SEQ == 35);
+                  == sizeof(struct flow_tnl) + 224
+                  && FLOW_WC_SEQ == 36);
 
 /* Incremental points at which flow classification may be performed in
  * segments.
diff --git a/include/openvswitch/ofp-actions.h b/include/openvswitch/ofp-actions.h
index 0b8ccbb..3f6c3b4 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -66,7 +66,7 @@ 
     OFPACT(SET_VLAN_VID,    ofpact_vlan_vid,    ofpact, "set_vlan_vid") \
     OFPACT(SET_VLAN_PCP,    ofpact_vlan_pcp,    ofpact, "set_vlan_pcp") \
     OFPACT(STRIP_VLAN,      ofpact_null,        ofpact, "strip_vlan")   \
-    OFPACT(PUSH_VLAN,       ofpact_null,        ofpact, "push_vlan")    \
+    OFPACT(PUSH_VLAN,       ofpact_push_vlan,   ofpact, "push_vlan")    \
     OFPACT(SET_ETH_SRC,     ofpact_mac,         ofpact, "mod_dl_src")   \
     OFPACT(SET_ETH_DST,     ofpact_mac,         ofpact, "mod_dl_dst")   \
     OFPACT(SET_IPV4_SRC,    ofpact_ipv4,        ofpact, "mod_nw_src")   \
@@ -373,6 +373,14 @@  struct ofpact_vlan_pcp {
     bool flow_has_vlan;         /* VLAN present at action validation time? */
 };
 
+/* OFPACT_PUSH_VLAN.
+ *
+ * Used for OFPAT11_PUSH_VLAN. */
+struct ofpact_push_vlan {
+    struct ofpact ofpact;
+    ovs_be16 ethertype;
+};
+
 /* OFPACT_SET_ETH_SRC, OFPACT_SET_ETH_DST.
  *
  * Used for OFPAT10_SET_DL_SRC, OFPAT10_SET_DL_DST. */
diff --git a/include/openvswitch/packets.h b/include/openvswitch/packets.h
index 5d97309..e688321 100644
--- a/include/openvswitch/packets.h
+++ b/include/openvswitch/packets.h
@@ -61,4 +61,9 @@  union flow_in_port {
     ofp_port_t ofp_port;
 };
 
+struct flow_vlan_hdr {
+    ovs_be16 tpid;  /* ETH_TYPE_VLAN_DOT1Q or ETH_TYPE_DOT1AD */
+    ovs_be16 tci;
+};
+
 #endif /* packets.h */
diff --git a/lib/dpctl.c b/lib/dpctl.c
index b870e30..e8dd907 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -1492,6 +1492,7 @@  dpctl_normalize_actions(int argc, const char *argv[],
     struct ds s;
     int left;
     int i, error;
+    int encaps = 0;
 
     ds_init(&s);
 
@@ -1548,12 +1549,14 @@  dpctl_normalize_actions(int argc, const char *argv[],
         const struct ovs_action_push_vlan *push;
         switch(nl_attr_type(a)) {
         case OVS_ACTION_ATTR_POP_VLAN:
-            flow.vlan_tci = htons(0);
+            flow_pop_vlan(&flow);
             continue;
 
         case OVS_ACTION_ATTR_PUSH_VLAN:
+            flow_shift_vlan(&flow);
             push = nl_attr_get_unspec(a, sizeof *push);
-            flow.vlan_tci = push->vlan_tci;
+            flow.vlan[0].tpid = push->vlan_tpid;
+            flow.vlan[0].tci = push->vlan_tci;
             continue;
         }
 
@@ -1579,12 +1582,22 @@  dpctl_normalize_actions(int argc, const char *argv[],
 
         sort_output_actions(af->actions.data, af->actions.size);
 
-        if (af->flow.vlan_tci != htons(0)) {
-            dpctl_print(dpctl_p, "vlan(vid=%"PRIu16",pcp=%d): ",
-                        vlan_tci_to_vid(af->flow.vlan_tci),
-                        vlan_tci_to_pcp(af->flow.vlan_tci));
-        } else {
-            dpctl_print(dpctl_p, "no vlan: ");
+        for (encaps = 0; encaps < FLOW_MAX_VLAN_HEADERS; encaps ++) {
+            struct flow_vlan_hdr *vlan = &af->flow.vlan[encaps];
+            if (vlan->tci != htons(0)) {
+                dpctl_print(dpctl_p, "vlan(");
+                if (vlan->tpid != htons(ETH_TYPE_VLAN)) {
+                    dpctl_print(dpctl_p, "tpid=0x%04"PRIx16",", vlan->tpid);
+                }
+                dpctl_print(dpctl_p, "vid=%"PRIu16",pcp=%d): ",
+                            vlan_tci_to_vid(vlan->tci),
+                            vlan_tci_to_pcp(vlan->tci));
+            } else {
+                if (encaps == 0) {
+                    dpctl_print(dpctl_p, "no vlan: ");
+                }
+                break;
+            }
         }
 
         if (eth_type_mpls(af->flow.dl_type)) {
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e0107b7..26cfaff 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3753,6 +3753,7 @@  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet,
     struct match match;
     ovs_u128 ufid;
     int error;
+    int i;
 
     match.tun_md.valid = false;
     miniflow_expand(&key->mf, &match.flow);
@@ -3776,8 +3777,10 @@  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet,
      * VLAN.  Unless we refactor a lot of code that translates between
      * Netlink and struct flow representations, we have to do the same
      * here. */
-    if (!match.wc.masks.vlan_tci) {
-        match.wc.masks.vlan_tci = htons(0xffff);
+    for (i = 0; i < FLOW_MAX_VLAN_HEADERS; i++) {
+        if (!match.wc.masks.vlan[i].tci) {
+            match.wc.masks.vlan[i].tci = htons(0xffff);
+        }
     }
 
     /* We can't allow the packet batching in the next loop to execute
diff --git a/lib/flow.c b/lib/flow.c
index a4c1215..76e949b 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -72,8 +72,6 @@  const uint8_t flow_segment_u64s[4] = {
 
 /* miniflow_extract() assumes the following to be true to optimize the
  * extraction process. */
-ASSERT_SEQUENTIAL_SAME_WORD(dl_type, vlan_tci);
-
 ASSERT_SEQUENTIAL_SAME_WORD(nw_frag, nw_tos);
 ASSERT_SEQUENTIAL_SAME_WORD(nw_tos, nw_ttl);
 ASSERT_SEQUENTIAL_SAME_WORD(nw_ttl, nw_proto);
@@ -124,7 +122,7 @@  struct mf_ctx {
  * away.  Some GCC versions gave warnings on ALWAYS_INLINE, so these are
  * defined as macros. */
 
-#if (FLOW_WC_SEQ != 35)
+#if (FLOW_WC_SEQ != 36)
 #define MINIFLOW_ASSERT(X) ovs_assert(X)
 BUILD_MESSAGE("FLOW_WC_SEQ changed: miniflow_extract() will have runtime "
                "assertions enabled. Consider updating FLOW_WC_SEQ after "
@@ -327,26 +325,35 @@  parse_mpls(const void **datap, size_t *sizep)
     return MIN(count, FLOW_MAX_MPLS_LABELS);
 }
 
-static inline ovs_be16
-parse_vlan(const void **datap, size_t *sizep)
+static inline bool
+parse_vlan(const void **datap, size_t *sizep, struct flow_vlan_hdr *vlan_hdrs)
 {
-    const struct eth_header *eth = *datap;
-
+    int encaps;
+    const ovs_be16 *eth_type;
     struct qtag_prefix {
-        ovs_be16 eth_type;      /* ETH_TYPE_VLAN */
+        ovs_be16 eth_type;
         ovs_be16 tci;
     };
 
+    memset(vlan_hdrs, 0, sizeof(struct flow_vlan_hdr) * FLOW_MAX_VLAN_HEADERS);
     data_pull(datap, sizep, ETH_ADDR_LEN * 2);
 
-    if (eth->eth_type == htons(ETH_TYPE_VLAN)) {
+    eth_type = *datap;
+
+    for (encaps = 0;
+         eth_type_vlan(*eth_type) && encaps < FLOW_MAX_VLAN_HEADERS;
+         encaps++) {
         if (OVS_LIKELY(*sizep
                        >= sizeof(struct qtag_prefix) + sizeof(ovs_be16))) {
             const struct qtag_prefix *qp = data_pull(datap, sizep, sizeof *qp);
-            return qp->tci | htons(VLAN_CFI);
+            vlan_hdrs[encaps].tpid = qp->eth_type;
+            vlan_hdrs[encaps].tci = qp->tci | htons(VLAN_CFI);
+            eth_type = *datap;
+        } else {
+            return false;
         }
     }
-    return 0;
+    return true;
 }
 
 static inline ovs_be16
@@ -538,16 +545,20 @@  miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
     if (OVS_UNLIKELY(size < sizeof(struct eth_header))) {
         goto out;
     } else {
-        ovs_be16 vlan_tci;
+        struct flow_vlan_hdr vlan[FLOW_MAX_VLAN_HEADERS];
 
         /* Link layer. */
         ASSERT_SEQUENTIAL(dl_dst, dl_src);
         miniflow_push_macs(mf, dl_dst, data);
-        /* dl_type, vlan_tci. */
-        vlan_tci = parse_vlan(&data, &size);
+        /* VLAN */
+        if (OVS_UNLIKELY(!parse_vlan(&data, &size, vlan))) {
+            goto out;
+        }
+        /* dl_type */
         dl_type = parse_ethertype(&data, &size);
         miniflow_push_be16(mf, dl_type, dl_type);
-        miniflow_push_be16(mf, vlan_tci, vlan_tci);
+        miniflow_pad_to_64(mf, dl_type);
+        miniflow_push_words_32(mf, vlan, vlan, FLOW_MAX_VLAN_HEADERS);
     }
 
     /* Parse mpls. */
@@ -842,7 +853,7 @@  flow_get_metadata(const struct flow *flow, struct match *flow_metadata)
 {
     int i;
 
-    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 35);
+    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
 
     match_init_catchall(flow_metadata);
     if (flow->tunnel.tun_id != htonll(0)) {
@@ -1248,7 +1259,7 @@  void flow_wildcards_init_for_packet(struct flow_wildcards *wc,
     memset(&wc->masks, 0x0, sizeof wc->masks);
 
     /* Update this function whenever struct flow changes. */
-    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 35);
+    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
 
     if (flow_tnl_dst_is_set(&flow->tunnel)) {
         if (flow->tunnel.flags & FLOW_TNL_F_KEY) {
@@ -1298,7 +1309,7 @@  void flow_wildcards_init_for_packet(struct flow_wildcards *wc,
     WC_MASK_FIELD(wc, dl_dst);
     WC_MASK_FIELD(wc, dl_src);
     WC_MASK_FIELD(wc, dl_type);
-    WC_MASK_FIELD(wc, vlan_tci);
+    WC_MASK_FIELD(wc, vlan);
 
     if (flow->dl_type == htons(ETH_TYPE_IP)) {
         WC_MASK_FIELD(wc, nw_src);
@@ -1365,7 +1376,7 @@  void
 flow_wc_map(const struct flow *flow, struct flowmap *map)
 {
     /* Update this function whenever struct flow changes. */
-    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 35);
+    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
 
     flowmap_init(map);
 
@@ -1391,7 +1402,7 @@  flow_wc_map(const struct flow *flow, struct flowmap *map)
     FLOWMAP_SET(map, dl_dst);
     FLOWMAP_SET(map, dl_src);
     FLOWMAP_SET(map, dl_type);
-    FLOWMAP_SET(map, vlan_tci);
+    FLOWMAP_SET(map, vlan);
     FLOWMAP_SET(map, ct_state);
     FLOWMAP_SET(map, ct_zone);
     FLOWMAP_SET(map, ct_mark);
@@ -1449,7 +1460,7 @@  void
 flow_wildcards_clear_non_packet_fields(struct flow_wildcards *wc)
 {
     /* Update this function whenever struct flow changes. */
-    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 35);
+    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
 
     memset(&wc->masks.metadata, 0, sizeof wc->masks.metadata);
     memset(&wc->masks.regs, 0, sizeof wc->masks.regs);
@@ -1584,7 +1595,7 @@  flow_wildcards_set_xreg_mask(struct flow_wildcards *wc, int idx, uint64_t mask)
 uint32_t
 miniflow_hash_5tuple(const struct miniflow *flow, uint32_t basis)
 {
-    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 35);
+    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
     uint32_t hash = basis;
 
     if (flow) {
@@ -1631,7 +1642,7 @@  ASSERT_SEQUENTIAL(ipv6_src, ipv6_dst);
 uint32_t
 flow_hash_5tuple(const struct flow *flow, uint32_t basis)
 {
-    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 35);
+    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
     uint32_t hash = basis;
 
     if (flow) {
@@ -1690,7 +1701,9 @@  flow_hash_symmetric_l4(const struct flow *flow, uint32_t basis)
     for (i = 0; i < ARRAY_SIZE(fields.eth_addr.be16); i++) {
         fields.eth_addr.be16[i] = flow->dl_src.be16[i] ^ flow->dl_dst.be16[i];
     }
-    fields.vlan_tci = flow->vlan_tci & htons(VLAN_VID_MASK);
+    for (i = 0; i < FLOW_MAX_VLAN_HEADERS; i++) {
+        fields.vlan_tci ^= flow->vlan[i].tci & htons(VLAN_VID_MASK);
+    }
     fields.eth_type = flow->dl_type;
 
     /* UDP source and destination port are not taken into account because they
@@ -1756,6 +1769,7 @@  void
 flow_random_hash_fields(struct flow *flow)
 {
     uint16_t rnd = random_uint16();
+    int i;
 
     /* Initialize to all zeros. */
     memset(flow, 0, sizeof *flow);
@@ -1763,7 +1777,10 @@  flow_random_hash_fields(struct flow *flow)
     eth_addr_random(&flow->dl_src);
     eth_addr_random(&flow->dl_dst);
 
-    flow->vlan_tci = (OVS_FORCE ovs_be16) (random_uint16() & VLAN_VID_MASK);
+    for (i = 0; i < FLOW_MAX_VLAN_HEADERS; i++) {
+        flow->vlan[i].tci =
+            (OVS_FORCE ovs_be16) (random_uint16() & VLAN_VID_MASK);
+    }
 
     /* Make most of the random flows IPv4, some IPv6, and rest random. */
     flow->dl_type = rnd < 0x8000 ? htons(ETH_TYPE_IP) :
@@ -1796,6 +1813,7 @@  void
 flow_mask_hash_fields(const struct flow *flow, struct flow_wildcards *wc,
                       enum nx_hash_fields fields)
 {
+    int i;
     switch (fields) {
     case NX_HASH_FIELDS_ETH_SRC:
         memset(&wc->masks.dl_src, 0xff, sizeof wc->masks.dl_src);
@@ -1815,7 +1833,9 @@  flow_mask_hash_fields(const struct flow *flow, struct flow_wildcards *wc,
             memset(&wc->masks.nw_proto, 0xff, sizeof wc->masks.nw_proto);
             flow_unwildcard_tp_ports(flow, wc);
         }
-        wc->masks.vlan_tci |= htons(VLAN_VID_MASK | VLAN_CFI);
+        for (i = 0; i < FLOW_MAX_VLAN_HEADERS; i++) {
+            wc->masks.vlan[i].tci |= htons(VLAN_VID_MASK | VLAN_CFI);
+        }
         break;
 
     case NX_HASH_FIELDS_SYMMETRIC_L3L4_UDP:
@@ -1927,11 +1947,11 @@  void
 flow_set_dl_vlan(struct flow *flow, ovs_be16 vid)
 {
     if (vid == htons(OFP10_VLAN_NONE)) {
-        flow->vlan_tci = htons(0);
+        flow->vlan[0].tci = htons(0);
     } else {
         vid &= htons(VLAN_VID_MASK);
-        flow->vlan_tci &= ~htons(VLAN_VID_MASK);
-        flow->vlan_tci |= htons(VLAN_CFI) | vid;
+        flow->vlan[0].tci &= ~htons(VLAN_VID_MASK);
+        flow->vlan[0].tci |= htons(VLAN_CFI) | vid;
     }
 }
 
@@ -1942,8 +1962,8 @@  void
 flow_set_vlan_vid(struct flow *flow, ovs_be16 vid)
 {
     ovs_be16 mask = htons(VLAN_VID_MASK | VLAN_CFI);
-    flow->vlan_tci &= ~mask;
-    flow->vlan_tci |= vid & mask;
+    flow->vlan[0].tci &= ~mask;
+    flow->vlan[0].tci |= vid & mask;
 }
 
 /* Sets the VLAN PCP that 'flow' matches to 'pcp', which should be in the
@@ -1957,8 +1977,23 @@  void
 flow_set_vlan_pcp(struct flow *flow, uint8_t pcp)
 {
     pcp &= 0x07;
-    flow->vlan_tci &= ~htons(VLAN_PCP_MASK);
-    flow->vlan_tci |= htons((pcp << VLAN_PCP_SHIFT) | VLAN_CFI);
+    flow->vlan[0].tci &= ~htons(VLAN_PCP_MASK);
+    flow->vlan[0].tci |= htons((pcp << VLAN_PCP_SHIFT) | VLAN_CFI);
+}
+
+void
+flow_pop_vlan(struct flow *flow) {
+    memmove(&flow->vlan[0], &flow->vlan[1],
+            sizeof(struct flow_vlan_hdr) * (FLOW_MAX_VLAN_HEADERS - 1));
+    memset(&flow->vlan[FLOW_MAX_VLAN_HEADERS - 1], 0,
+           sizeof(struct flow_vlan_hdr));
+}
+
+void
+flow_shift_vlan(struct flow *flow) {
+    memmove(&flow->vlan[1], &flow->vlan[0],
+            sizeof(struct flow_vlan_hdr) * (FLOW_MAX_VLAN_HEADERS - 1));
+    memset(&flow->vlan[0], 0, sizeof(struct flow_vlan_hdr));
 }
 
 /* Returns the number of MPLS LSEs present in 'flow'
@@ -2098,7 +2133,7 @@  flow_push_mpls(struct flow *flow, int n, ovs_be16 mpls_eth_type,
         flow->mpls_lse[0] = set_mpls_lse_values(ttl, tc, 1, htonl(label));
 
         /* Clear all L3 and L4 fields and dp_hash. */
-        BUILD_ASSERT(FLOW_WC_SEQ == 35);
+        BUILD_ASSERT(FLOW_WC_SEQ == 36);
         memset((char *) flow + FLOW_SEGMENT_2_ENDS_AT, 0,
                sizeof(struct flow) - FLOW_SEGMENT_2_ENDS_AT);
         flow->dp_hash = 0;
@@ -2276,6 +2311,7 @@  void
 flow_compose(struct dp_packet *p, const struct flow *flow)
 {
     size_t l4_len;
+    int encaps;
 
     /* eth_compose() sets l3 pointer and makes sure it is 32-bit aligned. */
     eth_compose(p, flow->dl_dst, flow->dl_src, ntohs(flow->dl_type), 0);
@@ -2285,8 +2321,10 @@  flow_compose(struct dp_packet *p, const struct flow *flow)
         return;
     }
 
-    if (flow->vlan_tci & htons(VLAN_CFI)) {
-        eth_push_vlan(p, htons(ETH_TYPE_VLAN), flow->vlan_tci);
+    for (encaps = FLOW_MAX_VLAN_HEADERS - 1; encaps >= 0; encaps--) {
+        if (flow->vlan[encaps].tci & htons(VLAN_CFI)) {
+            eth_push_vlan(p, flow->vlan[encaps].tpid, flow->vlan[encaps].tci);
+        }
     }
 
     if (flow->dl_type == htons(ETH_TYPE_IP)) {
diff --git a/lib/flow.h b/lib/flow.h
index 5479677..4882e26 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -89,6 +89,8 @@  static inline size_t flow_hash(const struct flow *, uint32_t basis);
 void flow_set_dl_vlan(struct flow *, ovs_be16 vid);
 void flow_set_vlan_vid(struct flow *, ovs_be16 vid);
 void flow_set_vlan_pcp(struct flow *, uint8_t pcp);
+void flow_pop_vlan(struct flow*);
+void flow_shift_vlan(struct flow*);
 
 int flow_count_mpls_labels(const struct flow *, struct flow_wildcards *);
 int flow_count_common_mpls_labels(const struct flow *a, int an,
@@ -645,7 +647,7 @@  static inline uint32_t miniflow_get_u32(const struct miniflow *,
                                         unsigned int u32_ofs);
 static inline ovs_be32 miniflow_get_be32(const struct miniflow *,
                                          unsigned int be32_ofs);
-static inline uint16_t miniflow_get_vid(const struct miniflow *);
+static inline uint16_t miniflow_get_vid(const struct miniflow *, size_t);
 static inline uint16_t miniflow_get_tcp_flags(const struct miniflow *);
 static inline ovs_be64 miniflow_get_metadata(const struct miniflow *);
 
@@ -683,7 +685,7 @@  static inline uint32_t minimask_get_u32(const struct minimask *,
                                         unsigned int u32_ofs);
 static inline ovs_be32 minimask_get_be32(const struct minimask *,
                                          unsigned int be32_ofs);
-static inline uint16_t minimask_get_vid_mask(const struct minimask *);
+static inline uint16_t minimask_get_vid_mask(const struct minimask *, size_t);
 static inline ovs_be64 minimask_get_metadata_mask(const struct minimask *);
 
 bool minimask_equal(const struct minimask *a, const struct minimask *b);
@@ -730,10 +732,13 @@  static inline ovs_be32 miniflow_get_be32(const struct miniflow *flow,
 /* Returns the VID within the vlan_tci member of the "struct flow" represented
  * by 'flow'. */
 static inline uint16_t
-miniflow_get_vid(const struct miniflow *flow)
+miniflow_get_vid(const struct miniflow *flow, size_t n)
 {
-    ovs_be16 tci = MINIFLOW_GET_BE16(flow, vlan_tci);
-    return vlan_tci_to_vid(tci);
+    if (n < FLOW_MAX_VLAN_HEADERS) {
+        uint32_t u32 = MINIFLOW_GET_U32(flow, vlan[n]);
+        return vlan_tci_to_vid(((struct flow_vlan_hdr *)&u32)->tci);
+    }
+    return 0;
 }
 
 /* Returns the uint32_t that would be at byte offset '4 * u32_ofs' if 'mask'
@@ -753,9 +758,9 @@  minimask_get_be32(const struct minimask *mask, unsigned int be32_ofs)
 /* Returns the VID mask within the vlan_tci member of the "struct
  * flow_wildcards" represented by 'mask'. */
 static inline uint16_t
-minimask_get_vid_mask(const struct minimask *mask)
+minimask_get_vid_mask(const struct minimask *mask, size_t n)
 {
-    return miniflow_get_vid(&mask->masks);
+    return miniflow_get_vid(&mask->masks, n);
 }
 
 /* Returns the value of the "tcp_flags" field in 'flow'. */
diff --git a/lib/match.c b/lib/match.c
index db78831..96fb16c 100644
--- a/lib/match.c
+++ b/lib/match.c
@@ -450,8 +450,8 @@  match_set_dl_tci(struct match *match, ovs_be16 tci)
 void
 match_set_dl_tci_masked(struct match *match, ovs_be16 tci, ovs_be16 mask)
 {
-    match->flow.vlan_tci = tci & mask;
-    match->wc.masks.vlan_tci = mask;
+    match->flow.vlan[0].tci = tci & mask;
+    match->wc.masks.vlan[0].tci = mask;
 }
 
 /* Modifies 'match' so that the VLAN VID is wildcarded.  If the PCP is already
@@ -460,9 +460,9 @@  match_set_dl_tci_masked(struct match *match, ovs_be16 tci, ovs_be16 mask)
 void
 match_set_any_vid(struct match *match)
 {
-    if (match->wc.masks.vlan_tci & htons(VLAN_PCP_MASK)) {
-        match->wc.masks.vlan_tci &= ~htons(VLAN_VID_MASK);
-        match->flow.vlan_tci &= ~htons(VLAN_VID_MASK);
+    if (match->wc.masks.vlan[0].tci & htons(VLAN_PCP_MASK)) {
+        match->wc.masks.vlan[0].tci &= ~htons(VLAN_VID_MASK);
+        match->flow.vlan[0].tci &= ~htons(VLAN_VID_MASK);
     } else {
         match_set_dl_tci_masked(match, htons(0), htons(0));
     }
@@ -481,9 +481,9 @@  match_set_dl_vlan(struct match *match, ovs_be16 dl_vlan)
 {
     flow_set_dl_vlan(&match->flow, dl_vlan);
     if (dl_vlan == htons(OFP10_VLAN_NONE)) {
-        match->wc.masks.vlan_tci = OVS_BE16_MAX;
+        match->wc.masks.vlan[0].tci = OVS_BE16_MAX;
     } else {
-        match->wc.masks.vlan_tci |= htons(VLAN_VID_MASK | VLAN_CFI);
+        match->wc.masks.vlan[0].tci |= htons(VLAN_VID_MASK | VLAN_CFI);
     }
 }
 
@@ -508,7 +508,8 @@  match_set_vlan_vid_masked(struct match *match, ovs_be16 vid, ovs_be16 mask)
 
     mask &= vid_mask;
     flow_set_vlan_vid(&match->flow, vid & mask);
-    match->wc.masks.vlan_tci = mask | (match->wc.masks.vlan_tci & pcp_mask);
+    match->wc.masks.vlan[0].tci =
+        mask | (match->wc.masks.vlan[0].tci & pcp_mask);
 }
 
 /* Modifies 'match' so that the VLAN PCP is wildcarded.  If the VID is already
@@ -517,9 +518,9 @@  match_set_vlan_vid_masked(struct match *match, ovs_be16 vid, ovs_be16 mask)
 void
 match_set_any_pcp(struct match *match)
 {
-    if (match->wc.masks.vlan_tci & htons(VLAN_VID_MASK)) {
-        match->wc.masks.vlan_tci &= ~htons(VLAN_PCP_MASK);
-        match->flow.vlan_tci &= ~htons(VLAN_PCP_MASK);
+    if (match->wc.masks.vlan[0].tci & htons(VLAN_VID_MASK)) {
+        match->wc.masks.vlan[0].tci &= ~htons(VLAN_PCP_MASK);
+        match->flow.vlan[0].tci &= ~htons(VLAN_PCP_MASK);
     } else {
         match_set_dl_tci_masked(match, htons(0), htons(0));
     }
@@ -531,7 +532,7 @@  void
 match_set_dl_vlan_pcp(struct match *match, uint8_t dl_vlan_pcp)
 {
     flow_set_vlan_pcp(&match->flow, dl_vlan_pcp);
-    match->wc.masks.vlan_tci |= htons(VLAN_CFI | VLAN_PCP_MASK);
+    match->wc.masks.vlan[0].tci |= htons(VLAN_CFI | VLAN_PCP_MASK);
 }
 
 /* Modifies 'match' so that the MPLS label 'idx' matches 'lse' exactly. */
@@ -1060,7 +1061,7 @@  match_format(const struct match *match, struct ds *s, int priority)
 
     int i;
 
-    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 35);
+    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
 
     if (priority != OFP_DEFAULT_PRIORITY) {
         ds_put_format(s, "%spriority=%s%d,",
@@ -1192,30 +1193,30 @@  match_format(const struct match *match, struct ds *s, int priority)
         ofputil_format_port(f->in_port.ofp_port, s);
         ds_put_char(s, ',');
     }
-    if (wc->masks.vlan_tci) {
-        ovs_be16 vid_mask = wc->masks.vlan_tci & htons(VLAN_VID_MASK);
-        ovs_be16 pcp_mask = wc->masks.vlan_tci & htons(VLAN_PCP_MASK);
-        ovs_be16 cfi = wc->masks.vlan_tci & htons(VLAN_CFI);
+    if (wc->masks.vlan[0].tci) {
+        ovs_be16 vid_mask = wc->masks.vlan[0].tci & htons(VLAN_VID_MASK);
+        ovs_be16 pcp_mask = wc->masks.vlan[0].tci & htons(VLAN_PCP_MASK);
+        ovs_be16 cfi = wc->masks.vlan[0].tci & htons(VLAN_CFI);
 
-        if (cfi && f->vlan_tci & htons(VLAN_CFI)
+        if (cfi && f->vlan[0].tci & htons(VLAN_CFI)
             && (!vid_mask || vid_mask == htons(VLAN_VID_MASK))
             && (!pcp_mask || pcp_mask == htons(VLAN_PCP_MASK))
             && (vid_mask || pcp_mask)) {
             if (vid_mask) {
                 ds_put_format(s, "%sdl_vlan=%s%"PRIu16",", colors.param,
-                              colors.end, vlan_tci_to_vid(f->vlan_tci));
+                              colors.end, vlan_tci_to_vid(f->vlan[0].tci));
             }
             if (pcp_mask) {
                 ds_put_format(s, "%sdl_vlan_pcp=%s%d,", colors.param,
-                              colors.end, vlan_tci_to_pcp(f->vlan_tci));
+                              colors.end, vlan_tci_to_pcp(f->vlan[0].tci));
             }
-        } else if (wc->masks.vlan_tci == htons(0xffff)) {
+        } else if (wc->masks.vlan[0].tci == htons(0xffff)) {
             ds_put_format(s, "%svlan_tci=%s0x%04"PRIx16",", colors.param,
-                          colors.end, ntohs(f->vlan_tci));
+                          colors.end, ntohs(f->vlan[0].tci));
         } else {
             ds_put_format(s, "%svlan_tci=%s0x%04"PRIx16"/0x%04"PRIx16",",
                           colors.param, colors.end,
-                          ntohs(f->vlan_tci), ntohs(wc->masks.vlan_tci));
+                          ntohs(f->vlan[0].tci), ntohs(wc->masks.vlan[0].tci));
         }
     }
     format_eth_masked(s, "dl_src", f->dl_src, wc->masks.dl_src);
diff --git a/lib/meta-flow.c b/lib/meta-flow.c
index e160de1..5ebe727 100644
--- a/lib/meta-flow.c
+++ b/lib/meta-flow.c
@@ -256,14 +256,14 @@  mf_is_all_wild(const struct mf_field *mf, const struct flow_wildcards *wc)
         return eth_addr_is_zero(wc->masks.arp_tha);
 
     case MFF_VLAN_TCI:
-        return !wc->masks.vlan_tci;
+        return !wc->masks.vlan[0].tci;
     case MFF_DL_VLAN:
-        return !(wc->masks.vlan_tci & htons(VLAN_VID_MASK));
+        return !(wc->masks.vlan[0].tci & htons(VLAN_VID_MASK));
     case MFF_VLAN_VID:
-        return !(wc->masks.vlan_tci & htons(VLAN_VID_MASK | VLAN_CFI));
+        return !(wc->masks.vlan[0].tci & htons(VLAN_VID_MASK | VLAN_CFI));
     case MFF_DL_VLAN_PCP:
     case MFF_VLAN_PCP:
-        return !(wc->masks.vlan_tci & htons(VLAN_PCP_MASK));
+        return !(wc->masks.vlan[0].tci & htons(VLAN_PCP_MASK));
 
     case MFF_MPLS_LABEL:
         return !(wc->masks.mpls_lse[0] & htonl(MPLS_LABEL_MASK));
@@ -377,7 +377,7 @@  mf_are_prereqs_ok(const struct mf_field *mf, const struct flow *flow)
     case MFP_IPV6:
         return flow->dl_type == htons(ETH_TYPE_IPV6);
     case MFP_VLAN_VID:
-        return (flow->vlan_tci & htons(VLAN_CFI)) != 0;
+        return (flow->vlan[0].tci & htons(VLAN_CFI)) != 0;
     case MFP_MPLS:
         return eth_type_mpls(flow->dl_type);
     case MFP_IP_ANY:
@@ -447,7 +447,7 @@  mf_mask_field_and_prereqs__(const struct mf_field *mf,
         /* dl_type always unwildcarded. */
         break;
     case MFP_VLAN_VID:
-        WC_MASK_FIELD_MASK(wc, vlan_tci, htons(VLAN_CFI));
+        WC_MASK_FIELD_MASK(wc, vlan[0].tci, htons(VLAN_CFI));
         break;
     case MFP_NONE:
         break;
@@ -718,19 +718,19 @@  mf_get_value(const struct mf_field *mf, const struct flow *flow,
         break;
 
     case MFF_VLAN_TCI:
-        value->be16 = flow->vlan_tci;
+        value->be16 = flow->vlan[0].tci;
         break;
 
     case MFF_DL_VLAN:
-        value->be16 = flow->vlan_tci & htons(VLAN_VID_MASK);
+        value->be16 = flow->vlan[0].tci & htons(VLAN_VID_MASK);
         break;
     case MFF_VLAN_VID:
-        value->be16 = flow->vlan_tci & htons(VLAN_VID_MASK | VLAN_CFI);
+        value->be16 = flow->vlan[0].tci & htons(VLAN_VID_MASK | VLAN_CFI);
         break;
 
     case MFF_DL_VLAN_PCP:
     case MFF_VLAN_PCP:
-        value->u8 = vlan_tci_to_pcp(flow->vlan_tci);
+        value->u8 = vlan_tci_to_pcp(flow->vlan[0].tci);
         break;
 
     case MFF_MPLS_LABEL:
@@ -1286,7 +1286,7 @@  mf_set_flow_value(const struct mf_field *mf,
         break;
 
     case MFF_VLAN_TCI:
-        flow->vlan_tci = value->be16;
+        flow->vlan[0].tci = value->be16;
         break;
 
     case MFF_DL_VLAN:
diff --git a/lib/nx-match.c b/lib/nx-match.c
index 9a2ada9..18548ed 100644
--- a/lib/nx-match.c
+++ b/lib/nx-match.c
@@ -917,7 +917,7 @@  nx_put_raw(struct ofpbuf *b, enum ofp_version oxm, const struct match *match,
     int match_len;
     int i;
 
-    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 35);
+    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
 
     /* Metadata. */
     if (match->wc.masks.dp_hash) {
@@ -960,8 +960,8 @@  nx_put_raw(struct ofpbuf *b, enum ofp_version oxm, const struct match *match,
     /* 802.1Q. */
     if (oxm) {
         ovs_be16 VID_CFI_MASK = htons(VLAN_VID_MASK | VLAN_CFI);
-        ovs_be16 vid = flow->vlan_tci & VID_CFI_MASK;
-        ovs_be16 mask = match->wc.masks.vlan_tci & VID_CFI_MASK;
+        ovs_be16 vid = flow->vlan[0].tci & VID_CFI_MASK;
+        ovs_be16 mask = match->wc.masks.vlan[0].tci & VID_CFI_MASK;
 
         if (mask == htons(VLAN_VID_MASK | VLAN_CFI)) {
             nxm_put_16(b, MFF_VLAN_VID, oxm, vid);
@@ -969,14 +969,14 @@  nx_put_raw(struct ofpbuf *b, enum ofp_version oxm, const struct match *match,
             nxm_put_16m(b, MFF_VLAN_VID, oxm, vid, mask);
         }
 
-        if (vid && vlan_tci_to_pcp(match->wc.masks.vlan_tci)) {
+        if (vid && vlan_tci_to_pcp(match->wc.masks.vlan[0].tci)) {
             nxm_put_8(b, MFF_VLAN_PCP, oxm,
-                      vlan_tci_to_pcp(flow->vlan_tci));
+                      vlan_tci_to_pcp(flow->vlan[0].tci));
         }
 
     } else {
-        nxm_put_16m(b, MFF_VLAN_TCI, oxm, flow->vlan_tci,
-                    match->wc.masks.vlan_tci);
+        nxm_put_16m(b, MFF_VLAN_TCI, oxm, flow->vlan[0].tci,
+                    match->wc.masks.vlan[0].tci);
     }
 
     /* MPLS. */
diff --git a/lib/odp-util.c b/lib/odp-util.c
index fd1ca9b..46ff6de 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -4270,7 +4270,9 @@  odp_flow_key_from_flow__(const struct odp_flow_key_parms *parms,
                          bool export_mask, struct ofpbuf *buf)
 {
     struct ovs_key_ethernet *eth_key;
-    size_t encap;
+    size_t encap[FLOW_MAX_VLAN_HEADERS];
+    int encaps = 0;
+    size_t max_vlan_headers;
     const struct flow *flow = parms->flow;
     const struct flow *data = export_mask ? parms->mask : parms->flow;
 
@@ -4312,19 +4314,47 @@  odp_flow_key_from_flow__(const struct odp_flow_key_parms *parms,
                                        sizeof *eth_key);
     get_ethernet_key(data, eth_key);
 
-    if (flow->vlan_tci != htons(0) || flow->dl_type == htons(ETH_TYPE_VLAN)) {
+    max_vlan_headers = MIN(parms->support.max_vlan_headers,
+                           FLOW_MAX_VLAN_HEADERS);
+    for (encaps = 0; encaps < max_vlan_headers; encaps++) {
+        if (flow->vlan[encaps].tpid != htons(0)) {
+            if (export_mask) {
+                nl_msg_put_be16(buf, OVS_KEY_ATTR_ETHERTYPE, OVS_BE16_MAX);
+            } else {
+                nl_msg_put_be16(buf, OVS_KEY_ATTR_ETHERTYPE,
+                                flow->vlan[encaps].tpid);
+            }
+            nl_msg_put_be16(buf, OVS_KEY_ATTR_VLAN, data->vlan[encaps].tci);
+            encap[encaps] = nl_msg_start_nested(buf, OVS_KEY_ATTR_ENCAP);
+            if (flow->vlan[encaps].tci == htons(0)) {
+                encaps++;
+                goto unencap;
+            }
+        } else {
+            break;
+        }
+    }
+
+    /* If dpif supports less VLAN headers than that in flow,  put ETHERTYPE and
+     * stop. */
+    if (encaps < FLOW_MAX_VLAN_HEADERS &&
+        flow->vlan[encaps].tpid != htons(0)) {
         if (export_mask) {
             nl_msg_put_be16(buf, OVS_KEY_ATTR_ETHERTYPE, OVS_BE16_MAX);
         } else {
-            nl_msg_put_be16(buf, OVS_KEY_ATTR_ETHERTYPE, htons(ETH_TYPE_VLAN));
+            nl_msg_put_be16(buf, OVS_KEY_ATTR_ETHERTYPE,
+                            flow->vlan[encaps].tpid);
         }
-        nl_msg_put_be16(buf, OVS_KEY_ATTR_VLAN, data->vlan_tci);
-        encap = nl_msg_start_nested(buf, OVS_KEY_ATTR_ENCAP);
-        if (flow->vlan_tci == htons(0)) {
-            goto unencap;
+        goto unencap;
+    }
+
+    if (eth_type_vlan(flow->dl_type)) {
+        if (export_mask) {
+            nl_msg_put_be16(buf, OVS_KEY_ATTR_ETHERTYPE, OVS_BE16_MAX);
+        } else {
+            nl_msg_put_be16(buf, OVS_KEY_ATTR_ETHERTYPE, flow->dl_type);
         }
-    } else {
-        encap = 0;
+        goto unencap;
     }
 
     if (ntohs(flow->dl_type) < ETH_TYPE_MIN) {
@@ -4441,8 +4471,8 @@  odp_flow_key_from_flow__(const struct odp_flow_key_parms *parms,
     }
 
 unencap:
-    if (encap) {
-        nl_msg_end_nested(buf, encap);
+    for (encaps--; encaps >= 0; encaps--) {
+        nl_msg_end_nested(buf, encap[encaps]);
     }
 }
 
@@ -5009,69 +5039,90 @@  parse_8021q_onward(const struct nlattr *attrs[OVS_KEY_ATTR_MAX + 1],
     static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
     bool is_mask = src_flow != flow;
 
-    const struct nlattr *encap
-        = (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_ENCAP)
-           ? attrs[OVS_KEY_ATTR_ENCAP] : NULL);
+    const struct nlattr *encap;
     enum odp_key_fitness encap_fitness;
-    enum odp_key_fitness fitness;
-
-    /* Calculate fitness of outer attributes. */
-    if (!is_mask) {
-        expected_attrs |= ((UINT64_C(1) << OVS_KEY_ATTR_VLAN) |
-                          (UINT64_C(1) << OVS_KEY_ATTR_ENCAP));
-    } else {
-        if (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_VLAN)) {
-            expected_attrs |= (UINT64_C(1) << OVS_KEY_ATTR_VLAN);
+    enum odp_key_fitness fitness[FLOW_MAX_VLAN_HEADERS];
+    enum odp_key_fitness max_fitness;
+    int encaps = 0;
+
+    while (encaps < FLOW_MAX_VLAN_HEADERS &&
+           (is_mask?
+            (src_flow->vlan[encaps].tci & htons(VLAN_CFI)) :
+            eth_type_vlan(flow->dl_type))) {
+        /* Calculate fitness of outer attributes. */
+        encap  = (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_ENCAP)
+           ? attrs[OVS_KEY_ATTR_ENCAP] : NULL);
+        /* In case dpif support less VLAN headers, nested VLAN and ENCAP are
+         * not necessary. */
+        if (!is_mask && encaps > 0) {
+            expected_attrs |= ((UINT64_C(1) << OVS_KEY_ATTR_VLAN) |
+                              (UINT64_C(1) << OVS_KEY_ATTR_ENCAP));
+        } else {
+            if (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_VLAN)) {
+                expected_attrs |= (UINT64_C(1) << OVS_KEY_ATTR_VLAN);
+            }
+            if (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_ENCAP)) {
+                expected_attrs |= (UINT64_C(1) << OVS_KEY_ATTR_ENCAP);
+            }
         }
-        if (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_ENCAP)) {
-            expected_attrs |= (UINT64_C(1) << OVS_KEY_ATTR_ENCAP);
+        fitness[encaps] = check_expectations(present_attrs, out_of_range_attr,
+                                     expected_attrs, key, key_len);
+
+        /* Set vlan_tci.
+         * Remove the TPID from dl_type since it's not the real Ethertype.  */
+        flow->vlan[encaps].tpid = flow->dl_type;
+        flow->dl_type = htons(0);
+        flow->vlan[encaps].tci =
+                        (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_VLAN)
+                        ? nl_attr_get_be16(attrs[OVS_KEY_ATTR_VLAN])
+                        : htons(0));
+        if (!is_mask) {
+            if (!(present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_VLAN))) {
+                return ODP_FIT_TOO_LITTLE;
+            } else if (flow->vlan[encaps].tci == htons(0)) {
+                /* Corner case for a truncated 802.1Q header. */
+                if (fitness[encaps] == ODP_FIT_PERFECT &&
+                    nl_attr_get_size(encap)) {
+                    return ODP_FIT_TOO_MUCH;
+                }
+                return fitness[encaps];
+            } else if (!(flow->vlan[encaps].tci & htons(VLAN_CFI))) {
+                VLOG_ERR_RL(&rl, "OVS_KEY_ATTR_VLAN 0x%04"PRIx16" is nonzero "
+                            "but CFI bit is not set",
+                            ntohs(flow->vlan[encaps].tci));
+                return ODP_FIT_ERROR;
+            }
+        } else {
+            if (!(present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_ENCAP))) {
+                return fitness[encaps];
+            }
         }
-    }
-    fitness = check_expectations(present_attrs, out_of_range_attr,
-                                 expected_attrs, key, key_len);
 
-    /* Set vlan_tci.
-     * Remove the TPID from dl_type since it's not the real Ethertype.  */
-    flow->dl_type = htons(0);
-    flow->vlan_tci = (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_VLAN)
-                      ? nl_attr_get_be16(attrs[OVS_KEY_ATTR_VLAN])
-                      : htons(0));
-    if (!is_mask) {
-        if (!(present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_VLAN))) {
-            return ODP_FIT_TOO_LITTLE;
-        } else if (flow->vlan_tci == htons(0)) {
-            /* Corner case for a truncated 802.1Q header. */
-            if (fitness == ODP_FIT_PERFECT && nl_attr_get_size(encap)) {
-                return ODP_FIT_TOO_MUCH;
-            }
-            return fitness;
-        } else if (!(flow->vlan_tci & htons(VLAN_CFI))) {
-            VLOG_ERR_RL(&rl, "OVS_KEY_ATTR_VLAN 0x%04"PRIx16" is nonzero "
-                        "but CFI bit is not set", ntohs(flow->vlan_tci));
+        /* Now parse the encapsulated attributes. */
+        if (!parse_flow_nlattrs(nl_attr_get(encap), nl_attr_get_size(encap),
+                                attrs, &present_attrs, &out_of_range_attr)) {
             return ODP_FIT_ERROR;
         }
-    } else {
-        if (!(present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_ENCAP))) {
-            return fitness;
+        expected_attrs = 0;
+
+        if (!parse_ethertype(attrs, present_attrs, &expected_attrs,
+                             flow, src_flow)) {
+            return ODP_FIT_ERROR;
         }
-    }
 
-    /* Now parse the encapsulated attributes. */
-    if (!parse_flow_nlattrs(nl_attr_get(encap), nl_attr_get_size(encap),
-                            attrs, &present_attrs, &out_of_range_attr)) {
-        return ODP_FIT_ERROR;
+        encaps++;
     }
-    expected_attrs = 0;
 
-    if (!parse_ethertype(attrs, present_attrs, &expected_attrs, flow, src_flow)) {
-        return ODP_FIT_ERROR;
-    }
     encap_fitness = parse_l2_5_onward(attrs, present_attrs, out_of_range_attr,
                                       expected_attrs, flow, key, key_len,
                                       src_flow);
 
     /* The overall fitness is the worse of the outer and inner attributes. */
-    return MAX(fitness, encap_fitness);
+    max_fitness = encap_fitness;
+    for (encaps--; encaps >= 0; encaps--) {
+        max_fitness = MAX(max_fitness, fitness[encaps]);
+    }
+    return max_fitness;
 }
 
 static enum odp_key_fitness
@@ -5182,16 +5233,17 @@  odp_flow_key_to_flow__(const struct nlattr *key, size_t key_len,
     }
 
     if (is_mask
-        ? (src_flow->vlan_tci & htons(VLAN_CFI)) != 0
-        : src_flow->dl_type == htons(ETH_TYPE_VLAN)) {
+        ? (src_flow->vlan[0].tci & htons(VLAN_CFI)) != 0
+        : eth_type_vlan(src_flow->dl_type)) {
         return parse_8021q_onward(attrs, present_attrs, out_of_range_attr,
                                   expected_attrs, flow, key, key_len, src_flow);
     }
     if (is_mask) {
         /* A missing VLAN mask means exact match on vlan_tci 0 (== no VLAN). */
-        flow->vlan_tci = htons(0xffff);
+        flow->vlan[0].tpid = htons(0xffff);
+        flow->vlan[0].tci = htons(0xffff);
         if (present_attrs & (UINT64_C(1) << OVS_KEY_ATTR_VLAN)) {
-            flow->vlan_tci = nl_attr_get_be16(attrs[OVS_KEY_ATTR_VLAN]);
+            flow->vlan[0].tci = nl_attr_get_be16(attrs[OVS_KEY_ATTR_VLAN]);
             expected_attrs |= (UINT64_C(1) << OVS_KEY_ATTR_VLAN);
         }
     }
@@ -5482,35 +5534,53 @@  commit_set_ether_addr_action(const struct flow *flow, struct flow *base_flow,
 }
 
 static void
-pop_vlan(struct flow *base,
-         struct ofpbuf *odp_actions, struct flow_wildcards *wc)
+commit_vlan_action(const struct flow* flow, struct flow *base,
+                   struct ofpbuf *odp_actions, struct flow_wildcards *wc)
 {
-    memset(&wc->masks.vlan_tci, 0xff, sizeof wc->masks.vlan_tci);
+    int flow_n, base_n;
 
-    if (base->vlan_tci & htons(VLAN_CFI)) {
-        nl_msg_put_flag(odp_actions, OVS_ACTION_ATTR_POP_VLAN);
-        base->vlan_tci = 0;
+    /* Find inner-most VLAN header of base */
+    for (base_n = 0; base_n < FLOW_MAX_VLAN_HEADERS; base_n++) {
+        if (!(base->vlan[base_n].tci & htons(VLAN_CFI))) {
+            break;
+        } else {
+            wc->masks.vlan[base_n].tci |= htons(VLAN_CFI);
+            wc->masks.vlan[base_n].tpid = htons(0xffff);
+        }
     }
-}
 
-static void
-commit_vlan_action(ovs_be16 vlan_tci, struct flow *base,
-                   struct ofpbuf *odp_actions, struct flow_wildcards *wc)
-{
-    if (base->vlan_tci == vlan_tci) {
-        return;
+    /* Find inner-most VLAN header of base */
+    for (flow_n = 0; flow_n < FLOW_MAX_VLAN_HEADERS; flow_n++) {
+        if (!(flow->vlan[flow_n].tci & htons(VLAN_CFI))) {
+            break;
+        }
+    }
+
+    /* Skip common headers */
+    for (base_n--, flow_n--;
+         base_n >= 0 && flow_n >= 0;
+         base_n--, flow_n--) {
+        if (memcmp(&base->vlan[base_n], &flow->vlan[flow_n],
+                   sizeof(struct flow_vlan_hdr))) {
+            break;
+        }
+    }
+
+    /* Pop all mismatching vlan of base, push thoses of flow */
+    for (; base_n >= 0; base_n--) {
+        nl_msg_put_flag(odp_actions, OVS_ACTION_ATTR_POP_VLAN);
+        memset(&wc->masks.vlan[base_n], 0xff, sizeof(wc->masks.vlan[base_n]));
     }
 
-    pop_vlan(base, odp_actions, wc);
-    if (vlan_tci & htons(VLAN_CFI)) {
+    for (; flow_n >= 0; flow_n--) {
         struct ovs_action_push_vlan vlan;
 
-        vlan.vlan_tpid = htons(ETH_TYPE_VLAN);
-        vlan.vlan_tci = vlan_tci;
+        vlan.vlan_tpid = flow->vlan[flow_n].tpid;
+        vlan.vlan_tci = flow->vlan[flow_n].tci;
         nl_msg_put_unspec(odp_actions, OVS_ACTION_ATTR_PUSH_VLAN,
                           &vlan, sizeof vlan);
     }
-    base->vlan_tci = vlan_tci;
+    memcpy(base->vlan, flow->vlan, sizeof(base->vlan));
 }
 
 /* Wildcarding already done at action translation time. */
@@ -5943,7 +6013,7 @@  commit_odp_actions(const struct flow *flow, struct flow *base,
     commit_set_port_action(flow, base, odp_actions, wc, use_masked);
     slow2 = commit_set_icmp_action(flow, base, odp_actions, wc);
     commit_mpls_action(flow, base, odp_actions);
-    commit_vlan_action(flow->vlan_tci, base, odp_actions, wc);
+    commit_vlan_action(flow, base, odp_actions, wc);
     commit_set_priority_action(flow, base, odp_actions, wc, use_masked);
     commit_set_pkt_mark_action(flow, base, odp_actions, wc, use_masked);
 
diff --git a/lib/odp-util.h b/lib/odp-util.h
index 53ee661..2393b44 100644
--- a/lib/odp-util.h
+++ b/lib/odp-util.h
@@ -141,7 +141,7 @@  void odp_portno_names_destroy(struct hmap *portno_names);
  * add another field and forget to adjust this value.
  */
 #define ODPUTIL_FLOW_KEY_BYTES 640
-BUILD_ASSERT_DECL(FLOW_WC_SEQ == 35);
+BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
 
 /* A buffer with sufficient size and alignment to hold an nlattr-formatted flow
  * key.  An array of "struct nlattr" might not, in theory, be sufficiently
@@ -167,6 +167,8 @@  int odp_flow_from_string(const char *s,
 /* Indicates support for various fields. This defines how flows will be
  * serialised. */
 struct odp_support {
+    /* Maximum number of 802.1q VLAN headers to serialize in a mask. */
+    size_t max_vlan_headers;
     /* Maximum number of MPLS label stack entries to serialise in a mask. */
     size_t max_mpls_depth;
 
diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index 997cc15..c4b656e 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -1619,24 +1619,24 @@  decode_OFPAT_RAW11_PUSH_VLAN(ovs_be16 eth_type,
                              enum ofp_version ofp_version OVS_UNUSED,
                              struct ofpbuf *out)
 {
-    if (eth_type != htons(ETH_TYPE_VLAN_8021Q)) {
-        /* XXX 802.1AD(QinQ) isn't supported at the moment */
+    struct ofpact_push_vlan *push_vlan;
+    if (!eth_type_vlan(eth_type)) {
         return OFPERR_OFPBAC_BAD_ARGUMENT;
     }
-    ofpact_put_PUSH_VLAN(out);
+    push_vlan = ofpact_put_PUSH_VLAN(out);
+    push_vlan->ethertype = eth_type;
     return 0;
 }
 
 static void
-encode_PUSH_VLAN(const struct ofpact_null *null OVS_UNUSED,
+encode_PUSH_VLAN(const struct ofpact_push_vlan *push_vlan,
                  enum ofp_version ofp_version, struct ofpbuf *out)
 {
     if (ofp_version == OFP10_VERSION) {
         /* PUSH is a side effect of a SET_VLAN_VID/PCP, which should
          * follow this action. */
     } else {
-        /* XXX ETH_TYPE_VLAN_8021AD case */
-        put_OFPAT11_PUSH_VLAN(out, htons(ETH_TYPE_VLAN_8021Q));
+        put_OFPAT11_PUSH_VLAN(out, push_vlan->ethertype);
     }
 }
 
@@ -1644,6 +1644,7 @@  static char * OVS_WARN_UNUSED_RESULT
 parse_PUSH_VLAN(char *arg, struct ofpbuf *ofpacts,
                 enum ofputil_protocol *usable_protocols OVS_UNUSED)
 {
+    struct ofpact_push_vlan *push_vlan;
     uint16_t ethertype;
     char *error;
 
@@ -1653,21 +1654,19 @@  parse_PUSH_VLAN(char *arg, struct ofpbuf *ofpacts,
         return error;
     }
 
-    if (ethertype != ETH_TYPE_VLAN_8021Q) {
-        /* XXX ETH_TYPE_VLAN_8021AD case isn't supported */
+    if (!eth_type_vlan(htons(ethertype))) {
         return xasprintf("%s: not a valid VLAN ethertype", arg);
     }
-
-    ofpact_put_PUSH_VLAN(ofpacts);
+    push_vlan = ofpact_put_PUSH_VLAN(ofpacts);
+    push_vlan->ethertype = htons(ethertype);
     return NULL;
 }
 
 static void
-format_PUSH_VLAN(const struct ofpact_null *a OVS_UNUSED, struct ds *s)
+format_PUSH_VLAN(const struct ofpact_push_vlan *push_vlan, struct ds *s)
 {
-    /* XXX 802.1AD case*/
     ds_put_format(s, "%spush_vlan:%s%#"PRIx16,
-                  colors.param, colors.end, ETH_TYPE_VLAN_8021Q);
+                  colors.param, colors.end, htons(push_vlan->ethertype));
 }
 
 /* Action structure for OFPAT10_SET_DL_SRC/DST and OFPAT11_SET_DL_SRC/DST. */
@@ -6789,43 +6788,44 @@  ofpact_check__(enum ofputil_protocol *usable_protocols, struct ofpact *a,
         /* Remember if we saw a vlan tag in the flow to aid translating to
          * OpenFlow 1.1+ if need be. */
         ofpact_get_SET_VLAN_VID(a)->flow_has_vlan =
-            (flow->vlan_tci & htons(VLAN_CFI)) == htons(VLAN_CFI);
-        if (!(flow->vlan_tci & htons(VLAN_CFI)) &&
+            (flow->vlan[0].tci & htons(VLAN_CFI)) == htons(VLAN_CFI);
+        if (!(flow->vlan[0].tci & htons(VLAN_CFI)) &&
             !ofpact_get_SET_VLAN_VID(a)->push_vlan_if_needed) {
             inconsistent_match(usable_protocols);
         }
         /* Temporary mark that we have a vlan tag. */
-        flow->vlan_tci |= htons(VLAN_CFI);
+        flow->vlan[0].tci |= htons(VLAN_CFI);
         return 0;
 
     case OFPACT_SET_VLAN_PCP:
         /* Remember if we saw a vlan tag in the flow to aid translating to
          * OpenFlow 1.1+ if need be. */
         ofpact_get_SET_VLAN_PCP(a)->flow_has_vlan =
-            (flow->vlan_tci & htons(VLAN_CFI)) == htons(VLAN_CFI);
-        if (!(flow->vlan_tci & htons(VLAN_CFI)) &&
+            (flow->vlan[0].tci & htons(VLAN_CFI)) == htons(VLAN_CFI);
+        if (!(flow->vlan[0].tci & htons(VLAN_CFI)) &&
             !ofpact_get_SET_VLAN_PCP(a)->push_vlan_if_needed) {
             inconsistent_match(usable_protocols);
         }
         /* Temporary mark that we have a vlan tag. */
-        flow->vlan_tci |= htons(VLAN_CFI);
+        flow->vlan[0].tci |= htons(VLAN_CFI);
         return 0;
 
     case OFPACT_STRIP_VLAN:
-        if (!(flow->vlan_tci & htons(VLAN_CFI))) {
+        if (!(flow->vlan[0].tci & htons(VLAN_CFI))) {
             inconsistent_match(usable_protocols);
         }
         /* Temporary mark that we have no vlan tag. */
-        flow->vlan_tci = htons(0);
+        flow_pop_vlan(flow);
         return 0;
 
     case OFPACT_PUSH_VLAN:
-        if (flow->vlan_tci & htons(VLAN_CFI)) {
-            /* Multiple VLAN headers not supported. */
+        if (flow->vlan[FLOW_MAX_VLAN_HEADERS - 1].tci & htons(VLAN_CFI)) {
+            /* Support maximum (FLOW_MAX_VLAN_HEADERS) VLAN headers. */
             return OFPERR_OFPBAC_BAD_TAG;
         }
         /* Temporary mark that we have a vlan tag. */
-        flow->vlan_tci |= htons(VLAN_CFI);
+        flow_shift_vlan(flow);
+        flow->vlan[0].tci |= htons(VLAN_CFI);
         return 0;
 
     case OFPACT_SET_ETH_SRC:
@@ -6872,7 +6872,8 @@  ofpact_check__(enum ofputil_protocol *usable_protocols, struct ofpact *a,
         mf = ofpact_get_SET_FIELD(a)->field;
         /* Require OXM_OF_VLAN_VID to have an existing VLAN header. */
         if (!mf_are_prereqs_ok(mf, flow) ||
-            (mf->id == MFF_VLAN_VID && !(flow->vlan_tci & htons(VLAN_CFI)))) {
+            (mf->id == MFF_VLAN_VID &&
+             !(flow->vlan[0].tci & htons(VLAN_CFI)))) {
             VLOG_WARN_RL(&rl, "set_field %s lacks correct prerequisities",
                          mf->name);
             return OFPERR_OFPBAC_MATCH_INCONSISTENT;
@@ -6880,11 +6881,11 @@  ofpact_check__(enum ofputil_protocol *usable_protocols, struct ofpact *a,
         /* Remember if we saw a vlan tag in the flow to aid translating to
          * OpenFlow 1.1 if need be. */
         ofpact_get_SET_FIELD(a)->flow_has_vlan =
-            (flow->vlan_tci & htons(VLAN_CFI)) == htons(VLAN_CFI);
+            (flow->vlan[0].tci & htons(VLAN_CFI)) == htons(VLAN_CFI);
         if (mf->id == MFF_VLAN_TCI) {
             /* The set field may add or remove the vlan tag,
              * Mark the status temporarily. */
-            flow->vlan_tci = ofpact_get_SET_FIELD(a)->value.be16;
+            flow->vlan[0].tci = ofpact_get_SET_FIELD(a)->value.be16;
         }
         return 0;
 
@@ -7045,9 +7046,11 @@  ofpacts_check(struct ofpact ofpacts[], size_t ofpacts_len,
 {
     struct ofpact *a;
     ovs_be16 dl_type = flow->dl_type;
-    ovs_be16 vlan_tci = flow->vlan_tci;
     uint8_t nw_proto = flow->nw_proto;
     enum ofperr error = 0;
+    struct flow_vlan_hdr vlan[FLOW_MAX_VLAN_HEADERS];
+
+    memcpy(&vlan, &flow->vlan, sizeof(vlan));
 
     OFPACT_FOR_EACH (a, ofpacts, ofpacts_len) {
         error = ofpact_check__(usable_protocols, a, flow,
@@ -7058,8 +7061,8 @@  ofpacts_check(struct ofpact ofpacts[], size_t ofpacts_len,
     }
     /* Restore fields that may have been modified. */
     flow->dl_type = dl_type;
-    flow->vlan_tci = vlan_tci;
     flow->nw_proto = nw_proto;
+    memcpy(&flow->vlan, &vlan, sizeof(vlan));
     return error;
 }
 
diff --git a/lib/ofp-util.c b/lib/ofp-util.c
index 2b214ea..9efe749 100644
--- a/lib/ofp-util.c
+++ b/lib/ofp-util.c
@@ -101,7 +101,7 @@  ofputil_netmask_to_wcbits(ovs_be32 netmask)
 void
 ofputil_wildcard_from_ofpfw10(uint32_t ofpfw, struct flow_wildcards *wc)
 {
-    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 35);
+    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
 
     /* Initialize most of wc. */
     flow_wildcards_init_catchall(wc);
@@ -141,10 +141,10 @@  ofputil_wildcard_from_ofpfw10(uint32_t ofpfw, struct flow_wildcards *wc)
 
     /* VLAN TCI mask. */
     if (!(ofpfw & OFPFW10_DL_VLAN_PCP)) {
-        wc->masks.vlan_tci |= htons(VLAN_PCP_MASK | VLAN_CFI);
+        wc->masks.vlan[0].tci |= htons(VLAN_PCP_MASK | VLAN_CFI);
     }
     if (!(ofpfw & OFPFW10_DL_VLAN)) {
-        wc->masks.vlan_tci |= htons(VLAN_VID_MASK | VLAN_CFI);
+        wc->masks.vlan[0].tci |= htons(VLAN_VID_MASK | VLAN_CFI);
     }
 }
 
@@ -182,8 +182,8 @@  ofputil_match_from_ofp10_match(const struct ofp10_match *ofmatch,
          * because we can't have a specific PCP without an 802.1Q header.
          * However, older versions of OVS treated this as matching packets
          * withut an 802.1Q header, so we do here too. */
-        match->flow.vlan_tci = htons(0);
-        match->wc.masks.vlan_tci = htons(0xffff);
+        match->flow.vlan[0].tci = htons(0);
+        match->wc.masks.vlan[0].tci = htons(0xffff);
     } else {
         ovs_be16 vid, pcp, tci;
         uint16_t hpcp;
@@ -192,7 +192,7 @@  ofputil_match_from_ofp10_match(const struct ofp10_match *ofmatch,
         hpcp = (ofmatch->dl_vlan_pcp << VLAN_PCP_SHIFT) & VLAN_PCP_MASK;
         pcp = htons(hpcp);
         tci = vid | pcp | htons(VLAN_CFI);
-        match->flow.vlan_tci = tci & match->wc.masks.vlan_tci;
+        match->flow.vlan[0].tci = tci & match->wc.masks.vlan[0].tci;
     }
 
     /* Clean up. */
@@ -241,22 +241,22 @@  ofputil_match_to_ofp10_match(const struct match *match,
     /* Translate VLANs. */
     ofmatch->dl_vlan = htons(0);
     ofmatch->dl_vlan_pcp = 0;
-    if (match->wc.masks.vlan_tci == htons(0)) {
+    if (match->wc.masks.vlan[0].tci == htons(0)) {
         ofpfw |= OFPFW10_DL_VLAN | OFPFW10_DL_VLAN_PCP;
-    } else if (match->wc.masks.vlan_tci & htons(VLAN_CFI)
-               && !(match->flow.vlan_tci & htons(VLAN_CFI))) {
+    } else if (match->wc.masks.vlan[0].tci & htons(VLAN_CFI)
+               && !(match->flow.vlan[0].tci & htons(VLAN_CFI))) {
         ofmatch->dl_vlan = htons(OFP10_VLAN_NONE);
     } else {
-        if (!(match->wc.masks.vlan_tci & htons(VLAN_VID_MASK))) {
+        if (!(match->wc.masks.vlan[0].tci & htons(VLAN_VID_MASK))) {
             ofpfw |= OFPFW10_DL_VLAN;
         } else {
-            ofmatch->dl_vlan = htons(vlan_tci_to_vid(match->flow.vlan_tci));
+            ofmatch->dl_vlan = htons(vlan_tci_to_vid(match->flow.vlan[0].tci));
         }
 
-        if (!(match->wc.masks.vlan_tci & htons(VLAN_PCP_MASK))) {
+        if (!(match->wc.masks.vlan[0].tci & htons(VLAN_PCP_MASK))) {
             ofpfw |= OFPFW10_DL_VLAN_PCP;
         } else {
-            ofmatch->dl_vlan_pcp = vlan_tci_to_pcp(match->flow.vlan_tci);
+            ofmatch->dl_vlan_pcp = vlan_tci_to_pcp(match->flow.vlan[0].tci);
         }
     }
 
@@ -344,17 +344,17 @@  ofputil_match_from_ofp11_match(const struct ofp11_match *ofmatch,
     if (!(wc & OFPFW11_DL_VLAN)) {
         if (ofmatch->dl_vlan == htons(OFPVID11_NONE)) {
             /* Match only packets without a VLAN tag. */
-            match->flow.vlan_tci = htons(0);
-            match->wc.masks.vlan_tci = OVS_BE16_MAX;
+            match->flow.vlan[0].tci = htons(0);
+            match->wc.masks.vlan[0].tci = OVS_BE16_MAX;
         } else {
             if (ofmatch->dl_vlan == htons(OFPVID11_ANY)) {
                 /* Match any packet with a VLAN tag regardless of VID. */
-                match->flow.vlan_tci = htons(VLAN_CFI);
-                match->wc.masks.vlan_tci = htons(VLAN_CFI);
+                match->flow.vlan[0].tci = htons(VLAN_CFI);
+                match->wc.masks.vlan[0].tci = htons(VLAN_CFI);
             } else if (ntohs(ofmatch->dl_vlan) < 4096) {
                 /* Match only packets with the specified VLAN VID. */
-                match->flow.vlan_tci = htons(VLAN_CFI) | ofmatch->dl_vlan;
-                match->wc.masks.vlan_tci = htons(VLAN_CFI | VLAN_VID_MASK);
+                match->flow.vlan[0].tci = htons(VLAN_CFI) | ofmatch->dl_vlan;
+                match->wc.masks.vlan[0].tci = htons(VLAN_CFI | VLAN_VID_MASK);
             } else {
                 /* Invalid VID. */
                 return OFPERR_OFPBMC_BAD_VALUE;
@@ -362,9 +362,9 @@  ofputil_match_from_ofp11_match(const struct ofp11_match *ofmatch,
 
             if (!(wc & OFPFW11_DL_VLAN_PCP)) {
                 if (ofmatch->dl_vlan_pcp <= 7) {
-                    match->flow.vlan_tci |= htons(ofmatch->dl_vlan_pcp
+                    match->flow.vlan[0].tci |= htons(ofmatch->dl_vlan_pcp
                                                   << VLAN_PCP_SHIFT);
-                    match->wc.masks.vlan_tci |= htons(VLAN_PCP_MASK);
+                    match->wc.masks.vlan[0].tci |= htons(VLAN_PCP_MASK);
                 } else {
                     /* Invalid PCP. */
                     return OFPERR_OFPBMC_BAD_VALUE;
@@ -482,23 +482,23 @@  ofputil_match_to_ofp11_match(const struct match *match,
     ofmatch->dl_dst = match->flow.dl_dst;
     ofmatch->dl_dst_mask = eth_addr_invert(match->wc.masks.dl_dst);
 
-    if (match->wc.masks.vlan_tci == htons(0)) {
+    if (match->wc.masks.vlan[0].tci == htons(0)) {
         wc |= OFPFW11_DL_VLAN | OFPFW11_DL_VLAN_PCP;
-    } else if (match->wc.masks.vlan_tci & htons(VLAN_CFI)
-               && !(match->flow.vlan_tci & htons(VLAN_CFI))) {
+    } else if (match->wc.masks.vlan[0].tci & htons(VLAN_CFI)
+               && !(match->flow.vlan[0].tci & htons(VLAN_CFI))) {
         ofmatch->dl_vlan = htons(OFPVID11_NONE);
         wc |= OFPFW11_DL_VLAN_PCP;
     } else {
-        if (!(match->wc.masks.vlan_tci & htons(VLAN_VID_MASK))) {
+        if (!(match->wc.masks.vlan[0].tci & htons(VLAN_VID_MASK))) {
             ofmatch->dl_vlan = htons(OFPVID11_ANY);
         } else {
-            ofmatch->dl_vlan = htons(vlan_tci_to_vid(match->flow.vlan_tci));
+            ofmatch->dl_vlan = htons(vlan_tci_to_vid(match->flow.vlan[0].tci));
         }
 
-        if (!(match->wc.masks.vlan_tci & htons(VLAN_PCP_MASK))) {
+        if (!(match->wc.masks.vlan[0].tci & htons(VLAN_PCP_MASK))) {
             wc |= OFPFW11_DL_VLAN_PCP;
         } else {
-            ofmatch->dl_vlan_pcp = vlan_tci_to_pcp(match->flow.vlan_tci);
+            ofmatch->dl_vlan_pcp = vlan_tci_to_pcp(match->flow.vlan[0].tci);
         }
     }
 
diff --git a/lib/tnl-ports.c b/lib/tnl-ports.c
index e945eae..32fe485 100644
--- a/lib/tnl-ports.c
+++ b/lib/tnl-ports.c
@@ -136,7 +136,7 @@  map_insert(odp_port_t port, struct eth_addr mac, struct in6_addr *addr,
         } else {
             match.wc.masks.ipv6_dst = in6addr_exact;
         }
-        match.wc.masks.vlan_tci = OVS_BE16_MAX;
+        match.wc.masks.vlan[0].tci = OVS_BE16_MAX;
         memset(&match.wc.masks.dl_dst, 0xff, sizeof (struct eth_addr));
 
         cls_rule_init(&p->cr, &match, 0); /* Priority == 0. */
diff --git a/ofproto/bond.c b/ofproto/bond.c
index 032b8f6..33fd1cd 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -1707,7 +1707,7 @@  static unsigned int
 bond_hash_tcp(const struct flow *flow, uint16_t vlan, uint32_t basis)
 {
     struct flow hash_flow = *flow;
-    hash_flow.vlan_tci = htons(vlan);
+    hash_flow.vlan[0].tci = htons(vlan);
 
     /* The symmetric quality of this hash function is not required, but
      * flow_hash_symmetric_l4 already exists, and is sufficient for our
diff --git a/ofproto/ofproto-dpif-ipfix.c b/ofproto/ofproto-dpif-ipfix.c
index 5744abb..804c5f8 100644
--- a/ofproto/ofproto-dpif-ipfix.c
+++ b/ofproto/ofproto-dpif-ipfix.c
@@ -1596,7 +1596,7 @@  ipfix_cache_entry_init(struct ipfix_flow_cache_entry *entry,
 
     /* Choose the right template ID matching the protocols in the
      * sampled packet. */
-    l2 = (flow->vlan_tci == 0) ? IPFIX_PROTO_L2_ETH : IPFIX_PROTO_L2_VLAN;
+    l2 = (flow->vlan[0].tci == 0) ? IPFIX_PROTO_L2_ETH : IPFIX_PROTO_L2_VLAN;
 
     switch(ntohs(flow->dl_type)) {
     case ETH_TYPE_IP:
@@ -1672,8 +1672,8 @@  ipfix_cache_entry_init(struct ipfix_flow_cache_entry *entry,
 
     if (l2 == IPFIX_PROTO_L2_VLAN) {
         struct ipfix_data_record_flow_key_vlan *data_vlan;
-        uint16_t vlan_id = vlan_tci_to_vid(flow->vlan_tci);
-        uint8_t priority = vlan_tci_to_pcp(flow->vlan_tci);
+        uint16_t vlan_id = vlan_tci_to_vid(flow->vlan[0].tci);
+        uint8_t priority = vlan_tci_to_pcp(flow->vlan[0].tci);
 
         data_vlan = dp_packet_put_zeros(&msg, sizeof *data_vlan);
         data_vlan->vlan_id = htons(vlan_id);
diff --git a/ofproto/ofproto-dpif-rid.h b/ofproto/ofproto-dpif-rid.h
index c34d760..3bca817 100644
--- a/ofproto/ofproto-dpif-rid.h
+++ b/ofproto/ofproto-dpif-rid.h
@@ -99,7 +99,7 @@  struct rule;
 /* Metadata for restoring pipeline context after recirculation.  Helpers
  * are inlined below to keep them together with the definition for easier
  * updates. */
-BUILD_ASSERT_DECL(FLOW_WC_SEQ == 35);
+BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
 
 struct frozen_metadata {
     /* Metadata in struct flow. */
diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c
index 7d0aa36..ef16426 100644
--- a/ofproto/ofproto-dpif-sflow.c
+++ b/ofproto/ofproto-dpif-sflow.c
@@ -1279,8 +1279,8 @@  dpif_sflow_received(struct dpif_sflow *ds, const struct dp_packet *packet,
     /* Add extended switch element. */
     memset(&switchElem, 0, sizeof(switchElem));
     switchElem.tag = SFLFLOW_EX_SWITCH;
-    switchElem.flowType.sw.src_vlan = vlan_tci_to_vid(flow->vlan_tci);
-    switchElem.flowType.sw.src_priority = vlan_tci_to_pcp(flow->vlan_tci);
+    switchElem.flowType.sw.src_vlan = vlan_tci_to_vid(flow->vlan[0].tci);
+    switchElem.flowType.sw.src_priority = vlan_tci_to_pcp(flow->vlan[0].tci);
 
     /* Retrieve data from user_action_cookie. */
     vlan_tci = cookie->sflow.vlan_tci;
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 1977b6b..fd41ac8 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -123,9 +123,13 @@  struct xbundle {
     struct lacp *lacp;             /* LACP handle or null. */
 
     enum port_vlan_mode vlan_mode; /* VLAN mode. */
+    uint16_t qinq_ethtype;         /* Ethertype of dot1q-tunnel interface
+                                    * either 0x8100 or 0x88a8. */
     int vlan;                      /* -1=trunk port, else a 12-bit VLAN ID. */
     unsigned long *trunks;         /* Bitmap of trunked VLANs, if 'vlan' == -1.
                                     * NULL if all VLANs are trunked. */
+    unsigned long *cvlans;         /* Bitmap of allowed customer vlans,
+                                    * NULL if all VLANs are allowed */
     bool use_priority_tags;        /* Use 802.1p tag for frames in VLAN 0? */
     bool floodable;                /* No port has OFPUTIL_PC_NO_FLOOD set? */
 };
@@ -376,6 +380,13 @@  struct xlate_ctx {
     enum xlate_error error;     /* Translation failed. */
 };
 
+/* Structure to track VLAN manipulation */
+struct xvlan {
+    uint16_t tpid;
+    uint16_t vid;
+    ovs_be16 pcp;
+};
+
 const char *xlate_strerror(enum xlate_error error)
 {
     switch (error) {
@@ -547,9 +558,19 @@  static void xlate_table_action(struct xlate_ctx *, ofp_port_t in_port,
                                uint8_t table_id, bool may_packet_in,
                                bool honor_table_miss);
 static bool input_vid_is_valid(uint16_t vid, struct xbundle *, bool warn);
-static uint16_t input_vid_to_vlan(const struct xbundle *, uint16_t vid);
+static void xvlan_copy(struct xvlan *dst, const struct xvlan *src);
+static void xvlan_copy_pop(struct xvlan *dst, const struct xvlan *src);
+static void xvlan_copy_shift(struct xvlan *dst, const struct xvlan *src);
+static void xvlan_extract(const struct flow *, struct xvlan *);
+static void xvlan_put(struct flow *, const struct xvlan *);
+static void xvlan_input_translate(const struct xbundle *,
+                                  const struct xvlan *in,
+                                  struct xvlan *xvlan);
+static void xvlan_output_translate(const struct xbundle *,
+                                   const struct xvlan *xvlan,
+                                   struct xvlan *out);
 static void output_normal(struct xlate_ctx *, const struct xbundle *,
-                          uint16_t vlan);
+                          const struct xvlan *);
 
 /* Optional bond recirculation parameter to compose_output_action(). */
 struct xlate_bond_recirc {
@@ -592,8 +613,10 @@  static void xlate_xbridge_set(struct xbridge *, struct dpif *,
                               bool forward_bpdu, bool has_in_band,
                               const struct dpif_backer_support *);
 static void xlate_xbundle_set(struct xbundle *xbundle,
-                              enum port_vlan_mode vlan_mode, int vlan,
-                              unsigned long *trunks, bool use_priority_tags,
+                              enum port_vlan_mode vlan_mode,
+                              uint16_t qinq_ethtype, int vlan,
+                              unsigned long *trunks, unsigned long *cvlans,
+                              bool use_priority_tags,
                               const struct bond *bond, const struct lacp *lacp,
                               bool floodable);
 static void xlate_xport_set(struct xport *xport, odp_port_t odp_port,
@@ -735,16 +758,19 @@  xlate_xbridge_set(struct xbridge *xbridge,
 
 static void
 xlate_xbundle_set(struct xbundle *xbundle,
-                  enum port_vlan_mode vlan_mode, int vlan,
-                  unsigned long *trunks, bool use_priority_tags,
+                  enum port_vlan_mode vlan_mode, uint16_t qinq_ethtype,
+                  int vlan, unsigned long *trunks, unsigned long *cvlans,
+                  bool use_priority_tags,
                   const struct bond *bond, const struct lacp *lacp,
                   bool floodable)
 {
     ovs_assert(xbundle->xbridge);
 
     xbundle->vlan_mode = vlan_mode;
+    xbundle->qinq_ethtype = qinq_ethtype;
     xbundle->vlan = vlan;
     xbundle->trunks = trunks;
+    xbundle->cvlans = cvlans;
     xbundle->use_priority_tags = use_priority_tags;
     xbundle->floodable = floodable;
 
@@ -838,8 +864,8 @@  xlate_xbundle_copy(struct xbridge *xbridge, struct xbundle *xbundle)
     new_xbundle->name = xstrdup(xbundle->name);
     xlate_xbundle_init(new_xcfg, new_xbundle);
 
-    xlate_xbundle_set(new_xbundle, xbundle->vlan_mode,
-                      xbundle->vlan, xbundle->trunks,
+    xlate_xbundle_set(new_xbundle, xbundle->vlan_mode, xbundle->qinq_ethtype,
+                      xbundle->vlan, xbundle->trunks, xbundle->cvlans,
                       xbundle->use_priority_tags, xbundle->bond, xbundle->lacp,
                       xbundle->floodable);
     LIST_FOR_EACH (xport, bundle_node, &xbundle->xports) {
@@ -1032,8 +1058,10 @@  xlate_remove_ofproto(struct ofproto_dpif *ofproto)
 
 void
 xlate_bundle_set(struct ofproto_dpif *ofproto, struct ofbundle *ofbundle,
-                 const char *name, enum port_vlan_mode vlan_mode, int vlan,
-                 unsigned long *trunks, bool use_priority_tags,
+                 const char *name, enum port_vlan_mode vlan_mode,
+                 uint16_t qinq_ethtype, int vlan,
+                 unsigned long *trunks, unsigned long *cvlans,
+                 bool use_priority_tags,
                  const struct bond *bond, const struct lacp *lacp,
                  bool floodable)
 {
@@ -1053,7 +1081,7 @@  xlate_bundle_set(struct ofproto_dpif *ofproto, struct ofbundle *ofbundle,
     free(xbundle->name);
     xbundle->name = xstrdup(name);
 
-    xlate_xbundle_set(xbundle, vlan_mode, vlan, trunks,
+    xlate_xbundle_set(xbundle, vlan_mode, qinq_ethtype, vlan, trunks, cvlans,
                       use_priority_tags, bond, lacp, floodable);
 }
 
@@ -1586,9 +1614,30 @@  xbundle_trunks_vlan(const struct xbundle *bundle, uint16_t vlan)
 }
 
 static bool
-xbundle_includes_vlan(const struct xbundle *xbundle, uint16_t vlan)
+xbundle_allows_cvlan(const struct xbundle *bundle, uint16_t vlan)
 {
-    return vlan == xbundle->vlan || xbundle_trunks_vlan(xbundle, vlan);
+    return (!bundle->cvlans || bitmap_is_set(bundle->cvlans, vlan));
+}
+
+static bool
+xbundle_includes_vlan(const struct xbundle *xbundle, const struct xvlan *xvlan)
+{
+    switch (xbundle->vlan_mode) {
+    case PORT_VLAN_ACCESS:
+        return xvlan[0].vid == xbundle->vlan && xvlan[1].vid == 0;
+
+    case PORT_VLAN_TRUNK:
+    case PORT_VLAN_NATIVE_UNTAGGED:
+    case PORT_VLAN_NATIVE_TAGGED:
+        return xbundle_trunks_vlan(xbundle, xvlan[0].vid);
+
+    case PORT_VLAN_DOT1Q_TUNNEL:
+        return xvlan[0].vid == xbundle->vlan &&
+               xbundle_allows_cvlan(xbundle, xvlan[1].vid);
+
+    default:
+        OVS_NOT_REACHED();
+    }
 }
 
 static mirror_mask_t
@@ -1669,11 +1718,13 @@  mirror_packet(struct xlate_ctx *ctx, struct xbundle *xbundle,
     /* Figure out what VLAN the packet is in (because mirrors can select
      * packets on basis of VLAN). */
     bool warn = ctx->xin->packet != NULL;
-    uint16_t vid = vlan_tci_to_vid(ctx->xin->flow.vlan_tci);
-    if (!input_vid_is_valid(vid, xbundle, warn)) {
+    struct xvlan in_xvlan[FLOW_MAX_VLAN_HEADERS];
+    struct xvlan xvlan[FLOW_MAX_VLAN_HEADERS];
+    xvlan_extract(&ctx->xin->flow, in_xvlan);
+    if (!input_vid_is_valid(in_xvlan[0].vid, xbundle, warn)) {
         return;
     }
-    uint16_t vlan = input_vid_to_vlan(xbundle, vid);
+    xvlan_input_translate(xbundle, in_xvlan, xvlan);
 
     const struct xbridge *xbridge = ctx->xbridge;
 
@@ -1715,9 +1766,9 @@  mirror_packet(struct xlate_ctx *ctx, struct xbundle *xbundle,
         /* If this mirror selects on the basis of VLAN, and it does not select
          * 'vlan', then discard this mirror and go on to the next one. */
         if (vlans) {
-            ctx->wc->masks.vlan_tci |= htons(VLAN_CFI | VLAN_VID_MASK);
+            ctx->wc->masks.vlan[0].tci |= htons(VLAN_CFI | VLAN_VID_MASK);
         }
-        if (vlans && !bitmap_is_set(vlans, vlan)) {
+        if (vlans && !bitmap_is_set(vlans, xvlan[0].vid)) {
             mirrors = zero_rightmost_1bit(mirrors);
             continue;
         }
@@ -1734,18 +1785,21 @@  mirror_packet(struct xlate_ctx *ctx, struct xbundle *xbundle,
             struct xlate_cfg *xcfg = ovsrcu_get(struct xlate_cfg *, &xcfgp);
             struct xbundle *out_xbundle = xbundle_lookup(xcfg, out);
             if (out_xbundle) {
-                output_normal(ctx, out_xbundle, vlan);
+                output_normal(ctx, out_xbundle, xvlan);
             }
-        } else if (vlan != out_vlan
+        } else if (xvlan[0].vid != out_vlan
                    && !eth_addr_is_reserved(ctx->xin->flow.dl_dst)) {
             struct xbundle *xbundle;
+            uint16_t old_vid = xvlan[0].vid;
 
+            xvlan[0].vid = out_vlan;
             LIST_FOR_EACH (xbundle, list_node, &xbridge->xbundles) {
-                if (xbundle_includes_vlan(xbundle, out_vlan)
+                if (xbundle_includes_vlan(xbundle, xvlan)
                     && !xbundle_mirror_out(xbridge, xbundle)) {
-                    output_normal(ctx, xbundle, out_vlan);
+                    output_normal(ctx, xbundle, xvlan);
                 }
             }
+            xvlan[0].vid = old_vid;
         }
 
         /* output_normal() could have recursively output (to different
@@ -1769,32 +1823,6 @@  mirror_ingress_packet(struct xlate_ctx *ctx)
     }
 }
 
-/* Given 'vid', the VID obtained from the 802.1Q header that was received as
- * part of a packet (specify 0 if there was no 802.1Q header), and 'in_xbundle',
- * the bundle on which the packet was received, returns the VLAN to which the
- * packet belongs.
- *
- * Both 'vid' and the return value are in the range 0...4095. */
-static uint16_t
-input_vid_to_vlan(const struct xbundle *in_xbundle, uint16_t vid)
-{
-    switch (in_xbundle->vlan_mode) {
-    case PORT_VLAN_ACCESS:
-        return in_xbundle->vlan;
-        break;
-
-    case PORT_VLAN_TRUNK:
-        return vid;
-
-    case PORT_VLAN_NATIVE_UNTAGGED:
-    case PORT_VLAN_NATIVE_TAGGED:
-        return vid ? vid : in_xbundle->vlan;
-
-    default:
-        OVS_NOT_REACHED();
-    }
-}
-
 /* Checks whether a packet with the given 'vid' may ingress on 'in_xbundle'.
  * If so, returns true.  Otherwise, returns false and, if 'warn' is true, logs
  * a warning.
@@ -1832,7 +1860,7 @@  input_vid_is_valid(uint16_t vid, struct xbundle *in_xbundle, bool warn)
         }
         /* Fall through. */
     case PORT_VLAN_TRUNK:
-        if (!xbundle_includes_vlan(in_xbundle, vid)) {
+        if (!xbundle_trunks_vlan(in_xbundle, vid)) {
             if (warn) {
                 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
                 VLOG_WARN_RL(&rl, "dropping VLAN %"PRIu16" packet "
@@ -1843,50 +1871,182 @@  input_vid_is_valid(uint16_t vid, struct xbundle *in_xbundle, bool warn)
         }
         return true;
 
+    case PORT_VLAN_DOT1Q_TUNNEL:
+        if (!xbundle_allows_cvlan(in_xbundle, vid)) {
+            if (warn) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+                VLOG_WARN_RL(&rl, "dropping VLAN %"PRIu16" packet received "
+                             "on port %s not configured for dot1q tunneling"
+                             "VLAN %"PRIu16, vid, in_xbundle->name, vid);
+            }
+            return false;
+        }
+        return true;
+
     default:
         OVS_NOT_REACHED();
     }
 
 }
 
-/* Given 'vlan', the VLAN that a packet belongs to, and
- * 'out_xbundle', a bundle on which the packet is to be output, returns the VID
- * that should be included in the 802.1Q header.  (If the return value is 0,
- * then the 802.1Q header should only be included in the packet if there is a
- * nonzero PCP.)
- *
- * Both 'vlan' and the return value are in the range 0...4095. */
-static uint16_t
-output_vlan_to_vid(const struct xbundle *out_xbundle, uint16_t vlan)
+static void
+xvlan_copy(struct xvlan *dst, const struct xvlan *src)
+{
+    memcpy(dst, src, sizeof(*dst) * FLOW_MAX_VLAN_HEADERS);
+}
+
+static void
+xvlan_copy_pop(struct xvlan *dst, const struct xvlan *src)
+{
+    memcpy(dst, src + 1, sizeof(*dst) * (FLOW_MAX_VLAN_HEADERS - 1));
+    memset(&dst[FLOW_MAX_VLAN_HEADERS - 1], 0, sizeof(*dst));
+}
+
+static void
+xvlan_copy_shift(struct xvlan *dst, const struct xvlan *src)
+{
+    memcpy(dst + 1, src, sizeof(*dst) * (FLOW_MAX_VLAN_HEADERS - 1));
+    memset(&dst[0], 0, sizeof(*dst));
+}
+
+/* Extract VLAN information (headers) from flow */
+static void
+xvlan_extract(const struct flow *flow, struct xvlan *xvlan)
+{
+    int i;
+    memset(xvlan, 0, sizeof(struct xvlan) * FLOW_MAX_VLAN_HEADERS);
+    for (i = 0; i < FLOW_MAX_VLAN_HEADERS; i++) {
+        if (!eth_type_vlan(flow->vlan[i].tpid) ||
+                !(flow->vlan[i].tci & htons(VLAN_CFI))) {
+            break;
+        }
+        xvlan[i].tpid = htons(flow->vlan[i].tpid);
+        xvlan[i].vid = vlan_tci_to_vid(flow->vlan[i].tci);
+        xvlan[i].pcp = flow->vlan[i].tci & htons(VLAN_PCP_MASK);
+    }
+}
+
+/* Put VLAN information (headers) to flow */
+static void
+xvlan_put(struct flow *flow, const struct xvlan *xvlan)
+{
+    ovs_be16 tci;
+    int i;
+    for (i = 0; i < FLOW_MAX_VLAN_HEADERS; i++) {
+        tci = htons(xvlan[i].vid) | (xvlan[i].pcp & htons(VLAN_PCP_MASK));
+        if (tci) {
+            tci |= htons(VLAN_CFI);
+            flow->vlan[i].tpid = xvlan[i].tpid ?
+                                 htons(xvlan[i].tpid) : htons(ETH_TYPE_VLAN);
+        }
+        flow->vlan[i].tci = tci;
+    }
+}
+
+/* Given 'in_xvlan', extracted from the input 802.1Q headers received as part
+ * of a packet, and 'in_xbundle', the bundle on which the packet was received,
+ * returns the VLANs of the packet during bridge internal processing. */
+static void
+xvlan_input_translate(const struct xbundle *in_xbundle,
+                      const struct xvlan *in_xvlan, struct xvlan *xvlan)
+{
+
+    switch (in_xbundle->vlan_mode) {
+    case PORT_VLAN_ACCESS:
+        memset(xvlan, 0, sizeof(*xvlan) * FLOW_MAX_VLAN_HEADERS);
+        xvlan->tpid = in_xvlan->tpid? in_xvlan->tpid: ETH_TYPE_VLAN;
+        xvlan->vid = in_xbundle->vlan;
+        xvlan->pcp = in_xvlan->pcp;
+        return;
+
+    case PORT_VLAN_TRUNK:
+        xvlan_copy(xvlan, in_xvlan);
+        return;
+
+    case PORT_VLAN_NATIVE_UNTAGGED:
+    case PORT_VLAN_NATIVE_TAGGED:
+        xvlan_copy(xvlan, in_xvlan);
+        if (!in_xvlan->vid) {
+            xvlan->tpid = in_xvlan->tpid? in_xvlan->tpid: ETH_TYPE_VLAN;
+            xvlan->vid = in_xbundle->vlan;
+            xvlan->pcp = in_xvlan->pcp;
+        }
+        return;
+
+    case PORT_VLAN_DOT1Q_TUNNEL:
+        xvlan_copy_shift(xvlan, in_xvlan);
+        xvlan->tpid = in_xbundle->qinq_ethtype;
+        xvlan->vid = in_xbundle->vlan;
+        xvlan->pcp = htons(0);
+        return;
+
+    default:
+        OVS_NOT_REACHED();
+    }
+}
+
+/* Given 'xvlan', the VLANs of a packet during internal processing, and
+ * 'out_xbundle', a bundle on which the packet is to be output, returns the
+ * VLANs that should be included in output packet. */
+static void
+xvlan_output_translate(const struct xbundle *out_xbundle,
+                       const struct xvlan *xvlan, struct xvlan *out_xvlan)
 {
     switch (out_xbundle->vlan_mode) {
     case PORT_VLAN_ACCESS:
-        return 0;
+        memset(out_xvlan, 0, sizeof(*out_xvlan) * FLOW_MAX_VLAN_HEADERS);
+        return;
 
     case PORT_VLAN_TRUNK:
     case PORT_VLAN_NATIVE_TAGGED:
-        return vlan;
+        xvlan_copy(out_xvlan, xvlan);
+        return;
 
     case PORT_VLAN_NATIVE_UNTAGGED:
-        return vlan == out_xbundle->vlan ? 0 : vlan;
+        if (xvlan->vid == out_xbundle->vlan) {
+            xvlan_copy_pop(out_xvlan, xvlan);
+        } else {
+            xvlan_copy(out_xvlan, xvlan);
+        }
+        return;
+
+    case PORT_VLAN_DOT1Q_TUNNEL:
+        xvlan_copy_pop(out_xvlan, xvlan);
+        return;
 
     default:
         OVS_NOT_REACHED();
     }
 }
 
+/* If output xbundle is dot1q-tunnel, set mask bits of cvlan */
+static void
+check_and_set_cvlan_mask(struct flow_wildcards *wc,
+                         const struct xbundle *xbundle)
+{
+    if (xbundle->vlan_mode == PORT_VLAN_DOT1Q_TUNNEL && xbundle->cvlans) {
+        wc->masks.vlan[1].tci = htons(0xffff);
+    }
+}
+
 static void
 output_normal(struct xlate_ctx *ctx, const struct xbundle *out_xbundle,
-              uint16_t vlan)
+              const struct xvlan *xvlan)
 {
-    ovs_be16 *flow_tci = &ctx->xin->flow.vlan_tci;
     uint16_t vid;
-    ovs_be16 tci, old_tci;
+    struct flow_vlan_hdr old_vlan[FLOW_MAX_VLAN_HEADERS];
     struct xport *xport;
     struct xlate_bond_recirc xr;
     bool use_recirc = false;
+    struct xvlan out_xvlan[FLOW_MAX_VLAN_HEADERS];
+
+    check_and_set_cvlan_mask(ctx->wc, out_xbundle);
 
-    vid = output_vlan_to_vid(out_xbundle, vlan);
+    xvlan_output_translate(out_xbundle, xvlan, out_xvlan);
+    if (out_xbundle->use_priority_tags) {
+        out_xvlan[0].pcp = ctx->xin->flow.vlan[0].tci & htons(VLAN_PCP_MASK);
+    }
+    vid = out_xvlan[0].vid;
     if (ovs_list_is_empty(&out_xbundle->xports)) {
         /* Partially configured bundle with no slaves.  Drop the packet. */
         return;
@@ -1941,18 +2101,11 @@  output_normal(struct xlate_ctx *ctx, const struct xbundle *out_xbundle,
         }
     }
 
-    old_tci = *flow_tci;
-    tci = htons(vid);
-    if (tci || out_xbundle->use_priority_tags) {
-        tci |= *flow_tci & htons(VLAN_PCP_MASK);
-        if (tci) {
-            tci |= htons(VLAN_CFI);
-        }
-    }
-    *flow_tci = tci;
+    memcpy(&old_vlan, &ctx->xin->flow.vlan, sizeof(old_vlan));
+    xvlan_put(&ctx->xin->flow, out_xvlan);
 
     compose_output_action(ctx, xport->ofp_port, use_recirc ? &xr : NULL);
-    *flow_tci = old_tci;
+    memcpy(&ctx->xin->flow.vlan, &old_vlan, sizeof(old_vlan));
 }
 
 /* A VM broadcasts a gratuitous ARP to indicate that it has resumed after
@@ -2287,7 +2440,8 @@  static void
 xlate_normal_mcast_send_group(struct xlate_ctx *ctx,
                               struct mcast_snooping *ms OVS_UNUSED,
                               struct mcast_group *grp,
-                              struct xbundle *in_xbundle, uint16_t vlan)
+                              struct xbundle *in_xbundle,
+                              const struct xvlan *xvlan)
     OVS_REQ_RDLOCK(ms->rwlock)
 {
     struct xlate_cfg *xcfg;
@@ -2299,7 +2453,7 @@  xlate_normal_mcast_send_group(struct xlate_ctx *ctx,
         mcast_xbundle = xbundle_lookup(xcfg, b->port);
         if (mcast_xbundle && mcast_xbundle != in_xbundle) {
             xlate_report(ctx, "forwarding to mcast group port");
-            output_normal(ctx, mcast_xbundle, vlan);
+            output_normal(ctx, mcast_xbundle, xvlan);
         } else if (!mcast_xbundle) {
             xlate_report(ctx, "mcast group port is unknown, dropping");
         } else {
@@ -2312,7 +2466,8 @@  xlate_normal_mcast_send_group(struct xlate_ctx *ctx,
 static void
 xlate_normal_mcast_send_mrouters(struct xlate_ctx *ctx,
                                  struct mcast_snooping *ms,
-                                 struct xbundle *in_xbundle, uint16_t vlan)
+                                 struct xbundle *in_xbundle,
+                                 const struct xvlan *xvlan)
     OVS_REQ_RDLOCK(ms->rwlock)
 {
     struct xlate_cfg *xcfg;
@@ -2324,7 +2479,7 @@  xlate_normal_mcast_send_mrouters(struct xlate_ctx *ctx,
         mcast_xbundle = xbundle_lookup(xcfg, mrouter->port);
         if (mcast_xbundle && mcast_xbundle != in_xbundle) {
             xlate_report(ctx, "forwarding to mcast router port");
-            output_normal(ctx, mcast_xbundle, vlan);
+            output_normal(ctx, mcast_xbundle, xvlan);
         } else if (!mcast_xbundle) {
             xlate_report(ctx, "mcast router port is unknown, dropping");
         } else {
@@ -2337,7 +2492,8 @@  xlate_normal_mcast_send_mrouters(struct xlate_ctx *ctx,
 static void
 xlate_normal_mcast_send_fports(struct xlate_ctx *ctx,
                                struct mcast_snooping *ms,
-                               struct xbundle *in_xbundle, uint16_t vlan)
+                               struct xbundle *in_xbundle,
+                               const struct xvlan *xvlan)
     OVS_REQ_RDLOCK(ms->rwlock)
 {
     struct xlate_cfg *xcfg;
@@ -2349,7 +2505,7 @@  xlate_normal_mcast_send_fports(struct xlate_ctx *ctx,
         mcast_xbundle = xbundle_lookup(xcfg, fport->port);
         if (mcast_xbundle && mcast_xbundle != in_xbundle) {
             xlate_report(ctx, "forwarding to mcast flood port");
-            output_normal(ctx, mcast_xbundle, vlan);
+            output_normal(ctx, mcast_xbundle, xvlan);
         } else if (!mcast_xbundle) {
             xlate_report(ctx, "mcast flood port is unknown, dropping");
         } else {
@@ -2362,7 +2518,8 @@  xlate_normal_mcast_send_fports(struct xlate_ctx *ctx,
 static void
 xlate_normal_mcast_send_rports(struct xlate_ctx *ctx,
                                struct mcast_snooping *ms,
-                               struct xbundle *in_xbundle, uint16_t vlan)
+                               struct xbundle *in_xbundle,
+                               const struct xvlan *xvlan)
     OVS_REQ_RDLOCK(ms->rwlock)
 {
     struct xlate_cfg *xcfg;
@@ -2374,7 +2531,7 @@  xlate_normal_mcast_send_rports(struct xlate_ctx *ctx,
         mcast_xbundle = xbundle_lookup(xcfg, rport->port);
         if (mcast_xbundle && mcast_xbundle != in_xbundle) {
             xlate_report(ctx, "forwarding Report to mcast flagged port");
-            output_normal(ctx, mcast_xbundle, vlan);
+            output_normal(ctx, mcast_xbundle, xvlan);
         } else if (!mcast_xbundle) {
             xlate_report(ctx, "mcast port is unknown, dropping the Report");
         } else {
@@ -2385,16 +2542,16 @@  xlate_normal_mcast_send_rports(struct xlate_ctx *ctx,
 
 static void
 xlate_normal_flood(struct xlate_ctx *ctx, struct xbundle *in_xbundle,
-                   uint16_t vlan)
+                   struct xvlan *xvlan)
 {
     struct xbundle *xbundle;
 
     LIST_FOR_EACH (xbundle, list_node, &ctx->xbridge->xbundles) {
         if (xbundle != in_xbundle
-            && xbundle_includes_vlan(xbundle, vlan)
+            && xbundle_includes_vlan(xbundle, xvlan)
             && xbundle->floodable
             && !xbundle_mirror_out(ctx->xbridge, xbundle)) {
-            output_normal(ctx, xbundle, vlan);
+            output_normal(ctx, xbundle, xvlan);
         }
     }
     ctx->nf_output_iface = NF_OUT_FLOOD;
@@ -2423,12 +2580,13 @@  xlate_normal(struct xlate_ctx *ctx)
     struct xport *in_port;
     struct mac_entry *mac;
     void *mac_port;
+    struct xvlan in_xvlan[FLOW_MAX_VLAN_HEADERS];
+    struct xvlan xvlan[FLOW_MAX_VLAN_HEADERS];
     uint16_t vlan;
-    uint16_t vid;
 
     memset(&wc->masks.dl_src, 0xff, sizeof wc->masks.dl_src);
     memset(&wc->masks.dl_dst, 0xff, sizeof wc->masks.dl_dst);
-    wc->masks.vlan_tci |= htons(VLAN_VID_MASK | VLAN_CFI);
+    wc->masks.vlan[0].tci |= htons(VLAN_VID_MASK | VLAN_CFI);
 
     in_xbundle = lookup_input_bundle(ctx->xbridge, flow->in_port.ofp_port,
                                      ctx->xin->packet != NULL, &in_port);
@@ -2438,8 +2596,8 @@  xlate_normal(struct xlate_ctx *ctx)
     }
 
     /* Drop malformed frames. */
-    if (flow->dl_type == htons(ETH_TYPE_VLAN) &&
-        !(flow->vlan_tci & htons(VLAN_CFI))) {
+    if (eth_type_vlan(flow->dl_type) &&
+        !(flow->vlan[0].tci & htons(VLAN_CFI))) {
         if (ctx->xin->packet != NULL) {
             static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
             VLOG_WARN_RL(&rl, "bridge %s: dropping packet with partial "
@@ -2463,12 +2621,14 @@  xlate_normal(struct xlate_ctx *ctx)
     }
 
     /* Check VLAN. */
-    vid = vlan_tci_to_vid(flow->vlan_tci);
-    if (!input_vid_is_valid(vid, in_xbundle, ctx->xin->packet != NULL)) {
+    xvlan_extract(flow, in_xvlan);
+    if (!input_vid_is_valid(in_xvlan[0].vid, in_xbundle,
+                            ctx->xin->packet != NULL)) {
         xlate_report(ctx, "disallowed VLAN VID for this input port, dropping");
         return;
     }
-    vlan = input_vid_to_vlan(in_xbundle, vid);
+    xvlan_input_translate(in_xbundle, in_xvlan, xvlan);
+    vlan = xvlan[0].vid;
 
     /* Check other admissibility requirements. */
     if (in_port && !is_admissible(ctx, in_port, vlan)) {
@@ -2515,7 +2675,7 @@  xlate_normal(struct xlate_ctx *ctx)
 
             if (mcast_snooping_is_membership(flow->tp_src)) {
                 ovs_rwlock_rdlock(&ms->rwlock);
-                xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, vlan);
+                xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, xvlan);
                 /* RFC4541: section 2.1.1, item 1: A snooping switch should
                  * forward IGMP Membership Reports only to those ports where
                  * multicast routers are attached.  Alternatively stated: a
@@ -2524,11 +2684,11 @@  xlate_normal(struct xlate_ctx *ctx)
                  * An administrative control may be provided to override this
                  * restriction, allowing the report messages to be flooded to
                  * other ports. */
-                xlate_normal_mcast_send_rports(ctx, ms, in_xbundle, vlan);
+                xlate_normal_mcast_send_rports(ctx, ms, in_xbundle, xvlan);
                 ovs_rwlock_unlock(&ms->rwlock);
             } else {
                 xlate_report(ctx, "multicast traffic, flooding");
-                xlate_normal_flood(ctx, in_xbundle, vlan);
+                xlate_normal_flood(ctx, in_xbundle, xvlan);
             }
             return;
         } else if (is_mld(flow, wc)) {
@@ -2539,12 +2699,12 @@  xlate_normal(struct xlate_ctx *ctx)
             }
             if (is_mld_report(flow, wc)) {
                 ovs_rwlock_rdlock(&ms->rwlock);
-                xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, vlan);
-                xlate_normal_mcast_send_rports(ctx, ms, in_xbundle, vlan);
+                xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, xvlan);
+                xlate_normal_mcast_send_rports(ctx, ms, in_xbundle, xvlan);
                 ovs_rwlock_unlock(&ms->rwlock);
             } else {
                 xlate_report(ctx, "MLD query, flooding");
-                xlate_normal_flood(ctx, in_xbundle, vlan);
+                xlate_normal_flood(ctx, in_xbundle, xvlan);
             }
         } else {
             if (is_ip_local_multicast(flow, wc)) {
@@ -2552,7 +2712,7 @@  xlate_normal(struct xlate_ctx *ctx)
                  * address in the 224.0.0.x range which are not IGMP must
                  * be forwarded on all ports */
                 xlate_report(ctx, "RFC4541: section 2.1.2, item 2, flooding");
-                xlate_normal_flood(ctx, in_xbundle, vlan);
+                xlate_normal_flood(ctx, in_xbundle, xvlan);
                 return;
             }
         }
@@ -2565,16 +2725,16 @@  xlate_normal(struct xlate_ctx *ctx)
             grp = mcast_snooping_lookup(ms, &flow->ipv6_dst, vlan);
         }
         if (grp) {
-            xlate_normal_mcast_send_group(ctx, ms, grp, in_xbundle, vlan);
-            xlate_normal_mcast_send_fports(ctx, ms, in_xbundle, vlan);
-            xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, vlan);
+            xlate_normal_mcast_send_group(ctx, ms, grp, in_xbundle, xvlan);
+            xlate_normal_mcast_send_fports(ctx, ms, in_xbundle, xvlan);
+            xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, xvlan);
         } else {
             if (mcast_snooping_flood_unreg(ms)) {
                 xlate_report(ctx, "unregistered multicast, flooding");
-                xlate_normal_flood(ctx, in_xbundle, vlan);
+                xlate_normal_flood(ctx, in_xbundle, xvlan);
             } else {
-                xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, vlan);
-                xlate_normal_mcast_send_fports(ctx, ms, in_xbundle, vlan);
+                xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, xvlan);
+                xlate_normal_mcast_send_fports(ctx, ms, in_xbundle, xvlan);
             }
         }
         ovs_rwlock_unlock(&ms->rwlock);
@@ -2589,7 +2749,7 @@  xlate_normal(struct xlate_ctx *ctx)
             struct xbundle *mac_xbundle = xbundle_lookup(xcfg, mac_port);
             if (mac_xbundle && mac_xbundle != in_xbundle) {
                 xlate_report(ctx, "forwarding to learned port");
-                output_normal(ctx, mac_xbundle, vlan);
+                output_normal(ctx, mac_xbundle, xvlan);
             } else if (!mac_xbundle) {
                 xlate_report(ctx, "learned port is unknown, dropping");
             } else {
@@ -2597,7 +2757,7 @@  xlate_normal(struct xlate_ctx *ctx)
             }
         } else {
             xlate_report(ctx, "no learned MAC for destination, flooding");
-            xlate_normal_flood(ctx, in_xbundle, vlan);
+            xlate_normal_flood(ctx, in_xbundle, xvlan);
         }
     }
 }
@@ -2726,7 +2886,7 @@  fix_sflow_action(struct xlate_ctx *ctx, unsigned int user_cookie_offset)
     ovs_assert(cookie->type == USER_ACTION_COOKIE_SFLOW);
 
     cookie->type = USER_ACTION_COOKIE_SFLOW;
-    cookie->sflow.vlan_tci = base->vlan_tci;
+    cookie->sflow.vlan_tci = base->vlan[0].tci;
 
     /* See http://www.sflow.org/sflow_version_5.txt (search for "Input/output
      * port information") for the interpretation of cookie->output. */
@@ -3007,7 +3167,7 @@  compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
 
     /* If 'struct flow' gets additional metadata, we'll need to zero it out
      * before traversing a patch port. */
-    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 35);
+    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 36);
     memset(&flow_tnl, 0, sizeof flow_tnl);
 
     if (!xport) {
@@ -3142,7 +3302,7 @@  compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
         return;
     }
 
-    flow_vlan_tci = flow->vlan_tci;
+    flow_vlan_tci = flow->vlan[0].tci;
     flow_pkt_mark = flow->pkt_mark;
     flow_nw_tos = flow->nw_tos;
 
@@ -3270,7 +3430,7 @@  compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
 
  out:
     /* Restore flow */
-    flow->vlan_tci = flow_vlan_tci;
+    flow->vlan[0].tci = flow_vlan_tci;
     flow->pkt_mark = flow_pkt_mark;
     flow->nw_tos = flow_nw_tos;
 }
@@ -4800,34 +4960,44 @@  do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
             break;
 
         case OFPACT_SET_VLAN_VID:
-            wc->masks.vlan_tci |= htons(VLAN_VID_MASK | VLAN_CFI);
-            if (flow->vlan_tci & htons(VLAN_CFI) ||
+            wc->masks.vlan[0].tci |= htons(VLAN_VID_MASK | VLAN_CFI);
+            if (flow->vlan[0].tci & htons(VLAN_CFI) ||
                 ofpact_get_SET_VLAN_VID(a)->push_vlan_if_needed) {
-                flow->vlan_tci &= ~htons(VLAN_VID_MASK);
-                flow->vlan_tci |= (htons(ofpact_get_SET_VLAN_VID(a)->vlan_vid)
-                                   | htons(VLAN_CFI));
+                if (!flow->vlan[0].tpid) {
+                    flow->vlan[0].tpid = htons(ETH_TYPE_VLAN);
+                }
+                flow->vlan[0].tci &= ~htons(VLAN_VID_MASK);
+                flow->vlan[0].tci |=
+                    (htons(ofpact_get_SET_VLAN_VID(a)->vlan_vid) |
+                     htons(VLAN_CFI));
             }
             break;
 
         case OFPACT_SET_VLAN_PCP:
-            wc->masks.vlan_tci |= htons(VLAN_PCP_MASK | VLAN_CFI);
-            if (flow->vlan_tci & htons(VLAN_CFI) ||
+            wc->masks.vlan[0].tci |= htons(VLAN_PCP_MASK | VLAN_CFI);
+            if (flow->vlan[0].tci & htons(VLAN_CFI) ||
                 ofpact_get_SET_VLAN_PCP(a)->push_vlan_if_needed) {
-                flow->vlan_tci &= ~htons(VLAN_PCP_MASK);
-                flow->vlan_tci |= htons((ofpact_get_SET_VLAN_PCP(a)->vlan_pcp
-                                         << VLAN_PCP_SHIFT) | VLAN_CFI);
+                if (!flow->vlan[0].tpid) {
+                    flow->vlan[0].tpid = htons(ETH_TYPE_VLAN);
+                }
+                flow->vlan[0].tci &= ~htons(VLAN_PCP_MASK);
+                flow->vlan[0].tci |=
+                    htons((ofpact_get_SET_VLAN_PCP(a)->vlan_pcp
+                           << VLAN_PCP_SHIFT) | VLAN_CFI);
             }
             break;
 
         case OFPACT_STRIP_VLAN:
-            memset(&wc->masks.vlan_tci, 0xff, sizeof wc->masks.vlan_tci);
-            flow->vlan_tci = htons(0);
+            memset(&wc->masks.vlan[0], 0xff, sizeof wc->masks.vlan[0]);
+            flow_pop_vlan(flow);
             break;
 
         case OFPACT_PUSH_VLAN:
-            /* XXX 802.1AD(QinQ) */
-            memset(&wc->masks.vlan_tci, 0xff, sizeof wc->masks.vlan_tci);
-            flow->vlan_tci = htons(VLAN_CFI);
+            flow_shift_vlan(&wc->masks);
+            memset(&wc->masks.vlan[0], 0xff, sizeof wc->masks.vlan[0]);
+            flow_shift_vlan(flow);
+            flow->vlan[0].tpid = ofpact_get_PUSH_VLAN(a)->ethertype;
+            flow->vlan[0].tci = htons(VLAN_CFI);
             break;
 
         case OFPACT_SET_ETH_SRC:
@@ -4931,8 +5101,8 @@  do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
             /* Set field action only ever overwrites packet's outermost
              * applicable header fields.  Do nothing if no header exists. */
             if (mf->id == MFF_VLAN_VID) {
-                wc->masks.vlan_tci |= htons(VLAN_CFI);
-                if (!(flow->vlan_tci & htons(VLAN_CFI))) {
+                wc->masks.vlan[0].tci |= htons(VLAN_CFI);
+                if (!(flow->vlan[0].tci & htons(VLAN_CFI))) {
                     break;
                 }
             } else if ((mf->id == MFF_MPLS_LABEL || mf->id == MFF_MPLS_TC)
@@ -5324,6 +5494,8 @@  xlate_wc_init(struct xlate_ctx *ctx)
 static void
 xlate_wc_finish(struct xlate_ctx *ctx)
 {
+    int i;
+
     /* Clear the metadata and register wildcard masks, because we won't
      * use non-header fields as part of the cache. */
     flow_wildcards_clear_non_packet_fields(ctx->wc);
@@ -5343,8 +5515,10 @@  xlate_wc_finish(struct xlate_ctx *ctx)
         ctx->wc->masks.tp_dst &= htons(UINT8_MAX);
     }
     /* VLAN_TCI CFI bit must be matched if any of the TCI is matched. */
-    if (ctx->wc->masks.vlan_tci) {
-        ctx->wc->masks.vlan_tci |= htons(VLAN_CFI);
+    for (i = 0; i < FLOW_MAX_VLAN_HEADERS; i++) {
+        if (ctx->wc->masks.vlan[i].tci) {
+            ctx->wc->masks.vlan[i].tci |= htons(VLAN_CFI);
+        }
     }
 }
 
diff --git a/ofproto/ofproto-dpif-xlate.h b/ofproto/ofproto-dpif-xlate.h
index 7808a60..dac0da5 100644
--- a/ofproto/ofproto-dpif-xlate.h
+++ b/ofproto/ofproto-dpif-xlate.h
@@ -160,8 +160,10 @@  void xlate_ofproto_set(struct ofproto_dpif *, const char *name, struct dpif *,
 void xlate_remove_ofproto(struct ofproto_dpif *);
 
 void xlate_bundle_set(struct ofproto_dpif *, struct ofbundle *,
-                      const char *name, enum port_vlan_mode, int vlan,
-                      unsigned long *trunks, bool use_priority_tags,
+                      const char *name, enum port_vlan_mode,
+                      uint16_t qinq_ethtype, int vlan,
+                      unsigned long *trunks, unsigned long *cvlans,
+                      bool use_priority_tags,
                       const struct bond *, const struct lacp *,
                       bool floodable);
 void xlate_bundle_remove(struct ofbundle *);
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index ce9383a..9d6b8a8 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -132,9 +132,11 @@  struct ofbundle {
     /* Configuration. */
     struct ovs_list ports;      /* Contains "struct ofport"s. */
     enum port_vlan_mode vlan_mode; /* VLAN mode */
+    uint16_t qinq_ethtype;
     int vlan;                   /* -1=trunk port, else a 12-bit VLAN ID. */
     unsigned long *trunks;      /* Bitmap of trunked VLANs, if 'vlan' == -1.
                                  * NULL if all VLANs are trunked. */
+    unsigned long *cvlans;
     struct lacp *lacp;          /* LACP if LACP is enabled, otherwise NULL. */
     struct bond *bond;          /* Nonnull iff more than one port. */
     bool use_priority_tags;     /* Use 802.1p tag for frames in VLAN 0? */
@@ -624,8 +626,9 @@  type_run(const char *type)
 
             HMAP_FOR_EACH (bundle, hmap_node, &ofproto->bundles) {
                 xlate_bundle_set(ofproto, bundle, bundle->name,
-                                 bundle->vlan_mode, bundle->vlan,
-                                 bundle->trunks, bundle->use_priority_tags,
+                                 bundle->vlan_mode, bundle->qinq_ethtype,
+                                 bundle->vlan, bundle->trunks, bundle->cvlans,
+                                 bundle->use_priority_tags,
                                  bundle->bond, bundle->lacp,
                                  bundle->floodable);
             }
@@ -1122,6 +1125,43 @@  check_variable_length_userdata(struct dpif_backer *backer)
     }
 }
 
+/* Tests number of 802.1q VLAN headers supported by 'backer''s datapath.
+ *
+ * Returns the number of elements in a struct flow's vlan
+ * if the datapath supports at least that many VLAN headers. */
+static size_t
+check_max_vlan_headers(struct dpif_backer *backer)
+{
+    struct flow flow;
+    struct odp_flow_key_parms odp_parms = {
+        .flow = &flow,
+        .support = {
+            .max_vlan_headers = SIZE_MAX
+        }
+    };
+    int n;
+
+    memset(&flow, 0, sizeof flow);
+    flow.dl_type = htons(ETH_TYPE_IP);
+    for (n = 0; n < FLOW_MAX_VLAN_HEADERS; n++) {
+        struct odputil_keybuf keybuf;
+        struct ofpbuf key;
+
+        flow_shift_vlan(&flow);
+        flow.vlan[0].tpid = htons(ETH_TYPE_VLAN);
+        flow.vlan[0].tci = htons(1) | htons(VLAN_CFI);
+
+        ofpbuf_use_stack(&key, &keybuf, sizeof keybuf);
+        odp_flow_key_from_flow(&odp_parms, &key);
+        if (!dpif_probe_feature(backer->dpif, "VLAN", &key, NULL)) {
+            break;
+        }
+    }
+
+    VLOG_INFO("%s: VLAN label stack length probed as %d",
+              dpif_name(backer->dpif), n);
+    return n;
+}
 /* Tests the MPLS label stack depth supported by 'backer''s datapath.
  *
  * Returns the number of elements in a struct flow's mpls_lse field
@@ -1318,6 +1358,7 @@  check_support(struct dpif_backer *backer)
     backer->support.variable_length_userdata = false;
 
     backer->support.odp.recirc = check_recirc(backer);
+    backer->support.odp.max_vlan_headers = check_max_vlan_headers(backer);
     backer->support.odp.max_mpls_depth = check_max_mpls_depth(backer);
     backer->support.masked_set_action = check_masked_set_action(backer);
     backer->support.trunc = check_trunc_action(backer);
@@ -2871,6 +2912,7 @@  bundle_destroy(struct ofbundle *bundle)
     hmap_remove(&ofproto->bundles, &bundle->hmap_node);
     free(bundle->name);
     free(bundle->trunks);
+    free(bundle->cvlans);
     lacp_unref(bundle->lacp);
     bond_unref(bundle->bond);
     free(bundle);
@@ -2884,7 +2926,8 @@  bundle_set(struct ofproto *ofproto_, void *aux,
     bool need_flush = false;
     struct ofport_dpif *port;
     struct ofbundle *bundle;
-    unsigned long *trunks;
+    unsigned long *trunks = NULL;
+    unsigned long *cvlans = NULL;
     int vlan;
     size_t i;
     bool ok;
@@ -2909,8 +2952,10 @@  bundle_set(struct ofproto *ofproto_, void *aux,
 
         ovs_list_init(&bundle->ports);
         bundle->vlan_mode = PORT_VLAN_TRUNK;
+        bundle->qinq_ethtype = ETH_TYPE_VLAN_8021AD;
         bundle->vlan = -1;
         bundle->trunks = NULL;
+        bundle->cvlans = NULL;
         bundle->use_priority_tags = s->use_priority_tags;
         bundle->lacp = NULL;
         bundle->bond = NULL;
@@ -2974,6 +3019,11 @@  bundle_set(struct ofproto *ofproto_, void *aux,
         need_flush = true;
     }
 
+    if (s->qinq_ethtype != bundle->qinq_ethtype) {
+        bundle->qinq_ethtype= s->qinq_ethtype;
+        need_flush = true;
+    }
+
     /* Set VLAN tag. */
     vlan = (s->vlan_mode == PORT_VLAN_TRUNK ? -1
             : s->vlan >= 0 && s->vlan <= 4095 ? s->vlan
@@ -3011,6 +3061,10 @@  bundle_set(struct ofproto *ofproto_, void *aux,
         }
         break;
 
+    case PORT_VLAN_DOT1Q_TUNNEL:
+        cvlans = CONST_CAST(unsigned long *, s->cvlans);
+        break;
+
     default:
         OVS_NOT_REACHED();
     }
@@ -3028,6 +3082,20 @@  bundle_set(struct ofproto *ofproto_, void *aux,
         free(trunks);
     }
 
+    if (!vlan_bitmap_equal(cvlans, bundle->cvlans)) {
+        free(bundle->cvlans);
+        if (cvlans== s->cvlans) {
+            bundle->cvlans= vlan_bitmap_clone(cvlans);
+        } else {
+            bundle->cvlans = cvlans;
+            cvlans = NULL;
+        }
+        need_flush = true;
+    }
+    if (cvlans != s->cvlans) {
+        free(cvlans);
+    }
+
     /* Bonding. */
     if (!ovs_list_is_short(&bundle->ports)) {
         bundle->ofproto->has_bonded_bundles = true;
diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
index a60327a..c2254ab 100644
--- a/ofproto/ofproto.h
+++ b/ofproto/ofproto.h
@@ -383,7 +383,11 @@  enum port_vlan_mode {
     /* Untagged incoming packets are part of 'vlan', as are incoming packets
      * tagged with 'vlan'.  Outgoing packets tagged with 'vlan' are untagged.
      * Other VLANs in 'trunks' are trunked. */
-    PORT_VLAN_NATIVE_UNTAGGED
+    PORT_VLAN_NATIVE_UNTAGGED,
+
+    /* 802.1q tunnel port. Incomming packets are added an outer vlan tag
+     * 'vlan'. If 'cvlans' is set, only allows VLANs in 'cvlans'. */
+    PORT_VLAN_DOT1Q_TUNNEL
 };
 
 /* Configuration of bundles. */
@@ -394,8 +398,10 @@  struct ofproto_bundle_settings {
     size_t n_slaves;
 
     enum port_vlan_mode vlan_mode; /* Selects mode for vlan and trunks */
+    uint16_t qinq_ethtype;
     int vlan;                   /* VLAN VID, except for PORT_VLAN_TRUNK. */
     unsigned long *trunks;      /* vlan_bitmap, except for PORT_VLAN_ACCESS. */
+    unsigned long *cvlans;
     bool use_priority_tags;     /* Use 802.1p tag for frames in VLAN 0? */
 
     struct bond_settings *bond; /* Must be nonnull iff if n_slaves > 1. */
diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index f960ea9..b534f14 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -150,8 +150,9 @@  pinctrl_handle_arp(const struct flow *ip_flow, const struct match *md,
     arp->ar_tha = eth_addr_zero;
     put_16aligned_be32(&arp->ar_tpa, ip_flow->nw_dst);
 
-    if (ip_flow->vlan_tci & htons(VLAN_CFI)) {
-        eth_push_vlan(&packet, htons(ETH_TYPE_VLAN_8021Q), ip_flow->vlan_tci);
+    if (ip_flow->vlan[0].tci & htons(VLAN_CFI)) {
+        eth_push_vlan(&packet, htons(ETH_TYPE_VLAN_8021Q),
+                      ip_flow->vlan[0].tci);
     }
 
     /* Compose actions.
diff --git a/tests/test-classifier.c b/tests/test-classifier.c
index c74c440..c123a5a 100644
--- a/tests/test-classifier.c
+++ b/tests/test-classifier.c
@@ -58,7 +58,7 @@  static bool versioned = false;
     CLS_FIELD(nw_src,            NW_SRC)      \
     CLS_FIELD(nw_dst,            NW_DST)      \
     CLS_FIELD(in_port.ofp_port,  IN_PORT)     \
-    CLS_FIELD(vlan_tci,          VLAN_TCI)    \
+    CLS_FIELD(vlan[0].tci,       VLAN_TCI)    \
     CLS_FIELD(dl_type,           DL_TYPE)     \
     CLS_FIELD(tp_src,            TP_SRC)      \
     CLS_FIELD(tp_dst,            TP_DST)      \
@@ -231,8 +231,8 @@  match(const struct cls_rule *wild_, const struct flow *fixed)
             eq = eth_addr_equal_except(fixed->dl_dst, wild.flow.dl_dst,
                                        wild.wc.masks.dl_dst);
         } else if (f_idx == CLS_F_IDX_VLAN_TCI) {
-            eq = !((fixed->vlan_tci ^ wild.flow.vlan_tci)
-                   & wild.wc.masks.vlan_tci);
+            eq = !((fixed->vlan[0].tci ^ wild.flow.vlan[0].tci)
+                   & wild.wc.masks.vlan[0].tci);
         } else if (f_idx == CLS_F_IDX_TUN_ID) {
             eq = !((fixed->tunnel.tun_id ^ wild.flow.tunnel.tun_id)
                    & wild.wc.masks.tunnel.tun_id);
@@ -427,7 +427,7 @@  compare_classifiers(struct classifier *cls, size_t n_invisible_rules,
         flow.metadata = metadata_values[get_value(&x, N_METADATA_VALUES)];
         flow.in_port.ofp_port = in_port_values[get_value(&x,
                                                    N_IN_PORT_VALUES)];
-        flow.vlan_tci = vlan_tci_values[get_value(&x, N_VLAN_TCI_VALUES)];
+        flow.vlan[0].tci = vlan_tci_values[get_value(&x, N_VLAN_TCI_VALUES)];
         flow.dl_type = dl_type_values[get_value(&x, N_DL_TYPE_VALUES)];
         flow.tp_src = tp_src_values[get_value(&x, N_TP_SRC_VALUES)];
         flow.tp_dst = tp_dst_values[get_value(&x, N_TP_DST_VALUES)];
@@ -688,7 +688,7 @@  make_rule(int wc_fields, int priority, int value_pat)
         } else if (f_idx == CLS_F_IDX_DL_DST) {
             WC_MASK_FIELD(&match.wc, dl_dst);
         } else if (f_idx == CLS_F_IDX_VLAN_TCI) {
-            match.wc.masks.vlan_tci = OVS_BE16_MAX;
+            match.wc.masks.vlan[0].tci = OVS_BE16_MAX;
         } else if (f_idx == CLS_F_IDX_TUN_ID) {
             match.wc.masks.tunnel.tun_id = OVS_BE64_MAX;
         } else if (f_idx == CLS_F_IDX_METADATA) {
@@ -1431,7 +1431,7 @@  benchmark(bool use_wc)
         flow->metadata = metadata_values[get_value(&x, N_METADATA_VALUES)];
         flow->in_port.ofp_port = in_port_values[get_value(&x,
                                                           N_IN_PORT_VALUES)];
-        flow->vlan_tci = vlan_tci_values[get_value(&x, N_VLAN_TCI_VALUES)];
+        flow->vlan[0].tci = vlan_tci_values[get_value(&x, N_VLAN_TCI_VALUES)];
         flow->dl_type = dl_type_values[get_value(&x, N_DL_TYPE_VALUES)];
         flow->tp_src = tp_src_values[get_value(&x, N_TP_SRC_VALUES)];
         flow->tp_dst = tp_dst_values[get_value(&x, N_TP_DST_VALUES)];
@@ -1702,7 +1702,10 @@  test_miniflow(struct ovs_cmdl_context *ctx OVS_UNUSED)
         miniflow = miniflow_create(&flow);
 
         /* Check that the flow equals its miniflow. */
-        assert(miniflow_get_vid(miniflow) == vlan_tci_to_vid(flow.vlan_tci));
+        for (i = 0; i < FLOW_MAX_VLAN_HEADERS; i++) {
+            assert(miniflow_get_vid(miniflow, i) ==
+                   vlan_tci_to_vid(flow.vlan[i].tci));
+        }
         for (i = 0; i < FLOW_U64S; i++) {
             assert(miniflow_get(miniflow, i) == flow_u64[i]);
         }
diff --git a/utilities/ovs-ofctl.c b/utilities/ovs-ofctl.c
index 8b02722..dda334e 100644
--- a/utilities/ovs-ofctl.c
+++ b/utilities/ovs-ofctl.c
@@ -3786,8 +3786,8 @@  ofctl_check_vlan(struct ovs_cmdl_context *ctx)
     enum ofputil_protocol usable_protocols; /* Unused for now. */
 
     match_init_catchall(&match);
-    match.flow.vlan_tci = htons(strtoul(ctx->argv[1], NULL, 16));
-    match.wc.masks.vlan_tci = htons(strtoul(ctx->argv[2], NULL, 16));
+    match.flow.vlan[0].tci = htons(strtoul(ctx->argv[1], NULL, 16));
+    match.wc.masks.vlan[0].tci = htons(strtoul(ctx->argv[2], NULL, 16));
 
     /* Convert to and from string. */
     string_s = match_to_string(&match, OFP_DEFAULT_PRIORITY);
@@ -3798,8 +3798,8 @@  ofctl_check_vlan(struct ovs_cmdl_context *ctx)
         ovs_fatal(0, "%s", error_s);
     }
     printf("%04"PRIx16"/%04"PRIx16"\n",
-           ntohs(fm.match.flow.vlan_tci),
-           ntohs(fm.match.wc.masks.vlan_tci));
+           ntohs(fm.match.flow.vlan[0].tci),
+           ntohs(fm.match.wc.masks.vlan[0].tci));
     free(string_s);
 
     /* Convert to and from NXM. */
@@ -3812,8 +3812,8 @@  ofctl_check_vlan(struct ovs_cmdl_context *ctx)
         printf("%s\n", ofperr_to_string(error));
     } else {
         printf("%04"PRIx16"/%04"PRIx16"\n",
-               ntohs(nxm_match.flow.vlan_tci),
-               ntohs(nxm_match.wc.masks.vlan_tci));
+               ntohs(nxm_match.flow.vlan[0].tci),
+               ntohs(nxm_match.wc.masks.vlan[0].tci));
     }
     free(nxm_s);
     ofpbuf_uninit(&nxm);
@@ -3827,14 +3827,15 @@  ofctl_check_vlan(struct ovs_cmdl_context *ctx)
     if (error) {
         printf("%s\n", ofperr_to_string(error));
     } else {
-        uint16_t vid = ntohs(nxm_match.flow.vlan_tci) &
+        uint16_t vid = ntohs(nxm_match.flow.vlan[0].tci) &
             (VLAN_VID_MASK | VLAN_CFI);
-        uint16_t mask = ntohs(nxm_match.wc.masks.vlan_tci) &
+        uint16_t mask = ntohs(nxm_match.wc.masks.vlan[0].tci) &
             (VLAN_VID_MASK | VLAN_CFI);
 
         printf("%04"PRIx16"/%04"PRIx16",", vid, mask);
-        if (vid && vlan_tci_to_pcp(nxm_match.wc.masks.vlan_tci)) {
-            printf("%02"PRIx8"\n", vlan_tci_to_pcp(nxm_match.flow.vlan_tci));
+        if (vid && vlan_tci_to_pcp(nxm_match.wc.masks.vlan[0].tci)) {
+            printf("%02"PRIx8"\n",
+                   vlan_tci_to_pcp(nxm_match.flow.vlan[0].tci));
         } else {
             printf("--\n");
         }
@@ -3850,8 +3851,8 @@  ofctl_check_vlan(struct ovs_cmdl_context *ctx)
            (of10_raw.wildcards & htonl(OFPFW10_DL_VLAN)) != 0,
            of10_raw.dl_vlan_pcp,
            (of10_raw.wildcards & htonl(OFPFW10_DL_VLAN_PCP)) != 0,
-           ntohs(of10_match.flow.vlan_tci),
-           ntohs(of10_match.wc.masks.vlan_tci));
+           ntohs(of10_match.flow.vlan[0].tci),
+           ntohs(of10_match.wc.masks.vlan[0].tci));
 
     /* Convert to and from OpenFlow 1.1. */
     ofputil_match_to_ofp11_match(&match, &of11_raw);
@@ -3861,8 +3862,8 @@  ofctl_check_vlan(struct ovs_cmdl_context *ctx)
            (of11_raw.wildcards & htonl(OFPFW11_DL_VLAN)) != 0,
            of11_raw.dl_vlan_pcp,
            (of11_raw.wildcards & htonl(OFPFW11_DL_VLAN_PCP)) != 0,
-           ntohs(of11_match.flow.vlan_tci),
-           ntohs(of11_match.wc.masks.vlan_tci));
+           ntohs(of11_match.flow.vlan[0].tci),
+           ntohs(of11_match.wc.masks.vlan[0].tci));
 }
 
 /* "print-error ENUM": Prints the type and code of ENUM for every OpenFlow
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index a5de84e..7915a1e 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -919,6 +919,11 @@  port_configure(struct port *port)
         s.trunks = vlan_bitmap_from_array(cfg->trunks, cfg->n_trunks);
     }
 
+    s.cvlans = NULL;
+    if (cfg->n_cvlans) {
+        s.cvlans = vlan_bitmap_from_array(cfg->cvlans, cfg->n_cvlans);
+    }
+
     /* Get VLAN mode. */
     if (cfg->vlan_mode) {
         if (!strcmp(cfg->vlan_mode, "access")) {
@@ -929,6 +934,8 @@  port_configure(struct port *port)
             s.vlan_mode = PORT_VLAN_NATIVE_TAGGED;
         } else if (!strcmp(cfg->vlan_mode, "native-untagged")) {
             s.vlan_mode = PORT_VLAN_NATIVE_UNTAGGED;
+        } else if (!strcmp(cfg->vlan_mode, "dot1q-tunnel")) {
+            s.vlan_mode = PORT_VLAN_DOT1Q_TUNNEL;
         } else {
             /* This "can't happen" because ovsdb-server should prevent it. */
             VLOG_WARN("port %s: unknown VLAN mode %s, falling "
@@ -938,7 +945,7 @@  port_configure(struct port *port)
     } else {
         if (s.vlan >= 0) {
             s.vlan_mode = PORT_VLAN_ACCESS;
-            if (cfg->n_trunks) {
+            if (cfg->n_trunks || cfg->n_cvlans) {
                 VLOG_WARN("port %s: ignoring trunks in favor of implicit vlan",
                           port->name);
             }
@@ -946,6 +953,24 @@  port_configure(struct port *port)
             s.vlan_mode = PORT_VLAN_TRUNK;
         }
     }
+
+    if (cfg->qinq_ethtype) {
+        if (!strcmp(cfg->qinq_ethtype, "802.1q") ||
+            !strcmp(cfg->qinq_ethtype, "0x8100")) {
+            s.qinq_ethtype = ETH_TYPE_VLAN_8021Q;
+        } else if (!strcmp(cfg->qinq_ethtype, "802.1ad") ||
+                   !strcmp(cfg->qinq_ethtype, "0x88a8")) {
+            s.qinq_ethtype = ETH_TYPE_VLAN_8021AD;
+        } else {
+            /* This "can't happen" because ovsdb-server should prevent it. */
+            VLOG_WARN("port %s: invalid QinQ ethertype %s, falling "
+                      "back to 802.1ad", port->name, cfg->qinq_ethtype);
+            s.qinq_ethtype = ETH_TYPE_VLAN_8021AD;
+        }
+    } else {
+        s.qinq_ethtype = ETH_TYPE_VLAN_8021AD;
+    }
+
     s.use_priority_tags = smap_get_bool(&cfg->other_config, "priority-tags",
                                         false);
 
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 32fdf28..2dbcbc6 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@ 
 {"name": "Open_vSwitch",
  "version": "7.13.0",
- "cksum": "889248633 22774",
+ "cksum": "880902477 23187",
  "tables": {
    "Open_vSwitch": {
      "columns": {
@@ -145,6 +145,11 @@ 
                           "minInteger": 0,
                           "maxInteger": 4095},
                   "min": 0, "max": 4096}},
+       "cvlans": {
+         "type": {"key": {"type": "integer",
+                          "minInteger": 0,
+                          "maxInteger": 4095},
+                  "min": 0, "max": 4096}},
        "tag": {
          "type": {"key": {"type": "integer",
                           "minInteger": 0,
@@ -152,7 +157,12 @@ 
                   "min": 0, "max": 1}},
        "vlan_mode": {
          "type": {"key": {"type": "string",
-           "enum": ["set", ["trunk", "access", "native-tagged", "native-untagged"]]},
+           "enum": ["set", ["trunk", "access", "native-tagged",
+                            "native-untagged", "dot1q-tunnel"]]},
+         "min": 0, "max": 1}},
+       "qinq_ethtype": {
+         "type": {"key": {"type": "string",
+           "enum": ["set", ["802.1q", "802.1ad", "0x88a8", "0x8100"]]},
          "min": 0, "max": 1}},
        "qos": {
          "type": {"key": {"type": "uuid",
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index fed6f56..b6396f9 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -1309,6 +1309,22 @@ 
           exception that a packet that egresses on a native-untagged port in
           the native VLAN will not have an 802.1Q header.
         </dd>
+
+        <dt>dot1q-tunnel</dt>
+        <dd>
+          <p>
+            A dot1q-tunnel port tunnels packets with VLAN specified in
+            <ref column="tag"/> column. That is it adds 802.1Q header, with
+            ethertype and VLAN ID specified in <ref column="qinq-ethtype"/>
+            and <ref column="tag"/>, to both tagged and untagged packets on
+            ingress, and pops dot1q header of this VLAN on egress.
+          </p>
+
+          <p>
+            If <ref column="cvlans"/> is set, only allows packets of these
+            VLANs.
+          </p>
+        </dd>
       </dl>
       <p>
         A packet will only egress through bridge ports that carry the VLAN of
@@ -1332,6 +1348,13 @@ 
         </ul>
       </column>
 
+      <column name="qinq_ethtype">
+        <p>
+          Ethertype of dot1q-tunnel port, could be either "802.1q"(0x8100) or
+          "802.1ad"(0x88a8). Defaults to "802.1ad".
+        </p>
+      </column>
+
       <column name="tag">
         <p>
           For an access port, the port's implicitly tagged VLAN.  For a
@@ -1353,6 +1376,14 @@ 
         </p>
       </column>
 
+      <column name="cvlans">
+        <p>
+          For a trunk-qinq port if specific cvlans are specified only those
+          cvlans are 1ad tunneled, others are dropped. If no cvlans specified
+          explicitly then all cvlans are 1ad tunneled.
+        </p>
+      </column>
+
       <column name="other_config" key="priority-tags"
               type='{"type": "boolean"}'>
         <p>