[GIT,net-next] Open vSwitch

Message ID 1406851057-1593-1-git-send-email-pshelar@nicira.com
State Changes Requested, archived
Delegated to: David Miller

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch.git net_next_ovs

Message

Pravin B Shelar July 31, 2014, 11:57 p.m. UTC
The following patches introduce a flow mask cache. To process any packet,
OVS needs to apply a flow mask to the flow and look up the flow in the flow
table, so packet processing performance depends directly on the number of
entries in the mask list.

Following patch adds mask cache so that we do not need to iterate over
all entries in mask list on every packet. We have seen good performance
improvement with this patch.

Before the mask cache, a single stream that matched the first mask
achieved a throughput of about 900K pps, while a stream that matched the
20th mask achieved about 400K pps. After the mask-cache patch, throughput
for all streams went back up to 900K pps.
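
To illustrate why throughput drops with the position of the matching mask, here is a minimal C sketch of a masked lookup that tries each mask in turn. It is a simplified model, not the net/openvswitch code: `flow_key`, `flow`, `lookup_all_masks` and the four-word key are hypothetical names and sizes, and a linear scan stands in for the hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the net/openvswitch code. */
#define KEY_WORDS 4

struct flow_key { uint32_t w[KEY_WORDS]; };

struct flow {
    struct flow_key masked_key; /* key as stored, already masked */
    size_t mask_idx;            /* mask this flow was installed with */
};

/* dst = src & mask, word by word. */
static void apply_mask(struct flow_key *dst, const struct flow_key *src,
                       const struct flow_key *mask)
{
    for (size_t i = 0; i < KEY_WORDS; i++)
        dst->w[i] = src->w[i] & mask->w[i];
}

static int key_equal(const struct flow_key *a, const struct flow_key *b)
{
    for (size_t i = 0; i < KEY_WORDS; i++)
        if (a->w[i] != b->w[i])
            return 0;
    return 1;
}

/* Try each mask in order, one table probe per mask.
 * *probes reports how many masks were tried before returning. */
static const struct flow *lookup_all_masks(const struct flow *flows,
                                           size_t n_flows,
                                           const struct flow_key *masks,
                                           size_t n_masks,
                                           const struct flow_key *pkt,
                                           size_t *probes)
{
    assert(probes != NULL);
    *probes = 0;
    for (size_t m = 0; m < n_masks; m++) {
        struct flow_key masked;

        apply_mask(&masked, pkt, &masks[m]);
        (*probes)++;
        /* A real table would hash 'masked'; a scan keeps the sketch short. */
        for (size_t f = 0; f < n_flows; f++)
            if (flows[f].mask_idx == m &&
                key_equal(&flows[f].masked_key, &masked))
                return &flows[f];
    }
    return NULL;
}
```

In this model a packet whose flow was installed with the first mask needs one probe, while a packet matching the 20th mask needs twenty, which is the per-packet cost the mask cache is meant to avoid.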

----------------------------------------------------------------

The following changes since commit 2f55daa5464e8dfc8787ec863b6d1094522dbd69:

  net: stmmac: Support devicetree configs for mcast and ucast filter entries (2014-07-31 15:31:02 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch.git net_next_ovs

for you to fetch changes up to 4955f0f9cbefa73577cd30ec262538ffc73dd4c2:

  openvswitch: Introduce flow mask cache. (2014-07-31 15:49:55 -0700)

----------------------------------------------------------------
Pravin B Shelar (3):
      openvswitch: Move table destroy to dp-rcu callback.
      openvswitch: Convert mask list into mask array.
      openvswitch: Introduce flow mask cache.

 net/openvswitch/datapath.c   |   8 +-
 net/openvswitch/flow.h       |   1 -
 net/openvswitch/flow_table.c | 293 +++++++++++++++++++++++++++++++++++++------
 net/openvswitch/flow_table.h |  21 +++-
 4 files changed, 275 insertions(+), 48 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Aug. 2, 2014, 10:16 p.m. UTC | #1
From: Pravin B Shelar <pshelar@nicira.com>
Date: Thu, 31 Jul 2014 16:57:37 -0700

> Following patch adds mask cache so that we do not need to iterate over
> all entries in mask list on every packet. We have seen good performance
> improvement with this patch.

How much have you thought about the DoS'ability of openvswitch's
datastructures?

What are the upper bounds for performance of packet switching?

To be quite honest, a lot of the openvswitch data structures
adjustments that hit my inbox seem to me to only address specific
situations that specific user configurations have run into.

It took us two decades, but we ripped out the ipv4 routing cache
because external entities could provoke unreasonable worst case
behavior in routing lookups.

With openvswitch you guys have a unique opportunity to try and design
all of your features such that they absolutely can use scalable
datastructures from the beginning that provide reasonable performance
in the common case and precise upper bounds for any possible sequence
of incoming packets.

New features tend to blind the developer to the eventual long term
ramifications on performance.  Would you add a new feature if you
could know ahead of time that you'll never be able to design a
datastructure which supports that feature and is not DoS'able by a
remote entity?

Pravin B Shelar Aug. 3, 2014, 7:20 p.m. UTC | #2
On Sat, Aug 2, 2014 at 3:16 PM, David Miller <davem@davemloft.net> wrote:
> From: Pravin B Shelar <pshelar@nicira.com>
> Date: Thu, 31 Jul 2014 16:57:37 -0700
>
>> Following patch adds mask cache so that we do not need to iterate over
>> all entries in mask list on every packet. We have seen good performance
>> improvement with this patch.
>
> How much have you thought about the DoS'ability of openvswitch's
> datastructures?
>
This cache is populated by flow lookup in the fast path. The mask cache is
fixed in size; neither userspace nor the number of packets can change it.
Memory is statically allocated and there is no garbage collection, so a DoS
attack cannot exploit this cache to increase the OVS memory footprint.

> What are the upper bounds for performance of packet switching?
>
The cache is keyed on the packet RSS hash. In the worst case, the cache adds
one extra flow-table lookup for a flow when the RSS hash matches but the
packet belongs to a different flow (a hash collision).
It is designed to be a lightweight, stateless cache (it does not take any
reference on other data structures) so that it has the least impact on the
DoS'ability of Open vSwitch.
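
The behaviour described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the code from the patch: `mask_cache`, `cached_lookup`, `toy_probe`, the 256-slot size and the `probe_fn` callback are all hypothetical. The cache is a fixed array indexed by the packet hash; a slot remembers which mask matched last time, and a stale or colliding slot costs one wasted probe before the full scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch; names and sizes are not taken from the patch. */
#define CACHE_SLOTS 256        /* fixed size, power of two */
#define NO_MASK     UINT32_MAX

struct mask_cache_entry {
    uint32_t pkt_hash;  /* packet hash (e.g. RSS) that filled this slot */
    uint32_t mask_idx;  /* mask that matched last time, or NO_MASK */
};

struct mask_cache {
    struct mask_cache_entry slot[CACHE_SLOTS]; /* never grows */
};

/* Probe the flow table using only mask 'mask_idx'; nonzero on match. */
typedef int (*probe_fn)(uint32_t mask_idx, const void *pkt);

static void cache_init(struct mask_cache *c)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        c->slot[i].pkt_hash = 0;
        c->slot[i].mask_idx = NO_MASK;
    }
}

/* Returns the matching mask index or NO_MASK; *probes counts table probes. */
static uint32_t cached_lookup(struct mask_cache *c, uint32_t pkt_hash,
                              const void *pkt, uint32_t n_masks,
                              probe_fn probe, size_t *probes)
{
    struct mask_cache_entry *e = &c->slot[pkt_hash & (CACHE_SLOTS - 1)];

    assert(probes != NULL);
    *probes = 0;

    /* Fast path: same hash seen before, try the remembered mask first. */
    if (e->pkt_hash == pkt_hash && e->mask_idx < n_masks) {
        (*probes)++;
        if (probe(e->mask_idx, pkt))
            return e->mask_idx;
        /* Hash collision or stale entry: that was the one extra probe. */
    }

    /* Slow path: full scan over all masks, then refresh the slot. */
    for (uint32_t m = 0; m < n_masks; m++) {
        (*probes)++;
        if (probe(m, pkt)) {
            e->pkt_hash = pkt_hash;
            e->mask_idx = m;
            return m;
        }
    }
    return NO_MASK;
}

/* Toy probe for demonstration: the "packet" is simply the index of the
 * one mask it matches. */
static int toy_probe(uint32_t mask_idx, const void *pkt)
{
    return *(const uint32_t *)pkt == mask_idx;
}
```

With 20 masks and `toy_probe`, the first packet of a stream matching the last mask costs 20 probes and every later packet with the same hash costs one. A different flow that happens to carry the same hash pays the wasted probe plus its own scan, which is the "one extra lookup" worst case.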

> To be quite honest, a lot of the openvswitch data structures
> adjustments that hit my inbox seem to me to only address specific
> situations that specific user configurations have run into.
>

Overall, OVS DoS defense has improved since the introduction of mega-flows.
A recently introduced OVS feature allows userspace to set up multiple
sockets for upcall processing on a given vport. This adds fairness by
separating upcalls from different flows onto different sockets. Userspace
processes upcalls from these sockets in round-robin fashion, which helps to
keep one port from monopolizing the upcall communication path.

I agree there is scope for improving DoS defense and we will keep
working on this issue.

> It took us two decades, but we ripped out the ipv4 routing cache
> because external entities could provoke unreasonable worst case
> behavior in routing lookups.
>
> With openvswitch you guys have a unique opportunity to try and design
> all of your features such that they absolutely can use scalable
> datastructures from the beginning that provide reasonable performance
> in the common case and precise upper bounds for any possible sequence
> of incoming packets.
>
> New features tend to blind the developer to the eventual long term
> ramifications on performance.  Would you add a new feature if you
> could know ahead of time that you'll never be able to design a
> datastructure which supports that feature and is not DoS'able by a
> remote entity?
>
David Miller Aug. 4, 2014, 4:21 a.m. UTC | #3
From: Pravin Shelar <pshelar@nicira.com>
Date: Sun, 3 Aug 2014 12:20:32 -0700

> On Sat, Aug 2, 2014 at 3:16 PM, David Miller <davem@davemloft.net> wrote:
>> From: Pravin B Shelar <pshelar@nicira.com>
>> Date: Thu, 31 Jul 2014 16:57:37 -0700
>>
>>> Following patch adds mask cache so that we do not need to iterate over
>>> all entries in mask list on every packet. We have seen good performance
>>> improvement with this patch.
>>
>> How much have you thought about the DoS'ability of openvswitch's
>> datastructures?
>>
> This cache is populated by flow lookup in the fast path. The mask cache is
> fixed in size; neither userspace nor the number of packets can change it.
> Memory is statically allocated and there is no garbage collection, so a DoS
> attack cannot exploit this cache to increase the OVS memory footprint.

An attacker can construct a packet sequence such that every mask cache
lookup misses, making the cache effectively useless.
Pravin B Shelar Aug. 4, 2014, 7:35 p.m. UTC | #4
On Sun, Aug 3, 2014 at 9:21 PM, David Miller <davem@davemloft.net> wrote:
> From: Pravin Shelar <pshelar@nicira.com>
> Date: Sun, 3 Aug 2014 12:20:32 -0700
>
>> On Sat, Aug 2, 2014 at 3:16 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Pravin B Shelar <pshelar@nicira.com>
>>> Date: Thu, 31 Jul 2014 16:57:37 -0700
>>>
>>>> Following patch adds mask cache so that we do not need to iterate over
>>>> all entries in mask list on every packet. We have seen good performance
>>>> improvement with this patch.
>>>
>>> How much have you thought about the DoS'ability of openvswitch's
>>> datastructures?
>>>
>> This cache is populated by flow lookup in the fast path. The mask cache is
>> fixed in size; neither userspace nor the number of packets can change it.
>> Memory is statically allocated and there is no garbage collection, so a DoS
>> attack cannot exploit this cache to increase the OVS memory footprint.
>
> An attacker can construct a packet sequence such that every mask cache
> lookup misses, making the cache effectively useless.

Yes, but it does work in normal traffic situations. I have posted
performance numbers in the cover letter.
Under a DoS attack, as you said, the attacker needs to build a sequence of
packets to make the cache ineffective, which results in a cache miss and a
full in-kernel flow lookup. So with this cache there is one more lookup
under DoS, but this is not very different from current OVS anyway.
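
The "one more lookup" argument can be written down as a small cost model. The C sketch below is an assumption made for illustration, not a measurement: it charges one flow-table probe for a cache hit, and on a miss it pessimistically charges that wasted probe plus a full scan of `scan_probes` probes. The function name `expected_probes` is hypothetical.

```c
#include <assert.h>

/* Simplified cost model (an assumption, not a measurement):
 * a hit costs one probe; a miss costs the wasted probe plus a
 * full scan of 'scan_probes' probes. */
static double expected_probes(double hit_rate, double scan_probes)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return hit_rate * 1.0 + (1.0 - hit_rate) * (1.0 + scan_probes);
}
```

For a scan of 20 masks, `expected_probes(1.0, 20.0)` gives 1 probe per packet, while an attacker who forces every lookup to miss drives it to `expected_probes(0.0, 20.0)`, that is 21 probes, one more than the uncached scan.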
David Miller Aug. 4, 2014, 7:42 p.m. UTC | #5
From: Pravin Shelar <pshelar@nicira.com>
Date: Mon, 4 Aug 2014 12:35:59 -0700

> On Sun, Aug 3, 2014 at 9:21 PM, David Miller <davem@davemloft.net> wrote:
>> From: Pravin Shelar <pshelar@nicira.com>
>> Date: Sun, 3 Aug 2014 12:20:32 -0700
>>
>>> On Sat, Aug 2, 2014 at 3:16 PM, David Miller <davem@davemloft.net> wrote:
>>>> From: Pravin B Shelar <pshelar@nicira.com>
>>>> Date: Thu, 31 Jul 2014 16:57:37 -0700
>>>>
>>>>> Following patch adds mask cache so that we do not need to iterate over
>>>>> all entries in mask list on every packet. We have seen good performance
>>>>> improvement with this patch.
>>>>
>>>> How much have you thought about the DoS'ability of openvswitch's
>>>> datastructures?
>>>>
>>> This cache is populated by flow lookup in the fast path. The mask cache is
>>> fixed in size; neither userspace nor the number of packets can change it.
>>> Memory is statically allocated and there is no garbage collection, so a DoS
>>> attack cannot exploit this cache to increase the OVS memory footprint.
>>
>> An attacker can construct a packet sequence such that every mask cache
>> lookup misses, making the cache effectively useless.
> 
> Yes, but it does work in normal traffic situations.

You're basically just reiterating the point I'm trying to make.

Your caches are designed for specific configuration and packet traffic
pattern cases, and can be easily duped into a worst-case performance
scenario by an attacker.

Caches, basically, do not work on the real internet.

Make the fundamental core data structures fast and scalable enough,
rather than bolting caches (which are basically hacks) on top every
time they don't perform to your expectations.

What if you made the full flow lookup fundamentally faster?  Then an
attacker can't do anything about that.  That's a real performance
improvement, one that sustains arbitrary traffic patterns.
Alexei Starovoitov Aug. 6, 2014, 10:55 p.m. UTC | #6
On Mon, Aug 4, 2014 at 12:42 PM, David Miller <davem@davemloft.net> wrote:
> From: Pravin Shelar <pshelar@nicira.com>
> Date: Mon, 4 Aug 2014 12:35:59 -0700
>
>> On Sun, Aug 3, 2014 at 9:21 PM, David Miller <davem@davemloft.net> wrote:
>>> An attacker can construct a packet sequence such that every mask cache
>>> lookup misses, making the cache effectively useless.
>>
>> Yes, but it does work in normal traffic situations.
>
> You're basically just reiterating the point I'm trying to make.
>
> Your caches are designed for specific configuration and packet traffic
> pattern cases, and can be easily duped into a worst-case performance
> scenario by an attacker.
>
> Caches, basically, do not work on the real internet.
>
> Make the fundamental core data structures fast and scalable enough,
> rather than bolting caches (which are basically hacks) on top every
> time they don't perform to your expectations.
>
> What if you made the full flow lookup fundamentally faster?  Then an

I suspect that the flow lookup in OVS is as fast as it can be, yet
OVS is still DoS-able, since the kernel datapath (flow lookup and action)
is considered to be a first-level cache for userspace. By design, a flow
miss is always punted to userspace. Therefore a netperf TCP_CRR test
from a VM is not cheap for the host userspace component. Mega-flows and
multiple upcall pids partially address this fundamental problem.

Consider a simple distributed virtual bridge with VMs distributed across
multiple hosts. A mega-flow mask that selects dmac can solve the CRR case
for well-behaved VMs, but a rogue VM that spams random dmacs will keep
taxing host userspace. So we'd need to add another flow mask to match the
rest of the traffic unconditionally and drop it.

Now consider a virtual bridge-router-bridge topology (two subnets and a
router, using OpenStack names). Since VMs on the same host may be in
different subnets, their MACs can be the same, so the 'mega-flow mask
dmac' approach won't work and the CRR test again gets costly for
userspace. We can try to use an 'in_port + dmac' mask, but as the network
topology grows the flow mask tricks get out of hand.

The situation is worse when OVS works as a gateway and receives internet
traffic. Since a flow miss goes to userspace, a remote attacker can find a
way to saturate the gateway CPU with certain traffic.

I guess none of this is new to OVS and there is probably a solution
that I don't know about.
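
To make the dmac and 'in_port + dmac' mask examples concrete, here is a minimal C sketch. It uses a hypothetical, reduced key (`l2_key` with only `in_port`, `dmac` and `smac`); the real OVS flow key has many more fields, and `mask_key` and `same_megaflow` are illustrative names. Two packets fall into the same mega-flow under a mask when their masked keys are equal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced flow key; the real OVS key has many more fields. */
struct l2_key {
    uint32_t in_port;
    uint8_t  dmac[6];
    uint8_t  smac[6];
};

/* out = key & mask, field by field. */
static struct l2_key mask_key(const struct l2_key *key,
                              const struct l2_key *mask)
{
    struct l2_key out;

    assert(key != NULL && mask != NULL);
    memset(&out, 0, sizeof(out));
    out.in_port = key->in_port & mask->in_port;
    for (int i = 0; i < 6; i++) {
        out.dmac[i] = key->dmac[i] & mask->dmac[i];
        out.smac[i] = key->smac[i] & mask->smac[i];
    }
    return out;
}

/* Two packets share a mega-flow under 'mask' if their masked keys match. */
static int same_megaflow(const struct l2_key *a, const struct l2_key *b,
                         const struct l2_key *mask)
{
    struct l2_key ma = mask_key(a, mask);
    struct l2_key mb = mask_key(b, mask);

    return ma.in_port == mb.in_port &&
           memcmp(ma.dmac, mb.dmac, 6) == 0 &&
           memcmp(ma.smac, mb.smac, 6) == 0;
}
```

Under a dmac-only mask, packets to the same dmac share one mega-flow regardless of source MAC or ingress port, so two subnets reusing a MAC cannot be told apart. Setting the `in_port` bits in the mask separates them. In both cases every new random dmac produces a new masked key, and therefore a flow miss.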
Nicolas Dichtel Aug. 13, 2014, 1:34 p.m. UTC | #7
On 04/08/2014 21:42, David Miller wrote:
[snip]
> Caches, basically, do not work on the real internet.
A bit late, but I completely agree!