diff mbox series

[ovs-dev] ofproto-dpif-xlate: Don't reinstall removed XC_LEARN rule

Message ID 20230720150607.2544072-1-mkp@redhat.com
State Changes Requested
Headers show
Series [ovs-dev] ofproto-dpif-xlate: Don't reinstall removed XC_LEARN rule | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/intel-ovs-compilation fail test: fail

Commit Message

Mike Pattrick July 20, 2023, 3:06 p.m. UTC
When the a revalidator thread is updating statistics for an XC_LEARN
xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
The revalidator will update stats for rules even if they are in a
removed state or marked as invisible. However, ofproto_flow_mod_learn
will detect if a flow has been removed and re-add it in that case. This
can result in an old learn action replacing the new learn action that
had replaced it in the first place.

This change adds a new parameter to ofproto_flow_mod_learn allowing the
caller to specify an action to take if temp_rule is removed. If it is
removed and the caller is xlate_push_stats_entry, then a new rule will
not be replaced.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2213892
Signed-off-by: Mike Pattrick <mkp@redhat.com>
---
 ofproto/ofproto-dpif-xlate-cache.c |  2 +-
 ofproto/ofproto-dpif-xlate.c       |  2 +-
 ofproto/ofproto-dpif.c             |  2 +-
 ofproto/ofproto-provider.h         |  6 ++++--
 ofproto/ofproto.c                  | 21 ++++++++++++++-------
 5 files changed, 21 insertions(+), 12 deletions(-)

Comments

Eelco Chaudron July 21, 2023, 12:37 p.m. UTC | #1
On 20 Jul 2023, at 17:06, Mike Pattrick wrote:

> When the a revalidator thread is updating statistics for an XC_LEARN
> xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
> The revalidator will update stats for rules even if they are in a
> removed state or marked as invisible. However, ofproto_flow_mod_learn
> will detect if a flow has been removed and re-add it in that case. This
> can result in an old learn action replacing the new learn action that
> had replaced it in the first place.
>
> This change adds a new parameter to ofproto_flow_mod_learn allowing the
> caller to specify an action to take if temp_rule is removed. If it is
> removed and the caller is xlate_push_stats_entry, then a new rule will
> not be replaced.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2213892
> Signed-off-by: Mike Pattrick <mkp@redhat.com>

Hi Mike,

Thanks for this, patch! I went over the code, and the fix looks fine to me. However, due to being on PTO in a couple of hours, I did not fully go over the actual LEARN action implementation to find potential side effects. Maybe someone else has time to look at the potential side effects of this change.

I did run all the dp tests, with and without ASAN, and no problems were found.

With the above side note,

Acked-by: Eelco Chaudron <echaudro@redhat.com>

Cheers,

Eelco
Ilya Maximets July 28, 2023, 11:49 a.m. UTC | #2
On 7/20/23 17:06, Mike Pattrick wrote:
> When the a revalidator thread is updating statistics for an XC_LEARN
> xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
> The revalidator will update stats for rules even if they are in a
> removed state or marked as invisible. However, ofproto_flow_mod_learn
> will detect if a flow has been removed and re-add it in that case. This
> can result in an old learn action replacing the new learn action that
> had replaced it in the first place.
> 
> This change adds a new parameter to ofproto_flow_mod_learn allowing the
> caller to specify an action to take if temp_rule is removed. If it is
> removed and the caller is xlate_push_stats_entry, then a new rule will
> not be replaced.

Hi, Mike.  That's an interesting issue, though I'm not sure that is a
correct way to fix it.

It is important to re-install removed learned flows during stats update.
The logic is the following, AFAIU:

1. Packet hits the OF rule (1) with a learning action.
2. Handler translates OF and creates a datapath flow for it, creating
   a learned rule (2) in the process and filling the translation cache.
3. Datapath flow for the rule (1) is installed into the datapath.
4. Time goes by and timeout on the learned rule (2) hits. Learned rule
   is removed now.
5. A packet hits the datapath flow for rule (1).
6. Revalidator updates flow stats for the rule (1).
7. Since the packet hit the learning rule (1), the rule (2) has to be
   added back.

IIUC, the step 7 will not happen with the change proposed in this patch.
Flow will not be re-translated, so the only way to re-install it is
to execute side effects when we see stats update on a parent datapath
flow in the revalidator.

One potential way to avoid breaking this scenario is to check the
removal reason, but it's not bulletproof either as we do still need
to reinstate the flow even if it was manually deleted or evicted.

It also seems like we may hold the reference to this deleted rule for a
long time in the cache until the parent datapath flow is deleted.


What is triggering the deletion of the learned rule in your case?
Is it deletion or modification of the parent rule?  And does it have
'delete_learned' flag?

Best regards, Ilya Maximets.
Mike Pattrick July 28, 2023, 9:13 p.m. UTC | #3
On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets <i.maximets@ovn.org> wrote:

> On 7/20/23 17:06, Mike Pattrick wrote:
> > When the a revalidator thread is updating statistics for an XC_LEARN
> > xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
> > The revalidator will update stats for rules even if they are in a
> > removed state or marked as invisible. However, ofproto_flow_mod_learn
> > will detect if a flow has been removed and re-add it in that case. This
> > can result in an old learn action replacing the new learn action that
> > had replaced it in the first place.
> >
> > This change adds a new parameter to ofproto_flow_mod_learn allowing the
> > caller to specify an action to take if temp_rule is removed. If it is
> > removed and the caller is xlate_push_stats_entry, then a new rule will
> > not be replaced.
>
>
This is a slightly unrelated point, but while reviewing some code and test
results I think I found a different issue. rule_expire() considers rule
modified time for hard_timeout and the man page says hard_timeout "Causes
the flow to expire after the given number of seconds, regardless of
activity", however, modified time is updated in
ofproto_flow_mod_learn_refresh() even if the rule isn't changed. Due to the
current behaviour, I don't think hard_timeout could actually ever trigger
on a learned action if an idle timeout is set or if traffic is ongoing.


> Hi, Mike.  That's an interesting issue, though I'm not sure that is a
> correct way to fix it.
>
> It is important to re-install removed learned flows during stats update.
> The logic is the following, AFAIU:
>
> 1. Packet hits the OF rule (1) with a learning action.
> 2. Handler translates OF and creates a datapath flow for it, creating
>    a learned rule (2) in the process and filling the translation cache.
> 3. Datapath flow for the rule (1) is installed into the datapath.
> 4. Time goes by and timeout on the learned rule (2) hits. Learned rule
>    is removed now.
> 5. A packet hits the datapath flow for rule (1).
> 6. Revalidator updates flow stats for the rule (1).
> 7. Since the packet hit the learning rule (1), the rule (2) has to be
>    added back.
>
> IIUC, the step 7 will not happen with the change proposed in this patch.
> Flow will not be re-translated, so the only way to re-install it is
> to execute side effects when we see stats update on a parent datapath
> flow in the revalidator.
>
> One potential way to avoid breaking this scenario is to check the
> removal reason, but it's not bulletproof either as we do still need
> to reinstate the flow even if it was manually deleted or evicted.
>

I added a log to check the removal reason in the XC_LEARN stats update, and
the reason always seems to be eviction. I wasn't able to catch an idle or
hard removal in that function, though I'm sure I just wasn't winning the
race.

But in this example if we hit 7, and don't re-add the rule, what is the
real penalty? The next packet will have to upcall and we would process the
learn action from scratch, but that may be what the user wants as indicated
by setting a timeout. The alternative is we re-install a flow that may be
different then the flow that the user actually wants installed.

Isn't there a risk of different revalidator threads going out of sync with
different versions of a learned rule in their xcache?


> It also seems like we may hold the reference to this deleted rule for a
> long time in the cache until the parent datapath flow is deleted.
>
>
> What is triggering the deletion of the learned rule in your case?
> Is it deletion or modification of the parent rule?  And does it have
> 'delete_learned' flag?
>

In the case that I was debugging, the parent learn action was not deleted.
Instead, new traffic from a different port was supposed to cause the learn
action to modify the existing rule. But instead we see the rule modified
with the correct information, and then immediately reverted with old stale
information.


>
> Best regards, Ilya Maximets.
>
>
Thank you,
Mike
Ilya Maximets July 28, 2023, 10:32 p.m. UTC | #4
On 7/28/23 23:13, Mike Pattrick wrote:
> On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
> 
>     On 7/20/23 17:06, Mike Pattrick wrote:
>     > When the a revalidator thread is updating statistics for an XC_LEARN
>     > xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
>     > The revalidator will update stats for rules even if they are in a
>     > removed state or marked as invisible. However, ofproto_flow_mod_learn
>     > will detect if a flow has been removed and re-add it in that case. This
>     > can result in an old learn action replacing the new learn action that
>     > had replaced it in the first place.
>     >
>     > This change adds a new parameter to ofproto_flow_mod_learn allowing the
>     > caller to specify an action to take if temp_rule is removed. If it is
>     > removed and the caller is xlate_push_stats_entry, then a new rule will
>     > not be replaced.
> 
>  
> This is a slightly unrelated point, but while reviewing some code and test
> results I think I found a different issue. rule_expire() considers rule
> modified time for hard_timeout and the man page says hard_timeout "Causes
> the flow to expire after the given number of seconds, regardless of activity",
> however, modified time is updated in ofproto_flow_mod_learn_refresh() even if
> the rule isn't changed. Due to the current behaviour, I don't think hard_timeout
> could actually ever trigger on a learned action if an idle timeout is set or if
> traffic is ongoing.

The 'modified' time of a learned rule is updated in this function, because
the packet hit the learn() rule.  If the learned rule didn't exist, we
would create it at this moment.  Instead of re-creating, we just update
the 'modified' time.  If packets will stop hitting the learn() rule, but
will continue to flow through the learned one, the learned one will be
removed when the hard timeout fires up.  That is a correct behavior.

>  
> 
>     Hi, Mike.  That's an interesting issue, though I'm not sure that is a
>     correct way to fix it.
> 
>     It is important to re-install removed learned flows during stats update.
>     The logic is the following, AFAIU:
> 
>     1. Packet hits the OF rule (1) with a learning action.
>     2. Handler translates OF and creates a datapath flow for it, creating
>        a learned rule (2) in the process and filling the translation cache.
>     3. Datapath flow for the rule (1) is installed into the datapath.
>     4. Time goes by and timeout on the learned rule (2) hits. Learned rule
>        is removed now.
>     5. A packet hits the datapath flow for rule (1).
>     6. Revalidator updates flow stats for the rule (1).
>     7. Since the packet hit the learning rule (1), the rule (2) has to be
>        added back.
> 
>     IIUC, the step 7 will not happen with the change proposed in this patch.
>     Flow will not be re-translated, so the only way to re-install it is
>     to execute side effects when we see stats update on a parent datapath
>     flow in the revalidator.
> 
>     One potential way to avoid breaking this scenario is to check the
>     removal reason, but it's not bulletproof either as we do still need
>     to reinstate the flow even if it was manually deleted or evicted.
> 
> 
> I added a log to check the removal reason in the XC_LEARN stats update, and the
> reason always seems to be eviction. I wasn't able to catch an idle or hard removal
> in that function, though I'm sure I just wasn't winning the race.
> 
> But in this example if we hit 7, and don't re-add the rule, what is the real penalty?
> The next packet will have to upcall and we would process the learn action from scratch,
> but that may be what the user wants as indicated by setting a timeout.

Not really.  If the learned flow is removed, we need a packet to hit a
learn() rule, which is a different rule, in order for traffic to continue
to flow.  Let's say we implement a firewall that allows connections only
from A to B.  So, A can initiate connection to B, but B can't initiate
connection to A.  Once A sends tcp SYN, we learn a rule that allows traffic
from B to A on these particular ports.  Now B can send traffic back.
Once the learned flow is removed, A will have to establish a new connection
in order to re-learn the B->A rule.  So, not just an upcall.  May not be a
best example, but should work as an illustration.

> The alternative
> is we re-install a flow that may be different then the flow that the user actually
> wants installed.

If the flow is modified, it means that there was a traffic on a learn()
action and we have to learn that specific rule.  If it was already removed,
we have to reinstate it.  In general, the logic in the current OVS code
seems correct in this particular place.

> 
> Isn't there a risk of different revalidator threads going out of sync with different
> versions of a learned rule in their xcache?
> 
> 
>     It also seems like we may hold the reference to this deleted rule for a
>     long time in the cache until the parent datapath flow is deleted.
> 
> 
>     What is triggering the deletion of the learned rule in your case?
>     Is it deletion or modification of the parent rule?  And does it have
>     'delete_learned' flag?
> 
> 
> In the case that I was debugging, the parent learn action was not deleted. Instead,
> new traffic from a different port was supposed to cause the learn action to modify
> the existing rule. But instead we see the rule modified with the correct information,
> and then immediately reverted with old stale information.

The problem here is that we can't define which one is 'correct'.
It's a race where both values are correct, but you like one better
for some reason.  New rules are only learned if the parent datapath
flow that has a learn() action seen packets after the previous stats
check.

What you're describing is that you have a single rule with a learn
action that is capable of re-writing the same learned rule depending
on what kind of packet actually hits it.  Is that correct?

I would consider this kind of setup as fairly fragile and would not
recommend using it in general.

However, the scenario is likely like this:

1. Upcall on a rule with learn() action.
   New datapath flow (1) is installed.  This flow has a match on
   all the fields used in a learn action and ukey contains the
   translation cache with a rule that it produces.
   The new learned rule is installed from this.

2. Upcall with a different packet that doesn't match the first
   datapath flow (1), but on the same rule with a learn() action.
   New datapath flow (2) is installed.  This flow has a match on
   all the fields used in a learn action (they are different from
   the first one, since we have an upcall) and ukey contains the
   translation cache with a rule that it produces.  Let's say the
   learned rule has the same match, but different actions.
   This overwrites the previously installed learned rule.

3. Both datapath flows (1) and (2) get some traffic.

4. Revalidation starts.

5. Revalidator A dumps datapath flow (1).
   Revalidator B dumps datapath flow (2).

6. Revalidators A and B are trying to revalidate these datapath flows
   at the same time.  Both datapath flows had traffic since the
   last revalidation.  Both revalidators are trying to update the
   learned rule, which is the same for both of them.

7. One of them does that first, the other one ends up with a rule in a
   REMOVED state, since it was just replaced by the other revalidator.


There is no correct or incorrect rule in this situation.  Both
of them are correct, because both datapath flows have seen packets
since the previous revalidation.  We can't guarantee which one will
end up being written the last and therefore will take effect.
if we refuse to reinstate the REMOVED rule, we still can't say which
one will be active in the end.

What can resolve the race is the next revalidation cycle.
Once one of the datapath flows will stop receiving traffic, there
should no longer be a race condition and the next revalidation will
set the "correct" rule, i.e. the rule that corresponds to a learning
datapath flow that seen some packets last.  But until both flows have
traffic, there is no "correct" rule and the result will be random,
regardless of what we choose to do.


One issue that I see in the current code is that it looks like the table
version is not getting updated on learned flow refresh and hence the next
revalidation may not actually be triggered.  In this case the random
result may stay for much longer than we would like.

The suspicious code (and the potential fix) is the following:

@@ -5548,22 +5558,15 @@ ofproto_flow_mod_learn_revert(struct ofproto_flow_mod *ofm)
 }
 
 enum ofperr
-ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm,
-                              struct ofproto *orig_ofproto)
+ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm)
     OVS_REQUIRES(ofproto_mutex)
 {
     struct rule *rule = rule_collection_rules(&ofm->new_rules)[0];
     enum ofperr error = 0;
 
-    /* If learning on a different bridge, must bump its version
-     * number and flush connmgr afterwards. */
-    if (rule->ofproto != orig_ofproto) {
-        ofproto_bump_tables_version(rule->ofproto);
-    }
+    ofproto_bump_tables_version(rule->ofproto);
     error = ofproto_flow_mod_finish(rule->ofproto, ofm, NULL);
-    if (rule->ofproto != orig_ofproto) {
-        ofmonitor_flush(rule->ofproto->connmgr);
-    }
+    ofmonitor_flush(rule->ofproto->connmgr);
 
     return error;
 }
---

We should actually bump the version and request a revalidation in
any case, not only when the change is in a different bridge.

Could you try that in your setup?

The issue appears to come from the commit:
  1f4a89336682 ("ofproto: Refactor packet_out handling.")

There is a slight chance of launching ourselves into an infinite
revlidation loop, I guess.  Especially if both learning datapath
flows will continue to receive the traffic, but there is a minimal
revalidation interval, IIRC.  Not sure if it applies in this case.

Best regards, Ilya Maximets.
Mike Pattrick July 31, 2023, 5:34 a.m. UTC | #5
On Fri, Jul 28, 2023 at 6:31 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 7/28/23 23:13, Mike Pattrick wrote:
> > On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
> >
> >     On 7/20/23 17:06, Mike Pattrick wrote:
> >     > When the a revalidator thread is updating statistics for an XC_LEARN
> >     > xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
> >     > The revalidator will update stats for rules even if they are in a
> >     > removed state or marked as invisible. However, ofproto_flow_mod_learn
> >     > will detect if a flow has been removed and re-add it in that case. This
> >     > can result in an old learn action replacing the new learn action that
> >     > had replaced it in the first place.
> >     >
> >     > This change adds a new parameter to ofproto_flow_mod_learn allowing the
> >     > caller to specify an action to take if temp_rule is removed. If it is
> >     > removed and the caller is xlate_push_stats_entry, then a new rule will
> >     > not be replaced.
> >
> >
> > This is a slightly unrelated point, but while reviewing some code and test
> > results I think I found a different issue. rule_expire() considers rule
> > modified time for hard_timeout and the man page says hard_timeout "Causes
> > the flow to expire after the given number of seconds, regardless of activity",
> > however, modified time is updated in ofproto_flow_mod_learn_refresh() even if
> > the rule isn't changed. Due to the current behaviour, I don't think hard_timeout
> > could actually ever trigger on a learned action if an idle timeout is set or if
> > traffic is ongoing.
>
> The 'modified' time of a learned rule is updated in this function, because
> the packet hit the learn() rule.  If the learned rule didn't exist, we
> would create it at this moment.  Instead of re-creating, we just update
> the 'modified' time.  If packets will stop hitting the learn() rule, but
> will continue to flow through the learned one, the learned one will be
> removed when the hard timeout fires up.  That is a correct behavior.
>

I can accept this is intended, but it doesn't seem like it would
operate in that way based on the man page alone.

>
> >
> >
> >     Hi, Mike.  That's an interesting issue, though I'm not sure that is a
> >     correct way to fix it.
> >
> >     It is important to re-install removed learned flows during stats update.
> >     The logic is the following, AFAIU:
> >
> >     1. Packet hits the OF rule (1) with a learning action.
> >     2. Handler translates OF and creates a datapath flow for it, creating
> >        a learned rule (2) in the process and filling the translation cache.
> >     3. Datapath flow for the rule (1) is installed into the datapath.
> >     4. Time goes by and timeout on the learned rule (2) hits. Learned rule
> >        is removed now.
> >     5. A packet hits the datapath flow for rule (1).
> >     6. Revalidator updates flow stats for the rule (1).
> >     7. Since the packet hit the learning rule (1), the rule (2) has to be
> >        added back.
> >
> >     IIUC, the step 7 will not happen with the change proposed in this patch.
> >     Flow will not be re-translated, so the only way to re-install it is
> >     to execute side effects when we see stats update on a parent datapath
> >     flow in the revalidator.
> >
> >     One potential way to avoid breaking this scenario is to check the
> >     removal reason, but it's not bulletproof either as we do still need
> >     to reinstate the flow even if it was manually deleted or evicted.
> >
> >
> > I added a log to check the removal reason in the XC_LEARN stats update, and the
> > reason always seems to be eviction. I wasn't able to catch an idle or hard removal
> > in that function, though I'm sure I just wasn't winning the race.
> >
> > But in this example if we hit 7, and don't re-add the rule, what is the real penalty?
> > The next packet will have to upcall and we would process the learn action from scratch,
> > but that may be what the user wants as indicated by setting a timeout.
>
> Not really.  If the learned flow is removed, we need a packet to hit a
> learn() rule, which is a different rule, in order for traffic to continue
> to flow.  Let's say we implement a firewall that allows connections only
> from A to B.  So, A can initiate connection to B, but B can't initiate
> connection to A.  Once A sends tcp SYN, we learn a rule that allows traffic
> from B to A on these particular ports.  Now B can send traffic back.
> Once the learned flow is removed, A will have to establish a new connection
> in order to re-learn the B->A rule.  So, not just an upcall.  May not be a
> best example, but should work as an illustration.
>
> > The alternative
> > is we re-install a flow that may be different then the flow that the user actually
> > wants installed.
>
> If the flow is modified, it means that there was a traffic on a learn()
> action and we have to learn that specific rule.  If it was already removed,
> we have to reinstate it.  In general, the logic in the current OVS code
> seems correct in this particular place.
>
> >
> > Isn't there a risk of different revalidator threads going out of sync with different
> > versions of a learned rule in their xcache?
> >
> >
> >     It also seems like we may hold the reference to this deleted rule for a
> >     long time in the cache until the parent datapath flow is deleted.
> >
> >
> >     What is triggering the deletion of the learned rule in your case?
> >     Is it deletion or modification of the parent rule?  And does it have
> >     'delete_learned' flag?
> >
> >
> > In the case that I was debugging, the parent learn action was not deleted. Instead,
> > new traffic from a different port was supposed to cause the learn action to modify
> > the existing rule. But instead we see the rule modified with the correct information,
> > and then immediately reverted with old stale information.
>
> The problem here is that we can't define which one is 'correct'.
> It's a race where both values are correct, but you like one better
> for some reason.  New rules are only learned if the parent datapath
> flow that has a learn() action seen packets after the previous stats
> check.
>
> What you're describing is that you have a single rule with a learn
> action that is capable of re-writing the same learned rule depending
> on what kind of packet actually hits it.  Is that correct?

In this case the rule is:

learn(table=20,hard_timeout=300,priority=1,cookie=0xcb684d24d14b14ad,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1

And an example of the learned rule is:

vlan_tci=0x0005/0x0fff,dl_dst=fc:16:3e:3e:5d:e0,actions=load:0->NXM_OF_VLAN_TCI[],load:0x3->NXM_NX_TUN_ID[],ou
tput:2

The scenario is that ingressing packets are supposed to change the
route that egress packets take, and that route can change over time.

What I see from some additional logging is:

1. traffic begins to ingress on port A
2. learn action installs a rule to route response to A
3. traffic begins to egress on port A
4. port A becomes invalid, no further traffic ingresses from port A
5. traffic ingresses on port B
6. learn action installs a rule to route response to B
7. revalidator replaces (6) with (2)

Eventually the egressing traffic will become exausted and the correct
rule is installed, but this can take several minutes. The rule
regression can even happen many milliseconds after learn installs the
correct rule and even after packets have flowed through the new rule.


> I would consider this kind of setup as fairly fragile and would not
> recommend using it in general.
>
> However, the scenario is likely like this:
>
> 1. Upcall on a rule with learn() action.
>    New datapath flow (1) is installed.  This flow has a match on
>    all the fields used in a learn action and ukey contains the
>    translation cache with a rule that it produces.
>    The new learned rule is installed from this.
>
> 2. Upcall with a different packet that doesn't match the first
>    datapath flow (1), but on the same rule with a learn() action.
>    New datapath flow (2) is installed.  This flow has a match on
>    all the fields used in a learn action (they are different from
>    the first one, since we have an upcall) and ukey contains the
>    translation cache with a rule that it produces.  Let's say the
>    learned rule has the same match, but different actions.
>    This overwrites the previously installed learned rule.
>
> 3. Both datapath flows (1) and (2) get some traffic.
>
> 4. Revalidation starts.
>
> 5. Revalidator A dumps datapath flow (1).
>    Revalidator B dumps datapath flow (2).
>
> 6. Revalidators A and B are trying to revalidate these datapath flows
>    at the same time.  Both datapath flows had traffic since the
>    last revalidation.  Both revalidators are trying to update the
>    learned rule, which is the same for both of them.
>
> 7. One of them does that first, the other one ends up with a rule in a
>    REMOVED state, since it was just replaced by the other revalidator.
>
>
> There is no correct or incorrect rule in this situation.  Both
> of them are correct, because both datapath flows have seen packets
> since the previous revalidation.  We can't guarantee which one will
> end up being written the last and therefore will take effect.
> if we refuse to reinstate the REMOVED rule, we still can't say which
> one will be active in the end.
>
> What can resolve the race is the next revalidation cycle.
> Once one of the datapath flows will stop receiving traffic, there
> should no longer be a race condition and the next revalidation will
> set the "correct" rule, i.e. the rule that corresponds to a learning
> datapath flow that seen some packets last.  But until both flows have
> traffic, there is no "correct" rule and the result will be random,
> regardless of what we choose to do.
>
>
> One issue that I see in the current code is that it looks like the table
> version is not getting updated on learned flow refresh and hence the next
> revalidation may not actually be triggered.  In this case the random
> result may stay for much longer than we would like.
>
> The suspicious code (and the potential fix) is the following:
>
> @@ -5548,22 +5558,15 @@ ofproto_flow_mod_learn_revert(struct ofproto_flow_mod *ofm)
>  }
>
>  enum ofperr
> -ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm,
> -                              struct ofproto *orig_ofproto)
> +ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm)
>      OVS_REQUIRES(ofproto_mutex)
>  {
>      struct rule *rule = rule_collection_rules(&ofm->new_rules)[0];
>      enum ofperr error = 0;
>
> -    /* If learning on a different bridge, must bump its version
> -     * number and flush connmgr afterwards. */
> -    if (rule->ofproto != orig_ofproto) {
> -        ofproto_bump_tables_version(rule->ofproto);
> -    }
> +    ofproto_bump_tables_version(rule->ofproto);
>      error = ofproto_flow_mod_finish(rule->ofproto, ofm, NULL);
> -    if (rule->ofproto != orig_ofproto) {
> -        ofmonitor_flush(rule->ofproto->connmgr);
> -    }
> +    ofmonitor_flush(rule->ofproto->connmgr);
>
>      return error;
>  }
> ---
>
> We should actually bump the version and request a revalidation in
> any case, not only when the change is in a different bridge.
>
> Could you try that in your setup?

That was actually my initial suspicion when investigating this bug,
but this change doesn't improve the buggy behaviour.

Thank you,
M

>
>
> The issue appears to come from the commit:
>   1f4a89336682 ("ofproto: Refactor packet_out handling.")
>
> There is a slight chance of launching ourselves into an infinite
> revlidation loop, I guess.  Especially if both learning datapath
> flows will continue to receive the traffic, but there is a minimal
> revalidation interval, IIRC.  Not sure if it applies in this case.
>
> Best regards, Ilya Maximets.
>
Ilya Maximets July 31, 2023, 11:21 a.m. UTC | #6
On 7/31/23 07:34, Mike Pattrick wrote:
> On Fri, Jul 28, 2023 at 6:31 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>>
>> On 7/28/23 23:13, Mike Pattrick wrote:
>>> On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
>>>
>>>     On 7/20/23 17:06, Mike Pattrick wrote:
>>>     > When the a revalidator thread is updating statistics for an XC_LEARN
>>>     > xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
>>>     > The revalidator will update stats for rules even if they are in a
>>>     > removed state or marked as invisible. However, ofproto_flow_mod_learn
>>>     > will detect if a flow has been removed and re-add it in that case. This
>>>     > can result in an old learn action replacing the new learn action that
>>>     > had replaced it in the first place.
>>>     >
>>>     > This change adds a new parameter to ofproto_flow_mod_learn allowing the
>>>     > caller to specify an action to take if temp_rule is removed. If it is
>>>     > removed and the caller is xlate_push_stats_entry, then a new rule will
>>>     > not be replaced.
>>>
>>>
>>> This is a slightly unrelated point, but while reviewing some code and test
>>> results I think I found a different issue. rule_expire() considers rule
>>> modified time for hard_timeout and the man page says hard_timeout "Causes
>>> the flow to expire after the given number of seconds, regardless of activity",
>>> however, modified time is updated in ofproto_flow_mod_learn_refresh() even if
>>> the rule isn't changed. Due to the current behaviour, I don't think hard_timeout
>>> could actually ever trigger on a learned action if an idle timeout is set or if
>>> traffic is ongoing.
>>
>> The 'modified' time of a learned rule is updated in this function, because
>> the packet hit the learn() rule.  If the learned rule didn't exist, we
>> would create it at this moment.  Instead of re-creating, we just update
>> the 'modified' time.  If packets will stop hitting the learn() rule, but
>> will continue to flow through the learned one, the learned one will be
>> removed when the hard timeout fires up.  That is a correct behavior.
>>
> 
> I can accept this is intended, but it doesn't seem like it would
> operate in that way based on the man page alone.

It's a generic characteristic of an OpenFlow rule.  "activity" means traffic
matching it.  learn() action is triggering a flow modification equal to a
strict mod-flow, as described in the ovs-actions man page.  And ovs-ofctl page
mentions that "mod-flows restarts the hard_timeout".

Maybe it can be clarified somehow better, I'm not sure.  Feel free to submit
a patch.

> 
>>
>>>
>>>
>>>     Hi, Mike.  That's an interesting issue, though I'm not sure that is a
>>>     correct way to fix it.
>>>
>>>     It is important to re-install removed learned flows during stats update.
>>>     The logic is the following, AFAIU:
>>>
>>>     1. Packet hits the OF rule (1) with a learning action.
>>>     2. Handler translates OF and creates a datapath flow for it, creating
>>>        a learned rule (2) in the process and filling the translation cache.
>>>     3. Datapath flow for the rule (1) is installed into the datapath.
>>>     4. Time goes by and timeout on the learned rule (2) hits. Learned rule
>>>        is removed now.
>>>     5. A packet hits the datapath flow for rule (1).
>>>     6. Revalidator updates flow stats for the rule (1).
>>>     7. Since the packet hit the learning rule (1), the rule (2) has to be
>>>        added back.
>>>
>>>     IIUC, the step 7 will not happen with the change proposed in this patch.
>>>     Flow will not be re-translated, so the only way to re-install it is
>>>     to execute side effects when we see stats update on a parent datapath
>>>     flow in the revalidator.
>>>
>>>     One potential way to avoid breaking this scenario is to check the
>>>     removal reason, but it's not bulletproof either as we do still need
>>>     to reinstate the flow even if it was manually deleted or evicted.
>>>
>>>
>>> I added a log to check the removal reason in the XC_LEARN stats update, and the
>>> reason always seems to be eviction. I wasn't able to catch an idle or hard removal
>>> in that function, though I'm sure I just wasn't winning the race.
>>>
>>> But in this example if we hit 7, and don't re-add the rule, what is the real penalty?
>>> The next packet will have to upcall and we would process the learn action from scratch,
>>> but that may be what the user wants as indicated by setting a timeout.
>>
>> Not really.  If the learned flow is removed, we need a packet to hit a
>> learn() rule, which is a different rule, in order for traffic to continue
>> to flow.  Let's say we implement a firewall that allows connections only
>> from A to B.  So, A can initiate connection to B, but B can't initiate
>> connection to A.  Once A sends tcp SYN, we learn a rule that allows traffic
>> from B to A on these particular ports.  Now B can send traffic back.
>> Once the learned flow is removed, A will have to establish a new connection
>> in order to re-learn the B->A rule.  So, not just an upcall.  May not be a
>> best example, but should work as an illustration.
>>
>>> The alternative
>>> is we re-install a flow that may be different then the flow that the user actually
>>> wants installed.
>>
>> If the flow is modified, it means that there was a traffic on a learn()
>> action and we have to learn that specific rule.  If it was already removed,
>> we have to reinstate it.  In general, the logic in the current OVS code
>> seems correct in this particular place.
>>
>>>
>>> Isn't there a risk of different revalidator threads going out of sync with different
>>> versions of a learned rule in their xcache?
>>>
>>>
>>>     It also seems like we may hold the reference to this deleted rule for a
>>>     long time in the cache until the parent datapath flow is deleted.
>>>
>>>
>>>     What is triggering the deletion of the learned rule in your case?
>>>     Is it deletion or modification of the parent rule?  And does it have
>>>     'delete_learned' flag?
>>>
>>>
>>> In the case that I was debugging, the parent learn action was not deleted. Instead,
>>> new traffic from a different port was supposed to cause the learn action to modify
>>> the existing rule. But instead we see the rule modified with the correct information,
>>> and then immediately reverted with old stale information.
>>
>> The problem here is that we can't define which one is 'correct'.
>> It's a race where both values are correct, but you like one better
>> for some reason.  New rules are only learned if the parent datapath
>> flow that has a learn() action seen packets after the previous stats
>> check.
>>
>> What you're describing is that you have a single rule with a learn
>> action that is capable of re-writing the same learned rule depending
>> on what kind of packet actually hits it.  Is that correct?
> 
> In this case the rule is:
> 
> learn(table=20,hard_timeout=300,priority=1,cookie=0xcb684d24d14b14ad,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1
> 
> And an example of the learned rule is:
> 
> vlan_tci=0x0005/0x0fff,dl_dst=fc:16:3e:3e:5d:e0,actions=load:0->NXM_OF_VLAN_TCI[],load:0x3->NXM_NX_TUN_ID[],ou
> tput:2
> 
> The scenario is that ingressing packets are supposed to change the
> route that egress packets take, and that route can change over time.
> 
> What I see from some additional logging is:
> 
> 1. traffic begins to ingress on port A
> 2. learn action installs a rule to route response to A
> 3. traffic begins to egress on port A
> 4. port A becomes invalid, no further traffic ingresses from port A
> 5. traffic ingresses on port B
> 6. learn action installs a rule to route response to B
> 7. revalidator replaces (6) with (2)
> 
> Eventually the egressing traffic will become exausted and the correct
> rule is installed, but this can take several minutes. The rule
> regression can even happen many milliseconds after learn installs the
> correct rule and even after packets have flowed through the new rule.

The rule can be replaced if there was no revalidation between steps
4 and 5, i.e. at the point of revalidation there is a stats increase
on both datapath flows (ingress A and ingress B).
Revalidators can sleep for up to 500ms by default between revalidation
cycles, so the replacement can happen many milliseconds after.

Is the traffic on port B continuous, or is it a one-shot packet?
The point is, at least one revalidation cycle should pass after step 4.
And the traffic on port B should be seen after this revalidation cycle
is completed.  So, step 5 should be at least 500+ ms apart from step 4.

> 
> 
>> I would consider this kind of setup as fairly fragile and would not
>> recommend using it in general.
>>
>> However, the scenario is likely like this:
>>
>> 1. Upcall on a rule with learn() action.
>>    New datapath flow (1) is installed.  This flow has a match on
>>    all the fields used in a learn action and ukey contains the
>>    translation cache with a rule that it produces.
>>    The new learned rule is installed from this.
>>
>> 2. Upcall with a different packet that doesn't match the first
>>    datapath flow (1), but on the same rule with a learn() action.
>>    New datapath flow (2) is installed.  This flow has a match on
>>    all the fields used in a learn action (they are different from
>>    the first one, since we have an upcall) and ukey contains the
>>    translation cache with a rule that it produces.  Let's say the
>>    learned rule has the same match, but different actions.
>>    This overwrites the previously installed learned rule.
>>
>> 3. Both datapath flows (1) and (2) get some traffic.
>>
>> 4. Revalidation starts.
>>
>> 5. Revalidator A dumps datapath flow (1).
>>    Revalidator B dumps datapath flow (2).
>>
>> 6. Revalidators A and B are trying to revalidate these datapath flows
>>    at the same time.  Both datapath flows had traffic since the
>>    last revalidation.  Both revalidators are trying to update the
>>    learned rule, which is the same for both of them.
>>
>> 7. One of them does that first, the other one ends up with a rule in a
>>    REMOVED state, since it was just replaced by the other revalidator.
>>
>>
>> There is no correct or incorrect rule in this situation.  Both
>> of them are correct, because both datapath flows have seen packets
>> since the previous revalidation.  We can't guarantee which one will
>> end up being written the last and therefore will take effect.
>> if we refuse to reinstate the REMOVED rule, we still can't say which
>> one will be active in the end.
>>
>> What can resolve the race is the next revalidation cycle.
>> Once one of the datapath flows will stop receiving traffic, there
>> should no longer be a race condition and the next revalidation will
>> set the "correct" rule, i.e. the rule that corresponds to a learning
>> datapath flow that seen some packets last.  But until both flows have
>> traffic, there is no "correct" rule and the result will be random,
>> regardless of what we choose to do.
>>
>>
>> One issue that I see in the current code is that it looks like the table
>> version is not getting updated on learned flow refresh and hence the next
>> revalidation may not actually be triggered.  In this case the random
>> result may stay for much longer than we would like.
>>
>> The suspicious code (and the potential fix) is the following:
>>
>> @@ -5548,22 +5558,15 @@ ofproto_flow_mod_learn_revert(struct ofproto_flow_mod *ofm)
>>  }
>>
>>  enum ofperr
>> -ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm,
>> -                              struct ofproto *orig_ofproto)
>> +ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm)
>>      OVS_REQUIRES(ofproto_mutex)
>>  {
>>      struct rule *rule = rule_collection_rules(&ofm->new_rules)[0];
>>      enum ofperr error = 0;
>>
>> -    /* If learning on a different bridge, must bump its version
>> -     * number and flush connmgr afterwards. */
>> -    if (rule->ofproto != orig_ofproto) {
>> -        ofproto_bump_tables_version(rule->ofproto);
>> -    }
>> +    ofproto_bump_tables_version(rule->ofproto);
>>      error = ofproto_flow_mod_finish(rule->ofproto, ofm, NULL);
>> -    if (rule->ofproto != orig_ofproto) {
>> -        ofmonitor_flush(rule->ofproto->connmgr);
>> -    }
>> +    ofmonitor_flush(rule->ofproto->connmgr);
>>
>>      return error;
>>  }
>> ---
>>
>> We should actually bump the version and request a revalidation in
>> any case, not only when the change is in a different bridge.

FWIW, the original code in commit 1f4a89336682 was written mostly
for a case where learning is triggered by a packet-out executed
within a bundle.  And we can't actually publish changes for a current
bridge in this case.  So, the fix should be a bit more clever.

>>
>> Could you try that in your setup?
> 
> That was actually my initial suspicion when investigating this bug,
> but this change doesn't improve the buggy behaviour.
> 
> Thank you,
> M
> 
>>
>>
>> The issue appears to come from the commit:
>>   1f4a89336682 ("ofproto: Refactor packet_out handling.")
>>
>> There is a slight chance of launching ourselves into an infinite
>> revlidation loop, I guess.  Especially if both learning datapath
>> flows will continue to receive the traffic, but there is a minimal
>> revalidation interval, IIRC.  Not sure if it applies in this case.
>>
>> Best regards, Ilya Maximets.
>>
>
Mike Pattrick July 31, 2023, 4:01 p.m. UTC | #7
On Mon, Jul 31, 2023 at 7:20 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 7/31/23 07:34, Mike Pattrick wrote:
> > On Fri, Jul 28, 2023 at 6:31 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> >>
> >> On 7/28/23 23:13, Mike Pattrick wrote:
> >>> On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
> >>>
> >>>     On 7/20/23 17:06, Mike Pattrick wrote:
> >>>     > When the a revalidator thread is updating statistics for an XC_LEARN
> >>>     > xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
> >>>     > The revalidator will update stats for rules even if they are in a
> >>>     > removed state or marked as invisible. However, ofproto_flow_mod_learn
> >>>     > will detect if a flow has been removed and re-add it in that case. This
> >>>     > can result in an old learn action replacing the new learn action that
> >>>     > had replaced it in the first place.
> >>>     >
> >>>     > This change adds a new parameter to ofproto_flow_mod_learn allowing the
> >>>     > caller to specify an action to take if temp_rule is removed. If it is
> >>>     > removed and the caller is xlate_push_stats_entry, then a new rule will
> >>>     > not be replaced.
> >>>
> >>>
> >>> This is a slightly unrelated point, but while reviewing some code and test
> >>> results I think I found a different issue. rule_expire() considers rule
> >>> modified time for hard_timeout and the man page says hard_timeout "Causes
> >>> the flow to expire after the given number of seconds, regardless of activity",
> >>> however, modified time is updated in ofproto_flow_mod_learn_refresh() even if
> >>> the rule isn't changed. Due to the current behaviour, I don't think hard_timeout
> >>> could actually ever trigger on a learned action if an idle timeout is set or if
> >>> traffic is ongoing.
> >>
> >> The 'modified' time of a learned rule is updated in this function, because
> >> the packet hit the learn() rule.  If the learned rule didn't exist, we
> >> would create it at this moment.  Instead of re-creating, we just update
> >> the 'modified' time.  If packets will stop hitting the learn() rule, but
> >> will continue to flow through the learned one, the learned one will be
> >> removed when the hard timeout fires up.  That is a correct behavior.
> >>
> >
> > I can accept this is intended, but it doesn't seem like it would
> > operate in that way based on the man page alone.
>
> It's a generic characteristic of an OpenFlow rule.  "activity" means traffic
> matching it.  learn() action is triggering a flow modification equal to a
> strict mod-flow, as described in the ovs-actions man page.  And ovs-ofctl page
> mentions that "mod-flows restarts the hard_timeout".
>
> Maybe it can be clarified somehow better, I'm not sure.  Feel free to submit
> a patch.
>
> >
> >>
> >>>
> >>>
> >>>     Hi, Mike.  That's an interesting issue, though I'm not sure that is a
> >>>     correct way to fix it.
> >>>
> >>>     It is important to re-install removed learned flows during stats update.
> >>>     The logic is the following, AFAIU:
> >>>
> >>>     1. Packet hits the OF rule (1) with a learning action.
> >>>     2. Handler translates OF and creates a datapath flow for it, creating
> >>>        a learned rule (2) in the process and filling the translation cache.
> >>>     3. Datapath flow for the rule (1) is installed into the datapath.
> >>>     4. Time goes by and timeout on the learned rule (2) hits. Learned rule
> >>>        is removed now.
> >>>     5. A packet hits the datapath flow for rule (1).
> >>>     6. Revalidator updates flow stats for the rule (1).
> >>>     7. Since the packet hit the learning rule (1), the rule (2) has to be
> >>>        added back.
> >>>
> >>>     IIUC, the step 7 will not happen with the change proposed in this patch.
> >>>     Flow will not be re-translated, so the only way to re-install it is
> >>>     to execute side effects when we see stats update on a parent datapath
> >>>     flow in the revalidator.
> >>>
> >>>     One potential way to avoid breaking this scenario is to check the
> >>>     removal reason, but it's not bulletproof either as we do still need
> >>>     to reinstate the flow even if it was manually deleted or evicted.
> >>>
> >>>
> >>> I added a log to check the removal reason in the XC_LEARN stats update, and the
> >>> reason always seems to be eviction. I wasn't able to catch an idle or hard removal
> >>> in that function, though I'm sure I just wasn't winning the race.
> >>>
> >>> But in this example if we hit 7, and don't re-add the rule, what is the real penalty?
> >>> The next packet will have to upcall and we would process the learn action from scratch,
> >>> but that may be what the user wants as indicated by setting a timeout.
> >>
> >> Not really.  If the learned flow is removed, we need a packet to hit a
> >> learn() rule, which is a different rule, in order for traffic to continue
> >> to flow.  Let's say we implement a firewall that allows connections only
> >> from A to B.  So, A can initiate connection to B, but B can't initiate
> >> connection to A.  Once A sends tcp SYN, we learn a rule that allows traffic
> >> from B to A on these particular ports.  Now B can send traffic back.
> >> Once the learned flow is removed, A will have to establish a new connection
> >> in order to re-learn the B->A rule.  So, not just an upcall.  May not be a
> >> best example, but should work as an illustration.
> >>
> >>> The alternative
> >>> is we re-install a flow that may be different then the flow that the user actually
> >>> wants installed.
> >>
> >> If the flow is modified, it means that there was a traffic on a learn()
> >> action and we have to learn that specific rule.  If it was already removed,
> >> we have to reinstate it.  In general, the logic in the current OVS code
> >> seems correct in this particular place.
> >>
> >>>
> >>> Isn't there a risk of different revalidator threads going out of sync with different
> >>> versions of a learned rule in their xcache?
> >>>
> >>>
> >>>     It also seems like we may hold the reference to this deleted rule for a
> >>>     long time in the cache until the parent datapath flow is deleted.
> >>>
> >>>
> >>>     What is triggering the deletion of the learned rule in your case?
> >>>     Is it deletion or modification of the parent rule?  And does it have
> >>>     'delete_learned' flag?
> >>>
> >>>
> >>> In the case that I was debugging, the parent learn action was not deleted. Instead,
> >>> new traffic from a different port was supposed to cause the learn action to modify
> >>> the existing rule. But instead we see the rule modified with the correct information,
> >>> and then immediately reverted with old stale information.
> >>
> >> The problem here is that we can't define which one is 'correct'.
> >> It's a race where both values are correct, but you like one better
> >> for some reason.  New rules are only learned if the parent datapath
> >> flow that has a learn() action seen packets after the previous stats
> >> check.
> >>
> >> What you're describing is that you have a single rule with a learn
> >> action that is capable of re-writing the same learned rule depending
> >> on what kind of packet actually hits it.  Is that correct?
> >
> > In this case the rule is:
> >
> > learn(table=20,hard_timeout=300,priority=1,cookie=0xcb684d24d14b14ad,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1
> >
> > And an example of the learned rule is:
> >
> > vlan_tci=0x0005/0x0fff,dl_dst=fc:16:3e:3e:5d:e0,actions=load:0->NXM_OF_VLAN_TCI[],load:0x3->NXM_NX_TUN_ID[],ou
> > tput:2
> >
> > The scenario is that ingressing packets are supposed to change the
> > route that egress packets take, and that route can change over time.
> >
> > What I see from some additional logging is:
> >
> > 1. traffic begins to ingress on port A
> > 2. learn action installs a rule to route response to A
> > 3. traffic begins to egress on port A
> > 4. port A becomes invalid, no further traffic ingresses from port A
> > 5. traffic ingresses on port B
> > 6. learn action installs a rule to route response to B
> > 7. revalidator replaces (6) with (2)
> >
> > Eventually the egressing traffic will become exausted and the correct
> > rule is installed, but this can take several minutes. The rule
> > regression can even happen many milliseconds after learn installs the
> > correct rule and even after packets have flowed through the new rule.
>
> The rule can be replaced if there was no revalidation between steps
> 4 and 5, i.e. at the point of revalidation there is a stats increase
> on both datapath flows (ingress A and ingress B).
> Revalidators can sleep for up to 500ms by default between revalidation
> cycles, so the replacement can happen many milliseconds after.
>
> Is the traffic on port B continuous, or is it a one-shot packet?

Continuous but less frequent than traffic in the opposite direction.
In a client's environment there were ingress packets every few seconds
and egress packets continued to flow incorrectly for several minutes
afterwards.

If this solution is unacceptable; I could resubmit with an additional
check to only suppress the flow_mod if the learn_modify rule is in a
removed state and there is already a conflicting rule in the
classifier that isn't removed. Or would you prefer a completely
different approach be taken here?


Thanks,
M

> The point is, at least one revalidation cycle should pass after step 4.
> And the traffic on port B should be seen after this revalidation cycle
> is completed.  So, step 5 should be at least 500+ ms apart from step 4.
>
> >
> >
> >> I would consider this kind of setup as fairly fragile and would not
> >> recommend using it in general.
> >>
> >> However, the scenario is likely like this:
> >>
> >> 1. Upcall on a rule with learn() action.
> >>    New datapath flow (1) is installed.  This flow has a match on
> >>    all the fields used in a learn action and ukey contains the
> >>    translation cache with a rule that it produces.
> >>    The new learned rule is installed from this.
> >>
> >> 2. Upcall with a different packet that doesn't match the first
> >>    datapath flow (1), but on the same rule with a learn() action.
> >>    New datapath flow (2) is installed.  This flow has a match on
> >>    all the fields used in a learn action (they are different from
> >>    the first one, since we have an upcall) and ukey contains the
> >>    translation cache with a rule that it produces.  Let's say the
> >>    learned rule has the same match, but different actions.
> >>    This overwrites the previously installed learned rule.
> >>
> >> 3. Both datapath flows (1) and (2) get some traffic.
> >>
> >> 4. Revalidation starts.
> >>
> >> 5. Revalidator A dumps datapath flow (1).
> >>    Revalidator B dumps datapath flow (2).
> >>
> >> 6. Revalidators A and B are trying to revalidate these datapath flows
> >>    at the same time.  Both datapath flows had traffic since the
> >>    last revalidation.  Both revalidators are trying to update the
> >>    learned rule, which is the same for both of them.
> >>
> >> 7. One of them does that first, the other one ends up with a rule in a
> >>    REMOVED state, since it was just replaced by the other revalidator.
> >>
> >>
> >> There is no correct or incorrect rule in this situation.  Both
> >> of them are correct, because both datapath flows have seen packets
> >> since the previous revalidation.  We can't guarantee which one will
> >> end up being written the last and therefore will take effect.
> >> if we refuse to reinstate the REMOVED rule, we still can't say which
> >> one will be active in the end.
> >>
> >> What can resolve the race is the next revalidation cycle.
> >> Once one of the datapath flows will stop receiving traffic, there
> >> should no longer be a race condition and the next revalidation will
> >> set the "correct" rule, i.e. the rule that corresponds to a learning
> >> datapath flow that seen some packets last.  But until both flows have
> >> traffic, there is no "correct" rule and the result will be random,
> >> regardless of what we choose to do.
> >>
> >>
> >> One issue that I see in the current code is that it looks like the table
> >> version is not getting updated on learned flow refresh and hence the next
> >> revalidation may not actually be triggered.  In this case the random
> >> result may stay for much longer than we would like.
> >>
> >> The suspicious code (and the potential fix) is the following:
> >>
> >> @@ -5548,22 +5558,15 @@ ofproto_flow_mod_learn_revert(struct ofproto_flow_mod *ofm)
> >>  }
> >>
> >>  enum ofperr
> >> -ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm,
> >> -                              struct ofproto *orig_ofproto)
> >> +ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm)
> >>      OVS_REQUIRES(ofproto_mutex)
> >>  {
> >>      struct rule *rule = rule_collection_rules(&ofm->new_rules)[0];
> >>      enum ofperr error = 0;
> >>
> >> -    /* If learning on a different bridge, must bump its version
> >> -     * number and flush connmgr afterwards. */
> >> -    if (rule->ofproto != orig_ofproto) {
> >> -        ofproto_bump_tables_version(rule->ofproto);
> >> -    }
> >> +    ofproto_bump_tables_version(rule->ofproto);
> >>      error = ofproto_flow_mod_finish(rule->ofproto, ofm, NULL);
> >> -    if (rule->ofproto != orig_ofproto) {
> >> -        ofmonitor_flush(rule->ofproto->connmgr);
> >> -    }
> >> +    ofmonitor_flush(rule->ofproto->connmgr);
> >>
> >>      return error;
> >>  }
> >> ---
> >>
> >> We should actually bump the version and request a revalidation in
> >> any case, not only when the change is in a different bridge.
>
> FWIW, the original code in commit 1f4a89336682 was written mostly
> for a case where learning is triggered by a packet-out executed
> within a bundle.  And we can't actually publish changes for a current
> bridge in this case.  So, the fix should be a bit more clever.
>
> >>
> >> Could you try that in your setup?
> >
> > That was actually my initial suspicion when investigating this bug,
> > but this change doesn't improve the buggy behaviour.
> >
> > Thank you,
> > M
> >
> >>
> >>
> >> The issue appears to come from the commit:
> >>   1f4a89336682 ("ofproto: Refactor packet_out handling.")
> >>
> >> There is a slight chance of launching ourselves into an infinite
> >> revlidation loop, I guess.  Especially if both learning datapath
> >> flows will continue to receive the traffic, but there is a minimal
> >> revalidation interval, IIRC.  Not sure if it applies in this case.
> >>
> >> Best regards, Ilya Maximets.
> >>
> >
>
Ilya Maximets July 31, 2023, 6:04 p.m. UTC | #8
On 7/31/23 18:01, Mike Pattrick wrote:
> On Mon, Jul 31, 2023 at 7:20 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>>
>> On 7/31/23 07:34, Mike Pattrick wrote:
>>> On Fri, Jul 28, 2023 at 6:31 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>>>>
>>>> On 7/28/23 23:13, Mike Pattrick wrote:
>>>>> On Fri, Jul 28, 2023 at 7:48 AM Ilya Maximets <i.maximets@ovn.org <mailto:i.maximets@ovn.org>> wrote:
>>>>>
>>>>>     On 7/20/23 17:06, Mike Pattrick wrote:
>>>>>     > When the a revalidator thread is updating statistics for an XC_LEARN
>>>>>     > xcache entry in xlate_push_stats_entry it uses ofproto_flow_mod_learn.
>>>>>     > The revalidator will update stats for rules even if they are in a
>>>>>     > removed state or marked as invisible. However, ofproto_flow_mod_learn
>>>>>     > will detect if a flow has been removed and re-add it in that case. This
>>>>>     > can result in an old learn action replacing the new learn action that
>>>>>     > had replaced it in the first place.
>>>>>     >
>>>>>     > This change adds a new parameter to ofproto_flow_mod_learn allowing the
>>>>>     > caller to specify an action to take if temp_rule is removed. If it is
>>>>>     > removed and the caller is xlate_push_stats_entry, then a new rule will
>>>>>     > not be replaced.
>>>>>
>>>>>
>>>>> This is a slightly unrelated point, but while reviewing some code and test
>>>>> results I think I found a different issue. rule_expire() considers rule
>>>>> modified time for hard_timeout and the man page says hard_timeout "Causes
>>>>> the flow to expire after the given number of seconds, regardless of activity",
>>>>> however, modified time is updated in ofproto_flow_mod_learn_refresh() even if
>>>>> the rule isn't changed. Due to the current behaviour, I don't think hard_timeout
>>>>> could actually ever trigger on a learned action if an idle timeout is set or if
>>>>> traffic is ongoing.
>>>>
>>>> The 'modified' time of a learned rule is updated in this function, because
>>>> the packet hit the learn() rule.  If the learned rule didn't exist, we
>>>> would create it at this moment.  Instead of re-creating, we just update
>>>> the 'modified' time.  If packets will stop hitting the learn() rule, but
>>>> will continue to flow through the learned one, the learned one will be
>>>> removed when the hard timeout fires up.  That is a correct behavior.
>>>>
>>>
>>> I can accept this is intended, but it doesn't seem like it would
>>> operate in that way based on the man page alone.
>>
>> It's a generic characteristic of an OpenFlow rule.  "activity" means traffic
>> matching it.  learn() action is triggering a flow modification equal to a
>> strict mod-flow, as described in the ovs-actions man page.  And ovs-ofctl page
>> mentions that "mod-flows restarts the hard_timeout".
>>
>> Maybe it can be clarified somehow better, I'm not sure.  Feel free to submit
>> a patch.
>>
>>>
>>>>
>>>>>
>>>>>
>>>>>     Hi, Mike.  That's an interesting issue, though I'm not sure that is a
>>>>>     correct way to fix it.
>>>>>
>>>>>     It is important to re-install removed learned flows during stats update.
>>>>>     The logic is the following, AFAIU:
>>>>>
>>>>>     1. Packet hits the OF rule (1) with a learning action.
>>>>>     2. Handler translates OF and creates a datapath flow for it, creating
>>>>>        a learned rule (2) in the process and filling the translation cache.
>>>>>     3. Datapath flow for the rule (1) is installed into the datapath.
>>>>>     4. Time goes by and timeout on the learned rule (2) hits. Learned rule
>>>>>        is removed now.
>>>>>     5. A packet hits the datapath flow for rule (1).
>>>>>     6. Revalidator updates flow stats for the rule (1).
>>>>>     7. Since the packet hit the learning rule (1), the rule (2) has to be
>>>>>        added back.
>>>>>
>>>>>     IIUC, the step 7 will not happen with the change proposed in this patch.
>>>>>     Flow will not be re-translated, so the only way to re-install it is
>>>>>     to execute side effects when we see stats update on a parent datapath
>>>>>     flow in the revalidator.
>>>>>
>>>>>     One potential way to avoid breaking this scenario is to check the
>>>>>     removal reason, but it's not bulletproof either as we do still need
>>>>>     to reinstate the flow even if it was manually deleted or evicted.
>>>>>
>>>>>
>>>>> I added a log to check the removal reason in the XC_LEARN stats update, and the
>>>>> reason always seems to be eviction. I wasn't able to catch an idle or hard removal
>>>>> in that function, though I'm sure I just wasn't winning the race.
>>>>>
>>>>> But in this example if we hit 7, and don't re-add the rule, what is the real penalty?
>>>>> The next packet will have to upcall and we would process the learn action from scratch,
>>>>> but that may be what the user wants as indicated by setting a timeout.
>>>>
>>>> Not really.  If the learned flow is removed, we need a packet to hit a
>>>> learn() rule, which is a different rule, in order for traffic to continue
>>>> to flow.  Let's say we implement a firewall that allows connections only
>>>> from A to B.  So, A can initiate connection to B, but B can't initiate
>>>> connection to A.  Once A sends tcp SYN, we learn a rule that allows traffic
>>>> from B to A on these particular ports.  Now B can send traffic back.
>>>> Once the learned flow is removed, A will have to establish a new connection
>>>> in order to re-learn the B->A rule.  So, not just an upcall.  May not be a
>>>> best example, but should work as an illustration.
>>>>
>>>>> The alternative
>>>>> is we re-install a flow that may be different then the flow that the user actually
>>>>> wants installed.
>>>>
>>>> If the flow is modified, it means that there was a traffic on a learn()
>>>> action and we have to learn that specific rule.  If it was already removed,
>>>> we have to reinstate it.  In general, the logic in the current OVS code
>>>> seems correct in this particular place.
>>>>
>>>>>
>>>>> Isn't there a risk of different revalidator threads going out of sync with different
>>>>> versions of a learned rule in their xcache?
>>>>>
>>>>>
>>>>>     It also seems like we may hold the reference to this deleted rule for a
>>>>>     long time in the cache until the parent datapath flow is deleted.
>>>>>
>>>>>
>>>>>     What is triggering the deletion of the learned rule in your case?
>>>>>     Is it deletion or modification of the parent rule?  And does it have
>>>>>     'delete_learned' flag?
>>>>>
>>>>>
>>>>> In the case that I was debugging, the parent learn action was not deleted. Instead,
>>>>> new traffic from a different port was supposed to cause the learn action to modify
>>>>> the existing rule. But instead we see the rule modified with the correct information,
>>>>> and then immediately reverted with old stale information.
>>>>
>>>> The problem here is that we can't define which one is 'correct'.
>>>> It's a race where both values are correct, but you like one better
>>>> for some reason.  New rules are only learned if the parent datapath
>>>> flow that has a learn() action seen packets after the previous stats
>>>> check.
>>>>
>>>> What you're describing is that you have a single rule with a learn
>>>> action that is capable of re-writing the same learned rule depending
>>>> on what kind of packet actually hits it.  Is that correct?
>>>
>>> In this case the rule is:
>>>
>>> learn(table=20,hard_timeout=300,priority=1,cookie=0xcb684d24d14b14ad,>>>       NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],
>>>       load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],
>>>       output:OXM_OF_IN_PORT[]),output:1
>>>
>>> And an example of the learned rule is:
>>>
>>> vlan_tci=0x0005/0x0fff,dl_dst=fc:16:3e:3e:5d:e0,
>>>     actions=load:0->NXM_OF_VLAN_TCI[],load:0x3->NXM_NX_TUN_ID[],output:2

Could you also show how the datapath flows look like for the learning
flow?  For both ingress ports - A and B.  Since you have TUN_ID in the
actions, I suppose these should have TUN_ID and IN_PORT as a match.

>>>
>>> The scenario is that ingressing packets are supposed to change the
>>> route that egress packets take, and that route can change over time.
>>>
>>> What I see from some additional logging is:
>>>
>>> 1. traffic begins to ingress on port A
>>> 2. learn action installs a rule to route response to A
>>> 3. traffic begins to egress on port A
>>> 4. port A becomes invalid, no further traffic ingresses from port A
>>> 5. traffic ingresses on port B
>>> 6. learn action installs a rule to route response to B
>>> 7. revalidator replaces (6) with (2)
>>>
>>> Eventually the egressing traffic will become exausted and the correct
>>> rule is installed, but this can take several minutes. The rule
>>> regression can even happen many milliseconds after learn installs the
>>> correct rule and even after packets have flowed through the new rule.
>>
>> The rule can be replaced if there was no revalidation between steps
>> 4 and 5, i.e. at the point of revalidation there is a stats increase
>> on both datapath flows (ingress A and ingress B).
>> Revalidators can sleep for up to 500ms by default between revalidation
>> cycles, so the replacement can happen many milliseconds after.
>>
>> Is the traffic on port B continuous, or is it a one-shot packet?
> 
> Continuous but less frequent than traffic in the opposite direction.

IIUC, the traffic in the opposite direction is going through the
learned rule, not the learning one.  In this case, the frequency
of that traffic doesn't matter.  We only care about packets flowing
through the original learning rule, unless you have a lot of drops
or otherwise unhealthy system that is causing packet drops on the
learning datapath flow.

> In a client's environment there were ingress packets every few seconds
> and egress packets continued to flow incorrectly for several minutes
> afterwards.

If the traffic ingresses from B every few seconds and there is no
traffic ingressing from A, I'd expect the issue to be resolved on
the next revalidation.  And I don't see a reason why this doesn't
happen for several minutes.  Unless revalidation is not happening
due to a bug below.

> If this solution is unacceptable; I could resubmit with an additional
> check to only suppress the flow_mod if the learn_modify rule is in a
> removed state and there is already a conflicting rule in the
> classifier that isn't removed. Or would you prefer a completely
> different approach be taken here?

The "revalidation is not requested" bug described below is the
only issue I see in the current code.  The rule replacement is
correct, AFAICT.

If the fixing of the revalidation request is not fixing the issue
within a few seconds, then we have a bug that we do not understand.
And this bug is likely somewhere else.  Meaning, we have not
identified a root cause of the issue.  Changing the correct rule
re-instantiation code doesn't make much sense if we do not understand
what is actually going wrong in the setup.

> 
> 
> Thanks,
> M
> 
>> The point is, at least one revalidation cycle should pass after step 4.
>> And the traffic on port B should be seen after this revalidation cycle
>> is completed.  So, step 5 should be at least 500+ ms apart from step 4.
>>
>>>
>>>
>>>> I would consider this kind of setup as fairly fragile and would not
>>>> recommend using it in general.
>>>>
>>>> However, the scenario is likely like this:
>>>>
>>>> 1. Upcall on a rule with learn() action.
>>>>    New datapath flow (1) is installed.  This flow has a match on
>>>>    all the fields used in a learn action and ukey contains the
>>>>    translation cache with a rule that it produces.
>>>>    The new learned rule is installed from this.
>>>>
>>>> 2. Upcall with a different packet that doesn't match the first
>>>>    datapath flow (1), but on the same rule with a learn() action.
>>>>    New datapath flow (2) is installed.  This flow has a match on
>>>>    all the fields used in a learn action (they are different from
>>>>    the first one, since we have an upcall) and ukey contains the
>>>>    translation cache with a rule that it produces.  Let's say the
>>>>    learned rule has the same match, but different actions.
>>>>    This overwrites the previously installed learned rule.
>>>>
>>>> 3. Both datapath flows (1) and (2) get some traffic.
>>>>
>>>> 4. Revalidation starts.
>>>>
>>>> 5. Revalidator A dumps datapath flow (1).
>>>>    Revalidator B dumps datapath flow (2).
>>>>
>>>> 6. Revalidators A and B are trying to revalidate these datapath flows
>>>>    at the same time.  Both datapath flows had traffic since the
>>>>    last revalidation.  Both revalidators are trying to update the
>>>>    learned rule, which is the same for both of them.
>>>>
>>>> 7. One of them does that first, the other one ends up with a rule in a
>>>>    REMOVED state, since it was just replaced by the other revalidator.
>>>>
>>>>
>>>> There is no correct or incorrect rule in this situation.  Both
>>>> of them are correct, because both datapath flows have seen packets
>>>> since the previous revalidation.  We can't guarantee which one will
>>>> end up being written the last and therefore will take effect.
>>>> if we refuse to reinstate the REMOVED rule, we still can't say which
>>>> one will be active in the end.
>>>>
>>>> What can resolve the race is the next revalidation cycle.
>>>> Once one of the datapath flows will stop receiving traffic, there
>>>> should no longer be a race condition and the next revalidation will
>>>> set the "correct" rule, i.e. the rule that corresponds to a learning
>>>> datapath flow that seen some packets last.  But until both flows have
>>>> traffic, there is no "correct" rule and the result will be random,
>>>> regardless of what we choose to do.
>>>>
>>>>
>>>> One issue that I see in the current code is that it looks like the table
>>>> version is not getting updated on learned flow refresh and hence the next
>>>> revalidation may not actually be triggered.  In this case the random
>>>> result may stay for much longer than we would like.
>>>>
>>>> The suspicious code (and the potential fix) is the following:
>>>>
>>>> @@ -5548,22 +5558,15 @@ ofproto_flow_mod_learn_revert(struct ofproto_flow_mod *ofm)
>>>>  }
>>>>
>>>>  enum ofperr
>>>> -ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm,
>>>> -                              struct ofproto *orig_ofproto)
>>>> +ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm)
>>>>      OVS_REQUIRES(ofproto_mutex)
>>>>  {
>>>>      struct rule *rule = rule_collection_rules(&ofm->new_rules)[0];
>>>>      enum ofperr error = 0;
>>>>
>>>> -    /* If learning on a different bridge, must bump its version
>>>> -     * number and flush connmgr afterwards. */
>>>> -    if (rule->ofproto != orig_ofproto) {
>>>> -        ofproto_bump_tables_version(rule->ofproto);
>>>> -    }
>>>> +    ofproto_bump_tables_version(rule->ofproto);
>>>>      error = ofproto_flow_mod_finish(rule->ofproto, ofm, NULL);
>>>> -    if (rule->ofproto != orig_ofproto) {
>>>> -        ofmonitor_flush(rule->ofproto->connmgr);
>>>> -    }
>>>> +    ofmonitor_flush(rule->ofproto->connmgr);
>>>>
>>>>      return error;
>>>>  }
>>>> ---
>>>>
>>>> We should actually bump the version and request a revalidation in
>>>> any case, not only when the change is in a different bridge.
>>
>> FWIW, the original code in commit 1f4a89336682 was written mostly
>> for a case where learning is triggered by a packet-out executed
>> within a bundle.  And we can't actually publish changes for a current
>> bridge in this case.  So, the fix should be a bit more clever.
>>
>>>>
>>>> Could you try that in your setup?
>>>
>>> That was actually my initial suspicion when investigating this bug,
>>> but this change doesn't improve the buggy behaviour.

Could you describe a buggy behavior again?  Is it that you can see
the traffic still going to an old port several minutes after you
stopped receiving any packets from this old port, but did receive
packets from a new port hitting the learning rule every few seconds?

If that is a correct description and that is happening after the
change above is applied, we have a serious issue somewhere else.

>>>
>>> Thank you,
>>> M
>>>
>>>>
>>>>
>>>> The issue appears to come from the commit:
>>>>   1f4a89336682 ("ofproto: Refactor packet_out handling.")
>>>>
>>>> There is a slight chance of launching ourselves into an infinite
>>>> revlidation loop, I guess.  Especially if both learning datapath
>>>> flows will continue to receive the traffic, but there is a minimal
>>>> revalidation interval, IIRC.  Not sure if it applies in this case.
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>
>>
>
diff mbox series

Patch

diff --git a/ofproto/ofproto-dpif-xlate-cache.c b/ofproto/ofproto-dpif-xlate-cache.c
index 9224ee2e6..843903614 100644
--- a/ofproto/ofproto-dpif-xlate-cache.c
+++ b/ofproto/ofproto-dpif-xlate-cache.c
@@ -125,7 +125,7 @@  xlate_push_stats_entry(struct xc_entry *entry,
     case XC_LEARN: {
         enum ofperr error;
         error = ofproto_flow_mod_learn(entry->learn.ofm, true,
-                                       entry->learn.limit, NULL);
+                                       entry->learn.limit, NULL, false);
         if (error) {
             static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
             VLOG_WARN_RL(&rl, "xcache LEARN action execution failed.");
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 4928ea99c..8aa2ed699 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -5701,7 +5701,7 @@  xlate_learn_action(struct xlate_ctx *ctx, const struct ofpact_learn *learn)
             bool success = true;
             if (ctx->xin->allow_side_effects) {
                 error = ofproto_flow_mod_learn(ofm, ctx->xin->xcache != NULL,
-                                               learn->limit, &success);
+                                               learn->limit, &success, true);
             } else if (learn->limit) {
                 if (!ofm->temp_rule
                     || ofm->temp_rule->state != RULE_INSERTED) {
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index fad7342b0..465f2a3f4 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4876,7 +4876,7 @@  packet_xlate(struct ofproto *ofproto_, struct ofproto_packet_out *opo)
             if (entry->type == XC_LEARN) {
                 struct ofproto_flow_mod *ofm = entry->learn.ofm;
 
-                error = ofproto_flow_mod_learn_refresh(ofm);
+                error = ofproto_flow_mod_learn_refresh(ofm, true);
                 if (error) {
                     goto error_out;
                 }
diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
index 143ded690..de1823c89 100644
--- a/ofproto/ofproto-provider.h
+++ b/ofproto/ofproto-provider.h
@@ -2027,9 +2027,11 @@  enum ofperr ofproto_flow_mod_init_for_learn(struct ofproto *,
                                             struct ofproto_flow_mod *)
     OVS_EXCLUDED(ofproto_mutex);
 enum ofperr ofproto_flow_mod_learn(struct ofproto_flow_mod *, bool keep_ref,
-                                   unsigned limit, bool *below_limit)
+                                   unsigned limit, bool *below_limit,
+                                   bool recreate)
     OVS_EXCLUDED(ofproto_mutex);
-enum ofperr ofproto_flow_mod_learn_refresh(struct ofproto_flow_mod *ofm);
+enum ofperr ofproto_flow_mod_learn_refresh(struct ofproto_flow_mod *ofm,
+                                           bool recreate);
 enum ofperr ofproto_flow_mod_learn_start(struct ofproto_flow_mod *ofm)
     OVS_REQUIRES(ofproto_mutex);
 void ofproto_flow_mod_learn_revert(struct ofproto_flow_mod *ofm)
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index dbf4958bc..dd2562acb 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -5472,7 +5472,7 @@  ofproto_flow_mod_init_for_learn(struct ofproto *ofproto,
 }
 
 enum ofperr
-ofproto_flow_mod_learn_refresh(struct ofproto_flow_mod *ofm)
+ofproto_flow_mod_learn_refresh(struct ofproto_flow_mod *ofm, bool recreate)
 {
     enum ofperr error = 0;
 
@@ -5492,6 +5492,11 @@  ofproto_flow_mod_learn_refresh(struct ofproto_flow_mod *ofm)
      * after we have read the 'rule->state' below.  In this case the next time
      * this function is executed the rule will be reinstated. */
     if (rule->state == RULE_REMOVED) {
+        if (!recreate) {
+            /* If the rule is marked as removed and the caller doesn't want it
+             * recreated, just return an error here as if there was no rule. */
+            return OFPERR_OFPFMFC_UNKNOWN;
+        }
         struct cls_rule cr;
 
         cls_rule_clone(&cr, &rule->cr);
@@ -5565,10 +5570,12 @@  ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm,
 
 /* Refresh 'ofm->temp_rule', for which the caller holds a reference, if already
  * in the classifier, insert it otherwise.  If the rule has already been
- * removed from the classifier, a new rule is created using 'ofm->temp_rule' as
- * a template and the reference to the old 'ofm->temp_rule' is freed.  If
- * 'keep_ref' is true, then a reference to the current rule is held, otherwise
- * it is released and 'ofm->temp_rule' is set to NULL.
+ * removed from the classifier and 'recreate' is true, a new rule is created
+ * using 'ofm->temp_rule' as a template and the reference to the old
+ * 'ofm->temp_rule' is freed.  If recreate is false and the rule has been
+ * removed, do not reinsert a new copy of that rule. If 'keep_ref' is true,
+ * then a reference to the current rule is held, otherwise it is released and
+ * 'ofm->temp_rule' is set to NULL.
  *
  * If 'limit' != 0, insertion will fail if there are more than 'limit' rules
  * in the same table with the same cookie.  If insertion succeeds,
@@ -5579,10 +5586,10 @@  ofproto_flow_mod_learn_finish(struct ofproto_flow_mod *ofm,
  * during the call. */
 enum ofperr
 ofproto_flow_mod_learn(struct ofproto_flow_mod *ofm, bool keep_ref,
-                       unsigned limit, bool *below_limitp)
+                       unsigned limit, bool *below_limitp, bool recreate)
     OVS_EXCLUDED(ofproto_mutex)
 {
-    enum ofperr error = ofproto_flow_mod_learn_refresh(ofm);
+    enum ofperr error = ofproto_flow_mod_learn_refresh(ofm, recreate);
     struct rule *rule = ofm->temp_rule;
     bool below_limit = true;