[ovs-dev,RFC,0/7] OVN IC bugfixes & proposals/questions

Message ID 20221118162050.3019353-1-odivlad@gmail.com

Vladislav Odintsov Nov. 18, 2022, 4:20 p.m. UTC
Hi,

We ran into an issue where it was possible to create multiple identical
routes within an LR (same ip_prefix, nexthop, and route table).  Initially
this was done using the python ovsdbapp library, but the problem itself
touches OVN and even OVS.  Sorry for the long read, but there seem to be
a couple of bugs in different places, some of which this RFC is meant
to cover.

How the issue was initially reproduced:

1. Assume an OVN deployment with at least 2 availability zones (AZs)
   that uses the ovn-ic infrastructure.
2. Create a transit switch in the IC NB.
3. Create an LR in each AZ and connect them to the transit switch.
4. Create one logical switch with a VIF port attached to the local OVS
   and connect this logical switch to the LR (e.g. 192.168.0.1/24).
5. In one AZ, install 2 static routes in the LR with a create command
   (run the following command twice):

   ovn-nbctl --id=@id create logical-router-static-route ip_prefix=1.2.3.4/32 nexthop=192.168.0.10 -- logical_router add lr1 static_routes @id
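
To check whether step 5 really left duplicates behind, the NB static
routes can be grouped by (ip_prefix, nexthop, route_table).  The helper
below is only a minimal sketch: find_duplicate_routes is a hypothetical
name, and it assumes nothing more than one route per input line.  The
ovn-nbctl invocation shown in the comment lists the routes of all LRs at
once, so identical routes on different routers would be reported too,
and the exact CSV quoting may differ between versions.

```shell
#!/bin/bash

# Read one route per line on stdin and print "<count><TAB><route>" for
# every line that occurs more than once (hypothetical helper).
find_duplicate_routes() {
    awk '{ seen[$0]++ }
         END { for (r in seen) if (seen[r] > 1) print seen[r] "\t" r }' | sort
}

# Possible usage against a live NB database (not executed here):
#   ovn-nbctl --format=csv --no-headings \
#       --columns=ip_prefix,nexthop,route_table \
#       list Logical_Router_Static_Route | find_duplicate_routes
```

Each output line is the number of copies followed by the duplicated
tuple; no output means no duplicates were found.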

After this, several strange behaviours/bugs appear:

1. [possible problem] There is a duplicated route in the NB within a
   single LR.  ovn-northd computes lflows with an ECMP group containing
   two identical routes:

   table=11(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.2.3.4/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 1; reg8[16..31] = select(1, 2);
   table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 1 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
   table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 2 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)

   Maybe it would be better to handle such routes somehow, e.g. with an
   ovsdb index or some logic in ovn-northd?

2. [bug] The route advertisement is duplicated in the
   OVN_IC_Southbound:Route table.  IMO, this should be fixed by adding a
   new index to this table on availability_zone, transit_switch,
   ip_prefix, nexthop and route_table, and by adding logic to check
   whether the route was already advertised (covered in Patch #7).

3. [bug] The same route is learned over and over.  Each ovn-ic iteration
   in the opposite availability zone adds one more copy of the route,
   which creates thousands of identical routes each second.  This bug is
   covered by Patch #7.

4. [possible problem] After multiple routes are learned into the NB in
   the opposite availability zone, ovn-northd generates ECMP lflows, same
   as in #1: one in lr_in_ip_routing with select(<thousands of elements>)
   and thousands of identical records in lr_in_ip_routing_ecmp.  OVN
   allows installing up to UINT_MAX routes within an ECMP group.

5. [OVS bug?] I'd like someone from the OVS team to take a look at this.
   ovn-controller installed a very long OpenFlow group rule
   (group #3):

   # ovn-appctl -t ovn-controller group-table-list | grep :3 | wc -c
   797824

   When I try to dump the groups with ovs-ofctl dump-groups br-int, I
   get the following error on the console:

   # ovs-ofctl dump-groups br-int
   ovs-ofctl: OpenFlow packet receive failed (End of file)

   In the ovs-vswitchd logs I see the following error, after which OVS
   is restarted:

   2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion start_ofs <= UINT16_MAX failed in ofpmp_postappend()

   If I issue the command again, it sometimes prints the same error, but
   sometimes this one (there was another OVN LB on the dev machine, so
   there are extra groups):

   # ovs-ofctl dump-groups br-int
   NXST_GROUP_DESC reply (xid=0x2): flags=[more]
   group_id=3,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
   group_id=1,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
   2022-11-17T17:53:41Z|00001|ofp_group|WARN|OpenFlow message bucket length 56 exceeds remaining buckets data size 40
   NXST_GROUP_DESC reply (xid=0x2): ***decode error: OFPGMFC_BAD_BUCKET***
   00000000  01 11 a9 58 00 00 00 02-ff ff 00 00 00 00 23 20 |...X..........# |
   00000010  00 00 00 08 00 00 00 00-a9 40 01 00 00 00 00 02 |.........@......|
   00000020  a9 08 00 00 00 00 00 00-00 38 00 28 00 00 00 00 |.........8.(....|
   00000030  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   00000040  00 00 00 00 00 00 00 01-ff ff 00 10 00 00 23 20 |..............# |
   00000050  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
   00000060  00 38 00 28 00 00 00 01-ff ff 00 18 00 00 23 20 |.8.(..........# |
   00000070  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 02 |................|
   00000080  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
   00000090  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 02 |.....d...8.(....|
   000000a0  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   000000b0  00 00 00 00 00 00 00 03-ff ff 00 10 00 00 23 20 |..............# |
   000000c0  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
   000000d0  00 38 00 28 00 00 00 03-ff ff 00 18 00 00 23 20 |.8.(..........# |
   000000e0  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 04 |................|
   000000f0  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
   00000100  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 04 |.....d...8.(....|
   00000110  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   00000120  00 00 00 00 00 00 00 05-ff ff 00 10 00 00 23 20 |..............# |
   00000130  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
   00000140  00 38 00 28 00 00 00 05-ff ff 00 18 00 00 23 20 |.8.(..........# |
   00000150  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 06 |................|
   00000160  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
   00000170  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 06 |.....d...8.(....|
   00000180  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   00000190  00 00 00 00 00 00 00 07-ff ff 00 10 00 00 23 20 |..............# |
   000001a0  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
   000001b0  00 38 00 28 00 00 00 07-ff ff 00 18 00 00 23 20 |.8.(..........# |
   000001c0  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 08 |................|
   000001d0  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
   000001e0  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 08 |.....d...8.(....|
   000001f0  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
   00000200  00 00 00 00 00 00 00 09-ff ff 00 10 00 00 23 20 |..............# |
   00000210  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|

7. This problem with dumping groups raises some questions:
   1. Is there a limit on the number of buckets in a group?  Or a limit
      on the group string length?
   2. If yes, should OVN limit the number of buckets in a group on its
      side?  (Patches #4 and #6.)

8. I also tried to find the point at which these problems with
   dump-groups begin.  I created multiple ECMP routes in OVN in a for
   loop and saw that the error from the last example appears starting
   from 1200 items in a group.  I then tried to create 10k buckets, and
   while they were being configured on my machine the following lines
   also appeared in the log file:

   2022-11-17T18:23:30.992Z|00554|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
   2022-11-17T18:23:31.992Z|00555|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
   2022-11-17T18:23:33.993Z|00556|ovs_rcu(urcu6)|WARN|blocked 4001 ms waiting for main to quiesce

   Once the routes had been created, I ran ovs-ofctl dump-groups br-int
   and got only an error:

   # ovs-ofctl dump-groups br-int
   ovs-ofctl: OpenFlow packet receive failed (End of file)

   After that, OVS crashed.  The OVS version is 2.17.3.

   My script:

# cat ./repro.sh
#!/bin/bash

count=$1

echo "Creating ${count} same routes..."

ovn-nbctl lr-route-del lr1 1.2.3.4/32

for i in $(seq 1 ${count}); do
    echo $i
    ovn-nbctl --id=@id create logical-router-static-route ip_prefix=1.2.3.4/32 nexthop=172.31.32.4 policy=dst-ip -- add logical-router vpc-FC7D6A54 static_routes @id
done
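
A rough cross-check of the numbers in items 5 and 8, meant as a sketch
rather than an analysis of the OVS code: the OpenFlow header carries a
16-bit message length, and the failed assertion compares an offset
against UINT16_MAX.  The field offsets used below are an assumption,
namely that the hex dump in item 5 follows the OpenFlow 1.5 group
description layout (bucket array length at offset 0x20, per-bucket
length at offset 0x28).  Under that assumption the arithmetic is:

```shell
#!/bin/bash

# Values read from the hex dump in item 5 (field layout is an assumption).
bucket_len=$((0x0038))          # bytes per bucket, dump offset 0x28
bucket_array_len=$((0xa908))    # bucket bytes announced, dump offset 0x20
uint16_max=65535                # bound used by the failed assertion

# How many whole buckets fit into the announced bucket array, and what
# is left over at the end.
full_buckets=$((bucket_array_len / bucket_len))
leftover=$((bucket_array_len % bucket_len))

# Upper bound on 56-byte buckets below UINT16_MAX, ignoring all headers.
max_buckets=$((uint16_max / bucket_len))

echo "complete buckets in this reply: ${full_buckets}"
echo "leftover bytes after them: ${leftover}"
echo "buckets that fit below UINT16_MAX: ${max_buckets}"
```

This reports 772 complete buckets, 40 leftover bytes and a bound of 1170
buckets.  The 40 leftover bytes agree with the "bucket length 56 exceeds
remaining buckets data size 40" warning, and 1170 is in the same range
as the 1200 items from which the decode error was seen.  Message and
group headers are ignored, so this is only an estimate.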

Thanks for reading this.  I'm ready to provide any additional information to help investigate.

Vladislav Odintsov (7):
  ic: move routes_ad hmap insert to separate function
  ic: remove orphan ovn interconnection routes
  ic: lookup southbound port_binding only if needed
  actions: limit possible OF group bucket count
  ic: minor code improvements
  northd: limit ECMP group by 1024 members
  ic: prevent advertising/learning multiple same routes

 ic/ovn-ic.c         | 123 ++++++++++++++++++++++++++++------------
 lib/actions.c       |  40 ++++++++++++-
 northd/northd.c     |   2 +-
 ovn-ic-sb.ovsschema |   6 +-
 tests/ovn-ic.at     | 133 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 263 insertions(+), 41 deletions(-)