
[ovs-dev] rhel/systemd: Set ovsdb-server timeout to 5 minutes

Message ID 20240410154855.146644-1-chris.riches@nutanix.com
State Accepted
Commit e876b046630bdd141387c5054407acdf96bc87ce
Delegated to: Simon Horman
Series [ovs-dev] rhel/systemd: Set ovsdb-server timeout to 5 minutes

Commit Message

Chris Riches April 10, 2024, 3:48 p.m. UTC
If the database is particularly large (multi-GB), ovsdb-server can take
several minutes to come up. This tends to fall afoul of the default
systemd start timeout, which is typically 90s, putting the service into
an infinite restart loop.

To avoid this, set the timeout to a more generous 5 minutes.

This change brings ovsdb-server's timeout in line with ovs-vswitchd,
which got the same treatment in commit c1c69e8a45 ("rhel/systemd: Set
ovs-vswitchd timeout to 5 minutes").

Signed-off-by: Chris Riches <chris.riches@nutanix.com>
---
 rhel/usr_lib_systemd_system_ovsdb-server.service | 1 +
 1 file changed, 1 insertion(+)
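
For systems already hit by this, the same relief can be applied without rebuilding
the package by using a standard systemd drop-in (a sketch, not part of this patch;
the drop-in path and unit name follow the usual systemd/RHEL conventions):

  # /etc/systemd/system/ovsdb-server.service.d/10-timeout.conf
  [Service]
  TimeoutSec=300

  # pick up the override; the new timeout applies from the next (re)start of the unit
  systemctl daemon-reload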

Comments

Ilya Maximets April 10, 2024, 10:31 p.m. UTC | #1
On 4/10/24 17:48, Chris Riches wrote:
> If the database is particularly large (multi-GB), ovsdb-server can take

Hi, Chris.  May I ask how did you end up with multi-GB database?
I would understand if it was an OVN Southbound DB, for example,
but why the local database that only stores ports/bridges and
some other not that large things ends up with so much data?

Sounds a little strange.

Best regards, Ilya Maximets.

> several minutes to come up. This tends to fall afoul of the default
> systemd start timeout, which is typically 90s, putting the service into
> an infinite restart loop.
> 
> To avoid this, set the timeout to a more generous 5 minutes.
> 
> This change brings ovsdb-server's timeout in line with ovs-vswitchd,
> which got the same treatment in commit c1c69e8a45 ("rhel/systemd: Set
> ovs-vswitchd timeout to 5 minutes").
> 
> Signed-off-by: Chris Riches <chris.riches@nutanix.com>
> ---
>  rhel/usr_lib_systemd_system_ovsdb-server.service | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/rhel/usr_lib_systemd_system_ovsdb-server.service b/rhel/usr_lib_systemd_system_ovsdb-server.service
> index 49dc06e38..558632320 100644
> --- a/rhel/usr_lib_systemd_system_ovsdb-server.service
> +++ b/rhel/usr_lib_systemd_system_ovsdb-server.service
> @@ -29,3 +29,4 @@ ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd stop
>  ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd \
>             ${OVS_USER_OPT} \
>             --no-monitor restart $OPTIONS
> +TimeoutSec=300
Chris Riches April 11, 2024, 8:59 a.m. UTC | #2
On 10/04/2024 23:31, Ilya Maximets wrote:
> On 4/10/24 17:48, Chris Riches wrote:
>> If the database is particularly large (multi-GB), ovsdb-server can take
> Hi, Chris.  May I ask how did you end up with multi-GB database?
> I would understand if it was an OVN Southbound DB, for example,
> but why the local database that only stores ports/bridges and
> some other not that large things ends up with so much data?
>
> Sounds a little strange.
>
> Best regards, Ilya Maximets.

I'd like to understand that too, and it's a separate RCA we're working 
on but haven't reached a conclusion yet.

 From what we know so far, the DB was full of stale connection-tracking 
information such as the following:

{
  "_date": 1710858766431,
  "Bridge": {
    "49cb85cd-b085-4af8-98a2-56030dd614b9": {
      "external_ids": [
        "map",
        [
          [
            "ct-zone-lrp-ext_gw_port_48a89ae3-6528-4851-a277-e21db02518ad",
            "4"
          ],
          [
            "external",
            "true"
          ]
        ]
      ]
    }
  },
  "_comment": "ovn-controller: modifying OVS tunnels '5995b338-3080-44b1-9251-58080cc878f7'"
}
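
Records like the one above can be examined without starting the server, e.g. with
ovsdb-tool (a sketch, assuming the default conf.db location used by the RHEL packaging):

  # summarise the most recent transactions in the standalone database file
  ovsdb-tool show-log -m /etc/openvswitch/conf.db | tail -n 20
  # see how large the on-disk log has grown
  du -h /etc/openvswitch/conf.db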

Once the host was recovered by putting in the timeout increase, 
ovsdb-server successfully started and GCed the database down from 2.4 
*GB* to 29 *KB*. Had this happened before the host restart, we would 
have never seen this problem. But since it seems possible to end up 
booting with such a large DB, we figured a timeout increase was a 
sensible measure to take.
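
As an aside, a host stuck like this can also be recovered by compacting the database
offline before starting the service, which sidesteps the slow start entirely (a sketch,
assuming the default conf.db location and that ovsdb-server is not running):

  systemctl stop ovsdb-server
  # rewrite the file keeping only live data; the multi-GB log shrinks accordingly
  ovsdb-tool compact /etc/openvswitch/conf.db
  systemctl start ovsdb-server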
Ilya Maximets April 11, 2024, 1:24 p.m. UTC | #3
On 4/11/24 10:59, Chris Riches wrote:
> On 10/04/2024 23:31, Ilya Maximets wrote:
>> On 4/10/24 17:48, Chris Riches wrote:
>>> If the database is particularly large (multi-GB), ovsdb-server can take
>> Hi, Chris.  May I ask how did you end up with multi-GB database?
>> I would understand if it was an OVN Southbound DB, for example,
>> but why the local database that only stores ports/bridges and
>> some other not that large things ends up with so much data?
>>
>> Sounds a little strange.
>>
>> Best regards, Ilya Maximets.
> 
> I'd like to understand that too, and it's a separate RCA we're working 
> on but haven't reached a conclusion yet.
> 
>  From what we know so far, the DB was full of stale connection-tracking 
> information such as the following:
> 
> {
>    "_date": 1710858766431,
>    "Bridge": {
>      "49cb85cd-b085-4af8-98a2-56030dd614b9": {
>        "external_ids": [
>          "map",
>          [
>            [
> "ct-zone-lrp-ext_gw_port_48a89ae3-6528-4851-a277-e21db02518ad",
>              "4"
>            ],
>            [
>              "external",
>              "true"
>            ]
>          ]
>        ]
>      }
>    },
>    "_comment": "ovn-controller: modifying OVS tunnels 
> '5995b338-3080-44b1-9251-58080cc878f7'"
> }
> 
> Once the host was recovered by putting in the timeout increase, 
> ovsdb-server successfully started and GCed the database down from 2.4 
> *GB* to 29 *KB*. Had this happened before the host restart, we would 
> have never seen this problem. But since it seems possible to end up 
> booting with such a large DB, we figured a timeout increase was a 
> sensible measure to take.

Uff.  Sounds like ovn-controller went off the rails.

Normally, ovsdb-server compacts the database once in 10-20 minutes,
if the database doubles the size since the previous check.  If all
the transactions are that small, it would mean ovn-controller made
about 10K transactions per second in the 10-20 minutes before the
restart.  That's huge.
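
As a rough cross-check (assuming each record is a few hundred bytes, about the
size of the example above):

  2.4 GB / ~400 B per record      ~= 6 million records
  6 million records / 600-1200 s  ~= 5,000-10,000 transactions per second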

I wonder if this can be addressed with a better compaction strategy.
Something like forcing compaction if "the database is more than 10 MB
and increased 10x" regardless of the time.

Best regards, Ilya Maximets.
Chris Riches April 11, 2024, 1:43 p.m. UTC | #4
On 11/04/2024 14:24, Ilya Maximets wrote:
> On 4/11/24 10:59, Chris Riches wrote:
>>  From what we know so far, the DB was full of stale connection-tracking
>> information such as the following:
>>
>> [...]
>>
>> Once the host was recovered by putting in the timeout increase,
>> ovsdb-server successfully started and GCed the database down from 2.4
>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>> have never seen this problem. But since it seems possible to end up
>> booting with such a large DB, we figured a timeout increase was a
>> sensible measure to take.
> Uff.  Sounds like ovn-controller went off the rails.
>
> Normally, ovsdb-server compacts the database once in 10-20 minutes,
> if the database doubles the size since the previous check.  If all
> the transactions are that small, it would mean ovn-controller made
> about 10K transactions per second in the 10-20 minutes before the
> restart.  That's huge.
>
> I wonder if this can be addressed with a better compaction strategy.
> Something like forcing compaction if "the database is more than 10 MB
> and increased 10x" regardless of the time.

I'm not sure exactly what the test was doing when this was observed, so 
I don't know whether that transaction volume is within the realm of 
possibility or if we're looking at a failure to perform compaction on 
time. It would be nice to have an enhanced safety-net for DB size, as we 
were only a few hundred MB away from hitting filesystem space issues as 
well.

> Normally, ovsdb-server compacts the database once in 10-20 minutes, if 
> the database doubles the size since the previous check.

I presume you mean if it doubled in size since the previous 
*compaction*? If we only compact when it doubles since the last *check*, 
then it would be easy for it to slightly-less-than-double every 10-20 
minutes and never trigger the compaction while still growing exponentially.

I'm happy to discuss compaction approaches (though my expertise is very 
much in host service management and not OVS itself), but do you think 
there's merit in having this extended timeout as a backstop too?
Dumitru Ceara April 11, 2024, 4:10 p.m. UTC | #5
On 4/11/24 15:43, Chris Riches wrote:
> On 11/04/2024 14:24, Ilya Maximets wrote:
>> On 4/11/24 10:59, Chris Riches wrote:

Hi Chris, Ilya,

>>>  From what we know so far, the DB was full of stale connection-tracking
>>> information such as the following:
>>>
>>> [...]
>>>
>>> Once the host was recovered by putting in the timeout increase,
>>> ovsdb-server successfully started and GCed the database down from 2.4
>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>> have never seen this problem. But since it seems possible to end up
>>> booting with such a large DB, we figured a timeout increase was a
>>> sensible measure to take.
>> Uff.  Sounds like ovn-controller went off the rails.
>>
>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>> if the database doubles the size since the previous check.  If all
>> the transactions are that small, it would mean ovn-controller made
>> about 10K transactions per second in the 10-20 minutes before the
>> restart.  That's huge.
>>
>> I wonder if this can be addressed with a better compaction strategy.
>> Something like forcing compaction if "the database is more than 10 MB
>> and increased 10x" regardless of the time.
> 
> I'm not sure exactly what the test was doing when this was observed, so
> I don't know whether that transaction volume is within the realm of
> possibility or if we're looking at a failure to perform compaction on
> time. It would be nice to have an enhanced safety-net for DB size, as we
> were only a few hundred MB away from hitting filesystem space issues as
> well.
> 

To rule out any known issues, what OVN version is running on that setup?

>> Normally, ovsdb-server compacts the database once in 10-20 minutes, if
>> the database doubles the size since the previous check.
> 
> I presume you mean if it doubled in size since the previous
> *compaction*? If we only compact when it doubles since the last *check*,
> then it would be easy for it to slightly-less-than-double every 10-20
> minutes and never trigger the compaction while still growing exponentially.
> 
> I'm happy to discuss compaction approaches (though my expertise is very
> much in host service management and not OVS itself), but do you think
> there's merit in having this extended timeout as a backstop too?
> 

Regards,
Dumitru
Chris Riches April 11, 2024, 5:10 p.m. UTC | #6
On 11/04/2024 17:10, Dumitru Ceara wrote:
> On 4/11/24 15:43, Chris Riches wrote:
>> On 11/04/2024 14:24, Ilya Maximets wrote:
>>> On 4/11/24 10:59, Chris Riches wrote:
> Hi Chris, Ilya,
>
>>>>   From what we know so far, the DB was full of stale connection-tracking
>>>> information such as the following:
>>>>
>>>> [...]
>>>>
>>>> Once the host was recovered by putting in the timeout increase,
>>>> ovsdb-server successfully started and GCed the database down from 2.4
>>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>>> have never seen this problem. But since it seems possible to end up
>>>> booting with such a large DB, we figured a timeout increase was a
>>>> sensible measure to take.
>>> Uff.  Sounds like ovn-controller went off the rails.
>>>
>>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>>> if the database doubles the size since the previous check.  If all
>>> the transactions are that small, it would mean ovn-controller made
>>> about 10K transactions per second in the 10-20 minutes before the
>>> restart.  That's huge.
>>>
>>> I wonder if this can be addressed with a better compaction strategy.
>>> Something like forcing compaction if "the database is more than 10 MB
>>> and increased 10x" regardless of the time.
>> I'm not sure exactly what the test was doing when this was observed, so
>> I don't know whether that transaction volume is within the realm of
>> possibility or if we're looking at a failure to perform compaction on
>> time. It would be nice to have an enhanced safety-net for DB size, as we
>> were only a few hundred MB away from hitting filesystem space issues as
>> well.
>>
> To rule out any known issues, what OVN version is running on that setup?
This was during an upgrade test. We started with OVN 20.09, and this 
produced the massive DB. We then upgraded to 21.09 and rebooted, which 
failed to come up as described due to the massive DB.

Our networking team doing the RCA think that the system was rapidly 
flapping external ports between two configurations, hence the excessive 
DB transactions. The root cause of flapping is yet to be determined but 
these transactions were being done from OVN itself. They raised the 
theory that the flapping was so intense that ovsdb didn't actually get a 
chance to compact at all - is this a possibility?

I've CCed Priyankar who is in charge of the RCA.
Dumitru Ceara April 12, 2024, 9:20 a.m. UTC | #7
On 4/11/24 19:10, Chris Riches wrote:
> On 11/04/2024 17:10, Dumitru Ceara wrote:
>> On 4/11/24 15:43, Chris Riches wrote:
>>> On 11/04/2024 14:24, Ilya Maximets wrote:
>>>> On 4/11/24 10:59, Chris Riches wrote:
>> Hi Chris, Ilya,
>>
>>>>>   From what we know so far, the DB was full of stale
>>>>> connection-tracking
>>>>> information such as the following:
>>>>>
>>>>> [...]
>>>>>
>>>>> Once the host was recovered by putting in the timeout increase,
>>>>> ovsdb-server successfully started and GCed the database down from 2.4
>>>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>>>> have never seen this problem. But since it seems possible to end up
>>>>> booting with such a large DB, we figured a timeout increase was a
>>>>> sensible measure to take.
>>>> Uff.  Sounds like ovn-controller went off the rails.
>>>>
>>>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>>>> if the database doubles the size since the previous check.  If all
>>>> the transactions are that small, it would mean ovn-controller made
>>>> about 10K transactions per second in the 10-20 minutes before the
>>>> restart.  That's huge.
>>>>
>>>> I wonder if this can be addressed with a better compaction strategy.
>>>> Something like forcing compaction if "the database is more than 10 MB
>>>> and increased 10x" regardless of the time.
>>> I'm not sure exactly what the test was doing when this was observed, so
>>> I don't know whether that transaction volume is within the realm of
>>> possibility or if we're looking at a failure to perform compaction on
>>> time. It would be nice to have an enhanced safety-net for DB size, as we
>>> were only a few hundred MB away from hitting filesystem space issues as
>>> well.
>>>
>> To rule out any known issues, what OVN version is running on that setup?
> This was during an upgrade test. We started with OVN 20.9, and this
> produced the massive DB. We then upgraded to 21.9 and rebooted, which
> failed to come up as described due to the massive DB.
> 

Both 20.09 and 21.09 have been out of support for a while now.  Currently
supported releases are 24.03, 23.09, 23.06, 22.03:

https://www.ovn.org/en/releases/all_releases/

> Our networking team doing the RCA think that the system was rapidly
> flapping external ports between two configurations, hence the excessive
> DB transactions. The root cause of flapping is yet to be determined but
> these transactions were being done from OVN itself. They raised the

Maybe you're missing these commits (it's hard to say without knowing the
exact version you're running - "21.09" is vague, we need the z version
too, e.g. 21.09.0 or 21.09.1):

https://github.com/ovn-org/ovn/commit/d4bca93c08
https://github.com/ovn-org/ovn/commit/6fb87aad8c
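
One quick way to check whether a given build already carries those fixes is to ask
git from a clone of ovn-org/ovn (a sketch):

  git clone https://github.com/ovn-org/ovn.git && cd ovn
  # list the release tags that already contain each fix
  git tag --contains d4bca93c08
  git tag --contains 6fb87aad8c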

> theory that the flapping was so intense that ovsdb didn't actually get a
> chance to compact at all - is this a possibility?
> 

It doesn't sound possible to me but I'll let Ilya comment on this.

> I've CCed Priyankar who is in charge of the RCA.
> 

I've CCed Numan in case he has more ideas about what could cause this.

Regards,
Dumitru
Chris Riches April 12, 2024, 9:44 a.m. UTC | #8
On 12/04/2024 10:20, Dumitru Ceara wrote:
> On 4/11/24 19:10, Chris Riches wrote:
>> On 11/04/2024 17:10, Dumitru Ceara wrote:
>>> On 4/11/24 15:43, Chris Riches wrote:
>>>> On 11/04/2024 14:24, Ilya Maximets wrote:
>>>>> On 4/11/24 10:59, Chris Riches wrote:
>>> Hi Chris, Ilya,
>>>
>>>>>>    From what we know so far, the DB was full of stale
>>>>>> connection-tracking
>>>>>> information such as the following:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> Once the host was recovered by putting in the timeout increase,
>>>>>> ovsdb-server successfully started and GCed the database down from 2.4
>>>>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>>>>> have never seen this problem. But since it seems possible to end up
>>>>>> booting with such a large DB, we figured a timeout increase was a
>>>>>> sensible measure to take.
>>>>> Uff.  Sounds like ovn-controller went off the rails.
>>>>>
>>>>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>>>>> if the database doubles the size since the previous check.  If all
>>>>> the transactions are that small, it would mean ovn-controller made
>>>>> about 10K transactions per second in the 10-20 minutes before the
>>>>> restart.  That's huge.
>>>>>
>>>>> I wonder if this can be addressed with a better compaction strategy.
>>>>> Something like forcing compaction if "the database is more than 10 MB
>>>>> and increased 10x" regardless of the time.
>>>> I'm not sure exactly what the test was doing when this was observed, so
>>>> I don't know whether that transaction volume is within the realm of
>>>> possibility or if we're looking at a failure to perform compaction on
>>>> time. It would be nice to have an enhanced safety-net for DB size, as we
>>>> were only a few hundred MB away from hitting filesystem space issues as
>>>> well.
>>>>
>>> To rule out any known issues, what OVN version is running on that setup?
>> This was during an upgrade test. We started with OVN 20.9, and this
>> produced the massive DB. We then upgraded to 21.9 and rebooted, which
>> failed to come up as described due to the massive DB.
>>
> Both 20.09 and 21.09 are not supported for a while now.  Currently
> supported releases are 24.03, 23.09, 23.06, 22.03:
>
> https://www.ovn.org/en/releases/all_releases/
>
>> Our networking team doing the RCA think that the system was rapidly
>> flapping external ports between two configurations, hence the excessive
>> DB transactions. The root cause of flapping is yet to be determined but
>> these transactions were being done from OVN itself. They raised the
> Maybe you're missing these commits (it's hard to say without knowing the
> exact version you're running - "21.09" is vague, we need the z version
> too, e.g. 21.09.0 or 21.09.1):
>
> https://github.com/ovn-org/ovn/commit/d4bca93c08
> https://github.com/ovn-org/ovn/commit/6fb87aad8c
Exact versions appear to be the following before upgrade:
[root@pre-upgrade ~]# ovn-controller --version
ovn-controller 20.09.1
Open vSwitch Library 2.14.2
OpenFlow versions 0x6:0x6

And the following after:
[root@post-upgrade ~]# ovn-controller --version
ovn-controller 21.09.2
Open vSwitch Library 2.16.90
OpenFlow versions 0x6:0x6
SB DB Schema 20.21.0

I'll pass this onto the networking team, but I'd also like to take a 
step back and look at the original patch proposed here, which is about 
increasing the timeout. Do you think that this timeout increase is a 
sensible second line of defence against oversized DBs, even if such DBs 
can only arise due to a separate historical or future bug? Or do you 
feel that preventing the DB from growing in the first place is sufficient?
Jon Kohler April 15, 2024, 1:39 p.m. UTC | #9
> On Apr 11, 2024, at 9:43 AM, Chris Riches <chris.riches@nutanix.com> wrote:
> 
> On 11/04/2024 14:24, Ilya Maximets wrote:
>> On 4/11/24 10:59, Chris Riches wrote:
>>> From what we know so far, the DB was full of stale connection-tracking
>>> information such as the following:
>>> 
>>> [...]
>>> 
>>> Once the host was recovered by putting in the timeout increase,
>>> ovsdb-server successfully started and GCed the database down from 2.4
>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>> have never seen this problem. But since it seems possible to end up
>>> booting with such a large DB, we figured a timeout increase was a
>>> sensible measure to take.
>> Uff.  Sounds like ovn-controller went off the rails.
>> 
>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>> if the database doubles the size since the previous check.  If all
>> the transactions are that small, it would mean ovn-controller made
>> about 10K transactions per second in the 10-20 minutes before the
>> restart.  That's huge.
>> 
>> I wonder if this can be addressed with a better compaction strategy.
>> Something like forcing compaction if "the database is more than 10 MB
>> and increased 10x" regardless of the time.
> 
> I'm not sure exactly what the test was doing when this was observed, so I don't know whether that transaction volume is within the realm of possibility or if we're looking at a failure to perform compaction on time. It would be nice to have an enhanced safety-net for DB size, as we were only a few hundred MB away from hitting filesystem space issues as well.
> 
>> Normally, ovsdb-server compacts the database once in 10-20 minutes, if the database doubles the size since the previous check.
> 
> I presume you mean if it doubled in size since the previous *compaction*? If we only compact when it doubles since the last *check*, then it would be easy for it to slightly-less-than-double every 10-20 minutes and never trigger the compaction while still growing exponentially.
> 
> I'm happy to discuss compaction approaches (though my expertise is very much in host service management and not OVS itself), but do you think there's merit in having this extended timeout as a backstop too?

FWIW, I think we should do both extending the time out and tuning up the
compaction, as having a situation where a service can get in an endless
loop if for whatever reason it takes too long is problematic. Addressing
the root cause (compaction, too many calls, some other bug(s) etc) is
good, but extending the timeout seems like an easy backstop.

Jon

Chris Riches April 18, 2024, 2:35 p.m. UTC | #10
On 15/04/2024 14:39, Jon Kohler wrote:
>> On Apr 11, 2024, at 9:43 AM, Chris Riches <chris.riches@nutanix.com> wrote:
>>
>> On 11/04/2024 14:24, Ilya Maximets wrote:
>>> On 4/11/24 10:59, Chris Riches wrote:
>>>>  From what we know so far, the DB was full of stale connection-tracking
>>>> information such as the following:
>>>>
>>>> [...]
>>>>
>>>> Once the host was recovered by putting in the timeout increase,
>>>> ovsdb-server successfully started and GCed the database down from 2.4
>>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>>> have never seen this problem. But since it seems possible to end up
>>>> booting with such a large DB, we figured a timeout increase was a
>>>> sensible measure to take.
>>> Uff.  Sounds like ovn-controller went off the rails.
>>>
>>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>>> if the database doubles the size since the previous check.  If all
>>> the transactions are that small, it would mean ovn-controller made
>>> about 10K transactions per second in the 10-20 minutes before the
>>> restart.  That's huge.
>>>
>>> I wonder if this can be addressed with a better compaction strategy.
>>> Something like forcing compaction if "the database is more than 10 MB
>>> and increased 10x" regardless of the time.
>> I'm not sure exactly what the test was doing when this was observed, so I don't know whether that transaction volume is within the realm of possibility or if we're looking at a failure to perform compaction on time. It would be nice to have an enhanced safety-net for DB size, as we were only a few hundred MB away from hitting filesystem space issues as well.
>>
>>> Normally, ovsdb-server compacts the database once in 10-20 minutes, if the database doubles the size since the previous check.
>> I presume you mean if it doubled in size since the previous *compaction*? If we only compact when it doubles since the last *check*, then it would be easy for it to slightly-less-than-double every 10-20 minutes and never trigger the compaction while still growing exponentially.
>>
>> I'm happy to discuss compaction approaches (though my expertise is very much in host service management and not OVS itself), but do you think there's merit in having this extended timeout as a backstop too?
> FWIW, I think we should do both extending the time out and tuning up the
> compaction, as having a situation where a service can get in an endless
> loop if for whatever reason it takes too long is problematic. Addressing
> the root cause (compaction, too many calls, some other bug(s) etc) is
> good, but extending the timeout seems like an easy backstop.

I agree with Jon's assessment - regardless of any action taken on 
compaction or preventing growth in the first place, we should consider 
the proposed timeout increase as a backstop against getting stuck in an 
infinite loop.

Ilya (or another maintainer) - can I get an opinion on this?


Thanks,
Chris
Simon Horman April 23, 2024, 10:35 a.m. UTC | #11
On Thu, Apr 18, 2024 at 03:35:06PM +0100, Chris Riches wrote:
> On 15/04/2024 14:39, Jon Kohler wrote:
> > > On Apr 11, 2024, at 9:43 AM, Chris Riches <chris.riches@nutanix.com> wrote:
> > > 
> > > On 11/04/2024 14:24, Ilya Maximets wrote:
> > > > On 4/11/24 10:59, Chris Riches wrote:
> > > > >  From what we know so far, the DB was full of stale connection-tracking
> > > > > information such as the following:
> > > > > 
> > > > > [...]
> > > > > 
> > > > > Once the host was recovered by putting in the timeout increase,
> > > > > ovsdb-server successfully started and GCed the database down from 2.4
> > > > > *GB* to 29 *KB*. Had this happened before the host restart, we would
> > > > > have never seen this problem. But since it seems possible to end up
> > > > > booting with such a large DB, we figured a timeout increase was a
> > > > > sensible measure to take.
> > > > Uff.  Sounds like ovn-controller went off the rails.
> > > > 
> > > > Normally, ovsdb-server compacts the database once in 10-20 minutes,
> > > > if the database doubles the size since the previous check.  If all
> > > > the transactions are that small, it would mean ovn-controller made
> > > > about 10K transactions per second in the 10-20 minutes before the
> > > > restart.  That's huge.
> > > > 
> > > > I wonder if this can be addressed with a better compaction strategy.
> > > > Something like forcing compaction if "the database is more than 10 MB
> > > > and increased 10x" regardless of the time.
> > > I'm not sure exactly what the test was doing when this was observed, so I don't know whether that transaction volume is within the realm of possibility or if we're looking at a failure to perform compaction on time. It would be nice to have an enhanced safety-net for DB size, as we were only a few hundred MB away from hitting filesystem space issues as well.
> > > 
> > > > Normally, ovsdb-server compacts the database once in 10-20 minutes, if the database doubles the size since the previous check.
> > > I presume you mean if it doubled in size since the previous *compaction*? If we only compact when it doubles since the last *check*, then it would be easy for it to slightly-less-than-double every 10-20 minutes and never trigger the compaction while still growing exponentially.
> > > 
> > > I'm happy to discuss compaction approaches (though my expertise is very much in host service management and not OVS itself), but do you think there's merit in having this extended timeout as a backstop too?
> > FWIW, I think we should do both extending the time out and tuning up the
> > compaction, as having a situation where a service can get in an endless
> > loop if for whatever reason it takes too long is problematic. Addressing
> > the root cause (compaction, too many calls, some other bug(s) etc) is
> > good, but extending the timeout seems like an easy backstop.
> 
> I agree with Jon's assessment - regardless of any action taken on compaction
> or preventing growth in the first place, we should consider the proposed
> timeout increase as a backstop against getting stuck in an infinite loop.
> 
> Ilya (or another maintainer) - can I get an opinion on this?

Yes, I agree that the timeout increase is a good idea.

Acked-by: Simon Horman <horms@ovn.org>
Ilya Maximets April 23, 2024, 11:10 a.m. UTC | #12
On 4/23/24 12:35, Simon Horman wrote:
> On Thu, Apr 18, 2024 at 03:35:06PM +0100, Chris Riches wrote:
>> On 15/04/2024 14:39, Jon Kohler wrote:
>>>> On Apr 11, 2024, at 9:43 AM, Chris Riches <chris.riches@nutanix.com> wrote:
>>>>
>>>> On 11/04/2024 14:24, Ilya Maximets wrote:
>>>>> On 4/11/24 10:59, Chris Riches wrote:
>>>>>>  From what we know so far, the DB was full of stale connection-tracking
>>>>>> information such as the following:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> Once the host was recovered by putting in the timeout increase,
>>>>>> ovsdb-server successfully started and GCed the database down from 2.4
>>>>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>>>>> have never seen this problem. But since it seems possible to end up
>>>>>> booting with such a large DB, we figured a timeout increase was a
>>>>>> sensible measure to take.
>>>>> Uff.  Sounds like ovn-controller went off the rails.
>>>>>
>>>>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>>>>> if the database doubles the size since the previous check.  If all
>>>>> the transactions are that small, it would mean ovn-controller made
>>>>> about 10K transactions per second in the 10-20 minutes before the
>>>>> restart.  That's huge.
>>>>>
>>>>> I wonder if this can be addressed with a better compaction strategy.
>>>>> Something like forcing compaction if "the database is more than 10 MB
>>>>> and increased 10x" regardless of the time.
>>>> I'm not sure exactly what the test was doing when this was observed, so I don't know whether that transaction volume is within the realm of possibility or if we're looking at a failure to perform compaction on time. It would be nice to have an enhanced safety-net for DB size, as we were only a few hundred MB away from hitting filesystem space issues as well.
>>>>
>>>>> Normally, ovsdb-server compacts the database once in 10-20 minutes, if the database doubles the size since the previous check.
>>>> I presume you mean if it doubled in size since the previous *compaction*? If we only compact when it doubles since the last *check*, then it would be easy for it to slightly-less-than-double every 10-20 minutes and never trigger the compaction while still growing exponentially.
>>>>
>>>> I'm happy to discuss compaction approaches (though my expertise is very much in host service management and not OVS itself), but do you think there's merit in having this extended timeout as a backstop too?
>>> FWIW, I think we should do both extending the time out and tuning up the
>>> compaction, as having a situation where a service can get in an endless
>>> loop if for whatever reason it takes too long is problematic. Addressing
>>> the root cause (compaction, too many calls, some other bug(s) etc) is
>>> good, but extending the timeout seems like an easy backstop.
>>
>> I agree with Jon's assessment - regardless of any action taken on compaction
>> or preventing growth in the first place, we should consider the proposed
>> timeout increase as a backstop against getting stuck in an infinite loop.
>>
>> Ilya (or another maintainer) - can I get an opinion on this?
> 
> Yes, I agree that the timeout increase is a good idea.
> 
> Acked-by: Simon Horman <horms@ovn.org>
> 

Sorry for the delay, been off for a week.  I agree that the timeout increase
makes sense since we know the mechanism behind the issue.

I plan to catch up on the rest of the thread and apply the fix later today.

Best regards, Ilya Maximets.
Aaron Conole April 23, 2024, 6:47 p.m. UTC | #13
Chris Riches <chris.riches@nutanix.com> writes:

> On 15/04/2024 14:39, Jon Kohler wrote:
>>> On Apr 11, 2024, at 9:43 AM, Chris Riches <chris.riches@nutanix.com> wrote:
>>>
>>> On 11/04/2024 14:24, Ilya Maximets wrote:
>>>> On 4/11/24 10:59, Chris Riches wrote:
>>>>>  From what we know so far, the DB was full of stale connection-tracking
>>>>> information such as the following:
>>>>>
>>>>> [...]
>>>>>
>>>>> Once the host was recovered by putting in the timeout increase,
>>>>> ovsdb-server successfully started and GCed the database down from 2.4
>>>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>>>> have never seen this problem. But since it seems possible to end up
>>>>> booting with such a large DB, we figured a timeout increase was a
>>>>> sensible measure to take.
>>>> Uff.  Sounds like ovn-controller went off the rails.
>>>>
>>>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>>>> if the database doubles the size since the previous check.  If all
>>>> the transactions are that small, it would mean ovn-controller made
>>>> about 10K transactions per second in the 10-20 minutes before the
>>>> restart.  That's huge.
>>>>
>>>> I wonder if this can be addressed with a better compaction strategy.
>>>> Something like forcing compaction if "the database is more than 10 MB
>>>> and increased 10x" regardless of the time.
>>> I'm not sure exactly what the test was doing when this was
>>> observed, so I don't know whether that transaction volume is within
>>> the realm of possibility or if we're looking at a failure to
>>> perform compaction on time. It would be nice to have an enhanced
>>> safety-net for DB size, as we were only a few hundred MB away from
>>> hitting filesystem space issues as well.
>>>
>>>> Normally, ovsdb-server compacts the database once in 10-20
>>>> minutes, if the database doubles the size since the previous
>>>> check.
>>> I presume you mean if it doubled in size since the previous
>>> *compaction*? If we only compact when it doubles since the last
>>> *check*, then it would be easy for it to slightly-less-than-double
>>> every 10-20 minutes and never trigger the compaction while still
>>> growing exponentially.
>>>
>>> I'm happy to discuss compaction approaches (though my expertise is
>>> very much in host service management and not OVS itself), but do
>>> you think there's merit in having this extended timeout as a
>>> backstop too?
>> FWIW, I think we should do both extending the time out and tuning up the
>> compaction, as having a situation where a service can get in an endless
>> loop if for whatever reason it takes too long is problematic. Addressing
>> the root cause (compaction, too many calls, some other bug(s) etc) is
>> good, but extending the timeout seems like an easy backstop.
>
> I agree with Jon's assessment - regardless of any action taken on
> compaction or preventing growth in the first place, we should consider
> the proposed timeout increase as a backstop against getting stuck in
> an infinite loop.
>
> Ilya (or another maintainer) - can I get an opinion on this?

From my side, it looks fine.  I don't think we ever saw the DB taking
this long on startup, so never considered that it could (maybe it is the
case that compaction also occurs on graceful exits?  I don't know ovsdb
that well).

At least from my side, it seems fine.

> Thanks,
> Chris
Ilya Maximets April 23, 2024, 9:38 p.m. UTC | #14
On 4/23/24 13:10, Ilya Maximets wrote:
> On 4/23/24 12:35, Simon Horman wrote:
>> On Thu, Apr 18, 2024 at 03:35:06PM +0100, Chris Riches wrote:
>>> On 15/04/2024 14:39, Jon Kohler wrote:
>>>>> On Apr 11, 2024, at 9:43 AM, Chris Riches <chris.riches@nutanix.com> wrote:
>>>>>
>>>>> On 11/04/2024 14:24, Ilya Maximets wrote:
>>>>>> On 4/11/24 10:59, Chris Riches wrote:
>>>>>>>  From what we know so far, the DB was full of stale connection-tracking
>>>>>>> information such as the following:
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>> Once the host was recovered by putting in the timeout increase,
>>>>>>> ovsdb-server successfully started and GCed the database down from 2.4
>>>>>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>>>>>> have never seen this problem. But since it seems possible to end up
>>>>>>> booting with such a large DB, we figured a timeout increase was a
>>>>>>> sensible measure to take.
>>>>>> Uff.  Sounds like ovn-controller went off the rails.
>>>>>>
>>>>>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>>>>>> if the database doubles the size since the previous check.  If all
>>>>>> the transactions are that small, it would mean ovn-controller made
>>>>>> about 10K transactions per second in the 10-20 minutes before the
>>>>>> restart.  That's huge.
>>>>>>
>>>>>> I wonder if this can be addressed with a better compaction strategy.
>>>>>> Something like forcing compaction if "the database is more than 10 MB
>>>>>> and increased 10x" regardless of the time.
>>>>> I'm not sure exactly what the test was doing when this was observed, so I don't know whether that transaction volume is within the realm of possibility or if we're looking at a failure to perform compaction on time. It would be nice to have an enhanced safety-net for DB size, as we were only a few hundred MB away from hitting filesystem space issues as well.
>>>>>
>>>>>> Normally, ovsdb-server compacts the database once in 10-20 minutes, if the database doubles the size since the previous check.
>>>>> I presume you mean if it doubled in size since the previous *compaction*? If we only compact when it doubles since the last *check*, then it would be easy for it to slightly-less-than-double every 10-20 minutes and never trigger the compaction while still growing exponentially.
>>>>>
>>>>> I'm happy to discuss compaction approaches (though my expertise is very much in host service management and not OVS itself), but do you think there's merit in having this extended timeout as a backstop too?
>>>> FWIW, I think we should do both extending the time out and tuning up the
>>>> compaction, as having a situation where a service can get in an endless
>>>> loop if for whatever reason it takes too long is problematic. Addressing
>>>> the root cause (compaction, too many calls, some other bug(s) etc) is
>>>> good, but extending the timeout seems like an easy backstop.
>>>
>>> I agree with Jon's assessment - regardless of any action taken on compaction
>>> or preventing growth in the first place, we should consider the proposed
>>> timeout increase as a backstop against getting stuck in an infinite loop.
>>>
>>> Ilya (or another maintainer) - can I get an opinion on this?
>>
>> Yes, I agree that the timeout increase is a good idea.
>>
>> Acked-by: Simon Horman <horms@ovn.org>
>>
> 
> Sorry for delay, been off for a week.  I agree that timeout increase
> makes sense since we know the mechanism for the occurrence of the issue.
> 
> I plan to catch up on the rest of the thread and apply the fix later today.

Applied to main and backported down to 2.17.  Thanks!

Best regards, Ilya Maximets.
Ilya Maximets April 23, 2024, 9:52 p.m. UTC | #15
On 4/11/24 15:43, Chris Riches wrote:
> On 11/04/2024 14:24, Ilya Maximets wrote:
>> On 4/11/24 10:59, Chris Riches wrote:
>>>  From what we know so far, the DB was full of stale connection-tracking
>>> information such as the following:
>>>
>>> [...]
>>>
>>> Once the host was recovered by putting in the timeout increase,
>>> ovsdb-server successfully started and GCed the database down from 2.4
>>> *GB* to 29 *KB*. Had this happened before the host restart, we would
>>> have never seen this problem. But since it seems possible to end up
>>> booting with such a large DB, we figured a timeout increase was a
>>> sensible measure to take.
>> Uff.  Sounds like ovn-controller went off the rails.
>>
>> Normally, ovsdb-server compacts the database once in 10-20 minutes,
>> if the database doubles the size since the previous check.  If all
>> the transactions are that small, it would mean ovn-controller made
>> about 10K transactions per second in the 10-20 minutes before the
>> restart.  That's huge.
>>
>> I wonder if this can be addressed with a better compaction strategy.
>> Something like forcing compaction if "the database is more than 10 MB
>> and increased 10x" regardless of the time.
> 
> I'm not sure exactly what the test was doing when this was observed, so 
> I don't know whether that transaction volume is within the realm of 
> possibility or if we're looking at a failure to perform compaction on 
> time. It would be nice to have an enhanced safety-net for DB size, as we 
> were only a few hundred MB away from hitting filesystem space issues as 
> well.

The compaction check is on the path in the main event loop, so it
should not be possible to avoid it, especially for a standalone
database.  Database will stop executing transactions until compaction
is done.

The transaction rate is very high, but it might be possible, I guess,
with very small transactions as we have here.

I need to experiment with it and maybe I'll post some patches to
force compaction earlier under extreme conditions like these.
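
One data point that could help here is the daemon's own memory report, which includes
the number of live cells; if that stays small while the on-disk file keeps growing, the
bloat is purely transaction history (a sketch, assuming the standard appctl target name):

  ovs-appctl -t ovsdb-server memory/show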

> 
>> Normally, ovsdb-server compacts the database once in 10-20 minutes, if 
>> the database doubles the size since the previous check.
> 
> I presume you mean if it doubled in size since the previous 
> *compaction*? If we only compact when it doubles since the last *check*, 
> then it would be easy for it to slightly-less-than-double every 10-20 
> minutes and never trigger the compaction while still growing exponentially.

Yes, I meant compaction, not the check, sorry.  So, this scenario is
covered and should not be possible.

> 
> I'm happy to discuss compaction approaches (though my expertise is very 
> much in host service management and not OVS itself), but do you think 
> there's merit in having this extended timeout as a backstop too?

Yep, I applied the change for now.

Best regards, Ilya Maximets.

Patch

diff --git a/rhel/usr_lib_systemd_system_ovsdb-server.service b/rhel/usr_lib_systemd_system_ovsdb-server.service
index 49dc06e38..558632320 100644
--- a/rhel/usr_lib_systemd_system_ovsdb-server.service
+++ b/rhel/usr_lib_systemd_system_ovsdb-server.service
@@ -29,3 +29,4 @@  ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd stop
 ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovs-vswitchd \
            ${OVS_USER_OPT} \
            --no-monitor restart $OPTIONS
+TimeoutSec=300