
[ovs-dev,net-next,0/1] openvswitch: allow specifying ifindex of new interfaces

Message ID 20220817124909.83373-1-andrey.zhadchenko@virtuozzo.com

Message

Andrey Zhadchenko Aug. 17, 2022, 12:49 p.m. UTC
Hi!

CRIU currently does not support checkpoint/restore of OVS configurations, but
there have been several requests for it. For example,
https://github.com/lxc/lxc/issues/2909

The main problem is the ifindexes of newly created interfaces: we really need
to preserve them across restore, and the current openvswitch API does not
allow specifying an ifindex. Most of the time we can simply create an
interface via generic netlink requests and plug it into OVS, but datapaths
(and generally any OVS_VPORT_TYPE_INTERNAL port) can only be created via
openvswitch requests, which do not support selecting an ifindex.

This patch makes that possible.
For new datapaths I decided to use dp_ifindex in the header as the requested
ifindex, because it controls the ifindex for other requests too.
For internal vports I reused OVS_VPORT_ATTR_IFINDEX.

The only concern I have is that dp_ifindex was previously unused for
OVS_DP_CMD_NEW requests, so some software may not set it to zero. However,
we have been running this patch at Virtuozzo for 2 years and have not
encountered this problem. I am not sure whether it is worth adding a new
ovs_datapath_attr instead.
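For context, here is a rough sketch (packing bytes by hand in Python) of the
request layout being discussed: an OVS_DP_CMD_NEW generic netlink message whose
struct ovs_header carries the requested ifindex in dp_ifindex. This is an
illustration, not a definitive client: the genetlink family id below is a
placeholder (the real one must be resolved via CTRL_CMD_GETFAMILY), and the
command/attribute numbers are my reading of include/uapi/linux/openvswitch.h.

```python
import struct

# Sketch of the message layout under discussion (not a definitive API):
# an OVS_DP_CMD_NEW generic netlink request.  With the proposed patch, a
# nonzero dp_ifindex in struct ovs_header requests that ifindex for the
# new datapath.  Constants are my reading of the uapi headers; the genl
# family id is a placeholder (resolve it via CTRL_CMD_GETFAMILY).
NLM_F_REQUEST, NLM_F_ACK = 0x1, 0x4
OVS_DP_CMD_NEW = 1
OVS_DP_ATTR_NAME = 1
OVS_DP_ATTR_UPCALL_PID = 2

def nlattr(attr_type: int, payload: bytes) -> bytes:
    """Encode one netlink attribute, padded to a 4-byte boundary."""
    header = struct.pack("=HH", 4 + len(payload), attr_type)
    return header + payload + b"\0" * ((-(4 + len(payload))) % 4)

def ovs_dp_new(family_id: int, name: str, upcall_pid: int, want_ifindex: int) -> bytes:
    genl = struct.pack("=BBH", OVS_DP_CMD_NEW, 1, 0)  # cmd, version, reserved
    ovs_header = struct.pack("=i", want_ifindex)      # struct ovs_header.dp_ifindex
    attrs = nlattr(OVS_DP_ATTR_NAME, name.encode() + b"\0")
    attrs += nlattr(OVS_DP_ATTR_UPCALL_PID, struct.pack("=I", upcall_pid))
    payload = genl + ovs_header + attrs
    nlh = struct.pack("=IHHII", 16 + len(payload), family_id,
                      NLM_F_REQUEST | NLM_F_ACK, 1, 0)  # len, type, flags, seq, port
    return nlh + payload

msg = ovs_dp_new(family_id=0x18, name="ovs-system", upcall_pid=0, want_ifindex=42)
```

The requested ifindex sits at byte offset 20, right after nlmsghdr and
genlmsghdr, which is exactly the struct ovs_header field the kernel ignores
today for NEW requests.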


As a broader solution, another, more generic approach is possible: modify
__dev_change_net_namespace() to allow changing ifindexes within the same
netns. Yet we would still need to ignore NETIF_F_NETNS_LOCAL, and I am not
sure that all of its users are ready for an ifindex change.
That would indeed be better for CRIU, since we would not need to change every
SDN module to be able to checkpoint/restore it, and it would probably avoid
some bloat.
What do you think of this?

Andrey Zhadchenko (1):
  openvswitch: allow specifying ifindex of new interfaces

 include/uapi/linux/openvswitch.h     |  4 ++++
 net/openvswitch/datapath.c           | 16 ++++++++++++++--
 net/openvswitch/vport-internal_dev.c |  1 +
 net/openvswitch/vport.h              |  2 ++
 4 files changed, 21 insertions(+), 2 deletions(-)

Comments

Christian Brauner Aug. 17, 2022, 1:09 p.m. UTC | #1
On Wed, Aug 17, 2022 at 03:49:08PM +0300, Andrey Zhadchenko wrote:
> Hi!
> 
> CRIU currently does not support checkpoint/restore of OVS configurations, but
> there have been several requests for it. For example,
> https://github.com/lxc/lxc/issues/2909

Ah right, I remember that. :)

> 
> The main problem is the ifindexes of newly created interfaces: we really need
> to preserve them across restore, and the current openvswitch API does not
> allow specifying an ifindex. Most of the time we can simply create an
> interface via generic netlink requests and plug it into OVS, but datapaths
> (and generally any OVS_VPORT_TYPE_INTERNAL port) can only be created via
> openvswitch requests, which do not support selecting an ifindex.
> 
> This patch makes that possible.
> For new datapaths I decided to use dp_ifindex in the header as the requested
> ifindex, because it controls the ifindex for other requests too.
> For internal vports I reused OVS_VPORT_ATTR_IFINDEX.
> 
> The only concern I have is that dp_ifindex was previously unused for
> OVS_DP_CMD_NEW requests, so some software may not set it to zero. However,
> we have been running this patch at Virtuozzo for 2 years and have not
> encountered this problem. I am not sure whether it is worth adding a new
> ovs_datapath_attr instead.
> 
> 
> As a broader solution, another, more generic approach is possible: modify
> __dev_change_net_namespace() to allow changing ifindexes within the same
> netns. Yet we would still need to ignore NETIF_F_NETNS_LOCAL, and I am not
> sure that all of its users are ready for an ifindex change.

I think that might become confusing. We already have issues - even with
the tracking infrastructure - keeping track of ifindex changes when a
network device is moved between network namespaces multiple times. So
I'd rather not make it possible to change the ifindex at will within the
same network namespace, though I understand the appeal for CRIU.

> That would indeed be better for CRIU, since we would not need to change
> every SDN module to be able to checkpoint/restore it, and it would probably
> avoid some bloat. What do you think of this?
Ilya Maximets Aug. 17, 2022, 6:19 p.m. UTC | #2
On 8/17/22 14:49, Andrey Zhadchenko via dev wrote:
> Hi!
> 
> CRIU currently does not support checkpoint/restore of OVS configurations, but
> there have been several requests for it. For example,
> https://github.com/lxc/lxc/issues/2909
> 
> The main problem is the ifindexes of newly created interfaces: we really need
> to preserve them across restore, and the current openvswitch API does not
> allow specifying an ifindex. Most of the time we can simply create an
> interface via generic netlink requests and plug it into OVS, but datapaths
> (and generally any OVS_VPORT_TYPE_INTERNAL port) can only be created via
> openvswitch requests, which do not support selecting an ifindex.

Hmm.  I assume you restore network interfaces by re-creating them on the
target system, but I'm curious: how do you restore the upcall PID?
Are you somehow re-creating the netlink socket with the same PID?
If that is not done, no traffic will be able to flow through OVS anyway
until you remove/re-add the port in userspace or re-start OVS.
Or am I missing something?

Best regards, Ilya Maximets.

Andrey Zhadchenko Aug. 17, 2022, 8:35 p.m. UTC | #3
On 8/17/22 21:19, Ilya Maximets wrote:
> 
> Hmm.  I assume you restore network interfaces by re-creating them on the
> target system, but I'm curious: how do you restore the upcall PID?
> Are you somehow re-creating the netlink socket with the same PID?
> If that is not done, no traffic will be able to flow through OVS anyway
> until you remove/re-add the port in userspace or re-start OVS.
> Or am I missing something?
> 
> Best regards, Ilya Maximets.

Yes, CRIU is able to restore a socket's nl_pid. We get it via NDIAG_PROTO_ALL
netlink diag requests (see net/netlink/diag.c). The upcall PID is exported by
GET requests via OVS_VPORT_ATTR_UPCALL_PID.
So everything is fine here.
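To make the mechanism concrete, here is a small sketch of the attribute walk a
dumper would do over an OVS_VPORT_CMD_GET reply payload (the bytes after
genlmsghdr and struct ovs_header). The attribute number is my reading of
include/uapi/linux/openvswitch.h, and the example input below is fabricated:

```python
import struct

OVS_VPORT_ATTR_UPCALL_PID = 5  # assumed value from include/uapi/linux/openvswitch.h

def parse_nlattrs(data):
    """Split a netlink attribute stream into {type: payload}."""
    attrs, off = {}, 0
    while off + 4 <= len(data):
        length, attr_type = struct.unpack_from("=HH", data, off)
        if length < 4:
            break
        attrs[attr_type] = data[off + 4:off + length]
        off += (length + 3) & ~3  # every attribute is aligned to 4 bytes
    return attrs

def upcall_pids(vport_attrs):
    """OVS_VPORT_ATTR_UPCALL_PID carries one or more u32 netlink PIDs."""
    raw = parse_nlattrs(vport_attrs).get(OVS_VPORT_ATTR_UPCALL_PID, b"")
    return [pid for (pid,) in struct.iter_unpack("=I", raw)]
```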

I should note that I did not test *complicated* setups with ovs-vswitchd,
mostly basic ones like veth plugging and several containers in a network.
We mainly supported the Weave Net k8s SDN, which is based on OVS but does
not use its daemon.

Maybe if this is merged and people start using it we will find more problems
with checkpoint/restore, but for now the only problem is the volatile
ifindex.

Best regards, Andrey Zhadchenko
Ilya Maximets Aug. 17, 2022, 10:15 p.m. UTC | #4
On 8/17/22 22:35, Andrey Zhadchenko wrote:
> Yes, CRIU is able to restore a socket's nl_pid. We get it via NDIAG_PROTO_ALL
> netlink diag requests (see net/netlink/diag.c). The upcall PID is exported
> by GET requests via OVS_VPORT_ATTR_UPCALL_PID.
> So everything is fine here.

I didn't dig deep into how that works, but sounds interesting.
Thanks for the pointers!

> 
> I should note that I did not test *complicated* setups with ovs-vswitchd,
> mostly basic ones like veth plugging and several containers in a network. We
> mainly supported the Weave Net k8s SDN, which is based on OVS but does not
> use its daemon.
> 
> Maybe if this is merged and people start using it we will find more problems
> with checkpoint/restore, but for now the only problem is the volatile ifindex.

The current implementation, even with ifindexes sorted out, will not work with
recent versions of OVS for at least one reason.  Since last year, OVS doesn't
use OVS_VPORT_ATTR_UPCALL_PID if the kernel supports OVS_DP_ATTR_PER_CPU_PIDS;
it uses the latter instead.  That is a datapath-wide CPU-ID-to-PID mapping for
the per-CPU upcall dispatch mode, which is used by default starting with
OVS 2.16.

So, you need to make sure you're correctly restoring 'user_features' and
OVS_DP_ATTR_PER_CPU_PIDS.  The problem here is that OVS_DP_ATTR_PER_CPU_PIDS
is currently not dumped to userspace via a GET request, simply because
ovs-vswitchd has no use for it.  So, you'll need to add that as well.

And there could be some issues when starting OVS from a checkpoint created
on a system with a different number of CPU cores.  Traffic will not be broken,
but performance may be affected, and there might be some kernel warnings.

If you don't restore OVS_DP_ATTR_PER_CPU_PIDS, traffic will not work on
recent versions of OVS, including the 2.17 LTS, on more or less recent kernels.

Another fairly recent addition is OVS_DP_ATTR_MASKS_CACHE_SIZE, which is
not critical, but would be nice to restore as well, if you're not doing
that already.
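A sketch of what dumping and restoring that attribute could look like on the
wire: to my understanding, OVS_DP_ATTR_PER_CPU_PIDS is a flat array of u32
netlink socket PIDs indexed by CPU id. The attribute number and the modulo
fallback below are assumptions from my reading of recent datapath.c, not
guaranteed:

```python
import struct

# Hedged sketch: OVS_DP_ATTR_PER_CPU_PIDS as a flat u32-per-CPU array of
# netlink socket PIDs for per-CPU upcall dispatch.  The attribute number
# and the modulo fallback are my reading of the kernel source.
OVS_DP_ATTR_PER_CPU_PIDS = 8  # assumed value from include/uapi/linux/openvswitch.h

def encode_per_cpu_pids(pids):
    """Pack the pid list as one netlink attribute (checkpoint side)."""
    payload = struct.pack(f"={len(pids)}I", *pids)
    return struct.pack("=HH", 4 + len(payload), OVS_DP_ATTR_PER_CPU_PIDS) + payload

def decode_per_cpu_pids(attr):
    """Unpack the attribute back into a pid list (restore side)."""
    length, attr_type = struct.unpack_from("=HH", attr)
    assert attr_type == OVS_DP_ATTR_PER_CPU_PIDS
    return list(struct.unpack(f"={(length - 4) // 4}I", attr[4:length]))

def upcall_pid_for_cpu(pids, cpu):
    # The kernel appears to fall back to pids[cpu % len(pids)] for CPUs
    # beyond the array, which would be why a restore on a host with a
    # different CPU count still passes traffic, just distributed unevenly.
    return pids[cpu % len(pids)]
```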

Andrey Zhadchenko Aug. 17, 2022, 11:11 p.m. UTC | #5
On 8/18/22 01:15, Ilya Maximets wrote:
> I didn't dig deep into how that works, but sounds interesting.
> Thanks for the pointers!
> 
>>
>> I should note that I did not test *complicated* setups with ovs-vswitchd,
>> mostly basic ones like veth plugging and several containers in a network.
>> We mainly supported the Weave Net k8s SDN, which is based on OVS but does
>> not use its daemon.
>>
>> Maybe if this is merged and people start using it we will find more problems
>> with checkpoint/restore, but for now the only problem is the volatile ifindex.
> 
> The current implementation, even with ifindexes sorted out, will not work
> with recent versions of OVS for at least one reason.  Since last year, OVS
> doesn't use OVS_VPORT_ATTR_UPCALL_PID if the kernel supports
> OVS_DP_ATTR_PER_CPU_PIDS; it uses the latter instead.  That is a
> datapath-wide CPU-ID-to-PID mapping for the per-CPU upcall dispatch mode,
> which is used by default starting with OVS 2.16.
>
> So, you need to make sure you're correctly restoring 'user_features' and
> OVS_DP_ATTR_PER_CPU_PIDS.  The problem here is that OVS_DP_ATTR_PER_CPU_PIDS
> is currently not dumped to userspace via a GET request, simply because
> ovs-vswitchd has no use for it.  So, you'll need to add that as well.
Thanks, this is very important! I will make v2 soon.

> 
> And there could be some issues when starting OVS from a checkpoint created
> on a system with different number of CPU cores.  Traffic will not be broken,
> but performance may be affected, and there might be some kernel warnings.
Migrating to another type of CPU is usually a challenge due to different
CPUID bits and some other problems (do we handle cpu cgroup values if the
number of CPUs changed? No idea, honestly). Anyway, thanks for pointing
that out.

> 
> If you won't restore OVS_DP_ATTR_PER_CPU_PIDS, traffic will not work on
> recent versions of OVS, including 2.17 LTS, on more or less recent kernels.
> 
> Another fairly recent addition is OVS_DP_ATTR_MASKS_CACHE_SIZE, which is
> not critical, but would be nice to restore as well, if you're not doing
> that already.
Looks like ovs_dp_cmd_fill_info() already fills it, so there is no need to
additionally patch the kernel part. The current CRIU implementation does not
care about it, but it would not be hard to include.

Ilya Maximets Aug. 18, 2022, 11:06 a.m. UTC | #6
On 8/18/22 01:11, Andrey Zhadchenko wrote:
>> And there could be some issues when starting OVS from a checkpoint created
>> on a system with a different number of CPU cores.  Traffic will not be
>> broken, but performance may be affected, and there might be some kernel
>> warnings.
> Migrating to another type of CPU is usually a challenge due to different
> CPUID bits and some other problems (do we handle cpu cgroup values if the
> number of CPUs changed? No idea, honestly). Anyway, thanks for pointing
> that out.

TBH, it is a challenge just to figure out the CPU IDs on the system you're
running on; migration to a different CPU topology will indeed be a major pain
if you want to preserve performance characteristics on the new setup.  It's
probably much easier to re-start ovs-vswitchd after you have migrated the
kernel bits, though that doesn't seem like something CRIU would like to do,
and I understand that.  Current ovs-vswitchd will not react to sudden changes
in CPU topology/affinity.  Reacting to changes in CPU affinity, though, is
something we're exploring, because it can happen in k8s-like environments
with various performance monitoring/tweaking tools involved.

Regarding more "complex" scenarios, I should also mention the qdiscs that OVS
creates and attaches to interfaces it manages.  These can be for the purpose
of QoS, ingress policing, or offloading to TC flower.  OVS may be confused if
these qdiscs suddenly disappear.  This may cause some traffic to stop flowing
or be directed where it shouldn't be.  I don't know if CRIU covers that part.

There are also basic XDP programs that can be attached to interfaces, along
with registered umem blocks, if users are running the userspace datapath with
AF_XDP ports.  Are AF_XDP sockets or io_uring something that CRIU is able to
migrate?  Just curious.

Andrey Zhadchenko Aug. 18, 2022, 12:26 p.m. UTC | #7
On 8/18/22 14:06, Ilya Maximets wrote:
> 
> TBH, it is a challenge just to figure out the CPU IDs on the system you're
> running on; migration to a different CPU topology will indeed be a major
> pain if you want to preserve performance characteristics on the new setup.
> It's probably much easier to re-start ovs-vswitchd after you have migrated
> the kernel bits, though that doesn't seem like something CRIU would like to
> do, and I understand that.  Current ovs-vswitchd will not react to sudden
> changes in CPU topology/affinity.  Reacting to changes in CPU affinity,
> though, is something we're exploring, because it can happen in k8s-like
> environments with various performance monitoring/tweaking tools involved.
> 
> Regarding more "complex" scenarios, I should also mention the qdiscs that
> OVS creates and attaches to interfaces it manages.  These can be for the
> purpose of QoS, ingress policing, or offloading to TC flower.  OVS may be
> confused if these qdiscs suddenly disappear.  This may cause some traffic to
> stop flowing or be directed where it shouldn't be.  I don't know if CRIU
> covers that part.
> 
> There are also basic XDP programs that can be attached to interfaces, along
> with registered umem blocks, if users are running the userspace datapath
> with AF_XDP ports.  Are AF_XDP sockets or io_uring something that CRIU is
> able to migrate?  Just curious.

CRIU handles traffic shaping poorly. I assume all custom qdiscs are gone
after migration. Probably we should refuse to checkpoint any non-default
values to prevent unexpected results.

XDP (and eBPF in general) is not supported by CRIU. We can checkpoint/restore
only simple classic BPF attached via SO_ATTACH_FILTER.

As far as I know, io_uring support has a working PoC from GSoC 2021, but it
has not been merged yet.
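Since classic BPF came up: a tiny, self-contained illustration of what an
SO_ATTACH_FILTER attachment looks like from userspace (this is the cBPF path
CRIU can already dump and restore). The constants and struct layouts are my
own reading of the headers, and 64-bit Linux is assumed for the sock_fprog
packing:

```python
import ctypes
import socket
import struct

# A minimal sketch (Linux on 64-bit assumed) of the one BPF flavour CRIU can
# handle: a classic BPF program attached via SO_ATTACH_FILTER.  Constants are
# my reading of <asm-generic/socket.h> and <linux/bpf_common.h>.
SO_ATTACH_FILTER = 26
BPF_RET, BPF_K = 0x06, 0x00

# One struct sock_filter { __u16 code; __u8 jt; __u8 jf; __u32 k; }:
# "return 0xffffffff", i.e. accept every incoming datagram in full.
insns = struct.pack("=HBBI", BPF_RET | BPF_K, 0, 0, 0xFFFFFFFF)
buf = ctypes.create_string_buffer(insns, len(insns))

# struct sock_fprog { unsigned short len; struct sock_filter *filter; };
# native alignment pads the pointer to an 8-byte boundary on 64-bit.
fprog = struct.pack("HQ", 1, ctypes.addressof(buf))

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)

# The accept-all filter lets a self-addressed datagram through.
s.bind(("127.0.0.1", 0))
s.settimeout(2)
s.sendto(b"ping", s.getsockname())
data, _ = s.recvfrom(16)
s.close()
```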
