| Message ID | 20260401091318.2671624-1-elibr@nvidia.com |
|---|---|
| Headers | show |
| Series | netdev-doca | expand |
On 1 Apr 2026, at 11:13, Eli Britstein wrote:
> Introduce a new netdev type - netdev-doca.
Hi Eli/Gaetan,
Not sure if you’ve noticed, but this series has failing tests due to a
UBSan issue in patch 1. Maybe you can take a quick look and fix this so
that all unit tests pass.
I would suggest setting up a GitHub Action in your local repo so these
tests run before submitting a series. That way, you can be sure it passes
at least the basic checks.
I’ll start reviewing the series, but I’ll be on PTO at the end of next
week for about a week, so a full review may take some time. I’ll try to
send comments on individual patches so you can continue working on what
has been reviewed so far.
Cheers,
Eelco
Example error:
../../lib/ovs-rcu.c:292:9: runtime error: call to function rule_destroy_cb through pointer to incorrect function type 'void (*)(void *)'
/home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/../../lib/ovs-router.c:1244: note: rule_destroy_cb defined here
#0 0x560e4675ce49 in ovsrcu_run_cbset /home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/../../lib/ovs-rcu.c:292:9
#1 0x560e4675bef1 in ovsrcu_call_postponed /home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/../../lib/ovs-rcu.c:370:9
#2 0x560e4687e368 in time_poll /home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/../../lib/timeval.c:325:17
#3 0x560e467f76d4 in poll_block /home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/../../lib/poll-loop.c:364:14
#4 0x560e4613c401 in main /home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/../../vswitchd/ovs-vswitchd.c:149:9
#5 0x7efe9d42a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb)
#6 0x7efe9d42a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb)
#7 0x560e46027fa4 in _start (/home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/vswitchd/ovs-vswitchd+0x74efa4) (BuildId: a540f0f744c789e977aecf68f1c17a835856d3f8)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../lib/ovs-rcu.c:292:9
[...]
On 09/04/2026 15:16, Eelco Chaudron wrote: > External email: Use caution opening links or attachments > > > On 1 Apr 2026, at 11:13, Eli Britstein wrote: > >> Introduce a new netdev type - netdev-doca. > Hi Eli/Gaetan, > > Not sure if you’ve noticed, but this series has failing tests due to a > UBSan issue in patch 1. Maybe you can take a quick look and fix this so > that all unit tests pass. > > I would suggest setting up a GitHub Action in your local repo so these > tests run before submitting a series. That way, you can be sure it passes > at least the basic checks. > > I’ll start reviewing the series, but I’ll be on PTO at the end of next > week for about a week, so a full review may take some time. I’ll try to > send comments on individual patches so you can continue working on what > has been reviewed so far. > > Cheers, > Eelco > > > Example error: > > ../../lib/ovs-rcu.c:292:9: runtime error: call to function rule_destroy_cb through pointer to incorrect function type 'void (*)(void *)' > /home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/../../lib/ovs-router.c:1244: note: rule_destroy_cb defined here > #0 0x560e4675ce49 in ovsrcu_run_cbset /home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/../../lib/ovs-rcu.c:292:9 > #1 0x560e4675bef1 in ovsrcu_call_postponed /home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/../../lib/ovs-rcu.c:370:9 > #2 0x560e4687e368 in time_poll /home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/../../lib/timeval.c:325:17 > #3 0x560e467f76d4 in poll_block /home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/../../lib/poll-loop.c:364:14 > #4 0x560e4613c401 in main /home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/../../vswitchd/ovs-vswitchd.c:149:9 > #5 0x7efe9d42a1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb) > #6 0x7efe9d42a28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb) > #7 0x560e46027fa4 in _start (/home/runner/work/ovs/ovs/openvswitch-3.7.90/_build/sub/vswitchd/ovs-vswitchd+0x74efa4) (BuildId: a540f0f744c789e977aecf68f1c17a835856d3f8) > > SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../lib/ovs-rcu.c:292:9 Sure, I'll take that into consideration. So far, I found some issues myself that will be fixed in v4: - refmap still may fail in the scenario you described - there is no destroy of the mempool - demote some error messages to WARN if adding a non-existent port or invalid port - some documentation fixes and improvements Also, currently there is no support to build deb/rpm with doca support. I'm working on that, so it will probably affect the flags stuff around acinclude.m4. Enjoy your PTO. > > [...] >
On 1 Apr 2026, at 11:13, Eli Britstein wrote: > Introduce a new netdev type - netdev-doca. > In order to compile, need to install doca on the build machine. Hi Eli, I was running the DOCA test suite patch[1] on this series, and I get a few errors. The earlier tunnel related errors are gone though. 95: conntrack - IPv4 fragmentation FAILED (system-traffic.at:4881) 97: conntrack - IPv4 fragmentation expiry FAILED (system-traffic.at:5002) 105: conntrack - IPv6 fragmentation FAILED (system-traffic.at:5264) 107: conntrack - IPv6 fragmentation expiry FAILED (system-traffic.at:5390) These tests pass fine with check-dpdk and check-dpdk-offloads. I did not investigate the issues, as I'm focussing on the general review of the series, so please take a look. Cheers, Eelco [1]: https://patchwork.ozlabs.org/project/openvswitch/patch/e88f13b8266bd6bb792a522b8b323e76923f376e.1777380423.git.echaudro@redhat.com/
On 30/04/2026 12:25, Eelco Chaudron wrote: > External email: Use caution opening links or attachments > > > On 1 Apr 2026, at 11:13, Eli Britstein wrote: > >> Introduce a new netdev type - netdev-doca. >> In order to compile, need to install doca on the build machine. > Hi Eli, > > I was running the DOCA test suite patch[1] on this series, and I get a > few errors. The earlier tunnel related errors are gone though. > > 95: conntrack - IPv4 fragmentation FAILED (system-traffic.at:4881) > 97: conntrack - IPv4 fragmentation expiry FAILED (system-traffic.at:5002) > 105: conntrack - IPv6 fragmentation FAILED (system-traffic.at:5264) > 107: conntrack - IPv6 fragmentation expiry FAILED (system-traffic.at:5390) > > These tests pass fine with check-dpdk and check-dpdk-offloads. > > I did not investigate the issues, as I'm focussing on the general > review of the series, so please take a look. Thanks Eelco for this info. I looked into it. The RC is that there is a UAF in the following scenario: 1. A frag packet is added to frag_list in ipf module, and not returned to the mempool. 2. the datapath is destroyed. At this point, we first remove the ports, and only then destroy CT/IPF. 3. When a doca port is destroyed, it frees its packet mempool (rte_mempool_free). in this process, the memory is zeroed in dpdk code: lib/eal/common/malloc_elem.c -> malloc_elem_free() -> memset(ptr, 0, data_len). 4. When ipf_destroy occurs, the packet is already free. It crashes when IPF tries to do dp_packet_delete(). With DPDK, there is this commit that frees a mempool only when it's "full", meaning all its mbufs are returned to the pool: 91fccdad72a2 ("netdev-dpdk: Free mempool only when no in-use mbufs.") With this, the mempool is not destroyed at all unless another dpdk port is configured, then the sweep mechanism will destroy it. IMO, this behavior is a kind of a WA, for which I'm not sure what is the correct solution. I will add similar doca-sweep for now, but I think this is not correct. WDYT? PS: I had a lot of trouble running the tests. There are few fixes to the infrastructure that I will post. > > Cheers, > > Eelco > > > [1]: https://patchwork.ozlabs.org/project/openvswitch/patch/e88f13b8266bd6bb792a522b8b323e76923f376e.1777380423.git.echaudro@redhat.com/ >
On 3 May 2026, at 13:57, Eli Britstein wrote: > On 30/04/2026 12:25, Eelco Chaudron wrote: >> External email: Use caution opening links or attachments >> >> >> On 1 Apr 2026, at 11:13, Eli Britstein wrote: >> >>> Introduce a new netdev type - netdev-doca. >>> In order to compile, need to install doca on the build machine. >> Hi Eli, >> >> I was running the DOCA test suite patch[1] on this series, and I get a >> few errors. The earlier tunnel related errors are gone though. >> >> 95: conntrack - IPv4 fragmentation FAILED (system-traffic.at:4881) >> 97: conntrack - IPv4 fragmentation expiry FAILED (system-traffic.at:5002) >> 105: conntrack - IPv6 fragmentation FAILED (system-traffic.at:5264) >> 107: conntrack - IPv6 fragmentation expiry FAILED (system-traffic.at:5390) >> >> These tests pass fine with check-dpdk and check-dpdk-offloads. >> >> I did not investigate the issues, as I'm focussing on the general >> review of the series, so please take a look. > > Thanks Eelco for this info. > > I looked into it. > > The RC is that there is a UAF in the following scenario: > > 1. A frag packet is added to frag_list in ipf module, and not returned to the mempool. > 2. the datapath is destroyed. At this point, we first remove the ports, and only then destroy CT/IPF. > 3. When a doca port is destroyed, it frees its packet mempool (rte_mempool_free). in this process, the memory is zeroed in dpdk code: lib/eal/common/malloc_elem.c -> malloc_elem_free() -> memset(ptr, 0, data_len). > 4. When ipf_destroy occurs, the packet is already free. It crashes when IPF tries to do dp_packet_delete(). > > With DPDK, there is this commit that frees a mempool only when it's "full", meaning all its mbufs are returned to the pool: > > 91fccdad72a2 ("netdev-dpdk: Free mempool only when no in-use mbufs.") > > With this, the mempool is not destroyed at all unless another dpdk port is configured, then the sweep mechanism will destroy it. > > IMO, this behavior is a kind of a WA, for which I'm not sure what is the correct solution. > > I will add similar doca-sweep for now, but I think this is not correct. > > WDYT? I've included Mike and Kevin who worked on ipf before. Maybe they have some more insight into how to fix this. > > PS: I had a lot of trouble running the tests. There are few fixes to the infrastructure that I will post. Interested to find out what your problems are, as I had no problems following the documentation. Anyhow, I can include your changes (if you reply to the RFC patch), or you can include my patch in your series. Cheers, Eelco
On Mon, May 4, 2026 at 3:30 AM Eelco Chaudron <echaudro@redhat.com> wrote: > > > On 3 May 2026, at 13:57, Eli Britstein wrote: > > > On 30/04/2026 12:25, Eelco Chaudron wrote: > >> External email: Use caution opening links or attachments > >> > >> > >> On 1 Apr 2026, at 11:13, Eli Britstein wrote: > >> > >>> Introduce a new netdev type - netdev-doca. > >>> In order to compile, need to install doca on the build machine. > >> Hi Eli, > >> > >> I was running the DOCA test suite patch[1] on this series, and I get a > >> few errors. The earlier tunnel related errors are gone though. > >> > >> 95: conntrack - IPv4 fragmentation FAILED ( > system-traffic.at:4881) > >> 97: conntrack - IPv4 fragmentation expiry FAILED ( > system-traffic.at:5002) > >> 105: conntrack - IPv6 fragmentation FAILED ( > system-traffic.at:5264) > >> 107: conntrack - IPv6 fragmentation expiry FAILED ( > system-traffic.at:5390) > >> > >> These tests pass fine with check-dpdk and check-dpdk-offloads. > >> > >> I did not investigate the issues, as I'm focussing on the general > >> review of the series, so please take a look. > > > > Thanks Eelco for this info. > > > > I looked into it. > > > > The RC is that there is a UAF in the following scenario: > > > > 1. A frag packet is added to frag_list in ipf module, and not returned > to the mempool. > > 2. the datapath is destroyed. At this point, we first remove the ports, > and only then destroy CT/IPF. > > 3. When a doca port is destroyed, it frees its packet mempool > (rte_mempool_free). in this process, the memory is zeroed in dpdk code: > lib/eal/common/malloc_elem.c -> malloc_elem_free() -> memset(ptr, 0, > data_len). > > 4. When ipf_destroy occurs, the packet is already free. It crashes when > IPF tries to do dp_packet_delete(). > > > > With DPDK, there is this commit that frees a mempool only when it's > "full", meaning all its mbufs are returned to the pool: > > > > 91fccdad72a2 ("netdev-dpdk: Free mempool only when no in-use mbufs.") > > > > With this, the mempool is not destroyed at all unless another dpdk port > is configured, then the sweep mechanism will destroy it. > > > > IMO, this behavior is a kind of a WA, for which I'm not sure what is the > correct solution. > > > > I will add similar doca-sweep for now, but I think this is not correct. > > > > WDYT? > > I've included Mike and Kevin who worked on ipf before. Maybe they > have some more insight into how to fix this. > The commit message indicates a non-ipf motivation for the mp_sweep thread, and the current behaviour is very convenient because it allows ipf to age off those packets as part of the frag expiry mechanism. But I think it makes sense for ipf to walk the frags_list and purge any fragments that come from an interface before we destroy the iface. Aside from the UAF, the current behaviour could also enable a fragment from a deleted interface to interfere with the packet transmission of a new interface. However, in the current implementation of ipf that would be a very expensive operation. We could add another index on in_port to help reduce that cost. We may also want to revisit how the ipf_lock works, a read/write lock on the frag_lists (including exp_list, complete_list) and a separate mutex in each ipf_list. This would improve parallelization, at the expense of more complex code. I could create an RFC for this if others agree. Cheers, M > > > > PS: I had a lot of trouble running the tests. There are few fixes to the > infrastructure that I will post. > > Interested to find out what your problems are, as I had no problems > following the documentation. Anyhow, I can include your changes (if you > reply to the RFC patch), or you can include my patch in your series. > > Cheers, > > Eelco > >