mbox series

[ovs-dev,v15,0/7] Add offload support for sFlow

Message ID 20210915124319.1683219-1-cmi@nvidia.com
Headers show
Series Add offload support for sFlow | expand

Message

Chris Mi Sept. 15, 2021, 12:43 p.m. UTC
This patch set adds offload support for sFlow.

Psample is a genetlink channel for packet sampling. TC action act_sample
uses psample to send sampled packets to userspace.

When offloading sample action to TC, userspace creates a unique ID to
map sFlow action and tunnel info and passes this ID to kernel instead
of the sFlow info. psample will send this ID and sampled packet to
userspace. Using the ID, userspace can recover the sFlow info and send
sampled packet to the right sFlow monitoring host.

v2-v1:
- Fix robot errors.
v3-v2:
- Remove Gerrit Change-Id.
- Add patch #9 to fix older kernels build issue.
- Add travis test result.
v4-v3:
- Fix offload issue when sampling rate is 1.
v5-v4:
- Move polling thread from ofproto to netdev-offload-tc.
v6-v5:
- Rebase.
- Add GitHub Actions test result.
v7-v6:
- Remove Gerrit Change-Id.
- Fix "ERROR: Inappropriate spacing around cast"
v8-v7
- Address Eelco Chaudron's comment for patch #11.
v9-v8
- Remove sflow_len from struct dpif_sflow_attr.
- Log a debug message for other userspace actions.
v10-v9
- Address Eelco Chaudron's comments on v9.
v11-v10
- Fix a bracing error.
v12-v11
- Add duplicate sample group id check.
v13-v12
- Remove the psample poll thread from netdev-offload-tc and reuse
  ofproto handler thread according to Ilya's new desgin.
- Add dpif-offload-provider layer according to Eli's suggestion.
v14-v13
- Fix a robot error.
v15-v14
- Address Eelco Chaudron's comments on v14.

Chris Mi (7):
  compat: Add psample and tc sample action defines for older kernels
  ovs-kmod-ctl: Load kernel module psample
  dpif-offload-provider: Introduce dpif-offload-provider layer
  netdev-offload-tc: Introduce group ID management API
  dpif-offload-netlink: Implement dpif-offload-provider API
  ofproto: Introduce API to process sFlow offload packet
  netdev-offload-tc: Add offload support for sFlow

 NEWS                             |   1 +
 include/linux/automake.mk        |   4 +-
 include/linux/psample.h          |  62 +++++
 include/linux/tc_act/tc_sample.h |  25 ++
 lib/automake.mk                  |   3 +
 lib/dpif-netdev.c                |   1 +
 lib/dpif-netlink.c               |   2 +
 lib/dpif-offload-netlink.c       | 208 ++++++++++++++
 lib/dpif-offload-provider.h      |  75 +++++
 lib/dpif-offload.c               |  43 +++
 lib/dpif-provider.h              |   8 +-
 lib/dpif.c                       |  10 +
 lib/netdev-offload-tc.c          | 459 +++++++++++++++++++++++++++++--
 lib/netdev-offload.h             |   1 +
 lib/tc.c                         |  61 +++-
 lib/tc.h                         |  16 +-
 ofproto/ofproto-dpif-upcall.c    |  63 +++++
 utilities/ovs-kmod-ctl.in        |  14 +
 18 files changed, 1030 insertions(+), 26 deletions(-)
 create mode 100644 include/linux/psample.h
 create mode 100644 include/linux/tc_act/tc_sample.h
 create mode 100644 lib/dpif-offload-netlink.c
 create mode 100644 lib/dpif-offload-provider.h
 create mode 100644 lib/dpif-offload.c

Comments

Eelco Chaudron Sept. 21, 2021, 8:12 a.m. UTC | #1
Hi Chris,

Just a quick update, I did see your responses to v14 and I also noticed you send out a v15. I planned to review it this week, but due to some other unforeseen stuff, I have to move it to next week (if nothing more is going to mess up my plan ;)

Cheers,

Eelco


On 15 Sep 2021, at 14:43, Chris Mi wrote:

> This patch set adds offload support for sFlow.
>
> Psample is a genetlink channel for packet sampling. TC action act_sample
> uses psample to send sampled packets to userspace.
>
> When offloading sample action to TC, userspace creates a unique ID to
> map sFlow action and tunnel info and passes this ID to kernel instead
> of the sFlow info. psample will send this ID and sampled packet to
> userspace. Using the ID, userspace can recover the sFlow info and send
> sampled packet to the right sFlow monitoring host.
>
> v2-v1:
> - Fix robot errors.
> v3-v2:
> - Remove Gerrit Change-Id.
> - Add patch #9 to fix older kernels build issue.
> - Add travis test result.
> v4-v3:
> - Fix offload issue when sampling rate is 1.
> v5-v4:
> - Move polling thread from ofproto to netdev-offload-tc.
> v6-v5:
> - Rebase.
> - Add GitHub Actions test result.
> v7-v6:
> - Remove Gerrit Change-Id.
> - Fix "ERROR: Inappropriate spacing around cast"
> v8-v7
> - Address Eelco Chaudron's comment for patch #11.
> v9-v8
> - Remove sflow_len from struct dpif_sflow_attr.
> - Log a debug message for other userspace actions.
> v10-v9
> - Address Eelco Chaudron's comments on v9.
> v11-v10
> - Fix a bracing error.
> v12-v11
> - Add duplicate sample group id check.
> v13-v12
> - Remove the psample poll thread from netdev-offload-tc and reuse
>   ofproto handler thread according to Ilya's new desgin.
> - Add dpif-offload-provider layer according to Eli's suggestion.
> v14-v13
> - Fix a robot error.
> v15-v14
> - Address Eelco Chaudron's comments on v14.
>
> Chris Mi (7):
>   compat: Add psample and tc sample action defines for older kernels
>   ovs-kmod-ctl: Load kernel module psample
>   dpif-offload-provider: Introduce dpif-offload-provider layer
>   netdev-offload-tc: Introduce group ID management API
>   dpif-offload-netlink: Implement dpif-offload-provider API
>   ofproto: Introduce API to process sFlow offload packet
>   netdev-offload-tc: Add offload support for sFlow
>
>  NEWS                             |   1 +
>  include/linux/automake.mk        |   4 +-
>  include/linux/psample.h          |  62 +++++
>  include/linux/tc_act/tc_sample.h |  25 ++
>  lib/automake.mk                  |   3 +
>  lib/dpif-netdev.c                |   1 +
>  lib/dpif-netlink.c               |   2 +
>  lib/dpif-offload-netlink.c       | 208 ++++++++++++++
>  lib/dpif-offload-provider.h      |  75 +++++
>  lib/dpif-offload.c               |  43 +++
>  lib/dpif-provider.h              |   8 +-
>  lib/dpif.c                       |  10 +
>  lib/netdev-offload-tc.c          | 459 +++++++++++++++++++++++++++++--
>  lib/netdev-offload.h             |   1 +
>  lib/tc.c                         |  61 +++-
>  lib/tc.h                         |  16 +-
>  ofproto/ofproto-dpif-upcall.c    |  63 +++++
>  utilities/ovs-kmod-ctl.in        |  14 +
>  18 files changed, 1030 insertions(+), 26 deletions(-)
>  create mode 100644 include/linux/psample.h
>  create mode 100644 include/linux/tc_act/tc_sample.h
>  create mode 100644 lib/dpif-offload-netlink.c
>  create mode 100644 lib/dpif-offload-provider.h
>  create mode 100644 lib/dpif-offload.c
>
> -- 
> 2.27.0
Chris Mi Sept. 22, 2021, 2:23 a.m. UTC | #2
Hi Eelco,

That's ok. Please review it according to your plan. 😁

Thanks,
Chris

On 9/21/2021 4:12 PM, Eelco Chaudron wrote:
> Hi Chris,
>
> Just a quick update, I did see your responses to v14 and I also noticed you send out a v15. I planned to review it this week, but due to some other unforeseen stuff, I have to move it to next week (if nothing more is going to mess up my plan ;)
>
> Cheers,
>
> Eelco
>
>
> On 15 Sep 2021, at 14:43, Chris Mi wrote:
>
>> This patch set adds offload support for sFlow.
>>
>> Psample is a genetlink channel for packet sampling. TC action act_sample
>> uses psample to send sampled packets to userspace.
>>
>> When offloading sample action to TC, userspace creates a unique ID to
>> map sFlow action and tunnel info and passes this ID to kernel instead
>> of the sFlow info. psample will send this ID and sampled packet to
>> userspace. Using the ID, userspace can recover the sFlow info and send
>> sampled packet to the right sFlow monitoring host.
>>
>> v2-v1:
>> - Fix robot errors.
>> v3-v2:
>> - Remove Gerrit Change-Id.
>> - Add patch #9 to fix older kernels build issue.
>> - Add travis test result.
>> v4-v3:
>> - Fix offload issue when sampling rate is 1.
>> v5-v4:
>> - Move polling thread from ofproto to netdev-offload-tc.
>> v6-v5:
>> - Rebase.
>> - Add GitHub Actions test result.
>> v7-v6:
>> - Remove Gerrit Change-Id.
>> - Fix "ERROR: Inappropriate spacing around cast"
>> v8-v7
>> - Address Eelco Chaudron's comment for patch #11.
>> v9-v8
>> - Remove sflow_len from struct dpif_sflow_attr.
>> - Log a debug message for other userspace actions.
>> v10-v9
>> - Address Eelco Chaudron's comments on v9.
>> v11-v10
>> - Fix a bracing error.
>> v12-v11
>> - Add duplicate sample group id check.
>> v13-v12
>> - Remove the psample poll thread from netdev-offload-tc and reuse
>>    ofproto handler thread according to Ilya's new desgin.
>> - Add dpif-offload-provider layer according to Eli's suggestion.
>> v14-v13
>> - Fix a robot error.
>> v15-v14
>> - Address Eelco Chaudron's comments on v14.
>>
>> Chris Mi (7):
>>    compat: Add psample and tc sample action defines for older kernels
>>    ovs-kmod-ctl: Load kernel module psample
>>    dpif-offload-provider: Introduce dpif-offload-provider layer
>>    netdev-offload-tc: Introduce group ID management API
>>    dpif-offload-netlink: Implement dpif-offload-provider API
>>    ofproto: Introduce API to process sFlow offload packet
>>    netdev-offload-tc: Add offload support for sFlow
>>
>>   NEWS                             |   1 +
>>   include/linux/automake.mk        |   4 +-
>>   include/linux/psample.h          |  62 +++++
>>   include/linux/tc_act/tc_sample.h |  25 ++
>>   lib/automake.mk                  |   3 +
>>   lib/dpif-netdev.c                |   1 +
>>   lib/dpif-netlink.c               |   2 +
>>   lib/dpif-offload-netlink.c       | 208 ++++++++++++++
>>   lib/dpif-offload-provider.h      |  75 +++++
>>   lib/dpif-offload.c               |  43 +++
>>   lib/dpif-provider.h              |   8 +-
>>   lib/dpif.c                       |  10 +
>>   lib/netdev-offload-tc.c          | 459 +++++++++++++++++++++++++++++--
>>   lib/netdev-offload.h             |   1 +
>>   lib/tc.c                         |  61 +++-
>>   lib/tc.h                         |  16 +-
>>   ofproto/ofproto-dpif-upcall.c    |  63 +++++
>>   utilities/ovs-kmod-ctl.in        |  14 +
>>   18 files changed, 1030 insertions(+), 26 deletions(-)
>>   create mode 100644 include/linux/psample.h
>>   create mode 100644 include/linux/tc_act/tc_sample.h
>>   create mode 100644 lib/dpif-offload-netlink.c
>>   create mode 100644 lib/dpif-offload-provider.h
>>   create mode 100644 lib/dpif-offload.c
>>
>> -- 
>> 2.27.0
Eelco Chaudron Oct. 1, 2021, 9:35 a.m. UTC | #3
Hi Chris,

I just started to review this patchset, but as some of the v14 discussions have not finished, I’ll copy them over to v15. This way, all the open items are contained in a single thread/revision. I would suggest clear up the open items before you send a new revision.

This is only a visual review, as I did not test the code, assuming we need a new revision anyway with the test cases integrated.

Cheers,

Eelco


On 15 Sep 2021, at 14:43, Chris Mi wrote:

> This patch set adds offload support for sFlow.
>
> Psample is a genetlink channel for packet sampling. TC action act_sample
> uses psample to send sampled packets to userspace.
>
> When offloading sample action to TC, userspace creates a unique ID to
> map sFlow action and tunnel info and passes this ID to kernel instead
> of the sFlow info. psample will send this ID and sampled packet to
> userspace. Using the ID, userspace can recover the sFlow info and send
> sampled packet to the right sFlow monitoring host.
>
> v2-v1:
> - Fix robot errors.
> v3-v2:
> - Remove Gerrit Change-Id.
> - Add patch #9 to fix older kernels build issue.
> - Add travis test result.
> v4-v3:
> - Fix offload issue when sampling rate is 1.
> v5-v4:
> - Move polling thread from ofproto to netdev-offload-tc.
> v6-v5:
> - Rebase.
> - Add GitHub Actions test result.
> v7-v6:
> - Remove Gerrit Change-Id.
> - Fix "ERROR: Inappropriate spacing around cast"
> v8-v7
> - Address Eelco Chaudron's comment for patch #11.
> v9-v8
> - Remove sflow_len from struct dpif_sflow_attr.
> - Log a debug message for other userspace actions.
> v10-v9
> - Address Eelco Chaudron's comments on v9.
> v11-v10
> - Fix a bracing error.
> v12-v11
> - Add duplicate sample group id check.
> v13-v12
> - Remove the psample poll thread from netdev-offload-tc and reuse
>   ofproto handler thread according to Ilya's new desgin.
> - Add dpif-offload-provider layer according to Eli's suggestion.
> v14-v13
> - Fix a robot error.
> v15-v14
> - Address Eelco Chaudron's comments on v14.
>
> Chris Mi (7):
>   compat: Add psample and tc sample action defines for older kernels
>   ovs-kmod-ctl: Load kernel module psample
>   dpif-offload-provider: Introduce dpif-offload-provider layer
>   netdev-offload-tc: Introduce group ID management API
>   dpif-offload-netlink: Implement dpif-offload-provider API
>   ofproto: Introduce API to process sFlow offload packet
>   netdev-offload-tc: Add offload support for sFlow
>
>  NEWS                             |   1 +
>  include/linux/automake.mk        |   4 +-
>  include/linux/psample.h          |  62 +++++
>  include/linux/tc_act/tc_sample.h |  25 ++
>  lib/automake.mk                  |   3 +
>  lib/dpif-netdev.c                |   1 +
>  lib/dpif-netlink.c               |   2 +
>  lib/dpif-offload-netlink.c       | 208 ++++++++++++++
>  lib/dpif-offload-provider.h      |  75 +++++
>  lib/dpif-offload.c               |  43 +++
>  lib/dpif-provider.h              |   8 +-
>  lib/dpif.c                       |  10 +
>  lib/netdev-offload-tc.c          | 459 +++++++++++++++++++++++++++++--
>  lib/netdev-offload.h             |   1 +
>  lib/tc.c                         |  61 +++-
>  lib/tc.h                         |  16 +-
>  ofproto/ofproto-dpif-upcall.c    |  63 +++++
>  utilities/ovs-kmod-ctl.in        |  14 +
>  18 files changed, 1030 insertions(+), 26 deletions(-)
>  create mode 100644 include/linux/psample.h
>  create mode 100644 include/linux/tc_act/tc_sample.h
>  create mode 100644 lib/dpif-offload-netlink.c
>  create mode 100644 lib/dpif-offload-provider.h
>  create mode 100644 lib/dpif-offload.c
>
> -- 
> 2.27.0
Eelco Chaudron Oct. 1, 2021, 9:43 a.m. UTC | #4
On 1 Oct 2021, at 11:35, Eelco Chaudron wrote:

> Hi Chris,
>
> I just started to review this patchset, but as some of the v14 discussions have not finished, I’ll copy them over to v15. This way, all the open items are contained in a single thread/revision. I would suggest clear up the open items before you send a new revision.
>
> This is only a visual review, as I did not test the code, assuming we need a new revision anyway with the test cases integrated.
>
> Cheers,
>
> Eelco


Forgot about this issue (see below), was this fixed in v15? I did not see any reply to the email chain?


> Here is some debug info, it seems related to an update to an existing handle, which does not update the tc part:
>
>  Here the flow gets added:
>
>  2021-09-09T14:01:12.278Z|00001|netdev_offload_tc(handler1)|ERR|EC_DBG: Enter b13915aa-c7a8-4574-8dcf-439f03929b8e.
>
>  2021-09-09T14:01:12.278Z|00002|netdev_offload_tc(handler1)|ERR|EC_DEBUG: port -2147483646
>
>  2021-09-09T14:01:12.285Z|00003|netdev_offload_tc(handler1)|ERR|EC_DBG: EXIT ok b13915aa-c7a8-4574-8dcf-439f03929b8e.
>
>  Here the revalidator updates it with new port information:
>
>  2021-09-09T14:01:12.537Z|00001|netdev_offload_tc(revalidator13)|ERR|EC_DBG: Enter b13915aa-c7a8-4574-8dcf-439f03929b8e.
>
>  2021-09-09T14:01:12.537Z|00002|netdev_offload_tc(revalidator13)|ERR|EC_DEBUG: port 3
>
>  2021-09-09T14:01:12.537Z|00004|netdev_offload_tc(revalidator13)|DBG|updating old handle: 1 prio: 2
>
>  2021-09-09T14:01:12.553Z|00005|netdev_offload_tc(revalidator13)|ERR|EC_DBG: EXIT ok b13915aa-c7a8-4574-8dcf-439f03929b8e.
>
>  Here is a dp dump showing the wrong/old port:
>
>  $ ovs-appctl dpctl/dump-flows -m system@ovs-system type=tc
>
>  ufid:b13915aa-c7a8-4574-8dcf-439f03929b8e, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vnet4),packet_type(ns=0/0,id=0/0),eth(src=52:54:00:88:51:38,dst=04:f4:bc:28:57:01),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:180, bytes:15120, used:0.230s, dp:tc, actions:sample(sample=10.0%,actions(userspace(pid=2771969963,sFlow(vid=0,pcp=0,output=2147483650),actions))),enp5s0f0
>
>  Guess this is the area you need to look at del_filter_and_ufid_mapping():
>
>  2389     if (get_ufid_tc_mapping(ufid, &id) == 0) {
>
>  2390         VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
>
>  2391                     id.handle, id.prio);
>
>  2392         info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, ufid);
>
>  2393     }
>
>  There might also be a problem in the sgid mapping as it’s also using the uuid as the hash, and there could be a collision during the update phase.
>
>  I’m working on some escalations so, please take a look and see if you can fix this.
>
>  I need to restart OVS, and then send a ping, and I see the problem once, I need to restart OVS each time to see it.
>
>  I’ll wait for v15 to see how you fixed it ;)
Chris Mi Oct. 9, 2021, 8:16 a.m. UTC | #5
On 10/1/2021 5:43 PM, Eelco Chaudron wrote:
>
> On 1 Oct 2021, at 11:35, Eelco Chaudron wrote:
>
>> Hi Chris,
>>
>> I just started to review this patchset, but as some of the v14 discussions have not finished, I’ll copy them over to v15. This way, all the open items are contained in a single thread/revision. I would suggest clear up the open items before you send a new revision.
>>
>> This is only a visual review, as I did not test the code, assuming we need a new revision anyway with the test cases integrated.
>>
>> Cheers,
>>
>> Eelco
>
> Forgot about this issue (see below), was this fixed in v15? I did not see any reply to the email chain?
I still can't reproduce the issue locally, so I didn't reply the email 
explicitly. But I revised the code in v15.
I think the change should fix it. In function dpif_sflow_attr_equal() of 
v14, only ufid is compared. In v15,
all the fields are compared. That will prevent the gid mapping to be 
reused wrongly when the flow updates
the action for the same ufid. Since this issue is a corner case, in 
dpif_sflow_attr_hash(), only ufid is hashed to save CPU cycles.
Hopefully that will fix the issue. 😁
>
>
>> Here is some debug info, it seems related to an update to an existing handle, which does not update the tc part:
>>
>>   Here the flow gets added:
>>
>>   2021-09-09T14:01:12.278Z|00001|netdev_offload_tc(handler1)|ERR|EC_DBG: Enter b13915aa-c7a8-4574-8dcf-439f03929b8e.
>>
>>   2021-09-09T14:01:12.278Z|00002|netdev_offload_tc(handler1)|ERR|EC_DEBUG: port -2147483646
>>
>>   2021-09-09T14:01:12.285Z|00003|netdev_offload_tc(handler1)|ERR|EC_DBG: EXIT ok b13915aa-c7a8-4574-8dcf-439f03929b8e.
>>
>>   Here the revalidator updates it with new port information:
>>
>>   2021-09-09T14:01:12.537Z|00001|netdev_offload_tc(revalidator13)|ERR|EC_DBG: Enter b13915aa-c7a8-4574-8dcf-439f03929b8e.
>>
>>   2021-09-09T14:01:12.537Z|00002|netdev_offload_tc(revalidator13)|ERR|EC_DEBUG: port 3
>>
>>   2021-09-09T14:01:12.537Z|00004|netdev_offload_tc(revalidator13)|DBG|updating old handle: 1 prio: 2
>>
>>   2021-09-09T14:01:12.553Z|00005|netdev_offload_tc(revalidator13)|ERR|EC_DBG: EXIT ok b13915aa-c7a8-4574-8dcf-439f03929b8e.
>>
>>   Here is a dp dump showing the wrong/old port:
>>
>>   $ ovs-appctl dpctl/dump-flows -m system@ovs-system type=tc
>>
>>   ufid:b13915aa-c7a8-4574-8dcf-439f03929b8e, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vnet4),packet_type(ns=0/0,id=0/0),eth(src=52:54:00:88:51:38,dst=04:f4:bc:28:57:01),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:180, bytes:15120, used:0.230s, dp:tc, actions:sample(sample=10.0%,actions(userspace(pid=2771969963,sFlow(vid=0,pcp=0,output=2147483650),actions))),enp5s0f0
>>
>>   Guess this is the area you need to look at del_filter_and_ufid_mapping():
>>
>>   2389     if (get_ufid_tc_mapping(ufid, &id) == 0) {
>>
>>   2390         VLOG_DBG_RL(&rl, "updating old handle: %d prio: %d",
>>
>>   2391                     id.handle, id.prio);
>>
>>   2392         info->tc_modify_flow_deleted = !del_filter_and_ufid_mapping(&id, ufid);
>>
>>   2393     }
>>
>>   There might also be a problem in the sgid mapping as it’s also using the uuid as the hash, and there could be a collision during the update phase.
>>
>>   I’m working on some escalations so, please take a look and see if you can fix this.
>>
>>   I need to restart OVS, and then send a ping, and I see the problem once, I need to restart OVS each time to see it.
>>
>>   I’ll wait for v15 to see how you fixed it ;)
Eelco Chaudron Oct. 12, 2021, 6:53 a.m. UTC | #6
On 9 Oct 2021, at 10:16, Chris Mi wrote:

> On 10/1/2021 5:43 PM, Eelco Chaudron wrote:
>>
>> On 1 Oct 2021, at 11:35, Eelco Chaudron wrote:
>>
>>> Hi Chris,
>>>
>>> I just started to review this patchset, but as some of the v14 discussions have not finished, I’ll copy them over to v15. This way, all the open items are contained in a single thread/revision. I would suggest clear up the open items before you send a new revision.
>>>
>>> This is only a visual review, as I did not test the code, assuming we need a new revision anyway with the test cases integrated.
>>>
>>> Cheers,
>>>
>>> Eelco
>>
>> Forgot about this issue (see below), was this fixed in v15? I did not see any reply to the email chain?
> I still can't reproduce the issue locally, so I didn't reply the email explicitly. But I revised the code in v15.
> I think the change should fix it. In function dpif_sflow_attr_equal() of v14, only ufid is compared. In v15,
> all the fields are compared. That will prevent the gid mapping to be reused wrongly when the flow updates
> the action for the same ufid. Since this issue is a corner case, in dpif_sflow_attr_hash(), only ufid is hashed to save CPU cycles.
> Hopefully that will fix the issue. 😁

I did some testing, and I can no longer replicate the issue with v15 (and it comes back as soon as I load v14). So I guess the change you mention fixed it.

//Eelco

<SNIP>
Chris Mi Oct. 12, 2021, 7:05 a.m. UTC | #7
On 10/12/2021 2:53 PM, Eelco Chaudron wrote:
>
> On 9 Oct 2021, at 10:16, Chris Mi wrote:
>
>> On 10/1/2021 5:43 PM, Eelco Chaudron wrote:
>>> On 1 Oct 2021, at 11:35, Eelco Chaudron wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> I just started to review this patchset, but as some of the v14 discussions have not finished, I’ll copy them over to v15. This way, all the open items are contained in a single thread/revision. I would suggest clear up the open items before you send a new revision.
>>>>
>>>> This is only a visual review, as I did not test the code, assuming we need a new revision anyway with the test cases integrated.
>>>>
>>>> Cheers,
>>>>
>>>> Eelco
>>> Forgot about this issue (see below), was this fixed in v15? I did not see any reply to the email chain?
>> I still can't reproduce the issue locally, so I didn't reply the email explicitly. But I revised the code in v15.
>> I think the change should fix it. In function dpif_sflow_attr_equal() of v14, only ufid is compared. In v15,
>> all the fields are compared. That will prevent the gid mapping to be reused wrongly when the flow updates
>> the action for the same ufid. Since this issue is a corner case, in dpif_sflow_attr_hash(), only ufid is hashed to save CPU cycles.
>> Hopefully that will fix the issue. 😁
> I did some testing, and I can no longer replicate the issue with v15 (and it comes back as soon as I load v14). So I guess the change you mention fixed it.
>
> //Eelco
>
> <SNIP>
>
Thanks a lot for verifying it, Eelco.

I'll send v16 with the test case integrated today for further review.

Thanks,
Chris
Eelco Chaudron Oct. 12, 2021, 7:09 a.m. UTC | #8
On 12 Oct 2021, at 9:05, Chris Mi wrote:

> On 10/12/2021 2:53 PM, Eelco Chaudron wrote:
>>
>> On 9 Oct 2021, at 10:16, Chris Mi wrote:
>>
>>> On 10/1/2021 5:43 PM, Eelco Chaudron wrote:
>>>> On 1 Oct 2021, at 11:35, Eelco Chaudron wrote:
>>>>
>>>>> Hi Chris,
>>>>>
>>>>> I just started to review this patchset, but as some of the v14 discussions have not finished, I’ll copy them over to v15. This way, all the open items are contained in a single thread/revision. I would suggest clear up the open items before you send a new revision.
>>>>>
>>>>> This is only a visual review, as I did not test the code, assuming we need a new revision anyway with the test cases integrated.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Eelco
>>>> Forgot about this issue (see below), was this fixed in v15? I did not see any reply to the email chain?
>>> I still can't reproduce the issue locally, so I didn't reply the email explicitly. But I revised the code in v15.
>>> I think the change should fix it. In function dpif_sflow_attr_equal() of v14, only ufid is compared. In v15,
>>> all the fields are compared. That will prevent the gid mapping to be reused wrongly when the flow updates
>>> the action for the same ufid. Since this issue is a corner case, in dpif_sflow_attr_hash(), only ufid is hashed to save CPU cycles.
>>> Hopefully that will fix the issue. 😁
>> I did some testing, and I can no longer replicate the issue with v15 (and it comes back as soon as I load v14). So I guess the change you mention fixed it.
>>
>> //Eelco
>>
>> <SNIP>
>>
> Thanks a lot for verifying it, Eelco.
>
> I'll send v16 with the test case integrated today for further review.

Ack, will try to review and test it next week.