Message ID: 1506404199-23579-1-git-send-email-yliu@fridaylinux.org
Series: OVS-DPDK flow offload with rte_flow
On 9/25/17, 10:37 PM, "Yuanhan Liu" <yliu@fridaylinux.org> wrote:

Some highlights in v3
=====================

Here is the v3, with 2 major changes and further testing (including many
flows). This took more effort than I thought, so the v3 publication has been
delayed for a while.

The first major change is that the mark and id association is now done with
an array instead of a CMAP. This gives us a further performance gain: it
could be up to 70% now (please see the exact numbers below). This change also
makes the code a bit more complex, though, due to the locking issue. RCU is
used (though I'm not quite sure it's used correctly). For now, RCU only
protects the array base address update (due to reallocation); it doesn't
protect changes to array items (array[i] = xx). I think it's buggy and I need
to rethink it.

The second major change is that a thread is introduced to do the real flow
offload. This is for diminishing the overhead of hw flow offload
installation/deletion at the data path. See patch 9 for more detailed info.

[Darrell] There might be other options to handle this, such as dividing time
to HWOL in a given PMD. Another option is to have PMD isolation.

In the last discussion, using the RSS action was suggested to get rid of the
QUEUE action workaround. Unfortunately, it didn't work. The flow creation
failed with the MLX5 PMD driver, and that's the only driver in DPDK that
supports the RSS action so far.

[Darrell] I wonder if we should take a pause before jumping into the next
version of the code. If workarounds are required in OVS code, then so be it,
as long as they are not overly disruptive to the existing code and hard to
maintain. In the case of RTE_FLOW_ACTION_TYPE_RSS, we might have a reasonable
option to avoid some unpleasant OVS workarounds. This would make a
significant difference in the code paths, if we supported it, so we need to
be sure as early as possible. The support needed would be in some drivers and
seems reasonably doable.
Moreover, this was discussed in the last dpdk meeting and the support was
indicated as existing(?), although I only verified the MLX code myself. I had
seen the MLX code supporting the _RSS action, and there are some checks for
supported cases; when you say "it didn't work", what was the issue? Let us
also have a discussion about the Intel nic side and the Napatech side. It
seems reasonable to ask where the disconnect is and whether this support can
be added, and then make a decision based on the answers. What do you think?

I also tested many flows this time. The result is more exciting: it could be
up to a 267% boost with 512 mega flows (each with another 512 exact-matching
flows, thus 512*512 = 256K flows in total), with one core and one queue doing
PHY-PHY forwarding. For the offload case, the performance stays the same as
with one flow only, because the cost of the mark-to-flow translation is
constant no matter how many flows are inserted (as long as they are all
offloaded). For vanilla ovs-dpdk, however, the more flows there are, the
worse the performance is. In other words, the more flows, the bigger the
difference we will see.

There were many discussions on the last version. I'm sorry if I missed some
comments and didn't make the corresponding changes in v3. Please let me know
if I made such mistakes.

Below is the formal cover letter introduction, for those seeing this patchset
for the first time.

---

Hi,

Here is a joint work from Mellanox and Napatech to enable flow hw offload
with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a uint32_t number).
Later, we then get the flow directly from the mark id, bypassing the heavy
emc processing, including miniflow_extract. The association is done with an
array in patch 1. It also reuses the flow APIs introduced while adding the tc
offloads. The emc bypassing is done in patch 2.
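[Editor's sketch] The array association described above (patch 1) can be
modeled with a small standalone sketch. All names and the growth policy here
are illustrative, not the actual OVS symbols, and the RCU concern from the
cover letter is reduced to a comment: the dangerous moment is publishing the
new base pointer after realloc while PMD threads may still be reading through
the old one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the megaflow; the real type lives in dpif-netdev.c. */
struct flow { int dummy; };

static struct flow **flows;      /* flows[mark] -> flow, or NULL */
static uint32_t flows_cap;       /* current array capacity */

static void
mark_to_flow_set(uint32_t mark, struct flow *f)
{
    if (mark >= flows_cap) {
        uint32_t cap = flows_cap ? flows_cap : 64;
        while (cap <= mark) {
            cap *= 2;
        }
        /* In the real patchset this base-pointer swap is what RCU has
         * to protect, so a PMD thread never dereferences a freed base;
         * plain realloc is enough for this single-threaded sketch. */
        flows = realloc(flows, cap * sizeof *flows);
        memset(flows + flows_cap, 0, (cap - flows_cap) * sizeof *flows);
        flows_cap = cap;
    }
    flows[mark] = f;
}

static struct flow *
mark_to_flow_get(uint32_t mark)
{
    return mark < flows_cap ? flows[mark] : NULL;
}
```

Compared with a CMAP, the lookup is a single bounds check plus one indexed
load, which is where the extra ~16% over the v1 numbers plausibly comes from.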
The flow offload is done in patch 4, which mainly does two things:

- translate the ovs match to DPDK rte flow patterns
- bind those patterns with a MARK action

Afterwards, the NIC will set the mark id in every pkt's mbuf when it matches
the flow. That's basically how we can get the flow directly from the received
mbuf.

While testing PHY-PHY forwarding with one core, one queue and one flow, I got
about a 70% performance boost. For PHY-vhost forwarding, I got about a 50%
performance boost. That's basically the performance I got with v1, when
tcp_flags was ignored. In summary, the CMAP-to-array change gives us yet
another 16% performance boost.

The major issue mentioned in v1 is also worked around: the queue index is
never blindly set to 0 anymore, but set to the rxq that first receives the
upcall pkt.

Note that it's disabled by default, and can be enabled with:

    $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

v3:
- The mark and id association is done with an array instead of a CMAP.
- Added a thread to do hw offload operations.
- Removed macros completely.
- Dropped the patch to set FDIR_CONF, which is a workaround for some Intel
  NICs.
- Added a debug patch to show all flow patterns we have created.
- Misc fixes.

v2:
- Worked around the queue action issue.
- Fixed the tcp_flags being skipped issue, which also fixed the build
  warnings.
- Fixed l2 patterns for Intel nic.
- Converted some macros to functions.
- Did not hardcode the max number of flows/actions.
- Rebased on top of the latest code.

Thanks.
	--yliu

---

Finn Christensen (2):
  netdev-dpdk: implement flow put with rte flow
  netdev-dpdk: retry with queue action

Shachar Beiser (1):
  dpif-netdev: record rx queue id for the upcall

Yuanhan Liu (6):
  dpif-netdev: associate flow with a mark id
  dpif-netdev: retrieve flow directly from the flow mark
  netdev-dpdk: convert ufid to dpdk flow
  netdev-dpdk: remove offloaded flow on deletion
  netdev-dpdk: add debug for rte flow patterns
  dpif-netdev: do hw flow offload in another thread

 lib/dp-packet.h   |  14 +
 lib/dpif-netdev.c | 421 ++++++++++++++++++++++++++++-
 lib/flow.c        | 155 ++++++++---
 lib/flow.h        |   1 +
 lib/netdev-dpdk.c | 776 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 lib/netdev.c      |   1 +
 lib/netdev.h      |   7 +
 7 files changed, 1331 insertions(+), 44 deletions(-)

-- 
2.7.4
On Tue, Sep 26, 2017 at 09:58:16PM +0000, Darrell Ball wrote:
> The second major change is there is a thread introduced to do the real
> flow offload. This is for diminishing the overhead by hw flow offload
> installation/deletion at data path. See patch 9 for more detailed info.
>
> [Darrell] There might be other options to handle this such as dividing time
> to HWOL in a given PMD.
> Another option to have PMD isolation.

Good to know, though I'm not sure I understand it so far. But it can be
discussed later. That's also the reason I put it in the last patch, so that
we could easily turn it into another solution, or do something even simpler
(just remove it).

> In the last discussion, RSS action was suggested to use to get rid of
> the QUEUE action workaround. Unfortunately, it didn't work. The flow
> creation failed with MLX5 PMD driver, and that's the only driver in
> DPDK that supports RSS action so far.
>
> [Darrell]
> I wonder if we should take a pause before jumping into the next version of the code.

I have no objections.

> If workarounds are required in OVS code, then so be it as long as they are not
> overly disruptive to the existing code and hard to maintain.
> In the case of RTE_FLOW_ACTION_TYPE_RSS, we might have a reasonable option
> to avoid some unpleasant OVS workarounds.
> This would make a significant difference in the code paths, if we supported it, so
> we need to be sure as early as possible.

I agree.

> The support needed would be in some drivers and seems reasonably doable.
> Moreover, this was discussed in the last dpdk meeting and the support was
> indicated as existing?, although I only verified the MLX code, myself.

From the code point of view, yes, the code was there months ago.

> I had seen the MLX code supporting _RSS action and there are some checks for
> supported cases; when you say "it didn't work", what was the issue ?

I actually met the error months ago; I even debugged it. IIRC, the error is
from libibverbs, when trying to create the hw flow. I think I need to
double-confirm it with our colleague who developed this feature.

> Let us have a discussion also about the Intel nic side and the Napatech side.

I think it's not necessary for Napatech, because they can support a
MARK-only action. For Intel, yes, I think Intel folks could give some
comments here.

> It seems reasonable to ask where the disconnect is and whether this support
> can be added and then make a decision based on the answers.

No objections.

	--yliu
On Wed, Sep 27, 2017 at 09:24:54AM +0800, Yuanhan Liu wrote:
> > The support needed would be in some drivers and seems reasonably doable.
> > Moreover, this was discussed in the last dpdk meeting and the support was
> > indicated as existing?, although I only verified the MLX code, myself.
>
> From the code point of view, yes, the code was there months ago.
>
> > I had seen the MLX code supporting _RSS action and there are some checks for
> > supported cases; when you say "it didn't work", what was the issue ?
>
> I actually have met the error months ago, I even debugged it. IIRC,
> the error is from libibverbs, when trying to create the hw flow. I
> think I need double-confirm it with our colleague who developed this
> feature.

Hmm, I think I was wrong; apparently, I did something stupid in my last try.
I double-confirmed it just now, and at least the flow creation works now. I
just need to take some time to integrate it into OVS-DPDK and see how it
goes. I will get back to you when I have done that; currently, I am occupied
by something else.

	--yliu
On 9/26/17, 6:25 PM, "Yuanhan Liu" <yliu@fridaylinux.org> wrote:

On Tue, Sep 26, 2017 at 09:58:16PM +0000, Darrell Ball wrote:
> The second major change is there is a thread introduced to do the real
> flow offload. This is for diminishing the overhead by hw flow offload
> installation/deletion at data path. See patch 9 for more detailed info.
>
> [Darrell] There might be other options to handle this such as dividing time
> to HWOL in a given PMD.
> Another option to have PMD isolation.

Good to know, though I'm not sure I understand it so far. But it can be
discussed later. That's also the reason I put it in the last patch, so that
we could easily turn it into another solution, or do something even simpler
(just remove it).

> In the last discussion, RSS action was suggested to use to get rid of
> the QUEUE action workaround. Unfortunately, it didn't work. The flow
> creation failed with MLX5 PMD driver, and that's the only driver in
> DPDK that supports RSS action so far.
>
> [Darrell]
> I wonder if we should take a pause before jumping into the next version of the code.

I have no objections.

> If workarounds are required in OVS code, then so be it as long as they are not
> overly disruptive to the existing code and hard to maintain.
> In the case of RTE_FLOW_ACTION_TYPE_RSS, we might have a reasonable option
> to avoid some unpleasant OVS workarounds.
> This would make a significant difference in the code paths, if we supported it, so
> we need to be sure as early as possible.

I agree.

> The support needed would be in some drivers and seems reasonably doable.
> Moreover, this was discussed in the last dpdk meeting and the support was
> indicated as existing?, although I only verified the MLX code, myself.

From the code point of view, yes, the code was there months ago.

> I had seen the MLX code supporting _RSS action and there are some checks for
> supported cases; when you say "it didn't work", what was the issue ?

I actually met the error months ago; I even debugged it. IIRC, the error is
from libibverbs, when trying to create the hw flow. I think I need to
double-confirm it with our colleague who developed this feature.

> Let us have a discussion also about the Intel nic side and the Napatech side.

I think it's not necessary for Napatech, because they can support a
MARK-only action.

[Darrell] It is not necessary for Napatech; however, to avoid the need to
detect the Napatech special (good) case, 'ideally' we do the exact same
programming even in the Napatech case.

For Intel, yes, I think Intel folks could give some comments here.

> It seems reasonable to ask where the disconnect is and whether this support
> can be added and then make a decision based on the answers.

No objections.

	--yliu
On Wed, Sep 27, 2017 at 03:13:33AM +0000, Darrell Ball wrote:
> On 9/26/17, 6:25 PM, "Yuanhan Liu" <yliu@fridaylinux.org> wrote:
>
> [...]
>
> > Let us have a discussion also about the Intel nic side and the Napatech side.
>
> I think it's not necessary for Napatech, because they can support
> MARK only action.
>
> [Darrell] It is not necessary for Napatech; however, to avoid the need to
> detect the Napatech special (good) case, 'ideally' we do the exact same
> programming even in the Napatech case.

Would the following look okay to you (assuming we have the probe ability and
DPDK has some basic capability feedback)?

    if (!pure_mark_cap_probed) {
        if (dpdk_rte_flow_has_rss_action_support) {
            append_rss_action();
        } else {
            fail and return;
        }
    }

    /* create flow once, with no retries; if it fails, let it fail */
    flow = rte_flow_create(...);

I assume that's how the code will finally look. What do you think?

	--yliu

> For Intel, yes, I think Intel folks could give some comments here.
>
> > It seems reasonable to ask where the disconnect is and whether this support
> > can be added and then make a decision based on the answers.
>
> No objections.
On 9/26/17, 8:27 PM, "Yuanhan Liu" <yliu@fridaylinux.org> wrote:

[...]

Would the following look okay to you (assuming we have the probe ability and
DPDK has some basic capability feedback)?

    if (!pure_mark_cap_probed) {
        if (dpdk_rte_flow_has_rss_action_support) {
            append_rss_action();
        } else {
            fail and return;
        }
    }

    /* create flow once, with no retries; if it fails, let it fail */
    flow = rte_flow_create(...);

I assume that's how the code will finally look. What do you think?

[Darrell] It looks fine; of course, if we could drop dependencies on the cap
probe, it would be ideal.

	--yliu
-----Original Message-----
From: Darrell Ball [mailto:dball@vmware.com]
Sent: 27. september 2017 05:54
To: Yuanhan Liu <yliu@fridaylinux.org>
Cc: dev@openvswitch.org; Finn Christensen <fc@napatech.com>; Chandran Sugesh <sugesh.chandran@intel.com>; Simon Horman <simon.horman@netronome.com>
Subject: Re: [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

[...]

Would the following look okay to you (assuming we have the probe ability and
DPDK has some basic capability feedback)?

    if (!pure_mark_cap_probed) {
        if (dpdk_rte_flow_has_rss_action_support) {
            append_rss_action();
        } else {
            fail and return;
        }
    }

    /* create flow once, with no retries; if it fails, let it fail */
    flow = rte_flow_create(...);

I assume that's how the code will finally look. What do you think?

[Darrell] It looks fine; of course, if we could drop dependencies on the cap
probe, it would be ideal.

[Finn] From a Napatech point of view, we would definitely want to use
RTE_FLOW_ACTION_TYPE_RSS if it were implemented. It definitely makes good
sense to me. I have 2 reasons for this:
1. It would fit nicely into the way Napatech does flow programming in the nic.
2. We would be fully RSS controlled through OVS.
Furthermore, future RSS splitting on a per-megaflow basis will be possible.
IMO the "pure_mark_cap_probed" function is a temp workaround to make it work.
The RSS seems to be a solution to the queue problem.

[...]
On Wed, Sep 27, 2017 at 09:12:49AM +0000, Finn Christensen wrote:
> [Darrell] It looks fine; of course, if we could drop dependencies on the cap
> probe, it would be ideal.
>
> [Finn]
> From a Napatech point of view, we would definitely want to use
> RTE_FLOW_ACTION_TYPE_RSS if it were implemented. It definitely makes good
> sense to me. I have 2 reasons for this:
> 1. It would fit nicely into the way Napatech does flow programming in nic
> 2. We would be fully RSS controlled through OVS
> Furthermore, future RSS splitting on a per megaflow basis will be possible.
> IMO the "pure_mark_cap_probed" function is a temp work-around to make it work.
> The RSS seems to be a solution to the queue problem.

Finn, that's really good to know! I do agree that without this probe the
code is simpler and cleaner.

A few questions though. Has Napatech already implemented the rss action? If
not, what's the plan? And are you okay with the following code?

    add_mark_action();
    add_rss_action_unconditionally();

    flow = rte_flow_create(...);
    if (!flow)
        return -1;

That is, no probes, no feedback. If it failed, it just failed (flow offload
is then just skipped).

But I do think some feedback is necessary. If we knew the rte_flow
capabilities in the beginning, we could simply disable flow offload for a
specific port that we know doesn't have offload ability. This would
completely avoid the repeated effort of trying to do hw offload.

	--yliu
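[Editor's sketch] For illustration, the "no probes, no feedback" scheme in
the snippet above can be modeled end to end with stand-in types. The names
below are hypothetical, the simplified action structs only imitate rte_flow
(the real struct rte_flow_action_rss layout varies across DPDK versions), and
the stub hw_flow_create() plays the role of rte_flow_create(), here
arbitrarily pretending the PMD rejects mark id 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

enum action_type { ACTION_MARK, ACTION_RSS, ACTION_END };

struct action {
    enum action_type type;
    uint32_t mark_id;            /* used by ACTION_MARK only */
};

static struct { int created; } fake_hw;   /* pretend device state */

/* Stub PMD: accepts any flow whose mark id is nonzero, to model a
 * driver that may also reject a request. */
static void *
hw_flow_create(const struct action *actions)
{
    return actions[0].mark_id != 0 ? (void *) &fake_hw : NULL;
}

/* Returns 1 when the flow was offloaded, 0 when it stays in software. */
static int
try_offload(uint32_t mark_id)
{
    struct action actions[] = {
        { ACTION_MARK, mark_id },
        { ACTION_RSS,  0 },      /* spread matched pkts over the rxqs */
        { ACTION_END,  0 },
    };

    /* Create once, no retries: on failure the flow simply stays in
     * the software datapath, as in the snippet above. */
    return hw_flow_create(actions) != NULL;
}
```

The point of the design is the caller never branches on device capabilities;
failure of the single create call is the only signal.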
-----Original Message-----
From: Yuanhan Liu [mailto:yliu@fridaylinux.org]
Sent: 27. september 2017 11:47
To: Finn Christensen <fc@napatech.com>
Cc: Darrell Ball <dball@vmware.com>; dev@openvswitch.org; Chandran Sugesh <sugesh.chandran@intel.com>; Simon Horman <simon.horman@netronome.com>
Subject: Re: [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

[...]

A few questions though. Has Napatech already implemented the rss action? If
not, what's the plan?

[Finn] Our nic handles rss on a per-flow basis, but we have not yet used the
rte rss action for OVS; in OVS we simply handle the RSS config outside it.
Full control of rss through OVS would be better, though.

And are you okay with the following code?

    add_mark_action();
    add_rss_action_unconditionally();

    flow = rte_flow_create(...);
    if (!flow)
        return -1;

That is, no probes, no feedback. If it failed, it just failed (flow offload
is then just skipped).

[Finn] Yes, we can easily make this work. Good suggestion!

But I do think some feedback is necessary. If we knew the rte_flow
capabilities in the beginning, we could simply disable flow offload for a
specific port that we know doesn't have offload ability. This would
completely avoid the repeated effort of trying to do hw offload.

[Finn] This seems to be a good idea.

	--yliu
On 9/27/17, 3:41 AM, "Finn Christensen" <fc@napatech.com> wrote:

[...]

But I do think some feedback is necessary. If we knew the rte_flow
capabilities in the beginning, we could simply disable flow offload for a
specific port that we know doesn't have offload ability. This would
completely avoid the repeated effort of trying to do hw offload.

[Finn] This seems to be a good idea.

[Darrell] Regarding the general question of capability checking, it is fine
to have this support in general, and we have already identified some cases
where it would be best to use it if we can (the question mostly comes down to
support by RTE and the drivers).

Probing is different; we usually use the term for some trial-and-error
checking during initialization, where we try to configure something, see if
it works, and then mark the associated capability as enabled or disabled. We
could also use a different kind of probing here, in a dynamic fashion, where
we try to do HWOL and, if it fails X times, don't try again unless we detect
that the port has been reconfigured, for example, in case the capability
check is not possible or not implemented yet.

	--yliu
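[Editor's sketch] The dynamic probing idea described above (try HWOL, give up
after X consecutive failures, start over when the port is reconfigured) could
look roughly like the following; the names and the threshold are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define HWOL_MAX_FAILURES 3      /* "X" from the discussion; illustrative */

struct port_hwol {
    unsigned int failures;       /* consecutive failed offload attempts */
    bool disabled;               /* gave up until the port reconfigures */
};

static bool
hwol_should_try(const struct port_hwol *p)
{
    return !p->disabled;
}

static void
hwol_record_result(struct port_hwol *p, bool ok)
{
    if (ok) {
        p->failures = 0;
    } else if (++p->failures >= HWOL_MAX_FAILURES) {
        p->disabled = true;      /* stop wasting cycles on this port */
    }
}

static void
hwol_port_reconfigured(struct port_hwol *p)
{
    /* A reconfigured port may have gained the capability: start over. */
    p->failures = 0;
    p->disabled = false;
}
```

This keeps the datapath free of per-packet capability queries and works even
where an explicit capability API is not implemented yet.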
Regards _Sugesh > -----Original Message----- > From: Darrell Ball [mailto:dball@vmware.com] > Sent: Wednesday, September 27, 2017 7:55 PM > To: Finn Christensen <fc@napatech.com>; Yuanhan Liu <yliu@fridaylinux.org> > Cc: dev@openvswitch.org; Chandran, Sugesh <sugesh.chandran@intel.com>; > Simon Horman <simon.horman@netronome.com> > Subject: Re: [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow > > > > On 9/27/17, 3:41 AM, "Finn Christensen" <fc@napatech.com> wrote: > > -----Original Message----- > From: Yuanhan Liu [mailto:yliu@fridaylinux.org] > Sent: 27. september 2017 11:47 > To: Finn Christensen <fc@napatech.com> > Cc: Darrell Ball <dball@vmware.com>; dev@openvswitch.org; Chandran > Sugesh <sugesh.chandran@intel.com>; Simon Horman > <simon.horman@netronome.com> > Subject: Re: [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow > > On Wed, Sep 27, 2017 at 09:12:49AM +0000, Finn Christensen wrote: > > > > [Darrell] It looks fine; of course, if we could drop dependencies on cap > > probe, it would be ideal (. > > > > > > [Finn] > > From a Napatech point of view, we would definitely want to use the > > RTE_FLOW_ACTION_TYPE_RSS if it were implement. It definitely makes > > good sense to me. I have 2 reasons for this: > > 1. It would fit nicely into the way Napatech does flow programming in > > nic 2. We would be fully RSS controlled through OVS Furthermore, > > Future RSS splitting on a per megaflow basis will be possible. > > IMO the "pure_mark_cap_probed" function is a temp work-around to > make it work. > > The RSS seems to be a solution to the queue problem. > > Finn, that's really good to know! I do agree without this probe, it makes the > code simpler and cleaner. > > Few questions though. Have Napatech already implemented rss action? If > not, what's the plan? > > [Finn] > Our nic handles rss on a per flow basis, but we have not yet used rte rss action > for OVS. In OVS we simply handles RSS config outside it. 
> The full control of RSS through OVS is better, though.
>
> And are you okay with the following code?
>
>     add_mark_action();
>     add_rss_action_unconditionally();
>
>     flow = rte_create_flow(...);
>     if (!flow)
>         return -1;
>
> That is, no probes, no feedback. If it fails, it just fails (the flow offload is then simply skipped).
>
> [Finn]
> Yes, we can easily make this work. Good suggestion!
>
> But I do think some feedback is necessary. If we know the rte_flow cap in the beginning, we could simply disable flow offload for a specific port if we know it doesn't have offload ability. This would completely avoid the repeated effort of trying to do hw offload.
>
> [Finn]
> This seems to be a good idea.
>
> [Darrell] Regarding the general question of capability checking, it is fine to have this support in general, and we have already identified some cases where it would be best to use it if we can (the question mostly comes down to support by RTE and drivers).
>
> Probing is different: we usually use the term for trial-and-error checking during initialization, where we try to configure something, see if it works, and then mark the associated capability as enabled or disabled. We could also use a different kind of probing here in a dynamic fashion, where we try to do HWOL and, if it fails X times, don't try again unless we detect that the port has been reconfigured, for example, in case the capability check is not possible or not implemented yet.

[Sugesh] I agree with Darrell here, and we are also looking at implementing capability APIs to expose the device feature sets. I will check with our DPDK team and will post the update.

> --yliu
-----Original Message-----
From: Chandran, Sugesh [mailto:sugesh.chandran@intel.com]
Sent: 2. oktober 2017 18:22
To: Darrell Ball <dball@vmware.com>; Finn Christensen <fc@napatech.com>; Yuanhan Liu <yliu@fridaylinux.org>
Cc: dev@openvswitch.org; Simon Horman <simon.horman@netronome.com>
Subject: RE: [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

Regards
_Sugesh

> [snip]
>
> [Darrell] We could also use a different kind of probing here in a dynamic fashion, where we try to do HWOL and, if it fails X times, don't try again unless we detect the port has been reconfigured, for example, in case the capability check is not possible or not implemented yet.

[Sugesh] I agree with Darrell here, and we are also looking at implementing capability APIs to expose the device feature sets. I will check with our DPDK team and will post the update.
[Finn]
I also agree here; however, the "X times failed, then mark the port as HWOL-disabled" rule should only apply initially, while no offloads have yet succeeded.

> --yliu
On Wed, Sep 27, 2017 at 10:09:20AM +0800, Yuanhan Liu wrote:
> On Wed, Sep 27, 2017 at 09:24:54AM +0800, Yuanhan Liu wrote:
> > > The support needed would be in some drivers and seems reasonably doable.
> > > Moreover, this was discussed in the last dpdk meeting and the support was
> > > indicated as existing?, although I only verified the MLX code, myself.
> >
> > From the code point of view, yes, the code was there months ago.
> >
> > > I had seen the MLX code supporting the _RSS action and there are some checks for
> > > supported cases; when you say "it didn't work", what was the issue?
> >
> > I actually met the error months ago; I even debugged it. IIRC, the error came from libibverbs, when trying to create the hw flow. I think I need to double-confirm it with our colleague who developed this feature.
>
> Hmm, I think I was wrong; apparently, I did something stupid in the last try. I double-confirmed it just now, and at least the flow creation works now. I just need to take some time to integrate it into OVS-DPDK to see how it goes. I will get back to you when I have done that. Currently, I am occupied by something else.

Here is a short update I should have given last week (apologies for being late). The RSS action should work, and I have also integrated it into ovs-dpdk. I intended to send v4 last week, but then I noticed a fatal bug (regarding MQ). I was still heavily occupied by something else, so I didn't have time to fix it.

I have tried to figure out a solution in my spare time, though, and I might have found one. I just don't know the results yet without really trying it. I will post an update when I have made some progress. Please also note that it's very likely I won't have time for this before the end of this month :/

--yliu
I see that this series appears to have stalled. Should we expect a new version sometime soon? Do you need more reviews on the current version?
On Tue, Oct 31, 2017 at 02:19:20PM -0700, Ben Pfaff wrote:
> I see that this series appears to have stalled. Should we expect a new
> version sometime soon? Do you need more reviews on the current version?

My bad. As I stated in another email, I meant to send v4 earlier, but I found a bug. Meanwhile, I have been heavily occupied by something else, so I haven't had time to shape it.

I should likely be able to start working on v4 in the middle of this month.

--yliu
On Wed, Nov 01, 2017 at 11:24:30PM +0800, Yuanhan Liu wrote:
> On Tue, Oct 31, 2017 at 02:19:20PM -0700, Ben Pfaff wrote:
> > I see that this series appears to have stalled. Should we expect a new
> > version sometime soon? Do you need more reviews on the current version?
>
> My bad. As I stated in another email, I meant to send v4 earlier, but
> I found a bug. Meanwhile, I have been heavily occupied by something else,
> so I haven't had time to shape it.
>
> I should likely be able to start working on v4 in the middle of this month.

There's no hurry from my perspective; my main goal was to make sure that I/we hadn't dropped the ball. I'll look forward to the next version.