[net-next,1/5] ice: add the virtchnl handler for AdminQ command

Message ID 20200713174320.3982049-2-anthony.l.nguyen@intel.com
State Rejected
Delegated to: David Miller
Series 100GbE Intel Wired LAN Driver Updates 2020-07-13

Commit Message

Tony Nguyen July 13, 2020, 5:43 p.m. UTC
From: Haiyue Wang <haiyue.wang@intel.com>

The DCF (Device Config Function) is a designated trusted VF (always VF ID 0,
a single entity per PF port) that can act as a sole controlling entity to
exercise advanced functionality, such as adding switch rules, on behalf of
the rest of the VFs.

To support this, the DCF VF is permitted to send a limited set of AdminQ
commands to the PF through the virtual channel (mailbox); the PF driver then
forwards these commands to the firmware and returns the response to the VF,
again through the virtual channel.

An AdminQ command from the DCF is split into two parts: the AdminQ
descriptor and, when the descriptor has the BUF flag set, the command
buffer. The two parts must be sent in that order so that the PF can handle
them correctly.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Tested-by: Nannan Lu <nannan.lu@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/Makefile       |   2 +-
 drivers/net/ethernet/intel/ice/ice.h          |   2 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   6 +
 drivers/net/ethernet/intel/ice/ice_dcf.c      |  49 +++++++
 drivers/net/ethernet/intel/ice/ice_dcf.h      |  20 +++
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 130 ++++++++++++++++++
 .../net/ethernet/intel/ice/ice_virtchnl_pf.h  |   1 +
 include/linux/avf/virtchnl.h                  |  10 ++
 8 files changed, 219 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_dcf.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_dcf.h
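
As an illustration of the two-part ordering described above, here is a
minimal sketch of the VF (DCF) side only; send_virtchnl() is a hypothetical
stand-in for whatever mailbox-send primitive the DCF driver uses and is not
a real ice/iavf symbol:

#include <linux/avf/virtchnl.h>
#include "ice_adminq_cmd.h"	/* struct ice_aq_desc, ICE_AQ_FLAG_BUF */

static int dcf_send_aq_cmd(struct ice_aq_desc *desc, u8 *buf, u16 buf_size,
			   int (*send_virtchnl)(enum virtchnl_ops op,
						u8 *msg, u16 len))
{
	int err;

	/* part 1: the descriptor; ICE_AQ_FLAG_BUF tells the PF a buffer follows */
	err = send_virtchnl(VIRTCHNL_OP_DCF_CMD_DESC, (u8 *)desc,
			    sizeof(*desc));
	if (err)
		return err;

	/* part 2: the buffer, only when the descriptor announced one */
	if (buf_size && (desc->flags & cpu_to_le16(ICE_AQ_FLAG_BUF)))
		err = send_virtchnl(VIRTCHNL_OP_DCF_CMD_BUFF, buf, buf_size);

	return err;
}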

Comments

Jakub Kicinski July 13, 2020, 10:48 p.m. UTC | #1
On Mon, 13 Jul 2020 10:43:16 -0700 Tony Nguyen wrote:
> From: Haiyue Wang <haiyue.wang@intel.com>
> 
> The DCF (Device Config Function) is a named trust VF (always with ID 0,
> single entity per PF port) that can act as a sole controlling entity to
> exercise advance functionality such as adding switch rules for the rest
> of VFs.

But why? This looks like a bifurcated driver to me.

> To achieve this approach, this VF is permitted to send some basic AdminQ
> commands to the PF through virtual channel (mailbox), then the PF driver
> sends these commands to the firmware, and returns the response to the VF
> again through virtual channel.
> 
> The AdminQ command from DCF is split into two parts: one is the AdminQ
> descriptor, the other is the buffer (the descriptor has BUF flag set).
> These two parts should be sent in order, so that the PF can handle them
> correctly.
> 
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Tested-by: Nannan Lu <nannan.lu@intel.com>
> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Haiyue Wang July 14, 2020, 1:29 a.m. UTC | #2
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, July 14, 2020 06:49
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>
> Cc: davem@davemloft.net; Wang, Haiyue <haiyue.wang@intel.com>; netdev@vger.kernel.org;
> nhorman@redhat.com; sassmann@redhat.com; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; Lu, Nannan
> <nannan.lu@intel.com>; Bowers, AndrewX <andrewx.bowers@intel.com>
> Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 
> On Mon, 13 Jul 2020 10:43:16 -0700 Tony Nguyen wrote:
> > From: Haiyue Wang <haiyue.wang@intel.com>
> >
> > The DCF (Device Config Function) is a named trust VF (always with ID 0,
> > single entity per PF port) that can act as a sole controlling entity to
> > exercise advance functionality such as adding switch rules for the rest
> > of VFs.
> 
> But why? This looks like a bifurcated driver to me.
> 

Yes, it is bifurcated with respect to flow configuration. This expands the
Intel AVF virtual channel commands so that the VF can talk to the hardware
indirectly, under the control of the PF. The VF can then set up flow rules
for the other VFs. This enriches the current model, in which the PF's Flow
Director filters can only be configured for the PF itself via ethtool.

> > To achieve this approach, this VF is permitted to send some basic AdminQ
> > commands to the PF through virtual channel (mailbox), then the PF driver
> > sends these commands to the firmware, and returns the response to the VF
> > again through virtual channel.
> >
> > The AdminQ command from DCF is split into two parts: one is the AdminQ
> > descriptor, the other is the buffer (the descriptor has BUF flag set).
> > These two parts should be sent in order, so that the PF can handle them
> > correctly.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Tested-by: Nannan Lu <nannan.lu@intel.com>
> > Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
> > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jakub Kicinski July 14, 2020, 6:24 p.m. UTC | #3
On Tue, 14 Jul 2020 01:29:40 +0000 Wang, Haiyue wrote:
> > On Mon, 13 Jul 2020 10:43:16 -0700 Tony Nguyen wrote:  
> > > From: Haiyue Wang <haiyue.wang@intel.com>
> > >
> > > The DCF (Device Config Function) is a named trust VF (always with ID 0,
> > > single entity per PF port) that can act as a sole controlling entity to
> > > exercise advance functionality such as adding switch rules for the rest
> > > of VFs.  
> > 
> > But why? This looks like a bifurcated driver to me.
> 
> Yes, like bifurcated about flow control. This expands Intel AVF virtual channel
> commands, so that VF can talk to hardware indirectly, which is under control of
> PF. Then VF can set up the flow control for other VFs. This enrich current PF's
> Flow Director filter for PF itself only by ethtool. 

Could you say a little more about the application and motivation for
this?

We are talking about a single control domain here, correct?
Haiyue Wang July 15, 2020, 1:17 a.m. UTC | #4
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, July 15, 2020 02:24
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; davem@davemloft.net; netdev@vger.kernel.org;
> nhorman@redhat.com; sassmann@redhat.com; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; Lu, Nannan
> <nannan.lu@intel.com>; Bowers, AndrewX <andrewx.bowers@intel.com>
> Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 
> On Tue, 14 Jul 2020 01:29:40 +0000 Wang, Haiyue wrote:
> > > On Mon, 13 Jul 2020 10:43:16 -0700 Tony Nguyen wrote:
> > > > From: Haiyue Wang <haiyue.wang@intel.com>
> > > >
> > > > The DCF (Device Config Function) is a named trust VF (always with ID 0,
> > > > single entity per PF port) that can act as a sole controlling entity to
> > > > exercise advance functionality such as adding switch rules for the rest
> > > > of VFs.
> > >
> > > But why? This looks like a bifurcated driver to me.
> >
> > Yes, like bifurcated about flow control. This expands Intel AVF virtual channel
> > commands, so that VF can talk to hardware indirectly, which is under control of
> > PF. Then VF can set up the flow control for other VFs. This enrich current PF's
> > Flow Director filter for PF itself only by ethtool.
> 
> Could you say a little more about the application and motivation for
> this?
> 

Sure, I will try to describe the whole story.

> We are talking about a single control domain here, correct?

Correct.

As you know, with the help of the vfio-pci kernel module, we can write user
space drivers for PCI devices, as DPDK does. ;-)

We have
  1). user space iavf framework:
      http://git.dpdk.org/dpdk/tree/drivers/common/iavf

  2). user space iavf driver:
      http://git.dpdk.org/dpdk/tree/drivers/net/iavf

  3). user space ice driver with no SR-IOV support:
      http://git.dpdk.org/dpdk/tree/drivers/net/ice

Nowadays the concept of separating the control path from the data path is
popular, so we tried to design a software-defined control path using the above
software portfolio; the scenario is described in:
      http://doc.dpdk.org/guides/nics/ice.html, section 23.4.3 "Device Config Function (DCF)"

With this patch in the user space ice driver:
      http://git.dpdk.org/dpdk/commit/?id=c5dccda9f2ae6ecc716892c233a0dadc94e013da

and this patch set in the ice kernel driver, we can now promote a VF from iAVF
(what we call the data path) to DCF (the control path) for each PF device.
Jakub Kicinski July 15, 2020, 6:03 p.m. UTC | #5
On Wed, 15 Jul 2020 01:17:26 +0000 Wang, Haiyue wrote:
> > Could you say a little more about the application and motivation for
> > this?
> 
> Sure, I will try to describe the whole story.
> 
> > We are talking about a single control domain here, correct?  
> 
> Correct.

We have a long standing policy of not supporting or helping bifurcated
drivers.

I'm tossing this from patchwork.
Haiyue Wang July 15, 2020, 6:18 p.m. UTC | #6
> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On Behalf Of Jakub Kicinski
> Sent: Thursday, July 16, 2020 02:04
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; davem@davemloft.net; netdev@vger.kernel.org;
> nhorman@redhat.com; sassmann@redhat.com; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; Lu, Nannan
> <nannan.lu@intel.com>; Bowers, AndrewX <andrewx.bowers@intel.com>
> Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 
> On Wed, 15 Jul 2020 01:17:26 +0000 Wang, Haiyue wrote:
> > > Could you say a little more about the application and motivation for
> > > this?
> >
> > Sure, I will try to describe the whole story.
> >
> > > We are talking about a single control domain here, correct?
> >
> > Correct.
> 
> We have a long standing policy of not supporting or helping bifurcated
> drivers.

I searched for the concept of 'bifurcated', and I am not sure whether the below is correct:
 "Queue Splitting (Bifurcated Driver) is a design that allows for directing
  some traffic to user space, while keeping the remaining traffic in the
  traditional Linux networking stack."

What we did is some path finding on how to partition the hardware function
in the real world to flexibly meet customers' requirements.

> 
> I'm tossing this from patchwork.

Thanks for your time; I understand your concern from another point of view. We
tried to fix a real-world issue with our limited wisdom, and it is nice to open
source our design and get feedback from the netdev community.

Problems
- The user app expects to take advantage of a few SR-IOV PF capabilities
  (e.g. binary/ternary classification) in a raw manner
- It does not expect to take control of the entire SR-IOV PF device from
  user space

Motivation
- The single control domain is always managed by the kernel SR-IOV PF driver
- It allows the app to issue its access intent for raw capabilities via the
  virtchnl of a designated trusted VF
Anirudh Venkataramanan July 23, 2020, 12:37 a.m. UTC | #7
On Wed, 2020-07-15 at 11:03 -0700, Jakub Kicinski wrote:
> On Wed, 15 Jul 2020 01:17:26 +0000 Wang, Haiyue wrote:
> > > Could you say a little more about the application and motivation
> > > for
> > > this?
> > 
> > Sure, I will try to describe the whole story.
> > 
> > > We are talking about a single control domain here, correct?  
> > 
> > Correct.
> 
> We have a long standing policy of not supporting or helping
> bifurcated
> drivers.

Jakub,
 
From what I understand, a bifurcated driver carves out the host
interface netdev's queues (resources if you want to further generalize
this) and allows another entity to control/use them. If this is the
definition, then DCF is not bifurcating the driver.
 
DCF is on top of SR-IOV VFs and the device resources required for the
DCF VF are made available by the device through the PCI configuration
space during SR-IOV init. This part isn't anything different from the
usual SR-IOV VF instantiation. The DCF feature adds the ability for a
trusted VF to add flow rules. So this is really just an extension of
the VF-PF interface that allows advanced flow/switch rules programming.
Additionally, the driver also has a list of operations that the DCF VF
is allowed to execute.

Can you please clarify how you (and the community) define bifurcated
driver?

Thanks,
Ani
Jakub Kicinski July 23, 2020, 1:07 a.m. UTC | #8
On Thu, 23 Jul 2020 00:37:29 +0000 Venkataramanan, Anirudh wrote:
> Can you please clarify how you (and the community) define bifurcated
> driver?

No amount of clarification from me will change the fact that you need
this for DPDK.
Tom Herbert July 25, 2020, 4:39 p.m. UTC | #9
On Wed, Jul 22, 2020 at 6:07 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 23 Jul 2020 00:37:29 +0000 Venkataramanan, Anirudh wrote:
> > Can you please clarify how you (and the community) define bifurcated
> > driver?
>
> No amount of clarification from me will change the fact that you need
> this for DPDK.

Jakub,

The fundamental problem we have wrt DPDK is that they are not satisfied
with just bypassing the kernel datapath; they are endeavouring to bypass
the kernel control path as well with things like the RTE flow API.
The purpose of this patch set, AFAICT, is to keep the kernel in the
control plane path so that we can maintain one consistent management
view of device resources. The unpleasant alternative is that DPDK will
send control messages directly to the device thereby bypassing the
kernel control plane and thus resulting in two independent entities
managing the same device and forcing a bifurcated control plane in the
device (which of course results in a complete mess!).

Tom
Jakub Kicinski July 27, 2020, 11:04 p.m. UTC | #10
On Sat, 25 Jul 2020 09:39:07 -0700 Tom Herbert wrote:
> Jakub,
> 
> The fundamental problem we have wrt DPDK is that they are not just
> satisfied to just bypass the kernel datapath, they are endeavouring to
> bypass the kernel control path as well with things like RTE flow API.
> The purpose of this patch set, AFAICT, is to keep the kernel in the
> control plane path so that we can maintain one consistent management
> view of device resources. The unpleasant alternative is that DPDK will
> send control messages directly to the device thereby bypassing the
> kernel control plane and thus resulting in two independent entities
> managing the same device and forcing a bifurcated control plane in the
> device (which of course results in a complete mess!).

Sorry for a late reply.

I try to do my best to predict what effect the community pushing back
on merging features will have. It does appear that the unpleasant
alternative you mention is becoming more and more prevalent. I believe
this is a result of multiple factors - convenience of the single point
of control, backward/forward compat, the growing size of the driver SW
stack, relative SW vs Si development cost in a NIC project, software
distribution models..

My day to day experience working with NICs shows that FW has already
taken over high perf NICs, and I hate it. I'd take DPDK over closed
source FW any time of the day, but I will not fool myself into thinking
that expansion of FW control can be stopped by the kernel opening the
floodgates to anything the vendors want to throw at us. 

Ergo the lesser evil argument does not apply.


In this case, I'm guessing, Intel can reuse RTE flow -> AQ code written
to run on PFs on the special VF.

This community has selected switchdev + flower for programming flows.
I believe implementing flower offloads would solve your use case, and
at the same time be most beneficial to the netdev community.
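
For reference, the flower offload path mentioned here enters a driver roughly
as sketched below; this is a generic, hypothetical skeleton (the drv_* names
are illustrative), not existing ice code:

#include <linux/netdevice.h>
#include <net/flow_offload.h>
#include <net/pkt_cls.h>

static LIST_HEAD(drv_block_cb_list);

static int drv_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
				 void *cb_priv)
{
	struct flow_cls_offload *f = type_data;

	if (type != TC_SETUP_CLSFLOWER)
		return -EOPNOTSUPP;

	switch (f->command) {
	case FLOW_CLS_REPLACE:
		/* parse flow_cls_offload_flow_rule(f) and program the
		 * corresponding switch/AQ rule here
		 */
		return 0;
	case FLOW_CLS_DESTROY:
		/* remove the previously programmed rule */
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}

static int drv_setup_tc(struct net_device *netdev, enum tc_setup_type type,
			void *type_data)
{
	if (type != TC_SETUP_BLOCK)
		return -EOPNOTSUPP;

	return flow_block_cb_setup_simple(type_data, &drv_block_cb_list,
					  drv_setup_tc_block_cb,
					  netdev_priv(netdev),
					  netdev_priv(netdev), true);
}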
Haiyue Wang Aug. 3, 2020, 10:39 a.m. UTC | #11
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, July 28, 2020 07:04
> To: Tom Herbert <tom@herbertland.com>
> Cc: Venkataramanan, Anirudh <anirudh.venkataramanan@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>;
> davem@davemloft.net; nhorman@redhat.com; sassmann@redhat.com; Bowers, AndrewX
> <andrewx.bowers@intel.com>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Lu, Nannan <nannan.lu@intel.com>
> Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 


> In this case, I'm guessing, Intel can reuse RTE flow -> AQ code written
> to run on PFs on the special VF.
> 
> This community has selected switchdev + flower for programming flows.
> I believe implementing flower offloads would solve your use case, and
> at the same time be most beneficial to the netdev community.

Jakub,

Thanks, I dug into switchdev; it is a kernel software bridge model for hardware
offload, and each port is registered with register_netdev(). So this solution
does not seem suitable for the current case, where the VF can be assigned to VMs.

BR,
Haiyue
Jakub Kicinski Aug. 3, 2020, 8:45 p.m. UTC | #12
On Mon, 3 Aug 2020 10:39:52 +0000 Wang, Haiyue wrote:
> > In this case, I'm guessing, Intel can reuse RTE flow -> AQ code written
> > to run on PFs on the special VF.
> > 
> > This community has selected switchdev + flower for programming flows.
> > I believe implementing flower offloads would solve your use case, and
> > at the same time be most beneficial to the netdev community.  
> 
> Jakub,
> 
> Thanks, I deep into the switchdev, it is kernel software bridge for hardware
> offload, and each port is registered with register_netdev. So this solution
> is not suitable for current case: VF can be assigned to VMs.

You may be missing the concept of a representor.

Sridhar from Intel was investigating this, I believe, at some point.
Perhaps sync with him?
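
For reference, a representor is a small host-side netdev registered per VF so
that switch rules (e.g. tc flower) can be attached from the host even while the
VF itself is assigned to a VM. A rough, hypothetical sketch, with illustrative
names only and not existing ice code:

#include <linux/etherdevice.h>
#include <linux/netdevice.h>

struct vf_repr {
	int vf_id;
};

static int repr_get_phys_port_name(struct net_device *netdev, char *buf,
				   size_t len)
{
	struct vf_repr *repr = netdev_priv(netdev);

	if (snprintf(buf, len, "pf0vf%d", repr->vf_id) >= len)
		return -EOPNOTSUPP;
	return 0;
}

static const struct net_device_ops repr_netdev_ops = {
	.ndo_get_phys_port_name	= repr_get_phys_port_name,
	/* .ndo_setup_tc would hook flower offload as in the sketch above */
};

static struct net_device *create_vf_repr(int vf_id)
{
	struct net_device *netdev = alloc_etherdev(sizeof(struct vf_repr));
	struct vf_repr *repr;

	if (!netdev)
		return NULL;

	repr = netdev_priv(netdev);
	repr->vf_id = vf_id;
	netdev->netdev_ops = &repr_netdev_ops;
	eth_hw_addr_random(netdev);

	if (register_netdev(netdev)) {
		free_netdev(netdev);
		return NULL;
	}

	return netdev;
}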
Haiyue Wang Aug. 5, 2020, 1:06 a.m. UTC | #13
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Tuesday, August 4, 2020 04:46
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: Tom Herbert <tom@herbertland.com>; Venkataramanan, Anirudh <anirudh.venkataramanan@intel.com>;
> davem@davemloft.net; nhorman@redhat.com; sassmann@redhat.com; Bowers, AndrewX
> <andrewx.bowers@intel.com>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Lu, Nannan <nannan.lu@intel.com>; Liang, Cunming
> <cunming.liang@intel.com>
> Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 
> On Mon, 3 Aug 2020 10:39:52 +0000 Wang, Haiyue wrote:
> > > In this case, I'm guessing, Intel can reuse RTE flow -> AQ code written
> > > to run on PFs on the special VF.
> > >
> > > This community has selected switchdev + flower for programming flows.
> > > I believe implementing flower offloads would solve your use case, and
> > > at the same time be most beneficial to the netdev community.
> >
> > Jakub,
> >
> > Thanks, I deep into the switchdev, it is kernel software bridge for hardware
> > offload, and each port is registered with register_netdev. So this solution
> > is not suitable for current case: VF can be assigned to VMs.
> 
> You may be missing the concept of a representor.
> 

I found the concept, thanks, missed it!
Liang, Cunming Aug. 26, 2020, 2:53 a.m. UTC | #15
> -----Original Message-----
> From: Wang, Haiyue <haiyue.wang@intel.com>
> Sent: Wednesday, August 5, 2020 9:06 AM
> To: Jakub Kicinski <kuba@kernel.org>
> Cc: Tom Herbert <tom@herbertland.com>; Venkataramanan, Anirudh
> <anirudh.venkataramanan@intel.com>; davem@davemloft.net;
> nhorman@redhat.com; sassmann@redhat.com; Bowers, AndrewX
> <andrewx.bowers@intel.com>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>;
> netdev@vger.kernel.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Lu,
> Nannan <nannan.lu@intel.com>; Liang, Cunming <cunming.liang@intel.com>
> Subject: RE: [net-next 1/5] ice: add the virtchnl handler for AdminQ command
> 
> > -----Original Message-----
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Tuesday, August 4, 2020 04:46
> > To: Wang, Haiyue <haiyue.wang@intel.com>
> > Cc: Tom Herbert <tom@herbertland.com>; Venkataramanan, Anirudh
> > <anirudh.venkataramanan@intel.com>;
> > davem@davemloft.net; nhorman@redhat.com; sassmann@redhat.com;
> Bowers,
> > AndrewX <andrewx.bowers@intel.com>; Kirsher, Jeffrey T
> > <jeffrey.t.kirsher@intel.com>; netdev@vger.kernel.org; Nguyen, Anthony
> > L <anthony.l.nguyen@intel.com>; Lu, Nannan <nannan.lu@intel.com>;
> > Liang, Cunming <cunming.liang@intel.com>
> > Subject: Re: [net-next 1/5] ice: add the virtchnl handler for AdminQ
> > command
> >
> > On Mon, 3 Aug 2020 10:39:52 +0000 Wang, Haiyue wrote:
> > > > In this case, I'm guessing, Intel can reuse RTE flow -> AQ code
> > > > written to run on PFs on the special VF.
> > > >
> > > > This community has selected switchdev + flower for programming flows.
> > > > I believe implementing flower offloads would solve your use case,
> > > > and at the same time be most beneficial to the netdev community.

Jakub, thanks for the feedback. We proposed the previous solution in our eagerness to satisfy customers who are running mature host kernels, validated for their data centers, while still enabling rapid adaptation to new network control planes.

Revisiting the real problem we were facing: basically we are looking for a rapidly iterating control plane that is independent of the mature, already-deployed host kernel. The kernel is definitely the most suitable path for a control plane, and we need to enhance it to add the missing piece required for this solution. The best way to enable such use cases is to deploy the control plane on the latest kernel running in a virtual machine. We shared some thoughts at the netdev 0x14 workshop; the slides are at https://github.com/seaturn/switchdev-trust-vf/blob/master/netconf-workshop.pdf.

As a follow-up, we'll continue to work on the tc-generic proposal and look for improvements in rule programming rate. As an independent effort to enhance tc-generic/switchdev on a trusted VF, delegating device-specific capabilities (e.g. the eswitch) to an assignable trusted VF brings all the benefits of a separate kernel that can take up up-to-date features at the pace of the applications, while keeping the host stack clear of any connectivity (e.g. stable access) issues.

We will be happy to answer any queries, and thank you for guiding us onto the right path.

> > >
> > > Jakub,
> > >
> > > Thanks, I deep into the switchdev, it is kernel software bridge for
> > > hardware offload, and each port is registered with register_netdev.
> > > So this solution is not suitable for current case: VF can be assigned to VMs.
> >
> > You may be missing the concept of a representor.
> >
> 
> I found the concept, thanks, missed it!

Patch

diff --git a/drivers/net/ethernet/intel/ice/Makefile b/drivers/net/ethernet/intel/ice/Makefile
index 980bbcc64b4b..eb83b5fe11c3 100644
--- a/drivers/net/ethernet/intel/ice/Makefile
+++ b/drivers/net/ethernet/intel/ice/Makefile
@@ -24,7 +24,7 @@  ice-y := ice_main.o	\
 	 ice_flow.o	\
 	 ice_devlink.o	\
 	 ice_ethtool.o
-ice-$(CONFIG_PCI_IOV) += ice_virtchnl_pf.o ice_sriov.o
+ice-$(CONFIG_PCI_IOV) += ice_virtchnl_pf.o ice_sriov.o ice_dcf.o
 ice-$(CONFIG_DCB) += ice_dcb.o ice_dcb_nl.o ice_dcb_lib.o
 ice-$(CONFIG_RFS_ACCEL) += ice_arfs.o
 ice-$(CONFIG_XDP_SOCKETS) += ice_xsk.o
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 7486d010a619..a6e419a3f547 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -435,6 +435,8 @@  struct ice_pf {
 	u32 tx_timeout_recovery_level;
 	char int_name[ICE_INT_NAME_STR_LEN];
 	u32 sw_int_count;
+
+	struct ice_dcf dcf;
 };
 
 struct ice_netdev_priv {
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 99c39249613a..89b91ccaf533 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1830,6 +1830,12 @@  enum ice_adminq_opc {
 	ice_aqc_opc_update_vsi				= 0x0211,
 	ice_aqc_opc_free_vsi				= 0x0213,
 
+	/* recipe commands */
+	ice_aqc_opc_add_recipe				= 0x0290,
+	ice_aqc_opc_recipe_to_profile			= 0x0291,
+	ice_aqc_opc_get_recipe				= 0x0292,
+	ice_aqc_opc_get_recipe_to_profile		= 0x0293,
+
 	/* switch rules population commands */
 	ice_aqc_opc_add_sw_rules			= 0x02A0,
 	ice_aqc_opc_update_sw_rules			= 0x02A1,
diff --git a/drivers/net/ethernet/intel/ice/ice_dcf.c b/drivers/net/ethernet/intel/ice/ice_dcf.c
new file mode 100644
index 000000000000..cbe60a0cb2d2
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_dcf.c
@@ -0,0 +1,49 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2018-2020, Intel Corporation. */
+
+#include "ice.h"
+
+static const enum ice_adminq_opc aqc_permitted_tbl[] = {
+	/* Generic Firmware Admin commands */
+	ice_aqc_opc_get_ver,
+	ice_aqc_opc_req_res,
+	ice_aqc_opc_release_res,
+	ice_aqc_opc_list_func_caps,
+	ice_aqc_opc_list_dev_caps,
+
+	/* Package Configuration Admin Commands */
+	ice_aqc_opc_update_pkg,
+	ice_aqc_opc_get_pkg_info_list,
+
+	/* PHY commands */
+	ice_aqc_opc_get_phy_caps,
+	ice_aqc_opc_get_link_status,
+
+	/* Switch Block */
+	ice_aqc_opc_get_sw_cfg,
+	ice_aqc_opc_alloc_res,
+	ice_aqc_opc_free_res,
+	ice_aqc_opc_add_recipe,
+	ice_aqc_opc_recipe_to_profile,
+	ice_aqc_opc_get_recipe,
+	ice_aqc_opc_get_recipe_to_profile,
+	ice_aqc_opc_add_sw_rules,
+	ice_aqc_opc_update_sw_rules,
+	ice_aqc_opc_remove_sw_rules,
+};
+
+/**
+ * ice_dcf_aq_cmd_permitted - validate the AdminQ command permitted or not
+ * @desc: descriptor describing the command
+ */
+bool ice_dcf_aq_cmd_permitted(struct ice_aq_desc *desc)
+{
+	u16 opc = le16_to_cpu(desc->opcode);
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(aqc_permitted_tbl); i++)
+		if (opc == aqc_permitted_tbl[i])
+			return true;
+
+	return false;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_dcf.h b/drivers/net/ethernet/intel/ice/ice_dcf.h
new file mode 100644
index 000000000000..9edb2d5d9d8f
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_dcf.h
@@ -0,0 +1,20 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2018-2020, Intel Corporation. */
+
+#ifndef _ICE_DCF_H_
+#define _ICE_DCF_H_
+
+struct ice_dcf {
+	/* Handle the AdminQ command between the DCF (Device Config Function)
+	 * and the firmware.
+	 */
+#define ICE_DCF_AQ_DESC_TIMEOUT	(HZ / 10)
+	struct ice_aq_desc aq_desc;
+	u8 aq_desc_received;
+	unsigned long aq_desc_expires;
+};
+
+#ifdef CONFIG_PCI_IOV
+bool ice_dcf_aq_cmd_permitted(struct ice_aq_desc *desc);
+#endif /* CONFIG_PCI_IOV */
+#endif /* _ICE_DCF_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 16a2f2526ccc..0851944b8fd4 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -3652,6 +3652,130 @@  static int ice_vf_init_vlan_stripping(struct ice_vf *vf)
 		return ice_vsi_manage_vlan_stripping(vsi, false);
 }
 
+/**
+ * ice_dcf_handle_aq_cmd - handle the AdminQ command from DCF to FW
+ * @vf: pointer to the VF info
+ * @aq_desc: the AdminQ command descriptor
+ * @aq_buf: the AdminQ command buffer if aq_buf_size is non-zero
+ * @aq_buf_size: the AdminQ command buffer size
+ *
+ * The VF splits the AdminQ command into two parts: one is the descriptor of
+ * AdminQ command, the other is the buffer of AdminQ command (the descriptor
+ * has BUF flag set). When both of them are received by PF, this function will
+ * forward them to firmware once to get the AdminQ's response. And also, the
+ * filled descriptor and buffer of the response will be sent back to VF one by
+ * one through the virtchnl.
+ */
+static int
+ice_dcf_handle_aq_cmd(struct ice_vf *vf, struct ice_aq_desc *aq_desc,
+		      u8 *aq_buf, u16 aq_buf_size)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct ice_pf *pf = vf->pf;
+	enum virtchnl_ops v_op;
+	enum ice_status aq_ret;
+	u16 v_msg_len = 0;
+	u8 *v_msg = NULL;
+	int ret;
+
+	pf->dcf.aq_desc_received = false;
+
+	if ((aq_buf && !aq_buf_size) || (!aq_buf && aq_buf_size))
+		return -EINVAL;
+
+	aq_ret = ice_aq_send_cmd(&pf->hw, aq_desc, aq_buf, aq_buf_size, NULL);
+	/* It needs to send back the AQ response message if ICE_ERR_AQ_ERROR
+	 * returns, some AdminQ handlers will use the error code filled by FW
+	 * to do exception handling.
+	 */
+	if (aq_ret && aq_ret != ICE_ERR_AQ_ERROR) {
+		v_ret = VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR;
+		v_op = VIRTCHNL_OP_DCF_CMD_DESC;
+		goto err;
+	}
+
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DCF_CMD_DESC, v_ret,
+				    (u8 *)aq_desc, sizeof(*aq_desc));
+	/* Bail out so we don't send the VIRTCHNL_OP_DCF_CMD_BUFF message
+	 * below if failure happens or no AdminQ command buffer.
+	 */
+	if (ret || !aq_buf_size)
+		return ret;
+
+	v_op = VIRTCHNL_OP_DCF_CMD_BUFF;
+	v_msg_len = le16_to_cpu(aq_desc->datalen);
+
+	/* buffer is not updated if data length exceeds buffer size */
+	if (v_msg_len > aq_buf_size)
+		v_msg_len = 0;
+	else if (v_msg_len)
+		v_msg = aq_buf;
+
+	/* send the response back to the VF */
+err:
+	return ice_vc_send_msg_to_vf(vf, v_op, v_ret, v_msg, v_msg_len);
+}
+
+/**
+ * ice_vc_dcf_cmd_desc_msg - handle the DCF AdminQ command descriptor
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command descriptor
+ * @len: length of the message
+ */
+static int ice_vc_dcf_cmd_desc_msg(struct ice_vf *vf, u8 *msg, u16 len)
+{
+	struct ice_aq_desc *aq_desc = (struct ice_aq_desc *)msg;
+	struct ice_pf *pf = vf->pf;
+
+	if (len != sizeof(*aq_desc) || !ice_dcf_aq_cmd_permitted(aq_desc)) {
+		/* In case to avoid the VIRTCHNL_OP_DCF_CMD_DESC message with
+		 * the ICE_AQ_FLAG_BUF set followed by another bad message
+		 * VIRTCHNL_OP_DCF_CMD_DESC.
+		 */
+		pf->dcf.aq_desc_received = false;
+		goto err;
+	}
+
+	/* The AdminQ descriptor needs to be stored for use when the followed
+	 * VIRTCHNL_OP_DCF_CMD_BUFF is received.
+	 */
+	if (aq_desc->flags & cpu_to_le16(ICE_AQ_FLAG_BUF)) {
+		pf->dcf.aq_desc = *aq_desc;
+		pf->dcf.aq_desc_received = true;
+		pf->dcf.aq_desc_expires = jiffies + ICE_DCF_AQ_DESC_TIMEOUT;
+		return 0;
+	}
+
+	return ice_dcf_handle_aq_cmd(vf, aq_desc, NULL, 0);
+
+	/* send the response back to the VF */
+err:
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DCF_CMD_DESC,
+				     VIRTCHNL_STATUS_ERR_PARAM, NULL, 0);
+}
+
+/**
+ * ice_vc_dcf_cmd_buff_msg - handle the DCF AdminQ command buffer
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer which holds the command buffer
+ * @len: length of the message
+ */
+static int ice_vc_dcf_cmd_buff_msg(struct ice_vf *vf, u8 *msg, u16 len)
+{
+	struct ice_pf *pf = vf->pf;
+
+	if (!len || !pf->dcf.aq_desc_received ||
+	    time_is_before_jiffies(pf->dcf.aq_desc_expires))
+		goto err;
+
+	return ice_dcf_handle_aq_cmd(vf, &pf->dcf.aq_desc, msg, len);
+
+	/* send the response back to the VF */
+err:
+	return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DCF_CMD_BUFF,
+				     VIRTCHNL_STATUS_ERR_PARAM, NULL, 0);
+}
+
 /**
  * ice_vc_process_vf_msg - Process request from VF
  * @pf: pointer to the PF structure
@@ -3762,6 +3886,12 @@  void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event)
 	case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING:
 		err = ice_vc_dis_vlan_stripping(vf);
 		break;
+	case VIRTCHNL_OP_DCF_CMD_DESC:
+		err = ice_vc_dcf_cmd_desc_msg(vf, msg, msglen);
+		break;
+	case VIRTCHNL_OP_DCF_CMD_BUFF:
+		err = ice_vc_dcf_cmd_buff_msg(vf, msg, msglen);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index 67aa9110fdd1..4a257415f6a5 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -4,6 +4,7 @@ 
 #ifndef _ICE_VIRTCHNL_PF_H_
 #define _ICE_VIRTCHNL_PF_H_
 #include "ice.h"
+#include "ice_dcf.h"
 
 /* Restrict number of MAC Addr and VLAN that non-trusted VF can programmed */
 #define ICE_MAX_VLAN_PER_VF		8
diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index 40bad71865ea..cd627a217b1c 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -136,6 +136,9 @@  enum virtchnl_ops {
 	VIRTCHNL_OP_DISABLE_CHANNELS = 31,
 	VIRTCHNL_OP_ADD_CLOUD_FILTER = 32,
 	VIRTCHNL_OP_DEL_CLOUD_FILTER = 33,
+	/* opcode 34, 35, 36, 37 and 38 are reserved */
+	VIRTCHNL_OP_DCF_CMD_DESC = 39,
+	VIRTCHNL_OP_DCF_CMD_BUFF = 40,
 };
 
 /* These macros are used to generate compilation errors if a structure/union
@@ -828,6 +831,13 @@  virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 	case VIRTCHNL_OP_DEL_CLOUD_FILTER:
 		valid_len = sizeof(struct virtchnl_filter);
 		break;
+	case VIRTCHNL_OP_DCF_CMD_DESC:
+	case VIRTCHNL_OP_DCF_CMD_BUFF:
+		/* These two opcodes are specific to handle the AdminQ command,
+		 * so the validation needs to be done in PF's context.
+		 */
+		valid_len = msglen;
+		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
 	case VIRTCHNL_OP_UNKNOWN: