Message ID | 20170523114404.20387-5-saeedm@mellanox.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, May 23, 2017 at 02:44:02PM +0300, Saeed Mahameed wrote: > From: Ilan Tayari <ilant@mellanox.com> > > Mellanox Innova is a NIC with ConnectX and an FPGA on the same > board. The FPGA is a bump-on-the-wire and thus affects operation of > the mlx5_core driver on the ConnectX ASIC. > > Add basic support for Innova in mlx5_core. > > This allows using the Innova card as a regular NIC, by detecting > the FPGA capability bit, and verifying its load state before > initializing ConnectX interfaces. That doesn't sound like cx4/cx5 hw that mlx5 driver suppose to support. > Also detect FPGA fatal runtime failures and enter error state if > they ever happen. > > All new FPGA-related logic is placed in its own subdirectory 'fpga', > which may be built by selecting CONFIG_MLX5_FPGA. > This prepares for further support of various Innova features in later > patchsets. > Additional details about hardware architecture will be provided as > more features get submitted. > > Signed-off-by: Ilan Tayari <ilant@mellanox.com> > Reviewed-by: Boris Pismenny <borisp@mellanox.com> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> > --- > MAINTAINERS | 10 + > drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 10 + > drivers/net/ethernet/mellanox/mlx5/core/Makefile | 3 + > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 11 ++ > drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.c | 64 +++++++ > drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.h | 59 ++++++ > .../net/ethernet/mellanox/mlx5/core/fpga/core.c | 202 +++++++++++++++++++++ > .../net/ethernet/mellanox/mlx5/core/fpga/core.h | 99 ++++++++++ > drivers/net/ethernet/mellanox/mlx5/core/main.c | 19 +- > include/linux/mlx5/device.h | 6 + > include/linux/mlx5/driver.h | 5 + > include/linux/mlx5/mlx5_ifc.h | 11 +- > include/linux/mlx5/mlx5_ifc_fpga.h | 144 +++++++++++++++ > 13 files changed, 640 insertions(+), 3 deletions(-) Can you put it into different driver? Dumping everything into by far the biggest nic driver already is already huge headache in terms on maintainability, debugging and back ports. Look at how intel splits their drivers. ixgb, ixgbe, ixgbevf are different drivers thought they have a lot in common. On one side it's a bit of copy paste, but on the other side it makes drivers much easier to develop and maintain independently. ConnectX-6 code and any future hw support doesn't belong to mlx5 driver at all.
> -----Original Message----- > From: Alexei Starovoitov [mailto:alexei.starovoitov@gmail.com] > > On Tue, May 23, 2017 at 02:44:02PM +0300, Saeed Mahameed wrote: > > From: Ilan Tayari <ilant@mellanox.com> > > > > Mellanox Innova is a NIC with ConnectX and an FPGA on the same > > board. The FPGA is a bump-on-the-wire and thus affects operation of > > the mlx5_core driver on the ConnectX ASIC. > > > > Add basic support for Innova in mlx5_core. > > > > This allows using the Innova card as a regular NIC, by detecting > > the FPGA capability bit, and verifying its load state before > > initializing ConnectX interfaces. > > That doesn't sound like cx4/cx5 hw that mlx5 driver suppose to support. Hi Alexei, Thanks for your feedback. Let me address your concerns. I didn't state it, but the current iterations of Innova cards include the ConnectX-4LX chip, which is exactly what mlx5 driver supports. The PCI interface is *only* through the ConnectX chip with some new Firmware commands. This patch doesn't register any new PCI device, so it does not make sense to create a separate driver. > > > Also detect FPGA fatal runtime failures and enter error state if > > they ever happen. > > > > All new FPGA-related logic is placed in its own subdirectory 'fpga', > > which may be built by selecting CONFIG_MLX5_FPGA. > > This prepares for further support of various Innova features in later > > patchsets. > > Additional details about hardware architecture will be provided as > > more features get submitted. > > > > Signed-off-by: Ilan Tayari <ilant@mellanox.com> > > Reviewed-by: Boris Pismenny <borisp@mellanox.com> > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> > > --- > > MAINTAINERS | 10 + > > drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 10 + > > drivers/net/ethernet/mellanox/mlx5/core/Makefile | 3 + > > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 11 ++ > > drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.c | 64 +++++++ > > drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.h | 59 ++++++ > > .../net/ethernet/mellanox/mlx5/core/fpga/core.c | 202 > +++++++++++++++++++++ > > .../net/ethernet/mellanox/mlx5/core/fpga/core.h | 99 ++++++++++ > > drivers/net/ethernet/mellanox/mlx5/core/main.c | 19 +- > > include/linux/mlx5/device.h | 6 + > > include/linux/mlx5/driver.h | 5 + > > include/linux/mlx5/mlx5_ifc.h | 11 +- > > include/linux/mlx5/mlx5_ifc_fpga.h | 144 > +++++++++++++++ > > 13 files changed, 640 insertions(+), 3 deletions(-) > > Can you put it into different driver? Dumping everything into by far > the biggest nic driver already is already huge headache in terms on > maintainability, debugging and back ports. > Look at how intel splits their drivers. > ixgb, ixgbe, ixgbevf are different drivers thought they have a lot in > common. On one side it's a bit of copy paste, but on the other side > it makes drivers much easier to develop and maintain independently. > ConnectX-6 code and any future hw support doesn't belong to > mlx5 driver at all. If you build your driver without explicitly enabling CONFIG_MLX5_FPGA=y (default is N) then you get none of this, and your driver is practically the same as before. The FPGA and CX operation is very tightly integrated. If you don't want any of this, simply don't opt-in to CONFIG_MLX5_FPGA. If you do want this, then splitting some of the logic to a separate kernel object will not gain anything useful (logic would stay the same), and just pollute the exported symbol table and open up the door for issues of inter-module compatibility and so on. Furthermore, the next patchset will introduce IPSec offload capabilities Which are declared in netdev->hw_features. Those cannot be modified after the netdevice is created, so all the extra logic has to be integrated into the mlx5 ethernet driver. This is another reason why a separate driver is a bad idea. We will include details about the board and how things work in the cover letter of the IPSec offload patchset. Ilan.
On Thu, May 25, 2017 at 8:20 AM, Ilan Tayari <ilant@mellanox.com> wrote: >> -----Original Message----- >> From: Alexei Starovoitov [mailto:alexei.starovoitov@gmail.com] >> >> On Tue, May 23, 2017 at 02:44:02PM +0300, Saeed Mahameed wrote: >> > From: Ilan Tayari <ilant@mellanox.com> >> > >> > Mellanox Innova is a NIC with ConnectX and an FPGA on the same >> > board. The FPGA is a bump-on-the-wire and thus affects operation of >> > the mlx5_core driver on the ConnectX ASIC. >> > >> > Add basic support for Innova in mlx5_core. >> > >> > This allows using the Innova card as a regular NIC, by detecting >> > the FPGA capability bit, and verifying its load state before >> > initializing ConnectX interfaces. >> >> That doesn't sound like cx4/cx5 hw that mlx5 driver suppose to support. > > Hi Alexei, > Thanks for your feedback. > Let me address your concerns. > > I didn't state it, but the current iterations of Innova cards include the > ConnectX-4LX chip, which is exactly what mlx5 driver supports. > > The PCI interface is *only* through the ConnectX chip with some new > Firmware commands. This patch doesn't register any new PCI device, > so it does not make sense to create a separate driver. > >> >> > Also detect FPGA fatal runtime failures and enter error state if >> > they ever happen. >> > >> > All new FPGA-related logic is placed in its own subdirectory 'fpga', >> > which may be built by selecting CONFIG_MLX5_FPGA. >> > This prepares for further support of various Innova features in later >> > patchsets. >> > Additional details about hardware architecture will be provided as >> > more features get submitted. >> > >> > Signed-off-by: Ilan Tayari <ilant@mellanox.com> >> > Reviewed-by: Boris Pismenny <borisp@mellanox.com> >> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> >> > --- >> > MAINTAINERS | 10 + >> > drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 10 + >> > drivers/net/ethernet/mellanox/mlx5/core/Makefile | 3 + >> > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 11 ++ >> > drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.c | 64 +++++++ >> > drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.h | 59 ++++++ >> > .../net/ethernet/mellanox/mlx5/core/fpga/core.c | 202 >> +++++++++++++++++++++ >> > .../net/ethernet/mellanox/mlx5/core/fpga/core.h | 99 ++++++++++ >> > drivers/net/ethernet/mellanox/mlx5/core/main.c | 19 +- >> > include/linux/mlx5/device.h | 6 + >> > include/linux/mlx5/driver.h | 5 + >> > include/linux/mlx5/mlx5_ifc.h | 11 +- >> > include/linux/mlx5/mlx5_ifc_fpga.h | 144 >> +++++++++++++++ >> > 13 files changed, 640 insertions(+), 3 deletions(-) >> >> Can you put it into different driver? Dumping everything into by far >> the biggest nic driver already is already huge headache in terms on >> maintainability, debugging and back ports. >> Look at how intel splits their drivers. >> ixgb, ixgbe, ixgbevf are different drivers thought they have a lot in >> common. On one side it's a bit of copy paste, but on the other side I don't think the ixgb example is the same, simply ixgb, ixgbe, ixgbevf have different PCI IDs and even different SW/FW interfaces. On the other hand, same mlx5 driver can support all of ConnetX4/5/6 device IDs with the same code flows, same interfaces. >> it makes drivers much easier to develop and maintain independently. >> ConnectX-6 code and any future hw support doesn't belong to >> mlx5 driver at all. Sorry i must disagree with you on this for the same reasons Ilan mentioned. We can perfectly achieve the same with modular driver design all under the same .ko module, with some kconfig flags to reduce the amount of code/features this .ko provides. > > If you build your driver without explicitly enabling CONFIG_MLX5_FPGA=y > (default is N) then you get none of this, and your driver is practically > the same as before. > > The FPGA and CX operation is very tightly integrated. > If you don't want any of this, simply don't opt-in to CONFIG_MLX5_FPGA. > > If you do want this, then splitting some of the logic to a > separate kernel object will not gain anything useful (logic would stay > the same), and just pollute the exported symbol table and open up the door > for issues of inter-module compatibility and so on. > > Furthermore, the next patchset will introduce IPSec offload capabilities > Which are declared in netdev->hw_features. Those cannot be modified > after the netdevice is created, so all the extra logic has to be > integrated into the mlx5 ethernet driver. This is another reason why > a separate driver is a bad idea. > > We will include details about the board and how things work in the > cover letter of the IPSec offload patchset. > > Ilan.
On 05/25/2017 06:40 AM, Saeed Mahameed wrote: > On Thu, May 25, 2017 at 8:20 AM, Ilan Tayari <ilant@mellanox.com> wrote: >>> -----Original Message----- >>> Can you put it into different driver? Dumping everything into by far >>> the biggest nic driver already is already huge headache in terms on >>> maintainability, debugging and back ports. >>> Look at how intel splits their drivers. >>> ixgb, ixgbe, ixgbevf are different drivers thought they have a lot in >>> common. On one side it's a bit of copy paste, but on the other side > > I don't think the ixgb example is the same, simply ixgb, ixgbe, > ixgbevf have different PCI IDs > and even different SW/FW interfaces. On the other hand, same mlx5 > driver can support all of > ConnetX4/5/6 device IDs with the same code flows, same interfaces. > >>> it makes drivers much easier to develop and maintain independently. >>> ConnectX-6 code and any future hw support doesn't belong to >>> mlx5 driver at all. > > Sorry i must disagree with you on this for the same reasons Ilan mentioned. > We can perfectly achieve the same with modular driver design all under the > same .ko module, with some kconfig flags to reduce the amount of code/features > this .ko provides. If I get this right, the FPGA is independent and could in theory be used for non network stuff. It really should have it's own driver in that case, and you should provide accessor functionality via the mlx5 driver. We have this with other devices in the kernel where a primary device driver provides an interface for an additional sub-driver to access another device behind it. Like bt-coexist in some of the wifi drivers allowing access to a bluetooth device behind it. Jes
On Thu, May 25, 2017 at 05:20:04AM +0000, Ilan Tayari wrote: > > If you do want this, then splitting some of the logic to a > separate kernel object will not gain anything useful (logic would stay > the same), and just pollute the exported symbol table and open up the door > for issues of inter-module compatibility and so on. in other words instead of properly designing inter-module api you just want to add everything into one giant 'driver' because it's easier to do so. Understandable, but it leads to unmaintainable device infrastructure long term. > Furthermore, the next patchset will introduce IPSec offload capabilities > Which are declared in netdev->hw_features. Those cannot be modified > after the netdevice is created, so all the extra logic has to be > integrated into the mlx5 ethernet driver. This is another reason why > a separate driver is a bad idea. that increases my concerns even more. ipsec offload is a new feature for new hw, yet you're trying to hide it in the mlx5 'driver'. mlx5 already supports cx4/cx4-lx/cx5 and not even released cx6. All of them have different feature sets. The only common piece is driver/fw cmd interface and _some_ aspects of rx/tx descriptors. Everything else is different. cx4-lx doesn't have infiniband and rdma, yet there is a ton of code in the driver that deals with it. cx5 has fancy storage interfaces for nvme, etc I don't think they are part of the mlx5 'driver' yet. Are you going to dump them in there as well? Take a look at the simplest feature-wise cx4-lx. It has eswitch which is full l2 switch with mac table, drop policy, counters and such. Yet kernel knows nothing about it. Everything is hidden in mlx5 'driver' with its own special interfaces to manage it. Now you want to hide fpga with custom logic behind mlx5 'driver' and manage it through mlx5 interfaces? That's not acceptable. mlx fpga got to have generic kernel representation that intel and other fpga vendors can use. mlx5 driver has to be modularized and since it's not too late cx6 support has to be removed from it. Thankfully only few cx6 pci ids were added to mlx5, but driver knows nothing about cx6 features, so we can properly design it. Since driver/fw interface is common between cx4+ this part can be moved into mlx_core.ko or done as library the way chelsio did it in libcxgb.
On Tue, May 23, 2017 at 02:44:02PM +0300, Saeed Mahameed wrote: > From: Ilan Tayari <ilant@mellanox.com> > > Mellanox Innova is a NIC with ConnectX and an FPGA on the same > board. The FPGA is a bump-on-the-wire and thus affects operation of > the mlx5_core driver on the ConnectX ASIC. > > Add basic support for Innova in mlx5_core. > > This allows using the Innova card as a regular NIC, by detecting > the FPGA capability bit, and verifying its load state before > initializing ConnectX interfaces. > > Also detect FPGA fatal runtime failures and enter error state if > they ever happen. > > All new FPGA-related logic is placed in its own subdirectory 'fpga', > which may be built by selecting CONFIG_MLX5_FPGA. > This prepares for further support of various Innova features in later > patchsets. > Additional details about hardware architecture will be provided as > more features get submitted. > > Signed-off-by: Ilan Tayari <ilant@mellanox.com> > Reviewed-by: Boris Pismenny <borisp@mellanox.com> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> > --- > MAINTAINERS | 10 + > drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 10 + > drivers/net/ethernet/mellanox/mlx5/core/Makefile | 3 + > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 11 ++ > drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.c | 64 +++++++ > drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.h | 59 ++++++ > .../net/ethernet/mellanox/mlx5/core/fpga/core.c | 202 +++++++++++++++++++++ > .../net/ethernet/mellanox/mlx5/core/fpga/core.h | 99 ++++++++++ > drivers/net/ethernet/mellanox/mlx5/core/main.c | 19 +- > include/linux/mlx5/device.h | 6 + > include/linux/mlx5/driver.h | 5 + > include/linux/mlx5/mlx5_ifc.h | 11 +- > include/linux/mlx5/mlx5_ifc_fpga.h | 144 +++++++++++++++ > 13 files changed, 640 insertions(+), 3 deletions(-) Dave, please revert this Innova fpga stuff. I think you pushed it by accident, since it was mixed with other valid changes. The discussion didn't conclude. Myself and Jes are clearly against such changes. It definitely needs more discussion and wider consensus. Thanks
From: Alexei Starovoitov <alexei.starovoitov@gmail.com> Date: Thu, 25 May 2017 20:58:32 -0700 > Dave, please revert this Innova fpga stuff. > I think you pushed it by accident, since it was mixed with > other valid changes. > The discussion didn't conclude. > Myself and Jes are clearly against such changes. > It definitely needs more discussion and wider consensus. Why don't you finish your discussion, then I can revert or leave it in there? If someone puts an FPGA inside of a device, and it's still behind the PCI ID of the ethernet card, how they expose that thing is largely at the discression of the driver author. So feel free to discuss things with them, but in the end unless I am convinced otherwise (and I'm currently not), what they are doing now is fine by me. I will also state that it can't be that every time Mellanox tries to add something significant to their driver everyone screams "oh my, this thing is so hard to maintain as it is, you can't add this feature _too_." Sorry, it's an important driver, it's big, it has a lot of features, and all those problems people talk about simply come with the territory. This FPGA thing is just a drop in the bucket as far as long term maintainence and backporting is concerned, so if that truly is the basis of why you are against these changes, there is not much weight to that argument. Use a less featurefull device, which uses a driver with slower development and less changes, if these things truly bother you. Thanks.
On Fri, May 26, 2017 at 12:13:27AM -0400, David Miller wrote: > From: Alexei Starovoitov <alexei.starovoitov@gmail.com> > Date: Thu, 25 May 2017 20:58:32 -0700 > > > Dave, please revert this Innova fpga stuff. > > I think you pushed it by accident, since it was mixed with > > other valid changes. > > The discussion didn't conclude. > > Myself and Jes are clearly against such changes. > > It definitely needs more discussion and wider consensus. > > Why don't you finish your discussion, then I can revert or leave it in > there? Not really. What you're saying is 'shut up. mellanox can do whatever they like as long as it's hidden behind pcie id'. > If someone puts an FPGA inside of a device, and it's still behind the > PCI ID of the ethernet card, how they expose that thing is largely at > the discression of the driver author. So you're saying it's ok to program fpga through nic whereas larger kernel community is trying to come up with a solution to represent fpga as a proper device for the kernel. I suspect there will be popcorn threads during the next merge window. > So feel free to discuss things with them, but in the end unless I am > convinced otherwise (and I'm currently not), what they are doing now > is fine by me. If you didn't merge it already it would be fair course of action, but it's clearly not the case. > I will also state that it can't be that every time Mellanox tries to > add something significant to their driver everyone screams "oh my, > this thing is so hard to maintain as it is, you can't add this feature > _too_." Maintaining is only small part of it. The key argument against it that all these things fpga, eswitch, etc are _hidden_ behind the nic. It already causes headaches in production, because kernel cannot see them. > Sorry, it's an important driver, it's big, it has a lot of features, > and all those problems people talk about simply come with the > territory. It's not a driver feature. It's not a nic. It's a different physical device that mlx choose to program via nic. There is also 'bluefield' nic on the horizon. From host point of view it looks like the same cx5/6 nic, but it has arm cpus on board. Are you going to allow programming it via nic as part of mlx5 driver as well? It's no different than fpga hiding behind the nic... > Use a less featurefull device, which uses a driver with slower > development and less changes, if these things truly bother you. like google did. food for thought.
On Thu, May 25, 2017 at 11:48 PM, Jes Sorensen <jsorensen@fb.com> wrote: > On 05/25/2017 06:40 AM, Saeed Mahameed wrote: >> >> On Thu, May 25, 2017 at 8:20 AM, Ilan Tayari <ilant@mellanox.com> wrote: >>>> >>>> -----Original Message----- >>>> Can you put it into different driver? Dumping everything into by far >>>> the biggest nic driver already is already huge headache in terms on >>>> maintainability, debugging and back ports. >>>> Look at how intel splits their drivers. >>>> ixgb, ixgbe, ixgbevf are different drivers thought they have a lot in >>>> common. On one side it's a bit of copy paste, but on the other side >> >> >> I don't think the ixgb example is the same, simply ixgb, ixgbe, >> ixgbevf have different PCI IDs >> and even different SW/FW interfaces. On the other hand, same mlx5 >> driver can support all of >> ConnetX4/5/6 device IDs with the same code flows, same interfaces. >> >>>> it makes drivers much easier to develop and maintain independently. >>>> ConnectX-6 code and any future hw support doesn't belong to >>>> mlx5 driver at all. >> >> >> Sorry i must disagree with you on this for the same reasons Ilan >> mentioned. >> We can perfectly achieve the same with modular driver design all under the >> same .ko module, with some kconfig flags to reduce the amount of >> code/features >> this .ko provides. > > > If I get this right, the FPGA is independent and could in theory be used for > non network stuff. It really should have it's own driver in that case, and > you should provide accessor functionality via the mlx5 driver. > Hi Jes, No, It is clearly stated in the commit message : "The FPGA is a bump-on-the-wire and thus affects operation of the mlx5_core driver on the ConnectX ASIC." Which means mlx5 FPGA user can only write logic which affects only packets going in/out A ConnectX chip - so it is only network stuff -. > We have this with other devices in the kernel where a primary device driver > provides an interface for an additional sub-driver to access another device > behind it. Like bt-coexist in some of the wifi drivers allowing access to a > bluetooth device behind it. > Blutooth over wifi or vise versa is a very good example to what you are requesting. But, it doesn't fit to what we are trying to do here. mlx5 FGPA is a ConnectX card feature, not a new protocol. > Jes
On Fri, May 26, 2017 at 6:07 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > On Thu, May 25, 2017 at 05:20:04AM +0000, Ilan Tayari wrote: >> >> If you do want this, then splitting some of the logic to a >> separate kernel object will not gain anything useful (logic would stay >> the same), and just pollute the exported symbol table and open up the door >> for issues of inter-module compatibility and so on. > > in other words instead of properly designing inter-module api > you just want to add everything into one giant 'driver' > because it's easier to do so. Understandable, but it leads > to unmaintainable device infrastructure long term. > What is wrong with giant 'driver' ? :-) BTW making the fpga a separate module is as easy as adding one line to the make file and moving the entry points functions (fpga_init/cleanup) form mlx5 init flow into a "mlx5_interface" instance and register it with the mlx5_core. But again i don't see a point of doing so, the code will still look the same and sit in the same places. so one giant ko file or small separate modules is basically the same, from my point of view. modular design have nothing to do with the compilation output binary, it should be considered regardless of the compiler output. Which we tried to not far a go to break the driver into sub-module directories, each with its own responsibility, but it was too late for that. And for FPGA support, we did it correctly this time, all the new code is under mlx5/core/fgpa .. if you don't like core/fpga, ok just simply "CONFIG_MLX5_FPGA = no" and all of that code/directory will disappear from your ko file and will not be touched in compilation time, and the core code will still look the same. >> Furthermore, the next patchset will introduce IPSec offload capabilities >> Which are declared in netdev->hw_features. Those cannot be modified >> after the netdevice is created, so all the extra logic has to be >> integrated into the mlx5 ethernet driver. This is another reason why >> a separate driver is a bad idea. > > that increases my concerns even more. > ipsec offload is a new feature for new hw, yet you're trying to > hide it in the mlx5 'driver'. > Well, this is a well known dilemma, if one has a new feature and has no sandbox area to put it in the first phase, it is better to start from the most common place for that feature which is the originating place, before defining APIs/infrastructures, until the feature is complete and every body is happy about it. Same was done for GRO, XDP , you name it .. nothing is being hidden here, we are adding new HW offloads, not that different from csum offloads, which no one can live without these days, and i believe in few years IPSec and TLS and security offloads will be no less important than csum offloads. Someone has to start somewhere and we choose to start in our driver, if you have a better IPSec generic interface that allows modular module separation we are happy to hear about it. > mlx5 already supports cx4/cx4-lx/cx5 and not even released cx6. I will take this as a complement ;-), please see more info below. > All of them have different feature sets. Not different feature sets, but i expect some changes through out the evolution of the ConnectX chip , and those changes can be: 1. internal bug fixes/internal improvements. 2. new extra features from the previous generation. But the chip is the same chip, i.e same driver works on all of the chips with the basic/standard feature set. > The only common piece is driver/fw cmd interface and _some_ aspects > of rx/tx descriptors. Everything else is different. I can add more to the list. basic offloads, steering management, eswitch, HW objects and semantics QPs/RQs/SQs/CQs/EQs and their manipulation semantics, internal page memory management, NIC initialization and teardown flows are the same and many more. > cx4-lx doesn't have infiniband and rdma, yet there is a ton of code although it doesn't have infinband support, cx4-lx have rdma support, you can perfectly load mlx5_ib and work with RDMA over ethernet (RoCE). which is 99% of the mlx5_ib&mlx5_core code. (really !) so the ton of code is still valid for cx4-lx. > in the driver that deals with it. > cx5 has fancy storage interfaces for nvme, etc I don't think they > are part of the mlx5 'driver' yet. Are you going to dump them > in there as well? > Take a look at the simplest feature-wise cx4-lx. It has > eswitch which is full l2 switch with mac table, drop policy, > counters and such. Yet kernel knows nothing about it. > Everything is hidden in mlx5 'driver' with its own special > interfaces to manage it. 1. eswitch is a very integral steering feature of the ConnectX family and is valid for all ConnectX4/lx/5/6 .. cards. 2. kenrel doesn't and can't know about it since in the early days of SRIOV implementation (what the cool kids of switchdev mode call the legacy sriov mode) is the responsibility of the device driver and it is an internal arch of the HCA. so it is not our fault it is hidden in mlx5, we just went with the flow.. 3. I agree with you, device driver should be as transparent as possible to the kernel, and should provide as much info as needed to the stack. 4. switchdev mode came to solve this exact issue, and the kernel stack even up to user space are perfectly aware of the nic eswitch capabilities. being said, even if the kernel is aware or not aware of the eswitch it doesn't mean it should or shouldn't sit in the same mlx5 ko file. > Now you want to hide fpga with custom logic behind mlx5 'driver' > and manage it through mlx5 interfaces? > That's not acceptable. > mlx fpga got to have generic kernel representation that intel > and other fpga vendors can use. > quoting Ilan from his commit message: "The FPGA is a bump-on-the-wire and thus affects operation of the mlx5_core driver on the ConnectX ASIC." This is not a general purpose FPGA ! it even doesn't have a PCI Id or it is own PCI function. it simply allows the ConnectX Chip user to inspect/inject or run user specif login on traffic going in/out of the chip. But i agree with you some serious API brainstorming should be considered, but not at this stage, this patch only tells the ConnectX card (if you have FPGA, please enable it). > mlx5 driver has to be modularized and since it's not too late > cx6 support has to be removed from it. > I agree with you on the first part, Yes i would like to better modularize the driver, i will even consider taking it to the extreme and have separate module for each sub mlx5 module e.g: mlx5_core.ko mlx5_eswitch.ko mlx5_vxaln.ko mlx5_vf_repreentor.ko mlx5_ipoib.ko ... even mlx5_xdp.ko ( not really ;-) ) If this what it takes to more logically and modularly break mlx5/core into hierarchical design, Please count me in. And i would like to hear Dave explicitly say i am ok with this first. But, re cx6 I am not sure i can agree on this for the simple fact: it took 3 years to get mlx5 (Cx4/lx) driver to provide similar feature set as mlx4 (Cx3). and it literally took 0 lines of codes 1 day of regression to test mlx5 driver over CX5 ! Now All we need is to add the new features of CX5 on top of the already existing driver and i will be more than happy to hear about exciting Ideas on how to modularly do that in the same driver, and provide clear separation (beleive me it is is to do, if i am allowed to split the driver into directories) but i can't do it in a completely separate driver with a base code library, it is simply wrong and non scalable (even if someone is not careful enough it could make a lot of dependency mess between mlx5 modules) > Thankfully only few cx6 pci ids were added to mlx5, but driver > knows nothing about cx6 features, so we can properly design it. > > Since driver/fw interface is common between cx4+ this part > can be moved into mlx_core.ko or done as library > the way chelsio did it in libcxgb. > driver/fw interfaces are not the only common part, it is all basic standard netdev/rdma and linux networkinng stuff are the same. -Saeed.
From: Alexei Starovoitov <alexei.starovoitov@gmail.com> Date: Thu, 25 May 2017 21:40:59 -0700 > On Fri, May 26, 2017 at 12:13:27AM -0400, David Miller wrote: >> From: Alexei Starovoitov <alexei.starovoitov@gmail.com> >> Date: Thu, 25 May 2017 20:58:32 -0700 >> >> > Dave, please revert this Innova fpga stuff. >> > I think you pushed it by accident, since it was mixed with >> > other valid changes. >> > The discussion didn't conclude. >> > Myself and Jes are clearly against such changes. >> > It definitely needs more discussion and wider consensus. >> >> Why don't you finish your discussion, then I can revert or leave it in >> there? > > Not really. What you're saying is 'shut up. mellanox can do > whatever they like as long as it's hidden behind pcie id'. No, what I'm saying is, discuss things. And if in the end you agree with Mellanox, we don't have flapping of the change in and out of the tree over and over. My understanding is that this FPGA offloads processing the data stream, which in a way is not much different than JIT'ing eBPF onto Netronome cpus. :-) Maybe the programming interface is different, maybe all the algorithms are implemented in discrete ASICs, but effect is quite similar.
On Fri, May 26, 2017 at 11:59:26AM +0300, Saeed Mahameed wrote: > > And for FPGA support, we did it correctly this time, all the new code > is under mlx5/core/fgpa .. s/correctly/incorrectly/ ? > Well, this is a well known dilemma, if one has a new feature and has no sandbox > area to put it in the first phase, it is better to start from the most > common place > for that feature which is the originating place, before defining > APIs/infrastructures, > until the feature is complete and every body is happy about it. There is driver/fpga to manage fpga, but mlx fpga+nic combo will be managed via mlx5/core/fpga/ Adding fpga folks for visibility. > 1. eswitch is a very integral steering feature of the ConnectX family > and is valid for all ConnectX4/lx/5/6 .. cards. and kernel has no visibility into it and mlx has no incentive to expose it, since community needs to agree on api whereas hiding in the driver is 'mlx territory' and much easier to push arbitrary patches to. > "The FPGA is a bump-on-the-wire and thus affects operation of > the mlx5_core driver on the ConnectX ASIC." > > This is not a general purpose FPGA ! it even doesn't have a PCI Id or > it is own PCI function. > it simply allows the ConnectX Chip user to inspect/inject or run user > specif login on traffic going in/out of the chip. ... and bluefield arm cpus are not general purpose cpus and only used to offload networking just like this fpga? > But i agree with you some serious API brainstorming should be > considered, but not at this stage, > this patch only tells the ConnectX card (if you have FPGA, please enable it). serious api discussion needs be done before things like eswitch, fpga exposed to users otherwise we end up in the eswitch situation that drops packets based on its own invisible logic, but with hidden fpga it will be even more obscure.
On Fri, May 26, 2017 at 10:56:25AM -0700, Alexei Starovoitov wrote: > > for that feature which is the originating place, before defining > > APIs/infrastructures, > > until the feature is complete and every body is happy about it. > > There is driver/fpga to manage fpga, but mlx fpga+nic combo > will be managed via mlx5/core/fpga/ > > Adding fpga folks for visibility. It would be good to use the existing fpga loading infrastructure to get the bitstream into the NIC, and to use the same Xilinx bitstream format as eg Zynq does for consistency. I'm unclear how this works - there must be more to it than just a 'bump on the wire', there must be some communication channel between the FPGA and Linux to set operational data (eg load keys) etc. If that is register mapped into a PCI-BAR someplace then it really should use the FPGA layer functions to manage binding drivers to that register window. If it is mailbox command based then it is not as good of a fit. Is this FPGA expected to be customer programmable? In that case you really need the full infrastructure to bind the right driver (possibly a customer driver) to the current FPGA, to expose the correct operational interface to the kernel. Jason
On 05/26/2017 04:29 AM, Saeed Mahameed wrote: > On Thu, May 25, 2017 at 11:48 PM, Jes Sorensen <jsorensen@fb.com> wrote: >> On 05/25/2017 06:40 AM, Saeed Mahameed wrote: > Hi Jes, > > No, It is clearly stated in the commit message : > > "The FPGA is a bump-on-the-wire and thus affects operation of > the mlx5_core driver on the ConnectX ASIC." > > Which means mlx5 FPGA user can only write logic which affects only > packets going in/out > A ConnectX chip - so it is only network stuff -. > >> We have this with other devices in the kernel where a primary device driver >> provides an interface for an additional sub-driver to access another device >> behind it. Like bt-coexist in some of the wifi drivers allowing access to a >> bluetooth device behind it. > > Blutooth over wifi or vise versa is a very good example to what you > are requesting. > But, it doesn't fit to what we are trying to do here. mlx5 FGPA is a > ConnectX card feature, not a new protocol. In that case it would need to be an optional module that can be enabled or disabled at build time. Cheers, Jes
> -----Original Message----- > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > > On Fri, May 26, 2017 at 10:56:25AM -0700, Alexei Starovoitov wrote: > > > > for that feature which is the originating place, before defining > > > APIs/infrastructures, > > > until the feature is complete and every body is happy about it. > > > > There is driver/fpga to manage fpga, but mlx fpga+nic combo > > will be managed via mlx5/core/fpga/ > > > > Adding fpga folks for visibility. > > It would be good to use the existing fpga loading infrastructure to > get the bitstream into the NIC, and to use the same Xilinx bitstream > format as eg Zynq does for consistency. > > I'm unclear how this works - there must be more to it than just a > 'bump on the wire', there must be some communication channel between > the FPGA and Linux to set operational data (eg load keys) etc. > > If that is register mapped into a PCI-BAR someplace then it really > should use the FPGA layer functions to manage binding drivers to that > register window. > > If it is mailbox command based then it is not as good of a fit. > Hi Jason, Thanks for taking the time to consider this! This is neither PCI-bar mapped, nor mailbox command. The FPGA is indeed a bump-on-the-wire. (It has I2C to the CX4 chip, but that is for debug purposes, and too slow to perform real programming) The FPGA is programmed with RoCE packets. We are going to share a lot of details about this in the IPSec offload patchset, so I don't want to repeat it here. In short, we open a RoCEv2 QP between the connect and the FPGA chips. Over this QP, we can communicate at high speed with the FPGA. > Is this FPGA expected to be customer programmable? In that case you > really need the full infrastructure to bind the right driver (possibly > a customer driver) to the current FPGA, to expose the correct > operational interface to the kernel. Most of the Innova family of cards are not customer-programmable. Over time, they may require an upgrade (essentially a FW upgrade) but not with customer logic. One flavor of the product, the Innova Flex, allows customer logic in the FPGA. But even then it is "wrapped" by Mellanox shell logic. We plan to have an in-kernel API for writing client drivers for Innova Flex. > > Jason
> -----Original Message----- > From: Jes Sorensen [mailto:jsorensen@fb.com] > > On 05/26/2017 04:29 AM, Saeed Mahameed wrote: > > On Thu, May 25, 2017 at 11:48 PM, Jes Sorensen <jsorensen@fb.com> wrote: > >> On 05/25/2017 06:40 AM, Saeed Mahameed wrote: > > Hi Jes, > > > > No, It is clearly stated in the commit message : > > > > "The FPGA is a bump-on-the-wire and thus affects operation of > > the mlx5_core driver on the ConnectX ASIC." > > > > Which means mlx5 FPGA user can only write logic which affects only > > packets going in/out > > A ConnectX chip - so it is only network stuff -. > > > >> We have this with other devices in the kernel where a primary device > driver > >> provides an interface for an additional sub-driver to access another > device > >> behind it. Like bt-coexist in some of the wifi drivers allowing access > to a > >> bluetooth device behind it. > > > > Blutooth over wifi or vise versa is a very good example to what you > > are requesting. > > But, it doesn't fit to what we are trying to do here. mlx5 FGPA is a > > ConnectX card feature, not a new protocol. > > In that case it would need to be an optional module that can be enabled > or disabled at build time. Jes, It is already modular like that. See CONFIG_MLX5_FPGA. Ilan. > > Cheers, > Jes >
On Fri, May 26, 2017 at 8:56 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > On Fri, May 26, 2017 at 11:59:26AM +0300, Saeed Mahameed wrote: >> But i agree with you some serious API brainstorming should be >> considered, but not at this stage, this patch only tells the ConnectX card >> (if you have FPGA, please enable it). > serious api discussion needs be done before things like eswitch, fpga > exposed to users otherwise we end up in the eswitch situation that > drops packets based on its own invisible logic, but with hidden fpga > it will be even more obscure. Alexei, ACK for your comment on e-switch, what we call the legacy mode for SRIOV in Linux was a big mistake. As you know, since 4.8 we're @ MLNX are working to fix it... we introduced a model for VF port representors which has set the foundations for e-switch data-path to be programmed as an offload for kernel SW switching. We also introduced the code to offload TC flower base data-path and support that in for both switch (Spectrum) ASIC (mlxsw driver) and NIC ASIC (mlx5 driver) Or.
On Sun, May 28, 2017 at 07:22:27AM +0000, Ilan Tayari wrote: > This is neither PCI-bar mapped, nor mailbox command. > The FPGA is indeed a bump-on-the-wire. > (It has I2C to the CX4 chip, but that is for debug purposes, and too slow > to perform real programming) Wait.. So if it truely has nothing to do with the existing mellanox driver, then nothing more than the fpga loader should be in the mlx5 directory? > One flavor of the product, the Innova Flex, allows customer logic in the > FPGA. But even then it is "wrapped" by Mellanox shell logic. > We plan to have an in-kernel API for writing client drivers for Innova > Flex. The FPGA subsystem already has APIs for partial reconfiguration use cases, you probably should not re-invent that under the mlx5 driver. Jason
> -----Original Message----- > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > Subject: Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova > > On Sun, May 28, 2017 at 07:22:27AM +0000, Ilan Tayari wrote: > > > This is neither PCI-bar mapped, nor mailbox command. > > The FPGA is indeed a bump-on-the-wire. > > (It has I2C to the CX4 chip, but that is for debug purposes, and too > slow > > to perform real programming) > > Wait.. So if it truely has nothing to do with the existing mellanox > driver, then nothing more than the fpga loader should be in the mlx5 > directory? True, except in specific cases when the FPGA may mangle the packets in a way that the netdevice configures, and the driver needs to adapt the data path. Such is the case of IPSec and TLS offloads. Those are tied to the mlx5 Ethernet driver (isolated with a kconfig). > > > One flavor of the product, the Innova Flex, allows customer logic in the > > FPGA. But even then it is "wrapped" by Mellanox shell logic. > > We plan to have an in-kernel API for writing client drivers for Innova > > Flex. > > The FPGA subsystem already has APIs for partial reconfiguration use > cases, you probably should not re-invent that under the mlx5 driver. When the time comes to introduce the API for that, we will investigate all Options available. Currently this is not in the plans for Linux. (We do have an out-of-tree 'hack' driver for this, but it's planned to be replaced with something more appropriate later on) > > Jason
On Mon, May 29, 2017 at 03:58:33PM +0000, Ilan Tayari wrote: > > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > > Subject: Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova > > > > On Sun, May 28, 2017 at 07:22:27AM +0000, Ilan Tayari wrote: > > > > > This is neither PCI-bar mapped, nor mailbox command. > > > The FPGA is indeed a bump-on-the-wire. > > > (It has I2C to the CX4 chip, but that is for debug purposes, and too > > slow > > > to perform real programming) > > > > Wait.. So if it truely has nothing to do with the existing mellanox > > driver, then nothing more than the fpga loader should be in the mlx5 > > directory? > > True, except in specific cases when the FPGA may mangle the packets in > a way that the netdevice configures, and the driver needs to adapt the > data path. > Such is the case of IPSec and TLS offloads. > Those are tied to the mlx5 Ethernet driver (isolated with a kconfig). But there is nothing stopping this sort of FPGA mangling logic being downstream of any NIC, Mellanox is just the first to do this. I think you'd be better to add something to the net stack to model this post-nic mangling hardware, than trying to hide it in a driver. Jason
> -----Original Message----- > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > Subject: Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova > > On Mon, May 29, 2017 at 03:58:33PM +0000, Ilan Tayari wrote: > > > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > > > Subject: Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for > Innova > > > > > > On Sun, May 28, 2017 at 07:22:27AM +0000, Ilan Tayari wrote: > > > > > > > This is neither PCI-bar mapped, nor mailbox command. > > > > The FPGA is indeed a bump-on-the-wire. > > > > (It has I2C to the CX4 chip, but that is for debug purposes, and too > > > slow > > > > to perform real programming) > > > > > > Wait.. So if it truely has nothing to do with the existing mellanox > > > driver, then nothing more than the fpga loader should be in the mlx5 > > > directory? > > > > True, except in specific cases when the FPGA may mangle the packets in > > a way that the netdevice configures, and the driver needs to adapt the > > data path. > > > Such is the case of IPSec and TLS offloads. > > Those are tied to the mlx5 Ethernet driver (isolated with a kconfig). > > But there is nothing stopping this sort of FPGA mangling logic being > downstream of any NIC, Mellanox is just the first to do this. > > I think you'd be better to add something to the net stack to model > this post-nic mangling hardware, than trying to hide it in a driver. Of course. For IPSec, this is already in the kernel. See this patchset: http://www.mail-archive.com/netdev@vger.kernel.org/msg162876.html For TLS, Dave Watson is working with our guys to add it. > > Jason
> -----Original Message----- > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] > > > -----Original Message----- > > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > > Subject: Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova > > > > On Mon, May 29, 2017 at 03:58:33PM +0000, Ilan Tayari wrote: > > > > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > > > > Subject: Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for > > Innova > > > > > > > > On Sun, May 28, 2017 at 07:22:27AM +0000, Ilan Tayari wrote: > > > > > > > > > This is neither PCI-bar mapped, nor mailbox command. > > > > > The FPGA is indeed a bump-on-the-wire. > > > > > (It has I2C to the CX4 chip, but that is for debug purposes, and > too > > > > slow > > > > > to perform real programming) > > > > > > > > Wait.. So if it truely has nothing to do with the existing mellanox > > > > driver, then nothing more than the fpga loader should be in the mlx5 > > > > directory? > > > > > > True, except in specific cases when the FPGA may mangle the packets in > > > a way that the netdevice configures, and the driver needs to adapt the > > > data path. > > > > > Such is the case of IPSec and TLS offloads. > > > Those are tied to the mlx5 Ethernet driver (isolated with a kconfig). > > > > But there is nothing stopping this sort of FPGA mangling logic being > > downstream of any NIC, Mellanox is just the first to do this. > > > > I think you'd be better to add something to the net stack to model > > this post-nic mangling hardware, than trying to hide it in a driver. > > Of course. > > For IPSec, this is already in the kernel. > See this patchset: > http://www.mail-archive.com/netdev@vger.kernel.org/msg162876.html Sorry, I pointed at the RFC by mistake. This is the relevant pull request: https://patchwork.ozlabs.org/patch/752707/ > > For TLS, Dave Watson is working with our guys to add it. > > > > > Jason
On Mon, May 29, 2017 at 04:09:06PM +0000, Ilan Tayari wrote: > > For IPSec, this is already in the kernel. > > See this patchset: > > http://www.mail-archive.com/netdev@vger.kernel.org/msg162876.html > > Sorry, I pointed at the RFC by mistake. > > This is the relevant pull request: > https://patchwork.ozlabs.org/patch/752707/ This is connecting ipsec to a netdev, while Innova seems to be a network connected ipsec accelerator configured using IP packets. Those two things don't seem to be the same. Jason
On 05/28/2017 03:24 AM, Ilan Tayari wrote: >> -----Original Message----- >> From: Jes Sorensen [mailto:jsorensen@fb.com] >> >> On 05/26/2017 04:29 AM, Saeed Mahameed wrote: >>> On Thu, May 25, 2017 at 11:48 PM, Jes Sorensen <jsorensen@fb.com> wrote: >>>> On 05/25/2017 06:40 AM, Saeed Mahameed wrote: >>> Hi Jes, >>> >>> No, It is clearly stated in the commit message : >>> >>> "The FPGA is a bump-on-the-wire and thus affects operation of >>> the mlx5_core driver on the ConnectX ASIC." >>> >>> Which means mlx5 FPGA user can only write logic which affects only >>> packets going in/out >>> A ConnectX chip - so it is only network stuff -. >>> >>>> We have this with other devices in the kernel where a primary device >> driver >>>> provides an interface for an additional sub-driver to access another >> device >>>> behind it. Like bt-coexist in some of the wifi drivers allowing access >> to a >>>> bluetooth device behind it. >>> >>> Blutooth over wifi or vise versa is a very good example to what you >>> are requesting. >>> But, it doesn't fit to what we are trying to do here. mlx5 FGPA is a >>> ConnectX card feature, not a new protocol. >> >> In that case it would need to be an optional module that can be enabled >> or disabled at build time. > > Jes, > > It is already modular like that. See CONFIG_MLX5_FPGA. [jes@xpeas netdev.git]$ grep CONFIG_MLX5_FPGA drivers/net/ethernet/mellanox/mlx5/core/* [jes@xpeas netdev.git]$ Which git tree am I supposed to look at? Jes
On Fri, 2017-06-02 at 16:31 -0400, Jes Sorensen wrote: > > > It is already modular like that. See CONFIG_MLX5_FPGA. > > [jes@xpeas netdev.git]$ grep CONFIG_MLX5_FPGA > drivers/net/ethernet/mellanox/mlx5/core/* > [jes@xpeas netdev.git]$ > > Which git tree am I supposed to look at? net-next
> -----Original Message----- > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > Subject: Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova > > On Mon, May 29, 2017 at 04:09:06PM +0000, Ilan Tayari wrote: > > > > For IPSec, this is already in the kernel. > > > See this patchset: > > > http://www.mail-archive.com/netdev@vger.kernel.org/msg162876.html > > > > Sorry, I pointed at the RFC by mistake. > > > > This is the relevant pull request: > > https://patchwork.ozlabs.org/patch/752707/ > > This is connecting ipsec to a netdev, while Innova seems to be a Jason, "network connected ipsec accelerator configured using IP packets." No. This is incorrect. Where did you get that from? It is an IPSec-capable NIC. See here for details: http://www.mellanox.com/page/products_dyn?product_family=249&mtag=programmable_network_adapters The driver we are intending to submit for it, interfaces with the mentioned API in the pull-request (Netdev xdo callbacks) So you configure it from userspace with regular IPSec 'ip xfrm state' commands or over netlink with your favorite IKE daemon. > > Those two things don't seem to be the same. > > Jason
On Sun, Jun 04, 2017 at 07:51:24AM +0000, Ilan Tayari wrote: > > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > > Subject: Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova > > > > On Mon, May 29, 2017 at 04:09:06PM +0000, Ilan Tayari wrote: > > > > > > For IPSec, this is already in the kernel. > > > > See this patchset: > > > > http://www.mail-archive.com/netdev@vger.kernel.org/msg162876.html > > > > > > Sorry, I pointed at the RFC by mistake. > > > > > > This is the relevant pull request: > > > https://patchwork.ozlabs.org/patch/752707/ > > > > This is connecting ipsec to a netdev, while Innova seems to be a > > Jason, > > "network connected ipsec accelerator configured using IP packets." > No. This is incorrect. > Where did you get that from? That is what you described to me - you said the only way to configure the FPGA was via IP packets in-band. It is not part of the NIC and the NIC only loads the FPGA bitstream. Maybe you should explain more how this works? > So you configure it from userspace with regular IPSec 'ip xfrm > state' commands or over netlink with your favorite IKE daemon. But you don't give an ip xfrm configuration to the NIC when you submit the work request? By your description it sounded liked the FPGA pattern matches packets from the NIC side to apply the xfrm. Jason
> -----Original Message----- > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > Subject: Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova > > On Sun, Jun 04, 2017 at 07:51:24AM +0000, Ilan Tayari wrote: > > > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > > > Subject: Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for > Innova > > > > > > On Mon, May 29, 2017 at 04:09:06PM +0000, Ilan Tayari wrote: > > > > > > > > For IPSec, this is already in the kernel. > > > > > See this patchset: > > > > > http://www.mail-archive.com/netdev@vger.kernel.org/msg162876.html > > > > > > > > Sorry, I pointed at the RFC by mistake. > > > > > > > > This is the relevant pull request: > > > > https://patchwork.ozlabs.org/patch/752707/ > > > > > > This is connecting ipsec to a netdev, while Innova seems to be a > > > > Jason, > > > > "network connected ipsec accelerator configured using IP packets." > > No. This is incorrect. > > Where did you get that from? > > That is what you described to me - you said the only way to configure > the FPGA was via IP packets in-band. It is not part of the NIC and the > NIC only loads the FPGA bitstream. > > Maybe you should explain more how this works? I think this is just a misunderstanding. The FPGA is part of this NIC, not something external. The fact that we configure the FPGA using special inband packets isn't changing anything. IMO, it might have been any other bus on the card, standard or proprietary, and the arguments for how to design the driver would stay the same. Note that these packets are not generated as IP packets at the host. They are RoCE packets. The command payload is generated by the driver, then the QP and RoCE headers are generates by the ConnectX ASIC. The packet is then consumed by the FPGA. On the way back, the response is generated by the FPGA, and consumed by the ConnectX ASIC, with the response payload delivered to the QP in the driver. So neither the host stack nor the network are aware of them. They exist momentarily only on the internal traces on the board and not anywhere else. > > > So you configure it from userspace with regular IPSec 'ip xfrm > > state' commands or over netlink with your favorite IKE daemon. > > But you don't give an ip xfrm configuration to the NIC when you submit > the work request? By your description it sounded liked the FPGA > pattern matches packets from the NIC side to apply the xfrm. > > Jason I didn't want to dig into this, because it's not submitted yet. But here's an example configuration flow for creating an IPSec SA: * User gives 'ip xfrm state add' command in shell, with 'offload' argument * iproute2 sends XFRM_MSG_NEWSA netlink message with XFRMA_OFFLOAD_DEV attribute * Xfrm stack generates a new xfrm_state object and calls the netdevice's xdo_dev_state_add callback * mlx5e driver formats HW-specific ADD_SA command packet and sends it over RoCE QP to FPGA * FPGA adds the SA to its offloaded SADB, and sends a response packet for 'success' That's pretty much it. Does this help to understand what goes on? I don't mind explaining further, but I think you will just see it in the patchset when we submit. Ilan.
From: Ilan Tayari <ilant@mellanox.com> Date: Tue, 6 Jun 2017 06:52:15 +0000 > The fact that we configure the FPGA using special inband packets isn't > changing anything. IMO, it might have been any other bus on the card, > standard or proprietary, and the arguments for how to design the driver > would stay the same. +1 > So neither the host stack nor the network are aware of them. > They exist momentarily only on the internal traces on the board and not > anywhere else. +1
On Tue, Jun 06, 2017 at 06:52:15AM +0000, Ilan Tayari wrote: > So neither the host stack nor the network are aware of them. > They exist momentarily only on the internal traces on the board and not > anywhere else. Is that really true? If you are creating rocee QPs' then the RDMA stack sees this stuff and now we have buried a RDMA ULP inside an ethernet driver which seems really wonky.. > I don't mind explaining further, but I think you will just see it in the > patchset when we submit. You described exactly what I thought.. I just disagree with you that an ethernet connected and controlled IP accelerator is 'part of the NIC', even if it happens to be colocated on the same circuit board. Jason
On Tue, Jun 06, 2017 at 10:17:09AM -0600, Jason Gunthorpe wrote: > On Tue, Jun 06, 2017 at 06:52:15AM +0000, Ilan Tayari wrote: > > > So neither the host stack nor the network are aware of them. > > They exist momentarily only on the internal traces on the board and not > > anywhere else. > > Is that really true? If you are creating rocee QPs' then the RDMA > stack sees this stuff and now we have buried a RDMA ULP inside an > ethernet driver which seems really wonky.. > > > I don't mind explaining further, but I think you will just see it in the > > patchset when we submit. > > You described exactly what I thought.. I just disagree with you that > an ethernet connected and controlled IP accelerator is 'part of the > NIC', even if it happens to be colocated on the same circuit board. +1 what Ilan described is a kernel bypass done by hw. This is non starter in production. Same as eswitch this fpga is not represented as a kernel object, there is no way to debug it. NIC crafts roce packets back and forth?! so it's like rdma, but without using kernel rdma stack? When hw ipsec or tls will mysteriously drop or mangle the packets how this can be debugged? Does fpga have attached ddr to store/forward the packets? How memory issues will be reported? No MCE errors ever? Buffer overflow? How many receive queues inside fpga? How health check of fgpa itself will be done? Through roce packets? I would buy the lack of kernel visibility if this fpga+nic combo was a prototype, but it's being presented as a production device with subsequent changes to core networking stack and that's where I have a problem with its sw architecture.
From: Alexei Starovoitov <alexei.starovoitov@gmail.com> Date: Tue, 6 Jun 2017 10:42:35 -0700 > so it's like rdma, but without using kernel rdma stack? No sockets here, just transformation rules. It's like offloading a complex TC rule to hardware version of that transformation. Yes, there is state, but I argue that it is no different than TC offloading rules. What if TC had "hash" and "crypt" operations and we attached them to appropriate u32 matches? You wouldn't be able to tell the difference. I think you are way over-obsessed with this FPGA offload thing, quite frankly.
On Tue, Jun 06, 2017 at 01:47:26PM -0400, David Miller wrote: > From: Alexei Starovoitov <alexei.starovoitov@gmail.com> > Date: Tue, 6 Jun 2017 10:42:35 -0700 > > > so it's like rdma, but without using kernel rdma stack? > > No sockets here, just transformation rules. It's like offloading > a complex TC rule to hardware version of that transformation. > > Yes, there is state, but I argue that it is no different than TC > offloading rules. What if TC had "hash" and "crypt" operations > and we attached them to appropriate u32 matches? You wouldn't > be able to tell the difference. there is huge difference in underlying hw. fpga is a separate device with its own phy and mac layers, its own queues, packet parsing and rdma logic. Where as tc offload is happening within the same hw queues/memory/stats management logic. My understanding that when I do 'ethtool -L' to change number of queues or 'ethtool -G' it changes the memory layout that tc offload is operating on as well. When I do 'ethtool -S' it shows me the stats for the device that tc offload rules are integral part of. Whereas fpga is a different physical device with its own buffers and such. We can add 'ethtool -G_fpga, -L_fpga', etc but this type of discussion needs to happen _before_ the whole thing is merged. It will never happen after the fact. Just look at mlx responses, they still don't acknowledge the issue and instead pushing for ipsec, tls (in other words: new features) instead of addressing production issues that are obviously not glamorous to work on and fix. > I think you are way over-obsessed with this FPGA offload thing, > quite frankly. if we didn't have issues with eswitch that drops packets and we don't even know how many, I wouldn't be complaining. There is a discussion going on to add few counters for eswitch visibility, but it's taking forever and it's not at the point of exposing eswitch as a kernel object. Why? because it's hard to refactor it now into something like devlink or whatever new abstraction that would be needed.
From: Alexei Starovoitov <alexei.starovoitov@gmail.com> Date: Tue, 6 Jun 2017 11:34:59 -0700 > fpga is a separate device with its own phy and mac layers, its > own queues, packet parsing and rdma logic. Because that's how they bolted it onto the ASIC in current implementation, it might not always be that way and be fully integrated in the future. And I stress the word "implementation" as in "implementation detail" the visible behavior is going to be the same, the difference is how the thing is hooked up and maybe how you program it.
On Tue, Jun 06, 2017 at 02:38:24PM -0400, David Miller wrote: > From: Alexei Starovoitov <alexei.starovoitov@gmail.com> > Date: Tue, 6 Jun 2017 11:34:59 -0700 > > > fpga is a separate device with its own phy and mac layers, its > > own queues, packet parsing and rdma logic. > > Because that's how they bolted it onto the ASIC in current > implementation, it might not always be that way and be fully > integrated in the future. > > And I stress the word "implementation" as in "implementation detail" > the visible behavior is going to be the same, the difference is how > the thing is hooked up and maybe how you program it. whether fpga is a separate chip or part of the same asic makes no difference. They are still different devices from sw point of view. If in the future mlx will make it into the nic in a way that encryption shares all memory management logic and there is no fpga at all then it indeed will be similar to tc offload. Right now it's not and needs different sw architecture.
From: Alexei Starovoitov <alexei.starovoitov@gmail.com> Date: Tue, 6 Jun 2017 11:55:33 -0700 > If in the future mlx will make it into the nic in a way that > encryption shares all memory management logic and there is no fpga > at all then it indeed will be similar to tc offload. Right now it's > not and needs different sw architecture. If the visible effect is identical, I fundamentally disagree with you. I don't care if there is a frog sitting on the PHY that transforms the packets, it's all the same if the visible behavior is identical.
On Tue, Jun 06, 2017 at 03:01:51PM -0400, David Miller wrote: > From: Alexei Starovoitov <alexei.starovoitov@gmail.com> > Date: Tue, 6 Jun 2017 11:55:33 -0700 > > > If in the future mlx will make it into the nic in a way that > > encryption shares all memory management logic and there is no fpga > > at all then it indeed will be similar to tc offload. Right now it's > > not and needs different sw architecture. > > If the visible effect is identical, I fundamentally disagree with you. > > I don't care if there is a frog sitting on the PHY that transforms > the packets, it's all the same if the visible behavior is identical. that frog is a good example why we disagree. I need to check the pulse of that frog and last time it ate. In production I cannot have magical creatures do stuff for me. I need to monitor all components, debug and mitigate the issues. If encryption is done by the nic, I get all the monitoring and debugging as part of the standard tools. When it's a frog hidden by the nic, I cannot do much when the fire erupts, hence frog and production environment don't mix. To move things forward... how about marking the whole thing CONFIG_EXPERIMENTAL instead of revert? Right now it's effectively non-production==experimental code and I want to make it clear.
On Tue, Jun 06, 2017 at 03:44:53PM -0700, Alexei Starovoitov wrote: > On Tue, Jun 06, 2017 at 03:01:51PM -0400, David Miller wrote: > > From: Alexei Starovoitov <alexei.starovoitov@gmail.com> > > Date: Tue, 6 Jun 2017 11:55:33 -0700 > > > > > If in the future mlx will make it into the nic in a way that > > > encryption shares all memory management logic and there is no fpga > > > at all then it indeed will be similar to tc offload. Right now it's > > > not and needs different sw architecture. > > > > If the visible effect is identical, I fundamentally disagree with you. > > > > I don't care if there is a frog sitting on the PHY that transforms > > the packets, it's all the same if the visible behavior is identical. > > that frog is a good example why we disagree. > I need to check the pulse of that frog and last time it ate. It is probably over-engineered for a single frog, but maybe you could use a modified RFC 2795? Andrew
On Wed, Jun 7, 2017 at 1:44 AM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > On Tue, Jun 06, 2017 at 03:01:51PM -0400, David Miller wrote: >> From: Alexei Starovoitov <alexei.starovoitov@gmail.com> >> Date: Tue, 6 Jun 2017 11:55:33 -0700 >> >> > If in the future mlx will make it into the nic in a way that >> > encryption shares all memory management logic and there is no fpga >> > at all then it indeed will be similar to tc offload. Right now it's >> > not and needs different sw architecture. >> >> If the visible effect is identical, I fundamentally disagree with you. >> >> I don't care if there is a frog sitting on the PHY that transforms >> the packets, it's all the same if the visible behavior is identical. > > that frog is a good example why we disagree. > I need to check the pulse of that frog and last time it ate. > In production I cannot have magical creatures do stuff for me. > I need to monitor all components, debug and mitigate the issues. Every HW vendor has his own magical creatures. you don't want just to have a kernel object representative for every HW unit! that's just not scalable. > If encryption is done by the nic, I get all the monitoring and > debugging as part of the standard tools. When it's a frog > hidden by the nic, I cannot do much when the fire erupts, > hence frog and production environment don't mix. > To move things forward... Let's assume there was no FPGA but the ASIC provided the encryption feature, and still a fire erupts, what would you have done differently ? Only the vendor will know how to debug regardless of what creature went nuts! Frog or a Cat the kernel shouldn't care much "it is all implementation details (HW implementation)", we should use the entry point to this device (PCI device/netdevice) as an abstraction point, and standardizing visibility/debuggability should be per 'kernel<->stack<->driver<->HW' features/flows/transactions. Having a full driver<->HW visibility/standardization eliminates the need of vendor specific drivers and each vendor should implement only standard HWs that can work with generic drivers. Re debuggability in ConnectX architecture, there is a well defined health reporting and monitoring mechanism between FW and driver. and it is up to the FW to check the pulse of every frog it is babysitting and report back to the driver. believe me you don't want the driver to know about every single wire in the chip. > how about marking the whole thing CONFIG_EXPERIMENTAL instead of revert? > Right now it's effectively non-production==experimental code and > I want to make it clear. >
On Tue, Jun 6, 2017 at 7:17 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Tue, Jun 06, 2017 at 06:52:15AM +0000, Ilan Tayari wrote: > >> So neither the host stack nor the network are aware of them. >> They exist momentarily only on the internal traces on the board and not >> anywhere else. > > Is that really true? If you are creating rocee QPs' then the RDMA > stack sees this stuff and now we have buried a RDMA ULP inside an > ethernet driver which seems really wonky.. It is not an ethernet driver, mlx5_core provides both RDMA and ethernet interfaces to both mlx5_ib and the mlx5e netdevice. so it is perfectly capable of creating QPs on its own, after all it is the one creating QPs for the RDMA stack :). rdma_create_qp->mlx5_ib_create_qp->mlx5_core_create_qp. > >> I don't mind explaining further, but I think you will just see it in the >> patchset when we submit. > > You described exactly what I thought.. I just disagree with you that > an ethernet connected and controlled IP accelerator is 'part of the > NIC', even if it happens to be colocated on the same circuit board. > > Jason
On Wed, Jun 07, 2017 at 07:16:42AM +0300, Saeed Mahameed wrote: > On Tue, Jun 6, 2017 at 7:17 PM, Jason Gunthorpe > <jgunthorpe@obsidianresearch.com> wrote: > > On Tue, Jun 06, 2017 at 06:52:15AM +0000, Ilan Tayari wrote: > > > >> So neither the host stack nor the network are aware of them. > >> They exist momentarily only on the internal traces on the board and not > >> anywhere else. > > > > Is that really true? If you are creating rocee QPs' then the RDMA > > stack sees this stuff and now we have buried a RDMA ULP inside an > > ethernet driver which seems really wonky.. > > It is not an ethernet driver, mlx5_core provides both RDMA and > ethernet interfaces to both mlx5_ib and the mlx5e netdevice. > > so it is perfectly capable of creating QPs on its own, after all it is > the one creating QPs for the RDMA stack :). > > rdma_create_qp->mlx5_ib_create_qp->mlx5_core_create_qp. Wait, so you built a RDMA ULP inside your driver without using the RDMA API? This keep getting more ugly :( What about security? What if user space sends some raw packets to the FPGA - can it reprogram the ISPEC settings or worse? Jason
On Wed, Jun 7, 2017 at 6:48 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Wed, Jun 07, 2017 at 07:16:42AM +0300, Saeed Mahameed wrote: >> On Tue, Jun 6, 2017 at 7:17 PM, Jason Gunthorpe >> <jgunthorpe@obsidianresearch.com> wrote: >> > On Tue, Jun 06, 2017 at 06:52:15AM +0000, Ilan Tayari wrote: >> > >> >> So neither the host stack nor the network are aware of them. >> >> They exist momentarily only on the internal traces on the board and not >> >> anywhere else. >> > >> > Is that really true? If you are creating rocee QPs' then the RDMA >> > stack sees this stuff and now we have buried a RDMA ULP inside an >> > ethernet driver which seems really wonky.. >> >> It is not an ethernet driver, mlx5_core provides both RDMA and >> ethernet interfaces to both mlx5_ib and the mlx5e netdevice. >> >> so it is perfectly capable of creating QPs on its own, after all it is >> the one creating QPs for the RDMA stack :). >> >> rdma_create_qp->mlx5_ib_create_qp->mlx5_core_create_qp. > > Wait, so you built a RDMA ULP inside your driver without using the > RDMA API? > No !! I am just showing you that the ib_core eventually will end up calling mlx5_core to create a QP. so mlx5_core can create the QP it self since it is the one eventually creating QPs. we just call mlx5_core_create_qp directly. > This keep getting more ugly :( > > What about security? What if user space sends some raw packets to the > FPGA - can it reprogram the ISPEC settings or worse? > No such thing. This QP is only for internal driver/HW communications, as it is faster from the existing command interface. it is not meant to be exposed for any raw user space usages at all, without proper standard API adapter of course. > Jason
On Wed, Jun 07, 2017 at 10:13:43PM +0300, Saeed Mahameed wrote: > On Wed, Jun 7, 2017 at 6:48 PM, Jason Gunthorpe > <jgunthorpe@obsidianresearch.com> wrote: > > On Wed, Jun 07, 2017 at 07:16:42AM +0300, Saeed Mahameed wrote: > >> On Tue, Jun 6, 2017 at 7:17 PM, Jason Gunthorpe > >> <jgunthorpe@obsidianresearch.com> wrote: > >> > On Tue, Jun 06, 2017 at 06:52:15AM +0000, Ilan Tayari wrote: > >> > > >> >> So neither the host stack nor the network are aware of them. > >> >> They exist momentarily only on the internal traces on the board and not > >> >> anywhere else. > >> > > >> > Is that really true? If you are creating rocee QPs' then the RDMA > >> > stack sees this stuff and now we have buried a RDMA ULP inside an > >> > ethernet driver which seems really wonky.. > >> > >> It is not an ethernet driver, mlx5_core provides both RDMA and > >> ethernet interfaces to both mlx5_ib and the mlx5e netdevice. > >> > >> so it is perfectly capable of creating QPs on its own, after all it is > >> the one creating QPs for the RDMA stack :). > >> > >> rdma_create_qp->mlx5_ib_create_qp->mlx5_core_create_qp. > > > > Wait, so you built a RDMA ULP inside your driver without using the > > RDMA API? > > > > No !! > I am just showing you that the ib_core eventually will end up calling > mlx5_core to create a QP. > so mlx5_core can create the QP it self since it is the one eventually > creating QPs. > we just call mlx5_core_create_qp directly. Which is building a RDMA ULP inside a driver without using the core code :( > > This keep getting more ugly :( > > > > What about security? What if user space sends some raw packets to the > > FPGA - can it reprogram the ISPEC settings or worse? > > > > No such thing. This QP is only for internal driver/HW communications, > as it is faster from the existing command interface. > it is not meant to be exposed for any raw user space usages at all, > without proper standard API adapter of course. I'm not asking about the QP, I'm asking what happens after the NIC part. You use ROCE packets to control the FPGA. What prevents userspace from forcibly constructing roce packets and sending them to the FPGA. How does the FPGA know for certain the packet came from the kernel QP and not someplace else. This is especially true for mlx nics as there are many raw packet bypass mechanisms available to userspace. Jason
On Wed, 2017-06-07 at 13:21 -0600, Jason Gunthorpe wrote: > On Wed, Jun 07, 2017 at 10:13:43PM +0300, Saeed Mahameed wrote: > > > > No !! > > I am just showing you that the ib_core eventually will end up > > calling > > mlx5_core to create a QP. > > so mlx5_core can create the QP it self since it is the one > > eventually > > creating QPs. > > we just call mlx5_core_create_qp directly. > > Which is building a RDMA ULP inside a driver without using the core > code :( Aren't the transmit/receive queues of the Ethernet netdevice on mlx4/mlx5 hardware QPs too? Those bypass the RDMA subsystem entirely. Just because something uses a QP on hardware that does *everything* via QPs doesn't necessarily mean it must go through the RDMA subsystem. Now, the fact that the content of the packets is basically a RoCE packet does make things a bit fuzzier, but if their packets are specially crafted RoCE packets that aren't really intended to be fully RoCE spec compliant (maybe they don't support all the options as normal RoCE QPs), then I can see hiding them from the larger RoCE portion of the RDMA stack. > > > > > > > > This keep getting more ugly :( > > > > > > What about security? What if user space sends some raw packets to > > > the > > > FPGA - can it reprogram the ISPEC settings or worse? > > > > > > > No such thing. This QP is only for internal driver/HW > > communications, > > as it is faster from the existing command interface. > > it is not meant to be exposed for any raw user space usages at all, > > without proper standard API adapter of course. > > I'm not asking about the QP, I'm asking what happens after the NIC > part. You use ROCE packets to control the FPGA. What prevents > userspace from forcibly constructing roce packets and sending them to > the FPGA. How does the FPGA know for certain the packet came from the > kernel QP and not someplace else. This is a valid concern. > This is especially true for mlx nics as there are many raw packet > bypass mechanisms available to userspace. Right. The question becomes: Does the firmware filter outgoing raw ETH QPs such that a nefarious user could not send a crafted RoCE packet that the bump on the wire would intercept and accept?
> On Jun 10, 2017, at 1:24 AM, Doug Ledford <dledford@redhat.com> wrote: > >> On Wed, 2017-06-07 at 13:21 -0600, Jason Gunthorpe wrote: >>> On Wed, Jun 07, 2017 at 10:13:43PM +0300, Saeed Mahameed wrote: >>> >>> No !! >>> I am just showing you that the ib_core eventually will end up >>> calling >>> mlx5_core to create a QP. >>> so mlx5_core can create the QP it self since it is the one >>> eventually >>> creating QPs. >>> we just call mlx5_core_create_qp directly. >> >> Which is building a RDMA ULP inside a driver without using the core >> code :( > > Aren't the transmit/receive queues of the Ethernet netdevice on > mlx4/mlx5 hardware QPs too? Those bypass the RDMA subsystem entirely. > Just because something uses a QP on hardware that does *everything* > via QPs doesn't necessarily mean it must go through the RDMA subsystem. > > Now, the fact that the content of the packets is basically a RoCE > packet does make things a bit fuzzier, but if their packets are > specially crafted RoCE packets that aren't really intended to be fully > RoCE spec compliant (maybe they don't support all the options as normal > RoCE QPs), then I can see hiding them from the larger RoCE portion of > the RDMA stack. > >>> >>>> >>>> This keep getting more ugly :( >>>> >>>> What about security? What if user space sends some raw packets to >>>> the >>>> FPGA - can it reprogram the ISPEC settings or worse? >>>> >>> >>> No such thing. This QP is only for internal driver/HW >>> communications, >>> as it is faster from the existing command interface. >>> it is not meant to be exposed for any raw user space usages at all, >>> without proper standard API adapter of course. >> >> I'm not asking about the QP, I'm asking what happens after the NIC >> part. You use ROCE packets to control the FPGA. What prevents >> userspace from forcibly constructing roce packets and sending them to >> the FPGA. How does the FPGA know for certain the packet came from the >> kernel QP and not someplace else. > > This is a valid concern. > >> This is especially true for mlx nics as there are many raw packet >> bypass mechanisms available to userspace. > All of the Raw packet bypass mechanisms are restricted to CAP_NET_RAW, and thus malicious users can't simply open a RAW Packet QP and send it to the FPGA.. > Right. The question becomes: Does the firmware filter outgoing raw ETH > QPs such that a nefarious user could not send a crafted RoCE packet > that the bump on the wire would intercept and accept? > > -- > Doug Ledford <dledford@redhat.com> > GPG KeyID: B826A3330E572FDD > > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: Jason Gunthorpe [mailto:jgunthorpe@obsidianresearch.com] > Subject: Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova > > > > This keep getting more ugly :( > > > > > > What about security? What if user space sends some raw packets to the > > > FPGA - can it reprogram the ISPEC settings or worse? > > > > > > > No such thing. This QP is only for internal driver/HW communications, > > as it is faster from the existing command interface. > > it is not meant to be exposed for any raw user space usages at all, > > without proper standard API adapter of course. > > I'm not asking about the QP, I'm asking what happens after the NIC > part. You use ROCE packets to control the FPGA. What prevents > userspace from forcibly constructing roce packets and sending them to > the FPGA. How does the FPGA know for certain the packet came from the > kernel QP and not someplace else. > > This is especially true for mlx nics as there are many raw packet > bypass mechanisms available to userspace. Hi Jason, The device uses internal signaling that ensures that no entity other than the mlx5 driver can talk over the FPGA channel. This is also the reason why this is not a "ULP in a driver", but rather an internal bus that happens to use some of our existing HW features. As explained earlier, this "bus" is an internal device implementation issue, and has nothing to do with the network or RDMA stack. Ilan.
On Sun, Jun 11, 2017 at 05:59:04AM +0000, Ilan Tayari wrote: > > This is especially true for mlx nics as there are many raw packet > > bypass mechanisms available to userspace. > > The device uses internal signaling that ensures that no entity other > than the mlx5 driver can talk over the FPGA channel. This is also > the reason why this is not a "ULP in a driver", but rather an > internal bus that happens to use some of our existing HW features. You are taking both positions at once - that this is *just* a 'bump on the wire' and everything is done via ethernet packets, and also that there is a special internal bus that nothing needs to know about. It is hard to be both ways, if it is ethernet packets there there are ways to create those packets outside the driver's control and security is a big question. Jason
On Sat, Jun 10, 2017 at 02:11:13PM +0000, Majd Dibbiny wrote: > >> This is especially true for mlx nics as there are many raw packet > >> bypass mechanisms available to userspace. > All of the Raw packet bypass mechanisms are restricted to > CAP_NET_RAW, and thus malicious users can't simply open a RAW Packet > QP and send it to the FPGA.. It is big expansion of CAP_NET_RAW to also basically also include reconfiguring ipsec xfrm. Plus, if someone configures ethernet bridging (eg in a VM situation) then could a hacked VM reconfigure this FPGA? Jason
On Mon, Jun 12, 2017 at 7:17 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Sat, Jun 10, 2017 at 02:11:13PM +0000, Majd Dibbiny wrote: > >> >> This is especially true for mlx nics as there are many raw packet >> >> bypass mechanisms available to userspace. > >> All of the Raw packet bypass mechanisms are restricted to >> CAP_NET_RAW, and thus malicious users can't simply open a RAW Packet >> QP and send it to the FPGA.. > > It is big expansion of CAP_NET_RAW to also basically also include > reconfiguring ipsec xfrm. > > Plus, if someone configures ethernet bridging (eg in a VM situation) > then could a hacked VM reconfigure this FPGA? > > Jason Hi Jason As we mentioned earlier, the device ensures that only the FPGA component in the driver can configure the FPGA regardless of any type of standard traffic. Thanks, Saeed.
diff --git a/MAINTAINERS b/MAINTAINERS index f7d568b8f133..374ebf1b5d5d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8304,6 +8304,16 @@ W: http://www.mellanox.com Q: http://patchwork.ozlabs.org/project/netdev/list/ F: drivers/net/ethernet/mellanox/mlx5/core/en_* +MELLANOX ETHERNET INNOVA DRIVER +M: Ilan Tayari <ilant@mellanox.com> +R: Boris Pismenny <borisp@mellanox.com> +L: netdev@vger.kernel.org +S: Supported +W: http://www.mellanox.com +Q: http://patchwork.ozlabs.org/project/netdev/list/ +F: drivers/net/ethernet/mellanox/mlx5/core/fpga/* +F: include/linux/mlx5/mlx5_ifc_fpga.h + MELLANOX ETHERNET SWITCH DRIVERS M: Jiri Pirko <jiri@mellanox.com> M: Ido Schimmel <idosch@mellanox.com> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig index fc52d742b7f7..28cf88483ca4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig @@ -11,6 +11,16 @@ config MLX5_CORE Core driver for low level functionality of the ConnectX-4 and Connect-IB cards by Mellanox Technologies. +config MLX5_FPGA + bool "Mellanox Technologies Innova support" + depends on MLX5_CORE + ---help--- + Build support for the Innova family of network cards by Mellanox + Technologies. Innova network cards are comprised of a ConnectX chip + and an FPGA chip on one board. If you select this option, the + mlx5_core driver will include the Innova FPGA core and allow building + sandbox-specific client drivers. + config MLX5_CORE_EN bool "Mellanox Technologies ConnectX-4 Ethernet support" depends on NETDEVICES && ETHERNET && PCI && MLX5_CORE diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile index 9e644615f07a..12556c03eec4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile @@ -1,10 +1,13 @@ obj-$(CONFIG_MLX5_CORE) += mlx5_core.o +subdir-ccflags-y += -I$(src) mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \ health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \ mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \ fs_counters.o rl.o lag.o dev.o +mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o + mlx5_core-$(CONFIG_MLX5_CORE_EN) += wq.o eswitch.o eswitch_offloads.o \ en_main.o en_common.o en_fs.o en_ethtool.o en_tx.o \ en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c index df0034d8f48c..01d2cd7e4746 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c @@ -35,6 +35,7 @@ #include <linux/mlx5/driver.h> #include <linux/mlx5/cmd.h> #include "mlx5_core.h" +#include "fpga/core.h" #ifdef CONFIG_MLX5_CORE_EN #include "eswitch.h" #endif @@ -156,6 +157,8 @@ static const char *eqe_type_str(u8 type) return "MLX5_EVENT_TYPE_PAGE_FAULT"; case MLX5_EVENT_TYPE_PPS_EVENT: return "MLX5_EVENT_TYPE_PPS_EVENT"; + case MLX5_EVENT_TYPE_FPGA_ERROR: + return "MLX5_EVENT_TYPE_FPGA_ERROR"; default: return "Unrecognized event"; } @@ -476,6 +479,11 @@ static irqreturn_t mlx5_eq_int(int irq, void *eq_ptr) if (dev->event) dev->event(dev, MLX5_DEV_EVENT_PPS, (unsigned long)eqe); break; + + case MLX5_EVENT_TYPE_FPGA_ERROR: + mlx5_fpga_event(dev, eqe->type, &eqe->data.raw); + break; + default: mlx5_core_warn(dev, "Unhandled event 0x%x on EQ 0x%x\n", eqe->type, eq->eqn); @@ -693,6 +701,9 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev) if (MLX5_CAP_GEN(dev, pps)) async_event_mask |= (1ull << MLX5_EVENT_TYPE_PPS_EVENT); + if (MLX5_CAP_GEN(dev, fpga)) + async_event_mask |= (1ull << MLX5_EVENT_TYPE_FPGA_ERROR); + err = mlx5_create_map_eq(dev, &table->cmd_eq, MLX5_EQ_VEC_CMD, MLX5_NUM_CMD_EQE, 1ull << MLX5_EVENT_TYPE_CMD, "mlx5_cmd_eq", MLX5_EQ_TYPE_ASYNC); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.c new file mode 100644 index 000000000000..99cba644b4fc --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.c @@ -0,0 +1,64 @@ +/* + * Copyright (c) 2017, Mellanox Technologies. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include <linux/etherdevice.h> +#include <linux/mlx5/cmd.h> +#include <linux/mlx5/driver.h> + +#include "mlx5_core.h" +#include "fpga/cmd.h" + +int mlx5_fpga_caps(struct mlx5_core_dev *dev, u32 *caps) +{ + u32 in[MLX5_ST_SZ_DW(fpga_cap)] = {0}; + + return mlx5_core_access_reg(dev, in, sizeof(in), caps, + MLX5_ST_SZ_BYTES(fpga_cap), + MLX5_REG_FPGA_CAP, 0, 0); +} + +int mlx5_fpga_query(struct mlx5_core_dev *dev, struct mlx5_fpga_query *query) +{ + u32 in[MLX5_ST_SZ_DW(fpga_ctrl)] = {0}; + u32 out[MLX5_ST_SZ_DW(fpga_ctrl)]; + int err; + + err = mlx5_core_access_reg(dev, in, sizeof(in), out, sizeof(out), + MLX5_REG_FPGA_CTRL, 0, false); + if (err) + return err; + + query->status = MLX5_GET(fpga_ctrl, out, status); + query->admin_image = MLX5_GET(fpga_ctrl, out, flash_select_admin); + query->oper_image = MLX5_GET(fpga_ctrl, out, flash_select_oper); + return 0; +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.h b/drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.h new file mode 100644 index 000000000000..a74396a61bc3 --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/cmd.h @@ -0,0 +1,59 @@ +/* + * Copyright (c) 2017, Mellanox Technologies, Ltd. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef __MLX5_FPGA_H__ +#define __MLX5_FPGA_H__ + +#include <linux/mlx5/driver.h> + +enum mlx5_fpga_image { + MLX5_FPGA_IMAGE_USER = 0, + MLX5_FPGA_IMAGE_FACTORY, +}; + +enum mlx5_fpga_status { + MLX5_FPGA_STATUS_SUCCESS = 0, + MLX5_FPGA_STATUS_FAILURE = 1, + MLX5_FPGA_STATUS_IN_PROGRESS = 2, + MLX5_FPGA_STATUS_NONE = 0xFFFF, +}; + +struct mlx5_fpga_query { + enum mlx5_fpga_image admin_image; + enum mlx5_fpga_image oper_image; + enum mlx5_fpga_status status; +}; + +int mlx5_fpga_caps(struct mlx5_core_dev *dev, u32 *caps); +int mlx5_fpga_query(struct mlx5_core_dev *dev, struct mlx5_fpga_query *query); + +#endif /* __MLX5_FPGA_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c new file mode 100644 index 000000000000..d88b332e9669 --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c @@ -0,0 +1,202 @@ +/* + * Copyright (c) 2017, Mellanox Technologies. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include <linux/module.h> +#include <linux/etherdevice.h> +#include <linux/mlx5/driver.h> + +#include "mlx5_core.h" +#include "fpga/core.h" + +static const char *const mlx5_fpga_error_strings[] = { + "Null Syndrome", + "Corrupted DDR", + "Flash Timeout", + "Internal Link Error", + "Watchdog HW Failure", + "I2C Failure", + "Image Changed", + "Temperature Critical", +}; + +static struct mlx5_fpga_device *mlx5_fpga_device_alloc(void) +{ + struct mlx5_fpga_device *fdev = NULL; + + fdev = kzalloc(sizeof(*fdev), GFP_KERNEL); + if (!fdev) + return NULL; + + spin_lock_init(&fdev->state_lock); + fdev->state = MLX5_FPGA_STATUS_NONE; + return fdev; +} + +static const char *mlx5_fpga_image_name(enum mlx5_fpga_image image) +{ + switch (image) { + case MLX5_FPGA_IMAGE_USER: + return "user"; + case MLX5_FPGA_IMAGE_FACTORY: + return "factory"; + default: + return "unknown"; + } +} + +static int mlx5_fpga_device_load_check(struct mlx5_fpga_device *fdev) +{ + struct mlx5_fpga_query query; + int err; + + err = mlx5_fpga_query(fdev->mdev, &query); + if (err) { + mlx5_fpga_err(fdev, "Failed to query status: %d\n", err); + return err; + } + + fdev->last_admin_image = query.admin_image; + fdev->last_oper_image = query.oper_image; + + mlx5_fpga_dbg(fdev, "Status %u; Admin image %u; Oper image %u\n", + query.status, query.admin_image, query.oper_image); + + if (query.status != MLX5_FPGA_STATUS_SUCCESS) { + mlx5_fpga_err(fdev, "%s image failed to load; status %u\n", + mlx5_fpga_image_name(fdev->last_oper_image), + query.status); + return -EIO; + } + + return 0; +} + +int mlx5_fpga_device_start(struct mlx5_core_dev *mdev) +{ + struct mlx5_fpga_device *fdev = mdev->fpga; + unsigned long flags; + int err; + + if (!fdev) + return 0; + + err = mlx5_fpga_device_load_check(fdev); + if (err) + goto out; + + err = mlx5_fpga_caps(fdev->mdev, + fdev->mdev->caps.hca_cur[MLX5_CAP_FPGA]); + if (err) + goto out; + + mlx5_fpga_info(fdev, "device %u; %s image, version %u\n", + MLX5_CAP_FPGA(fdev->mdev, fpga_device), + mlx5_fpga_image_name(fdev->last_oper_image), + MLX5_CAP_FPGA(fdev->mdev, image_version)); + +out: + spin_lock_irqsave(&fdev->state_lock, flags); + fdev->state = err ? MLX5_FPGA_STATUS_FAILURE : MLX5_FPGA_STATUS_SUCCESS; + spin_unlock_irqrestore(&fdev->state_lock, flags); + return err; +} + +int mlx5_fpga_device_init(struct mlx5_core_dev *mdev) +{ + struct mlx5_fpga_device *fdev = NULL; + + if (!MLX5_CAP_GEN(mdev, fpga)) { + mlx5_core_dbg(mdev, "FPGA capability not present\n"); + return 0; + } + + mlx5_core_dbg(mdev, "Initializing FPGA\n"); + + fdev = mlx5_fpga_device_alloc(); + if (!fdev) + return -ENOMEM; + + fdev->mdev = mdev; + mdev->fpga = fdev; + + return 0; +} + +void mlx5_fpga_device_cleanup(struct mlx5_core_dev *mdev) +{ + kfree(mdev->fpga); + mdev->fpga = NULL; +} + +static const char *mlx5_fpga_syndrome_to_string(u8 syndrome) +{ + if (syndrome < ARRAY_SIZE(mlx5_fpga_error_strings)) + return mlx5_fpga_error_strings[syndrome]; + return "Unknown"; +} + +void mlx5_fpga_event(struct mlx5_core_dev *mdev, u8 event, void *data) +{ + struct mlx5_fpga_device *fdev = mdev->fpga; + const char *event_name; + bool teardown = false; + unsigned long flags; + u8 syndrome; + + if (event != MLX5_EVENT_TYPE_FPGA_ERROR) { + mlx5_fpga_warn_ratelimited(fdev, "Unexpected event %u\n", + event); + return; + } + + syndrome = MLX5_GET(fpga_error_event, data, syndrome); + event_name = mlx5_fpga_syndrome_to_string(syndrome); + + spin_lock_irqsave(&fdev->state_lock, flags); + switch (fdev->state) { + case MLX5_FPGA_STATUS_SUCCESS: + mlx5_fpga_warn(fdev, "Error %u: %s\n", syndrome, event_name); + teardown = true; + break; + default: + mlx5_fpga_warn_ratelimited(fdev, "Unexpected error event %u: %s\n", + syndrome, event_name); + } + spin_unlock_irqrestore(&fdev->state_lock, flags); + /* We tear-down the card's interfaces and functionality because + * the FPGA bump-on-the-wire is misbehaving and we lose ability + * to communicate with the network. User may still be able to + * recover by re-programming or debugging the FPGA + */ + if (teardown) + mlx5_trigger_health_work(fdev->mdev); +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.h b/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.h new file mode 100644 index 000000000000..c55044d66778 --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/core.h @@ -0,0 +1,99 @@ +/* + * Copyright (c) 2017, Mellanox Technologies, Ltd. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef __MLX5_FPGA_CORE_H__ +#define __MLX5_FPGA_CORE_H__ + +#ifdef CONFIG_MLX5_FPGA + +#include "fpga/cmd.h" + +/* Represents an Innova device */ +struct mlx5_fpga_device { + struct mlx5_core_dev *mdev; + spinlock_t state_lock; /* Protects state transitions */ + enum mlx5_fpga_status state; + enum mlx5_fpga_image last_admin_image; + enum mlx5_fpga_image last_oper_image; +}; + +#define mlx5_fpga_dbg(__adev, format, ...) \ + dev_dbg(&(__adev)->mdev->pdev->dev, "FPGA: %s:%d:(pid %d): " format, \ + __func__, __LINE__, current->pid, ##__VA_ARGS__) + +#define mlx5_fpga_err(__adev, format, ...) \ + dev_err(&(__adev)->mdev->pdev->dev, "FPGA: %s:%d:(pid %d): " format, \ + __func__, __LINE__, current->pid, ##__VA_ARGS__) + +#define mlx5_fpga_warn(__adev, format, ...) \ + dev_warn(&(__adev)->mdev->pdev->dev, "FPGA: %s:%d:(pid %d): " format, \ + __func__, __LINE__, current->pid, ##__VA_ARGS__) + +#define mlx5_fpga_warn_ratelimited(__adev, format, ...) \ + dev_warn_ratelimited(&(__adev)->mdev->pdev->dev, "FPGA: %s:%d: " \ + format, __func__, __LINE__, ##__VA_ARGS__) + +#define mlx5_fpga_notice(__adev, format, ...) \ + dev_notice(&(__adev)->mdev->pdev->dev, "FPGA: " format, ##__VA_ARGS__) + +#define mlx5_fpga_info(__adev, format, ...) \ + dev_info(&(__adev)->mdev->pdev->dev, "FPGA: " format, ##__VA_ARGS__) + +int mlx5_fpga_device_init(struct mlx5_core_dev *mdev); +void mlx5_fpga_device_cleanup(struct mlx5_core_dev *mdev); +int mlx5_fpga_device_start(struct mlx5_core_dev *mdev); +void mlx5_fpga_event(struct mlx5_core_dev *mdev, u8 event, void *data); + +#else + +static inline int mlx5_fpga_device_init(struct mlx5_core_dev *mdev) +{ + return 0; +} + +static inline void mlx5_fpga_device_cleanup(struct mlx5_core_dev *mdev) +{ +} + +static inline int mlx5_fpga_device_start(struct mlx5_core_dev *mdev) +{ + return 0; +} + +static inline void mlx5_fpga_event(struct mlx5_core_dev *mdev, u8 event, + void *data) +{ +} + +#endif + +#endif /* __MLX5_FPGA_CORE_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index f933922d5cca..ad0202cef203 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -56,6 +56,7 @@ #ifdef CONFIG_MLX5_CORE_EN #include "eswitch.h" #endif +#include "fpga/core.h" MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>"); MODULE_DESCRIPTION("Mellanox Connect-IB, ConnectX-4 core driver"); @@ -1113,10 +1114,16 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv, goto err_disable_msix; } + err = mlx5_fpga_device_init(dev); + if (err) { + dev_err(&pdev->dev, "fpga device init failed %d\n", err); + goto err_put_uars; + } + err = mlx5_start_eqs(dev); if (err) { dev_err(&pdev->dev, "Failed to start pages and async EQs\n"); - goto err_put_uars; + goto err_fpga_init; } err = alloc_comp_eqs(dev); @@ -1147,6 +1154,12 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv, goto err_sriov; } + err = mlx5_fpga_device_start(dev); + if (err) { + dev_err(&pdev->dev, "fpga device start failed %d\n", err); + goto err_reg_dev; + } + if (mlx5_device_registered(dev)) { mlx5_attach_device(dev); } else { @@ -1182,6 +1195,9 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv, err_stop_eqs: mlx5_stop_eqs(dev); +err_fpga_init: + mlx5_fpga_device_cleanup(dev); + err_put_uars: mlx5_put_uars_page(dev, priv->uar); @@ -1246,6 +1262,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv, mlx5_irq_clear_affinity_hints(dev); free_comp_eqs(dev); mlx5_stop_eqs(dev); + mlx5_fpga_device_cleanup(dev); mlx5_put_uars_page(dev, priv->uar); mlx5_disable_msix(dev); if (cleanup) diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index dd9a263ed368..786a43843da9 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -300,6 +300,8 @@ enum mlx5_event { MLX5_EVENT_TYPE_PAGE_FAULT = 0xc, MLX5_EVENT_TYPE_NIC_VPORT_CHANGE = 0xd, + + MLX5_EVENT_TYPE_FPGA_ERROR = 0x20, }; enum { @@ -967,6 +969,7 @@ enum mlx5_cap_type { MLX5_CAP_RESERVED, MLX5_CAP_VECTOR_CALC, MLX5_CAP_QOS, + MLX5_CAP_FPGA, /* NUM OF CAP Types */ MLX5_CAP_NUM }; @@ -1088,6 +1091,9 @@ enum mlx5_mcam_feature_groups { #define MLX5_CAP_MCAM_FEATURE(mdev, fld) \ MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_feature_cap_mask.enhanced_features.fld) +#define MLX5_CAP_FPGA(mdev, cap) \ + MLX5_GET(fpga_cap, (mdev)->caps.hca_cur[MLX5_CAP_FPGA], cap) + enum { MLX5_CMD_STAT_OK = 0x0, MLX5_CMD_STAT_INT_ERR = 0x1, diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index a277bb36c21f..55bb712643cb 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -108,6 +108,8 @@ enum { MLX5_REG_QTCT = 0x400a, MLX5_REG_DCBX_PARAM = 0x4020, MLX5_REG_DCBX_APP = 0x4021, + MLX5_REG_FPGA_CAP = 0x4022, + MLX5_REG_FPGA_CTRL = 0x4023, MLX5_REG_PCAP = 0x5001, MLX5_REG_PMTU = 0x5003, MLX5_REG_PTYS = 0x5004, @@ -761,6 +763,9 @@ struct mlx5_core_dev { atomic_t num_qps; u32 issi; struct mlx5e_resources mlx5e_res; +#ifdef CONFIG_MLX5_FPGA + struct mlx5_fpga_device *fpga; +#endif #ifdef CONFIG_RFS_ACCEL struct cpu_rmap *rmap; #endif diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 32de0724b400..6fa1eb6766af 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -32,6 +32,8 @@ #ifndef MLX5_IFC_H #define MLX5_IFC_H +#include "mlx5_ifc_fpga.h" + enum { MLX5_EVENT_TYPE_CODING_COMPLETION_EVENTS = 0x0, MLX5_EVENT_TYPE_CODING_PATH_MIGRATED_SUCCEEDED = 0x1, @@ -56,7 +58,8 @@ enum { MLX5_EVENT_TYPE_CODING_STALL_VL_EVENT = 0x1b, MLX5_EVENT_TYPE_CODING_DROPPED_PACKET_LOGGED_EVENT = 0x1f, MLX5_EVENT_TYPE_CODING_COMMAND_INTERFACE_COMPLETION = 0xa, - MLX5_EVENT_TYPE_CODING_PAGE_REQUEST = 0xb + MLX5_EVENT_TYPE_CODING_PAGE_REQUEST = 0xb, + MLX5_EVENT_TYPE_CODING_FPGA_ERROR = 0x20, }; enum { @@ -854,7 +857,8 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 max_tc[0x4]; u8 reserved_at_1d0[0x1]; u8 dcbx[0x1]; - u8 reserved_at_1d2[0x4]; + u8 reserved_at_1d2[0x3]; + u8 fpga[0x1]; u8 rol_s[0x1]; u8 rol_g[0x1]; u8 reserved_at_1d8[0x1]; @@ -2186,6 +2190,7 @@ union mlx5_ifc_hca_cap_union_bits { struct mlx5_ifc_e_switch_cap_bits e_switch_cap; struct mlx5_ifc_vector_calc_cap_bits vector_calc_cap; struct mlx5_ifc_qos_cap_bits qos_cap; + struct mlx5_ifc_fpga_cap_bits fpga_cap; u8 reserved_at_0[0x8000]; }; @@ -8182,6 +8187,8 @@ union mlx5_ifc_ports_control_registers_document_bits { struct mlx5_ifc_sltp_reg_bits sltp_reg; struct mlx5_ifc_mtpps_reg_bits mtpps_reg; struct mlx5_ifc_mtppse_reg_bits mtppse_reg; + struct mlx5_ifc_fpga_ctrl_bits fpga_ctrl_bits; + struct mlx5_ifc_fpga_cap_bits fpga_cap_bits; u8 reserved_at_0[0x60e0]; }; diff --git a/include/linux/mlx5/mlx5_ifc_fpga.h b/include/linux/mlx5/mlx5_ifc_fpga.h new file mode 100644 index 000000000000..0032d10ac6cf --- /dev/null +++ b/include/linux/mlx5/mlx5_ifc_fpga.h @@ -0,0 +1,144 @@ +/* + * Copyright (c) 2017, Mellanox Technologies, Ltd. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#ifndef MLX5_IFC_FPGA_H +#define MLX5_IFC_FPGA_H + +struct mlx5_ifc_fpga_shell_caps_bits { + u8 max_num_qps[0x10]; + u8 reserved_at_10[0x8]; + u8 total_rcv_credits[0x8]; + + u8 reserved_at_20[0xe]; + u8 qp_type[0x2]; + u8 reserved_at_30[0x5]; + u8 rae[0x1]; + u8 rwe[0x1]; + u8 rre[0x1]; + u8 reserved_at_38[0x4]; + u8 dc[0x1]; + u8 ud[0x1]; + u8 uc[0x1]; + u8 rc[0x1]; + + u8 reserved_at_40[0x1a]; + u8 log_ddr_size[0x6]; + + u8 max_fpga_qp_msg_size[0x20]; + + u8 reserved_at_80[0x180]; +}; + +struct mlx5_ifc_fpga_cap_bits { + u8 fpga_id[0x8]; + u8 fpga_device[0x18]; + + u8 register_file_ver[0x20]; + + u8 fpga_ctrl_modify[0x1]; + u8 reserved_at_41[0x5]; + u8 access_reg_query_mode[0x2]; + u8 reserved_at_48[0x6]; + u8 access_reg_modify_mode[0x2]; + u8 reserved_at_50[0x10]; + + u8 reserved_at_60[0x20]; + + u8 image_version[0x20]; + + u8 image_date[0x20]; + + u8 image_time[0x20]; + + u8 shell_version[0x20]; + + u8 reserved_at_100[0x80]; + + struct mlx5_ifc_fpga_shell_caps_bits shell_caps; + + u8 reserved_at_380[0x8]; + u8 ieee_vendor_id[0x18]; + + u8 sandbox_product_version[0x10]; + u8 sandbox_product_id[0x10]; + + u8 sandbox_basic_caps[0x20]; + + u8 reserved_at_3e0[0x10]; + u8 sandbox_extended_caps_len[0x10]; + + u8 sandbox_extended_caps_addr[0x40]; + + u8 fpga_ddr_start_addr[0x40]; + + u8 fpga_cr_space_start_addr[0x40]; + + u8 fpga_ddr_size[0x20]; + + u8 fpga_cr_space_size[0x20]; + + u8 reserved_at_500[0x300]; +}; + +struct mlx5_ifc_fpga_ctrl_bits { + u8 reserved_at_0[0x8]; + u8 operation[0x8]; + u8 reserved_at_10[0x8]; + u8 status[0x8]; + + u8 reserved_at_20[0x8]; + u8 flash_select_admin[0x8]; + u8 reserved_at_30[0x8]; + u8 flash_select_oper[0x8]; + + u8 reserved_at_40[0x40]; +}; + +enum { + MLX5_FPGA_ERROR_EVENT_SYNDROME_CORRUPTED_DDR = 0x1, + MLX5_FPGA_ERROR_EVENT_SYNDROME_FLASH_TIMEOUT = 0x2, + MLX5_FPGA_ERROR_EVENT_SYNDROME_INTERNAL_LINK_ERROR = 0x3, + MLX5_FPGA_ERROR_EVENT_SYNDROME_WATCHDOG_FAILURE = 0x4, + MLX5_FPGA_ERROR_EVENT_SYNDROME_I2C_FAILURE = 0x5, + MLX5_FPGA_ERROR_EVENT_SYNDROME_IMAGE_CHANGED = 0x6, + MLX5_FPGA_ERROR_EVENT_SYNDROME_TEMPERATURE_CRITICAL = 0x7, +}; + +struct mlx5_ifc_fpga_error_event_bits { + u8 reserved_at_0[0x40]; + + u8 reserved_at_40[0x18]; + u8 syndrome[0x8]; + + u8 reserved_at_60[0x80]; +}; + +#endif /* MLX5_IFC_FPGA_H */