mbox

[pull,request,for-next] Mellanox mlx5 Reorganize core driver directory layout

Message ID 1484241755-17603-1-git-send-email-saeedm@mellanox.com
State Rejected, archived
Delegated to: David Miller
Headers show

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git tags/mlx5-dir-layout-reorg

Message

Saeed Mahameed Jan. 12, 2017, 5:22 p.m. UTC
Hi Dave and Doug,

This pull request includes one patch from Leon, this patch as described 
below will change the driver directory structure and layout for better,
logical and modular driver files separation.

This change is important to both rdma and net maintainers in order to 
have smoother management of driver patches for different mlx5 sub modules
and smoother rdma-next vs. net-next features submissions.

Please find more info below -in the tag commit message-,
review and let us know if there's any problem.

This change doesn't introduce any conflicts with the current mlx5
fixes and cleanups posted on 2017-01-10 to net branch, and merge tests
worked flawlessly with no issues.

This is the last pull request meant for both rdma-next and net-next.
Once pulled, this will be the base shared code for both trees.

Thanks,
Saeed and Leon.

The following changes since commit f502d834950a28e02651bb7e2cc7111ddd352644:

  net/mlx5: Activate support for 4K UARs (2017-01-09 20:25:10 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git tags/mlx5-dir-layout-reorg

for you to fetch changes up to 8a9a0375cee4888f7d70e651f3e76347c731c51f:

  {net, IB}/mlx5: Reorganize driver file layout (2017-01-11 13:40:43 +0200)

----------------------------------------------------------------
{net, IB}/mlx5: Reorganize driver file layout

    This patch organizes mlx5 driver file layout to better reflect mlx5
    modularity and allow future separation between EN, IB and shared code
    parts.

    The new structure:
     * drivers/net/ethernet/mellanox/mlx5/*       - core HW/PCI driver logic
     * drivers/net/ethernet/mellanox/mlx5/en/*    - ethernet
     * drivers/net/ethernet/mellanox/mlx5/fs/*    - flow steering
     * drivers/net/ethernet/mellanox/mlx5/sriov/* - SR-IOV and E-Switch
     * drivers/net/ethernet/mellanox/mlx5/lib/*   - common mlx5 commands and API library

    In future submissions, we will shrink the "lib" directory to the code
    related to both subsystems only, while IB part will be moved to
    drivers/infiniband/hw/mlx5, and EN will be moved to "en" directory.
    Such separation will make this library (shared) code to be lean and minimal,
    and help avoid future conflicts between IB and net submissions.

    The proposed structure allows us to remove include/linux/mlx5,
    which belongs solely to Mellanox's devices and don't need to
    be exposed in common linux include directory.

    The following change goes together with update of MAINTAINERS file to
    more granular maintainership roles:
     * drivers/net/ethernet/mellanox/mlx5/*       - Saeed, Matan and Leon
     * drivers/net/ethernet/mellanox/mlx5/en/*    - Saeed
     * drivers/net/ethernet/mellanox/mlx5/fs/*    - Saeed, Matan and Leon
     * drivers/net/ethernet/mellanox/mlx5/sriov/* - Saeed
     * drivers/net/ethernet/mellanox/mlx5/lib/*   - Saeed, Matan and Leon
     * include/uapi/rdma/mlx5-abi.h               - Matan and Leon

Thanks,
	Leon Romanovsky.

----------------------------------------------------------------
Leon Romanovsky (1):
      {net, IB}/mlx5: Reorganize driver file layout

 MAINTAINERS                                         | 11 ++++++-----
 drivers/infiniband/hw/mlx5/Makefile                 |  1 +
 drivers/infiniband/hw/mlx5/ib_virt.c                |  2 +-
 drivers/infiniband/hw/mlx5/mad.c                    |  4 ++--
 drivers/infiniband/hw/mlx5/main.c                   |  6 +++---
 drivers/infiniband/hw/mlx5/mlx5_ib.h                | 10 +++++-----
 drivers/infiniband/hw/mlx5/srq.c                    |  4 ++--
 drivers/net/ethernet/mellanox/Kconfig               |  2 +-
 drivers/net/ethernet/mellanox/Makefile              |  2 +-
 .../net/ethernet/mellanox/mlx5/{core => }/Kconfig   |  0
 drivers/net/ethernet/mellanox/mlx5/Makefile         | 21 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/{core => }/cmd.c |  5 ++---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile    | 13 -------------
 .../net/ethernet/mellanox/mlx5/{core => }/debugfs.c |  8 ++++----
 drivers/net/ethernet/mellanox/mlx5/{core => }/dev.c |  4 ++--
 .../mellanox/mlx5/{core/en_arfs.c => en/arfs.c}     |  4 ++--
 .../mellanox/mlx5/{core/en_clock.c => en/clock.c}   |  2 +-
 .../mellanox/mlx5/{core/en_common.c => en/common.c} |  2 +-
 .../mellanox/mlx5/{core/en_dcbnl.c => en/dcbnl.c}   |  2 +-
 .../net/ethernet/mellanox/mlx5/{core => en}/en.h    | 18 +++++++++---------
 .../mlx5/{core/en_ethtool.c => en/ethtool.c}        |  2 +-
 .../mellanox/mlx5/{core/en_fs.c => en/fs.c}         |  4 ++--
 .../mlx5/{core/en_fs_ethtool.c => en/fs_ethtool.c}  |  4 ++--
 .../mellanox/mlx5/{core/en_main.c => en/main.c}     | 10 +++++-----
 .../mellanox/mlx5/{core/en_rep.c => en/rep.c}       |  8 ++++----
 .../mellanox/mlx5/{core/en_rx.c => en/rx.c}         |  6 +++---
 .../mellanox/mlx5/{core/en_rx_am.c => en/rx_am.c}   |  2 +-
 .../mlx5/{core/en_selftest.c => en/selftest.c}      |  2 +-
 .../mellanox/mlx5/{core/en_stats.h => en/stats.h}   |  0
 .../mellanox/mlx5/{core/en_tc.c => en/tc.c}         | 12 ++++++------
 .../mellanox/mlx5/{core/en_tc.h => en/tc.h}         |  0
 .../mellanox/mlx5/{core/en_tx.c => en/tx.c}         |  2 +-
 .../mellanox/mlx5/{core/en_txrx.c => en/txrx.c}     |  2 +-
 .../net/ethernet/mellanox/mlx5/{core => en}/vxlan.c |  6 +++---
 .../net/ethernet/mellanox/mlx5/{core => en}/vxlan.h |  4 ++--
 .../net/ethernet/mellanox/mlx5/{core => en}/wq.c    |  6 +++---
 .../net/ethernet/mellanox/mlx5/{core => en}/wq.h    |  2 +-
 drivers/net/ethernet/mellanox/mlx5/{core => }/eq.c  |  8 ++++----
 .../mellanox/mlx5/{core/fs_cmd.c => fs/cmd.c}       | 15 +++++++--------
 .../mellanox/mlx5/{core/fs_cmd.h => fs/cmd.h}       |  0
 .../mellanox/mlx5/{core/fs_core.c => fs/core.c}     |  9 ++++-----
 .../mellanox/mlx5/{core/fs_core.h => fs/core.h}     |  2 +-
 .../mlx5/{core/fs_counters.c => fs/counters.c}      | 10 +++++-----
 .../net/ethernet/mellanox/mlx5/{core => }/health.c  |  6 +++---
 drivers/net/ethernet/mellanox/mlx5/{core => }/lag.c |  6 +++---
 .../ethernet/mellanox/mlx5/{core => lib}/alloc.c    |  5 ++---
 .../net/ethernet/mellanox/mlx5/lib}/cmd.h           |  0
 .../net/ethernet/mellanox/mlx5/{core => lib}/cq.c   |  8 ++++----
 .../net/ethernet/mellanox/mlx5/lib}/cq.h            |  2 +-
 .../net/ethernet/mellanox/mlx5/lib}/device.h        |  2 +-
 .../net/ethernet/mellanox/mlx5/lib}/doorbell.h      |  0
 .../net/ethernet/mellanox/mlx5/lib}/driver.h        |  6 +++---
 .../net/ethernet/mellanox/mlx5/lib}/fs.h            |  4 ++--
 .../net/ethernet/mellanox/mlx5/{core => lib}/fw.c   |  6 +++---
 .../net/ethernet/mellanox/mlx5/{core => lib}/mad.c  |  6 +++---
 .../net/ethernet/mellanox/mlx5/{core => lib}/mcg.c  |  6 +++---
 .../net/ethernet/mellanox/mlx5/lib}/mlx5_ifc.h      |  0
 .../net/ethernet/mellanox/mlx5/{core => lib}/mr.c   |  6 +++---
 .../net/ethernet/mellanox/mlx5/{core => lib}/pd.c   |  6 +++---
 .../net/ethernet/mellanox/mlx5/{core => lib}/port.c |  8 ++++----
 .../net/ethernet/mellanox/mlx5/lib}/port.h          |  2 +-
 .../net/ethernet/mellanox/mlx5/{core => lib}/qp.c   | 11 +++++------
 .../net/ethernet/mellanox/mlx5/lib}/qp.h            |  4 ++--
 .../net/ethernet/mellanox/mlx5/{core => lib}/rl.c   |  6 +++---
 .../net/ethernet/mellanox/mlx5/{core => lib}/srq.c  | 10 +++++-----
 .../net/ethernet/mellanox/mlx5/lib}/srq.h           |  2 +-
 .../ethernet/mellanox/mlx5/{core => lib}/transobj.c |  6 +++---
 .../net/ethernet/mellanox/mlx5/lib}/transobj.h      |  2 +-
 .../net/ethernet/mellanox/mlx5/{core => lib}/uar.c  |  6 +++---
 .../ethernet/mellanox/mlx5/{core => lib}/vport.c    |  6 +++---
 .../net/ethernet/mellanox/mlx5/lib}/vport.h         |  4 ++--
 .../net/ethernet/mellanox/mlx5/{core => }/main.c    | 16 ++++++++--------
 .../mellanox/mlx5/{core/mlx5_core.h => mlx5.h}      |  0
 .../ethernet/mellanox/mlx5/{core => }/pagealloc.c   |  6 +++---
 .../mellanox/mlx5/{core => sriov}/eswitch.c         | 12 ++++++------
 .../mellanox/mlx5/{core => sriov}/eswitch.h         |  2 +-
 .../mlx5/{core => sriov}/eswitch_offloads.c         | 12 ++++++------
 .../ethernet/mellanox/mlx5/{core => sriov}/sriov.c  |  6 +++---
 78 files changed, 216 insertions(+), 211 deletions(-)
 rename drivers/net/ethernet/mellanox/mlx5/{core => }/Kconfig (100%)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/Makefile
 rename drivers/net/ethernet/mellanox/mlx5/{core => }/cmd.c (99%)
 delete mode 100644 drivers/net/ethernet/mellanox/mlx5/core/Makefile
 rename drivers/net/ethernet/mellanox/mlx5/{core => }/debugfs.c (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => }/dev.c (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_arfs.c => en/arfs.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_clock.c => en/clock.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_common.c => en/common.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_dcbnl.c => en/dcbnl.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => en}/en.h (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_ethtool.c => en/ethtool.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_fs.c => en/fs.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_fs_ethtool.c => en/fs_ethtool.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_main.c => en/main.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_rep.c => en/rep.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_rx.c => en/rx.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_rx_am.c => en/rx_am.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_selftest.c => en/selftest.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_stats.h => en/stats.h} (100%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_tc.c => en/tc.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_tc.h => en/tc.h} (100%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_tx.c => en/tx.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/en_txrx.c => en/txrx.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => en}/vxlan.c (98%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => en}/vxlan.h (97%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => en}/wq.c (98%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => en}/wq.h (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => }/eq.c (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/fs_cmd.c => fs/cmd.c} (98%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/fs_cmd.h => fs/cmd.h} (100%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/fs_core.c => fs/core.c} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/fs_core.h => fs/core.h} (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/fs_counters.c => fs/counters.c} (98%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => }/health.c (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => }/lag.c (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/alloc.c (99%)
 rename {include/linux/mlx5 => drivers/net/ethernet/mellanox/mlx5/lib}/cmd.h (100%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/cq.c (98%)
 rename {include/linux/mlx5 => drivers/net/ethernet/mellanox/mlx5/lib}/cq.h (99%)
 rename {include/linux/mlx5 => drivers/net/ethernet/mellanox/mlx5/lib}/device.h (99%)
 rename {include/linux/mlx5 => drivers/net/ethernet/mellanox/mlx5/lib}/doorbell.h (100%)
 rename {include/linux/mlx5 => drivers/net/ethernet/mellanox/mlx5/lib}/driver.h (99%)
 rename {include/linux/mlx5 => drivers/net/ethernet/mellanox/mlx5/lib}/fs.h (98%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/fw.c (98%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/mad.c (96%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/mcg.c (96%)
 rename {include/linux/mlx5 => drivers/net/ethernet/mellanox/mlx5/lib}/mlx5_ifc.h (100%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/mr.c (98%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/pd.c (96%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/port.c (99%)
 rename {include/linux/mlx5 => drivers/net/ethernet/mellanox/mlx5/lib}/port.h (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/qp.c (98%)
 rename {include/linux/mlx5 => drivers/net/ethernet/mellanox/mlx5/lib}/qp.h (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/rl.c (98%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/srq.c (99%)
 rename {include/linux/mlx5 => drivers/net/ethernet/mellanox/mlx5/lib}/srq.h (98%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/transobj.c (99%)
 rename {include/linux/mlx5 => drivers/net/ethernet/mellanox/mlx5/lib}/transobj.h (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/uar.c (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => lib}/vport.c (99%)
 rename {include/linux/mlx5 => drivers/net/ethernet/mellanox/mlx5/lib}/vport.h (98%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => }/main.c (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core/mlx5_core.h => mlx5.h} (100%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => }/pagealloc.c (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => sriov}/eswitch.c (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => sriov}/eswitch.h (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => sriov}/eswitch_offloads.c (99%)
 rename drivers/net/ethernet/mellanox/mlx5/{core => sriov}/sriov.c (98%)

Comments

David Miller Jan. 13, 2017, 5:14 p.m. UTC | #1
From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 12 Jan 2017 19:22:34 +0200

> This pull request includes one patch from Leon, this patch as described 
> below will change the driver directory structure and layout for better,
> logical and modular driver files separation.
> 
> This change is important to both rdma and net maintainers in order to 
> have smoother management of driver patches for different mlx5 sub modules
> and smoother rdma-next vs. net-next features submissions.
> 
> Please find more info below -in the tag commit message-,
> review and let us know if there's any problem.
> 
> This change doesn't introduce any conflicts with the current mlx5
> fixes and cleanups posted on 2017-01-10 to net branch, and merge tests
> worked flawlessly with no issues.
> 
> This is the last pull request meant for both rdma-next and net-next.
> Once pulled, this will be the base shared code for both trees.

This is pretty crazy, it will make all bug fix backporting to -stable
a complete nightmare for myself, Doug, various distribution maintainers
and many other people who quietly have to maintain their own trees and
do backporting.

I really don't think you can justify this rearrangement based upon the
consequences and how much activity happens in this driver.

You should have thought long and hard about the layout a long time ago
rather than after the driver has been in the tree for many years.

Sorry.
Leon Romanovsky Jan. 13, 2017, 8:29 p.m. UTC | #2
On Fri, Jan 13, 2017 at 12:14:07PM -0500, David Miller wrote:
> From: Saeed Mahameed <saeedm@mellanox.com>
> Date: Thu, 12 Jan 2017 19:22:34 +0200
>
> > This pull request includes one patch from Leon, this patch as described
> > below will change the driver directory structure and layout for better,
> > logical and modular driver files separation.
> >
> > This change is important to both rdma and net maintainers in order to
> > have smoother management of driver patches for different mlx5 sub modules
> > and smoother rdma-next vs. net-next features submissions.
> >
> > Please find more info below -in the tag commit message-,
> > review and let us know if there's any problem.
> >
> > This change doesn't introduce any conflicts with the current mlx5
> > fixes and cleanups posted on 2017-01-10 to net branch, and merge tests
> > worked flawlessly with no issues.
> >
> > This is the last pull request meant for both rdma-next and net-next.
> > Once pulled, this will be the base shared code for both trees.
>
> This is pretty crazy, it will make all bug fix backporting to -stable
> a complete nightmare for myself, Doug, various distribution maintainers
> and many other people who quietly have to maintain their own trees and
> do backporting.

Hi Dave,

I understand your worries, but our case is similar to various other
drivers, for example hfi1 which was in staging for years while
supported in RedHat and moved from there to IB. The Chelsio drivers did
similar reorg in 2016 (drivers/net/ethernet/chelsio/libcxgb) while their
drivers were in the tree for long time before.

Additionally, Doug doesn't need to maintain -stable queue and it is done
by relevant submaintainers who are adding stable tags by themselves. In
the IB case, the burden will continue to be on me and not on Doug.

>
> I really don't think you can justify this rearrangement based upon the
> consequences and how much activity happens in this driver.
>
> You should have thought long and hard about the layout a long time ago
> rather than after the driver has been in the tree for many years.
>
> Sorry.
Tom Herbert Jan. 13, 2017, 10:06 p.m. UTC | #3
On Fri, Jan 13, 2017 at 12:29 PM, Leon Romanovsky <leon@kernel.org> wrote:
> On Fri, Jan 13, 2017 at 12:14:07PM -0500, David Miller wrote:
>> From: Saeed Mahameed <saeedm@mellanox.com>
>> Date: Thu, 12 Jan 2017 19:22:34 +0200
>>
>> > This pull request includes one patch from Leon, this patch as described
>> > below will change the driver directory structure and layout for better,
>> > logical and modular driver files separation.
>> >
>> > This change is important to both rdma and net maintainers in order to
>> > have smoother management of driver patches for different mlx5 sub modules
>> > and smoother rdma-next vs. net-next features submissions.
>> >
>> > Please find more info below -in the tag commit message-,
>> > review and let us know if there's any problem.
>> >
>> > This change doesn't introduce any conflicts with the current mlx5
>> > fixes and cleanups posted on 2017-01-10 to net branch, and merge tests
>> > worked flawlessly with no issues.
>> >
>> > This is the last pull request meant for both rdma-next and net-next.
>> > Once pulled, this will be the base shared code for both trees.
>>
>> This is pretty crazy, it will make all bug fix backporting to -stable
>> a complete nightmare for myself, Doug, various distribution maintainers
>> and many other people who quietly have to maintain their own trees and
>> do backporting.
>
> Hi Dave,
>
> I understand your worries, but our case is similar to various other
> drivers, for example hfi1 which was in staging for years while
> supported in RedHat and moved from there to IB. The Chelsio drivers did
> similar reorg in 2016 (drivers/net/ethernet/chelsio/libcxgb) while their
> drivers were in the tree for long time before.
>
> Additionally, Doug doesn't need to maintain -stable queue and it is done
> by relevant submaintainers who are adding stable tags by themselves. In
> the IB case, the burden will continue to be on me and not on Doug.
>
Recently I had to backport the mlx5 driver from 4.9 to 4.6 in order to
get support for XDP. The biggest issue I faced was the lack of
modularity in the many driver features that are now supported. The
problem with backporting these new features is the spider web of
dependencies that they bring in from the rest of the kernel. I ended
up taking out en_rep, vxlan, en_tc, eswitch, and dcbnl. The result was
~340 patches which is still a lot but at least this was constrained to
patches in the mlx5 directories and are relevant to what we want to
do.

In lieu of restructuring the directories, I would much rather see more
config options so that we can build drivers that don't unnecessarily
complicate our lives with features we don't use. This is not just true
for Mellanox, but I would say it would be true of any driver that
someone is trying to deploy and maintain at large scale.

Btw, we did hit one issue in the backport. We started to get rx csum
faults (checksum complete value indicates TCP checksum is bad, but
host computation says checksum is good). I ran against 4.9 upstream
kernel and do see these, however don't see them in 4.10. I haven't
bisected yet. Is this a known issue?

Thanks,
Tom

>>
>> I really don't think you can justify this rearrangement based upon the
>> consequences and how much activity happens in this driver.
>>
>> You should have thought long and hard about the layout a long time ago
>> rather than after the driver has been in the tree for many years.
>>
>> Sorry.
Saeed Mahameed Jan. 13, 2017, 10:07 p.m. UTC | #4
On Fri, Jan 13, 2017 at 7:14 PM, David Miller <davem@davemloft.net> wrote:
> From: Saeed Mahameed <saeedm@mellanox.com>
> Date: Thu, 12 Jan 2017 19:22:34 +0200
>
>> This pull request includes one patch from Leon, this patch as described
>> below will change the driver directory structure and layout for better,
>> logical and modular driver files separation.
>>
>> This change is important to both rdma and net maintainers in order to
>> have smoother management of driver patches for different mlx5 sub modules
>> and smoother rdma-next vs. net-next features submissions.
>>
>> Please find more info below -in the tag commit message-,
>> review and let us know if there's any problem.
>>
>> This change doesn't introduce any conflicts with the current mlx5
>> fixes and cleanups posted on 2017-01-10 to net branch, and merge tests
>> worked flawlessly with no issues.
>>
>> This is the last pull request meant for both rdma-next and net-next.
>> Once pulled, this will be the base shared code for both trees.
>
> This is pretty crazy, it will make all bug fix backporting to -stable
> a complete nightmare for myself, Doug, various distribution maintainers
> and many other people who quietly have to maintain their own trees and
> do backporting.
>

I hear you,
But please bear with me here, what if we queue this patch up to -stable ? and we
(Mellanox) and specifically our dedicated inbox team, will make sure
that this patch
will land on the various distributions and for those maintaining their
own trees.
This patch is really straight forward (rename files) and I already
tried to cherry-pick it
on older kernels, I only got a couple of conflicts on some of the
"#inlcude" lines we've
changed, and they are pretty straightforward to fix, we can even avoid
this if we decide to
not move mlx5 header files in this phase.

If this is possible then all trees will be aligned and it will be a
win win situation.

> I really don't think you can justify this rearrangement based upon the
> consequences and how much activity happens in this driver.
>

Right, but this is not the only justification, I can sum it up to that
we would like
to lay out the foundation for many years to come for a well designed
driver with a modular
sub modules break down and scalable infrastructure. We already plan to
submit more mlx5
independent  sub modules - just like mlx5e (en_*) files and mlx5_ib
driver- so this was also
a reason for us to consider this re-engagement at this stage.

> You should have thought long and hard about the layout a long time ago
> rather than after the driver has been in the tree for many years.
>

I had this Idea for the separation before the mlx5 Ethernet
submission, but I wasn't the maintainer
back then, and i have been itching to submit such patch for long as
well, still i don't think
it is too late, We (Me and Leon) will keep maintaining this driver for
only god knows how many years to come,
and the mlx5 drivers are meant to serve at least 3-4 more future HW generations.

Long story short, We would like to re-arrange the driver in a way that
would serve us (the maintainers) and serve those
who are going do develop the future Stack features and the future HW
generations over the well designed (Hopefully)
mlx5 infrastructure.
Keeping it as it is today, will only make the situation worst in the
future and it will be really hard to avoid having a spaghetti code
in the mlx5 driver. All i want to point out here is that maintaining
such a flat subtree is also nightmare.

So i am only asking you to reconsider this change and give my -stable
suggestion a thought.

Thank you.
Saeed.
Saeed Mahameed Jan. 13, 2017, 10:45 p.m. UTC | #5
On Sat, Jan 14, 2017 at 12:06 AM, Tom Herbert <tom@herbertland.com> wrote:
> On Fri, Jan 13, 2017 at 12:29 PM, Leon Romanovsky <leon@kernel.org> wrote:
>> On Fri, Jan 13, 2017 at 12:14:07PM -0500, David Miller wrote:
>>> From: Saeed Mahameed <saeedm@mellanox.com>
>>> Date: Thu, 12 Jan 2017 19:22:34 +0200
>>>
>>> > This pull request includes one patch from Leon, this patch as described
>>> > below will change the driver directory structure and layout for better,
>>> > logical and modular driver files separation.
>>> >
>>> > This change is important to both rdma and net maintainers in order to
>>> > have smoother management of driver patches for different mlx5 sub modules
>>> > and smoother rdma-next vs. net-next features submissions.
>>> >
>>> > Please find more info below -in the tag commit message-,
>>> > review and let us know if there's any problem.
>>> >
>>> > This change doesn't introduce any conflicts with the current mlx5
>>> > fixes and cleanups posted on 2017-01-10 to net branch, and merge tests
>>> > worked flawlessly with no issues.
>>> >
>>> > This is the last pull request meant for both rdma-next and net-next.
>>> > Once pulled, this will be the base shared code for both trees.
>>>
>>> This is pretty crazy, it will make all bug fix backporting to -stable
>>> a complete nightmare for myself, Doug, various distribution maintainers
>>> and many other people who quietly have to maintain their own trees and
>>> do backporting.
>>
>> Hi Dave,
>>
>> I understand your worries, but our case is similar to various other
>> drivers, for example hfi1 which was in staging for years while
>> supported in RedHat and moved from there to IB. The Chelsio drivers did
>> similar reorg in 2016 (drivers/net/ethernet/chelsio/libcxgb) while their
>> drivers were in the tree for long time before.
>>
>> Additionally, Doug doesn't need to maintain -stable queue and it is done
>> by relevant submaintainers who are adding stable tags by themselves. In
>> the IB case, the burden will continue to be on me and not on Doug.
>>
> Recently I had to backport the mlx5 driver from 4.9 to 4.6 in order to
> get support for XDP. The biggest issue I faced was the lack of
> modularity in the many driver features that are now supported. The
> problem with backporting these new features is the spider web of
> dependencies that they bring in from the rest of the kernel. I ended
> up taking out en_rep, vxlan, en_tc, eswitch, and dcbnl. The result was
> ~340 patches which is still a lot but at least this was constrained to
> patches in the mlx5 directories and are relevant to what we want to
> do.
>
> In lieu of restructuring the directories, I would much rather see more
> config options so that we can build drivers that don't unnecessarily
> complicate our lives with features we don't use. This is not just true
> for Mellanox, but I would say it would be true of any driver that
> someone is trying to deploy and maintain at large scale.
>

I think we should have both, if the restructuring made right,
new whole features (e.g eswitch and eswitch offlaods or any independent module),
can sit in their own directory and keep their own logic concentrated
in one place, and only touch the
main driver code with simple entry points in the main flow,  this way
you can simply compile their whole directories
out with a config flag directly from the Makefile.

> Btw, we did hit one issue in the backport. We started to get rx csum
> faults (checksum complete value indicates TCP checksum is bad, but
> host computation says checksum is good). I ran against 4.9 upstream
> kernel and do see these, however don't see them in 4.10. I haven't
> bisected yet. Is this a known issue?
>

Not to me, I don't recall any csum related fixes or feature submitted
lately to mlx5,
Maybe something changed in the stack ?

what configuration are you running ? what traffic ?

> Thanks,
> Tom
>
>>>
>>> I really don't think you can justify this rearrangement based upon the
>>> consequences and how much activity happens in this driver.
>>>
>>> You should have thought long and hard about the layout a long time ago
>>> rather than after the driver has been in the tree for many years.
>>>
>>> Sorry.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert Jan. 13, 2017, 10:56 p.m. UTC | #6
On Fri, Jan 13, 2017 at 2:45 PM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Sat, Jan 14, 2017 at 12:06 AM, Tom Herbert <tom@herbertland.com> wrote:
>> On Fri, Jan 13, 2017 at 12:29 PM, Leon Romanovsky <leon@kernel.org> wrote:
>>> On Fri, Jan 13, 2017 at 12:14:07PM -0500, David Miller wrote:
>>>> From: Saeed Mahameed <saeedm@mellanox.com>
>>>> Date: Thu, 12 Jan 2017 19:22:34 +0200
>>>>
>>>> > This pull request includes one patch from Leon, this patch as described
>>>> > below will change the driver directory structure and layout for better,
>>>> > logical and modular driver files separation.
>>>> >
>>>> > This change is important to both rdma and net maintainers in order to
>>>> > have smoother management of driver patches for different mlx5 sub modules
>>>> > and smoother rdma-next vs. net-next features submissions.
>>>> >
>>>> > Please find more info below -in the tag commit message-,
>>>> > review and let us know if there's any problem.
>>>> >
>>>> > This change doesn't introduce any conflicts with the current mlx5
>>>> > fixes and cleanups posted on 2017-01-10 to net branch, and merge tests
>>>> > worked flawlessly with no issues.
>>>> >
>>>> > This is the last pull request meant for both rdma-next and net-next.
>>>> > Once pulled, this will be the base shared code for both trees.
>>>>
>>>> This is pretty crazy, it will make all bug fix backporting to -stable
>>>> a complete nightmare for myself, Doug, various distribution maintainers
>>>> and many other people who quietly have to maintain their own trees and
>>>> do backporting.
>>>
>>> Hi Dave,
>>>
>>> I understand your worries, but our case is similar to various other
>>> drivers, for example hfi1 which was in staging for years while
>>> supported in RedHat and moved from there to IB. The Chelsio drivers did
>>> similar reorg in 2016 (drivers/net/ethernet/chelsio/libcxgb) while their
>>> drivers were in the tree for long time before.
>>>
>>> Additionally, Doug doesn't need to maintain -stable queue and it is done
>>> by relevant submaintainers who are adding stable tags by themselves. In
>>> the IB case, the burden will continue to be on me and not on Doug.
>>>
>> Recently I had to backport the mlx5 driver from 4.9 to 4.6 in order to
>> get support for XDP. The biggest issue I faced was the lack of
>> modularity in the many driver features that are now supported. The
>> problem with backporting these new features is the spider web of
>> dependencies that they bring in from the rest of the kernel. I ended
>> up taking out en_rep, vxlan, en_tc, eswitch, and dcbnl. The result was
>> ~340 patches which is still a lot but at least this was constrained to
>> patches in the mlx5 directories and are relevant to what we want to
>> do.
>>
>> In lieu of restructuring the directories, I would much rather see more
>> config options so that we can build drivers that don't unnecessarily
>> complicate our lives with features we don't use. This is not just true
>> for Mellanox, but I would say it would be true of any driver that
>> someone is trying to deploy and maintain at large scale.
>>
>
> I think we should have both, if the restructuring made right,
> new whole features (e.g eswitch and eswitch offlaods or any independent module),
> can sit in their own directory and keep their own logic concentrated
> in one place, and only touch the
> main driver code with simple entry points in the main flow,  this way
> you can simply compile their whole directories
> out with a config flag directly from the Makefile.
>
>> Btw, we did hit one issue in the backport. We started to get rx csum
>> faults (checksum complete value indicates TCP checksum is bad, but
>> host computation says checksum is good). I ran against 4.9 upstream
>> kernel and do see these, however don't see them in 4.10. I haven't
>> bisected yet. Is this a known issue?
>>
>
> Not to me, I don't recall any csum related fixes or feature submitted
> lately to mlx5,
> Maybe something changed in the stack ?
>
> what configuration are you running ? what traffic ?
>
Nothing fancy. 8 queues and 20 concurrent netperf TCP_STREAMs trips
it. Not a lot of them, but I don't think we really should ever see
these errors.

Tom

>> Thanks,
>> Tom
>>
>>>>
>>>> I really don't think you can justify this rearrangement based upon the
>>>> consequences and how much activity happens in this driver.
>>>>
>>>> You should have thought long and hard about the layout a long time ago
>>>> rather than after the driver has been in the tree for many years.
>>>>
>>>> Sorry.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rick Jones Jan. 13, 2017, 10:59 p.m. UTC | #7
On 01/13/2017 02:56 PM, Tom Herbert wrote:
> On Fri, Jan 13, 2017 at 2:45 PM, Saeed Mahameed
>> what configuration are you running ? what traffic ?
>>
> Nothing fancy. 8 queues and 20 concurrent netperf TCP_STREAMs trips
> it. Not a lot of them, but I don't think we really should ever see
> these errors.

Straight-up defaults with netperf, or do you use specific -s/S or -m/M 
options?

happy benchmarking,

rick jones
Tom Herbert Jan. 13, 2017, 11:07 p.m. UTC | #8
On Fri, Jan 13, 2017 at 2:59 PM, Rick Jones <rick.jones2@hpe.com> wrote:
> On 01/13/2017 02:56 PM, Tom Herbert wrote:
>>
>> On Fri, Jan 13, 2017 at 2:45 PM, Saeed Mahameed
>>>
>>> what configuration are you running ? what traffic ?
>>>
>> Nothing fancy. 8 queues and 20 concurrent netperf TCP_STREAMs trips
>> it. Not a lot of them, but I don't think we really should ever see
>> these errors.
>
>
> Straight-up defaults with netperf, or do you use specific -s/S or -m/M
> options?
>
./super_netperf_tput 20 -H test001 -l 100 -t TCP_STREAM

> happy benchmarking,
>
> rick jones
>
Tom Herbert Jan. 14, 2017, 5:37 p.m. UTC | #9
On Fri, Jan 13, 2017 at 2:45 PM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Sat, Jan 14, 2017 at 12:06 AM, Tom Herbert <tom@herbertland.com> wrote:
>> On Fri, Jan 13, 2017 at 12:29 PM, Leon Romanovsky <leon@kernel.org> wrote:
>>> On Fri, Jan 13, 2017 at 12:14:07PM -0500, David Miller wrote:
>>>> From: Saeed Mahameed <saeedm@mellanox.com>
>>>> Date: Thu, 12 Jan 2017 19:22:34 +0200
>>>>
>>>> > This pull request includes one patch from Leon, this patch as described
>>>> > below will change the driver directory structure and layout for better,
>>>> > logical and modular driver files separation.
>>>> >
>>>> > This change is important to both rdma and net maintainers in order to
>>>> > have smoother management of driver patches for different mlx5 sub modules
>>>> > and smoother rdma-next vs. net-next features submissions.
>>>> >
>>>> > Please find more info below -in the tag commit message-,
>>>> > review and let us know if there's any problem.
>>>> >
>>>> > This change doesn't introduce any conflicts with the current mlx5
>>>> > fixes and cleanups posted on 2017-01-10 to net branch, and merge tests
>>>> > worked flawlessly with no issues.
>>>> >
>>>> > This is the last pull request meant for both rdma-next and net-next.
>>>> > Once pulled, this will be the base shared code for both trees.
>>>>
>>>> This is pretty crazy, it will make all bug fix backporting to -stable
>>>> a complete nightmare for myself, Doug, various distribution maintainers
>>>> and many other people who quietly have to maintain their own trees and
>>>> do backporting.
>>>
>>> Hi Dave,
>>>
>>> I understand your worries, but our case is similar to various other
>>> drivers, for example hfi1 which was in staging for years while
>>> supported in RedHat and moved from there to IB. The Chelsio drivers did
>>> similar reorg in 2016 (drivers/net/ethernet/chelsio/libcxgb) while their
>>> drivers were in the tree for long time before.
>>>
>>> Additionally, Doug doesn't need to maintain -stable queue and it is done
>>> by relevant submaintainers who are adding stable tags by themselves. In
>>> the IB case, the burden will continue to be on me and not on Doug.
>>>
>> Recently I had to backport the mlx5 driver from 4.9 to 4.6 in order to
>> get support for XDP. The biggest issue I faced was the lack of
>> modularity in the many driver features that are now supported. The
>> problem with backporting these new features is the spider web of
>> dependencies that they bring in from the rest of the kernel. I ended
>> up taking out en_rep, vxlan, en_tc, eswitch, and dcbnl. The result was
>> ~340 patches which is still a lot but at least this was constrained to
>> patches in the mlx5 directories and are relevant to what we want to
>> do.
>>
>> In lieu of restructuring the directories, I would much rather see more
>> config options so that we can build drivers that don't unnecessarily
>> complicate our lives with features we don't use. This is not just true
>> for Mellanox, but I would say it would be true of any driver that
>> someone is trying to deploy and maintain at large scale.
>>
>
> I think we should have both, if the restructuring made right,
> new whole features (e.g eswitch and eswitch offlaods or any independent module),
> can sit in their own directory and keep their own logic concentrated
> in one place, and only touch the
> main driver code with simple entry points in the main flow,  this way
> you can simply compile their whole directories
> out with a config flag directly from the Makefile.
>
That would be nice, but that is not what your new layout gives us yet.
The starting point for this structure should be to discern what the
minimum set of files is to build a functional and good performance
driver with the basic functionality including only the most common
offloads. These I would think constitute the core files for the driver
and hence make sense to be in the "core" directory. Personally I would
drop the en_ file prefix and not make an en directory, we're already
in the ethernet driver part of the tree so I don't know what this name
is supposed to convey. For reference I gave you this list of
components I disabled to do the backport, some we're not split out in
your directory layout. A good example is the VXLAN offload support,
the changes in that area we're mostly due to Alexander's patch set to
redo the interface for these accelerations-- so to back port driver
from 4.9 to 4.6 I would need to pull those in and whatever else those
depend on. It was way easier to just ifdef out the relevant calls and
not compile the vxlan files to do the backport.

Tom

>> Btw, we did hit one issue in the backport. We started to get rx csum
>> faults (checksum complete value indicates TCP checksum is bad, but
>> host computation says checksum is good). I ran against 4.9 upstream
>> kernel and do see these, however don't see them in 4.10. I haven't
>> bisected yet. Is this a known issue?
>>
>
> Not to me, I don't recall any csum related fixes or feature submitted
> lately to mlx5,
> Maybe something changed in the stack ?
>
> what configuration are you running ? what traffic ?
>
>> Thanks,
>> Tom
>>
>>>>
>>>> I really don't think you can justify this rearrangement based upon the
>>>> consequences and how much activity happens in this driver.
>>>>
>>>> You should have thought long and hard about the layout a long time ago
>>>> rather than after the driver has been in the tree for many years.
>>>>
>>>> Sorry.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Leon Romanovsky Jan. 15, 2017, 7:20 a.m. UTC | #10
On Sat, Jan 14, 2017 at 09:37:20AM -0800, Tom Herbert wrote:
> On Fri, Jan 13, 2017 at 2:45 PM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
> > On Sat, Jan 14, 2017 at 12:06 AM, Tom Herbert <tom@herbertland.com> wrote:
> >> On Fri, Jan 13, 2017 at 12:29 PM, Leon Romanovsky <leon@kernel.org> wrote:
> >>> On Fri, Jan 13, 2017 at 12:14:07PM -0500, David Miller wrote:
> >>>> From: Saeed Mahameed <saeedm@mellanox.com>
> >>>> Date: Thu, 12 Jan 2017 19:22:34 +0200
> >>>>
> >>>> > This pull request includes one patch from Leon, this patch as described
> >>>> > below will change the driver directory structure and layout for better,
> >>>> > logical and modular driver files separation.
> >>>> >
> >>>> > This change is important to both rdma and net maintainers in order to
> >>>> > have smoother management of driver patches for different mlx5 sub modules
> >>>> > and smoother rdma-next vs. net-next features submissions.
> >>>> >
> >>>> > Please find more info below -in the tag commit message-,
> >>>> > review and let us know if there's any problem.
> >>>> >
> >>>> > This change doesn't introduce any conflicts with the current mlx5
> >>>> > fixes and cleanups posted on 2017-01-10 to net branch, and merge tests
> >>>> > worked flawlessly with no issues.
> >>>> >
> >>>> > This is the last pull request meant for both rdma-next and net-next.
> >>>> > Once pulled, this will be the base shared code for both trees.
> >>>>
> >>>> This is pretty crazy, it will make all bug fix backporting to -stable
> >>>> a complete nightmare for myself, Doug, various distribution maintainers
> >>>> and many other people who quietly have to maintain their own trees and
> >>>> do backporting.
> >>>
> >>> Hi Dave,
> >>>
> >>> I understand your worries, but our case is similar to various other
> >>> drivers, for example hfi1 which was in staging for years while
> >>> supported in RedHat and moved from there to IB. The Chelsio drivers did
> >>> similar reorg in 2016 (drivers/net/ethernet/chelsio/libcxgb) while their
> >>> drivers were in the tree for long time before.
> >>>
> >>> Additionally, Doug doesn't need to maintain -stable queue and it is done
> >>> by relevant submaintainers who are adding stable tags by themselves. In
> >>> the IB case, the burden will continue to be on me and not on Doug.
> >>>
> >> Recently I had to backport the mlx5 driver from 4.9 to 4.6 in order to
> >> get support for XDP. The biggest issue I faced was the lack of
> >> modularity in the many driver features that are now supported. The
> >> problem with backporting these new features is the spider web of
> >> dependencies that they bring in from the rest of the kernel. I ended
> >> up taking out en_rep, vxlan, en_tc, eswitch, and dcbnl. The result was
> >> ~340 patches which is still a lot but at least this was constrained to
> >> patches in the mlx5 directories and are relevant to what we want to
> >> do.

We had that complexity in mind. This pull request is our initial step
to simplify backporters' work (e.g. rename of en_XXX.c to be XXX.c is
the same as en/XXX.c for the backports).

We are aware of IB backports which required EN part of driver which
had all net dependencies mentioned above and IMHO it is wrong.

Our plan is:
1. Reorg/rename files.
2. Separate en and IB from shared code, to allow parallel work.
3. Reduce en and IB dependencies, to simplify backporting.

It is really awkward to move to item 2 and 3 before we cleaned the desk.
That rename patch provided to all backporters an easy way to bring new
features into the driver.

> >>
> >> In lieu of restructuring the directories, I would much rather see more
> >> config options so that we can build drivers that don't unnecessarily
> >> complicate our lives with features we don't use. This is not just true
> >> for Mellanox, but I would say it would be true of any driver that
> >> someone is trying to deploy and maintain at large scale.

Different maintainers have a different view on the subject and not everyone
is ready to accept new drivers config options.


> >>
> >
> > I think we should have both, if the restructuring made right,
> > new whole features (e.g eswitch and eswitch offlaods or any independent module),
> > can sit in their own directory and keep their own logic concentrated
> > in one place, and only touch the
> > main driver code with simple entry points in the main flow,  this way
> > you can simply compile their whole directories
> > out with a config flag directly from the Makefile.
> >
> That would be nice, but that is not what your new layout gives us yet.
> The starting point for this structure should be to discern what the
> minimum set of files is to build a functional and good performance
> driver with the basic functionality including only the most common
> offloads. These I would think constitute the core files for the driver
> and hence make sense to be in the "core" directory. Personally I would
> drop the en_ file prefix and not make an en directory, we're already
> in the ethernet driver part of the tree so I don't know what this name
> is supposed to convey. For reference I gave you this list of
> components I disabled to do the backport, some we're not split out in
> your directory layout. A good example is the VXLAN offload support,
> the changes in that area we're mostly due to Alexander's patch set to
> redo the interface for these accelerations-- so to back port driver
> from 4.9 to 4.6 I would need to pull those in and whatever else those
> depend on. It was way easier to just ifdef out the relevant calls and
> not compile the vxlan files to do the backport.

We would love to move into that direction, but right now that core
directory serves our IB driver too. Once we separate them, your proposal
will be possible to implement.

>
> Tom
>
> >> Btw, we did hit one issue in the backport. We started to get rx csum
> >> faults (checksum complete value indicates TCP checksum is bad, but
> >> host computation says checksum is good). I ran against 4.9 upstream
> >> kernel and do see these, however don't see them in 4.10. I haven't
> >> bisected yet. Is this a known issue?
> >>
> >
> > Not to me, I don't recall any csum related fixes or feature submitted
> > lately to mlx5,
> > Maybe something changed in the stack ?
> >
> > what configuration are you running ? what traffic ?
> >
> >> Thanks,
> >> Tom
> >>
> >>>>
> >>>> I really don't think you can justify this rearrangement based upon the
> >>>> consequences and how much activity happens in this driver.
> >>>>
> >>>> You should have thought long and hard about the layout a long time ago
> >>>> rather than after the driver has been in the tree for many years.
> >>>>
> >>>> Sorry.
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Saeed Mahameed Jan. 16, 2017, 8:15 p.m. UTC | #11
On Sat, Jan 14, 2017 at 12:56 AM, Tom Herbert <tom@herbertland.com> wrote:
> On Fri, Jan 13, 2017 at 2:45 PM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Sat, Jan 14, 2017 at 12:06 AM, Tom Herbert <tom@herbertland.com> wrote:
>>> Btw, we did hit one issue in the backport. We started to get rx csum
>>> faults (checksum complete value indicates TCP checksum is bad, but
>>> host computation says checksum is good). I ran against 4.9 upstream
>>> kernel and do see these, however don't see them in 4.10. I haven't
>>> bisected yet. Is this a known issue?
>>>
>>
>> Not to me, I don't recall any csum related fixes or feature submitted
>> lately to mlx5,
>> Maybe something changed in the stack ?
>>
>> what configuration are you running ? what traffic ?
>>
> Nothing fancy. 8 queues and 20 concurrent netperf TCP_STREAMs trips
> it. Not a lot of them, but I don't think we really should ever see
> these errors.
>

Hi Tom

I've been trying to repro with no luck with kernel v4.9.
I am looking for checksum errors in dmesg, is that the place where you
saw the errors ?

can you dump the error in here ?
Saeed Mahameed Jan. 16, 2017, 8:30 p.m. UTC | #12
On Sat, Jan 14, 2017 at 12:07 AM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Fri, Jan 13, 2017 at 7:14 PM, David Miller <davem@davemloft.net> wrote:
>> From: Saeed Mahameed <saeedm@mellanox.com>
>> Date: Thu, 12 Jan 2017 19:22:34 +0200
>>
>>> This pull request includes one patch from Leon, this patch as described
>>> below will change the driver directory structure and layout for better,
>>> logical and modular driver files separation.
>>>
>>> This change is important to both rdma and net maintainers in order to
>>> have smoother management of driver patches for different mlx5 sub modules
>>> and smoother rdma-next vs. net-next features submissions.
>>>
>>> Please find more info below -in the tag commit message-,
>>> review and let us know if there's any problem.
>>>
>>> This change doesn't introduce any conflicts with the current mlx5
>>> fixes and cleanups posted on 2017-01-10 to net branch, and merge tests
>>> worked flawlessly with no issues.
>>>
>>> This is the last pull request meant for both rdma-next and net-next.
>>> Once pulled, this will be the base shared code for both trees.
>>
>> This is pretty crazy, it will make all bug fix backporting to -stable
>> a complete nightmare for myself, Doug, various distribution maintainers
>> and many other people who quietly have to maintain their own trees and
>> do backporting.
>>
>
> I hear you,
> But please bear with me here, what if we queue this patch up to -stable ? and we
> (Mellanox) and specifically our dedicated inbox team, will make sure
> that this patch
> will land on the various distributions and for those maintaining their
> own trees.
> This patch is really straight forward (rename files) and I already
> tried to cherry-pick it
> on older kernels, I only got a couple of conflicts on some of the
> "#inlcude" lines we've
> changed, and they are pretty straightforward to fix, we can even avoid
> this if we decide to
> not move mlx5 header files in this phase.
>
> If this is possible then all trees will be aligned and it will be a
> win win situation.
>

Hi Dave,

Any chance you saw my -stable suggestion above ?
I think it would really close the backporting gap.

Sorry i am bothering you with this topic, but we really desire the new
structure and
I never got your feedback on this suggestion, so i would like to hear
your thoughts.

Thanks,
Saeed.

>> I really don't think you can justify this rearrangement based upon the
>> consequences and how much activity happens in this driver.
>>
>
> Right, but this is not the only justification, I can sum it up to that
> we would like
> to lay out the foundation for many years to come for a well designed
> driver with a modular
> sub modules break down and scalable infrastructure. We already plan to
> submit more mlx5
> independent  sub modules - just like mlx5e (en_*) files and mlx5_ib
> driver- so this was also
> a reason for us to consider this re-engagement at this stage.
>
>> You should have thought long and hard about the layout a long time ago
>> rather than after the driver has been in the tree for many years.
>>
>
> I had this Idea for the separation before the mlx5 Ethernet
> submission, but I wasn't the maintainer
> back then, and i have been itching to submit such patch for long as
> well, still i don't think
> it is too late, We (Me and Leon) will keep maintaining this driver for
> only god knows how many years to come,
> and the mlx5 drivers are meant to serve at least 3-4 more future HW generations.
>
> Long story short, We would like to re-arrange the driver in a way that
> would serve us (the maintainers) and serve those
> who are going do develop the future Stack features and the future HW
> generations over the well designed (Hopefully)
> mlx5 infrastructure.
> Keeping it as it is today, will only make the situation worst in the
> future and it will be really hard to avoid having a spaghetti code
> in the mlx5 driver. All i want to point out here is that maintaining
> such a flat subtree is also nightmare.
>
> So i am only asking you to reconsider this change and give my -stable
> suggestion a thought.
>
> Thank you.
> Saeed.
Tom Herbert Jan. 16, 2017, 9:06 p.m. UTC | #13
On Mon, Jan 16, 2017 at 12:30 PM, Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
> On Sat, Jan 14, 2017 at 12:07 AM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Fri, Jan 13, 2017 at 7:14 PM, David Miller <davem@davemloft.net> wrote:
>>> From: Saeed Mahameed <saeedm@mellanox.com>
>>> Date: Thu, 12 Jan 2017 19:22:34 +0200
>>>
>>>> This pull request includes one patch from Leon, this patch as described
>>>> below will change the driver directory structure and layout for better,
>>>> logical and modular driver files separation.
>>>>
>>>> This change is important to both rdma and net maintainers in order to
>>>> have smoother management of driver patches for different mlx5 sub modules
>>>> and smoother rdma-next vs. net-next features submissions.
>>>>
>>>> Please find more info below -in the tag commit message-,
>>>> review and let us know if there's any problem.
>>>>
>>>> This change doesn't introduce any conflicts with the current mlx5
>>>> fixes and cleanups posted on 2017-01-10 to net branch, and merge tests
>>>> worked flawlessly with no issues.
>>>>
>>>> This is the last pull request meant for both rdma-next and net-next.
>>>> Once pulled, this will be the base shared code for both trees.
>>>
>>> This is pretty crazy, it will make all bug fix backporting to -stable
>>> a complete nightmare for myself, Doug, various distribution maintainers
>>> and many other people who quietly have to maintain their own trees and
>>> do backporting.
>>>
>>
>> I hear you,
>> But please bear with me here, what if we queue this patch up to -stable ? and we
>> (Mellanox) and specifically our dedicated inbox team, will make sure
>> that this patch
>> will land on the various distributions and for those maintaining their
>> own trees.
>> This patch is really straight forward (rename files) and I already
>> tried to cherry-pick it
>> on older kernels, I only got a couple of conflicts on some of the
>> "#inlcude" lines we've
>> changed, and they are pretty straightforward to fix, we can even avoid
>> this if we decide to
>> not move mlx5 header files in this phase.
>>
>> If this is possible then all trees will be aligned and it will be a
>> win win situation.
>>
>
> Hi Dave,
>
> Any chance you saw my -stable suggestion above ?
> I think it would really close the backporting gap.
>
> Sorry i am bothering you with this topic, but we really desire the new
> structure and
> I never got your feedback on this suggestion, so i would like to hear
> your thoughts.
>
Saeed,

I've already you specific suggestions on your new structure, please
consider your reviewers feedback more carefully. Again, the starting
point for your restructuring should be to separate out the minimum set
of files required to build reasonable driver and then cleanly
compartmentalize the rest of the features to make it easy for your
users to include or not include those in their build. Unless you've
done this I'm not seeing much benefit for this restructuring. Also, I
would rather see this done in one shot then expecting some sort of
evolution over time to the right solution-- as Dave said the
complexity of this driver is to far down the road to do that.

Tom

> Thanks,
> Saeed.
>
>>> I really don't think you can justify this rearrangement based upon the
>>> consequences and how much activity happens in this driver.
>>>
>>
>> Right, but this is not the only justification, I can sum it up to that
>> we would like
>> to lay out the foundation for many years to come for a well designed
>> driver with a modular
>> sub modules break down and scalable infrastructure. We already plan to
>> submit more mlx5
>> independent  sub modules - just like mlx5e (en_*) files and mlx5_ib
>> driver- so this was also
>> a reason for us to consider this re-engagement at this stage.
>>
>>> You should have thought long and hard about the layout a long time ago
>>> rather than after the driver has been in the tree for many years.
>>>
>>
>> I had this Idea for the separation before the mlx5 Ethernet
>> submission, but I wasn't the maintainer
>> back then, and i have been itching to submit such patch for long as
>> well, still i don't think
>> it is too late, We (Me and Leon) will keep maintaining this driver for
>> only god knows how many years to come,
>> and the mlx5 drivers are meant to serve at least 3-4 more future HW generations.
>>
>> Long story short, We would like to re-arrange the driver in a way that
>> would serve us (the maintainers) and serve those
>> who are going do develop the future Stack features and the future HW
>> generations over the well designed (Hopefully)
>> mlx5 infrastructure.
>> Keeping it as it is today, will only make the situation worst in the
>> future and it will be really hard to avoid having a spaghetti code
>> in the mlx5 driver. All i want to point out here is that maintaining
>> such a flat subtree is also nightmare.
>>
>> So i am only asking you to reconsider this change and give my -stable
>> suggestion a thought.
>>
>> Thank you.
>> Saeed.
Saeed Mahameed Jan. 16, 2017, 11:36 p.m. UTC | #14
On Mon, Jan 16, 2017 at 11:06 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Jan 16, 2017 at 12:30 PM, Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>> On Sat, Jan 14, 2017 at 12:07 AM, Saeed Mahameed
>> <saeedm@dev.mellanox.co.il> wrote:
>>> On Fri, Jan 13, 2017 at 7:14 PM, David Miller <davem@davemloft.net> wrote:
>>>> From: Saeed Mahameed <saeedm@mellanox.com>
>>>> Date: Thu, 12 Jan 2017 19:22:34 +0200
>>>>
>>>>> This pull request includes one patch from Leon, this patch as described
>>>>> below will change the driver directory structure and layout for better,
>>>>> logical and modular driver files separation.
>>>>>
>>>>> This change is important to both rdma and net maintainers in order to
>>>>> have smoother management of driver patches for different mlx5 sub modules
>>>>> and smoother rdma-next vs. net-next features submissions.
>>>>>
>>>>> Please find more info below -in the tag commit message-,
>>>>> review and let us know if there's any problem.
>>>>>
>>>>> This change doesn't introduce any conflicts with the current mlx5
>>>>> fixes and cleanups posted on 2017-01-10 to net branch, and merge tests
>>>>> worked flawlessly with no issues.
>>>>>
>>>>> This is the last pull request meant for both rdma-next and net-next.
>>>>> Once pulled, this will be the base shared code for both trees.
>>>>
>>>> This is pretty crazy, it will make all bug fix backporting to -stable
>>>> a complete nightmare for myself, Doug, various distribution maintainers
>>>> and many other people who quietly have to maintain their own trees and
>>>> do backporting.
>>>>
>>>
>>> I hear you,
>>> But please bear with me here, what if we queue this patch up to -stable ? and we
>>> (Mellanox) and specifically our dedicated inbox team, will make sure
>>> that this patch
>>> will land on the various distributions and for those maintaining their
>>> own trees.
>>> This patch is really straight forward (rename files) and I already
>>> tried to cherry-pick it
>>> on older kernels, I only got a couple of conflicts on some of the
>>> "#inlcude" lines we've
>>> changed, and they are pretty straightforward to fix, we can even avoid
>>> this if we decide to
>>> not move mlx5 header files in this phase.
>>>
>>> If this is possible then all trees will be aligned and it will be a
>>> win win situation.
>>>
>>
>> Hi Dave,
>>
>> Any chance you saw my -stable suggestion above ?
>> I think it would really close the backporting gap.
>>
>> Sorry i am bothering you with this topic, but we really desire the new
>> structure and
>> I never got your feedback on this suggestion, so i would like to hear
>> your thoughts.
>>
> Saeed,
>
> I've already you specific suggestions on your new structure, please
> consider your reviewers feedback more carefully. Again, the starting

I know, sorry i didn't respond on time, I though Leno's response was sufficient.

> point for your restructuring should be to separate out the minimum set
> of files required to build reasonable driver and then cleanly
> compartmentalize the rest of the features to make it easy for your
> users to include or not include those in their build. Unless you've

I see your point and i am 100% with you on this, we are all on the
same page here,
we all want to achieve the same goal -modular break down and features
logic separation-.

And in order to achieve this we need to do some big steps.
some of them will have real benefits and others are just moving code around.

One thing I need to point out (that Leon already mentioned) that the
current driver files sitting flat
in mlx5/core directory can be categorized to 4 categories
1. core related & init flows
2. common (infinband/ethernet resource management API)
3. infiniband related logic and FW commands
4. ethernet related logic and FW commands

so we figured that first we should split the bigger chunks (ethernet
vs. inifniband) into their own directories.
then i can follow up with ethernet features break down and kconfig separation.

> done this I'm not seeing much benefit for this restructuring. Also, I
> would rather see this done in one shot then expecting some sort of
> evolution over time to the right solution-- as Dave said the
> complexity of this driver is to far down the road to do that.
>

Well, I also prefer to do this in one shot, but I don't want to start
wasting time doing such a huge change
then get a NACK on submission.

Let's assume we did this in one shot i.e:
1. restructure  files and directories
2. re-arrange code (separate logic)
3. Kconfig features control - to show the immediate benefit of this change

If this is acceptable by you and we get Dave's blessing on this then i
would be happy to provide (as this is our
long term plan), but if Dave doesn't approve, then ... I don't really know.

Also for 1. backporting is really easy since such patch can be applied
easily to any kernel release back to 4.4.
but 2. and 3. are really tricky ones.

> Tom
>
>> Thanks,
>> Saeed.
>>
>>>> I really don't think you can justify this rearrangement based upon the
>>>> consequences and how much activity happens in this driver.
>>>>
>>>
>>> Right, but this is not the only justification, I can sum it up to that
>>> we would like
>>> to lay out the foundation for many years to come for a well designed
>>> driver with a modular
>>> sub modules break down and scalable infrastructure. We already plan to
>>> submit more mlx5
>>> independent  sub modules - just like mlx5e (en_*) files and mlx5_ib
>>> driver- so this was also
>>> a reason for us to consider this re-engagement at this stage.
>>>
>>>> You should have thought long and hard about the layout a long time ago
>>>> rather than after the driver has been in the tree for many years.
>>>>
>>>
>>> I had this Idea for the separation before the mlx5 Ethernet
>>> submission, but I wasn't the maintainer
>>> back then, and i have been itching to submit such patch for long as
>>> well, still i don't think
>>> it is too late, We (Me and Leon) will keep maintaining this driver for
>>> only god knows how many years to come,
>>> and the mlx5 drivers are meant to serve at least 3-4 more future HW generations.
>>>
>>> Long story short, We would like to re-arrange the driver in a way that
>>> would serve us (the maintainers) and serve those
>>> who are going do develop the future Stack features and the future HW
>>> generations over the well designed (Hopefully)
>>> mlx5 infrastructure.
>>> Keeping it as it is today, will only make the situation worst in the
>>> future and it will be really hard to avoid having a spaghetti code
>>> in the mlx5 driver. All i want to point out here is that maintaining
>>> such a flat subtree is also nightmare.
>>>
>>> So i am only asking you to reconsider this change and give my -stable
>>> suggestion a thought.
>>>
>>> Thank you.
>>> Saeed.
David Miller Jan. 18, 2017, 5:32 p.m. UTC | #15
From: Saeed Mahameed <saeedm@dev.mellanox.co.il>
Date: Mon, 16 Jan 2017 22:30:29 +0200

>> But please bear with me here, what if we queue this patch up to -stable ?

You've got to be seriously kidding me that your idea is to submit an
incredibly diruptive driver rename to -stable to solve this problem.

That is completely a non-starter.

The whole idea is to _MINIMIZE_ the amount of change happening in
-stable in order to avoid regressions and negative consequence for
everyone using the -stable tree.

I am strongly against a major reorganization of this driver, sorry.
The Synopsys folks want to do the same thing for the stmmac driver,
for even more nefarious reasons, and I'm rejecting all of their
attempts to do that as well.

If you look in that thread, they said they would "help" with the
-stable backports, and in there I explained why that is a completely
empty gesture.  There are people doing the backports outside of your
spehere of influence, who need to get their work done "right now" and
aren't going to consult you and be on your schedule for doing those
backports.
Tom Herbert Jan. 19, 2017, 5:22 a.m. UTC | #16
On Wed, Jan 18, 2017 at 9:32 AM, David Miller <davem@davemloft.net> wrote:
> From: Saeed Mahameed <saeedm@dev.mellanox.co.il>
> Date: Mon, 16 Jan 2017 22:30:29 +0200
>
>>> But please bear with me here, what if we queue this patch up to -stable ?
>
> You've got to be seriously kidding me that your idea is to submit an
> incredibly diruptive driver rename to -stable to solve this problem.
>
> That is completely a non-starter.
>
> The whole idea is to _MINIMIZE_ the amount of change happening in
> -stable in order to avoid regressions and negative consequence for
> everyone using the -stable tree.
>
> I am strongly against a major reorganization of this driver, sorry.
> The Synopsys folks want to do the same thing for the stmmac driver,
> for even more nefarious reasons, and I'm rejecting all of their
> attempts to do that as well.
>
> If you look in that thread, they said they would "help" with the
> -stable backports, and in there I explained why that is a completely
> empty gesture.  There are people doing the backports outside of your
> spehere of influence, who need to get their work done "right now" and
> aren't going to consult you and be on your schedule for doing those
> backports.

David,

In principle I  agree, but as I previously wrote backporting this
driver is already a bear. We need to do something here as this is
quickly approaching being infeasible to backport. Even if we don't
restructure the directories I'd still like to see some effort towards
feature modularization and isolation.

Tom
David Miller Jan. 19, 2017, 5:38 a.m. UTC | #17
From: Tom Herbert <tom@herbertland.com>
Date: Wed, 18 Jan 2017 21:22:53 -0800

> In principle I  agree, but as I previously wrote backporting this
> driver is already a bear. We need to do something here as this is
> quickly approaching being infeasible to backport. Even if we don't
> restructure the directories I'd still like to see some effort towards
> feature modularization and isolation.

There simply is a lot changing all the time, no restructuring is going
to reduce that.  The driver simply gets a lot of churn.

Backporting such things through a bunch of file and/or directory
renames is seriously nothing but asking for more punishment.

That's why I am against it.
Doug Ledford Jan. 19, 2017, 6:32 a.m. UTC | #18
On Thu, 2017-01-19 at 00:38 -0500, David Miller wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Wed, 18 Jan 2017 21:22:53 -0800
> 
> > In principle I  agree, but as I previously wrote backporting this
> > driver is already a bear. We need to do something here as this is
> > quickly approaching being infeasible to backport. Even if we don't
> > restructure the directories I'd still like to see some effort
> towards
> > feature modularization and isolation.
> 
> There simply is a lot changing all the time, no restructuring is
> going
> to reduce that.  The driver simply gets a lot of churn.
> 
> Backporting such things through a bunch of file and/or directory
> renames is seriously nothing but asking for more punishment.
> 
> That's why I am against it.

So, I spent the better part of two years dealing with this after the
drivers/net/mellanox drivers/net/ethernet/mellanox rename, so I'll
chime in.  Especially since Tom's problem directly relates to the
issues I dealt with when trying to take the latest Mellanox driver and
shoe-horn it into a Red Hat kernel that didn't have the same net core
(I had to pick and choose which mlx patches to take in order to work
with our existing net core, and the files were renamed on top of that).

If you rename a file from place A to place B and keep the file the
same, then a git backport can be done using techniques such as git
cherry-pick with the --follow option enabled.  It's slow, but works.

But that's not what you've brought up.  You brought up continuing the
rename to go further and split code out into files based upon features.
 While git cherry-pick can follow renames mostly automatically, it can
not follow refactors.  If they move the tcflower stuff out to a file on
its own, and you then attempt to backport some fix that touches the
tcflower support in the new location while you have not taken the move
patch, you will find that git will be totally stymied.

What you have now is not perfect and you have to sort through commits
to get the right ones and skip the ones you can't support, but things
mostly apply except when context diffs break issues.  With the refactor
and separation you speak of, git will be totally unable to help and you
will have to manually edit patches to point at new files or hand edit
the files to apply the patches.

I understand the desire for order instead of chaos, but it really will
not help.  Bringing patches from the new order back to the old chaos
will be so much harder than just continuing as things are that you will
silently curse the change shortly after it is made.