Message ID | 1463511831-13684-1-git-send-email-jarod@redhat.com |
---|---|
State | Superseded |
Delegated to: | Jeff Kirsher |
Headers | show |
On Tue, 2016-05-17 at 15:03 -0400, Jarod Wilson wrote: > I've got a bug report about an e1000e interface, where a vlan interface > is > set up on top of it: > > $ ip link add link ens1f0 name ens1f0.99 type vlan id 99 > $ ip link set ens1f0 up > $ ip link set ens1f0.99 up > $ ip addr add 192.168.99.92 dev ens1f0.99 > > At this point, I can ping another host on vlan 99, ip 192.168.99.91. > However, if I do the following: > > $ ethtool -K ens1f0 rxvlan off > > Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on > again. I'm not sure if this is actually intended behavior, or if there's > a > lack of software vlan stripping fallback, or what, but things continue to > work if I simply don't call e1000e_vlan_strip_disable() if there are > active vlans (plagiarizing a function from the e1000 driver here) on the > interface. > > Also slipped a related-ish fix to the kerneldoc text for > e1000e_vlan_strip_disable here... > > CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> > CC: intel-wired-lan@lists.osuosl.org > CC: netdev@vger.kernel.org > Signed-off-by: Jarod Wilson <jarod@redhat.com> > --- > drivers/net/ethernet/intel/e1000e/netdev.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) Raanan, please review this patch. Even though it is an RFC I will be adding it to my queue for testing. http://patchwork.ozlabs.org/patch/623238/
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On > Behalf Of Jeff Kirsher > Sent: Wednesday, May 18, 2016 2:40 PM > To: Jarod Wilson <jarod@redhat.com>; linux-kernel@vger.kernel.org; > Avargil, Raanan <raanan.avargil@intel.com> > Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org > Subject: Re: [Intel-wired-lan] [RFC PATCH net] e1000e: keep vlan interfaces > functional after rxvlan off > > On Tue, 2016-05-17 at 15:03 -0400, Jarod Wilson wrote: > > I've got a bug report about an e1000e interface, where a vlan interface > > is > > set up on top of it: > > > > $ ip link add link ens1f0 name ens1f0.99 type vlan id 99 > > $ ip link set ens1f0 up > > $ ip link set ens1f0.99 up > > $ ip addr add 192.168.99.92 dev ens1f0.99 > > > > At this point, I can ping another host on vlan 99, ip 192.168.99.91. > > However, if I do the following: > > > > $ ethtool -K ens1f0 rxvlan off > > > > Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on > > again. I'm not sure if this is actually intended behavior, or if there's > > a > > lack of software vlan stripping fallback, or what, but things continue to > > work if I simply don't call e1000e_vlan_strip_disable() if there are > > active vlans (plagiarizing a function from the e1000 driver here) on the > > interface. > > > > Also slipped a related-ish fix to the kerneldoc text for > > e1000e_vlan_strip_disable here... > > > > CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> > > CC: intel-wired-lan@lists.osuosl.org > > CC: netdev@vger.kernel.org > > Signed-off-by: Jarod Wilson <jarod@redhat.com> > > --- > > drivers/net/ethernet/intel/e1000e/netdev.c | 15 +++++++++++++-- > > 1 file changed, 13 insertions(+), 2 deletions(-) > > Raanan, please review this patch. Even though it is an RFC I will be > adding it to my queue for testing. > http://patchwork.ozlabs.org/patch/623238/ Yup, without this patch disabling rxvlan offload does indeed break vlan connectivity and with the patch I can disable and re-enable rxvlan offloads as much as I care to. It also makes it through my regression tests without problems. Tested-by: Aaron Brown <aaron.f.brown@intel.com> This is from functional - does it work - testing perspective so review is probably still in order.
>-----Original Message----- >From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On >Behalf Of Brown, Aaron F >Sent: Friday, 27 May, 2016 04:31 >To: Kirsher, Jeffrey T; Jarod Wilson; linux-kernel@vger.kernel.org; Avargil, >Raanan >Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org >Subject: Re: [Intel-wired-lan] [RFC PATCH net] e1000e: keep vlan interfaces >functional after rxvlan off > >> From: Intel-wired-lan >> [mailto:intel-wired-lan-bounces@lists.osuosl.org] On Behalf Of Jeff >> Kirsher >> Sent: Wednesday, May 18, 2016 2:40 PM >> To: Jarod Wilson <jarod@redhat.com>; linux-kernel@vger.kernel.org; >> Avargil, Raanan <raanan.avargil@intel.com> >> Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org >> Subject: Re: [Intel-wired-lan] [RFC PATCH net] e1000e: keep vlan >> interfaces functional after rxvlan off >> >> On Tue, 2016-05-17 at 15:03 -0400, Jarod Wilson wrote: >> > I've got a bug report about an e1000e interface, where a vlan >> > interface is set up on top of it: >> > >> > $ ip link add link ens1f0 name ens1f0.99 type vlan id 99 $ ip link >> > set ens1f0 up $ ip link set ens1f0.99 up $ ip addr add 192.168.99.92 >> > dev ens1f0.99 >> > >> > At this point, I can ping another host on vlan 99, ip 192.168.99.91. >> > However, if I do the following: >> > >> > $ ethtool -K ens1f0 rxvlan off >> > >> > Then no traffic passes on ens1f0.99. It comes back if I toggle >> > rxvlan on again. I'm not sure if this is actually intended behavior, >> > or if there's a lack of software vlan stripping fallback, or what, >> > but things continue to work if I simply don't call >> > e1000e_vlan_strip_disable() if there are active vlans (plagiarizing >> > a function from the e1000 driver here) on the interface. >> > >> > Also slipped a related-ish fix to the kerneldoc text for >> > e1000e_vlan_strip_disable here... >> > >> > CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> >> > CC: intel-wired-lan@lists.osuosl.org >> > CC: netdev@vger.kernel.org >> > Signed-off-by: Jarod Wilson <jarod@redhat.com> >> > --- >> > drivers/net/ethernet/intel/e1000e/netdev.c | 15 +++++++++++++-- >> > 1 file changed, 13 insertions(+), 2 deletions(-) >> >> Raanan, please review this patch. Even though it is an RFC I will be >> adding it to my queue for testing. >> http://patchwork.ozlabs.org/patch/623238/ > >Yup, without this patch disabling rxvlan offload does indeed break vlan >connectivity and with the patch I can disable and re-enable rxvlan offloads as >much as I care to. It also makes it through my regression tests without >problems. Well, what the patch essentially does is block disabling RX VLAN stripping if any VLANs are active, so there is nothing surprising there. You are essentially running with VLAN offloads enabled all the time. :) > >Tested-by: Aaron Brown <aaron.f.brown@intel.com> > >This is from functional - does it work - testing perspective so review is >probably still in order. I am not sure if this behavior is by design or not, and whether un-stripped VLAN traffic is being blocked by the HW, the driver or the stack. I have two questions: 1) Does the unpatched behavior change if promiscuous RX is set in the HW? 2) Is the behavior different in other devices (igb, ixgbe)? Do they allow VLAN traffic to pass even if VLAN offloads are disabled? I believe that the e1000e behavior should be consistent with other devices, whatever that behavior is. --------------------------------------------------------------------- Intel Israel (74) Limited This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
On Tue, May 17, 2016 at 12:03 PM, Jarod Wilson <jarod@redhat.com> wrote: > I've got a bug report about an e1000e interface, where a vlan interface is > set up on top of it: > > $ ip link add link ens1f0 name ens1f0.99 type vlan id 99 > $ ip link set ens1f0 up > $ ip link set ens1f0.99 up > $ ip addr add 192.168.99.92 dev ens1f0.99 > > At this point, I can ping another host on vlan 99, ip 192.168.99.91. > However, if I do the following: > > $ ethtool -K ens1f0 rxvlan off > > Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on > again. I'm not sure if this is actually intended behavior, or if there's a > lack of software vlan stripping fallback, or what, but things continue to > work if I simply don't call e1000e_vlan_strip_disable() if there are > active vlans (plagiarizing a function from the e1000 driver here) on the > interface. > > Also slipped a related-ish fix to the kerneldoc text for > e1000e_vlan_strip_disable here... > > CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> > CC: intel-wired-lan@lists.osuosl.org > CC: netdev@vger.kernel.org > Signed-off-by: Jarod Wilson <jarod@redhat.com> > --- > drivers/net/ethernet/intel/e1000e/netdev.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c > index 75e6089..73f7452 100644 > --- a/drivers/net/ethernet/intel/e1000e/netdev.c > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c > @@ -154,6 +154,16 @@ void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val) > writel(val, hw->hw_addr + reg); > } > > +static bool e1000e_vlan_used(struct e1000_adapter *adapter) > +{ > + u16 vid; > + > + for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID) > + return true; > + I'm pretty sure this is always going to return true if 8021q is loaded because VLAN 0 is always added to the device even if no other VLANs are in use. > + return false; > +} > + > /** > * e1000_regdump - register printout routine > * @hw: pointer to the HW structure > @@ -2789,7 +2799,7 @@ static void e1000e_vlan_filter_enable(struct e1000_adapter *adapter) > } > > /** > - * e1000e_vlan_strip_enable - helper to disable HW VLAN stripping > + * e1000e_vlan_strip_disable - helper to disable HW VLAN stripping > * @adapter: board private structure to initialize > **/ > static void e1000e_vlan_strip_disable(struct e1000_adapter *adapter) > @@ -3443,7 +3453,8 @@ static void e1000e_set_rx_mode(struct net_device *netdev) > > ew32(RCTL, rctl); > > - if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX) > + if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX || > + e1000e_vlan_used(adapter)) > e1000e_vlan_strip_enable(adapter); > else > e1000e_vlan_strip_disable(adapter); So if the VLAN tag stripping is disabled what happens that is causing the VLAN test to fail? It sounds like this might be working around a kernel bug where a VLAN created on a device that supports hardware tag stripping only supports hardware tag stripping. Maybe a better fix would be to add a fall back so if the VLAN tag is in the frame instead of stripped it still makes it to the correct spot. - Alex
On Wed, Jun 01, 2016 at 07:31:53AM -0700, Alexander Duyck wrote: > On Tue, May 17, 2016 at 12:03 PM, Jarod Wilson <jarod@redhat.com> wrote: > > I've got a bug report about an e1000e interface, where a vlan interface is > > set up on top of it: > > > > $ ip link add link ens1f0 name ens1f0.99 type vlan id 99 > > $ ip link set ens1f0 up > > $ ip link set ens1f0.99 up > > $ ip addr add 192.168.99.92 dev ens1f0.99 > > > > At this point, I can ping another host on vlan 99, ip 192.168.99.91. > > However, if I do the following: > > > > $ ethtool -K ens1f0 rxvlan off > > > > Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on > > again. I'm not sure if this is actually intended behavior, or if there's a > > lack of software vlan stripping fallback, or what, but things continue to > > work if I simply don't call e1000e_vlan_strip_disable() if there are > > active vlans (plagiarizing a function from the e1000 driver here) on the > > interface. > > > > Also slipped a related-ish fix to the kerneldoc text for > > e1000e_vlan_strip_disable here... > > > > CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> > > CC: intel-wired-lan@lists.osuosl.org > > CC: netdev@vger.kernel.org > > Signed-off-by: Jarod Wilson <jarod@redhat.com> > > --- > > drivers/net/ethernet/intel/e1000e/netdev.c | 15 +++++++++++++-- > > 1 file changed, 13 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c > > index 75e6089..73f7452 100644 > > --- a/drivers/net/ethernet/intel/e1000e/netdev.c > > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c > > @@ -154,6 +154,16 @@ void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val) > > writel(val, hw->hw_addr + reg); > > } > > > > +static bool e1000e_vlan_used(struct e1000_adapter *adapter) > > +{ > > + u16 vid; > > + > > + for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID) > > + return true; > > + > > I'm pretty sure this is always going to return true if 8021q is loaded > because VLAN 0 is always added to the device even if no other VLANs > are in use. Ah, hadn't considered that, I just plucked it straight from e1000. > > + return false; > > +} > > + > > /** > > * e1000_regdump - register printout routine > > * @hw: pointer to the HW structure > > @@ -2789,7 +2799,7 @@ static void e1000e_vlan_filter_enable(struct e1000_adapter *adapter) > > } > > > > /** > > - * e1000e_vlan_strip_enable - helper to disable HW VLAN stripping > > + * e1000e_vlan_strip_disable - helper to disable HW VLAN stripping > > * @adapter: board private structure to initialize > > **/ > > static void e1000e_vlan_strip_disable(struct e1000_adapter *adapter) > > @@ -3443,7 +3453,8 @@ static void e1000e_set_rx_mode(struct net_device *netdev) > > > > ew32(RCTL, rctl); > > > > - if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX) > > + if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX || > > + e1000e_vlan_used(adapter)) > > e1000e_vlan_strip_enable(adapter); > > else > > e1000e_vlan_strip_disable(adapter); > > So if the VLAN tag stripping is disabled what happens that is causing > the VLAN test to fail? It sounds like this might be working around a > kernel bug where a VLAN created on a device that supports hardware tag > stripping only supports hardware tag stripping. Maybe a better fix > would be to add a fall back so if the VLAN tag is in the frame instead > of stripped it still makes it to the correct spot. That's the main reason I labeled it as an RFC -- I wasn't sure exactly how things were intended to work when the hardware stripping was disabled. It seems quite plausible to me that this patch simply papers over the real bug: lack of a functional software fallback. I'm not particularly up on the vlan code just yet though, so I'm not yet sure where to poke next. Suggestions welcomed. :)
On Wed, Jun 1, 2016 at 12:27 PM, Jarod Wilson <jarod@redhat.com> wrote: > On Wed, Jun 01, 2016 at 07:31:53AM -0700, Alexander Duyck wrote: >> On Tue, May 17, 2016 at 12:03 PM, Jarod Wilson <jarod@redhat.com> wrote: >> > I've got a bug report about an e1000e interface, where a vlan interface is >> > set up on top of it: >> > >> > $ ip link add link ens1f0 name ens1f0.99 type vlan id 99 >> > $ ip link set ens1f0 up >> > $ ip link set ens1f0.99 up >> > $ ip addr add 192.168.99.92 dev ens1f0.99 >> > >> > At this point, I can ping another host on vlan 99, ip 192.168.99.91. >> > However, if I do the following: >> > >> > $ ethtool -K ens1f0 rxvlan off >> > >> > Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on >> > again. I'm not sure if this is actually intended behavior, or if there's a >> > lack of software vlan stripping fallback, or what, but things continue to >> > work if I simply don't call e1000e_vlan_strip_disable() if there are >> > active vlans (plagiarizing a function from the e1000 driver here) on the >> > interface. >> > >> > Also slipped a related-ish fix to the kerneldoc text for >> > e1000e_vlan_strip_disable here... >> > >> > CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> >> > CC: intel-wired-lan@lists.osuosl.org >> > CC: netdev@vger.kernel.org >> > Signed-off-by: Jarod Wilson <jarod@redhat.com> >> > --- >> > drivers/net/ethernet/intel/e1000e/netdev.c | 15 +++++++++++++-- >> > 1 file changed, 13 insertions(+), 2 deletions(-) >> > >> > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c >> > index 75e6089..73f7452 100644 >> > --- a/drivers/net/ethernet/intel/e1000e/netdev.c >> > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c >> > @@ -154,6 +154,16 @@ void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val) >> > writel(val, hw->hw_addr + reg); >> > } >> > >> > +static bool e1000e_vlan_used(struct e1000_adapter *adapter) >> > +{ >> > + u16 vid; >> > + >> > + for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID) >> > + return true; >> > + >> >> I'm pretty sure this is always going to return true if 8021q is loaded >> because VLAN 0 is always added to the device even if no other VLANs >> are in use. > > Ah, hadn't considered that, I just plucked it straight from e1000. > >> > + return false; >> > +} >> > + >> > /** >> > * e1000_regdump - register printout routine >> > * @hw: pointer to the HW structure >> > @@ -2789,7 +2799,7 @@ static void e1000e_vlan_filter_enable(struct e1000_adapter *adapter) >> > } >> > >> > /** >> > - * e1000e_vlan_strip_enable - helper to disable HW VLAN stripping >> > + * e1000e_vlan_strip_disable - helper to disable HW VLAN stripping >> > * @adapter: board private structure to initialize >> > **/ >> > static void e1000e_vlan_strip_disable(struct e1000_adapter *adapter) >> > @@ -3443,7 +3453,8 @@ static void e1000e_set_rx_mode(struct net_device *netdev) >> > >> > ew32(RCTL, rctl); >> > >> > - if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX) >> > + if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX || >> > + e1000e_vlan_used(adapter)) >> > e1000e_vlan_strip_enable(adapter); >> > else >> > e1000e_vlan_strip_disable(adapter); >> >> So if the VLAN tag stripping is disabled what happens that is causing >> the VLAN test to fail? It sounds like this might be working around a >> kernel bug where a VLAN created on a device that supports hardware tag >> stripping only supports hardware tag stripping. Maybe a better fix >> would be to add a fall back so if the VLAN tag is in the frame instead >> of stripped it still makes it to the correct spot. > > That's the main reason I labeled it as an RFC -- I wasn't sure exactly how > things were intended to work when the hardware stripping was disabled. It > seems quite plausible to me that this patch simply papers over the real > bug: lack of a functional software fallback. I'm not particularly up on > the vlan code just yet though, so I'm not yet sure where to poke next. > Suggestions welcomed. :) Well the software fallback should be the call to skb_vlan_untag in __netif_receive_skb_core. If that isn't being triggered then we should probably be fixing the files in the core code then. - Alex
On Wed, Jun 01, 2016 at 03:31:46PM -0700, Alexander Duyck wrote: > On Wed, Jun 1, 2016 at 12:27 PM, Jarod Wilson <jarod@redhat.com> wrote: > > On Wed, Jun 01, 2016 at 07:31:53AM -0700, Alexander Duyck wrote: > >> On Tue, May 17, 2016 at 12:03 PM, Jarod Wilson <jarod@redhat.com> wrote: > >> > I've got a bug report about an e1000e interface, where a vlan interface is > >> > set up on top of it: > >> > > >> > $ ip link add link ens1f0 name ens1f0.99 type vlan id 99 > >> > $ ip link set ens1f0 up > >> > $ ip link set ens1f0.99 up > >> > $ ip addr add 192.168.99.92 dev ens1f0.99 > >> > > >> > At this point, I can ping another host on vlan 99, ip 192.168.99.91. > >> > However, if I do the following: > >> > > >> > $ ethtool -K ens1f0 rxvlan off > >> > > >> > Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on > >> > again. I'm not sure if this is actually intended behavior, or if there's a > >> > lack of software vlan stripping fallback, or what, but things continue to > >> > work if I simply don't call e1000e_vlan_strip_disable() if there are > >> > active vlans (plagiarizing a function from the e1000 driver here) on the > >> > interface. > >> > > >> > Also slipped a related-ish fix to the kerneldoc text for > >> > e1000e_vlan_strip_disable here... > >> > > >> > CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> > >> > CC: intel-wired-lan@lists.osuosl.org > >> > CC: netdev@vger.kernel.org > >> > Signed-off-by: Jarod Wilson <jarod@redhat.com> > >> > --- > >> > drivers/net/ethernet/intel/e1000e/netdev.c | 15 +++++++++++++-- > >> > 1 file changed, 13 insertions(+), 2 deletions(-) > >> > > >> > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c > >> > index 75e6089..73f7452 100644 > >> > --- a/drivers/net/ethernet/intel/e1000e/netdev.c > >> > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c > >> > @@ -154,6 +154,16 @@ void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val) > >> > writel(val, hw->hw_addr + reg); > >> > } > >> > > >> > +static bool e1000e_vlan_used(struct e1000_adapter *adapter) > >> > +{ > >> > + u16 vid; > >> > + > >> > + for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID) > >> > + return true; > >> > + > >> > >> I'm pretty sure this is always going to return true if 8021q is loaded > >> because VLAN 0 is always added to the device even if no other VLANs > >> are in use. > > > > Ah, hadn't considered that, I just plucked it straight from e1000. > > > >> > + return false; > >> > +} > >> > + > >> > /** > >> > * e1000_regdump - register printout routine > >> > * @hw: pointer to the HW structure > >> > @@ -2789,7 +2799,7 @@ static void e1000e_vlan_filter_enable(struct e1000_adapter *adapter) > >> > } > >> > > >> > /** > >> > - * e1000e_vlan_strip_enable - helper to disable HW VLAN stripping > >> > + * e1000e_vlan_strip_disable - helper to disable HW VLAN stripping > >> > * @adapter: board private structure to initialize > >> > **/ > >> > static void e1000e_vlan_strip_disable(struct e1000_adapter *adapter) > >> > @@ -3443,7 +3453,8 @@ static void e1000e_set_rx_mode(struct net_device *netdev) > >> > > >> > ew32(RCTL, rctl); > >> > > >> > - if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX) > >> > + if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX || > >> > + e1000e_vlan_used(adapter)) > >> > e1000e_vlan_strip_enable(adapter); > >> > else > >> > e1000e_vlan_strip_disable(adapter); > >> > >> So if the VLAN tag stripping is disabled what happens that is causing > >> the VLAN test to fail? It sounds like this might be working around a > >> kernel bug where a VLAN created on a device that supports hardware tag > >> stripping only supports hardware tag stripping. Maybe a better fix > >> would be to add a fall back so if the VLAN tag is in the frame instead > >> of stripped it still makes it to the correct spot. > > > > That's the main reason I labeled it as an RFC -- I wasn't sure exactly how > > things were intended to work when the hardware stripping was disabled. It > > seems quite plausible to me that this patch simply papers over the real > > bug: lack of a functional software fallback. I'm not particularly up on > > the vlan code just yet though, so I'm not yet sure where to poke next. > > Suggestions welcomed. :) > > Well the software fallback should be the call to skb_vlan_untag in > __netif_receive_skb_core. If that isn't being triggered then we > should probably be fixing the files in the core code then. So I've been poking at a debug-spew-laden kernel some, and skb_vlan_untag is most definitely not getting called when rxvlan off is set. I'm still poking around, trying to track down where things are going askew.
On Thu, Jun 09, 2016 at 02:02:04PM -0400, Jarod Wilson wrote: > On Wed, Jun 01, 2016 at 03:31:46PM -0700, Alexander Duyck wrote: > > On Wed, Jun 1, 2016 at 12:27 PM, Jarod Wilson <jarod@redhat.com> wrote: > > > On Wed, Jun 01, 2016 at 07:31:53AM -0700, Alexander Duyck wrote: > > >> On Tue, May 17, 2016 at 12:03 PM, Jarod Wilson <jarod@redhat.com> wrote: > > >> > I've got a bug report about an e1000e interface, where a vlan interface is > > >> > set up on top of it: > > >> > > > >> > $ ip link add link ens1f0 name ens1f0.99 type vlan id 99 > > >> > $ ip link set ens1f0 up > > >> > $ ip link set ens1f0.99 up > > >> > $ ip addr add 192.168.99.92 dev ens1f0.99 > > >> > > > >> > At this point, I can ping another host on vlan 99, ip 192.168.99.91. > > >> > However, if I do the following: > > >> > > > >> > $ ethtool -K ens1f0 rxvlan off > > >> > > > >> > Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on > > >> > again. I'm not sure if this is actually intended behavior, or if there's a > > >> > lack of software vlan stripping fallback, or what, but things continue to > > >> > work if I simply don't call e1000e_vlan_strip_disable() if there are > > >> > active vlans (plagiarizing a function from the e1000 driver here) on the > > >> > interface. > > >> > > > >> > Also slipped a related-ish fix to the kerneldoc text for > > >> > e1000e_vlan_strip_disable here... > > >> > > > >> > CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> > > >> > CC: intel-wired-lan@lists.osuosl.org > > >> > CC: netdev@vger.kernel.org > > >> > Signed-off-by: Jarod Wilson <jarod@redhat.com> > > >> > --- > > >> > drivers/net/ethernet/intel/e1000e/netdev.c | 15 +++++++++++++-- > > >> > 1 file changed, 13 insertions(+), 2 deletions(-) > > >> > > > >> > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c > > >> > index 75e6089..73f7452 100644 > > >> > --- a/drivers/net/ethernet/intel/e1000e/netdev.c > > >> > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c > > >> > @@ -154,6 +154,16 @@ void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val) > > >> > writel(val, hw->hw_addr + reg); > > >> > } > > >> > > > >> > +static bool e1000e_vlan_used(struct e1000_adapter *adapter) > > >> > +{ > > >> > + u16 vid; > > >> > + > > >> > + for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID) > > >> > + return true; > > >> > + > > >> > > >> I'm pretty sure this is always going to return true if 8021q is loaded > > >> because VLAN 0 is always added to the device even if no other VLANs > > >> are in use. > > > > > > Ah, hadn't considered that, I just plucked it straight from e1000. > > > > > >> > + return false; > > >> > +} > > >> > + > > >> > /** > > >> > * e1000_regdump - register printout routine > > >> > * @hw: pointer to the HW structure > > >> > @@ -2789,7 +2799,7 @@ static void e1000e_vlan_filter_enable(struct e1000_adapter *adapter) > > >> > } > > >> > > > >> > /** > > >> > - * e1000e_vlan_strip_enable - helper to disable HW VLAN stripping > > >> > + * e1000e_vlan_strip_disable - helper to disable HW VLAN stripping > > >> > * @adapter: board private structure to initialize > > >> > **/ > > >> > static void e1000e_vlan_strip_disable(struct e1000_adapter *adapter) > > >> > @@ -3443,7 +3453,8 @@ static void e1000e_set_rx_mode(struct net_device *netdev) > > >> > > > >> > ew32(RCTL, rctl); > > >> > > > >> > - if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX) > > >> > + if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX || > > >> > + e1000e_vlan_used(adapter)) > > >> > e1000e_vlan_strip_enable(adapter); > > >> > else > > >> > e1000e_vlan_strip_disable(adapter); > > >> > > >> So if the VLAN tag stripping is disabled what happens that is causing > > >> the VLAN test to fail? It sounds like this might be working around a > > >> kernel bug where a VLAN created on a device that supports hardware tag > > >> stripping only supports hardware tag stripping. Maybe a better fix > > >> would be to add a fall back so if the VLAN tag is in the frame instead > > >> of stripped it still makes it to the correct spot. > > > > > > That's the main reason I labeled it as an RFC -- I wasn't sure exactly how > > > things were intended to work when the hardware stripping was disabled. It > > > seems quite plausible to me that this patch simply papers over the real > > > bug: lack of a functional software fallback. I'm not particularly up on > > > the vlan code just yet though, so I'm not yet sure where to poke next. > > > Suggestions welcomed. :) > > > > Well the software fallback should be the call to skb_vlan_untag in > > __netif_receive_skb_core. If that isn't being triggered then we > > should probably be fixing the files in the core code then. > > So I've been poking at a debug-spew-laden kernel some, and skb_vlan_untag > is most definitely not getting called when rxvlan off is set. I'm still > poking around, trying to track down where things are going askew. Okay, so rxvlan is *supposed* to only impact the rx path, right? It's looking like it is actually impacting the tx path too here. I actually *do* see calls to skb_vlan_untag with rxvlan off, if I ping from an external host, so it seems only the packets from the host with rxvlan toggled off aren't escaping correctly for some reason.
On Thu, Jun 09, 2016 at 04:55:01PM -0400, Jarod Wilson wrote: > On Thu, Jun 09, 2016 at 02:02:04PM -0400, Jarod Wilson wrote: > > On Wed, Jun 01, 2016 at 03:31:46PM -0700, Alexander Duyck wrote: > > > On Wed, Jun 1, 2016 at 12:27 PM, Jarod Wilson <jarod@redhat.com> wrote: > > > > On Wed, Jun 01, 2016 at 07:31:53AM -0700, Alexander Duyck wrote: > > > >> On Tue, May 17, 2016 at 12:03 PM, Jarod Wilson <jarod@redhat.com> wrote: > > > >> > I've got a bug report about an e1000e interface, where a vlan interface is > > > >> > set up on top of it: > > > >> > > > > >> > $ ip link add link ens1f0 name ens1f0.99 type vlan id 99 > > > >> > $ ip link set ens1f0 up > > > >> > $ ip link set ens1f0.99 up > > > >> > $ ip addr add 192.168.99.92 dev ens1f0.99 > > > >> > > > > >> > At this point, I can ping another host on vlan 99, ip 192.168.99.91. > > > >> > However, if I do the following: > > > >> > > > > >> > $ ethtool -K ens1f0 rxvlan off > > > >> > > > > >> > Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on > > > >> > again. I'm not sure if this is actually intended behavior, or if there's a > > > >> > lack of software vlan stripping fallback, or what, but things continue to > > > >> > work if I simply don't call e1000e_vlan_strip_disable() if there are > > > >> > active vlans (plagiarizing a function from the e1000 driver here) on the > > > >> > interface. > > > >> > > > > >> > Also slipped a related-ish fix to the kerneldoc text for > > > >> > e1000e_vlan_strip_disable here... > > > >> > > > > >> > CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> > > > >> > CC: intel-wired-lan@lists.osuosl.org > > > >> > CC: netdev@vger.kernel.org > > > >> > Signed-off-by: Jarod Wilson <jarod@redhat.com> > > > >> > --- > > > >> > drivers/net/ethernet/intel/e1000e/netdev.c | 15 +++++++++++++-- > > > >> > 1 file changed, 13 insertions(+), 2 deletions(-) > > > >> > > > > >> > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c > > > >> > index 75e6089..73f7452 100644 > > > >> > --- a/drivers/net/ethernet/intel/e1000e/netdev.c > > > >> > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c > > > >> > @@ -154,6 +154,16 @@ void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val) > > > >> > writel(val, hw->hw_addr + reg); > > > >> > } > > > >> > > > > >> > +static bool e1000e_vlan_used(struct e1000_adapter *adapter) > > > >> > +{ > > > >> > + u16 vid; > > > >> > + > > > >> > + for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID) > > > >> > + return true; > > > >> > + > > > >> > > > >> I'm pretty sure this is always going to return true if 8021q is loaded > > > >> because VLAN 0 is always added to the device even if no other VLANs > > > >> are in use. > > > > > > > > Ah, hadn't considered that, I just plucked it straight from e1000. > > > > > > > >> > + return false; > > > >> > +} > > > >> > + > > > >> > /** > > > >> > * e1000_regdump - register printout routine > > > >> > * @hw: pointer to the HW structure > > > >> > @@ -2789,7 +2799,7 @@ static void e1000e_vlan_filter_enable(struct e1000_adapter *adapter) > > > >> > } > > > >> > > > > >> > /** > > > >> > - * e1000e_vlan_strip_enable - helper to disable HW VLAN stripping > > > >> > + * e1000e_vlan_strip_disable - helper to disable HW VLAN stripping > > > >> > * @adapter: board private structure to initialize > > > >> > **/ > > > >> > static void e1000e_vlan_strip_disable(struct e1000_adapter *adapter) > > > >> > @@ -3443,7 +3453,8 @@ static void e1000e_set_rx_mode(struct net_device *netdev) > > > >> > > > > >> > ew32(RCTL, rctl); > > > >> > > > > >> > - if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX) > > > >> > + if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX || > > > >> > + e1000e_vlan_used(adapter)) > > > >> > e1000e_vlan_strip_enable(adapter); > > > >> > else > > > >> > e1000e_vlan_strip_disable(adapter); > > > >> > > > >> So if the VLAN tag stripping is disabled what happens that is causing > > > >> the VLAN test to fail? It sounds like this might be working around a > > > >> kernel bug where a VLAN created on a device that supports hardware tag > > > >> stripping only supports hardware tag stripping. Maybe a better fix > > > >> would be to add a fall back so if the VLAN tag is in the frame instead > > > >> of stripped it still makes it to the correct spot. > > > > > > > > That's the main reason I labeled it as an RFC -- I wasn't sure exactly how > > > > things were intended to work when the hardware stripping was disabled. It > > > > seems quite plausible to me that this patch simply papers over the real > > > > bug: lack of a functional software fallback. I'm not particularly up on > > > > the vlan code just yet though, so I'm not yet sure where to poke next. > > > > Suggestions welcomed. :) > > > > > > Well the software fallback should be the call to skb_vlan_untag in > > > __netif_receive_skb_core. If that isn't being triggered then we > > > should probably be fixing the files in the core code then. > > > > So I've been poking at a debug-spew-laden kernel some, and skb_vlan_untag > > is most definitely not getting called when rxvlan off is set. I'm still > > poking around, trying to track down where things are going askew. > > Okay, so rxvlan is *supposed* to only impact the rx path, right? It's > looking like it is actually impacting the tx path too here. I actually > *do* see calls to skb_vlan_untag with rxvlan off, if I ping from an > external host, so it seems only the packets from the host with rxvlan > toggled off aren't escaping correctly for some reason. And this leads me to believe maybe the bit in the e1000 driver that mentions explicitly that the hardware has no support for separate RX/TX vlan accel toggling rings true for e1000e as well, and thus both NETIF_F_HW_VLAN_CTAG_RX and NETIF_F_HW_VLAN_CTAG_TX need to be kept in sync. Testing this theory out shortly...
On Thu, Jun 09, 2016 at 06:17:45PM -0400, Jarod Wilson wrote: > On Thu, Jun 09, 2016 at 04:55:01PM -0400, Jarod Wilson wrote: > > On Thu, Jun 09, 2016 at 02:02:04PM -0400, Jarod Wilson wrote: > > > On Wed, Jun 01, 2016 at 03:31:46PM -0700, Alexander Duyck wrote: > > > > On Wed, Jun 1, 2016 at 12:27 PM, Jarod Wilson <jarod@redhat.com> wrote: > > > > > On Wed, Jun 01, 2016 at 07:31:53AM -0700, Alexander Duyck wrote: > > > > >> On Tue, May 17, 2016 at 12:03 PM, Jarod Wilson <jarod@redhat.com> wrote: > > > > >> > I've got a bug report about an e1000e interface, where a vlan interface is > > > > >> > set up on top of it: > > > > >> > > > > > >> > $ ip link add link ens1f0 name ens1f0.99 type vlan id 99 > > > > >> > $ ip link set ens1f0 up > > > > >> > $ ip link set ens1f0.99 up > > > > >> > $ ip addr add 192.168.99.92 dev ens1f0.99 > > > > >> > > > > > >> > At this point, I can ping another host on vlan 99, ip 192.168.99.91. > > > > >> > However, if I do the following: > > > > >> > > > > > >> > $ ethtool -K ens1f0 rxvlan off > > > > >> > > > > > >> > Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on > > > > >> > again. I'm not sure if this is actually intended behavior, or if there's a > > > > >> > lack of software vlan stripping fallback, or what, but things continue to > > > > >> > work if I simply don't call e1000e_vlan_strip_disable() if there are > > > > >> > active vlans (plagiarizing a function from the e1000 driver here) on the > > > > >> > interface. ... > > Okay, so rxvlan is *supposed* to only impact the rx path, right? It's > > looking like it is actually impacting the tx path too here. I actually > > *do* see calls to skb_vlan_untag with rxvlan off, if I ping from an > > external host, so it seems only the packets from the host with rxvlan > > toggled off aren't escaping correctly for some reason. > > And this leads me to believe maybe the bit in the e1000 driver that > mentions explicitly that the hardware has no support for separate RX/TX > vlan accel toggling rings true for e1000e as well, and thus both > NETIF_F_HW_VLAN_CTAG_RX and NETIF_F_HW_VLAN_CTAG_TX need to be kept in > sync. Testing this theory out shortly... We have a winner. If I make sure the TX flag gets toggled too, ping continues to work after disabling rxvlan. $ ping 192.168.99.91 PING 192.168.99.91 (192.168.99.91) 56(84) bytes of data. 64 bytes from 192.168.99.91: icmp_seq=1 ttl=64 time=0.591 ms 64 bytes from 192.168.99.91: icmp_seq=2 ttl=64 time=0.335 ms 64 bytes from 192.168.99.91: icmp_seq=3 ttl=64 time=0.417 ms ^C --- 192.168.99.91 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2000ms rtt min/avg/max/mdev = 0.335/0.447/0.591/0.109 ms $ sudo ethtool -K ens1f0 rxvlan off Actual changes: rx-vlan-offload: off tx-vlan-offload: off [requested on] $ ping 192.168.99.91 PING 192.168.99.91 (192.168.99.91) 56(84) bytes of data. 64 bytes from 192.168.99.91: icmp_seq=1 ttl=64 time=0.327 ms 64 bytes from 192.168.99.91: icmp_seq=2 ttl=64 time=0.393 ms 64 bytes from 192.168.99.91: icmp_seq=3 ttl=64 time=0.424 ms ^C --- 192.168.99.91 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 1999ms rtt min/avg/max/mdev = 0.327/0.381/0.424/0.043 ms I'll clean things up and submit a patch tonight or tomorrow.
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 75e6089..73f7452 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -154,6 +154,16 @@ void __ew32(struct e1000_hw *hw, unsigned long reg, u32 val) writel(val, hw->hw_addr + reg); } +static bool e1000e_vlan_used(struct e1000_adapter *adapter) +{ + u16 vid; + + for_each_set_bit(vid, adapter->active_vlans, VLAN_N_VID) + return true; + + return false; +} + /** * e1000_regdump - register printout routine * @hw: pointer to the HW structure @@ -2789,7 +2799,7 @@ static void e1000e_vlan_filter_enable(struct e1000_adapter *adapter) } /** - * e1000e_vlan_strip_enable - helper to disable HW VLAN stripping + * e1000e_vlan_strip_disable - helper to disable HW VLAN stripping * @adapter: board private structure to initialize **/ static void e1000e_vlan_strip_disable(struct e1000_adapter *adapter) @@ -3443,7 +3453,8 @@ static void e1000e_set_rx_mode(struct net_device *netdev) ew32(RCTL, rctl); - if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX) + if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX || + e1000e_vlan_used(adapter)) e1000e_vlan_strip_enable(adapter); else e1000e_vlan_strip_disable(adapter);
I've got a bug report about an e1000e interface, where a vlan interface is set up on top of it: $ ip link add link ens1f0 name ens1f0.99 type vlan id 99 $ ip link set ens1f0 up $ ip link set ens1f0.99 up $ ip addr add 192.168.99.92 dev ens1f0.99 At this point, I can ping another host on vlan 99, ip 192.168.99.91. However, if I do the following: $ ethtool -K ens1f0 rxvlan off Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on again. I'm not sure if this is actually intended behavior, or if there's a lack of software vlan stripping fallback, or what, but things continue to work if I simply don't call e1000e_vlan_strip_disable() if there are active vlans (plagiarizing a function from the e1000 driver here) on the interface. Also slipped a related-ish fix to the kerneldoc text for e1000e_vlan_strip_disable here... CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: intel-wired-lan@lists.osuosl.org CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> --- drivers/net/ethernet/intel/e1000e/netdev.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)