diff mbox series

[iwl-next,v3] ice: Reset VF on Tx MDD event

Message ID 20240326164455.735739-1-marcin.szycik@linux.intel.com
State Changes Requested
Headers show
Series [iwl-next,v3] ice: Reset VF on Tx MDD event | expand

Commit Message

Marcin Szycik March 26, 2024, 4:44 p.m. UTC
In cases when VF sends malformed packets that are classified as malicious,
sometimes it causes Tx queue to freeze. This frozen queue can be stuck
for several minutes being unusable. This behavior can be reproduced with
a faulty userspace app running on VF.

When Malicious Driver Detection event occurs and the mdd-auto-reset-vf
private flag is set, perform a graceful VF reset to quickly bring VF back
to operational state. Add a log message to notify about the cause of
the reset. Add a helper for this to be reused for both TX and RX events.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Co-developed-by: Liang-Min Wang <liang-min.wang@intel.com>
Signed-off-by: Liang-Min Wang <liang-min.wang@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
---
v3: Only auto reset VF if the mdd-auto-reset-vf flag is set
v2 [1]: Revert an unneeded formatting change, fix commit message, fix a log
    message with a correct event name

[1] https://lore.kernel.org/netdev/20231102155149.2574209-1-pawel.chmielewski@intel.com
---
 drivers/net/ethernet/intel/ice/ice_main.c  | 47 +++++++++++++++++-----
 drivers/net/ethernet/intel/ice/ice_sriov.c | 14 ++++---
 drivers/net/ethernet/intel/ice/ice_sriov.h |  4 +-
 3 files changed, 46 insertions(+), 19 deletions(-)

Comments

Przemek Kitszel March 27, 2024, 6:33 a.m. UTC | #1
On 3/26/24 17:44, Marcin Szycik wrote:
> In cases when VF sends malformed packets that are classified as malicious,
> sometimes it causes Tx queue to freeze. This frozen queue can be stuck
> for several minutes being unusable. This behavior can be reproduced with
> a faulty userspace app running on VF.
> 
> When Malicious Driver Detection event occurs and the mdd-auto-reset-vf
> private flag is set, perform a graceful VF reset to quickly bring VF back
> to operational state. Add a log message to notify about the cause of
> the reset. Add a helper for this to be reused for both TX and RX events.
> 
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Co-developed-by: Liang-Min Wang <liang-min.wang@intel.com>
> Signed-off-by: Liang-Min Wang <liang-min.wang@intel.com>
> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
> ---
> v3: Only auto reset VF if the mdd-auto-reset-vf flag is set
> v2 [1]: Revert an unneeded formatting change, fix commit message, fix a log
>      message with a correct event name
> 
> [1] https://lore.kernel.org/netdev/20231102155149.2574209-1-pawel.chmielewski@intel.com
> ---
>   drivers/net/ethernet/intel/ice/ice_main.c  | 47 +++++++++++++++++-----
>   drivers/net/ethernet/intel/ice/ice_sriov.c | 14 ++++---
>   drivers/net/ethernet/intel/ice/ice_sriov.h |  4 +-
>   3 files changed, 46 insertions(+), 19 deletions(-)
> 

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Simon Horman March 28, 2024, 5:34 p.m. UTC | #2
On Tue, Mar 26, 2024 at 05:44:55PM +0100, Marcin Szycik wrote:
> In cases when VF sends malformed packets that are classified as malicious,
> sometimes it causes Tx queue to freeze. This frozen queue can be stuck
> for several minutes being unusable. This behavior can be reproduced with
> a faulty userspace app running on VF.
> 
> When Malicious Driver Detection event occurs and the mdd-auto-reset-vf
> private flag is set, perform a graceful VF reset to quickly bring VF back
> to operational state. Add a log message to notify about the cause of
> the reset. Add a helper for this to be reused for both TX and RX events.
> 
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Co-developed-by: Liang-Min Wang <liang-min.wang@intel.com>
> Signed-off-by: Liang-Min Wang <liang-min.wang@intel.com>
> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>

Hi Marcin,

If I read this correctly then a reset may be performed for several
different conditions - values of different registers - for a VF
as checked in a for loop.

I am wondering if multiple resets could occur for the same VF within
an iteration of the for loop - because more than one of the conditions is
met. And, if so, is this ok?
Marcin Szycik March 29, 2024, 11:31 a.m. UTC | #3
On 28.03.2024 18:34, Simon Horman wrote:
> On Tue, Mar 26, 2024 at 05:44:55PM +0100, Marcin Szycik wrote:
>> In cases when VF sends malformed packets that are classified as malicious,
>> sometimes it causes Tx queue to freeze. This frozen queue can be stuck
>> for several minutes being unusable. This behavior can be reproduced with
>> a faulty userspace app running on VF.
>>
>> When Malicious Driver Detection event occurs and the mdd-auto-reset-vf
>> private flag is set, perform a graceful VF reset to quickly bring VF back
>> to operational state. Add a log message to notify about the cause of
>> the reset. Add a helper for this to be reused for both TX and RX events.
>>
>> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
>> Co-developed-by: Liang-Min Wang <liang-min.wang@intel.com>
>> Signed-off-by: Liang-Min Wang <liang-min.wang@intel.com>
>> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
> 
> Hi Marcin,
> 
> If I read this correctly then a reset may be performed for several
> different conditions - values of different registers - for a VF
> as checked in a for loop.
> 
> I am wondering if multiple resets could occur for the same VF within
> an iteration of the for loop - because more than one of the conditions is
> met. And, if so, is this ok?

Hi Simon,

Good point. Nothing too bad should happen, as ice_reset_vf() acquires mutex lock
(in fact two locks), so several resets would just happen in sequence. However,
it doesn't make much sense to reset VF multiple times, so maybe instead of issuing
reset on each condition, I'll set some flag, and after checking all registers I'll
trigger reset if that flag is set. What do you think?

Thanks,
Marcin
Marcin Szycik March 29, 2024, 11:39 a.m. UTC | #4
On 29.03.2024 12:31, Marcin Szycik wrote:
> 
> 
> On 28.03.2024 18:34, Simon Horman wrote:
>> On Tue, Mar 26, 2024 at 05:44:55PM +0100, Marcin Szycik wrote:
>>> In cases when VF sends malformed packets that are classified as malicious,
>>> sometimes it causes Tx queue to freeze. This frozen queue can be stuck
>>> for several minutes being unusable. This behavior can be reproduced with
>>> a faulty userspace app running on VF.
>>>
>>> When Malicious Driver Detection event occurs and the mdd-auto-reset-vf
>>> private flag is set, perform a graceful VF reset to quickly bring VF back
>>> to operational state. Add a log message to notify about the cause of
>>> the reset. Add a helper for this to be reused for both TX and RX events.
>>>
>>> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
>>> Co-developed-by: Liang-Min Wang <liang-min.wang@intel.com>
>>> Signed-off-by: Liang-Min Wang <liang-min.wang@intel.com>
>>> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
>>
>> Hi Marcin,
>>
>> If I read this correctly then a reset may be performed for several
>> different conditions - values of different registers - for a VF
>> as checked in a for loop.
>>
>> I am wondering if multiple resets could occur for the same VF within
>> an iteration of the for loop - because more than one of the conditions is
>> met. And, if so, is this ok?
> 
> Hi Simon,
> 
> Good point. Nothing too bad should happen, as ice_reset_vf() acquires mutex lock

Sorry, that mutex doesn't matter much here, as we'd call another ice_reset_vf()
after previous one is done anyway.

> (in fact two locks), so several resets would just happen in sequence. However,
> it doesn't make much sense to reset VF multiple times, so maybe instead of issuing
> reset on each condition, I'll set some flag, and after checking all registers I'll
> trigger reset if that flag is set. What do you think?
> 
> Thanks,
> Marcin
Simon Horman March 31, 2024, 6:27 p.m. UTC | #5
On Fri, Mar 29, 2024 at 12:31:58PM +0100, Marcin Szycik wrote:
> 
> 
> On 28.03.2024 18:34, Simon Horman wrote:
> > On Tue, Mar 26, 2024 at 05:44:55PM +0100, Marcin Szycik wrote:
> >> In cases when VF sends malformed packets that are classified as malicious,
> >> sometimes it causes Tx queue to freeze. This frozen queue can be stuck
> >> for several minutes being unusable. This behavior can be reproduced with
> >> a faulty userspace app running on VF.
> >>
> >> When Malicious Driver Detection event occurs and the mdd-auto-reset-vf
> >> private flag is set, perform a graceful VF reset to quickly bring VF back
> >> to operational state. Add a log message to notify about the cause of
> >> the reset. Add a helper for this to be reused for both TX and RX events.
> >>
> >> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> >> Co-developed-by: Liang-Min Wang <liang-min.wang@intel.com>
> >> Signed-off-by: Liang-Min Wang <liang-min.wang@intel.com>
> >> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
> > 
> > Hi Marcin,
> > 
> > If I read this correctly then a reset may be performed for several
> > different conditions - values of different registers - for a VF
> > as checked in a for loop.
> > 
> > I am wondering if multiple resets could occur for the same VF within
> > an iteration of the for loop - because more than one of the conditions is
> > met. And, if so, is this ok?
> 
> Hi Simon,
> 
> Good point. Nothing too bad should happen, as ice_reset_vf() acquires mutex lock
> (in fact two locks), so several resets would just happen in sequence. However,
> it doesn't make much sense to reset VF multiple times, so maybe instead of issuing
> reset on each condition, I'll set some flag, and after checking all registers I'll
> trigger reset if that flag is set. What do you think?

Thanks Marcin,

FWIIW, that sounds like a good approach to me.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 3dea0d4c767c..e5d00491e41e 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1744,6 +1744,35 @@  static void ice_service_timer(struct timer_list *t)
 	ice_service_task_schedule(pf);
 }
 
+/**
+ * ice_mdd_maybe_reset_vf - reset VF after MDD event
+ * @pf: pointer to the PF structure
+ * @vf: pointer to the VF structure
+ * @is_rx: true for RX event, false for TX event
+ *
+ * Since the queue can get stuck on VF MDD events, the PF can be configured to
+ * automatically reset the VF by enabling the private ethtool flag
+ * mdd-auto-reset-vf.
+ */
+static void ice_mdd_maybe_reset_vf(struct ice_pf *pf, struct ice_vf *vf,
+				   bool is_rx)
+{
+	struct device *dev = ice_pf_to_dev(pf);
+
+	if (!test_bit(ICE_FLAG_MDD_AUTO_RESET_VF, pf->flags))
+		return;
+
+	/* VF MDD event counters will be cleared by reset, so print the event
+	 * prior to reset.
+	 */
+	ice_print_vf_mdd_event(vf, is_rx);
+
+	dev_info(dev, "PF-to-VF reset on VF %d due to %s MDD event\n",
+		 vf->vf_id,
+		 is_rx ? "Rx" : "Tx");
+	ice_reset_vf(vf, ICE_VF_RESET_NOTIFY | ICE_VF_RESET_LOCK);
+}
+
 /**
  * ice_handle_mdd_event - handle malicious driver detect event
  * @pf: pointer to the PF structure
@@ -1845,6 +1874,8 @@  static void ice_handle_mdd_event(struct ice_pf *pf)
 			if (netif_msg_tx_err(pf))
 				dev_info(dev, "Malicious Driver Detection event TX_PQM detected on VF %d\n",
 					 vf->vf_id);
+
+			ice_mdd_maybe_reset_vf(pf, vf, false);
 		}
 
 		reg = rd32(hw, VP_MDET_TX_TCLAN(vf->vf_id));
@@ -1855,6 +1886,8 @@  static void ice_handle_mdd_event(struct ice_pf *pf)
 			if (netif_msg_tx_err(pf))
 				dev_info(dev, "Malicious Driver Detection event TX_TCLAN detected on VF %d\n",
 					 vf->vf_id);
+
+			ice_mdd_maybe_reset_vf(pf, vf, false);
 		}
 
 		reg = rd32(hw, VP_MDET_TX_TDPU(vf->vf_id));
@@ -1865,6 +1898,8 @@  static void ice_handle_mdd_event(struct ice_pf *pf)
 			if (netif_msg_tx_err(pf))
 				dev_info(dev, "Malicious Driver Detection event TX_TDPU detected on VF %d\n",
 					 vf->vf_id);
+
+			ice_mdd_maybe_reset_vf(pf, vf, false);
 		}
 
 		reg = rd32(hw, VP_MDET_RX(vf->vf_id));
@@ -1876,17 +1911,7 @@  static void ice_handle_mdd_event(struct ice_pf *pf)
 				dev_info(dev, "Malicious Driver Detection event RX detected on VF %d\n",
 					 vf->vf_id);
 
-			/* Since the queue is disabled on VF Rx MDD events, the
-			 * PF can be configured to reset the VF through ethtool
-			 * private flag mdd-auto-reset-vf.
-			 */
-			if (test_bit(ICE_FLAG_MDD_AUTO_RESET_VF, pf->flags)) {
-				/* VF MDD event counters will be cleared by
-				 * reset, so print the event prior to reset.
-				 */
-				ice_print_vf_rx_mdd_event(vf);
-				ice_reset_vf(vf, ICE_VF_RESET_LOCK);
-			}
+			ice_mdd_maybe_reset_vf(pf, vf, true);
 		}
 	}
 	mutex_unlock(&pf->vfs.table_lock);
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
index 65e1986af777..b56bda9f66e3 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -1848,19 +1848,21 @@  ice_set_vf_port_vlan(struct net_device *netdev, int vf_id, u16 vlan_id, u8 qos,
 }
 
 /**
- * ice_print_vf_rx_mdd_event - print VF Rx malicious driver detect event
+ * ice_print_vf_mdd_event - print VF Rx malicious driver detect event
  * @vf: pointer to the VF structure
+ * @is_rx: true for RX event, false for TX event
  */
-void ice_print_vf_rx_mdd_event(struct ice_vf *vf)
+void ice_print_vf_mdd_event(struct ice_vf *vf, bool is_rx)
 {
 	struct ice_pf *pf = vf->pf;
 	struct device *dev;
 
 	dev = ice_pf_to_dev(pf);
 
-	dev_info(dev, "%d Rx Malicious Driver Detection events detected on PF %d VF %d MAC %pM. mdd-auto-reset-vfs=%s\n",
-		 vf->mdd_rx_events.count, pf->hw.pf_id, vf->vf_id,
-		 vf->dev_lan_addr,
+	dev_info(dev, "%d %s Malicious Driver Detection events detected on PF %d VF %d MAC %pM. mdd-auto-reset-vfs=%s\n",
+		 is_rx ? vf->mdd_rx_events.count : vf->mdd_tx_events.count,
+		 is_rx ? "Rx" : "Tx",
+		 pf->hw.pf_id, vf->vf_id, vf->dev_lan_addr,
 		 test_bit(ICE_FLAG_MDD_AUTO_RESET_VF, pf->flags)
 			  ? "on" : "off");
 }
@@ -1894,7 +1896,7 @@  void ice_print_vfs_mdd_events(struct ice_pf *pf)
 		if (vf->mdd_rx_events.count != vf->mdd_rx_events.last_printed) {
 			vf->mdd_rx_events.last_printed =
 							vf->mdd_rx_events.count;
-			ice_print_vf_rx_mdd_event(vf);
+			ice_print_vf_mdd_event(vf, true);
 		}
 
 		/* only print Tx MDD event message if there are new events */
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.h b/drivers/net/ethernet/intel/ice/ice_sriov.h
index 8488df38b586..a9d3ee36d0df 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.h
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.h
@@ -57,7 +57,7 @@  ice_get_vf_stats(struct net_device *netdev, int vf_id,
 void
 ice_vf_lan_overflow_event(struct ice_pf *pf, struct ice_rq_event_info *event);
 void ice_print_vfs_mdd_events(struct ice_pf *pf);
-void ice_print_vf_rx_mdd_event(struct ice_vf *vf);
+void ice_print_vf_mdd_event(struct ice_vf *vf, bool is_tx);
 bool
 ice_vc_validate_pattern(struct ice_vf *vf, struct virtchnl_proto_hdrs *proto);
 u32 ice_sriov_get_vf_total_msix(struct pci_dev *pdev);
@@ -68,7 +68,7 @@  static inline void ice_free_vfs(struct ice_pf *pf) { }
 static inline
 void ice_vf_lan_overflow_event(struct ice_pf *pf, struct ice_rq_event_info *event) { }
 static inline void ice_print_vfs_mdd_events(struct ice_pf *pf) { }
-static inline void ice_print_vf_rx_mdd_event(struct ice_vf *vf) { }
+static inline void ice_print_vf_mdd_event(struct ice_vf *vf, bool is_tx) { }
 static inline void ice_restore_all_vfs_msi_state(struct ice_pf *pf) { }
 
 static inline int