diff mbox

[v2,1/1] net: ethernet: davinci_cpdma: Add boundary for rx and tx descriptors

Message ID 1355197433-7492-1-git-send-email-mugunthanvnm@ti.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Mugunthan V N Dec. 11, 2012, 3:43 a.m. UTC
When there is heavy transmission traffic in the CPDMA, then Rx descriptors
memory is also utilized as tx desc memory looses all rx descriptors and the
driver stops working then.

This patch adds boundary for tx and rx descriptors in bd ram dividing the
descriptor memory to ensure that during heavy transmission tx doesn't use
rx descriptors.

This patch is already applied to davinci_emac driver, since CPSW and
davici_dmac shares the same CPDMA, moving the boundry seperation from
Davinci EMAC driver to CPDMA driver which was done in the following
commit

commit 86d8c07ff2448eb4e860e50f34ef6ee78e45c40c
Author: Sascha Hauer <s.hauer@pengutronix.de>
Date:   Tue Jan 3 05:27:47 2012 +0000

    net/davinci: do not use all descriptors for tx packets

    The driver uses a shared pool for both rx and tx descriptors.
    During open it queues fixed number of 128 descriptors for receive
    packets. For each received packet it tries to queue another
    descriptor. If this fails the descriptor is lost for rx.
    The driver has no limitation on tx descriptors to use, so it
    can happen during a nmap / ping -f attack that the driver
    allocates all descriptors for tx and looses all rx descriptors.
    The driver stops working then.
    To fix this limit the number of tx descriptors used to half of
    the descriptors available, the rx path uses the other half.

    Tested on a custom board using nmap / ping -f to the board from
    two different hosts.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
---
Changes from initial patch:
* Modified commit message with proper description as in previous commit message

 drivers/net/ethernet/ti/davinci_cpdma.c |   20 ++++++++++++++------
 drivers/net/ethernet/ti/davinci_emac.c  |    8 --------
 2 files changed, 14 insertions(+), 14 deletions(-)

Comments

David Miller Dec. 11, 2012, 6:30 p.m. UTC | #1
You cannot do this.

After your changes the driver no longer does any TX flow control.

It never stops the TX queue and never wakes it up later.

It just drops packets on the floor when it runs out of descriptors.

This breaks everything, and in particular packet schedulers and
TCP.

I'm not applying this.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 11, 2012, 6:34 p.m. UTC | #2
From: David Miller <davem@davemloft.net>
Date: Tue, 11 Dec 2012 13:30:58 -0500 (EST)

> 
> You cannot do this.
> 
> After your changes the driver no longer does any TX flow control.
> 
> It never stops the TX queue and never wakes it up later.
> 
> It just drops packets on the floor when it runs out of descriptors.
> 
> This breaks everything, and in particular packet schedulers and
> TCP.
> 
> I'm not applying this.

And yes I mean that the "fail_tx" path of the transmit method
is bogus too.

You can't signal "out of descriptors" and stop the queue after the
fact.  NETDEV_TX_BUSY is for handling exceptional and extraordinary
conditions, not for the normal queue full handling.

You have to stop the queue before you run out of descriptors.  When
the queue is not stopped, you are telling the core networking that you
absoultely will be able to successfully queue a packet and enough
descriptors are available.

This means the other CPDMA driver needs to be reworked too.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 11, 2012, 6:54 p.m. UTC | #3
On Tue, 2012-12-11 at 13:34 -0500, David Miller wrote:

> 
> You can't signal "out of descriptors" and stop the queue after the
> fact.  NETDEV_TX_BUSY is for handling exceptional and extraordinary
> conditions, not for the normal queue full handling.

This reminds me an idea I had about MQ/MQPRIO qdisc and BQL interaction

(BQL btw makes less probable a NETDEV_TX_BUSY event or
__QUEUE_STATE_DRV_XOFF state, but more probable a
__QUEUE_STATE_STACK_XOFF)

We dequeue a packet from qdisc, then we realize tx queue is in XOFF
state, and we have to hold the skb into gso_skb for later.

This shows in stats (tc -s qdisc dev eth0) as requeues.

Problem of these requeues is that high priority packets can not be
dequeued as long as this (possibly low prio and big TSO packet) is not
removed from gso_skb.

At 1Gbps speed, a full size TSO packet is 500 us of extra latency.

Suggested fix : add a TCQ_F_MQSLAVE flag to allow dequeue_skb() to test
the netif_xmit_frozen_or_stopped() status _before_ dequeing packet from
qdisc.

(For other qdiscs, we dont really know which tx queue will use the next
packet before dequeueing it)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 11, 2012, 6:57 p.m. UTC | #4
From: Eric Dumazet <erdnetdev@gmail.com>
Date: Tue, 11 Dec 2012 10:54:56 -0800

> Suggested fix : add a TCQ_F_MQSLAVE flag to allow dequeue_skb() to test
> the netif_xmit_frozen_or_stopped() status _before_ dequeing packet from
> qdisc.

This sounds fine to me.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 11, 2012, 7:07 p.m. UTC | #5
On Tue, 2012-12-11 at 13:57 -0500, David Miller wrote:
> From: Eric Dumazet <erdnetdev@gmail.com>
> Date: Tue, 11 Dec 2012 10:54:56 -0800
> 
> > Suggested fix : add a TCQ_F_MQSLAVE flag to allow dequeue_skb() to test
> > the netif_xmit_frozen_or_stopped() status _before_ dequeing packet from
> > qdisc.
> 
> This sounds fine to me.
> --

By the way we also could set this 'flag' for non multi queue devices.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mugunthan V N Dec. 12, 2012, 1:38 p.m. UTC | #6
On 12/12/2012 12:27 AM, David Miller wrote:
> From: Eric Dumazet <erdnetdev@gmail.com>
> Date: Tue, 11 Dec 2012 10:54:56 -0800
>
>> Suggested fix : add a TCQ_F_MQSLAVE flag to allow dequeue_skb() to test
>> the netif_xmit_frozen_or_stopped() status _before_ dequeing packet from
>> qdisc.
> This sounds fine to me.
I will submit next version with the suggestion

Regards
Mugunthan V N
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 4995673..d37f546 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -105,13 +105,13 @@  struct cpdma_ctlr {
 };
 
 struct cpdma_chan {
+	struct cpdma_desc __iomem	*head, *tail;
+	void __iomem			*hdp, *cp, *rxfree;
 	enum cpdma_state		state;
 	struct cpdma_ctlr		*ctlr;
 	int				chan_num;
 	spinlock_t			lock;
-	struct cpdma_desc __iomem	*head, *tail;
 	int				count;
-	void __iomem			*hdp, *cp, *rxfree;
 	u32				mask;
 	cpdma_handler_fn		handler;
 	enum dma_data_direction		dir;
@@ -217,7 +217,7 @@  desc_from_phys(struct cpdma_desc_pool *pool, dma_addr_t dma)
 }
 
 static struct cpdma_desc __iomem *
-cpdma_desc_alloc(struct cpdma_desc_pool *pool, int num_desc)
+cpdma_desc_alloc(struct cpdma_desc_pool *pool, int num_desc, bool is_rx)
 {
 	unsigned long flags;
 	int index;
@@ -225,8 +225,14 @@  cpdma_desc_alloc(struct cpdma_desc_pool *pool, int num_desc)
 
 	spin_lock_irqsave(&pool->lock, flags);
 
-	index = bitmap_find_next_zero_area(pool->bitmap, pool->num_desc, 0,
-					   num_desc, 0);
+	if (is_rx) {
+		index = bitmap_find_next_zero_area(pool->bitmap,
+				pool->num_desc/2, 0, num_desc, 0);
+	 } else {
+		index = bitmap_find_next_zero_area(pool->bitmap,
+				pool->num_desc, pool->num_desc/2, num_desc, 0);
+	}
+
 	if (index < pool->num_desc) {
 		bitmap_set(pool->bitmap, index, num_desc);
 		desc = pool->iomap + pool->desc_size * index;
@@ -660,6 +666,7 @@  int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
 	unsigned long			flags;
 	u32				mode;
 	int				ret = 0;
+	bool                            is_rx;
 
 	spin_lock_irqsave(&chan->lock, flags);
 
@@ -668,7 +675,8 @@  int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
 		goto unlock_ret;
 	}
 
-	desc = cpdma_desc_alloc(ctlr->pool, 1);
+	is_rx = (chan->rxfree != 0);
+	desc = cpdma_desc_alloc(ctlr->pool, 1, is_rx);
 	if (!desc) {
 		chan->stats.desc_alloc_fail++;
 		ret = -ENOMEM;
diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index fce89a0..f349273 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -120,7 +120,6 @@  static const char emac_version_string[] = "TI DaVinci EMAC Linux v6.1";
 #define EMAC_DEF_TX_CH			(0) /* Default 0th channel */
 #define EMAC_DEF_RX_CH			(0) /* Default 0th channel */
 #define EMAC_DEF_RX_NUM_DESC		(128)
-#define EMAC_DEF_TX_NUM_DESC		(128)
 #define EMAC_DEF_MAX_TX_CH		(1) /* Max TX channels configured */
 #define EMAC_DEF_MAX_RX_CH		(1) /* Max RX channels configured */
 #define EMAC_POLL_WEIGHT		(64) /* Default NAPI poll weight */
@@ -342,7 +341,6 @@  struct emac_priv {
 	u32 mac_hash2;
 	u32 multicast_hash_cnt[EMAC_NUM_MULTICAST_BITS];
 	u32 rx_addr_type;
-	atomic_t cur_tx;
 	const char *phy_id;
 #ifdef CONFIG_OF
 	struct device_node *phy_node;
@@ -1050,9 +1048,6 @@  static void emac_tx_handler(void *token, int len, int status)
 {
 	struct sk_buff		*skb = token;
 	struct net_device	*ndev = skb->dev;
-	struct emac_priv	*priv = netdev_priv(ndev);
-
-	atomic_dec(&priv->cur_tx);
 
 	if (unlikely(netif_queue_stopped(ndev)))
 		netif_start_queue(ndev);
@@ -1101,9 +1096,6 @@  static int emac_dev_xmit(struct sk_buff *skb, struct net_device *ndev)
 		goto fail_tx;
 	}
 
-	if (atomic_inc_return(&priv->cur_tx) >= EMAC_DEF_TX_NUM_DESC)
-		netif_stop_queue(ndev);
-
 	return NETDEV_TX_OK;
 
 fail_tx: