[V2,1/1] net: cdc_ncm: Reduce memory use when kernel memory low

Message ID 1498682129-9129-2-git-send-email-jim_baxter@mentor.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Jim Baxter June 28, 2017, 8:35 p.m. UTC
The CDC-NCM driver can require large amounts of memory to create
skb's and this can be a problem when the memory becomes fragmented.

This especially affects embedded systems that have constrained
resources but wish to maximise the throughput of CDC-NCM with 16KiB
NTB's.

The issue is after running for a while the kernel memory can become
fragmented and it needs compacting.
If the NTB allocation is needed before the memory has been compacted
the atomic allocation can fail which can cause increased latency,
large re-transmissions or disconnections depending upon the data
being transmitted at the time.
This situation occurs for less than a second until the kernel has
compacted the memory but the failed devices can take a lot longer to
recover from the failed TX packets.

To ease this temporary situation I modified the CDC-NCM TX path to
temporarily switch into a reduced memory mode which allocates an NTB
that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
sized memory block and only transmit NTB's with a single network frame
until the memory situation is resolved.
Each time this issue occurs we wait for an increasing number of
reduced size allocations before requesting a full size one to not
put additional pressure on a low memory system.

Once the memory is compacted the CDC-NCM data can resume transmitting
at the normal tx_max rate once again.

Signed-off-by: Jim Baxter <jim_baxter@mentor.com>

---

V1: Sent to linux-usb for review.
V2: Added code to increase amount of time spent making small allocations to
    reduce the burden on the system.

 drivers/net/usb/cdc_ncm.c   | 54 +++++++++++++++++++++++++++++++++++----------
 include/linux/usb/cdc_ncm.h |  3 +++
 2 files changed, 45 insertions(+), 12 deletions(-)
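
The reduced-memory mode described in the commit message can be modelled in
userspace. The following is a minimal sketch, not the driver code: the names
ncm_model, ncm_model_pick_size, alloc_fn and alloc_up_to are made up for
illustration, and the callback stands in for alloc_skb(size, GFP_ATOMIC).
Only the three fields added by the patch and tx_max are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOW_MEM_MAX_CNT  10     /* mirrors CDC_NCM_LOW_MEM_MAX_CNT */
#define NTB_MIN_OUT_SIZE 2048u  /* mirrors USB_CDC_NCM_NTB_MIN_OUT_SIZE */

/* Hypothetical userspace model of the fields the patch touches. */
struct ncm_model {
	uint32_t tx_max;
	uint32_t tx_curr_size;
	uint32_t tx_low_mem_max_cnt;
	uint32_t tx_low_mem_val;
};

/* Hypothetical stand-in for alloc_skb(): returns true on success. */
typedef bool (*alloc_fn)(uint32_t size, void *arg);

/* Example allocator: succeeds only for requests up to *limit bytes. */
static bool alloc_up_to(uint32_t size, void *arg)
{
	return size <= *(const uint32_t *)arg;
}

/*
 * Pick the size of the next NTB. pending_len is the length of the frame
 * waiting to be sent (0 if none). Returns the chosen size, or 0 if even
 * the reduced-size allocation failed and the frame would be dropped.
 */
static uint32_t ncm_model_pick_size(struct ncm_model *ctx, uint32_t pending_len,
				    alloc_fn try_alloc, void *arg)
{
	bool ok = false;

	if (ctx->tx_low_mem_val == 0) {
		/* Not backing off: try a full-size NTB first. */
		ctx->tx_curr_size = ctx->tx_max;
		ok = try_alloc(ctx->tx_curr_size, arg);
		if (!ok) {
			/* Wait longer after each failure, capped at the maximum. */
			if (ctx->tx_low_mem_max_cnt < LOW_MEM_MAX_CNT)
				ctx->tx_low_mem_max_cnt++;
			ctx->tx_low_mem_val = ctx->tx_low_mem_max_cnt;
		}
	}
	if (!ok) {
		/* Reduced-memory mode: room for a single frame only. */
		ctx->tx_curr_size = pending_len > NTB_MIN_OUT_SIZE ?
				    pending_len : NTB_MIN_OUT_SIZE;
		if (!try_alloc(ctx->tx_curr_size, arg))
			return 0;
		assert(ctx->tx_low_mem_val > 0);
		ctx->tx_low_mem_val--;
	}
	return ctx->tx_curr_size;
}
```

With tx_max set to 16384 and an allocator that refuses anything above 4096
bytes, the first call returns 2048 and leaves tx_low_mem_max_cnt at 1; each
later full-size failure raises that count by one, so more reduced-size NTBs
are sent before the next full-size attempt. In the hunks shown on this page
tx_low_mem_max_cnt is only ever raised, never reset, and the model keeps that
behaviour.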

Comments

David Miller June 30, 2017, 4:59 p.m. UTC | #1
From: Jim Baxter <jim_baxter@mentor.com>
Date: Wed, 28 Jun 2017 21:35:29 +0100

> The CDC-NCM driver can require large amounts of memory to create
> skb's and this can be a problem when the memory becomes fragmented.
> 
> This especially affects embedded systems that have constrained
> resources but wish to maximise the throughput of CDC-NCM with 16KiB
> NTB's.
> 
> The issue is after running for a while the kernel memory can become
> fragmented and it needs compacting.
> If the NTB allocation is needed before the memory has been compacted
> the atomic allocation can fail which can cause increased latency,
> large re-transmissions or disconnections depending upon the data
> being transmitted at the time.
> This situation occurs for less than a second until the kernel has
> compacted the memory but the failed devices can take a lot longer to
> recover from the failed TX packets.
> 
> To ease this temporary situation I modified the CDC-NCM TX path to
> temporarily switch into a reduced memory mode which allocates an NTB
> that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
> sized memory block and only transmit NTB's with a single network frame
> until the memory situation is resolved.
> Each time this issue occurs we wait for an increasing number of
> reduced size allocations before requesting a full size one to not
> put additional pressure on a low memory system.
> 
> Once the memory is compacted the CDC-NCM data can resume transmitting
> at the normal tx_max rate once again.
> 
> Signed-off-by: Jim Baxter <jim_baxter@mentor.com>

I remember this issue being discussed a while ago. If someone could review
this patch, I would really appreciate it.
Jim Baxter June 30, 2017, 5:03 p.m. UTC | #2
--------------------------------------------------------------------------------
From: David S. Miller (davem@davemloft.net)
Sent: Fri, 30 Jun 2017 12:59:53 -0400
To: jim_baxter@mentor.com
Cc: linux-usb@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, oliver@neukum.org, bjorn@mork.no, David.Laight@ACULAB.COM
Subject: Re: [PATCH V2 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

 


> From: Jim Baxter <jim_baxter@mentor.com>
> Date: Wed, 28 Jun 2017 21:35:29 +0100
> 
> If someone could review this patch, I remember this issue being discussed
> a while ago, I would really appreciate it.
> 

Hello,

For reference this replaces the original discussion in
http://patchwork.ozlabs.org/patch/763100/ and
http://patchwork.ozlabs.org/patch/766181/

Best regards,
Jim
Bjørn Mork June 30, 2017, 5:38 p.m. UTC | #3
Jim Baxter <jim_baxter@mentor.com> writes:

> The CDC-NCM driver can require large amounts of memory to create
> skb's and this can be a problem when the memory becomes fragmented.
>
> This especially affects embedded systems that have constrained
> resources but wish to maximise the throughput of CDC-NCM with 16KiB
> NTB's.
>
> The issue is after running for a while the kernel memory can become
> fragmented and it needs compacting.
> If the NTB allocation is needed before the memory has been compacted
> the atomic allocation can fail which can cause increased latency,
> large re-transmissions or disconnections depending upon the data
> being transmitted at the time.
> This situation occurs for less than a second until the kernel has
> compacted the memory but the failed devices can take a lot longer to
> recover from the failed TX packets.
>
> To ease this temporary situation I modified the CDC-NCM TX path to
> temporarily switch into a reduced memory mode which allocates an NTB
> that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
> sized memory block and only transmit NTB's with a single network frame
> until the memory situation is resolved.
> Each time this issue occurs we wait for an increasing number of
> reduced size allocations before requesting a full size one to not
> put additional pressure on a low memory system.
>
> Once the memory is compacted the CDC-NCM data can resume transmitting
> at the normal tx_max rate once again.
>
> Signed-off-by: Jim Baxter <jim_baxter@mentor.com>

This looks good to me.

I would still be happier if we didn't need something like this, but I
understand that we do.  And this patch looks as clean as it can get.  I
haven't tested the patch under any sort of memory pressure, but I did a
basic runtime test on a single MBIM device.  As expected, I did not
notice any changes with this patch applied.

But regarding noticeable effects: The patch adds no printks, counters or
sysfs attributes which could tell the user that the initial buffer
allocation has failed.  Maybe add some sort of debug helper(s) in a
followup patch? How did you verify the patch operation while testing it?

Anyway, that's no show stopper of course.  So FWIW:

Reviewed-by: Bjørn Mork <bjorn@mork.no>
Jim Baxter June 30, 2017, 5:53 p.m. UTC | #4
> Jim Baxter <jim_baxter@mentor.com> writes:
> 
> This looks good to me.
> 
> I would still be happier if we didn't need something like this, but I
> understand that we do.  And this patch looks as clean as it can get.  I
> haven't tested the patch under any sort of memory pressure, but I did a
> basic runtime test on a single MBIM device.  As expected, I did not
> notice any changes with this patch applied.
> 
> But regarding noticeable effects: The patch adds no printks, counters or
> sysfs attributes which could tell the user that the initial buffer
> allocation has failed.  Maybe add some sort of debug helper(s) in a
> followup patch? How did you verify the patch operation while testing it?
> 
> Anyway, that's no show stopper of course.  So FWIW:
> 
> Reviewed-by: Bjørn Mork <bjorn@mork.no>
> 

Hello Bjørn,

I tested this with printk's to show when the low memory code was triggered
and the value of ctx->tx_low_mem_val and ctx->tx_low_mem_max_cnt.
I created a workqueue that slowly used up the atomic memory until the
code was triggered.

I could add debug prints, though I have noticed that cdc_ncm_fill_tx_frame()
does not currently have any debug prints. Do you think this is because it can
be called in an atomic context, where debug messages, if enabled, could cause
too great a delay?

Regards,
Jim
Bjørn Mork June 30, 2017, 6:03 p.m. UTC | #5
"Baxter, Jim" <jim_baxter@mentor.com> writes:

> I tested this with printk's to show when the low memory code was triggered
> and the value of ctx->tx_low_mem_val and ctx->tx_low_mem_max_cnt.
> I created a workqueue that slowly used up the atomic memory until the
> code was triggered.
>
> I could add debug prints, though I have noticed that cdc_ncm_fill_tx_frame()
> does not currently have any debug prints. Do you think this is because it can
> be called in an atomic context, where debug messages, if enabled, could cause
> too great a delay?

Yes, I guess you're right.  Maybe count the number of failed allocations
and export it along with the other driver private counters?  Or export
the tx_curr_size as a sysfs attribute? Or both?

Just an idea...  I don't expect to see this code ever being hit on my
systems :-)



Bjørn
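
The counter idea above could look roughly like the following userspace
sketch. It is an assumption-laden illustration, not a proposed patch: the
field tx_full_alloc_fail and every ncm_stat_* name are hypothetical, and the
name/offset table only imitates the general shape of a driver-private
statistics table such as the one that lists rx_ntbs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the driver context. */
struct ncm_stat_ctx {
	uint64_t rx_ntbs;
	uint64_t tx_full_alloc_fail;	/* hypothetical new counter */
};

/* One table entry per exported counter: its name and where it lives. */
struct ncm_stat_desc {
	const char *name;
	size_t offset;
};

#define NCM_STAT(m) { #m, offsetof(struct ncm_stat_ctx, m) }

static const struct ncm_stat_desc ncm_stat_table[] = {
	NCM_STAT(rx_ntbs),
	NCM_STAT(tx_full_alloc_fail),
};

/* Would be called at the point where the full-size allocation fails. */
static void ncm_stat_note_alloc_fail(struct ncm_stat_ctx *ctx)
{
	ctx->tx_full_alloc_fail++;
}

/* Look a counter up by name. Returns 0 on success, -1 if unknown. */
static int ncm_stat_get(const struct ncm_stat_ctx *ctx, const char *name,
			uint64_t *out)
{
	size_t i;

	assert(ctx != NULL && name != NULL && out != NULL);

	for (i = 0; i < sizeof(ncm_stat_table) / sizeof(ncm_stat_table[0]); i++) {
		if (strcmp(ncm_stat_table[i].name, name) == 0) {
			memcpy(out, (const char *)ctx + ncm_stat_table[i].offset,
			       sizeof(*out));
			return 0;
		}
	}
	return -1;
}
```

A plain increment like this involves no printing, so it would not add the
kind of delay in atomic context that the debug messages discussed above
might. Whether a counter, a sysfs attribute for tx_curr_size, or both is the
better fit is left open in the thread.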
David Miller July 3, 2017, 8:51 a.m. UTC | #6
From: Jim Baxter <jim_baxter@mentor.com>
Date: Wed, 28 Jun 2017 21:35:29 +0100

> The CDC-NCM driver can require large amounts of memory to create
> skb's and this can be a problem when the memory becomes fragmented.
> 
> This especially affects embedded systems that have constrained
> resources but wish to maximise the throughput of CDC-NCM with 16KiB
> NTB's.
> 
> The issue is after running for a while the kernel memory can become
> fragmented and it needs compacting.
> If the NTB allocation is needed before the memory has been compacted
> the atomic allocation can fail which can cause increased latency,
> large re-transmissions or disconnections depending upon the data
> being transmitted at the time.
> This situation occurs for less than a second until the kernel has
> compacted the memory but the failed devices can take a lot longer to
> recover from the failed TX packets.
> 
> To ease this temporary situation I modified the CDC-NCM TX path to
> temporarily switch into a reduced memory mode which allocates an NTB
> that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
> sized memory block and only transmit NTB's with a single network frame
> until the memory situation is resolved.
> Each time this issue occurs we wait for an increasing number of
> reduced size allocations before requesting a full size one to not
> put additional pressure on a low memory system.
> 
> Once the memory is compacted the CDC-NCM data can resume transmitting
> at the normal tx_max rate once again.
> 
> Signed-off-by: Jim Baxter <jim_baxter@mentor.com>

Patch applied, thanks.

Patch

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index b5cec18..f9187d8 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -89,6 +89,8 @@  struct cdc_ncm_stats {
 	CDC_NCM_SIMPLE_STAT(rx_ntbs),
 };
 
+#define CDC_NCM_LOW_MEM_MAX_CNT 10
+
 static int cdc_ncm_get_sset_count(struct net_device __always_unused *netdev, int sset)
 {
 	switch (sset) {
@@ -1055,10 +1057,10 @@  static struct usb_cdc_ncm_ndp16 *cdc_ncm_ndp(struct cdc_ncm_ctx *ctx, struct sk_
 
 	/* align new NDP */
 	if (!(ctx->drvflags & CDC_NCM_FLAG_NDP_TO_END))
-		cdc_ncm_align_tail(skb, ctx->tx_ndp_modulus, 0, ctx->tx_max);
+		cdc_ncm_align_tail(skb, ctx->tx_ndp_modulus, 0, ctx->tx_curr_size);
 
 	/* verify that there is room for the NDP and the datagram (reserve) */
-	if ((ctx->tx_max - skb->len - reserve) < ctx->max_ndp_size)
+	if ((ctx->tx_curr_size - skb->len - reserve) < ctx->max_ndp_size)
 		return NULL;
 
 	/* link to it */
@@ -1111,13 +1113,41 @@  struct sk_buff *
 
 	/* allocate a new OUT skb */
 	if (!skb_out) {
-		skb_out = alloc_skb(ctx->tx_max, GFP_ATOMIC);
+		if (ctx->tx_low_mem_val == 0) {
+			ctx->tx_curr_size = ctx->tx_max;
+			skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
+			/* If the memory allocation fails we will wait longer
+			 * each time before attempting another full size
+			 * allocation again to not overload the system
+			 * further.
+			 */
+			if (skb_out == NULL) {
+				ctx->tx_low_mem_max_cnt = min(ctx->tx_low_mem_max_cnt + 1,
+							      (unsigned)CDC_NCM_LOW_MEM_MAX_CNT);
+				ctx->tx_low_mem_val = ctx->tx_low_mem_max_cnt;
+			}
+		}
 		if (skb_out == NULL) {
-			if (skb != NULL) {
-				dev_kfree_skb_any(skb);
-				dev->net->stats.tx_dropped++;
+			/* See if a very small allocation is possible.
+			 * We will send this packet immediately and hope
+			 * that there is more memory available later.
+			 */
+			if (skb)
+				ctx->tx_curr_size = max(skb->len,
+					(u32)USB_CDC_NCM_NTB_MIN_OUT_SIZE);
+			else
+				ctx->tx_curr_size = USB_CDC_NCM_NTB_MIN_OUT_SIZE;
+			skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
+
+			/* No allocation possible so we will abort */
+			if (skb_out == NULL) {
+				if (skb != NULL) {
+					dev_kfree_skb_any(skb);
+					dev->net->stats.tx_dropped++;
+				}
+				goto exit_no_skb;
 			}
-			goto exit_no_skb;
+			ctx->tx_low_mem_val--;
 		}
 		/* fill out the initial 16-bit NTB header */
 		nth16 = (struct usb_cdc_ncm_nth16 *)memset(skb_put(skb_out, sizeof(struct usb_cdc_ncm_nth16)), 0, sizeof(struct usb_cdc_ncm_nth16));
@@ -1148,10 +1178,10 @@  struct sk_buff *
 		ndp16 = cdc_ncm_ndp(ctx, skb_out, sign, skb->len + ctx->tx_modulus + ctx->tx_remainder);
 
 		/* align beginning of next frame */
-		cdc_ncm_align_tail(skb_out,  ctx->tx_modulus, ctx->tx_remainder, ctx->tx_max);
+		cdc_ncm_align_tail(skb_out,  ctx->tx_modulus, ctx->tx_remainder, ctx->tx_curr_size);
 
 		/* check if we had enough room left for both NDP and frame */
-		if (!ndp16 || skb_out->len + skb->len + delayed_ndp_size > ctx->tx_max) {
+		if (!ndp16 || skb_out->len + skb->len + delayed_ndp_size > ctx->tx_curr_size) {
 			if (n == 0) {
 				/* won't fit, MTU problem? */
 				dev_kfree_skb_any(skb);
@@ -1227,7 +1257,7 @@  struct sk_buff *
 	/* If requested, put NDP at end of frame. */
 	if (ctx->drvflags & CDC_NCM_FLAG_NDP_TO_END) {
 		nth16 = (struct usb_cdc_ncm_nth16 *)skb_out->data;
-		cdc_ncm_align_tail(skb_out, ctx->tx_ndp_modulus, 0, ctx->tx_max);
+		cdc_ncm_align_tail(skb_out, ctx->tx_ndp_modulus, 0, ctx->tx_curr_size);
 		nth16->wNdpIndex = cpu_to_le16(skb_out->len);
 		memcpy(skb_put(skb_out, ctx->max_ndp_size), ctx->delayed_ndp16, ctx->max_ndp_size);
 
@@ -1246,9 +1276,9 @@  struct sk_buff *
 	 */
 	if (!(dev->driver_info->flags & FLAG_SEND_ZLP) &&
 	    skb_out->len > ctx->min_tx_pkt) {
-		padding_count = ctx->tx_max - skb_out->len;
+		padding_count = ctx->tx_curr_size - skb_out->len;
 		memset(skb_put(skb_out, padding_count), 0, padding_count);
-	} else if (skb_out->len < ctx->tx_max &&
+	} else if (skb_out->len < ctx->tx_curr_size &&
 		   (skb_out->len % dev->maxpacket) == 0) {
 		*skb_put(skb_out, 1) = 0;	/* force short packet */
 	}
diff --git a/include/linux/usb/cdc_ncm.h b/include/linux/usb/cdc_ncm.h
index 00d2324..021f7a8 100644
--- a/include/linux/usb/cdc_ncm.h
+++ b/include/linux/usb/cdc_ncm.h
@@ -117,6 +117,9 @@  struct cdc_ncm_ctx {
 	u32 tx_curr_frame_num;
 	u32 rx_max;
 	u32 tx_max;
+	u32 tx_curr_size;
+	u32 tx_low_mem_max_cnt;
+	u32 tx_low_mem_val;
 	u32 max_datagram_size;
 	u16 tx_max_datagrams;
 	u16 tx_remainder;
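
The final hunk in cdc_ncm.c switches the padding logic from tx_max to
tx_curr_size. A small userspace sketch of that length calculation follows; it
assumes nothing beyond the two branches visible in the diff, and the function
name ncm_model_final_len and its parameter names are hypothetical. It shows
why the limit has to track the buffer actually allocated: padding to tx_max
would ask for more room than a reduced-size buffer has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Length of the NTB after the padding step, given its current length,
 * the size of the buffer allocated for it (tx_curr_size), the min_tx_pkt
 * threshold, the endpoint's maxpacket and whether FLAG_SEND_ZLP is set.
 */
static uint32_t ncm_model_final_len(uint32_t len, uint32_t curr_size,
				    uint32_t min_tx_pkt, uint32_t maxpacket,
				    bool send_zlp)
{
	assert(len <= curr_size && maxpacket != 0);

	/* Large enough and no ZLP support: zero-pad up to the buffer size. */
	if (!send_zlp && len > min_tx_pkt)
		return curr_size;

	/* Exact multiple of maxpacket: add one zero byte to force a short packet. */
	if (len < curr_size && (len % maxpacket) == 0)
		return len + 1;

	return len;
}
```

For a 2048-byte reduced-size buffer with min_tx_pkt of 1024 and maxpacket of
512, a 1600-byte NTB is padded to 2048, a 512-byte NTB grows to 513, and a
600-byte NTB is sent unchanged.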