
[RFC] net: limit maximum number of packets to mark with xmit_more

Message ID 20170825152449.29790-1-jacob.e.keller@intel.com
State RFC, archived
Delegated to: David Miller

Commit Message

Jacob Keller Aug. 25, 2017, 3:24 p.m. UTC
Under some circumstances, such as with many stacked devices, it is
possible that dev_hard_start_xmit will bundle many packets together, and
mark them all with xmit_more.

Most drivers respond to xmit_more by skipping tail bumps on packet
rings, or similar behavior as long as xmit_more is set. This is
a performance win since it means drivers can avoid notifying hardware of
new packets repeatedly, and thus avoid wasting PCIe or other
bandwidth.

This use of xmit_more comes with a trade off because bundling too many
packets can increase latency of the Tx packets. To avoid this, we should
limit the maximum number of packets with xmit_more.

Driver authors could modify their drivers to check for some determined
limit, but this requires all drivers to be modified in order to gain
advantage.

Instead, add a sysctl "xmit_more_max" which can be used to configure the
maximum number of xmit_more skbs to send in a sequence. This ensures
that all drivers benefit, and allows system administrators the option to
tune the value to their environment.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---

Stray thoughts and further questions....

Is this the right approach? Did I miss any other places where we should
limit? Does the limit make sense? Should it instead be a per-device
tuning knob instead of a global? Is 32 a good default?

 Documentation/sysctl/net.txt |  6 ++++++
 include/linux/netdevice.h    |  2 ++
 net/core/dev.c               | 10 +++++++++-
 net/core/sysctl_net_core.c   |  7 +++++++
 4 files changed, 24 insertions(+), 1 deletion(-)
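
For reference, the driver-side pattern the commit message describes typically looks something like the sketch below. It is not taken from any particular driver: the my_ring structure and the my_* helpers are hypothetical placeholders, and only the skb->xmit_more test and the deferred tail write reflect the behavior under discussion.

#include <linux/io.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Illustrative Tx ring state; the type and field names are made up. */
struct my_ring {
	void __iomem *tail;	/* doorbell register mapped by the driver */
	u16 next_to_use;	/* next free descriptor index */
	u16 queue_index;	/* matching netdev Tx queue */
};

struct my_ring *my_select_ring(struct net_device *netdev, struct sk_buff *skb);
void my_fill_descriptors(struct my_ring *ring, struct sk_buff *skb);

static netdev_tx_t my_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
{
	struct my_ring *ring = my_select_ring(netdev, skb);

	my_fill_descriptors(ring, skb);

	/*
	 * Defer the tail bump (a PCIe write) while the stack indicates more
	 * packets are queued behind this one; flush immediately if the queue
	 * has been stopped, since no further xmit calls may arrive.
	 */
	if (!skb->xmit_more ||
	    netif_xmit_stopped(netdev_get_tx_queue(netdev, ring->queue_index)))
		writel(ring->next_to_use, ring->tail);

	return NETDEV_TX_OK;
}

A long run of skbs with xmit_more set therefore means a long stretch with no doorbell writes, which is the latency concern the discussion below turns on.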

Comments

Waskiewicz Jr, Peter Aug. 25, 2017, 3:36 p.m. UTC | #1
On 8/25/17 11:25 AM, Jacob Keller wrote:
> Under some circumstances, such as with many stacked devices, it is
> possible that dev_hard_start_xmit will bundle many packets together, and
> mark them all with xmit_more.
> 
> Most drivers respond to xmit_more by skipping tail bumps on packet
> rings, or similar behavior as long as xmit_more is set. This is
> a performance win since it means drivers can avoid notifying hardware of
> new packets repeatedly, and thus avoid wasting PCIe or other
> bandwidth.
> 
> This use of xmit_more comes with a trade off because bundling too many
> packets can increase latency of the Tx packets. To avoid this, we should
> limit the maximum number of packets with xmit_more.
> 
> Driver authors could modify their drivers to check for some determined
> limit, but this requires all drivers to be modified in order to gain
> advantage.
> 
> Instead, add a sysctl "xmit_more_max" which can be used to configure the
> maximum number of xmit_more skbs to send in a sequence. This ensures
> that all drivers benefit, and allows system administrators the option to
> tune the value to their environment.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
> 
> Stray thoughts and further questions....
> 
> Is this the right approach? Did I miss any other places where we should
> limit? Does the limit make sense? Should it instead be a per-device
> tuning knob instead of a global? Is 32 a good default?

I actually like the idea of a per-device knob.  A xmit_more_max that's 
global in a system with 1GbE devices along with a 25/50GbE or more just 
doesn't make much sense to me.  Or having heterogeneous vendor devices 
in the same system that have different HW behaviors could mask issues 
with latency.

This seems like another incarnation of possible buffer-bloat if the max 
is too high...

> 
>   Documentation/sysctl/net.txt |  6 ++++++
>   include/linux/netdevice.h    |  2 ++
>   net/core/dev.c               | 10 +++++++++-
>   net/core/sysctl_net_core.c   |  7 +++++++
>   4 files changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> index b67044a2575f..3d995e8f4448 100644
> --- a/Documentation/sysctl/net.txt
> +++ b/Documentation/sysctl/net.txt
> @@ -230,6 +230,12 @@ netdev_max_backlog
>   Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface
>   receives packets faster than kernel can process them.
>   
> +xmit_more_max
> +-------------
> +
> +Maximum number of packets in a row to mark with skb->xmit_more. A value of zero
> +indicates no limit.

What defines "packet?"  MTU-sized packets, or payloads coming down from 
the stack (e.g. TSO's)?

-PJ
Stephen Hemminger Aug. 25, 2017, 3:58 p.m. UTC | #2
On Fri, 25 Aug 2017 15:36:22 +0000
"Waskiewicz Jr, Peter" <peter.waskiewicz.jr@intel.com> wrote:

> On 8/25/17 11:25 AM, Jacob Keller wrote:
> > Under some circumstances, such as with many stacked devices, it is
> > possible that dev_hard_start_xmit will bundle many packets together, and
> > mark them all with xmit_more.
> > 
> > Most drivers respond to xmit_more by skipping tail bumps on packet
> > rings, or similar behavior as long as xmit_more is set. This is
> > a performance win since it means drivers can avoid notifying hardware of
> > new packets repeatedly, and thus avoid wasting PCIe or other
> > bandwidth.
> > 
> > This use of xmit_more comes with a trade off because bundling too many
> > packets can increase latency of the Tx packets. To avoid this, we should
> > limit the maximum number of packets with xmit_more.
> > 
> > Driver authors could modify their drivers to check for some determined
> > limit, but this requires all drivers to be modified in order to gain
> > advantage.
> > 
> > Instead, add a sysctl "xmit_more_max" which can be used to configure the
> > maximum number of xmit_more skbs to send in a sequence. This ensures
> > that all drivers benefit, and allows system administrators the option to
> > tune the value to their environment.
> > 
> > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> > ---
> > 
> > Stray thoughts and further questions....
> > 
> > Is this the right approach? Did I miss any other places where we should
> > limit? Does the limit make sense? Should it instead be a per-device
> > tuning knob instead of a global? Is 32 a good default?  
> 
> I actually like the idea of a per-device knob.  A xmit_more_max that's 
> global in a system with 1GbE devices along with a 25/50GbE or more just 
> doesn't make much sense to me.  Or having heterogeneous vendor devices 
> in the same system that have different HW behaviors could mask issues 
> with latency.
> 
> This seems like another incarnation of possible buffer-bloat if the max 
> is too high...
> 
> > 
> >   Documentation/sysctl/net.txt |  6 ++++++
> >   include/linux/netdevice.h    |  2 ++
> >   net/core/dev.c               | 10 +++++++++-
> >   net/core/sysctl_net_core.c   |  7 +++++++
> >   4 files changed, 24 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> > index b67044a2575f..3d995e8f4448 100644
> > --- a/Documentation/sysctl/net.txt
> > +++ b/Documentation/sysctl/net.txt
> > @@ -230,6 +230,12 @@ netdev_max_backlog
> >   Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface
> >   receives packets faster than kernel can process them.
> >   
> > +xmit_more_max
> > +-------------
> > +
> > +Maximum number of packets in a row to mark with skb->xmit_more. A value of zero
> > +indicates no limit.  
> 
> What defines "packet?"  MTU-sized packets, or payloads coming down from 
> the stack (e.g. TSO's)?

xmit_more is only a hint to the device. The device driver should ignore it unless
there are hardware advantages. The device driver is the place with HW specific
knowledge (like 4 Tx descriptors is equivalent to one PCI transaction on this device).

Anything that pushes that optimization out to the user is only useful for benchmarks
and embedded devices.
Jacob Keller Aug. 25, 2017, 4:24 p.m. UTC | #3
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Friday, August 25, 2017 8:58 AM
> To: Waskiewicz Jr, Peter <peter.waskiewicz.jr@intel.com>
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] net: limit maximum number of packets to mark with
> xmit_more
> 
> xmit_more is only a hint to the device. The device driver should ignore it unless
> there are hardware advantages. The device driver is the place with HW specific
> knowledge (like 4 Tx descriptors is equivalent to one PCI transaction on this
> device).
> 
> Anything that pushes that optimization out to the user is only useful for
> benchmarks
> and embedded devices.

Right, so most drivers I've seen simply take it as an "avoid bumping the tail of the ring" hint whenever they see xmit_more. But unfortunately, in some circumstances this results in potentially several hundred packets being marked with xmit_more in a row, and then the driver doesn't bump the tail for a long time, resulting in high latency spikes.

I was trying to find a way to fix this across multiple drivers, rather than just a single driver, since I figured the same sort of check might be needed in each of them.

So you're suggesting we should just perform some check in the device driver, even if it means some duplication?

We could instead make it a setting in the netdev struct, set by the driver, which tells the stack code to limit how many packets it marks with xmit_more at once (so that we don't need to duplicate the checking code in every driver).

Thanks,
Jake
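
As a rough illustration of the per-device variant suggested above: the sketch below assumes a new xmit_more_max field in struct net_device (no such field exists today), filled in by the driver; the helper name is also invented.

#include <linux/netdevice.h>

/*
 * Hypothetical: assumes "unsigned int xmit_more_max" were added to
 * struct net_device and set by the driver (0 meaning "no limit").
 * Returns the xmit_more hint to pass to xmit_one() for the current skb.
 */
static bool xmit_more_hint(const struct net_device *dev, bool have_next,
			   unsigned int *xmit_count)
{
	/* Last packet of the burst: never claim more are coming. */
	if (!have_next)
		return false;

	/* Force a flush once the driver-provided limit is reached. */
	if (dev->xmit_more_max && ++(*xmit_count) >= dev->xmit_more_max) {
		*xmit_count = 0;
		return false;
	}

	return true;
}

dev_hard_start_xmit() could then call xmit_more_hint(dev, next != NULL, &xmit_count) in place of the global sysctl check in the patch at the bottom of this page.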
Jakub Kicinski Aug. 25, 2017, 7:34 p.m. UTC | #4
On Fri, 25 Aug 2017 08:24:49 -0700, Jacob Keller wrote:
> Under some circumstances, such as with many stacked devices, it is
> possible that dev_hard_start_xmit will bundle many packets together, and
> mark them all with xmit_more.

Excuse my ignorance but what are those stacked devices?  Could they
perhaps be fixed somehow?  My intuition was that long xmit_more
sequences can only happen if NIC and/or BQL are back pressuring, and
therefore we shouldn't be seeing a long xmit_more "train" arriving at
an empty device ring...
Alexander H Duyck Aug. 25, 2017, 10:33 p.m. UTC | #5
On Fri, Aug 25, 2017 at 8:58 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Fri, 25 Aug 2017 15:36:22 +0000
> "Waskiewicz Jr, Peter" <peter.waskiewicz.jr@intel.com> wrote:
>
>> On 8/25/17 11:25 AM, Jacob Keller wrote:
>> > Under some circumstances, such as with many stacked devices, it is
>> > possible that dev_hard_start_xmit will bundle many packets together, and
>> > mark them all with xmit_more.
>> >
>> > Most drivers respond to xmit_more by skipping tail bumps on packet
>> > rings, or similar behavior as long as xmit_more is set. This is
>> > a performance win since it means drivers can avoid notifying hardware of
>> > new packets repeatedly, and thus avoid wasting PCIe or other
>> > bandwidth.
>> >
>> > This use of xmit_more comes with a trade off because bundling too many
>> > packets can increase latency of the Tx packets. To avoid this, we should
>> > limit the maximum number of packets with xmit_more.
>> >
>> > Driver authors could modify their drivers to check for some determined
>> > limit, but this requires all drivers to be modified in order to gain
>> > advantage.
>> >
>> > Instead, add a sysctl "xmit_more_max" which can be used to configure the
>> > maximum number of xmit_more skbs to send in a sequence. This ensures
>> > that all drivers benefit, and allows system administrators the option to
>> > tune the value to their environment.
>> >
>> > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> > ---
>> >
>> > Stray thoughts and further questions....
>> >
>> > Is this the right approach? Did I miss any other places where we should
>> > limit? Does the limit make sense? Should it instead be a per-device
>> > tuning knob instead of a global? Is 32 a good default?
>>
>> I actually like the idea of a per-device knob.  A xmit_more_max that's
>> global in a system with 1GbE devices along with a 25/50GbE or more just
>> doesn't make much sense to me.  Or having heterogeneous vendor devices
>> in the same system that have different HW behaviors could mask issues
>> with latency.
>>
>> This seems like another incarnation of possible buffer-bloat if the max
>> is too high...
>>
>> >
>> >   Documentation/sysctl/net.txt |  6 ++++++
>> >   include/linux/netdevice.h    |  2 ++
>> >   net/core/dev.c               | 10 +++++++++-
>> >   net/core/sysctl_net_core.c   |  7 +++++++
>> >   4 files changed, 24 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
>> > index b67044a2575f..3d995e8f4448 100644
>> > --- a/Documentation/sysctl/net.txt
>> > +++ b/Documentation/sysctl/net.txt
>> > @@ -230,6 +230,12 @@ netdev_max_backlog
>> >   Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface
>> >   receives packets faster than kernel can process them.
>> >
>> > +xmit_more_max
>> > +-------------
>> > +
>> > +Maximum number of packets in a row to mark with skb->xmit_more. A value of zero
>> > +indicates no limit.
>>
>> What defines "packet?"  MTU-sized packets, or payloads coming down from
>> the stack (e.g. TSO's)?
>
> xmit_more is only a hint to the device. The device driver should ignore it unless
> there are hardware advantages. The device driver is the place with HW specific
> knowledge (like 4 Tx descriptors is equivalent to one PCI transaction on this device).
>
> Anything that pushes that optimization out to the user is only useful for benchmarks
> and embedded devices.

Actually I think I might have an idea what is going on here and I
agree that this is probably something that needs to be fixed in the
drivers. Especially since the problem isn't so much the skbs but
descriptors in the descriptor ring.

If I am not mistaken the issue is most drivers will honor the
xmit_more unless the ring cannot enqueue another packet. The problem
is if the clean-up is occurring on a different CPU than transmit we
can cause the clean-up CPU/device DMA to go idle by not providing any
notifications to the device that new packets are present. What we
should probably do is look at adding another condition which is to
force us to flush the packet if we have used over half of the
descriptors in a given ring without notifying the device. Then that
way we can be filling half while the device is processing the other
half which should result in us operating smoothly.

- Alex
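
The condition Alex describes could be expressed roughly as below; the ring structure and field names are invented for the sketch, and the half-ring threshold is just the heuristic from his mail, not something drivers currently implement.

#include <linux/io.h>
#include <linux/types.h>

/* Illustrative ring bookkeeping; names are hypothetical. */
struct my_ring {
	u16 count;		/* number of descriptors, a power of two */
	u16 next_to_use;	/* next free descriptor index */
	u16 last_notified;	/* next_to_use value at the last tail bump */
	void __iomem *tail;	/* doorbell register */
};

/*
 * Flush (bump the tail) when the stack has no more packets queued, or when
 * more than half the ring has filled since the device was last notified,
 * so cleanup/DMA never goes idle behind a long xmit_more run.
 */
static bool my_ring_should_flush(const struct my_ring *ring, bool xmit_more)
{
	u16 unannounced = (ring->next_to_use - ring->last_notified) &
			  (ring->count - 1);

	return !xmit_more || unannounced >= ring->count / 2;
}

static void my_maybe_bump_tail(struct my_ring *ring, bool xmit_more)
{
	if (my_ring_should_flush(ring, xmit_more)) {
		writel(ring->next_to_use, ring->tail);
		ring->last_notified = ring->next_to_use;
	}
}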
Jacob Keller Aug. 28, 2017, 8:46 p.m. UTC | #6
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
> Behalf Of Alexander Duyck
> Sent: Friday, August 25, 2017 3:34 PM
> To: Stephen Hemminger <stephen@networkplumber.org>
> Cc: Waskiewicz Jr, Peter <peter.waskiewicz.jr@intel.com>; Keller, Jacob E
> <jacob.e.keller@intel.com>; netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] net: limit maximum number of packets to mark with
> xmit_more
> 
> On Fri, Aug 25, 2017 at 8:58 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > On Fri, 25 Aug 2017 15:36:22 +0000
> > "Waskiewicz Jr, Peter" <peter.waskiewicz.jr@intel.com> wrote:
> >
> >> On 8/25/17 11:25 AM, Jacob Keller wrote:
> >> > Under some circumstances, such as with many stacked devices, it is
> >> > possible that dev_hard_start_xmit will bundle many packets together, and
> >> > mark them all with xmit_more.
> >> >
> >> > Most drivers respond to xmit_more by skipping tail bumps on packet
> >> > rings, or similar behavior as long as xmit_more is set. This is
> >> > a performance win since it means drivers can avoid notifying hardware of
> >> > new packets repeatedly, and thus avoid wasting PCIe or other
> >> > bandwidth.
> >> >
> >> > This use of xmit_more comes with a trade off because bundling too many
> >> > packets can increase latency of the Tx packets. To avoid this, we should
> >> > limit the maximum number of packets with xmit_more.
> >> >
> >> > Driver authors could modify their drivers to check for some determined
> >> > limit, but this requires all drivers to be modified in order to gain
> >> > advantage.
> >> >
> >> > Instead, add a sysctl "xmit_more_max" which can be used to configure the
> >> > maximum number of xmit_more skbs to send in a sequence. This ensures
> >> > that all drivers benefit, and allows system administrators the option to
> >> > tune the value to their environment.
> >> >
> >> > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> >> > ---
> >> >
> >> > Stray thoughts and further questions....
> >> >
> >> > Is this the right approach? Did I miss any other places where we should
> >> > limit? Does the limit make sense? Should it instead be a per-device
> >> > tuning knob instead of a global? Is 32 a good default?
> >>
> >> I actually like the idea of a per-device knob.  A xmit_more_max that's
> >> global in a system with 1GbE devices along with a 25/50GbE or more just
> >> doesn't make much sense to me.  Or having heterogeneous vendor devices
> >> in the same system that have different HW behaviors could mask issues
> >> with latency.
> >>
> >> This seems like another incarnation of possible buffer-bloat if the max
> >> is too high...
> >>
> >> >
> >> >   Documentation/sysctl/net.txt |  6 ++++++
> >> >   include/linux/netdevice.h    |  2 ++
> >> >   net/core/dev.c               | 10 +++++++++-
> >> >   net/core/sysctl_net_core.c   |  7 +++++++
> >> >   4 files changed, 24 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> >> > index b67044a2575f..3d995e8f4448 100644
> >> > --- a/Documentation/sysctl/net.txt
> >> > +++ b/Documentation/sysctl/net.txt
> >> > @@ -230,6 +230,12 @@ netdev_max_backlog
> >> >   Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface
> >> >   receives packets faster than kernel can process them.
> >> >
> >> > +xmit_more_max
> >> > +-------------
> >> > +
> >> > +Maximum number of packets in a row to mark with skb->xmit_more. A value of zero
> >> > +indicates no limit.
> >>
> >> What defines "packet?"  MTU-sized packets, or payloads coming down from
> >> the stack (e.g. TSO's)?
> >
> > xmit_more is only a hint to the device. The device driver should ignore it unless
> > there are hardware advantages. The device driver is the place with HW specific
> > knowledge (like 4 Tx descriptors is equivalent to one PCI transaction on this device).
> >
> > Anything that pushes that optimization out to the user is only useful for benchmarks
> > and embedded devices.
> 
> Actually I think I might have an idea what is going on here and I
> agree that this is probably something that needs to be fixed in the
> drivers. Especially since the problem isn't so much the skbs but
> descriptors in the descriptor ring.
> 
> If I am not mistaken the issue is most drivers will honor the
> xmit_more unless the ring cannot enqueue another packet. The problem
> is if the clean-up is occurring on a different CPU than transmit we
> can cause the clean-up CPU/device DMA to go idle by not providing any
> notifications to the device that new packets are present. What we
> should probably do is look at adding another condition which is to
> force us to flush the packet if we have used over half of the
> descriptors in a given ring without notifying the device. Then that
> way we can be filling half while the device is processing the other
> half which should result in us operating smoothly.
> 
> - Alex

Ok, and that definitely is driver-specific, so I would be comfortable leaving it up to the driver implementation. I'll look at creating a patch to do something like this for i40e.

Thanks,
Jake
Jacob Keller Aug. 28, 2017, 8:56 p.m. UTC | #7
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
> Behalf Of Jakub Kicinski
> Sent: Friday, August 25, 2017 12:34 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] net: limit maximum number of packets to mark with
> xmit_more
> 
> On Fri, 25 Aug 2017 08:24:49 -0700, Jacob Keller wrote:
> > Under some circumstances, such as with many stacked devices, it is
> > possible that dev_hard_start_xmit will bundle many packets together, and
> > mark them all with xmit_more.
> 
> Excuse my ignorance but what are those stacked devices?  Could they
> perhaps be fixed somehow?  My intuition was that long xmit_more
> sequences can only happen if NIC and/or BQL are back pressuring, and
> therefore we shouldn't be seeing a long xmit_more "train" arriving at
> an empty device ring...

A veth device connecting a VM to the host, attached to a bridge, which is connected to a VLAN interface on top of a bond, which is attached in active-backup mode to a physical device.

Sorry if I don't really know the correct way to refer to these; I just think of them as devices stacked on top of each other.

During root-cause investigation I found that we (the i40e driver) sometimes received 100 or more SKBs in a row with xmit_more set. We were also incorrectly using xmit_more as a hint for not marking packets for writebacks, which caused significant throughput issues. Additionally, there was concern that so many packets in a row without a tail bump would cause latency issues, so I thought it was best to simply guarantee that the stack didn't send us too many packets marked with xmit_more at once.

Based on the discussion, it seems it should be up to the driver to determine exactly how to handle the xmit_more hint and when it actually isn't helpful, so I do not think this patch makes sense now.

Thanks,
Jake
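
A generic sketch of the separation described above, with invented names: completion/writeback reporting is requested on a descriptor-count basis, while xmit_more only defers the doorbell. This is not i40e code, just an illustration of keeping the two decisions independent.

#include <linux/bitops.h>
#include <linux/io.h>
#include <linux/types.h>

/* Hypothetical descriptor and ring types, for illustration only. */
struct my_tx_desc {
	u32 cmd;
};

struct my_ring {
	void __iomem *tail;
	u16 next_to_use;
	u16 pkts_since_wb;	/* packets queued since the last writeback request */
};

#define MY_TXD_CMD_REPORT_STATUS	BIT(0)	/* "write back this descriptor" */
#define MY_WB_STRIDE			4

static void my_finish_packet(struct my_ring *ring, struct my_tx_desc *last_desc,
			     bool xmit_more)
{
	/* Completion reporting depends on outstanding work, not on xmit_more. */
	if (++ring->pkts_since_wb >= MY_WB_STRIDE) {
		last_desc->cmd |= MY_TXD_CMD_REPORT_STATUS;
		ring->pkts_since_wb = 0;
	}

	/* The only thing xmit_more defers is the doorbell write itself. */
	if (!xmit_more)
		writel(ring->next_to_use, ring->tail);
}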
David Laight Aug. 29, 2017, 1:35 p.m. UTC | #8
From: Jakub Kicinski
> Sent: 25 August 2017 20:34
>
> On Fri, 25 Aug 2017 08:24:49 -0700, Jacob Keller wrote:
> > Under some circumstances, such as with many stacked devices, it is
> > possible that dev_hard_start_xmit will bundle many packets together, and
> > mark them all with xmit_more.
> 
> Excuse my ignorance but what are those stacked devices?  Could they
> perhaps be fixed somehow?  My intuition was that long xmit_more
> sequences can only happen if NIC and/or BQL are back pressuring, and
> therefore we shouldn't be seeing a long xmit_more "train" arriving at
> an empty device ring...

I also suspect that the packets could be coming from multiple sources.
So getting the sources to limit the number of packets with XMIT_MORE
set won't really solve any problem.

At some point the driver for the physical device will have to give it
a kick to start the transmits.

On the systems I've got (desktop x86) PCIe writes aren't really very
expensive.
Reads are a different matter entirely (2us into our fpga target).

	David.

Patch

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index b67044a2575f..3d995e8f4448 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -230,6 +230,12 @@  netdev_max_backlog
 Maximum number  of  packets,  queued  on  the  INPUT  side, when the interface
 receives packets faster than kernel can process them.
 
+xmit_more_max
+-------------
+
+Maximum number of packets in a row to mark with skb->xmit_more. A value of zero
+indicates no limit.
+
 netdev_rss_key
 --------------
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c5475b37a631..6341452aed09 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3321,6 +3321,8 @@  void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev);
 extern int		netdev_budget;
 extern unsigned int	netdev_budget_usecs;
 
+extern unsigned int sysctl_xmit_more_max;
+
 /* Called by rtnetlink.c:rtnl_unlock() */
 void netdev_run_todo(void);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 270b54754821..d9946d29c3a5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2983,12 +2983,19 @@  struct sk_buff *dev_hard_start_xmit(struct sk_buff *first, struct net_device *de
 {
 	struct sk_buff *skb = first;
 	int rc = NETDEV_TX_OK;
+	int xmit_count = 0;
+	bool more = true;
 
 	while (skb) {
 		struct sk_buff *next = skb->next;
 
+		if (sysctl_xmit_more_max)
+			more = xmit_count++ < sysctl_xmit_more_max;
+		if (!more)
+			xmit_count = 0;
+
 		skb->next = NULL;
-		rc = xmit_one(skb, dev, txq, next != NULL);
+		rc = xmit_one(skb, dev, txq, more && next != NULL);
 		if (unlikely(!dev_xmit_complete(rc))) {
 			skb->next = next;
 			goto out;
@@ -3523,6 +3530,7 @@  EXPORT_SYMBOL(netdev_max_backlog);
 int netdev_tstamp_prequeue __read_mostly = 1;
 int netdev_budget __read_mostly = 300;
 unsigned int __read_mostly netdev_budget_usecs = 2000;
+unsigned int __read_mostly sysctl_xmit_more_max = 32;
 int weight_p __read_mostly = 64;           /* old backlog weight */
 int dev_weight_rx_bias __read_mostly = 1;  /* bias for backlog weight */
 int dev_weight_tx_bias __read_mostly = 1;  /* bias for output_queue quota */
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index b7cd9aafe99e..6950e702e101 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -460,6 +460,13 @@  static struct ctl_table net_core_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= &zero,
 	},
+	{
+		.procname	= "xmit_more_max",
+		.data		= &sysctl_xmit_more_max,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 	{ }
 };
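
If the patch above were applied, the limit would appear under /proc/sys/net/core/xmit_more_max and could be adjusted at runtime; a minimal userspace sketch:

#include <stdio.h>

int main(void)
{
	/* Assumes the RFC patch is applied, so this proc file exists. */
	FILE *f = fopen("/proc/sys/net/core/xmit_more_max", "w");

	if (!f) {
		perror("xmit_more_max");
		return 1;
	}

	/* 0 disables the limit entirely; 32 is the patch's proposed default. */
	fprintf(f, "%d\n", 16);
	fclose(f);
	return 0;
}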