diff mbox

enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v3)

Message ID 4F611835.4080904@gmx.de
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Helge Deller March 14, 2012, 10:14 p.m. UTC
When writing to the ip_local_reserved_ports proc file it will currently clear
all previously reserved ports and update the current list with the one given 
in the input.

This behaviour makes it's usage quite hard, for example:
a) The generic proc filesystem limitation of only handle up to PAGE_SIZE-1
   characters at maximum may not be sufficient to provide all your wished-to-
   be-reserved ports at once.
b) There is no easy way to disable specific given ports, you always need to
   give the full port list at once. This makes shell scripting hard, since
   you need to parse everything yourself.
c) There is no easy way to just add specific ports or port ranges. Again,
   this would be useful for shell scripts.

The following patch solves this problem by simply extending the parser
in proc_do_large_bitmap() to accept the keywords "add" and "release" in front
of given ports or port ranges and to either add or drop the given ports
from the already existing list.

Here is an example:
$ echo "1000-2000,5000" > /proc/sys/net/ipv4/ip_local_reserved_ports
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
1000-2000,5000   (works as before, current port list is replaced by new one)

$ echo "add 3000-4000" > /proc/sys/net/ipv4/ip_local_reserved_ports
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
1000-2000,3000-4000,5000   (new ports added)

$ echo "release 1500-3500" > /proc/sys/net/ipv4/ip_local_reserved_ports
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
1000-1499,3501-4000,5000   (given ports were dropped from the list)

My main motivation for this patch is because of a huge commercial application
which by default may use lots of ports. The full port list which I would have
needed to echo to the /proc/sys/net/ipv4/ip_local_reserved_ports file was
around 30K, and in this case all ports were already combined to regions where
possible. With this patch it's now easy to split up the port ranges into
single pieces and to implement everything in simple bootup shell scripts.
Furthermore adding new or removing unneeded ports dynamically at runtime is
now easily possible.

Signed-off-by: Helge Deller <deller@gmx.de>
CC: Octavian Purdila <octavian.purdila@intel.com>
CC: netdev@vger.kernel.org
CC: David Miller <davem@davemloft.net>
CC: Cong Wang <amwang@redhat.com>
CC: "Eric W. Biederman" <ebiederm@xmission.com>

 Documentation/networking/ip-sysctl.txt |   10 +++++-
 kernel/sysctl.c                        |   51 ++++++++++++++++++++++++++++-----
 2 files changed, 53 insertions(+), 8 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric W. Biederman March 14, 2012, 10:34 p.m. UTC | #1
Helge Deller <deller@gmx.de> writes:

> When writing to the ip_local_reserved_ports proc file it will currently clear
> all previously reserved ports and update the current list with the one given 
> in the input.
>
> This behaviour makes it's usage quite hard, for example:
> a) The generic proc filesystem limitation of only handle up to PAGE_SIZE-1
>    characters at maximum may not be sufficient to provide all your wished-to-
>    be-reserved ports at once.

Fundamentally this need to be fixed first or else you will not be able
to display the bitmap through sysctl.

> b) There is no easy way to disable specific given ports, you always need to
>    give the full port list at once. This makes shell scripting hard, since
>    you need to parse everything yourself.

> c) There is no easy way to just add specific ports or port ranges. Again,
>    this would be useful for shell scripts.

Arguably b and c call for user space tools for better tools for dealing
with text ranges not a magic parser in /proc.  We already have tools
like seq, dshbak, and pdsh.  What is the difficulty of writing a little
shell utility instead of modifying /proc?

> The following patch solves this problem by simply extending the parser
> in proc_do_large_bitmap() to accept the keywords "add" and "release" in front
> of given ports or port ranges and to either add or drop the given ports
> from the already existing list.
>
> Here is an example:
> $ echo "1000-2000,5000" > /proc/sys/net/ipv4/ip_local_reserved_ports
> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
> 1000-2000,5000   (works as before, current port list is replaced by new one)
>
> $ echo "add 3000-4000" > /proc/sys/net/ipv4/ip_local_reserved_ports
> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
> 1000-2000,3000-4000,5000   (new ports added)
>
> $ echo "release 1500-3500" > /proc/sys/net/ipv4/ip_local_reserved_ports
> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
> 1000-1499,3501-4000,5000   (given ports were dropped from the list)
>
> My main motivation for this patch is because of a huge commercial application
> which by default may use lots of ports. The full port list which I would have
> needed to echo to the /proc/sys/net/ipv4/ip_local_reserved_ports file was
> around 30K, and in this case all ports were already combined to regions where
> possible. With this patch it's now easy to split up the port ranges into
> single pieces and to implement everything in simple bootup shell scripts.
> Furthermore adding new or removing unneeded ports dynamically at runtime is
> now easily possible.

You are breaking concept that the bitmap is a single value in /proc/sys
which I don't like at all.

And ultimately this is a lot of code to avoid fixing a 4K limit of
sysctl reads and writes.  If I understand this correctly after this
patch is applied you can not read the result you write in with your
commercial application.  That just seems wrong.

Can you please attack the fundamental issue first?

Eric

> Signed-off-by: Helge Deller <deller@gmx.de>
> CC: Octavian Purdila <octavian.purdila@intel.com>
> CC: netdev@vger.kernel.org
> CC: David Miller <davem@davemloft.net>
> CC: Cong Wang <amwang@redhat.com>
> CC: "Eric W. Biederman" <ebiederm@xmission.com>
>
>  Documentation/networking/ip-sysctl.txt |   10 +++++-
>  kernel/sysctl.c                        |   51 ++++++++++++++++++++++++++++-----
>  2 files changed, 53 insertions(+), 8 deletions(-)
>
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index ad3e80e..fc52546 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -624,7 +624,15 @@ ip_local_reserved_ports - list of comma separated ranges
>  	list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and
>  	10). Writing to the file will clear all previously reserved
>  	ports and update the current list with the one given in the
> -	input.
> +	input unless one of the keywords "add" or "release" is used
> +	in front of the ports in which case the given ports are added
> +	to or released from the currently existing port list.
> +	Example:
> +	$ echo "1000-2000" > /proc/sys/net/ipv4/ip_local_reserved_ports
> +	$ echo "add 3000-4000,5000" > /proc/sys/net/ipv4/ip_local_reserved_ports
> +	$ echo "release 1500-3500" > /proc/sys/net/ipv4/ip_local_reserved_ports
> +	$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
> +	1000-1499,3501-4000,5000
>  
>  	Note that ip_local_port_range and ip_local_reserved_ports
>  	settings are independent and both are considered by the kernel
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index f487f25..f9b1930 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -2805,6 +2805,8 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
>   * We use a range comma separated format (e.g. 1,3-4,10-10) so that
>   * large bitmaps may be represented in a compact manner. Writing into
>   * the file will clear the bitmap then update it with the given input.
> + * If "add" or "release" is written in front of numbers or number ranges,
> + * the given bits will be added to or released from the existing bitmap.
>   *
>   * Returns 0 on success.
>   */
> @@ -2813,11 +2815,13 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>  {
>  	int err = 0;
>  	bool first = 1;
> +	bool add_or_release = 0, xrelease = 0;
>  	size_t left = *lenp;
>  	unsigned long bitmap_len = table->maxlen;
>  	unsigned long *bitmap = (unsigned long *) table->data;
> -	unsigned long *tmp_bitmap = NULL;
> -	char tr_a[] = { '-', ',', '\n' }, tr_b[] = { ',', '\n', 0 }, c;
> +	unsigned long *tmp_bitmap = NULL, *release_bitmap = NULL;
> +	char tr_a[] = { '-', ',', ' ', '\n' },
> +	     tr_b[] = { ',', ' ', '\n', 0 }, c;
>  
>  	if (!bitmap_len || !left || (*ppos && !write)) {
>  		*lenp = 0;
> @@ -2841,8 +2845,9 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>                  }
>  		kbuf[left] = 0;
>  
> -		tmp_bitmap = kzalloc(BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long),
> -				     GFP_KERNEL);
> +		tmp_bitmap = kzalloc(2 * BITS_TO_LONGS(bitmap_len) *
> +					sizeof(unsigned long), GFP_KERNEL);
> +		release_bitmap = &tmp_bitmap[BITS_TO_LONGS(bitmap_len)];
>  		if (!tmp_bitmap) {
>  			free_page(page);
>  			return -ENOMEM;
> @@ -2852,6 +2857,29 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>  			unsigned long val_a, val_b;
>  			bool neg;
>  
> +			left -= proc_skip_spaces(&kbuf);
> +			if (!left)
> +				continue;
> +
> +			if (first || add_or_release) {
> +				if (!strnicmp(kbuf, "add ", 4)) {
> +					xrelease = 0;
> +					add_or_release = 1;
> +					kbuf += 4;
> +					left -= 4;
> +				} else
> +				if (!strnicmp(kbuf, "release ", 8)) {
> +					xrelease = 1;
> +					add_or_release = 1;
> +					kbuf += 8;
> +					left -= 8;
> +				}
> +
> +				left -= proc_skip_spaces(&kbuf);
> +				if (!left)
> +					continue;
> +			}
> +
>  			err = proc_get_long(&kbuf, &left, &val_a, &neg, tr_a,
>  					     sizeof(tr_a), &c);
>  			if (err)
> @@ -2885,12 +2913,20 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>  			}
>  
>  			while (val_a <= val_b)
> -				set_bit(val_a++, tmp_bitmap);
> +				if (xrelease)
> +					set_bit(val_a++, release_bitmap);
> +				else
> +					set_bit(val_a++, tmp_bitmap);
>  
>  			first = 0;
>  			proc_skip_char(&kbuf, &left, '\n');
>  		}
>  		free_page(page);
> +
> +		/* Do not allow adding and releasing same bits in one step. */
> +		if (!err && add_or_release &&
> +		    bitmap_intersects(tmp_bitmap, release_bitmap, bitmap_len))
> +			err = -EINVAL;
>  	} else {
>  		unsigned long bit_a, bit_b = 0;
>  
> @@ -2926,9 +2962,10 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>  
>  	if (!err) {
>  		if (write) {
> -			if (*ppos)
> +			if (*ppos || add_or_release) {
>  				bitmap_or(bitmap, bitmap, tmp_bitmap, bitmap_len);
> -			else
> +				bitmap_andnot(bitmap, bitmap, release_bitmap, bitmap_len);
> +			} else
>  				memcpy(bitmap, tmp_bitmap,
>  					BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long));
>  		}
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Helge Deller March 15, 2012, 11:35 p.m. UTC | #2
On 03/14/2012 11:34 PM, Eric W. Biederman wrote:
> Helge Deller<deller@gmx.de>  writes:
>
>> When writing to the ip_local_reserved_ports proc file it will currently clear
>> all previously reserved ports and update the current list with the one given
>> in the input.
>>
>> This behaviour makes it's usage quite hard, for example:
>> a) The generic proc filesystem limitation of only handle up to PAGE_SIZE-1
>>     characters at maximum may not be sufficient to provide all your wished-to-
>>     be-reserved ports at once.
>
> Fundamentally this need to be fixed first or else you will not be able
> to display the bitmap through sysctl.

Not necessarily.
Reading more than 4K from /proc/sys/net/ipv4/ip_local_reserved_ports
is possible today already (everything depends on how much chars you
read-at-once). Just try using "dd bs=1M". Writing is the major limiting factor.

>> b) There is no easy way to disable specific given ports, you always need to
>>     give the full port list at once. This makes shell scripting hard, since
>>     you need to parse everything yourself.
>
>> c) There is no easy way to just add specific ports or port ranges. Again,
>>     this would be useful for shell scripts.
>
> Arguably b and c call for user space tools for better tools for dealing
> with text ranges not a magic parser in /proc.  We already have tools
> like seq, dshbak, and pdsh.  What is the difficulty of writing a little
> shell utility instead of modifying /proc?

I'm not modifying /proc. I'm just extending it to provide an alternative
method to modify values for the large bitmap case only.
The basic existing behaviour to just read/write values is still functional.

If for example the current value of ip_local_reserved_ports is:
300,400-500,600....
then I like the idea of just
	echo "release 420-480" > ...
much more than
	cat ip_local_reserved_ports | do_complicated_shell_string_scripting_to_remove_420-480_without_getting_above_4K > ip_local_reserved_ports
BTW, in this case the range 400-500 gets split up and the output should become
300,400-419,481-500,600....

>> The following patch solves this problem by simply extending the parser
>> in proc_do_large_bitmap() to accept the keywords "add" and "release" in front
>> of given ports or port ranges and to either add or drop the given ports
>> from the already existing list.
>>
>> Here is an example:
>> $ echo "1000-2000,5000">  /proc/sys/net/ipv4/ip_local_reserved_ports
>> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
>> 1000-2000,5000   (works as before, current port list is replaced by new one)
>>
>> $ echo "add 3000-4000">  /proc/sys/net/ipv4/ip_local_reserved_ports
>> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
>> 1000-2000,3000-4000,5000   (new ports added)
>>
>> $ echo "release 1500-3500">  /proc/sys/net/ipv4/ip_local_reserved_ports
>> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
>> 1000-1499,3501-4000,5000   (given ports were dropped from the list)
>>
>> My main motivation for this patch is because of a huge commercial application
>> which by default may use lots of ports. The full port list which I would have
>> needed to echo to the /proc/sys/net/ipv4/ip_local_reserved_ports file was
>> around 30K, and in this case all ports were already combined to regions where
>> possible. With this patch it's now easy to split up the port ranges into
>> single pieces and to implement everything in simple bootup shell scripts.
>> Furthermore adding new or removing unneeded ports dynamically at runtime is
>> now easily possible.
>
> You are breaking concept that the bitmap is a single value in /proc/sys
> which I don't like at all.

Please keep in mind that this is a 65535 bit (port) bitmap, not just a "simple"
single value.
And the basic concept is still there.

> And ultimately this is a lot of code to avoid fixing a 4K limit of
> sysctl reads and writes.

I would love if the 4K problem gets fixed.
But it seems that this will lead to a major redesign and rewrite of
the whole /proc interface which I won't be able to drive.
I'm not even sure if this could be done in short time without
breaking existing compatibility. Other people here on the list can
surely comment on that better than me.

So, my patch is at least a "balanced" solution which can make the
current ip_local_reserved_ports interface useable to existing
applications. In it's current form, the usage of ip_local_reserved_ports
isfull of limitations.

>If I understand this correctly after this
> patch is applied you can not read the result you write in with your
> commercial application.  That just seems wrong.

As mentioned above: reading is possible. And with my patch
applied you usually probably won't need to read the values since you
don't care which other ports have been configured and just add/release
the ones you are interested in.

Regarding the commercial application: The idea is to provide an additional
shell-based system initscript which fills the ip_local_reserved_ports
list with initial values based on the product which the customer
installed. After the values have been set during bootup, further (dynamic)
changes of this port list won't happen unless the administrator
manually wants to add/release other ports. So, no magic in here...

> Can you please attack the fundamental issue first?

You mean the 4K/PAGE_SIZE limitation? As said above, that's probably
above my possibilities / time capatibilities. Sorry.

But of course I'm open for all kind of other ideas how and API for
the large bitmap case could be done better. Sadly the seq_file interface
doesn't seem to support reading yet...?

Helge
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ad3e80e..fc52546 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -624,7 +624,15 @@  ip_local_reserved_ports - list of comma separated ranges
 	list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and
 	10). Writing to the file will clear all previously reserved
 	ports and update the current list with the one given in the
-	input.
+	input unless one of the keywords "add" or "release" is used
+	in front of the ports in which case the given ports are added
+	to or released from the currently existing port list.
+	Example:
+	$ echo "1000-2000" > /proc/sys/net/ipv4/ip_local_reserved_ports
+	$ echo "add 3000-4000,5000" > /proc/sys/net/ipv4/ip_local_reserved_ports
+	$ echo "release 1500-3500" > /proc/sys/net/ipv4/ip_local_reserved_ports
+	$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
+	1000-1499,3501-4000,5000
 
 	Note that ip_local_port_range and ip_local_reserved_ports
 	settings are independent and both are considered by the kernel
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f487f25..f9b1930 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2805,6 +2805,8 @@  static int proc_do_cad_pid(struct ctl_table *table, int write,
  * We use a range comma separated format (e.g. 1,3-4,10-10) so that
  * large bitmaps may be represented in a compact manner. Writing into
  * the file will clear the bitmap then update it with the given input.
+ * If "add" or "release" is written in front of numbers or number ranges,
+ * the given bits will be added to or released from the existing bitmap.
  *
  * Returns 0 on success.
  */
@@ -2813,11 +2815,13 @@  int proc_do_large_bitmap(struct ctl_table *table, int write,
 {
 	int err = 0;
 	bool first = 1;
+	bool add_or_release = 0, xrelease = 0;
 	size_t left = *lenp;
 	unsigned long bitmap_len = table->maxlen;
 	unsigned long *bitmap = (unsigned long *) table->data;
-	unsigned long *tmp_bitmap = NULL;
-	char tr_a[] = { '-', ',', '\n' }, tr_b[] = { ',', '\n', 0 }, c;
+	unsigned long *tmp_bitmap = NULL, *release_bitmap = NULL;
+	char tr_a[] = { '-', ',', ' ', '\n' },
+	     tr_b[] = { ',', ' ', '\n', 0 }, c;
 
 	if (!bitmap_len || !left || (*ppos && !write)) {
 		*lenp = 0;
@@ -2841,8 +2845,9 @@  int proc_do_large_bitmap(struct ctl_table *table, int write,
                 }
 		kbuf[left] = 0;
 
-		tmp_bitmap = kzalloc(BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long),
-				     GFP_KERNEL);
+		tmp_bitmap = kzalloc(2 * BITS_TO_LONGS(bitmap_len) *
+					sizeof(unsigned long), GFP_KERNEL);
+		release_bitmap = &tmp_bitmap[BITS_TO_LONGS(bitmap_len)];
 		if (!tmp_bitmap) {
 			free_page(page);
 			return -ENOMEM;
@@ -2852,6 +2857,29 @@  int proc_do_large_bitmap(struct ctl_table *table, int write,
 			unsigned long val_a, val_b;
 			bool neg;
 
+			left -= proc_skip_spaces(&kbuf);
+			if (!left)
+				continue;
+
+			if (first || add_or_release) {
+				if (!strnicmp(kbuf, "add ", 4)) {
+					xrelease = 0;
+					add_or_release = 1;
+					kbuf += 4;
+					left -= 4;
+				} else
+				if (!strnicmp(kbuf, "release ", 8)) {
+					xrelease = 1;
+					add_or_release = 1;
+					kbuf += 8;
+					left -= 8;
+				}
+
+				left -= proc_skip_spaces(&kbuf);
+				if (!left)
+					continue;
+			}
+
 			err = proc_get_long(&kbuf, &left, &val_a, &neg, tr_a,
 					     sizeof(tr_a), &c);
 			if (err)
@@ -2885,12 +2913,20 @@  int proc_do_large_bitmap(struct ctl_table *table, int write,
 			}
 
 			while (val_a <= val_b)
-				set_bit(val_a++, tmp_bitmap);
+				if (xrelease)
+					set_bit(val_a++, release_bitmap);
+				else
+					set_bit(val_a++, tmp_bitmap);
 
 			first = 0;
 			proc_skip_char(&kbuf, &left, '\n');
 		}
 		free_page(page);
+
+		/* Do not allow adding and releasing same bits in one step. */
+		if (!err && add_or_release &&
+		    bitmap_intersects(tmp_bitmap, release_bitmap, bitmap_len))
+			err = -EINVAL;
 	} else {
 		unsigned long bit_a, bit_b = 0;
 
@@ -2926,9 +2962,10 @@  int proc_do_large_bitmap(struct ctl_table *table, int write,
 
 	if (!err) {
 		if (write) {
-			if (*ppos)
+			if (*ppos || add_or_release) {
 				bitmap_or(bitmap, bitmap, tmp_bitmap, bitmap_len);
-			else
+				bitmap_andnot(bitmap, bitmap, release_bitmap, bitmap_len);
+			} else
 				memcpy(bitmap, tmp_bitmap,
 					BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long));
 		}