[iptables] iptables: support insisting that the lock is held

Submitted by Lorenzo Colitti on April 20, 2017, 9:23 a.m.

Details

Message ID 20170420092333.78247-1-lorenzo@google.com
State Changes Requested
Delegated to: Pablo Neira
Headers show

Commit Message

Lorenzo Colitti April 20, 2017, 9:23 a.m.
Currently, iptables programs will exit with an error if the
iptables lock cannot be acquired, but will silently continue if
the lock cannot be opened at all. This can cause unexpected
failures (with unhelpful error messages) in the presence of
concurrent updates.

This patch adds a compile-time option that causes iptables to
refuse to do anything if the lock cannot be acquired. It is a
compile-time option instead of a command-line flag because:

1. In order to reliably avoid concurrent modification, all
   invocations of iptables commands must follow this behaviour.
2. Whether or not the lock can be opened is typically not
   a run-time condition but is likely to be a configuration
   error.

Tested by deleting xtables.lock and observing that all commands
failed if iptables was compiled with --enable-strict-locking, but
succeeded otherwise.

By default, --enable-strict-locking is disabled for backwards
compatibility reasons. It can be enabled by default in a future
change if desired.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
---
 configure.ac                 | 12 +++++++++++-
 iptables/ip6tables-restore.c |  7 +------
 iptables/ip6tables.c         | 11 ++---------
 iptables/iptables-restore.c  |  7 +------
 iptables/iptables.c          | 11 ++---------
 iptables/xshared.c           | 26 +++++++++++++++++++++++++-
 iptables/xshared.h           |  4 ++--
 7 files changed, 44 insertions(+), 34 deletions(-)

Comments

Pablo Neira May 3, 2017, 11:01 a.m.
On Thu, Apr 20, 2017 at 06:23:33PM +0900, Lorenzo Colitti wrote:
> Currently, iptables programs will exit with an error if the
> iptables lock cannot be acquired, but will silently continue if
> the lock cannot be opened at all.

This sounds to me like a wrong design decision was made when
introducing this userspace lock.

> This can cause unexpected failures (with unhelpful error messages)
> in the presence of concurrent updates.
> 
> This patch adds a compile-time option that causes iptables to
> refuse to do anything if the lock cannot be acquired. It is a
> compile-time option instead of a command-line flag because:
> 
> 1. In order to reliably avoid concurrent modification, all
>    invocations of iptables commands must follow this behaviour.
> 2. Whether or not the lock can be opened is typically not
>    a run-time condition but is likely to be a configuration
>    error.
> 
> Tested by deleting xtables.lock and observing that all commands
> failed if iptables was compiled with --enable-strict-locking, but
> succeeded otherwise.
> 
> By default, --enable-strict-locking is disabled for backwards
> compatibility reasons. It can be enabled by default in a future
> change if desired.

I would like to skip this compile time switch, if the existing
behaviour is broken, we should just fix it. What is the scenario that
can indeed have an impact in terms of backward compatibility breakage?
Does it really make sense to keep a buggy behaviour around?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Conole May 3, 2017, 2:27 p.m.
Pablo Neira Ayuso <pablo@netfilter.org> writes:

> On Thu, Apr 20, 2017 at 06:23:33PM +0900, Lorenzo Colitti wrote:
>> Currently, iptables programs will exit with an error if the
>> iptables lock cannot be acquired, but will silently continue if
>> the lock cannot be opened at all.
>
> This sounds to me like a wrong design decision was made when
> introducing this userspace lock.

I wouldn't say it that way.  I looked at this a while ago, and one thing
to keep in mind is the if the particular prefix path in the filesystem
(for instance /run) isn't available, then this will cause iptables to
fail.  I'm not sure how many systems do provide /run - at the time it
might have been more common.

>> This can cause unexpected failures (with unhelpful error messages)
>> in the presence of concurrent updates.
>> 
>> This patch adds a compile-time option that causes iptables to
>> refuse to do anything if the lock cannot be acquired. It is a
>> compile-time option instead of a command-line flag because:
>> 
>> 1. In order to reliably avoid concurrent modification, all
>>    invocations of iptables commands must follow this behaviour.
>> 2. Whether or not the lock can be opened is typically not
>>    a run-time condition but is likely to be a configuration
>>    error.
>>
>> Tested by deleting xtables.lock and observing that all commands
>> failed if iptables was compiled with --enable-strict-locking, but
>> succeeded otherwise.
>> 
>> By default, --enable-strict-locking is disabled for backwards
>> compatibility reasons. It can be enabled by default in a future
>> change if desired.
>
> I would like to skip this compile time switch, if the existing
> behaviour is broken, we should just fix it. What is the scenario that
> can indeed have an impact in terms of backward compatibility breakage?
> Does it really make sense to keep a buggy behaviour around?

I'm not sure about a change to the behavior, but I agree that a compile
time switch is probably not the way to go.

-Aaron
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aaron Conole May 3, 2017, 2:51 p.m.
Aaron Conole <aconole@bytheb.org> writes:

> Pablo Neira Ayuso <pablo@netfilter.org> writes:
>
>> On Thu, Apr 20, 2017 at 06:23:33PM +0900, Lorenzo Colitti wrote:
>>> Currently, iptables programs will exit with an error if the
>>> iptables lock cannot be acquired, but will silently continue if
>>> the lock cannot be opened at all.
>>
>> This sounds to me like a wrong design decision was made when
>> introducing this userspace lock.
>
> I wouldn't say it that way.  I looked at this a while ago, and one thing
> to keep in mind is the if the particular prefix path in the filesystem
> (for instance /run) isn't available, then this will cause iptables to
> fail.  I'm not sure how many systems do provide /run - at the time it
> might have been more common.

Another issue is container systems.  Until recently, Kubernetes didn't
provide /run at all, and not all container orchestration tools will
provide this filesystem.

>>> This can cause unexpected failures (with unhelpful error messages)
>>> in the presence of concurrent updates.
>>> 
>>> This patch adds a compile-time option that causes iptables to
>>> refuse to do anything if the lock cannot be acquired. It is a
>>> compile-time option instead of a command-line flag because:
>>> 
>>> 1. In order to reliably avoid concurrent modification, all
>>>    invocations of iptables commands must follow this behaviour.
>>> 2. Whether or not the lock can be opened is typically not
>>>    a run-time condition but is likely to be a configuration
>>>    error.
>>>
>>> Tested by deleting xtables.lock and observing that all commands
>>> failed if iptables was compiled with --enable-strict-locking, but
>>> succeeded otherwise.
>>> 
>>> By default, --enable-strict-locking is disabled for backwards
>>> compatibility reasons. It can be enabled by default in a future
>>> change if desired.
>>
>> I would like to skip this compile time switch, if the existing
>> behaviour is broken, we should just fix it. What is the scenario that
>> can indeed have an impact in terms of backward compatibility breakage?
>> Does it really make sense to keep a buggy behaviour around?
>
> I'm not sure about a change to the behavior, but I agree that a compile
> time switch is probably not the way to go.

I've thought about it.  I think a better change that makes sense in the
presence of concurrent updates would be to use the wait argument as a
total time, and apply it to both the userspace lock, AND an EAGAIN from
the kernel space side.  So if the kernel space says 'locked try again',
and the user passed a -w option, we can simply keep retrying until the
wait time expires.  Does that make sense and solve your issue, Lorenzo?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Colitti May 3, 2017, 3:57 p.m.
On Wed, May 3, 2017 at 8:01 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> This sounds to me like a wrong design decision was made when
> introducing this userspace lock.

Looks like the lock was introduced by

93587a04d0f2 ("ip[6]tables: Add locking to prevent concurrent instances")

which made the lock optional for backwards compatibility:

       xt_socket = socket(AF_UNIX, SOCK_STREAM, 0);
       /* If we can't even create a socket, fall back to prior
(lockless) behavior */
       if (xt_socket < 0)
               return true;

That was almost four years ago. Sometime between 1.4 and 1.6, the lock
was switched to use flock() on /run/xtables.lock , but the fail-open
behaviour was preserved.

> I would like to skip this compile time switch, if the existing
> behaviour is broken, we should just fix it. What is the scenario that
> can indeed have an impact in terms of backward compatibility breakage?
> Does it really make sense to keep a buggy behaviour around?

I agree that the current behaviour is dangerous and error-prone. It's
bitten us badly several times now. The only use case I came up with is
running commands early on boot, from a read-only filesystem that does
not have /run/xtables.lock. If we change the behaviour, that system
might become unbootable on an iptables upgrade.

Another issue mentioned in the initial socket-based approach was that
no locking is necessary for iptables commands running in different
network namespaces. Someone with such a system might be relying on
this. The impact of this change would be to slow down concurrent
iptables commands.

These both seem like a very contrived cases though. I'll send out
another patch that changes the behaviour.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Colitti May 3, 2017, 3:59 p.m.
On Wed, May 3, 2017 at 11:27 PM, Aaron Conole <aconole@bytheb.org> wrote:
> I wouldn't say it that way.  I looked at this a while ago, and one thing
> to keep in mind is the if the particular prefix path in the filesystem
> (for instance /run) isn't available, then this will cause iptables to
> fail.  I'm not sure how many systems do provide /run - at the time it
> might have been more common.

That is a configuration error on the part of the distribution
maintainers. The location of the iptables lock is configurable at
compile time and if the distribution does not have /run, the
maintainers can use another path.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Colitti May 3, 2017, 4:07 p.m.
On Wed, May 3, 2017 at 11:51 PM, Aaron Conole <aconole@bytheb.org> wrote:
> I've thought about it.  I think a better change that makes sense in the
> presence of concurrent updates would be to use the wait argument as a
> total time, and apply it to both the userspace lock, AND an EAGAIN from
> the kernel space side.  So if the kernel space says 'locked try again',
> and the user passed a -w option, we can simply keep retrying until the
> wait time expires.  Does that make sense and solve your issue, Lorenzo?

The lock is always taken regardless of whether -w is specified. The
only difference is that with -w userspace waits instead of exiting
with an error.

I'm with Pablo on this one - the behaviour is incorrect and should be
fixed. Admins shouldn't have to wait for a concurrent execution to
happen before they know that they misconfigured the lock. I think that
if the admin runs iptables and the lock is unusable, it's a
configuration error, and it should be reported as such immediately.

Also - even if we add a check for EAGAIN, how reliable is that
codepath going to be? Concurrent updates are already pretty rare, and
the problem we saw was pretty hard to track down. If we sprinkle
checks for EAGAIN all over the code, and a future change misses one of
them, how will anyone notice breakage?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Colitti May 19, 2017, 7:10 a.m.
On Thu, May 4, 2017 at 12:57 AM, Lorenzo Colitti <lorenzo@google.com> wrote:
> > I would like to skip this compile time switch, if the existing
> > behaviour is broken, we should just fix it. What is the scenario that
> > can indeed have an impact in terms of backward compatibility breakage?
> > Does it really make sense to keep a buggy behaviour around?
>
> I'll send out another patch that changes the behaviour.

Sorry this took a while. http://patchwork.ozlabs.org/patch/764458/
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch hide | download patch | download mbox

diff --git a/configure.ac b/configure.ac
index 221812a8f3..a301bcd0db 100644
--- a/configure.ac
+++ b/configure.ac
@@ -71,6 +71,10 @@  AC_ARG_WITH([xt-lock-name], AS_HELP_STRING([--with-xt-lock-name=PATH],
 	[Path to the xtables lock [[/run/xtables.lock]]]),
 	[xt_lock_name="$withval"],
 	[xt_lock_name="/run/xtables.lock"])
+AC_ARG_ENABLE([strict-locking],
+	AS_HELP_STRING([--enable-strict-locking],
+	[Refuse to operate unless xtables lock was acquired]),
+	[enable_strict_locking="$enableval"], [enable_strict_locking="no"])
 
 libiptc_LDFLAGS2="";
 AX_CHECK_LINKER_FLAGS([-Wl,--no-as-needed],
@@ -239,6 +243,11 @@  AC_SUBST([libxtables_vmajor])
 AC_DEFINE_UNQUOTED([XT_LOCK_NAME], "${xt_lock_name}",
 	[Location of the iptables lock file])
 
+if test "x$enable_strict_locking" = "xyes" ; then
+	AC_DEFINE([ENABLE_STRICT_LOCKING], [1],
+		[Define to 1 to refuse to operate unless xtables lock is acquired])
+fi
+
 AC_CONFIG_FILES([Makefile extensions/GNUmakefile include/Makefile
 	iptables/Makefile iptables/xtables.pc
 	iptables/iptables.8 iptables/iptables-extensions.8.tmpl
@@ -273,7 +282,8 @@  Build parameters:
   Installation prefix (--prefix):	${prefix}
   Xtables extension directory:		${e_xtlibdir}
   Pkg-config directory:			${e_pkgconfigdir}
-  Xtables lock file:			${xt_lock_name}"
+  Xtables lock file:			${xt_lock_name}
+  Enable strict locking:		${enable_strict_locking}"
 
 if [[ -n "$ksourcedir" ]]; then
 	echo "  Kernel source directory:		${ksourcedir}"
diff --git a/iptables/ip6tables-restore.c b/iptables/ip6tables-restore.c
index 39a881dfce..1e50bf0db2 100644
--- a/iptables/ip6tables-restore.c
+++ b/iptables/ip6tables-restore.c
@@ -301,12 +301,7 @@  int ip6tables_restore_main(int argc, char *argv[])
 			in_table = 0;
 		} else if ((buffer[0] == '*') && (!in_table)) {
 			/* Acquire a lock before we create a new table handle */
-			lock = xtables_lock(wait, &wait_interval);
-			if (lock == XT_LOCK_BUSY) {
-				fprintf(stderr, "Another app is currently holding the xtables lock. "
-					"Perhaps you want to use the -w option?\n");
-				exit(RESOURCE_PROBLEM);
-			}
+			lock = xtables_lock_or_exit(wait, &wait_interval);
 
 			/* New table */
 			char *table;
diff --git a/iptables/ip6tables.c b/iptables/ip6tables.c
index 579d347b09..49bd006fb2 100644
--- a/iptables/ip6tables.c
+++ b/iptables/ip6tables.c
@@ -1779,15 +1779,8 @@  int do_command6(int argc, char *argv[], char **table,
 	generic_opt_check(command, cs.options);
 
 	/* Attempt to acquire the xtables lock */
-	if (!restore && xtables_lock(wait, &wait_interval) == XT_LOCK_BUSY) {
-		fprintf(stderr, "Another app is currently holding the xtables lock. ");
-		if (wait == 0)
-			fprintf(stderr, "Perhaps you want to use the -w option?\n");
-		else
-			fprintf(stderr, "Stopped waiting after %ds.\n", wait);
-		xtables_free_opts(1);
-		exit(RESOURCE_PROBLEM);
-	}
+	if (!restore)
+		xtables_lock_or_exit(wait, &wait_interval);
 
 	/* only allocate handle if we weren't called with a handle */
 	if (!*handle)
diff --git a/iptables/iptables-restore.c b/iptables/iptables-restore.c
index 876fe06d7f..d79fe328cc 100644
--- a/iptables/iptables-restore.c
+++ b/iptables/iptables-restore.c
@@ -299,12 +299,7 @@  iptables_restore_main(int argc, char *argv[])
 			in_table = 0;
 		} else if ((buffer[0] == '*') && (!in_table)) {
 			/* Acquire a lock before we create a new table handle */
-			lock = xtables_lock(wait, &wait_interval);
-			if (lock == XT_LOCK_BUSY) {
-				fprintf(stderr, "Another app is currently holding the xtables lock. "
-					"Perhaps you want to use the -w option?\n");
-				exit(RESOURCE_PROBLEM);
-			}
+			lock = xtables_lock_or_exit(wait, &wait_interval);
 
 			/* New table */
 			char *table;
diff --git a/iptables/iptables.c b/iptables/iptables.c
index 97cbda91cf..69d19feca1 100644
--- a/iptables/iptables.c
+++ b/iptables/iptables.c
@@ -1765,15 +1765,8 @@  int do_command4(int argc, char *argv[], char **table,
 	generic_opt_check(command, cs.options);
 
 	/* Attempt to acquire the xtables lock */
-	if (!restore && xtables_lock(wait, &wait_interval) == XT_LOCK_BUSY) {
-		fprintf(stderr, "Another app is currently holding the xtables lock. ");
-		if (wait == 0)
-			fprintf(stderr, "Perhaps you want to use the -w option?\n");
-		else
-			fprintf(stderr, "Stopped waiting after %ds.\n", wait);
-		xtables_free_opts(1);
-		exit(RESOURCE_PROBLEM);
-	}
+	if (!restore)
+		xtables_lock_or_exit(wait, &wait_interval);
 
 	/* only allocate handle if we weren't called with a handle */
 	if (!*handle)
diff --git a/iptables/xshared.c b/iptables/xshared.c
index 3fbe3b1a99..5f3a90f460 100644
--- a/iptables/xshared.c
+++ b/iptables/xshared.c
@@ -246,7 +246,7 @@  void xs_init_match(struct xtables_match *match)
 		match->init(match->m);
 }
 
-int xtables_lock(int wait, struct timeval *wait_interval)
+static int xtables_lock(int wait, struct timeval *wait_interval)
 {
 	struct timeval time_left, wait_time;
 	int fd, i = 0;
@@ -291,6 +291,30 @@  void xtables_unlock(int lock)
 		close(lock);
 }
 
+int xtables_lock_or_exit(int wait, struct timeval *wait_interval)
+{
+	int lock = xtables_lock(wait, wait_interval);
+
+	if (lock == XT_LOCK_BUSY) {
+		fprintf(stderr, "Another app is currently holding the xtables lock. ");
+		if (wait == 0)
+			fprintf(stderr, "Perhaps you want to use the -w option?\n");
+		else
+			fprintf(stderr, "Stopped waiting after %ds.\n", wait);
+		xtables_free_opts(1);
+		exit(RESOURCE_PROBLEM);
+#ifdef ENABLE_STRICT_LOCKING
+	} else if (lock == XT_LOCK_UNSUPPORTED) {
+		fprintf(stderr, "Failed to acquire the xtables lock: %s\n",
+			strerror(errno));
+		xtables_free_opts(1);
+		exit(RESOURCE_PROBLEM);
+#endif
+	}
+
+	return lock;
+}
+
 int parse_wait_time(int argc, char *argv[])
 {
 	int wait = -1;
diff --git a/iptables/xshared.h b/iptables/xshared.h
index 539e6c243b..63a3ab271b 100644
--- a/iptables/xshared.h
+++ b/iptables/xshared.h
@@ -93,7 +93,7 @@  extern void xs_init_match(struct xtables_match *);
  * A value >= 0 indicates the lock filedescriptor. Other values are:
  *
  * XT_LOCK_UNSUPPORTED : The system does not support locking, execution will
- * proceed lockless.
+ * proceed lockless unless compiled with --enable-strict-locking.
  *
  * XT_LOCK_BUSY : The lock was held by another process. xtables_lock only
  * returns this value when |wait| == false. If |wait| == true, xtables_lock
@@ -106,8 +106,8 @@  enum {
 	XT_LOCK_UNSUPPORTED  = -2,
 	XT_LOCK_NOT_ACQUIRED  = -3,
 };
-extern int xtables_lock(int wait, struct timeval *tv);
 extern void xtables_unlock(int lock);
+extern int xtables_lock_or_exit(int wait, struct timeval *tv);
 
 int parse_wait_time(int argc, char *argv[]);
 void parse_wait_interval(int argc, char *argv[], struct timeval *wait_interval);