Patchwork [RFC] xfrm: do not leak ESRCH to user space

login
register
mail settings
Submitter Fernando Luis Vázquez Cao
Date Oct. 23, 2008, 2:27 p.m.
Message ID <1224772039.4404.56.camel@sebastian.kern.oss.ntt.co.jp>
Download mbox | patch
Permalink /patch/5475/
State Accepted
Delegated to: David Miller
Headers show

Comments

Fernando Luis Vázquez Cao - Oct. 23, 2008, 2:27 p.m.
Hi David,

I noticed that, under certain conditions, ESRCH can be leaked from the
xfrm layer to user space through sys_connect. In particular, this seems
to happen reliably when the kernel fails to resolve a template either
because the AF_KEY receive buffer being used by racoon is full or
because the SA entry we are trying to use is in XFRM_STATE_EXPIRED
state.

However, since this could be a transient issue it could be argued that
EAGAIN would be more appropriate. Besides this error code is not even
documented in the man page for sys_connect (as of man-pages 3.07).

What is the expected behavior (I could not find anything in the RFCs)?
Should we just fix the connect(2) man page instead?

---

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Oct. 23, 2008, 9:11 p.m.
From: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Date: Thu, 23 Oct 2008 23:27:19 +0900

> I noticed that, under certain conditions, ESRCH can be leaked from the
> xfrm layer to user space through sys_connect. In particular, this seems
> to happen reliably when the kernel fails to resolve a template either
> because the AF_KEY receive buffer being used by racoon is full or
> because the SA entry we are trying to use is in XFRM_STATE_EXPIRED
> state.
> 
> However, since this could be a transient issue it could be argued that
> EAGAIN would be more appropriate. Besides this error code is not even
> documented in the man page for sys_connect (as of man-pages 3.07).
> 
> What is the expected behavior (I could not find anything in the RFCs)?
> Should we just fix the connect(2) man page instead?

I think this case requires some care.

-EAGAIN tells the caller that it is a temporary failure and that
retrying can be expected to succeed eventually (some resource is not
available at the moment).  So applications loop when they see this
error returned, they will try again.

But that's not what is happening when ESRCH is signalled.  We found
no matching policy, and we've done nothing to make such a policy
be found in the (near) future.  It is more of a hard failure, which
should not necessarily be retried over and over again.

So converting this to -EAGAIN doesn't seem correct at all.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Fernando Luis Vázquez Cao - Oct. 24, 2008, 1:05 a.m.
On Thu, 2008-10-23 at 14:11 -0700, David Miller wrote:
> From: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
> Date: Thu, 23 Oct 2008 23:27:19 +0900
> 
> > I noticed that, under certain conditions, ESRCH can be leaked from the
> > xfrm layer to user space through sys_connect. In particular, this seems
> > to happen reliably when the kernel fails to resolve a template either
> > because the AF_KEY receive buffer being used by racoon is full or
> > because the SA entry we are trying to use is in XFRM_STATE_EXPIRED
> > state.
> > 
> > However, since this could be a transient issue it could be argued that
> > EAGAIN would be more appropriate. Besides this error code is not even
> > documented in the man page for sys_connect (as of man-pages 3.07).
> > 
> > What is the expected behavior (I could not find anything in the RFCs)?
> > Should we just fix the connect(2) man page instead?
> 
> I think this case requires some care.
> 
> -EAGAIN tells the caller that it is a temporary failure and that
> retrying can be expected to succeed eventually (some resource is not
> available at the moment).  So applications loop when they see this
> error returned, they will try again.
> 
> But that's not what is happening when ESRCH is signalled.  We found
> no matching policy, and we've done nothing to make such a policy
> be found in the (near) future.  It is more of a hard failure, which
> should not necessarily be retried over and over again.
> 
> So converting this to -EAGAIN doesn't seem correct at all.

That would be so if -ESRCH did not happen to be a transient error.
Looking at the code, the window during which an entry is in
XFRM_STATE_EXPIRED state seems to be about 2 seconds in the worst case.
Connection attempts before and after that window would most likely
result in a successful connection or -EAGAIN, respectively. Would not it
make sense to return -EAGAIN also during that 2 seconds window?

Regarding the case when the kernel does not initiate a SA resolution
because the the AF_KEY receive buffer is full, I think it fits into the
"some resource is not available at the moment" definition for -EAGAIN.
As the buffer gets emptied chances are the future attempts will succeed.

This behavior is kind of confusing, but if deemed correct I think it
deserves to be properly documented in the respective man page. Do you
want me to do that or should the error code we return to user space be
changed in any of the two cases mentioned above?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Oct. 28, 2008, 10:58 p.m.
From: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Date: Fri, 24 Oct 2008 10:05:00 +0900

> On Thu, 2008-10-23 at 14:11 -0700, David Miller wrote:
> > From: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
> > Date: Thu, 23 Oct 2008 23:27:19 +0900
> > 
> > > I noticed that, under certain conditions, ESRCH can be leaked from the
> > > xfrm layer to user space through sys_connect. In particular, this seems
> > > to happen reliably when the kernel fails to resolve a template either
> > > because the AF_KEY receive buffer being used by racoon is full or
> > > because the SA entry we are trying to use is in XFRM_STATE_EXPIRED
> > > state.
> > > 
> > > However, since this could be a transient issue it could be argued that
> > > EAGAIN would be more appropriate. Besides this error code is not even
> > > documented in the man page for sys_connect (as of man-pages 3.07).
> > > 
> > > What is the expected behavior (I could not find anything in the RFCs)?
> > > Should we just fix the connect(2) man page instead?
> > 
> > I think this case requires some care.
> > 
> > -EAGAIN tells the caller that it is a temporary failure and that
> > retrying can be expected to succeed eventually (some resource is not
> > available at the moment).  So applications loop when they see this
> > error returned, they will try again.
> > 
> > But that's not what is happening when ESRCH is signalled.  We found
> > no matching policy, and we've done nothing to make such a policy
> > be found in the (near) future.  It is more of a hard failure, which
> > should not necessarily be retried over and over again.
> > 
> > So converting this to -EAGAIN doesn't seem correct at all.
> 
> That would be so if -ESRCH did not happen to be a transient error.

It is not set in transient conditions as far as I can see.

Look at xfrm_state_find() which is where this error is generated and
then propagates down to xfrm_tmpl_resolve_one().

In xfrm_state_find() if an acquire is in progress to resolve the
entry, the code explicitly converts all errors into -EAGAIN.

> Looking at the code, the window during which an entry is in
> XFRM_STATE_EXPIRED state seems to be about 2 seconds in the worst case.
> Connection attempts before and after that window would most likely
> result in a successful connection or -EAGAIN, respectively. Would not it
> make sense to return -EAGAIN also during that 2 seconds window?

Only if an acquire has been triggered and is in progress, which as
explained above the code already seems to handle.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Fernando Luis Vázquez Cao - Oct. 29, 2008, 2:58 a.m.
On Tue, 2008-10-28 at 15:58 -0700, David Miller wrote:
> From: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
> Date: Fri, 24 Oct 2008 10:05:00 +0900
> 
> > On Thu, 2008-10-23 at 14:11 -0700, David Miller wrote:
> > > From: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
> > > Date: Thu, 23 Oct 2008 23:27:19 +0900
> > > 
> > > > I noticed that, under certain conditions, ESRCH can be leaked from the
> > > > xfrm layer to user space through sys_connect. In particular, this seems
> > > > to happen reliably when the kernel fails to resolve a template either
> > > > because the AF_KEY receive buffer being used by racoon is full or
> > > > because the SA entry we are trying to use is in XFRM_STATE_EXPIRED
> > > > state.
> > > > 
> > > > However, since this could be a transient issue it could be argued that
> > > > EAGAIN would be more appropriate. Besides this error code is not even
> > > > documented in the man page for sys_connect (as of man-pages 3.07).
> > > > 
> > > > What is the expected behavior (I could not find anything in the RFCs)?
> > > > Should we just fix the connect(2) man page instead?
> > > 
> > > I think this case requires some care.
> > > 
> > > -EAGAIN tells the caller that it is a temporary failure and that
> > > retrying can be expected to succeed eventually (some resource is not
> > > available at the moment).  So applications loop when they see this
> > > error returned, they will try again.
> > > 
> > > But that's not what is happening when ESRCH is signalled.  We found
> > > no matching policy, and we've done nothing to make such a policy
> > > be found in the (near) future.  It is more of a hard failure, which
> > > should not necessarily be retried over and over again.
> > > 
> > > So converting this to -EAGAIN doesn't seem correct at all.
> > 
> > That would be so if -ESRCH did not happen to be a transient error.
> 
> It is not set in transient conditions as far as I can see.
> 
> Look at xfrm_state_find() which is where this error is generated and
> then propagates down to xfrm_tmpl_resolve_one().
> 
> In xfrm_state_find() if an acquire is in progress to resolve the
> entry, the code explicitly converts all errors into -EAGAIN.

Let me quote the problematic part of that function:
    ....
    } else if (x->km.state == XFRM_STATE_ACQ) {
            acquire_in_progress = 1;
    } else if (x->km.state == XFRM_STATE_ERROR ||
               x->km.state == XFRM_STATE_EXPIRED) {
            if (xfrm_selector_match(&x->sel, fl, x->sel.family) &&
                security_xfrm_state_pol_flow_match(x, pol, fl))
                    error = -ESRCH;
    }
    ....

If the kernel enters the last branch acquire_in_progress will not be set
and it could propagate down to xfrm_tmpl_resolve_one(). The reason is
that in xfrm_timer_handler() we put any entry that expired when acquire
is in progress in XFRM_STATE_EXPIRED state __before__ sleeping for two
seconds, which means the kernel will not set acquire_in_progress to 1 in
the code above.

> > Looking at the code, the window during which an entry is in
> > XFRM_STATE_EXPIRED state seems to be about 2 seconds in the worst case.
> > Connection attempts before and after that window would most likely
> > result in a successful connection or -EAGAIN, respectively. Would not it
> > make sense to return -EAGAIN also during that 2 seconds window?
> 
> Only if an acquire has been triggered and is in progress, which as
> explained above the code already seems to handle.
As explained above that does not seem to be the case.

Besides, there is also the case that km_query() in xfrm_state_find()
fails in which we would leak -ESRCH to user space too. As mentioned in a
previous email, returning -EAGAIN is arguably more appropriate if the
cause of km_query failing was that racoon's receive buffer was full
(could happen under heavy load), because as racoon reads from it the
problem will go away.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Oct. 29, 2008, 5:06 a.m.
From: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Date: Wed, 29 Oct 2008 11:58:23 +0900

> On Tue, 2008-10-28 at 15:58 -0700, David Miller wrote:
> > From: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
> > Date: Fri, 24 Oct 2008 10:05:00 +0900
> > 
> > > Looking at the code, the window during which an entry is in
> > > XFRM_STATE_EXPIRED state seems to be about 2 seconds in the worst case.
> > > Connection attempts before and after that window would most likely
> > > result in a successful connection or -EAGAIN, respectively. Would not it
> > > make sense to return -EAGAIN also during that 2 seconds window?
> > 
> > Only if an acquire has been triggered and is in progress, which as
> > explained above the code already seems to handle.
> As explained above that does not seem to be the case.
> 
> Besides, there is also the case that km_query() in xfrm_state_find()
> fails in which we would leak -ESRCH to user space too. As mentioned in a
> previous email, returning -EAGAIN is arguably more appropriate if the
> cause of km_query failing was that racoon's receive buffer was full
> (could happen under heavy load), because as racoon reads from it the
> problem will go away.

Good points, I'll think about this some more.

Thank you.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Oct. 31, 2008, 7:07 a.m.
From: David Miller <davem@davemloft.net>
Date: Tue, 28 Oct 2008 22:06:47 -0700 (PDT)

> Good points, I'll think about this some more.

Fernando, you're right (of course :), so I've applied your original
patch.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff -urNp linux-2.6.27-git10/net/xfrm/xfrm_policy.c linux-2.6.27-git10-fix/net/xfrm/xfrm_policy.c
--- linux-2.6.27-git10/net/xfrm/xfrm_policy.c	2008-10-23 22:29:45.000000000 +0900
+++ linux-2.6.27-git10-fix/net/xfrm/xfrm_policy.c	2008-10-23 22:26:31.000000000 +0900
@@ -1251,6 +1251,8 @@  xfrm_tmpl_resolve_one(struct xfrm_policy
 				 -EINVAL : -EAGAIN);
 			xfrm_state_put(x);
 		}
+		else if (error == -ESRCH)
+			error = -EAGAIN;
 
 		if (!tmpl->optional)
 			goto fail;