diff mbox series

cifs: try to pick channel with a minimum of credits

Message ID 20210305142407.23652-1-aaptel@suse.com
State New
Headers show
Series cifs: try to pick channel with a minimum of credits | expand

Commit Message

Aurélien Aptel March 5, 2021, 2:24 p.m. UTC
From: Aurelien Aptel <aaptel@suse.com>

Check channel credits to prevent the client from using a starved
channel that cannot send anything.

Special care must be taken in selecting the minimum value: when
channels are created they start off with a small amount that slowly
ramps up as the channel gets used. Thus a new channel might never be
picked if the min value is too small.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
---
 fs/cifs/transport.c | 57 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 49 insertions(+), 8 deletions(-)

Comments

Steve French March 5, 2021, 4:08 p.m. UTC | #1
The comment caught my attention - is that accurate?  When a channel
has fewer than 3 credits (assuming we had two reserved for oplock and
echo), that doesn't mean that there are fewer than 3 credits in flight
- just that current credits available is lower right?  So the channel
could still recover as long as current credits + credits in flight is
at least 3.

+/*
+ * Min number of credits for a channel to be picked.
+ *
+ * Note that once a channel reaches this threshold it will never be
+ * picked again as no credits can be requested from it.
+ */
+#define CIFS_CHANNEL_MIN_CREDITS 3

On Fri, Mar 5, 2021 at 8:24 AM Aurélien Aptel <aaptel@suse.com> wrote:
>
> From: Aurelien Aptel <aaptel@suse.com>
>
> Check channel credits to prevent the client from using a starved
> channel that cannot send anything.
>
> Special care must be taken in selecting the minimum value: when
> channels are created they start off with a small amount that slowly
> ramps up as the channel gets used. Thus a new channel might never be
> picked if the min value is too small.
>
> Signed-off-by: Aurelien Aptel <aaptel@suse.com>
> ---
>  fs/cifs/transport.c | 57 ++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 49 insertions(+), 8 deletions(-)
>
> diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> index e90a1d1380b0..7bb1584b3724 100644
> --- a/fs/cifs/transport.c
> +++ b/fs/cifs/transport.c
> @@ -44,6 +44,14 @@
>  /* Max number of iovectors we can use off the stack when sending requests. */
>  #define CIFS_MAX_IOV_SIZE 8
>
> +/*
> + * Min number of credits for a channel to be picked.
> + *
> + * Note that once a channel reaches this threshold it will never be
> + * picked again as no credits can be requested from it.
> + */
> +#define CIFS_CHANNEL_MIN_CREDITS 3
> +
>  void
>  cifs_wake_up_task(struct mid_q_entry *mid)
>  {
> @@ -1051,20 +1059,53 @@ cifs_cancelled_callback(struct mid_q_entry *mid)
>  struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses)
>  {
>         uint index = 0;
> +       struct TCP_Server_Info *s;
>
>         if (!ses)
>                 return NULL;
>
> -       if (!ses->binding) {
> -               /* round robin */
> -               if (ses->chan_count > 1) {
> -                       index = (uint)atomic_inc_return(&ses->chan_seq);
> -                       index %= ses->chan_count;
> -               }
> -               return ses->chans[index].server;
> -       } else {
> +       if (ses->binding)
>                 return cifs_ses_server(ses);
> +
> +       /*
> +        * Channels are created right after the session is made. The
> +        * count cannot change after that so it is not racy to check.
> +        */
> +       if (ses->chan_count == 1)
> +               return ses->chans[index].server;
> +
> +       /* round robin */
> +       index = (uint)atomic_inc_return(&ses->chan_seq);
> +       index %= ses->chan_count;
> +       s = ses->chans[index].server;
> +
> +       /*
> +        * Checking server credits is racy, but getting a slightly
> +        * stale value should not be an issue here
> +        */
> +       if (s->credits <= CIFS_CHANNEL_MIN_CREDITS) {
> +               uint i;
> +
> +               cifs_dbg(VFS, "cannot pick conn_id=0x%llx not enough credits (%u)",
> +                        s->conn_id,
> +                        s->credits);
> +
> +               /*
> +                * Look at all other channels starting from the next
> +                * one and pick first possible channel.
> +                */
> +               for (i = 1; i < ses->chan_count; i++) {
> +                       s = ses->chans[(index+i) % ses->chan_count].server;
> +                       if (s->credits > CIFS_CHANNEL_MIN_CREDITS)
> +                               return s;
> +               }
>         }
> +
> +       /*
> +        * If none are possible, keep the initially picked one, but
> +        * later on it will block to wait for credits or fail.
> +        */
> +       return ses->chans[index].server;
>  }
>
>  int
> --
> 2.30.0
>
Shyam Prasad N March 7, 2021, 3:14 a.m. UTC | #2
Spent some time in this code path. Seems like this is more complicated
than I initially thought.
@Aurélien Aptel A few things to consider:
1. What if the credits that will be needed by the request is more than
3 (or any constant). We may still end up returning -EDEADLK to the
user when we don't find enough credits, and there are not enough
in_flight to satisfy the request. Ideally, we should still try other
channels.
2. Echo and oplock use 1 reserved credit each, which the regular
operations can steal, in case of shortage.
3. Reading server->credits without a lock can result in wildly
different values, since some CPU architectures may not update it
atomically. At worse, we may select a channel which may not have
enough credits when we get to it.

What if we do this...
wait_for_compound_request() can return -EDEADLK when it doesn't get
enough credits, and there are no requests in_flight.
In compound_send_recv(), after we call wait_for_compound_request(), we
can check it's return value. If it's -EDEADLK, we keep calling
cifs_pick_another_channel(ses, bitmask) (bitmask has bits set for
channels already selected and rejected) and
wait_for_compound_request() in a loop till we find a channel which has
enough credits, or we run out of channels; in which case we can return
-EDEADLK.
Makes sense? Do you see a problem with that approach?

Regards,
Shyam

On Fri, Mar 5, 2021 at 9:40 PM Steve French <smfrench@gmail.com> wrote:
>
> The comment caught my attention - is that accurate?  When a channel
> has fewer than 3 credits (assuming we had two reserved for oplock and
> echo), that doesn't mean that there are fewer than 3 credits in flight
> - just that current credits available is lower right?  So the channel
> could still recover as long as current credits + credits in flight is
> at least 3.
>
> +/*
> + * Min number of credits for a channel to be picked.
> + *
> + * Note that once a channel reaches this threshold it will never be
> + * picked again as no credits can be requested from it.
> + */
> +#define CIFS_CHANNEL_MIN_CREDITS 3
>
> On Fri, Mar 5, 2021 at 8:24 AM Aurélien Aptel <aaptel@suse.com> wrote:
> >
> > From: Aurelien Aptel <aaptel@suse.com>
> >
> > Check channel credits to prevent the client from using a starved
> > channel that cannot send anything.
> >
> > Special care must be taken in selecting the minimum value: when
> > channels are created they start off with a small amount that slowly
> > ramps up as the channel gets used. Thus a new channel might never be
> > picked if the min value is too small.
> >
> > Signed-off-by: Aurelien Aptel <aaptel@suse.com>
> > ---
> >  fs/cifs/transport.c | 57 ++++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 49 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
> > index e90a1d1380b0..7bb1584b3724 100644
> > --- a/fs/cifs/transport.c
> > +++ b/fs/cifs/transport.c
> > @@ -44,6 +44,14 @@
> >  /* Max number of iovectors we can use off the stack when sending requests. */
> >  #define CIFS_MAX_IOV_SIZE 8
> >
> > +/*
> > + * Min number of credits for a channel to be picked.
> > + *
> > + * Note that once a channel reaches this threshold it will never be
> > + * picked again as no credits can be requested from it.
> > + */
> > +#define CIFS_CHANNEL_MIN_CREDITS 3
> > +
> >  void
> >  cifs_wake_up_task(struct mid_q_entry *mid)
> >  {
> > @@ -1051,20 +1059,53 @@ cifs_cancelled_callback(struct mid_q_entry *mid)
> >  struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses)
> >  {
> >         uint index = 0;
> > +       struct TCP_Server_Info *s;
> >
> >         if (!ses)
> >                 return NULL;
> >
> > -       if (!ses->binding) {
> > -               /* round robin */
> > -               if (ses->chan_count > 1) {
> > -                       index = (uint)atomic_inc_return(&ses->chan_seq);
> > -                       index %= ses->chan_count;
> > -               }
> > -               return ses->chans[index].server;
> > -       } else {
> > +       if (ses->binding)
> >                 return cifs_ses_server(ses);
> > +
> > +       /*
> > +        * Channels are created right after the session is made. The
> > +        * count cannot change after that so it is not racy to check.
> > +        */
> > +       if (ses->chan_count == 1)
> > +               return ses->chans[index].server;
> > +
> > +       /* round robin */
> > +       index = (uint)atomic_inc_return(&ses->chan_seq);
> > +       index %= ses->chan_count;
> > +       s = ses->chans[index].server;
> > +
> > +       /*
> > +        * Checking server credits is racy, but getting a slightly
> > +        * stale value should not be an issue here
> > +        */
> > +       if (s->credits <= CIFS_CHANNEL_MIN_CREDITS) {
> > +               uint i;
> > +
> > +               cifs_dbg(VFS, "cannot pick conn_id=0x%llx not enough credits (%u)",
> > +                        s->conn_id,
> > +                        s->credits);
> > +
> > +               /*
> > +                * Look at all other channels starting from the next
> > +                * one and pick first possible channel.
> > +                */
> > +               for (i = 1; i < ses->chan_count; i++) {
> > +                       s = ses->chans[(index+i) % ses->chan_count].server;
> > +                       if (s->credits > CIFS_CHANNEL_MIN_CREDITS)
> > +                               return s;
> > +               }
> >         }
> > +
> > +       /*
> > +        * If none are possible, keep the initially picked one, but
> > +        * later on it will block to wait for credits or fail.
> > +        */
> > +       return ses->chans[index].server;
> >  }
> >
> >  int
> > --
> > 2.30.0
> >
>
>
> --
> Thanks,
>
> Steve
Aurélien Aptel March 8, 2021, 11:52 a.m. UTC | #3
Hi Shyam,

Thanks for the review. I also realized we cannot make this failproof so
I went in with the idea to just avoid picking easy, non-confusing cases
of unusable channels. If that's not good we can drop the patch all
together.

Shyam Prasad N <nspmangalore@gmail.com> writes:
> Spent some time in this code path. Seems like this is more complicated
> than I initially thought.
> @Aurélien Aptel A few things to consider:
> 1. What if the credits that will be needed by the request is more than
> 3 (or any constant). We may still end up returning -EDEADLK to the
> user when we don't find enough credits, and there are not enough
> in_flight to satisfy the request. Ideally, we should still try other
> channels.

Yes you're right, it won't prevent failing if more credits are
needed. The patch wasn't meant to be failproof, just to avoid most
occurences of the problem. It's a simple sanity check with some
false-positives and false-negatives.

> 2. Echo and oplock use 1 reserved credit each, which the regular
> operations can steal, in case of shortage.

There's a dedicated server->echo_credit I think.

> 3. Reading server->credits without a lock can result in wildly
> different values, since some CPU architectures may not update it
> atomically. At worse, we may select a channel which may not have
> enough credits when we get to it

If we are just doing a read on an aligned int, at least on x86 you will
get either a stale value or the new value, you cannot get a garbage mix
of both. Which CPU architecture doesn't provide cache coherency at that
level? This is a complex question I couldn't find an easy answer to.

In any case cifs.ko is already assuming it's valid in various places. We
are doing it for some usage of the server->tcpStatus, ses->status,
tcon->tidStatus at least.

The problems of atomic read in pick_channel() aside, the credits might
change from another process between the moment the channel is picked and
the moment the credits needed are spent (server->credit -= XYZ). In
which case it will also EDEADLK later on.

> What if we do this...
> wait_for_compound_request() can return -EDEADLK when it doesn't get
> enough credits, and there are no requests in_flight.
> In compound_send_recv(), after we call wait_for_compound_request(), we
> can check it's return value. If it's -EDEADLK, we keep calling
> cifs_pick_another_channel(ses, bitmask) (bitmask has bits set for
> channels already selected and rejected) and
> wait_for_compound_request() in a loop till we find a channel which has
> enough credits, or we run out of channels; in which case we can return
> -EDEADLK.
> Makes sense? Do you see a problem with that approach?

Some code relies on server->* values so if you swap the server pointer
it at the last moment I'm not sure those values will match the new
server ptr.

Cheers,
Aurélien Aptel March 8, 2021, 4:58 p.m. UTC | #4
Hm there's a typo in my commit msg:

Special care must be taken in selecting the minimum value: when
channels are created they start off with a small amount that slowly
ramps up as the channel gets used. Thus a new channel might never be
picked if the min value is too small.
                              ^^^^^^^
                              should be "too big"

Cheers,
Shyam Prasad N March 11, 2021, 10:49 a.m. UTC | #5
> Some code relies on server->* values so if you swap the server pointer
> it at the last moment I'm not sure those values will match the new
> server ptr.

I'm not sure that I understand this. I'm not suggesting that we swap.
I'm only saying that on getting a EDEADLK error from
compound_send_recv(), try another channel instead of returning that to
userspace.
Please let me know if I'm missing something here.

Regards,
Shyam

On Mon, Mar 8, 2021 at 5:22 PM Aurélien Aptel <aaptel@suse.com> wrote:
>
> Hi Shyam,
>
> Thanks for the review. I also realized we cannot make this failproof so
> I went in with the idea to just avoid picking easy, non-confusing cases
> of unusable channels. If that's not good we can drop the patch all
> together.
>
> Shyam Prasad N <nspmangalore@gmail.com> writes:
> > Spent some time in this code path. Seems like this is more complicated
> > than I initially thought.
> > @Aurélien Aptel A few things to consider:
> > 1. What if the credits that will be needed by the request is more than
> > 3 (or any constant). We may still end up returning -EDEADLK to the
> > user when we don't find enough credits, and there are not enough
> > in_flight to satisfy the request. Ideally, we should still try other
> > channels.
>
> Yes you're right, it won't prevent failing if more credits are
> needed. The patch wasn't meant to be failproof, just to avoid most
> occurences of the problem. It's a simple sanity check with some
> false-positives and false-negatives.
>
> > 2. Echo and oplock use 1 reserved credit each, which the regular
> > operations can steal, in case of shortage.
>
> There's a dedicated server->echo_credit I think.
>
> > 3. Reading server->credits without a lock can result in wildly
> > different values, since some CPU architectures may not update it
> > atomically. At worse, we may select a channel which may not have
> > enough credits when we get to it
>
> If we are just doing a read on an aligned int, at least on x86 you will
> get either a stale value or the new value, you cannot get a garbage mix
> of both. Which CPU architecture doesn't provide cache coherency at that
> level? This is a complex question I couldn't find an easy answer to.
>
> In any case cifs.ko is already assuming it's valid in various places. We
> are doing it for some usage of the server->tcpStatus, ses->status,
> tcon->tidStatus at least.
>
> The problems of atomic read in pick_channel() aside, the credits might
> change from another process between the moment the channel is picked and
> the moment the credits needed are spent (server->credit -= XYZ). In
> which case it will also EDEADLK later on.
>
> > What if we do this...
> > wait_for_compound_request() can return -EDEADLK when it doesn't get
> > enough credits, and there are no requests in_flight.
> > In compound_send_recv(), after we call wait_for_compound_request(), we
> > can check it's return value. If it's -EDEADLK, we keep calling
> > cifs_pick_another_channel(ses, bitmask) (bitmask has bits set for
> > channels already selected and rejected) and
> > wait_for_compound_request() in a loop till we find a channel which has
> > enough credits, or we run out of channels; in which case we can return
> > -EDEADLK.
> > Makes sense? Do you see a problem with that approach?
>
> Some code relies on server->* values so if you swap the server pointer
> it at the last moment I'm not sure those values will match the new
> server ptr.
>
> Cheers,
> --
> Aurélien Aptel / SUSE Labs Samba Team
> GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)
>
Shyam Prasad N March 11, 2021, 1:31 p.m. UTC | #6
Discussed this with Aurelien.

> I'm only saying that on getting a EDEADLK error from
> compound_send_recv(), try another channel instead of returning that to
> userspace.
We both agreed that this will be a cleaner way to deal with the issue.
However, he pointed to me that the code changes will be more than what
I initially thought.
That needs more investigation.

The most likely case to hit the problem of EDEADLK is when the first
request on the channel is bigger than what we need.
The proposed fix should avoid that for basic requests by switching
channels (pending testing).
However, a large read/write could still result in EDEADLK error being returned.

Regards,
Shyam

On Thu, Mar 11, 2021 at 4:19 PM Shyam Prasad N <nspmangalore@gmail.com> wrote:
>
> > Some code relies on server->* values so if you swap the server pointer
> > it at the last moment I'm not sure those values will match the new
> > server ptr.
>
> I'm not sure that I understand this. I'm not suggesting that we swap.
> I'm only saying that on getting a EDEADLK error from
> compound_send_recv(), try another channel instead of returning that to
> userspace.
> Please let me know if I'm missing something here.
>
> Regards,
> Shyam
>
> On Mon, Mar 8, 2021 at 5:22 PM Aurélien Aptel <aaptel@suse.com> wrote:
> >
> > Hi Shyam,
> >
> > Thanks for the review. I also realized we cannot make this failproof so
> > I went in with the idea to just avoid picking easy, non-confusing cases
> > of unusable channels. If that's not good we can drop the patch all
> > together.
> >
> > Shyam Prasad N <nspmangalore@gmail.com> writes:
> > > Spent some time in this code path. Seems like this is more complicated
> > > than I initially thought.
> > > @Aurélien Aptel A few things to consider:
> > > 1. What if the credits that will be needed by the request is more than
> > > 3 (or any constant). We may still end up returning -EDEADLK to the
> > > user when we don't find enough credits, and there are not enough
> > > in_flight to satisfy the request. Ideally, we should still try other
> > > channels.
> >
> > Yes you're right, it won't prevent failing if more credits are
> > needed. The patch wasn't meant to be failproof, just to avoid most
> > occurences of the problem. It's a simple sanity check with some
> > false-positives and false-negatives.
> >
> > > 2. Echo and oplock use 1 reserved credit each, which the regular
> > > operations can steal, in case of shortage.
> >
> > There's a dedicated server->echo_credit I think.
> >
> > > 3. Reading server->credits without a lock can result in wildly
> > > different values, since some CPU architectures may not update it
> > > atomically. At worse, we may select a channel which may not have
> > > enough credits when we get to it
> >
> > If we are just doing a read on an aligned int, at least on x86 you will
> > get either a stale value or the new value, you cannot get a garbage mix
> > of both. Which CPU architecture doesn't provide cache coherency at that
> > level? This is a complex question I couldn't find an easy answer to.
> >
> > In any case cifs.ko is already assuming it's valid in various places. We
> > are doing it for some usage of the server->tcpStatus, ses->status,
> > tcon->tidStatus at least.
> >
> > The problems of atomic read in pick_channel() aside, the credits might
> > change from another process between the moment the channel is picked and
> > the moment the credits needed are spent (server->credit -= XYZ). In
> > which case it will also EDEADLK later on.
> >
> > > What if we do this...
> > > wait_for_compound_request() can return -EDEADLK when it doesn't get
> > > enough credits, and there are no requests in_flight.
> > > In compound_send_recv(), after we call wait_for_compound_request(), we
> > > can check it's return value. If it's -EDEADLK, we keep calling
> > > cifs_pick_another_channel(ses, bitmask) (bitmask has bits set for
> > > channels already selected and rejected) and
> > > wait_for_compound_request() in a loop till we find a channel which has
> > > enough credits, or we run out of channels; in which case we can return
> > > -EDEADLK.
> > > Makes sense? Do you see a problem with that approach?
> >
> > Some code relies on server->* values so if you swap the server pointer
> > it at the last moment I'm not sure those values will match the new
> > server ptr.
> >
> > Cheers,
> > --
> > Aurélien Aptel / SUSE Labs Samba Team
> > GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
> > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)
> >
>
>
> --
> Regards,
> Shyam
diff mbox series

Patch

diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c
index e90a1d1380b0..7bb1584b3724 100644
--- a/fs/cifs/transport.c
+++ b/fs/cifs/transport.c
@@ -44,6 +44,14 @@ 
 /* Max number of iovectors we can use off the stack when sending requests. */
 #define CIFS_MAX_IOV_SIZE 8
 
+/*
+ * Min number of credits for a channel to be picked.
+ *
+ * Note that once a channel reaches this threshold it will never be
+ * picked again as no credits can be requested from it.
+ */
+#define CIFS_CHANNEL_MIN_CREDITS 3
+
 void
 cifs_wake_up_task(struct mid_q_entry *mid)
 {
@@ -1051,20 +1059,53 @@  cifs_cancelled_callback(struct mid_q_entry *mid)
 struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses)
 {
 	uint index = 0;
+	struct TCP_Server_Info *s;
 
 	if (!ses)
 		return NULL;
 
-	if (!ses->binding) {
-		/* round robin */
-		if (ses->chan_count > 1) {
-			index = (uint)atomic_inc_return(&ses->chan_seq);
-			index %= ses->chan_count;
-		}
-		return ses->chans[index].server;
-	} else {
+	if (ses->binding)
 		return cifs_ses_server(ses);
+
+	/*
+	 * Channels are created right after the session is made. The
+	 * count cannot change after that so it is not racy to check.
+	 */
+	if (ses->chan_count == 1)
+		return ses->chans[index].server;
+
+	/* round robin */
+	index = (uint)atomic_inc_return(&ses->chan_seq);
+	index %= ses->chan_count;
+	s = ses->chans[index].server;
+
+	/*
+	 * Checking server credits is racy, but getting a slightly
+	 * stale value should not be an issue here
+	 */
+	if (s->credits <= CIFS_CHANNEL_MIN_CREDITS) {
+		uint i;
+
+		cifs_dbg(VFS, "cannot pick conn_id=0x%llx not enough credits (%u)",
+			 s->conn_id,
+			 s->credits);
+
+		/*
+		 * Look at all other channels starting from the next
+		 * one and pick first possible channel.
+		 */
+		for (i = 1; i < ses->chan_count; i++) {
+			s = ses->chans[(index+i) % ses->chan_count].server;
+			if (s->credits > CIFS_CHANNEL_MIN_CREDITS)
+				return s;
+		}
 	}
+
+	/*
+	 * If none are possible, keep the initially picked one, but
+	 * later on it will block to wait for credits or fail.
+	 */
+	return ses->chans[index].server;
 }
 
 int