diff mbox

sheepdog: discard the payload if the header is invalid

Message ID 1441070971-22535-1-git-send-email-namei.unix@gmail.com
State New
Headers show

Commit Message

Liu Yuan Sept. 1, 2015, 1:29 a.m. UTC
From: Liu Yuan <liuyuan@cmss.chinamobile.com>

We need to discard the payload if we get a invalid header due to whatever reason
to avoid data stream curruption. For e.g., the response consists of header plus
data payload. If we simply read the header then the data payload is left in the
socket buffer and the next time we would read the garbage data and currupt the
whole connection.

Cc: qemu-devel@nongnu.org
Cc: Jeff Cody <jcody@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <liuyuan@cmss.chinamobile.com>
---
 block/sheepdog.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Jeff Cody Sept. 1, 2015, 1:51 a.m. UTC | #1
On Tue, Sep 01, 2015 at 09:29:31AM +0800, Liu Yuan wrote:
> From: Liu Yuan <liuyuan@cmss.chinamobile.com>
> 
> We need to discard the payload if we get a invalid header due to whatever reason
> to avoid data stream curruption.

If the header is invalid / corrupted, how can rsp.data_length be
trusted?  Out of curiosity, is this an issue you are seeing occur "in
the wild"?

> For e.g., the response consists of header plus
> data payload. If we simply read the header then the data payload is left in the
> socket buffer and the next time we would read the garbage data and currupt the
> whole connection.
> 
> Cc: qemu-devel@nongnu.org
> Cc: Jeff Cody <jcody@redhat.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Liu Yuan <liuyuan@cmss.chinamobile.com>
> ---
>  block/sheepdog.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index 9585beb..9ed3458 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -794,6 +794,14 @@ static void coroutine_fn aio_read_response(void *opaque)
>          }
>      }
>      if (!aio_req) {
> +        if (rsp.data_length) {
> +            void *garbage = g_malloc(rsp.data_length);
> +            ret = qemu_co_recv(fd, garbage, rsp.data_length);
> +            if (ret != rsp.data_length) {
> +                error_report("failed to discard the data, %s", strerror(errno));
> +            }
> +            g_free(garbage);
> +        }
>          error_report("cannot find aio_req %x", rsp.id);
>          goto err;
>      }
> -- 
> 1.9.1
>
Liu Yuan Sept. 1, 2015, 2:05 a.m. UTC | #2
On Mon, Aug 31, 2015 at 09:51:00PM -0400, Jeff Cody wrote:
> On Tue, Sep 01, 2015 at 09:29:31AM +0800, Liu Yuan wrote:
> > From: Liu Yuan <liuyuan@cmss.chinamobile.com>
> > 
> > We need to discard the payload if we get a invalid header due to whatever reason
> > to avoid data stream curruption.
> 
> If the header is invalid / corrupted, how can rsp.data_length be
> trusted?  Out of curiosity, is this an issue you are seeing occur "in
> the wild"?

This is the defensive patch. Header is invalid in the sense that only rsp.id is
invalid due to sheepdog driver bugs, for e.g., the request was misplaced after
being sent or duplicated requests sending to sheep daemon and get the duplicated
responses for the same request.

Actually in the late 2012 we had seen this problem but we didn't find the root
cause how this happened by looking at the code statically and the problem was
gone silently while we restructured the code.

But yesterday some centos6 users reported to me the problem of
'cannot find aio_req' and hang the guest disk. That QEMU's sheepdog driver was
rather old and would bump rsp.id mismatch problem as we did in the late 2012.
By looking at the code again, I found this error handling problem. However,
this patch is not aimed to solve the rsp.id mismatch problem (If it still exist)
but just a defensive one after this problem happens.

Thanks,
Yuan
Liu Yuan Sept. 1, 2015, 10:23 a.m. UTC | #3
On Tue, Sep 01, 2015 at 10:05:38AM +0800, Liu Yuan wrote:
> On Mon, Aug 31, 2015 at 09:51:00PM -0400, Jeff Cody wrote:
> > On Tue, Sep 01, 2015 at 09:29:31AM +0800, Liu Yuan wrote:
> > > From: Liu Yuan <liuyuan@cmss.chinamobile.com>
> > > 
> > > We need to discard the payload if we get a invalid header due to whatever reason
> > > to avoid data stream curruption.
> > 
> > If the header is invalid / corrupted, how can rsp.data_length be
> > trusted?  Out of curiosity, is this an issue you are seeing occur "in
> > the wild"?

For a second thought, we might not need this patch for the upstream because of
auto-connection feature, which close the socket to bury the whole buffer.

But old QEMU without auto-reconnection, might need this patch to drain the
buffer.

Thanks,
Yuan

> 
> This is the defensive patch. Header is invalid in the sense that only rsp.id is
> invalid due to sheepdog driver bugs, for e.g., the request was misplaced after
> being sent or duplicated requests sending to sheep daemon and get the duplicated
> responses for the same request.
> 
> Actually in the late 2012 we had seen this problem but we didn't find the root
> cause how this happened by looking at the code statically and the problem was
> gone silently while we restructured the code.
> 
> But yesterday some centos6 users reported to me the problem of
> 'cannot find aio_req' and hang the guest disk. That QEMU's sheepdog driver was
> rather old and would bump rsp.id mismatch problem as we did in the late 2012.
> By looking at the code again, I found this error handling problem. However,
> this patch is not aimed to solve the rsp.id mismatch problem (If it still exist)
> but just a defensive one after this problem happens.
> 
> Thanks,
> Yuan
diff mbox

Patch

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 9585beb..9ed3458 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -794,6 +794,14 @@  static void coroutine_fn aio_read_response(void *opaque)
         }
     }
     if (!aio_req) {
+        if (rsp.data_length) {
+            void *garbage = g_malloc(rsp.data_length);
+            ret = qemu_co_recv(fd, garbage, rsp.data_length);
+            if (ret != rsp.data_length) {
+                error_report("failed to discard the data, %s", strerror(errno));
+            }
+            g_free(garbage);
+        }
         error_report("cannot find aio_req %x", rsp.id);
         goto err;
     }