Message ID | 1441070971-22535-1-git-send-email-namei.unix@gmail.com |
---|---|
State | New |
Headers | show |
On Tue, Sep 01, 2015 at 09:29:31AM +0800, Liu Yuan wrote: > From: Liu Yuan <liuyuan@cmss.chinamobile.com> > > We need to discard the payload if we get a invalid header due to whatever reason > to avoid data stream curruption. If the header is invalid / corrupted, how can rsp.data_length be trusted? Out of curiosity, is this an issue you are seeing occur "in the wild"? > For e.g., the response consists of header plus > data payload. If we simply read the header then the data payload is left in the > socket buffer and the next time we would read the garbage data and currupt the > whole connection. > > Cc: qemu-devel@nongnu.org > Cc: Jeff Cody <jcody@redhat.com> > Cc: Kevin Wolf <kwolf@redhat.com> > Cc: Stefan Hajnoczi <stefanha@redhat.com> > Signed-off-by: Liu Yuan <liuyuan@cmss.chinamobile.com> > --- > block/sheepdog.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/block/sheepdog.c b/block/sheepdog.c > index 9585beb..9ed3458 100644 > --- a/block/sheepdog.c > +++ b/block/sheepdog.c > @@ -794,6 +794,14 @@ static void coroutine_fn aio_read_response(void *opaque) > } > } > if (!aio_req) { > + if (rsp.data_length) { > + void *garbage = g_malloc(rsp.data_length); > + ret = qemu_co_recv(fd, garbage, rsp.data_length); > + if (ret != rsp.data_length) { > + error_report("failed to discard the data, %s", strerror(errno)); > + } > + g_free(garbage); > + } > error_report("cannot find aio_req %x", rsp.id); > goto err; > } > -- > 1.9.1 >
On Mon, Aug 31, 2015 at 09:51:00PM -0400, Jeff Cody wrote: > On Tue, Sep 01, 2015 at 09:29:31AM +0800, Liu Yuan wrote: > > From: Liu Yuan <liuyuan@cmss.chinamobile.com> > > > > We need to discard the payload if we get a invalid header due to whatever reason > > to avoid data stream curruption. > > If the header is invalid / corrupted, how can rsp.data_length be > trusted? Out of curiosity, is this an issue you are seeing occur "in > the wild"? This is the defensive patch. Header is invalid in the sense that only rsp.id is invalid due to sheepdog driver bugs, for e.g., the request was misplaced after being sent or duplicated requests sending to sheep daemon and get the duplicated responses for the same request. Actually in the late 2012 we had seen this problem but we didn't find the root cause how this happened by looking at the code statically and the problem was gone silently while we restructured the code. But yesterday some centos6 users reported to me the problem of 'cannot find aio_req' and hang the guest disk. That QEMU's sheepdog driver was rather old and would bump rsp.id mismatch problem as we did in the late 2012. By looking at the code again, I found this error handling problem. However, this patch is not aimed to solve the rsp.id mismatch problem (If it still exist) but just a defensive one after this problem happens. Thanks, Yuan
On Tue, Sep 01, 2015 at 10:05:38AM +0800, Liu Yuan wrote: > On Mon, Aug 31, 2015 at 09:51:00PM -0400, Jeff Cody wrote: > > On Tue, Sep 01, 2015 at 09:29:31AM +0800, Liu Yuan wrote: > > > From: Liu Yuan <liuyuan@cmss.chinamobile.com> > > > > > > We need to discard the payload if we get a invalid header due to whatever reason > > > to avoid data stream curruption. > > > > If the header is invalid / corrupted, how can rsp.data_length be > > trusted? Out of curiosity, is this an issue you are seeing occur "in > > the wild"? For a second thought, we might not need this patch for the upstream because of auto-connection feature, which close the socket to bury the whole buffer. But old QEMU without auto-reconnection, might need this patch to drain the buffer. Thanks, Yuan > > This is the defensive patch. Header is invalid in the sense that only rsp.id is > invalid due to sheepdog driver bugs, for e.g., the request was misplaced after > being sent or duplicated requests sending to sheep daemon and get the duplicated > responses for the same request. > > Actually in the late 2012 we had seen this problem but we didn't find the root > cause how this happened by looking at the code statically and the problem was > gone silently while we restructured the code. > > But yesterday some centos6 users reported to me the problem of > 'cannot find aio_req' and hang the guest disk. That QEMU's sheepdog driver was > rather old and would bump rsp.id mismatch problem as we did in the late 2012. > By looking at the code again, I found this error handling problem. However, > this patch is not aimed to solve the rsp.id mismatch problem (If it still exist) > but just a defensive one after this problem happens. > > Thanks, > Yuan
diff --git a/block/sheepdog.c b/block/sheepdog.c index 9585beb..9ed3458 100644 --- a/block/sheepdog.c +++ b/block/sheepdog.c @@ -794,6 +794,14 @@ static void coroutine_fn aio_read_response(void *opaque) } } if (!aio_req) { + if (rsp.data_length) { + void *garbage = g_malloc(rsp.data_length); + ret = qemu_co_recv(fd, garbage, rsp.data_length); + if (ret != rsp.data_length) { + error_report("failed to discard the data, %s", strerror(errno)); + } + g_free(garbage); + } error_report("cannot find aio_req %x", rsp.id); goto err; }