Message ID | 1363881940-27505-10-git-send-email-owasserm@redhat.com |
---|---|
State | New |
Headers | show |
On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote: > On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote: >> This way we send one big buffer instead of many small ones >> >> Signed-off-by: Orit Wasserman <owasserm@redhat.com> > > Why does this happen BTW? It happens in the last phase when we send the device state that consists of a lot bytes and int field that are written using qemu_put_byte/be16/... > >> --- >> savevm.c | 21 +++++++++++++++++---- >> 1 file changed, 17 insertions(+), 4 deletions(-) >> >> diff --git a/savevm.c b/savevm.c >> index 50e8fb2..13a533b 100644 >> --- a/savevm.c >> +++ b/savevm.c >> @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size) >> abort(); >> } >> >> - f->iov[f->iovcnt].iov_base = (uint8_t *)buf; >> - f->iov[f->iovcnt++].iov_len = size; >> + /* check for adjoint buffer and colace them */ >> + if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base + >> + f->iov[f->iovcnt - 1].iov_len) { >> + f->iov[f->iovcnt - 1].iov_len += size; >> + } else { >> + f->iov[f->iovcnt].iov_base = (uint8_t *)buf; >> + f->iov[f->iovcnt++].iov_len = size; >> + } >> >> f->is_write = 1; >> f->bytes_xfer += size; >> @@ -690,8 +696,15 @@ void qemu_put_byte(QEMUFile *f, int v) >> f->buf[f->buf_index++] = v; >> f->is_write = 1; >> f->bytes_xfer += 1; >> - f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1); >> - f->iov[f->iovcnt++].iov_len = 1; >> + >> + /* check for adjoint buffer and colace them */ >> + if (f->iovcnt > 0 && f->buf + (f->buf_index - 1) == >> + f->iov[f->iovcnt - 1].iov_base + f->iov[f->iovcnt - 1].iov_len) { >> + f->iov[f->iovcnt - 1].iov_len += 1; >> + } else { >> + f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1); >> + f->iov[f->iovcnt++].iov_len = 1; >> + } >> >> if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) { >> qemu_fflush(f); >> -- >> 1.7.11.7
On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote: > On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote: > > On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote: > >> This way we send one big buffer instead of many small ones > >> > >> Signed-off-by: Orit Wasserman <owasserm@redhat.com> > > > > Why does this happen BTW? > > It happens in the last phase when we send the device state that consists of a lot > bytes and int field that are written using qemu_put_byte/be16/... > Confused I thought device_state does not use _nocopy? My idea of using vmsplice relies exactly on this: we can not splice device state ... > > > >> --- > >> savevm.c | 21 +++++++++++++++++---- > >> 1 file changed, 17 insertions(+), 4 deletions(-) > >> > >> diff --git a/savevm.c b/savevm.c > >> index 50e8fb2..13a533b 100644 > >> --- a/savevm.c > >> +++ b/savevm.c > >> @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size) > >> abort(); > >> } > >> > >> - f->iov[f->iovcnt].iov_base = (uint8_t *)buf; > >> - f->iov[f->iovcnt++].iov_len = size; > >> + /* check for adjoint buffer and colace them */ > >> + if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base + > >> + f->iov[f->iovcnt - 1].iov_len) { > >> + f->iov[f->iovcnt - 1].iov_len += size; > >> + } else { > >> + f->iov[f->iovcnt].iov_base = (uint8_t *)buf; > >> + f->iov[f->iovcnt++].iov_len = size; > >> + } > >> > >> f->is_write = 1; > >> f->bytes_xfer += size; > >> @@ -690,8 +696,15 @@ void qemu_put_byte(QEMUFile *f, int v) > >> f->buf[f->buf_index++] = v; > >> f->is_write = 1; > >> f->bytes_xfer += 1; > >> - f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1); > >> - f->iov[f->iovcnt++].iov_len = 1; > >> + > >> + /* check for adjoint buffer and colace them */ > >> + if (f->iovcnt > 0 && f->buf + (f->buf_index - 1) == > >> + f->iov[f->iovcnt - 1].iov_base + f->iov[f->iovcnt - 1].iov_len) { > >> + f->iov[f->iovcnt - 1].iov_len += 1; > >> + } else { > >> + f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1); > >> + f->iov[f->iovcnt++].iov_len = 1; > >> + } > >> > >> if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) { > >> qemu_fflush(f); > >> -- > >> 1.7.11.7
On 03/21/2013 06:29 PM, Michael S. Tsirkin wrote: > On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote: >> On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote: >>> On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote: >>>> This way we send one big buffer instead of many small ones >>>> >>>> Signed-off-by: Orit Wasserman <owasserm@redhat.com> >>> >>> Why does this happen BTW? >> >> It happens in the last phase when we send the device state that consists of a lot >> bytes and int field that are written using qemu_put_byte/be16/... >> > > Confused I thought device_state does not use _nocopy? > My idea of using vmsplice relies exactly on this: > we can not splice device state ... qemu_put_buffer calls qemu_put_buffer_no_copy (you know code reuse) So I guess we will need to add some flag if we want to use vmsplice or not or split no_copy into external and internal. It won't be a big change > >>> >>>> --- >>>> savevm.c | 21 +++++++++++++++++---- >>>> 1 file changed, 17 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/savevm.c b/savevm.c >>>> index 50e8fb2..13a533b 100644 >>>> --- a/savevm.c >>>> +++ b/savevm.c >>>> @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size) >>>> abort(); >>>> } >>>> >>>> - f->iov[f->iovcnt].iov_base = (uint8_t *)buf; >>>> - f->iov[f->iovcnt++].iov_len = size; >>>> + /* check for adjoint buffer and colace them */ >>>> + if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base + >>>> + f->iov[f->iovcnt - 1].iov_len) { >>>> + f->iov[f->iovcnt - 1].iov_len += size; >>>> + } else { >>>> + f->iov[f->iovcnt].iov_base = (uint8_t *)buf; >>>> + f->iov[f->iovcnt++].iov_len = size; >>>> + } >>>> >>>> f->is_write = 1; >>>> f->bytes_xfer += size; >>>> @@ -690,8 +696,15 @@ void qemu_put_byte(QEMUFile *f, int v) >>>> f->buf[f->buf_index++] = v; >>>> f->is_write = 1; >>>> f->bytes_xfer += 1; >>>> - f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1); >>>> - f->iov[f->iovcnt++].iov_len = 1; >>>> + >>>> + /* check for adjoint buffer and colace them */ >>>> + if (f->iovcnt > 0 && f->buf + (f->buf_index - 1) == >>>> + f->iov[f->iovcnt - 1].iov_base + f->iov[f->iovcnt - 1].iov_len) { >>>> + f->iov[f->iovcnt - 1].iov_len += 1; >>>> + } else { >>>> + f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1); >>>> + f->iov[f->iovcnt++].iov_len = 1; >>>> + } >>>> >>>> if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) { >>>> qemu_fflush(f); >>>> -- >>>> 1.7.11.7
On Thu, Mar 21, 2013 at 06:40:24PM +0200, Orit Wasserman wrote: > On 03/21/2013 06:29 PM, Michael S. Tsirkin wrote: > > On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote: > >> On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote: > >>> On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote: > >>>> This way we send one big buffer instead of many small ones > >>>> > >>>> Signed-off-by: Orit Wasserman <owasserm@redhat.com> > >>> > >>> Why does this happen BTW? > >> > >> It happens in the last phase when we send the device state that consists of a lot > >> bytes and int field that are written using qemu_put_byte/be16/... > >> > > > > Confused I thought device_state does not use _nocopy? > > My idea of using vmsplice relies exactly on this: > > we can not splice device state ... > > qemu_put_buffer calls qemu_put_buffer_no_copy (you know code reuse) > So I guess we will need to add some flag if we want to use vmsplice or not > or split no_copy into external and internal. It won't be a big change Fair enough. Have time to look into this? Data copies is like 15% CPU, more with swap ... > > > >>> > >>>> --- > >>>> savevm.c | 21 +++++++++++++++++---- > >>>> 1 file changed, 17 insertions(+), 4 deletions(-) > >>>> > >>>> diff --git a/savevm.c b/savevm.c > >>>> index 50e8fb2..13a533b 100644 > >>>> --- a/savevm.c > >>>> +++ b/savevm.c > >>>> @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size) > >>>> abort(); > >>>> } > >>>> > >>>> - f->iov[f->iovcnt].iov_base = (uint8_t *)buf; > >>>> - f->iov[f->iovcnt++].iov_len = size; > >>>> + /* check for adjoint buffer and colace them */ > >>>> + if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base + > >>>> + f->iov[f->iovcnt - 1].iov_len) { > >>>> + f->iov[f->iovcnt - 1].iov_len += size; > >>>> + } else { > >>>> + f->iov[f->iovcnt].iov_base = (uint8_t *)buf; > >>>> + f->iov[f->iovcnt++].iov_len = size; > >>>> + } > >>>> > >>>> f->is_write = 1; > >>>> f->bytes_xfer += size; > >>>> @@ -690,8 +696,15 @@ void qemu_put_byte(QEMUFile *f, int v) > >>>> f->buf[f->buf_index++] = v; > >>>> f->is_write = 1; > >>>> f->bytes_xfer += 1; > >>>> - f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1); > >>>> - f->iov[f->iovcnt++].iov_len = 1; > >>>> + > >>>> + /* check for adjoint buffer and colace them */ > >>>> + if (f->iovcnt > 0 && f->buf + (f->buf_index - 1) == > >>>> + f->iov[f->iovcnt - 1].iov_base + f->iov[f->iovcnt - 1].iov_len) { > >>>> + f->iov[f->iovcnt - 1].iov_len += 1; > >>>> + } else { > >>>> + f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1); > >>>> + f->iov[f->iovcnt++].iov_len = 1; > >>>> + } > >>>> > >>>> if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) { > >>>> qemu_fflush(f); > >>>> -- > >>>> 1.7.11.7
Orit Wasserman <owasserm@redhat.com> wrote: > This way we send one big buffer instead of many small ones > > Signed-off-by: Orit Wasserman <owasserm@redhat.com> > --- > savevm.c | 21 +++++++++++++++++---- > 1 file changed, 17 insertions(+), 4 deletions(-) > > diff --git a/savevm.c b/savevm.c > index 50e8fb2..13a533b 100644 > --- a/savevm.c > +++ b/savevm.c > @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size) > abort(); > } > > - f->iov[f->iovcnt].iov_base = (uint8_t *)buf; > - f->iov[f->iovcnt++].iov_len = size; > + /* check for adjoint buffer and colace them */ s/colace/coalesce/ > + if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base + > + f->iov[f->iovcnt - 1].iov_len) { > + f->iov[f->iovcnt - 1].iov_len += size; > + } else { > + f->iov[f->iovcnt].iov_base = (uint8_t *)buf; > + f->iov[f->iovcnt++].iov_len = size; > + } Can we create a function for this? Would help make sure that we don't forget to "fix" both places when we have changes. Later, Juan.
"Michael S. Tsirkin" <mst@redhat.com> wrote: > On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote: >> On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote: >> > On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote: >> >> This way we send one big buffer instead of many small ones >> >> >> >> Signed-off-by: Orit Wasserman <owasserm@redhat.com> >> > >> > Why does this happen BTW? >> >> It happens in the last phase when we send the device state that >> consists of a lot >> bytes and int field that are written using qemu_put_byte/be16/... >> > > Confused I thought device_state does not use _nocopy? > My idea of using vmsplice relies exactly on this: > we can not splice device state ... As it is today, I am not sure that we can use vmsplice() because we are sending: <header> <page> <header> <page> <header> <page> We can optimize at some pount to write a bigger/different header and sent a bunch of pages together, but just now we don't have that code. Later, Juan.
On Thu, Mar 21, 2013 at 06:44:14PM +0100, Juan Quintela wrote: > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote: > >> On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote: > >> > On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote: > >> >> This way we send one big buffer instead of many small ones > >> >> > >> >> Signed-off-by: Orit Wasserman <owasserm@redhat.com> > >> > > >> > Why does this happen BTW? > >> > >> It happens in the last phase when we send the device state that > >> consists of a lot > >> bytes and int field that are written using qemu_put_byte/be16/... > >> > > > > Confused I thought device_state does not use _nocopy? > > My idea of using vmsplice relies exactly on this: > > we can not splice device state ... > > > As it is today, I am not sure that we can use vmsplice() because we > are sending: > > > <header> > <page> > <header> > <page> > <header> > <page> > > We can optimize at some pount to write a bigger/different header and > sent a bunch of pages together, but just now we don't have that code. > > Later, Juan. Sending the page can do vmsplice, can't it? Multipage is likely a good idea anyway, e.g. RDMA wants to do this too.
"Michael S. Tsirkin" <mst@redhat.com> wrote: >> <header> >> <page> >> <header> >> <page> >> <header> >> <page> >> >> We can optimize at some pount to write a bigger/different header and >> sent a bunch of pages together, but just now we don't have that code. >> >> Later, Juan. > > Sending the page can do vmsplice, can't it? > Multipage is likely a good idea anyway, e.g. RDMA wants to > do this too. RDMA requires de pages lock into memory, no? Later, Juan.
On Thu, Mar 21, 2013 at 07:22:01PM +0100, Juan Quintela wrote: > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > >> <header> > >> <page> > >> <header> > >> <page> > >> <header> > >> <page> > >> > >> We can optimize at some pount to write a bigger/different header and > >> sent a bunch of pages together, but just now we don't have that code. > >> > >> Later, Juan. > > > > Sending the page can do vmsplice, can't it? > > Multipage is likely a good idea anyway, e.g. RDMA wants to > > do this too. > > RDMA requires de pages lock into memory, no? > > Later, Juan. That's a separate issue, but the point is it is asynchronous. It can be made to behave exactly the same implementing this API: - send async: lock, send - flush: poll for completion, unlock
diff --git a/savevm.c b/savevm.c index 50e8fb2..13a533b 100644 --- a/savevm.c +++ b/savevm.c @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size) abort(); } - f->iov[f->iovcnt].iov_base = (uint8_t *)buf; - f->iov[f->iovcnt++].iov_len = size; + /* check for adjoint buffer and colace them */ + if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base + + f->iov[f->iovcnt - 1].iov_len) { + f->iov[f->iovcnt - 1].iov_len += size; + } else { + f->iov[f->iovcnt].iov_base = (uint8_t *)buf; + f->iov[f->iovcnt++].iov_len = size; + } f->is_write = 1; f->bytes_xfer += size; @@ -690,8 +696,15 @@ void qemu_put_byte(QEMUFile *f, int v) f->buf[f->buf_index++] = v; f->is_write = 1; f->bytes_xfer += 1; - f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1); - f->iov[f->iovcnt++].iov_len = 1; + + /* check for adjoint buffer and colace them */ + if (f->iovcnt > 0 && f->buf + (f->buf_index - 1) == + f->iov[f->iovcnt - 1].iov_base + f->iov[f->iovcnt - 1].iov_len) { + f->iov[f->iovcnt - 1].iov_len += 1; + } else { + f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1); + f->iov[f->iovcnt++].iov_len = 1; + } if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) { qemu_fflush(f);
This way we send one big buffer instead of many small ones Signed-off-by: Orit Wasserman <owasserm@redhat.com> --- savevm.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-)