
[2/2] hw/9pfs: fix alignment issue when host filesystem block size is larger than client msize

Message ID 56C02E4F.6030303@gmail.com
State New

Commit Message

Jevon Qiao Feb. 14, 2016, 7:35 a.m. UTC
The following patch fixes an alignment issue when the host filesystem block size
is larger than the client msize.

Thanks,
Jevon

From: Jevon Qiao <scaleqiao@gmail.com>
Date: Sun, 14 Feb 2016 15:11:08 +0800
Subject: [PATCH] hw/9pfs: fix alignment issue when host filesystem block size is larger than client msize.

With the previous implementation, iounit is set to 0 after the first if
statement, because (s->msize - P9_IOHDRSZ)/stbuf.f_bsize evaluates to zero
when the host filesystem block size is larger than msize. As a result, iounit
falls back to s->msize - P9_IOHDRSZ, which is usually not aligned.

Signed-off-by: Jevon Qiao <scaleqiao@gmail.com>
---
  hw/9pfs/virtio-9p.c | 19 ++++++++++++++++---
  1 file changed, 16 insertions(+), 3 deletions(-)

--

Comments

Aneesh Kumar K.V Feb. 14, 2016, 1:38 p.m. UTC | #1
Jevon Qiao <scaleqiao@gmail.com> writes:

> The following patch is to fix alignment issue when host filesystem block size
> is larger than client msize.
>
> Thanks,
> Jevon

That is not the right format for sending patches. You can send them as a
series using git-send-email.

>
> From: Jevon Qiao <scaleqiao@gmail.com>
> Date: Sun, 14 Feb 2016 15:11:08 +0800
> Subject: [PATCH] hw/9pfs: fix alignment issue when host filesystem block size is larger than client msize.
>
> Per the previous implementation, iounit will be assigned to be 0 after the
> first if statement as (s->msize - P9_IOHDRSZ)/stbuf.f_bsize will be zero when
> host filesystem block size is larger than msize. Finally, iounit will be equal
> to s->msize - P9_IOHDRSZ, which is usually not aligned.
>
> Signed-off-by: Jevon Qiao <scaleqiao@gmail.com>
> ---
>   hw/9pfs/virtio-9p.c | 19 ++++++++++++++++---
>   1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
> index f972731..005d3a8 100644
> --- a/hw/9pfs/virtio-9p.c
> +++ b/hw/9pfs/virtio-9p.c
> @@ -1326,7 +1326,7 @@ out_nofid:
>   static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>   {
>       struct statfs stbuf;
> -    int32_t iounit = 0;
> +    int32_t iounit = 0, unit = 0;
>       V9fsState *s = pdu->s;
>
>       /*
> @@ -1334,8 +1334,21 @@ static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>        * and as well as less than (client msize - P9_IOHDRSZ))
>        */
>       if (!v9fs_co_statfs(pdu, path, &stbuf)) {
> -        iounit = stbuf.f_bsize;
> -        iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
> +    /*
> +     * If host filesystem block size is larger than client msize,
> +     * we will use PAGESIZE as the unit. The reason why we choose
> +     * PAGESIZE is because the data will be splitted in terms of
> +     * PAGESIZE in the virtio layer. In this case, the final
> +     * iounit is equal to the value of ((msize/unit) - 1) * unit.
> +     */
> +    if (stbuf.f_bsize > s->msize) {
> +        iounit = 4096;
> +        unit = 4096;

Which page size should it be, guest or host? Also, why 4096? ppc64 uses
a 64K page size.


> +    } else {
> +            iounit = stbuf.f_bsize;
> +        unit = stbuf.f_bsize;
> +    }
> +        iounit *= (s->msize - P9_IOHDRSZ)/unit;
>       }
>       if (!iounit) {
>           iounit = s->msize - P9_IOHDRSZ;
> -- 

-aneesh
Jevon Qiao Feb. 17, 2016, 7:14 a.m. UTC | #2
Hi Aneesh,

Thank you for reviewing my code, please see my reply in-line.
On 14/2/16 21:38, Aneesh Kumar K.V wrote:
> Jevon Qiao <scaleqiao@gmail.com> writes:
>
>> The following patch is to fix alignment issue when host filesystem block size
>> is larger than client msize.
>>
>> Thanks,
>> Jevon
> That is not the right format to send patch. You can send them as a
> series using git-send-email.
Yes, you're correct. I will send the patches after I address all
the technical comments.
>> From: Jevon Qiao <scaleqiao@gmail.com>
>> Date: Sun, 14 Feb 2016 15:11:08 +0800
>> Subject: [PATCH] hw/9pfs: fix alignment issue when host filesystem block size is larger than client msize.
>>
>> Per the previous implementation, iounit will be assigned to be 0 after the
>> first if statement as (s->msize - P9_IOHDRSZ)/stbuf.f_bsize will be zero when
>> host filesystem block size is larger than msize. Finally, iounit will be equal
>> to s->msize - P9_IOHDRSZ, which is usually not aligned.
>>
>> Signed-off-by: Jevon Qiao <scaleqiao@gmail.com>
>> ---
>>    hw/9pfs/virtio-9p.c | 19 ++++++++++++++++---
>>    1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
>> index f972731..005d3a8 100644
>> --- a/hw/9pfs/virtio-9p.c
>> +++ b/hw/9pfs/virtio-9p.c
>> @@ -1326,7 +1326,7 @@ out_nofid:
>>    static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>>    {
>>        struct statfs stbuf;
>> -    int32_t iounit = 0;
>> +    int32_t iounit = 0, unit = 0;
>>        V9fsState *s = pdu->s;
>>
>>        /*
>> @@ -1334,8 +1334,21 @@ static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>>         * and as well as less than (client msize - P9_IOHDRSZ))
>>         */
>>        if (!v9fs_co_statfs(pdu, path, &stbuf)) {
>> -        iounit = stbuf.f_bsize;
>> -        iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
>> +    /*
>> +     * If host filesystem block size is larger than client msize,
>> +     * we will use PAGESIZE as the unit. The reason why we choose
>> +     * PAGESIZE is because the data will be splitted in terms of
>> +     * PAGESIZE in the virtio layer. In this case, the final
>> +     * iounit is equal to the value of ((msize/unit) - 1) * unit.
>> +     */
>> +    if (stbuf.f_bsize > s->msize) {
>> +        iounit = 4096;
>> +        unit = 4096;
> What page size it should be guest or host ?. Also why 4096 ?. ppc64 use
> 64K page size.
The data to be read or written is first divided into pieces according to
iounit and msize, and then mapped to pages before being added to the
virtqueue. Since all these operations happen on the guest side, the page
size should be the guest's. Please correct me if I'm wrong.

As for the number 4096, it's the default value on Linux. I did not take
other platforms into account; that's my fault. To make it suitable for all
platforms, shall I use the function getpagesize() here?

Thanks,
Jevon
>> +    } else {
>> +            iounit = stbuf.f_bsize;
>> +        unit = stbuf.f_bsize;
>> +    }
>> +        iounit *= (s->msize - P9_IOHDRSZ)/unit;
>>        }
>>        if (!iounit) {
>>            iounit = s->msize - P9_IOHDRSZ;
>> -- 
> -aneesh
>
Greg Kurz Feb. 17, 2016, 10:24 a.m. UTC | #3
On Wed, 17 Feb 2016 15:14:48 +0800
Jevon Qiao <scaleqiao@gmail.com> wrote:

> Hi Aneesh,
> 
> Thank you for reviewing my code, please see my reply in-line.

Jevon,

Please read my comments below.

> On 14/2/16 21:38, Aneesh Kumar K.V wrote:
> > Jevon Qiao <scaleqiao@gmail.com> writes:
> >  
> >> The following patch is to fix alignment issue when host filesystem block size
> >> is larger than client msize.
> >>
> >> Thanks,
> >> Jevon  
> > That is not the right format to send patch. You can send them as a
> > series using git-send-email.  
> Yes, you're correct. I will send the patches later after I address all the
> technical comments.
> >> From: Jevon Qiao <scaleqiao@gmail.com>
> >> Date: Sun, 14 Feb 2016 15:11:08 +0800
> >> Subject: [PATCH] hw/9pfs: fix alignment issue when host filesystem block size is larger than client msize.
> >>
> >> Per the previous implementation, iounit will be assigned to be 0 after the
> >> first if statement as (s->msize - P9_IOHDRSZ)/stbuf.f_bsize will be zero when
> >> host filesystem block size is larger than msize. Finally, iounit will be equal
> >> to s->msize - P9_IOHDRSZ, which is usually not aligned.
> >>
> >> Signed-off-by: Jevon Qiao <scaleqiao@gmail.com>
> >> ---
> >>    hw/9pfs/virtio-9p.c | 19 ++++++++++++++++---

Hmm, I just realized your tree is not up to date: hw/9pfs/virtio-9p.c got
renamed by this recent commit:

commit 60ce86c7140d5ca33d5fd87ce821681165d06b2a
Author: Wei Liu <wei.liu2@citrix.com>
Date:   Thu Jan 7 18:42:20 2016 +0000

    9pfs: rename virtio-9p.c to 9p.c

Also, 9p.c now contains only generic code, not related to virtio... see below.

> >>    1 file changed, 16 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
> >> index f972731..005d3a8 100644
> >> --- a/hw/9pfs/virtio-9p.c
> >> +++ b/hw/9pfs/virtio-9p.c
> >> @@ -1326,7 +1326,7 @@ out_nofid:
> >>    static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
> >>    {
> >>        struct statfs stbuf;
> >> -    int32_t iounit = 0;
> >> +    int32_t iounit = 0, unit = 0;
> >>        V9fsState *s = pdu->s;
> >>
> >>        /*
> >> @@ -1334,8 +1334,21 @@ static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
> >>         * and as well as less than (client msize - P9_IOHDRSZ))
> >>         */
> >>        if (!v9fs_co_statfs(pdu, path, &stbuf)) {
> >> -        iounit = stbuf.f_bsize;
> >> -        iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
> >> +    /*
> >> +     * If host filesystem block size is larger than client msize,
> >> +     * we will use PAGESIZE as the unit. The reason why we choose
> >> +     * PAGESIZE is because the data will be splitted in terms of
> >> +     * PAGESIZE in the virtio layer. In this case, the final

... and here you mention virtio. Does this code really belong here?

> >> +     * iounit is equal to the value of ((msize/unit) - 1) * unit.
> >> +     */
> >> +    if (stbuf.f_bsize > s->msize) {
> >> +        iounit = 4096;
> >> +        unit = 4096;  

This looks weird given the initial comment in get_iounit()... is
iounit a multiple of stbuf.f_bsize in this case?

> > What page size it should be guest or host ?. Also why 4096 ?. ppc64 use
> > 64K page size.  
> The data to be read or written will be divided into pieces according to the
> size of iounit and msize firstly, and then mapped to pages before being added
> into virtqueue. Since all these operations happen in the guest side, so the
> page size should be guest. Please correct me if I'm wrong.
> 
> As for the number 4096, It's the default value in Linux OS. I did not take
> other platforms into account, it's my fault. To make it suitable for all
> platforms, shall I use the function getpagesize() here?
> 

getpagesize() will return the host page size. If you need the guest page size,
you should use TARGET_PAGE_SIZE.

And then you will hit another problem: the 9p.c file is in common-obj and
cannot contain target specific code...

Along with the other remark, I'm beginning to think you may need to move this
to virtio-9p-device.c.

> Thanks,
> Jevon
> >> +    } else {
> >> +            iounit = stbuf.f_bsize;
> >> +        unit = stbuf.f_bsize;
> >> +    }
> >> +        iounit *= (s->msize - P9_IOHDRSZ)/unit;
> >>        }
> >>        if (!iounit) {
> >>            iounit = s->msize - P9_IOHDRSZ;
> >> --   
> > -aneesh
> >  
>
Aneesh Kumar K.V Feb. 17, 2016, 2:44 p.m. UTC | #4
Jevon Qiao <scaleqiao@gmail.com> writes:

> Hi Aneesh,
>
> Thank you for reviewing my code, please see my reply in-line.
> On 14/2/16 21:38, Aneesh Kumar K.V wrote:
>> Jevon Qiao <scaleqiao@gmail.com> writes:
>>
>>> The following patch is to fix alignment issue when host filesystem block size
>>> is larger than client msize.
>>>
>>> Thanks,
>>> Jevon
>> That is not the right format to send patch. You can send them as a
>> series using git-send-email.
> Yes, you're correct. I will send the patches later after I address all the
> technical comments.
>>> From: Jevon Qiao <scaleqiao@gmail.com>
>>> Date: Sun, 14 Feb 2016 15:11:08 +0800
>>> Subject: [PATCH] hw/9pfs: fix alignment issue when host filesystem block size is larger than client msize.
>>>
>>> Per the previous implementation, iounit will be assigned to be 0 after the
>>> first if statement as (s->msize - P9_IOHDRSZ)/stbuf.f_bsize will be zero when
>>> host filesystem block size is larger than msize. Finally, iounit will be equal
>>> to s->msize - P9_IOHDRSZ, which is usually not aligned.
>>>
>>> Signed-off-by: Jevon Qiao <scaleqiao@gmail.com>
>>> ---
>>>    hw/9pfs/virtio-9p.c | 19 ++++++++++++++++---
>>>    1 file changed, 16 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
>>> index f972731..005d3a8 100644
>>> --- a/hw/9pfs/virtio-9p.c
>>> +++ b/hw/9pfs/virtio-9p.c
>>> @@ -1326,7 +1326,7 @@ out_nofid:
>>>    static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>>>    {
>>>        struct statfs stbuf;
>>> -    int32_t iounit = 0;
>>> +    int32_t iounit = 0, unit = 0;
>>>        V9fsState *s = pdu->s;
>>>
>>>        /*
>>> @@ -1334,8 +1334,21 @@ static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>>>         * and as well as less than (client msize - P9_IOHDRSZ))
>>>         */
>>>        if (!v9fs_co_statfs(pdu, path, &stbuf)) {
>>> -        iounit = stbuf.f_bsize;
>>> -        iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
>>> +    /*
>>> +     * If host filesystem block size is larger than client msize,
>>> +     * we will use PAGESIZE as the unit. The reason why we choose
>>> +     * PAGESIZE is because the data will be splitted in terms of
>>> +     * PAGESIZE in the virtio layer. In this case, the final
>>> +     * iounit is equal to the value of ((msize/unit) - 1) * unit.
>>> +     */
>>> +    if (stbuf.f_bsize > s->msize) {
>>> +        iounit = 4096;
>>> +        unit = 4096;
>> What page size it should be guest or host ?. Also why 4096 ?. ppc64 use
>> 64K page size.
> The data to be read or written will be divided into pieces according to the
> size of iounit and msize firstly, and then mapped to pages before being added
> into virtqueue. Since all these operations happen in the guest side, so the
> page size should be guest. Please correct me if I'm wrong.

I am not sure I understand the details correctly. iounit is the size
that we use in client_read to determine the size in which
we should request I/O from the client. But we still can't do I/O in sizes
larger than s->msize. If you look at the client side (kernel 9p fs), you
will find

	rsize = fid->iounit;
	if (!rsize || rsize > clnt->msize-P9_IOHDRSZ)
		rsize = clnt->msize - P9_IOHDRSZ;

If your iounit calculation ends up zero, that should be handled
correctly by

    if (!iounit) {
        iounit = s->msize - P9_IOHDRSZ;
    }
    return iounit;


So what is the issue here?

-aneesh
Jevon Qiao Feb. 19, 2016, 8:56 a.m. UTC | #5
Hi Aneesh,
> I am not sure I understand the details correctly. iounit is the size
> that we use in client_read to determine the  size in which
> we should request I/O from the client. But we still can't do I/O in size
> larger than s->msize. If you look at the client side (kernel 9p fs), you
> will find
>
> 	rsize = fid->iounit;
> 	if (!rsize || rsize > clnt->msize-P9_IOHDRSZ)
> 		rsize = clnt->msize - P9_IOHDRSZ;
Yes, I know this.
> if your iounit calculation ends up zero, that should be handled
> correctly by
>
>      if (!iounit) {
>          iounit = s->msize - P9_IOHDRSZ;
>      }
>      return iounit;
>
>
> So what is the issue here. ?
This results in an alignment issue when the I/O requested by the client
is mapped into pages in p9_nr_pages():

    int p9_nr_pages(char *data, int len)
    {
            unsigned long start_page, end_page;
            start_page = (unsigned long)data >> PAGE_SHIFT;
            end_page = ((unsigned long)data + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
            return end_page - start_page;
    }

Please see the following experiment I did without the fix.

1) Start QEMU with cephfs:

    $ qemu-system-x86_64 /root/CentOS---6.6-64bit---2015-03-06-a.qcow2 \
        -smp 4 -m 4096 \
        -fsdev cephfs,security_model=passthrough,id=fsdev0,path=/ \
        -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=cephfs \
        --enable-kvm -nographic \
        -net nic -net tap,ifname=tap0,script=no,downscript=no


2) Mount the fs in the guest.

    [root@localhost ~]# mount -t 9p -o trans=virtio,version=9p2000.L cephfs /mnt
    [root@localhost ~]# ls -lah /mnt/8kfile
    -rw-r--r-- 1 root root 8.0K 2016-02-19 09:37 /mnt/8kfile

In this case, I used the default msize of 8192 bytes. Since cephfs uses 4M
as its f_bsize, iounit falls back to 8168 (8192 - 24, as P9_IOHDRSZ is 24).

3) Run the following SystemTap script to trace the paging result:

    [root@localhost ~]# cat p9_read.stp
    probe kernel.function("p9_virtio_zc_request").call
    {
         printf("p9_virtio_zc_request: inlen size is %d\n", int_arg(5));
    }

    probe kernel.function("p9_nr_pages").call
    {
         printf("p9_nr_pages: start_page = %ld\n", int_arg(1) >> 12);
         printf("p9_nr_pages: end_age = %ld\n", (int_arg(1) + 8168 + 4096 - 1) >> 12);
    }

4) The output I got when copying /mnt/8kfile to the /tmp/ directory:

    p9_virtio_zc_request: inlen size is 8168
    p9_nr_pages: start_page = 34293757815
    p9_nr_pages: end_age = 34293757818

As the start_page and end_page values show (start_page = 34293757815,
end_page = 34293757818), the 8K of data is mapped into three pages. This
could hurt performance.

I also enabled the cephfs debug output I added, to see how the data is
distributed in this case. The result is as follows:

    CEPHFS_DEBUG: cephfs_preadv iov_len=4096
    CEPHFS_DEBUG: cephfs_preadv iov_len=4072
    CEPHFS_DEBUG: cephfs_preadv iov_len=24

This patch aims to fix this, and it works quite well: all the data is
well aligned.

    p9_virtio_zc_request: inlen size is 4096
    p9_nr_pages: start_page = 34203171814
    p9_nr_pages: end_age = 34203171815
    p9_virtio_zc_request: inlen size is 4096
    p9_nr_pages: start_page = 34203171815
    p9_nr_pages: end_age = 34203171816

    CEPHFS_DEBUG: cephfs_preadv iov_len=4096
    CEPHFS_DEBUG: cephfs_preadv iov_len=4096

Thanks,
Jevon
> -aneesh
>
Jevon Qiao Feb. 19, 2016, 9:32 a.m. UTC | #6
Hi Greg,
>>>> From: Jevon Qiao <scaleqiao@gmail.com>
>>>> Date: Sun, 14 Feb 2016 15:11:08 +0800
>>>> Subject: [PATCH] hw/9pfs: fix alignment issue when host filesystem block size is larger than client msize.
>>>>
>>>> Per the previous implementation, iounit will be assigned to be 0 after the
>>>> first if statement as (s->msize - P9_IOHDRSZ)/stbuf.f_bsize will be zero when
>>>> host filesystem block size is larger than msize. Finally, iounit will be equal
>>>> to s->msize - P9_IOHDRSZ, which is usually not aligned.
>>>>
>>>> Signed-off-by: Jevon Qiao <scaleqiao@gmail.com>
>>>> ---
>>>>     hw/9pfs/virtio-9p.c | 19 ++++++++++++++++---
> Hmmm I just realize your tree is not up-to-date since hw/9pfs/virtio-9p.c got
> renamed with this recent commit:
>
> commit 60ce86c7140d5ca33d5fd87ce821681165d06b2a
> Author: Wei Liu <wei.liu2@citrix.com>
> Date:   Thu Jan 7 18:42:20 2016 +0000
>
>      9pfs: rename virtio-9p.c to 9p.c
>
> Also 9p.c only contains generic code now, not related to virtio... see below.
The feature was finished before that commit landed; I'm sorry I did not
sync my tree with master.
>>>>     1 file changed, 16 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
>>>> index f972731..005d3a8 100644
>>>> --- a/hw/9pfs/virtio-9p.c
>>>> +++ b/hw/9pfs/virtio-9p.c
>>>> @@ -1326,7 +1326,7 @@ out_nofid:
>>>>     static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>>>>     {
>>>>         struct statfs stbuf;
>>>> -    int32_t iounit = 0;
>>>> +    int32_t iounit = 0, unit = 0;
>>>>         V9fsState *s = pdu->s;
>>>>
>>>>         /*
>>>> @@ -1334,8 +1334,21 @@ static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
>>>>          * and as well as less than (client msize - P9_IOHDRSZ))
>>>>          */
>>>>         if (!v9fs_co_statfs(pdu, path, &stbuf)) {
>>>> -        iounit = stbuf.f_bsize;
>>>> -        iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
>>>> +    /*
>>>> +     * If host filesystem block size is larger than client msize,
>>>> +     * we will use PAGESIZE as the unit. The reason why we choose
>>>> +     * PAGESIZE is because the data will be splitted in terms of
>>>> +     * PAGESIZE in the virtio layer. In this case, the final
> ... and here you mention virtio. Does this code really belong here ?
Sorry for the confusion; the comment might not be very clear. I mean the
issue I described in the other thread with Aneesh. It's not related to virtio.
>>>> +     * iounit is equal to the value of ((msize/unit) - 1) * unit.
>>>> +     */
>>>> +    if (stbuf.f_bsize > s->msize) {
>>>> +        iounit = 4096;
>>>> +        unit = 4096;
> This looks weird when reading the initial comment in get_iounit()... is
> iounit a multiple of stbuf.f_bsize in this case ?
Yes, I think so. stbuf.f_bsize refers to the iounit of the backend filesystem,
and complying with the backend is always the right way to go.
>>> What page size it should be guest or host ?. Also why 4096 ?. ppc64 use
>>> 64K page size.
>> The data to be read or written will be divided into pieces according to the
>> size of iounit and msize firstly, and then mapped to pages before being added
>> into virtqueue. Since all these operations happen in the guest side, so the
>> page size should be guest. Please correct me if I'm wrong.
>>
>> As for the number 4096, It's the default value in Linux OS. I did not take
>> other platforms into account, it's my fault. To make it suitable for all
>> platforms, shall I use the function getpagesize() here?
>>
> getpagesize() will return the host page size. If you need the guest page size,
> you should use TARGET_PAGE_SIZE.
> And then you will hit another problem: the 9p.c file is in common-obj and
> cannot contain target specific code...
Good to know, thank you for sharing this.
> Along with the other remark, I'm beginning to think you may need to move this
> to virtio-9p-device.c.
I'll think about this; thank you for the suggestion.

Thanks,
Jevon
>> Thanks,
>> Jevon
>>>> +    } else {
>>>> +            iounit = stbuf.f_bsize;
>>>> +        unit = stbuf.f_bsize;
>>>> +    }
>>>> +        iounit *= (s->msize - P9_IOHDRSZ)/unit;
>>>>         }
>>>>         if (!iounit) {
>>>>             iounit = s->msize - P9_IOHDRSZ;
>>>> --
>>> -aneesh
>>>
Jevon Qiao Feb. 24, 2016, 7:04 a.m. UTC | #7
[Removing ceph-devel alias]

Hi Aneesh,

Any further comment on my reply below?

Thanks,
Jevon
On 19/2/16 16:56, Jevon Qiao wrote:
> Hi Aneesh,
>> I am not sure I understand the details correctly. iounit is the size
>> that we use in client_read to determine the  size in which
>> we should request I/O from the client. But we still can't do I/O in size
>> larger than s->msize. If you look at the client side (kernel 9p fs), you
>> will find
>>
>>     rsize = fid->iounit;
>>     if (!rsize || rsize > clnt->msize-P9_IOHDRSZ)
>>         rsize = clnt->msize - P9_IOHDRSZ;
> Yes, I know this.
>> if your iounit calculation ends up zero, that should be handled
>> correctly by
>>
>>      if (!iounit) {
>>          iounit = s->msize - P9_IOHDRSZ;
>>      }
>>      return iounit;
>>
>>
>> So what is the issue here. ?
> This will result in an alignment issue while mapping the I/O requested by
> client into pages in the function of p9_nr_pages().
>
>    int p9_nr_pages(char *data, int len)
>    {
>             unsigned long start_page, end_page;
>             start_page =  (unsigned long)data >> PAGE_SHIFT;
>             end_page = ((unsigned long)data + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
>             return end_page - start_page;
>    }
>
> Please see the following experiment I did without the fix.
>
> 1) Start qemu with cephfs,
>
>    $ qemu-system-x86_64 /root/CentOS---6.6-64bit---2015-03-06-a.qcow2 \
>        -smp 4 -m 4096 \
>        -fsdev cephfs,security_model=passthrough,id=fsdev0,path=/ \
>        -device virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=cephfs \
>        --enable-kvm -nographic \
>        -net nic -net tap,ifname=tap0,script=no,downscript=no
>
>
> 2) Mount the fs in the guest.
>
>    [root@localhost ~]# mount -t 9p -o trans=virtio,version=9p2000.L cephfs /mnt
>    [root@localhost ~]# ls -lah /mnt/8kfile
>    -rw-r--r-- 1 root root 8.0K 2016-02-19 09:37 /mnt/8kfile
>
> In this case, I used the default msize which is 8192(in Byte). Since cephfs
> is using 4M as the f_bsize, the iounit will be 8168 as P9_IOHDRSZ is
> equal to 24.
>
> 3) Run the following systemtap script to trace the paging result,
>
>    [root@localhost ~]# cat p9_read.stp
>    probe kernel.function("p9_virtio_zc_request").call
>    {
>         printf("p9_virtio_zc_request: inlen size is %d\n", int_arg(5));
>    }
>
>    probe kernel.function("p9_nr_pages").call
>    {
>         printf("p9_nr_pages: start_page = %ld\n", int_arg(1) >> 12);
>         printf("p9_nr_pages: end_age = %ld\n", (int_arg(1) + 8168 + 4096 - 1) >> 12);
>    }
>
> 4) The output I got when I copied out the file /mnt/8kfile to /tmp/ directory,
>
>    p9_virtio_zc_request: inlen size is 8168
>    p9_nr_pages: start_page = 34293757815
>    p9_nr_pages: end_age = 34293757818
>
> Per the text in red(start_page = 34293757815, end_page = 34293757818),
> it turns out 8k data will be mapped into three pages. This could hurt the
> performance.
>
> Actually, I enabled the cephfs debug functionality added by me to see
> how the data is distributed in this case, the result is as follows,
>
>    CEPHFS_DEBUG: cephfs_preadv iov_len=4096
>    CEPHFS_DEBUG: cephfs_preadv iov_len=4072
>    CEPHFS_DEBUG: cephfs_preadv iov_len=24
>
> This patch aims to fix this. And the result turns out it works quite well,
> all the data is well aligned.
>
>    p9_virtio_zc_request: inlen size is 4096
>    p9_nr_pages: start_page = 34203171814
>    p9_nr_pages: end_age = 34203171815
>    p9_virtio_zc_request: inlen size is 4096
>    p9_nr_pages: start_page = 34203171815
>    p9_nr_pages: end_age = 34203171816
>
>    CEPHFS_DEBUG: cephfs_preadv iov_len=4096
>    CEPHFS_DEBUG: cephfs_preadv iov_len=4096
>
> Thanks,
> Jevon
>> -aneesh
>>
>
>
Jevon Qiao March 3, 2016, 3 a.m. UTC | #8
Any further question/comment on this patch?

Thanks,
Jevon
On 24/2/16 15:04, Jevon Qiao wrote:

Patch

diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
index f972731..005d3a8 100644
--- a/hw/9pfs/virtio-9p.c
+++ b/hw/9pfs/virtio-9p.c
@@ -1326,7 +1326,7 @@  out_nofid:
  static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
  {
      struct statfs stbuf;
-    int32_t iounit = 0;
+    int32_t iounit = 0, unit = 0;
      V9fsState *s = pdu->s;

      /*
@@ -1334,8 +1334,21 @@  static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
       * and as well as less than (client msize - P9_IOHDRSZ))
       */
      if (!v9fs_co_statfs(pdu, path, &stbuf)) {
-        iounit = stbuf.f_bsize;
-        iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
+    /*
+     * If host filesystem block size is larger than client msize,
+     * we will use PAGESIZE as the unit. The reason why we choose
+     * PAGESIZE is because the data will be splitted in terms of
+     * PAGESIZE in the virtio layer. In this case, the final
+     * iounit is equal to the value of ((msize/unit) - 1) * unit.
+     */
+    if (stbuf.f_bsize > s->msize) {
+        iounit = 4096;
+        unit = 4096;
+    } else {
+            iounit = stbuf.f_bsize;
+        unit = stbuf.f_bsize;
+    }
+        iounit *= (s->msize - P9_IOHDRSZ)/unit;
      }
      if (!iounit) {