Message ID: 1355941771-3418-1-git-send-email-namei.unix@gmail.com
State: New
At Thu, 20 Dec 2012 02:29:31 +0800, Liu Yuan wrote:
>
> From: Liu Yuan <tailai.ly@taobao.com>
>
> Sheepdog supports both writeback/writethrough writes but has not yet
> supported DIRECTIO semantics, which bypass the cache completely even if
> the Sheepdog daemon is set up with cache enabled.
>
> Suppose cache is enabled on the Sheepdog daemon side; the new cache
> control is
>
> cache=writeback                  # enable writeback semantics for writes
> cache=writethrough               # enable writethrough semantics for writes
> cache='directsync | none | off'  # disable cache completely

I wonder if we should disable cache when cache=none.  Many management
frontends use cache=none by default but, I think, users still expect
that data is cached (e.g. by the disk write cache when a raw format is
used).  cache=none only means that the host page cache is not used for
VM disk IO.

In that sense,

> @@ -1118,12 +1118,19 @@ static int sd_open(BlockDriverState *bs, const char *filename, int flags)
>          goto out;
>      }
>
> -    s->cache_enabled = true;
> -    s->flush_fd = connect_to_sdog(s->addr, s->port);
> -    if (s->flush_fd < 0) {
> -        error_report("failed to connect");
> -        ret = s->flush_fd;
> -        goto out;
> +    if (flags & BDRV_O_NOCACHE) {
> +        s->cache_flags = SD_FLAG_CMD_DIRECT;
> +    } else if (flags & BDRV_O_CACHE_WB) {

'else' should be removed, and

> +        s->cache_flags = SD_FLAG_CMD_CACHE;
> +    }
> +
> +    if (s->cache_flags != SD_FLAG_CMD_DIRECT) {

should be 's->cache_flags == SD_FLAG_CMD_CACHE'?  Do we need to send a
flush request when cache=writethrough?

Thanks,

Kazutaka
On 12/25/2012 03:47 PM, MORITA Kazutaka wrote:
> I wonder if we should disable cache when cache=none.  Many management
> frontends use cache=none by default but, I think, users still expect
> that data is cached (e.g. by the disk write cache when a raw format is
> used).  cache=none only means that the host page cache is not used for
> VM disk IO.
>
> In that sense,

I am fine with adopting this semantics for the option.

>> @@ -1118,12 +1118,19 @@ static int sd_open(BlockDriverState *bs, const char *filename, int flags)
>> [...]
>> +    if (flags & BDRV_O_NOCACHE) {
>> +        s->cache_flags = SD_FLAG_CMD_DIRECT;
>> +    } else if (flags & BDRV_O_CACHE_WB) {
>
> 'else' should be removed, and

It seems it should not: we need this 'else if' to allow the writethrough
flag.

>> +        s->cache_flags = SD_FLAG_CMD_CACHE;
>> +    }
>> +
>> +    if (s->cache_flags != SD_FLAG_CMD_DIRECT) {
>
> should be 's->cache_flags == SD_FLAG_CMD_CACHE'?  Do we need to send a
> flush request when cache=writethrough?

Looks good to me. I'll change it in v2.

Thanks,
Yuan
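[Editor's sketch] The hunk under review, with both of Kazutaka's points folded in, can be condensed into a standalone sketch. The numeric constant values and the helper name are invented for illustration; the real code lives in QEMU's sheepdog driver:

```c
#include <assert.h>

/* Illustrative values only: BDRV_O_* use the 2012-era block.h bit
 * positions, the SD_FLAG_CMD_* values are made up. */
#define BDRV_O_NOCACHE     0x0020   /* bypass host page cache (O_DIRECT) */
#define BDRV_O_CACHE_WB    0x0040   /* guest sees a writeback disk cache */

#define SD_FLAG_CMD_DIRECT 0x01     /* bypass the sheep object cache */
#define SD_FLAG_CMD_CACHE  0x02     /* use the sheep cache in writeback mode */

/* Mirrors the patched sd_open() logic, with the comparison Kazutaka
 * suggested: the flush fd is only needed in the writeback case, since
 * writethrough and direct I/O have nothing dirty to flush. */
static int sd_cache_flags(int flags, int *need_flush_fd)
{
    int cache_flags = 0;

    if (flags & BDRV_O_NOCACHE) {
        cache_flags = SD_FLAG_CMD_DIRECT;
    } else if (flags & BDRV_O_CACHE_WB) {
        cache_flags = SD_FLAG_CMD_CACHE;
    }
    /* '== SD_FLAG_CMD_CACHE' rather than '!= SD_FLAG_CMD_DIRECT',
     * so cache=writethrough does not set up a flush connection. */
    *need_flush_fd = (cache_flags == SD_FLAG_CMD_CACHE);
    return cache_flags;
}
```

Note that because of the `else if`, BDRV_O_NOCACHE wins whenever both bits are set, which is what makes the directsync case work at all.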
On 12/25/2012 03:47 PM, MORITA Kazutaka wrote:
> I wonder if we should disable cache when cache=none.  Many management
> frontends use cache=none by default but, I think, users still expect
> that data is cached (e.g. by the disk write cache when a raw format is
> used).  cache=none only means that the host page cache is not used for
> VM disk IO.
>
> In that sense,

Well, I found that setting cache=directsync will still contain
'BDRV_O_CACHE_WB' in the flags. Is this a bug in current master? If not,
my current scheme will be the only way to bypass the cache of sheepdog.

Thanks,
Yuan
On 12/25/2012 04:45 PM, Liu Yuan wrote:
> Well, I found that setting cache=directsync will still contain
> 'BDRV_O_CACHE_WB' in the flags. Is this a bug in current master? If not,
> my current scheme will be the only way to bypass the cache of sheepdog.

Ping. Can anyone confirm that it is a bug that 'cache=directsync' will
pass 'BDRV_O_CACHE_WB' in the flags?

Thanks,
Yuan
On Thu, Jan 03, 2013 at 09:43:53PM +0800, Liu Yuan wrote:
> On 12/25/2012 04:45 PM, Liu Yuan wrote:
> > Well, I found that setting cache=directsync will still contain
> > 'BDRV_O_CACHE_WB' in the flags. Is this a bug in current master? If
> > not, my current scheme will be the only way to bypass the cache of
> > sheepdog.
>
> Ping. Can anyone confirm that it is a bug that 'cache=directsync' will
> pass 'BDRV_O_CACHE_WB' in the flags?

Hi Yuan,

BDRV_O_NOCACHE means bypass the host page cache (O_DIRECT).

BDRV_O_CACHE_WB specifies the cache semantics that the guest sees - that
means whether the disk cache is writethrough or writeback.

In other words, BDRV_O_NOCACHE is a host performance tweak while
BDRV_O_CACHE_WB changes the cache safety of the BlockDriverState. A
protocol driver like sheepdog doesn't need to look at BDRV_O_CACHE_WB
because it is implemented in block.c (see bdrv_co_do_writev(), where
QEMU will flush after each write when !bs->enable_write_cache).

Stefan
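[Editor's sketch] The block.c behaviour Stefan describes can be modelled in a few lines. The struct and function names here are invented (this is not the real bdrv_co_do_writev()); the point is only the control flow:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of how the generic block layer gives the guest
 * writethrough semantics on top of a writeback-only driver: when the
 * write cache is disabled, every write is followed by a flush, so the
 * protocol driver never needs its own writethrough mode. */
typedef struct {
    bool enable_write_cache;  /* guest-visible WCE state */
    int  writes;              /* write requests seen by the driver */
    int  flushes;             /* flush requests seen by the driver */
} BDRVModel;

static void model_flush(BDRVModel *bs)
{
    bs->flushes++;            /* driver's .bdrv_co_flush */
}

static void model_do_writev(BDRVModel *bs)
{
    bs->writes++;             /* driver's .bdrv_co_writev */
    if (!bs->enable_write_cache) {
        model_flush(bs);      /* writethrough = writeback + flush */
    }
}
```

With this model, a writethrough device sees one flush per write while a writeback device sees flushes only when the guest asks for them, which is exactly the per-request overhead Yuan complains about later in the thread.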
On 01/05/2013 12:38 AM, Stefan Hajnoczi wrote:
> Hi Yuan,
> BDRV_O_NOCACHE means bypass the host page cache (O_DIRECT).
>
> BDRV_O_CACHE_WB specifies the cache semantics that the guest sees - that
> means whether the disk cache is writethrough or writeback.
>
> In other words, BDRV_O_NOCACHE is a host performance tweak while
> BDRV_O_CACHE_WB changes the cache safety of the BlockDriverState. A
> protocol driver like sheepdog doesn't need to look at BDRV_O_CACHE_WB
> because it is implemented in block.c (see bdrv_co_do_writev(), where
> QEMU will flush after each write when !bs->enable_write_cache).

Hi Stefan,

Thanks for your explanation. But after more investigation, I find myself
more confused. These are the flags passed from the block layer:

  {writeback, writethrough}   0x2042
  {directsync, off, none}     0x2062
  {unsafe}                    0x2242

So an underlying driver like sheepdog can't depend on the 'flags' passed
from .bdrv_file_open() to choose the right semantics (this was possible
for old QEMU, IIRC).

If we can't rely on the 'flags' to get the cache indications of users,
could you point me to how to implement tristate cache control for a
network block driver like sheepdog? E.g. I want to implement the
following semantics:

  cache=writeback|none|off   # enable writeback semantics for writes
  cache=writethrough         # enable writethrough semantics for writes
  cache=directsync           # disable cache completely

Thanks,
Yuan
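[Editor's sketch] Decoding the three hex values makes the confusion concrete. The bit values below are an assumption, chosen to be consistent with the numbers Yuan printed (they match the 2012-era block.h layout): BDRV_O_CACHE_WB is set in every mode, so only NOCACHE and NO_FLUSH remain distinguishable:

```c
#include <assert.h>

/* Flag bits as assumed from the hex values above:
 * 0x2042 = ALLOW_RDWR | CACHE_WB | RDWR              (writeback, writethrough)
 * 0x2062 = ALLOW_RDWR | CACHE_WB | NOCACHE | RDWR    (directsync, off, none)
 * 0x2242 = ALLOW_RDWR | NO_FLUSH | CACHE_WB | RDWR   (unsafe) */
#define BDRV_O_RDWR        0x0002
#define BDRV_O_NOCACHE     0x0020  /* set only for directsync/off/none */
#define BDRV_O_CACHE_WB    0x0040  /* set in every mode since commit e1e9b0ac */
#define BDRV_O_NO_FLUSH    0x0200  /* set only for unsafe */
#define BDRV_O_ALLOW_RDWR  0x2000

/* Tiny helper so the decoding can be checked mechanically. */
static int has_flag(int flags, int bit)
{
    return (flags & bit) != 0;
}
```

Since all three flag words carry BDRV_O_CACHE_WB, a driver really cannot tell writeback from writethrough by flags alone, which is the behaviour Stefan says block.c now owns.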
On 01/05/2013 12:40 PM, Liu Yuan wrote:
> On 01/05/2013 12:38 AM, Stefan Hajnoczi wrote:
>> [...]
>> In other words, BDRV_O_NOCACHE is a host performance tweak while
>> BDRV_O_CACHE_WB changes the cache safety of the BlockDriverState. A
>> protocol driver like sheepdog doesn't need to look at BDRV_O_CACHE_WB
>> because it is implemented in block.c (see bdrv_co_do_writev(), where
>> QEMU will flush after each write when !bs->enable_write_cache).
>
> [...]
>
> So an underlying driver like sheepdog can't depend on the 'flags' passed
> from .bdrv_file_open() to choose the right semantics (this was possible
> for old QEMU, IIRC).
>
> [...]

I tried old QEMU, v1.1.0 and v1.1.2, and they still worked as I
expected. So I guess the generic block layer was changed a bit and the
meaning of 'flags' became different from the old code, which did indeed
allow block drivers to interpret the 'flags' passed from
bdrv_file_open().

With the current upstream code, it seems that BDRV_O_CACHE_WB is always
enabled, which makes 'flags' completely unusable for block drivers to
get the indication the user gives by specifying the 'cache=' field.

So is there another mechanism that block drivers can rely on to
interpret the cache semantics?

Thanks,
Yuan
On 01/05/2013 01:29 PM, Liu Yuan wrote:
> On 01/05/2013 12:40 PM, Liu Yuan wrote:
>> [...]
>
> I tried old QEMU, v1.1.0 and v1.1.2, and they still worked as I
> expected. So I guess the generic block layer was changed a bit and the
> meaning of 'flags' became different from the old code, which did indeed
> allow block drivers to interpret the 'flags' passed from
> bdrv_file_open().
>
> With the current upstream code, it seems that BDRV_O_CACHE_WB is always
> enabled, which makes 'flags' completely unusable for block drivers to
> get the indication the user gives by specifying the 'cache=' field.
>
> So is there another mechanism that block drivers can rely on to
> interpret the cache semantics?

I found the commit e1e9b0ac ('always open drivers in writeback mode').
This is really undesired for network block drivers such as sheepdog,
which implement their own cache mechanism supporting
writeback/writethrough/directio behaviour and want to interpret these
flags on their own.

Is there any means for block drivers to get the semantics of 'cache=xxx'
from users?

Thanks,
Yuan
On Sat, Jan 05, 2013 at 03:56:11PM +0800, Liu Yuan wrote:
> On 01/05/2013 01:29 PM, Liu Yuan wrote:
> > [...]
>
> I found the commit e1e9b0ac ('always open drivers in writeback mode').
> This is really undesired for network block drivers such as sheepdog,
> which implement their own cache mechanism supporting
> writeback/writethrough/directio behaviour and want to interpret these
> flags on their own.
>
> Is there any means for block drivers to get the semantics of 'cache=xxx'
> from users?

Hi Yuan,

Please explain what cache semantics the sheepdog server supports so I
can understand better what you are trying to achieve.

Stefan
On 05.01.2013 08:56, Liu Yuan wrote:
> On 01/05/2013 01:29 PM, Liu Yuan wrote:
>> [...]
>
> I found the commit e1e9b0ac ('always open drivers in writeback mode').
> This is really undesired for network block drivers such as sheepdog,
> which implement their own cache mechanism supporting
> writeback/writethrough/directio behaviour and want to interpret these
> flags on their own.
>
> Is there any means for block drivers to get the semantics of 'cache=xxx'
> from users?

No, and in theory they shouldn't have to care.

Why do you want to handle writethrough semantics in the block driver
rather than letting qemu take care of sending the right flushes?

Kevin
On 01/07/2013 08:31 PM, Stefan Hajnoczi wrote:
> On Sat, Jan 05, 2013 at 03:56:11PM +0800, Liu Yuan wrote:
>> [...]
>>
>> Is there any means for block drivers to get the semantics of
>> 'cache=xxx' from users?
>
> Hi Yuan,
> Please explain what cache semantics the sheepdog server supports so I
> can understand better what you are trying to achieve.

Hi Stefan,

Sheepdog supports writethrough/writeback/directio (bypass) for its
cache, and I want to implement the following semantics:

  cache=writeback|none|off   # enable writeback semantics for writes
  cache=writethrough         # enable writethrough semantics for writes
  cache=directsync           # disable cache completely

Thanks,
Yuan
On 01/07/2013 09:23 PM, Kevin Wolf wrote:
> No, and in theory they shouldn't have to care.
>
> Why do you want to handle writethrough semantics in the block driver
> rather than letting qemu take care of sending the right flushes?

Because Sheepdog itself provides a client-side cache implementation, and
we need a means to get the cache hint from the user when starting up the
VM. This cache supports:

  writethrough: provide a read cache for this VM, which is always
                consistent with the cluster data
  writeback:    provide a writeback cache for this VM, which only flushes
                the dirty bits when the VM sends a 'flush' request
  directio:     disable the cache completely for this VM

Old QEMU (before v1.2) allowed block drivers to get the cache flags, so
Sheepdog could interpret these flags on its own to choose the right
cache mode for each VM. Doesn't QEMU need to keep backward
compatibility?

Thanks,
Yuan
On Tue, Jan 08, 2013 at 01:42:35PM +0800, Liu Yuan wrote:
> On 01/07/2013 09:23 PM, Kevin Wolf wrote:
> > No, and in theory they shouldn't have to care.
> >
> > Why do you want to handle writethrough semantics in the block driver
> > rather than letting qemu take care of sending the right flushes?
>
> Because Sheepdog itself provides a client-side cache implementation, and
> we need a means to get the cache hint from the user when starting up the
> VM. This cache supports:
> [...]
>
> Old QEMU (before v1.2) allowed block drivers to get the cache flags, so
> Sheepdog could interpret these flags on its own to choose the right
> cache mode for each VM. Doesn't QEMU need to keep backward
> compatibility?

This sounds fine - it fits together with QEMU like this:

BDRV_O_NOCACHE means sheepdog directio: we're disabling the client-side
sheepdog cache.

Otherwise use sheepdog writeback and let QEMU block.c decide when to
flush. Never use sheepdog writethrough because it's redundant here.

Stefan
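[Editor's sketch] Stefan's proposed mapping reduces the driver's decision to a single test. As before, the numeric values and the helper name are invented for illustration:

```c
#include <assert.h>

#define BDRV_O_NOCACHE     0x0020   /* assumed bit value, as earlier */
#define SD_FLAG_CMD_DIRECT 0x01     /* made-up value: bypass sheep cache */
#define SD_FLAG_CMD_CACHE  0x02     /* made-up value: sheep writeback cache */

/* Stefan's suggestion: only BDRV_O_NOCACHE matters to the driver.
 * BDRV_O_CACHE_WB is deliberately ignored, because block.c already
 * turns guest writethrough into writeback-plus-flush, which makes a
 * native sheepdog writethrough mode redundant. */
static int sd_cache_flags_v2(int flags)
{
    return (flags & BDRV_O_NOCACHE) ? SD_FLAG_CMD_DIRECT
                                    : SD_FLAG_CMD_CACHE;
}
```

Note this mapping sends cache=none down the directio path (none sets BDRV_O_NOCACHE), which is exactly the semantics Kazutaka questioned at the start of the thread; the trade-off is the price of no longer looking at BDRV_O_CACHE_WB.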
On 01/08/2013 05:40 PM, Stefan Hajnoczi wrote:
> Otherwise use sheepdog writeback and let QEMU block.c decide when to
> flush. Never use sheepdog writethrough because it's redundant here.

I don't get it. What do you mean by 'redundant'? If we use virtio and
the sheepdog block driver, how can we specify writethrough mode for the
Sheepdog cache? Here 'writethrough' means using a pure read cache, which
doesn't need flushing at all.

Thanks,
Yuan
On 08.01.2013 10:45, Liu Yuan wrote:
> On 01/08/2013 05:40 PM, Stefan Hajnoczi wrote:
>> Otherwise use sheepdog writeback and let QEMU block.c decide when to
>> flush. Never use sheepdog writethrough because it's redundant here.
>
> I don't get it. What do you mean by 'redundant'? If we use virtio and
> the sheepdog block driver, how can we specify writethrough mode for the
> Sheepdog cache? Here 'writethrough' means using a pure read cache, which
> doesn't need flushing at all.

A writethrough cache is equivalent to a write-back cache where each
write is followed by a flush. qemu makes sure to send these flushes, so
there is no need to use Sheepdog's writethrough mode.

Kevin
On 01/08/2013 06:00 PM, Kevin Wolf wrote:
> [...]
>
> A writethrough cache is equivalent to a write-back cache where each
> write is followed by a flush. qemu makes sure to send these flushes, so
> there is no need to use Sheepdog's writethrough mode.

Implementing writethrough as writeback + flush causes considerable
overhead for a network block device like Sheepdog: a single write
request is executed as two requests: write + flush. This also explains
the write performance regression I saw: old QEMU could issue multiple
write requests in one go, but now the requests are sent one by one (even
with cache=writeback set), which makes Sheepdog write performance drop a
lot. Is it possible to issue multiple requests in one go as old QEMU
did?

It seems hard to restore the old semantics of the cache flags with the
new design of the QEMU block layer. So would you accept adding a 'flags'
field to BlockDriverState which carries the user's cache flags, to keep
backward compatibility?

Thanks,
Yuan
On 08.01.2013 11:39, Liu Yuan wrote:
> On 01/08/2013 06:00 PM, Kevin Wolf wrote:
>> [...]
>
> Implementing writethrough as writeback + flush causes considerable
> overhead for a network block device like Sheepdog: a single write
> request is executed as two requests: write + flush.

Yeah, maybe we should have some kind of a FUA flag with write requests
instead of sending a separate flush.

> This also explains the write performance regression I saw: old QEMU
> could issue multiple write requests in one go, but now the requests are
> sent one by one (even with cache=writeback set), which makes Sheepdog
> write performance drop a lot. Is it possible to issue multiple requests
> in one go as old QEMU did?

Huh? We didn't change anything in that respect, or at least nothing that
I'm aware of. qemu has only ever had a single-request bdrv_co_writev, so
if anything, that batching must have happened inside the Sheepdog code.
Do you know what makes it not batch requests any more?

> It seems hard to restore the old semantics of the cache flags with the
> new design of the QEMU block layer. So would you accept adding a 'flags'
> field to BlockDriverState which carries the user's cache flags, to keep
> backward compatibility?

No, going back to the old behaviour would break guest-toggled WCE.

Kevin
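[Editor's sketch] The cost Kevin's FUA (Force Unit Access) idea would save can be shown with a toy round-trip counter. All names here are invented; the point is only that a FUA write needs one network round trip where write + separate flush needs two:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a network block connection counting round trips. */
typedef struct {
    int round_trips;
} Conn;

static void send_write(Conn *c, bool fua)
{
    (void)fua;            /* durability piggybacks on the write itself */
    c->round_trips++;
}

static void send_flush(Conn *c)
{
    c->round_trips++;
}

/* Writethrough as qemu emulates it today: write, then a separate flush,
 * i.e. two round trips per guest write. */
static void write_then_flush(Conn *c)
{
    send_write(c, false);
    send_flush(c);
}

/* Writethrough with a FUA write flag: one round trip per guest write. */
static void write_fua(Conn *c)
{
    send_write(c, true);
}
```

For a latency-bound network store like Sheepdog, halving the round trips per write is exactly the overhead Yuan is measuring.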
On 01/08/2013 06:51 PM, Kevin Wolf wrote:
> On 08.01.2013 11:39, Liu Yuan wrote:
>> [...]
>
> Yeah, maybe we should have some kind of a FUA flag with write requests
> instead of sending a separate flush.
>
> Huh? We didn't change anything in that respect, or at least nothing that
> I'm aware of. qemu has only ever had a single-request bdrv_co_writev, so
> if anything, that batching must have happened inside the Sheepdog code.
> Do you know what makes it not batch requests any more?

QEMU v1.1.x works well with batched write requests. The Sheepdog block
driver doesn't do any batching trick as far as I know; it just sends
requests as they are fed to it. There are no noticeable changes between
v1.1.x and current master with regard to sheepdog.c.

To detail the different behaviour, as seen from the Sheepdog daemon
which receives the requests from QEMU:

  old: can receive many requests at virtually the same time and handle
       them concurrently
  now: only receives one request, handles it, replies and gets another

So I think the problem is that current QEMU will wait for the write
response before sending another request.

>> It seems hard to restore the old semantics of the cache flags with the
>> new design of the QEMU block layer. So would you accept adding a
>> 'flags' field to BlockDriverState which carries the user's cache flags,
>> to keep backward compatibility?
>
> No, going back to the old behaviour would break guest-toggled WCE.

Guest-toggled WCE only works with IDE, and it seems virtio-blk doesn't
support it, no? And I think there are huge numbers of virtio-blk users.

I didn't mean to break WCE. What I meant is to allow backward
compatibility. E.g. the Sheepdog driver could make use of this dedicated
cache flags field to implement its own cache control without affecting
other drivers at all.

Thanks,
Yuan
On 08.01.2013 12:08, Liu Yuan wrote:
> On 01/08/2013 06:51 PM, Kevin Wolf wrote:
>> [...]
>> Do you know what makes it not batch requests any more?
>
> QEMU v1.1.x works well with batched write requests. The Sheepdog block
> driver doesn't do any batching trick as far as I know; it just sends
> requests as they are fed to it. There are no noticeable changes between
> v1.1.x and current master with regard to sheepdog.c.
>
> To detail the different behaviour, as seen from the Sheepdog daemon
> which receives the requests from QEMU:
>
>   old: can receive many requests at virtually the same time and handle
>        them concurrently
>   now: only receives one request, handles it, replies and gets another
>
> So I think the problem is that current QEMU will wait for the write
> response before sending another request.

I can't see a reason why it would do that. Can you bisect this?

>>> It seems hard to restore the old semantics of the cache flags [...]
>>
>> No, going back to the old behaviour would break guest-toggled WCE.
>
> Guest-toggled WCE only works with IDE, and it seems virtio-blk doesn't
> support it, no? And I think there are huge numbers of virtio-blk users.

It works with virtio-blk and SCSI as well.

> I didn't mean to break WCE. What I meant is to allow backward
> compatibility. E.g. the Sheepdog driver could make use of this dedicated
> cache flags field to implement its own cache control without affecting
> other drivers at all.

How would you do it? With a WCE that changes during runtime, the idea of
a flag that is passed to bdrv_open() and stays valid as long as the
BlockDriverState exists doesn't match reality any more.

Kevin
On 01/08/2013 07:19 PM, Kevin Wolf wrote:
> I can't see a reason why it would do that. Can you bisect this?

Sure, bisecting it is on my schedule, but I can't promise a deadline.

>>>> It seems hard to restore the old semantics of the cache flags [...]
>>>
>>> No, going back to the old behaviour would break guest-toggled WCE.
>>
>> Guest-toggled WCE only works with IDE, and it seems virtio-blk doesn't
>> support it, no? And I think there are huge numbers of virtio-blk users.
>
> It works with virtio-blk and SCSI as well.

Okay, I see the code does indeed support WCE, but it requires the guest
kernel to support it. If the kernel doesn't support it, there is no way
to disable the write cache, no?

>> I didn't mean to break WCE. What I meant is to allow backward
>> compatibility. E.g. the Sheepdog driver could make use of this
>> dedicated cache flags field to implement its own cache control without
>> affecting other drivers at all.
>
> How would you do it? With a WCE that changes during runtime, the idea of
> a flag that is passed to bdrv_open() and stays valid as long as the
> BlockDriverState exists doesn't match reality any more.

I am not sure I get it right. What I meant is to allow Sheepdog to
control the cache via the cache flags at startup and ignore WCE at run
time. So you mean that if I choose writeback cache at startup, and then
the guest disables it via WCE, the block layer will never send flush
requests down to the Sheepdog driver?

Thanks,
Yuan
Am 08.01.2013 12:35, schrieb Liu Yuan: > On 01/08/2013 07:19 PM, Kevin Wolf wrote: >> I can't see a reason why it would do that. Can you bisect this? >> > > Sure, bisect it is on my schedule, but I can't promise a deadline. Ok, thanks. It would be good if it was before the hard freeze for 1.4. >>>>>>>> It seems it is hard to restore into old semantics of cache flags due to >>>>>>>> new design of QEMU block layer. So will you accept that adding a 'flags' >>>>>>>> into BlockDriverState which carry the 'cache flags' from user to keep >>>>>>>> backward compatibility? >>>>>> >>>>>> No, going back to the old behaviour would break guest-toggled WCE. >>>>>> >>>> >>>> Guest-toggled WCE only works with IDE and seems that virtio-blk doesn't >>>> support it, no? And I think there are huge virtio-blk users. >> It works with virtio-blk and SCSI as well. >> > > Okay, I see the code indeed support WCE but it requires Guest kernel to > support it. For the kernel doesn't support it, there is no way to > disable write cache, no? With Linux guests, it's possible for SCSI. I'm not sure about virtio-blk, but you might be right that you can't manually change it there. >>>> I didn't meant to break WCE. What I meant is to allow backward >>>> compatibility. For e.g, Sheepdog driver can make use of this dedicated >>>> cache flags to implement its own cache control and doesn't affect other >>>> drivers at all. >> How would you do it? With a WCE that changes during runtime the idea of >> a flag that is passed to bdrv_open() and stays valid as long as the >> BlockDriverState exists doesn't match reality any more. > > I am not sure if I get it right. What I meant is allow Sheepdog to > control cache on the 'cache flags' at startup and ignore WCE on the run > time. If you start with cache=writeback and then the guests switches WCE off and you ignore that, then you're causing data corruption in the case of a crash. This is not an option. 
> So you mean, if I choose witeback cache at startup, and then Guest > disable it via WCE, then block layer will never send flush request down > to Sheepdog driver? Yes, this is the problematic case. Currently the qemu block layer makes sure to send flushes, but if you disable that logic for Sheepdog, you would get broken behaviour in this case. Kevin
On 01/08/2013 08:12 PM, Kevin Wolf wrote: > Ok, thanks. It would be good if it was before the hard freeze for 1.4. > Oops, sorry. It is a false alarm. Last time I was running latest QEMU on tmpfs which service the request too fast. >>>>>>>>> >>>>>>>> It seems it is hard to restore into old semantics of cache flags due to >>>>>>>>> >>>>>>>> new design of QEMU block layer. So will you accept that adding a 'flags' >>>>>>>>> >>>>>>>> into BlockDriverState which carry the 'cache flags' from user to keep >>>>>>>>> >>>>>>>> backward compatibility? >>>>>>> >>>>>> >>>>>>> >>>>>> No, going back to the old behaviour would break guest-toggled WCE. >>>>>>> >>>>>> >>>>> >>>> >>>>> >>>> Guest-toggled WCE only works with IDE and seems that virtio-blk doesn't >>>>> >>>> support it, no? And I think there are huge virtio-blk users. >>> >> It works with virtio-blk and SCSI as well. >>> >> >> > >> > Okay, I see the code indeed support WCE but it requires Guest kernel to >> > support it. For the kernel doesn't support it, there is no way to >> > disable write cache, no? > With Linux guests, it's possible for SCSI. I'm not sure about > virtio-blk, but you might be right that you can't manually change it there. > >>>>> >>>> I didn't meant to break WCE. What I meant is to allow backward >>>>> >>>> compatibility. For e.g, Sheepdog driver can make use of this dedicated >>>>> >>>> cache flags to implement its own cache control and doesn't affect other >>>>> >>>> drivers at all. >>> >> How would you do it? With a WCE that changes during runtime the idea of >>> >> a flag that is passed to bdrv_open() and stays valid as long as the >>> >> BlockDriverState exists doesn't match reality any more. >> > >> > I am not sure if I get it right. What I meant is allow Sheepdog to >> > control cache on the 'cache flags' at startup and ignore WCE on the run >> > time. 
> If you start with cache=writeback and then the guests switches WCE off > and you ignore that, then you're causing data corruption in the case of > a crash. This is not an option. > >> > So you mean, if I choose witeback cache at startup, and then Guest >> > disable it via WCE, then block layer will never send flush request down >> > to Sheepdog driver? > Yes, this is the problematic case. Currently the qemu block layer makes > sure to send flushes, but if you disable that logic for Sheepdog, you > would get broken behaviour in this case. Maybe not, on second thought. See the following combinations: cache flags WCE toggled and resulting behavior writethrough writethrough writeback writethrough (writeback + flush as expected) 'cache flags' means the 'cache=xxx' specified at startup, and WCE is toggled on the fly in the guest (supposing the guest kernel supports WCE control). So the result is *not* broken. If we set cache=writethrough for Sheepdog, then WCE won't have any effect because 'flush' requests will be ignored by the Sheepdog driver. And with cache=writeback, WCE does disable the write cache and actually turns it into a writethrough cache by sending a flush request after every write. To conclude, letting Sheepdog interpret the cache flags won't cause trouble even with the current guest WCE feature; the difference is that if we set cache=writethrough, the guest can't change it via WCE toggling. Is this behavior acceptable? Thanks, Yuan
On 01/08/2013 09:18 PM, Liu Yuan wrote: > So the result is *not* broken. If we set cache=writethrough for > sheepdog, then WCE won't take any effect because 'flush' request will be > ignored by Sheepdog driver. The merit of this 'writethrough' is that, write request will never be interpreted as two requests: write + flush in Sheepdog driver. Thanks, Yuan
Il 08/01/2013 13:12, Kevin Wolf ha scritto: >> > Okay, I see the code indeed support WCE but it requires Guest kernel to >> > support it. For the kernel doesn't support it, there is no way to >> > disable write cache, no? > With Linux guests, it's possible for SCSI. I'm not sure about > virtio-blk, but you might be right that you can't manually change it there. New enough kernels support it. Also, if you reboot from a system that supports writeback to one that doesn't (including old-enough Linux), the new QEMU will be able to use writethrough while running on the older system. Paolo
Il 08/01/2013 14:18, Liu Yuan ha scritto: > Maybe not for a second thought. See following combination: > > cache flags WCE toggled and resulting behavior > writethrough writethrough > writeback writetrhough (writeback + flush as expected) > > cache flags means specify 'cache=xxx' at startup and WCE toggled on the > fly in the guest (supose guest kernel support WCE control) > > So the result is *not* broken. If we set cache=writethrough for > sheepdog, then WCE won't take any effect because 'flush' request will be > ignored by Sheepdog driver. And with cache=writeback, WCE does disable > the writecache and actually turns it to a writethrough cache by sending > flush req every time for write. > > To conclude, let Sheepdog interpret cache flags won't cause trouble even > with current Guest WCE feature, the different is that if we set > cache=writethrough, guest can't change it via WCE toggling. Is this > behavior acceptable? But why is it useful to force-disable writeback caching? Do you have any performance numbers? Paolo
On 01/09/2013 06:25 PM, Paolo Bonzini wrote: > But why is it useful to force-disable writeback caching? Do you have > any performance numbers? This is not related to performance. What I want is to allow users to choose the Sheepdog cache mode (writethrough/writeback/directio) via cache=xxx. Obviously writeback mode shows much better write performance than the writethrough cache in Sheepdog. NOTE: the writethrough cache in Sheepdog is a read-only cache which is always consistent with the cluster data and thus needs no flush at all. This is to boost read performance for read-intensive guests over Sheepdog images. Thanks, Yuan
Il 09/01/2013 11:36, Liu Yuan ha scritto: >> But why is it useful to force-disable writeback caching? Do you have >> > any performance numbers? > This is not related to performance. What I want is to allow users to > choose Sheepdog cache mode (writethrough/writeback/directio) from cache=xxx. > > Obviously writeback mode show much better write performance over > writethrough cache in Sheepdog. NOTE: writethrough cache in Sheepdog is > a readonly cache which is always consistent with cluster data thus need > no flush at all. This is to boost read performance for read intensive > Guest over Sheepdog iamges. Ok, so the questions are (compared to older QEMU's writethrough mode): 1) how slower is QEMU's emulated-writethrough mode for writes, due to the extra requests? 2) how slower is QEMU's writeback mode for reads, due to the different structure of the cache? Paolo
On 01/09/2013 06:40 PM, Paolo Bonzini wrote: > 1) how slower is QEMU's emulated-writethrough mode for writes, due to > the extra requests? > I'll collect some numbers on it. > 2) how slower is QEMU's writeback mode for reads, due to the different > structure of the cache? Sorry, I don't get your question. I didn't say QEMU's writeback mode is slow for reads, did I? Thanks, Yuan
On 01/09/2013 06:46 PM, Liu Yuan wrote:
> I'll collect some numbers on it.
Well, when I use hdparm -W 0 /dev/vdx to choose the writethrough cache
in the Guest (virtio disk), it says:
test@vm1:~$ sudo hdparm -W 0 /dev/vdb
/dev/vdb:
setting drive write-caching to 0 (off)
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device
HDIO_DRIVE_CMD(setcache) failed: Inappropriate ioctl for device
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Do I need extra flags to enable WCE?
Both Guest and Host kernels are 3.7, and I start QEMU with the following command:
$ qemu-system-x86_64 --enable-kvm -drive
file=~/images/test1,if=virtio,cache=writeback -smp 2 -cpu host -m 1024
-drive file=sheepdog:test,if=virtio,cache=writeback
Thanks,
Yuan
Am 09.01.2013 11:36, schrieb Liu Yuan: > On 01/09/2013 06:25 PM, Paolo Bonzini wrote: >> But why is it useful to force-disable writeback caching? Do you have >> any performance numbers? > > This is not related to performance. What I want is to allow users to > choose Sheepdog cache mode (writethrough/writeback/directio) from cache=xxx. The question was which effects these modes have, and especially what the user-visible differences between writeback+flush and writethrough cache are. > Obviously writeback mode show much better write performance over > writethrough cache in Sheepdog. NOTE: writethrough cache in Sheepdog is > a readonly cache which is always consistent with cluster data thus need > no flush at all. This is to boost read performance for read intensive > Guest over Sheepdog iamges. This is the definition of a writethrough cache. Kevin
Il 09/01/2013 11:58, Liu Yuan ha scritto: > On 01/09/2013 06:46 PM, Liu Yuan wrote: >> I'll collect some numbers on it. > > Well, when I use hdparm -W 0 /dev/vdx to choose the writethrough cache > in the Guest (virtio disk), it says: > > test@vm1:~$ sudo hdparm -W 0 /dev/vdb > > /dev/vdb: > setting drive write-caching to 0 (off) > HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device > HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device > HDIO_DRIVE_CMD(setcache) failed: Inappropriate ioctl for device > HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device > HDIO_DRIVE_CMD(flushcache) failed: Inappropriate ioctl for device > HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device > > Do I need extra flags to enable WCE? "hdparm" is specific for IDE devices. sysfs is supported by SCSI (including IDE with libata, and virtio-scsi) and virtio-blk. You can use this for SCSI: echo 'write back' > /sys/block/sda/device/scsi_disk/*/cache_type and a similar command for virtio-blk. Paolo > Both Guest and Host kernel is 3.7 and I start QEMU by following command: > > $ qemu-system-x86_64 --enable-kvm -drive > file=~/images/test1,if=virtio,cache=writeback -smp 2 -cpu host -m 1024 > -drive file=sheepdog:test,if=virtio,cache=writeback > > Thanks, > Yuan > >
On 01/09/2013 06:46 PM, Liu Yuan wrote: >> 1) how slower is QEMU's emulated-writethrough mode for writes, due to >> > the extra requests? >> > > I'll collect some numbers on it. > Okay, I got some numbers. I run three sheep daemons on the same host to emulate a 3 node cluster; the Sheepdog image has 3 copies, and I put the Sheepdog client cache and Sheepdog backend storage on a tmpfs. Guest and host are both Linux 3.7. I start QEMU with the following command: $ qemu-system-x86_64 --enable-kvm -drive file=~/images/test1,if=virtio,cache=writeback -smp 2 -cpu host -m 1024 -drive file=sheepdog:test,if=virtio,cache=writeback I run 'dd if=/dev/urandom of=/dev/vdb bs=1M count=100 oflag=direct' 5 times and get the average number: emulated (write + flush) old impl (single write) 13.3 M/s 13.7 M/s boost percentage: (13.7 - 13.3)/13.3 = 3%. The boost isn't big here, but if we run QEMU and the sheep daemons on separate boxes, we'd expect a bigger boost because the overhead of the extra 'flush' request increases. Besides performance, I think backward compatibility is more important: 1 if we run an old kernel host (quite possible for a long-running server) which doesn't support WCE, then we will never have a chance to choose a writethrough cache for the guest OS against new QEMU (most users tend to update user space tools to exclude bugs) 2 The upper layer software which relies on 'cache=xxx' to choose the cache mode will fail in its assumptions against new QEMU. And my proposal (adding another field to BlockDriverState to allow a self-interpreted cache flag) can work well with the current block layer: Sheepdog driver behavior: cache flags WCE toggled and resulting behavior writethrough writethrough writeback writethrough (writeback + flush) We can see that the writeback behavior for guest WCE toggling is the same as expected. The difference is that if we set cache=writethrough, the guest can't change it via WCE toggling. Thanks, Yuan
On 01/09/2013 08:07 PM, Liu Yuan wrote: > and my proposal (adding another field to BlockDriverState to allow > self-interpretation cache flag) can work well with current block layer: > > Sheepdog driver behavior: > cache flags WCE toggled and resulting behavior > writethrough writethrough > writeback writetrhough (writeback + flush) > > We can see that the writeback behavior for Guest-WCE toggling is the > same as expected. The difference is that if we set cache=writethrough, > guest can't change it via WCE toggling. I am wondering if the above scheme can be applied to the block layer as a whole, so that 'cache=xxx' acts as a stronger control over WCE toggling. Then we don't need the extra field. Thanks, Yuan
Am 09.01.2013 13:07, schrieb Liu Yuan: > Besides performance, I think backward compatibility is more important: > 1 if we run a old kernel host (quite possible for a long running > server) which doesn't support WCE, then we will never have a chance to > choose writethrough cache for guest OS against new QEMU (most users tend > to update user space tools to exclude bugs) How does the host kernel even play a role in this context? > 2 The upper layer software which relies on the 'cache=xxx' to choose > cache mode will fail its assumption against new QEMU. Which assumptions do you mean? As far as I can say the behaviour hasn't changed, except possibly for the performance. > We can see that the writeback behavior for Guest-WCE toggling is the > same as expected. The difference is that if we set cache=writethrough, > guest can't change it via WCE toggling. Then you need to make sure to communicate this to the guest. I'm not convinced that adding extra logic to each device for this is a good idea. Kevin
On 01/09/2013 08:42 PM, Kevin Wolf wrote: > Am 09.01.2013 13:07, schrieb Liu Yuan: >> Besides performance, I think backward compatibility is more important: >> 1 if we run a old kernel host (quite possible for a long running >> server) which doesn't support WCE, then we will never have a chance to >> choose writethrough cache for guest OS against new QEMU (most users tend >> to update user space tools to exclude bugs) > > How does the host kernel even play a role in this context? > Oops, my mind slipped there. Okay, a guest OS which doesn't support WCE can't change the cache type, as we discussed. >> 2 The upper layer software which relies on the 'cache=xxx' to choose >> cache mode will fail its assumption against new QEMU. > > Which assumptions do you mean? As far as I can say the behaviour hasn't > changed, except possibly for the performance. > When users set 'cache=writethrough' to export only a writethrough cache to the guest, with new QEMU they will actually get a writeback cache by default. >> We can see that the writeback behavior for Guest-WCE toggling is the >> same as expected. The difference is that if we set cache=writethrough, >> guest can't change it via WCE toggling. > > Then you need to make sure to communicate this to the guest. I'm not > convinced that adding extra logic to each device for this is a good idea. > We don't need to communicate it to the guest. I think 'cache=xxx' means what kind of cache the users *expect* to export to the guest OS. So if cache=writethrough is set, the guest OS shouldn't be able to turn it into a writeback cache magically. It is as if I bought a disk with a 'writethrough' cache built in: I wouldn't expect it to turn out to be a disk with a writeback cache under the hood, which could possibly lose data when a power outage happens. So guest WCE only works when users provide us a writeback-capable disk. Thanks, Yuan
Il 09/01/2013 14:04, Liu Yuan ha scritto: > > > 2 The upper layer software which relies on the 'cache=xxx' to choose > > > cache mode will fail its assumption against new QEMU. > > > > Which assumptions do you mean? As far as I can say the behaviour hasn't > > changed, except possibly for the performance. > > When users set 'cache=writethrough' to export only a writethrough cache > to Guest, but with new QEMU, it will actually get a writeback cache as > default. They get a writeback cache implementation-wise, but they get a writethrough cache safety-wise. How the cache is implemented doesn't matter, as long as it "looks like" a writethrough cache. In fact, consider a local disk that doesn't support FUA. In old QEMU, images used to be opened with O_DSYNC and that splits each write into WRITE+FLUSH, just like new QEMU. All that changes is _where_ the flushes are created. Old QEMU changes it in the kernel, new QEMU changes it in userspace. > We don't need to communicate to the guest. I think 'cache=xxx' means > what kind of cache the users *expect* to export to Guest OS. So if > cache=writethrough set, Guest OS couldn't turn it to writeback cache > magically. This is like I bought a disk with 'writethrough' cache > built-in, I didn't expect that it turned to be a disk with writeback > cache under the hood which could possible lose data when power outage > happened. It's not by magic. It's by explicitly requesting the disk to do this. Perhaps it's a bug that the cache mode is not reset when the machine is reset. I haven't checked that, but it would be a valid complaint. Paolo > > So Guest-WCE only works when users provide us a writeback capable disk.
On 01/09/2013 11:10 PM, Paolo Bonzini wrote: > Il 09/01/2013 14:04, Liu Yuan ha scritto: >>>> 2 The upper layer software which relies on the 'cache=xxx' to choose >>>> cache mode will fail its assumption against new QEMU. >>> >>> Which assumptions do you mean? As far as I can say the behaviour hasn't >>> changed, except possibly for the performance. >> >> When users set 'cache=writethrough' to export only a writethrough cache >> to Guest, but with new QEMU, it will actually get a writeback cache as >> default. > > They get a writeback cache implementation-wise, but they get a > writethrough cache safety-wise. How the cache is implemented doesn't > matter, as long as it "looks like" a writethrough cache. > > In fact, consider a local disk that doesn't support FUA. In old QEMU, > images used to be opened with O_DSYNC and that splits each write into > WRITE+FLUSH, just like new QEMU. All that changes is _where_ the > flushes are created. Old QEMU changes it in the kernel, new QEMU > changes it in userspace. > >> We don't need to communicate to the guest. I think 'cache=xxx' means >> what kind of cache the users *expect* to export to Guest OS. So if >> cache=writethrough set, Guest OS couldn't turn it to writeback cache >> magically. This is like I bought a disk with 'writethrough' cache >> built-in, I didn't expect that it turned to be a disk with writeback >> cache under the hood which could possible lose data when power outage >> happened. > > It's not by magic. It's by explicitly requesting the disk to do this. > > Perhaps it's a bug that the cache mode is not reset when the machine is > reset. I haven't checked that, but it would be a valid complaint. > Ah I didn't get the current implementation right. I tried the 3.7 kernel and it works as expected (cache=writethrough result in a 'writethrough' cache in the guest). 
It looks fine to me to emulate writethrough as writeback + flush, since the performance drop isn't big, though Sheepdog itself supports a true writethrough cache (no flush). I am going to send v2 of the directio patch for the sheepdog driver. Thanks, Yuan
Kevin Wolf wrote: > Am 08.01.2013 11:39, schrieb Liu Yuan: > > On 01/08/2013 06:00 PM, Kevin Wolf wrote: > >> Am 08.01.2013 10:45, schrieb Liu Yuan: > >>> On 01/08/2013 05:40 PM, Stefan Hajnoczi wrote: > >>>> Otherwise use sheepdog writeback and let QEMU block.c decide when to > >>>> flush. Never use sheepdog writethrough because it's redundant here. > >>> > >>> I don't get it. What do you mean by 'redundant'? If we use virtio & > >>> sheepdog block driver, how can we specify writethrough mode for Sheepdog > >>> cache? Here 'writethrough' means use a pure read cache, which doesn't > >>> need flush at all. > >> > >> A writethrough cache is equivalent to a write-back cache where each > >> write is followed by a flush. qemu makes sure to send these flushes, so > >> there is no need use Sheepdog's writethrough mode. > > > > Implement writethrough as writeback + flush will cause considerable > > overhead for network block device like Sheepdog: a single write request > > will be executed as two requests: write + flush > > Yeah, maybe we should have some kind of a FUA flag with write requests > instead of sending a separate flush. Note that write+FUA has different semantics than write+flush, at least with regular disks. write+FUA commits just what was written, while write+flush commits everything that was written before. -- Jamie
Am 10.01.2013 16:12, schrieb Jamie Lokier: > Kevin Wolf wrote: >> Am 08.01.2013 11:39, schrieb Liu Yuan: >>> On 01/08/2013 06:00 PM, Kevin Wolf wrote: >>>> Am 08.01.2013 10:45, schrieb Liu Yuan: >>>>> On 01/08/2013 05:40 PM, Stefan Hajnoczi wrote: >>>>>> Otherwise use sheepdog writeback and let QEMU block.c decide when to >>>>>> flush. Never use sheepdog writethrough because it's redundant here. >>>>> >>>>> I don't get it. What do you mean by 'redundant'? If we use virtio & >>>>> sheepdog block driver, how can we specify writethrough mode for Sheepdog >>>>> cache? Here 'writethrough' means use a pure read cache, which doesn't >>>>> need flush at all. >>>> >>>> A writethrough cache is equivalent to a write-back cache where each >>>> write is followed by a flush. qemu makes sure to send these flushes, so >>>> there is no need use Sheepdog's writethrough mode. >>> >>> Implement writethrough as writeback + flush will cause considerable >>> overhead for network block device like Sheepdog: a single write request >>> will be executed as two requests: write + flush >> >> Yeah, maybe we should have some kind of a FUA flag with write requests >> instead of sending a separate flush. > > Note that write+FUA has different semantics than write+flush, at least > with regular disks. > > write+FUA commits just what was written, while write+flush commits > everything that was written before. True. However, when you use it for implementing a writethrough mode, i.e. every single write has the FUA flag set, then it ought to be the same. One thing to take care of might be doing one explicit flush when switching from writeback to writethrough. Kevin
Paolo Bonzini wrote: > Il 09/01/2013 14:04, Liu Yuan ha scritto: > > > > 2 The upper layer software which relies on the 'cache=xxx' to choose > > > > cache mode will fail its assumption against new QEMU. > > > > > > Which assumptions do you mean? As far as I can say the behaviour hasn't > > > changed, except possibly for the performance. > > > > When users set 'cache=writethrough' to export only a writethrough cache > > to Guest, but with new QEMU, it will actually get a writeback cache as > > default. > > They get a writeback cache implementation-wise, but they get a > writethrough cache safety-wise. How the cache is implemented doesn't > matter, as long as it "looks like" a writethrough cache. > > In fact, consider a local disk that doesn't support FUA. In old QEMU, > images used to be opened with O_DSYNC and that splits each write into > WRITE+FLUSH, just like new QEMU. All that changes is _where_ the > flushes are created. Old QEMU changes it in the kernel, new QEMU > changes it in userspace. > > > We don't need to communicate to the guest. I think 'cache=xxx' means > > what kind of cache the users *expect* to export to Guest OS. So if > > cache=writethrough set, Guest OS couldn't turn it to writeback cache > > magically. This is like I bought a disk with 'writethrough' cache > > built-in, I didn't expect that it turned to be a disk with writeback > > cache under the hood which could possible lose data when power outage > > happened. > > It's not by magic. It's by explicitly requesting the disk to do this. > > Perhaps it's a bug that the cache mode is not reset when the machine is > reset. I haven't checked that, but it would be a valid complaint. The question is, is cache=writeback/cache=writethrough an initial setting of guest-visible WCE that the guest is allowed to change, or is cache=writethrough a way of saying "don't have a write cache" (which may or may not be reflected in the guest-visible disk id). 
I couldn't tell from the QEMU documentation which is intended. It would be a bit silly if it meant different things for different backend storage. I have seen (obscure) guest code which toggled WCE to simulate FUA, and there is plenty of advice out there saying to set WCE=0 for certain kinds of databases because of its presumed crash safety. Even very ancient guests on Linux and Windows can set WCE=0 with IDE and SCSI. So from a guest point of view, I think the guest setting WCE=0 should mean exactly the same as FUA on every write, or a flush after every write, until the guest sets WCE=1. -- Jamie
Il 10/01/2013 16:25, Jamie Lokier ha scritto: >> > Perhaps it's a bug that the cache mode is not reset when the machine is >> > reset. I haven't checked that, but it would be a valid complaint. > The question is, is cache=writeback/cache=writethrough an initial > setting of guest-visible WCE that the guest is allowed to change, or > is cache=writeththrough a way of saying "don't have a write cache" > (which may or may not be reflected in the guest-visible disk id). It used to be the latter (with reflection in the disk data), but now it is the former. > I couldn't tell from QEMU documentation which is intended. It would > be a bit silly if it means different things for different backend > storage. It means the same thing for IDE, SCSI and virtio-blk. Other backends, such as SD, do not even have flush, and are really slow with cache=writethrough because they write one sector at a time. For this reason they cannot really be used in a "safe" manner. > I have seen (obscure) guest code which toggled WCE to simulate FUA, That's quite useless, since WCE=1->WCE=0 is documented to cause a flush (and it does). Might as well send a real flush. > and there is plenty of advice out there saying to set WCE=0 for > certain kinds of databases because of its presumed crash safety. Even > very ancient guests on Linux and Windows can change WCE=0 with IDE and > SCSI. This is supported in QEMU. > So from a guest point of view, I think guest setting WCE=0 should mean > exactly the same as FUA every write, or flush after every write, until > guest setting WCE=1. Yes, that's indeed how it is implemented. Paolo
Paolo Bonzini wrote: > Il 10/01/2013 16:25, Jamie Lokier ha scritto: > >> > Perhaps it's a bug that the cache mode is not reset when the machine is > >> > reset. I haven't checked that, but it would be a valid complaint. > > The question is, is cache=writeback/cache=writethrough an initial > > setting of guest-visible WCE that the guest is allowed to change, or > > is cache=writeththrough a way of saying "don't have a write cache" > > (which may or may not be reflected in the guest-visible disk id). > > It used to be the latter (with reflection in the disk data), but now it > is the former. Interesting. It could be worth a note in the manual. > > I couldn't tell from QEMU documentation which is intended. It would > > be a bit silly if it means different things for different backend > > storage. > > It means the same thing for IDE, SCSI and virtio-blk. Other backends, > such as SD, do not even have flush, and are really slow with > cache=writethrough because they write one sector at a time. For this > reason they cannot really be used in a "safe" manner. > > > I have seen (obscure) guest code which toggled WCE to simulate FUA, > > That's quite useless, since WCE=1->WCE=0 is documented to cause a flush > (and it does). Might as well send a real flush. It was because the ATA spec seemed to permit the combination of WCE with no flush command supported. So WCE=1->WCE=0 was used to flush, and WCE was kept at 0 for the subsequent logging write-FUA(s), until a non-FUA write was wanted. -- Jamie
At Thu, 10 Jan 2013 13:38:16 +0800,
Liu Yuan wrote:
> 
> On 01/09/2013 11:10 PM, Paolo Bonzini wrote:
> > On 09/01/2013 14:04, Liu Yuan wrote:
> >>>> 2 The upper layer software which relies on the 'cache=xxx' to choose
> >>>> cache mode will fail its assumption against new QEMU.
> >>> 
> >>> Which assumptions do you mean? As far as I can say the behaviour hasn't
> >>> changed, except possibly for the performance.
> >> 
> >> When users set 'cache=writethrough' to export only a writethrough cache
> >> to Guest, but with new QEMU, it will actually get a writeback cache as
> >> default.
> > 
> > They get a writeback cache implementation-wise, but they get a
> > writethrough cache safety-wise.  How the cache is implemented doesn't
> > matter, as long as it "looks like" a writethrough cache.
> > 
> > In fact, consider a local disk that doesn't support FUA.  In old QEMU,
> > images used to be opened with O_DSYNC and that splits each write into
> > WRITE+FLUSH, just like new QEMU.  All that changes is _where_ the
> > flushes are created.  Old QEMU changes it in the kernel, new QEMU
> > changes it in userspace.
> > 
> >> We don't need to communicate to the guest. I think 'cache=xxx' means
> >> what kind of cache the users *expect* to export to Guest OS. So if
> >> cache=writethrough set, Guest OS couldn't turn it to writeback cache
> >> magically. This is like I bought a disk with 'writethrough' cache
> >> built-in, I didn't expect that it turned to be a disk with writeback
> >> cache under the hood which could possibly lose data when power outage
> >> happened.
> > 
> > It's not by magic.  It's by explicitly requesting the disk to do this.
> > 
> > Perhaps it's a bug that the cache mode is not reset when the machine is
> > reset.  I haven't checked that, but it would be a valid complaint.
> 
> Ah I didn't get the current implementation right. I tried the 3.7 kernel
> and it works as expected (cache=writethrough results in a 'writethrough'
> cache in the guest).
> 
> It looks fine to me to emulate writethrough as writeback + flush, since
> the performance drop isn't big, though sheepdog itself supports a true
> writethrough cache (no flush).

Can we drop the SD_FLAG_CMD_CACHE flag from sheepdog write requests
when bdrv_enable_write_cache() is false?  Then the requests behave
like FUA writes and we can safely omit succeeding flush requests.

Thanks,

Kazutaka
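Kazutaka's suggestion can be sketched in a few lines: when the guest-visible write cache is off, a write request simply omits SD_FLAG_CMD_CACHE and so behaves like a FUA write, making a follow-up flush unnecessary. The helper names below (`sd_write_flags`, `sd_need_flush`) are illustrative, not part of the actual driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in block/sheepdog.c */
#define SD_FLAG_CMD_WRITE  0x01
#define SD_FLAG_CMD_CACHE  0x04  /* writeback mode for cache */

/*
 * Hypothetical helper: compute the flags for a sheepdog write request.
 * With the write cache disabled (writethrough), SD_FLAG_CMD_CACHE is
 * dropped so the request acts like a FUA write.
 */
static uint32_t sd_write_flags(bool enable_write_cache)
{
    uint32_t flags = SD_FLAG_CMD_WRITE;

    if (enable_write_cache) {
        flags |= SD_FLAG_CMD_CACHE;
    }
    return flags;
}

/* A separate flush is only required if cached (non-FUA) writes were issued. */
static bool sd_need_flush(bool enable_write_cache)
{
    return enable_write_cache;
}
```

This only models the flag arithmetic; the real change would have to query bdrv_enable_write_cache() per request, which is what the thread goes on to discuss.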
On 01/11/2013 03:52 PM, MORITA Kazutaka wrote:
> At Thu, 10 Jan 2013 13:38:16 +0800,
> Liu Yuan wrote:
>>
>> On 01/09/2013 11:10 PM, Paolo Bonzini wrote:
>>> On 09/01/2013 14:04, Liu Yuan wrote:
>>>>>> 2 The upper layer software which relies on the 'cache=xxx' to choose
>>>>>> cache mode will fail its assumption against new QEMU.
>>>>>
>>>>> Which assumptions do you mean? As far as I can say the behaviour hasn't
>>>>> changed, except possibly for the performance.
>>>>
>>>> When users set 'cache=writethrough' to export only a writethrough cache
>>>> to Guest, but with new QEMU, it will actually get a writeback cache as
>>>> default.
>>>
>>> They get a writeback cache implementation-wise, but they get a
>>> writethrough cache safety-wise.  How the cache is implemented doesn't
>>> matter, as long as it "looks like" a writethrough cache.
>>
>>> In fact, consider a local disk that doesn't support FUA.  In old QEMU,
>>> images used to be opened with O_DSYNC and that splits each write into
>>> WRITE+FLUSH, just like new QEMU.  All that changes is _where_ the
>>> flushes are created.  Old QEMU changes it in the kernel, new QEMU
>>> changes it in userspace.
>>>
>>>> We don't need to communicate to the guest. I think 'cache=xxx' means
>>>> what kind of cache the users *expect* to export to Guest OS. So if
>>>> cache=writethrough set, Guest OS couldn't turn it to writeback cache
>>>> magically. This is like I bought a disk with 'writethrough' cache
>>>> built-in, I didn't expect that it turned to be a disk with writeback
>>>> cache under the hood which could possibly lose data when power outage
>>>> happened.
>>>
>>> It's not by magic.  It's by explicitly requesting the disk to do this.
>>>
>>> Perhaps it's a bug that the cache mode is not reset when the machine is
>>> reset.  I haven't checked that, but it would be a valid complaint.
>>
>> Ah I didn't get the current implementation right. I tried the 3.7 kernel
>> and it works as expected (cache=writethrough results in a 'writethrough'
>> cache in the guest).
>>
>> It looks fine to me to emulate writethrough as writeback + flush, since
>> the performance drop isn't big, though sheepdog itself supports a true
>> writethrough cache (no flush).
>
> Can we drop the SD_FLAG_CMD_CACHE flag from sheepdog write requests
> when bdrv_enable_write_cache() is false?  Then the requests behave
> like FUA writes and we can safely omit succeeding flush requests.
>

Let's first implement an emulated writethrough cache and then look for a
method by which we can play tricks to implement a true writethrough
cache. This would bring complexity; for example, before we drop
SD_FLAG_CMD_CACHE, we need to flush beforehand. What's more,
bdrv_enable_write_cache() is always true for now; I guess this needs a
generic change in the block layer to get the writethrough/writeback
hints.

Thanks,
Yuan
> bdrv_enable_write_cache() is always true for now, I guess this needs a
> generic change in the block layer to get the writethrough/writeback
> hints.

That's not correct.  Try hdparm with an IDE disk, or sysfs with a SCSI
disk (recent kernels will also support sysfs with an IDE or virtio-blk
disk) and you'll see that it changes.

Paolo
On 01/11/2013 05:00 PM, Paolo Bonzini wrote:
> That's not correct.  Try hdparm with an IDE disk, or sysfs with a SCSI
> disk (recent kernels will also support sysfs with an IDE or virtio-blk
> disk) and you'll see that it changes.

Okay, at least at startup, even with cache=writethrough set, I saw
bdrv_enable_write_cache() return true from sd_open().  So you mean this
will be reset later (after sd_open())?

Thanks,
Yuan
On 11.01.2013 08:52, MORITA Kazutaka wrote:
> At Thu, 10 Jan 2013 13:38:16 +0800,
> Liu Yuan wrote:
>>
>> On 01/09/2013 11:10 PM, Paolo Bonzini wrote:
>>> On 09/01/2013 14:04, Liu Yuan wrote:
>>>>>> 2 The upper layer software which relies on the 'cache=xxx' to choose
>>>>>> cache mode will fail its assumption against new QEMU.
>>>>>
>>>>> Which assumptions do you mean? As far as I can say the behaviour hasn't
>>>>> changed, except possibly for the performance.
>>>>
>>>> When users set 'cache=writethrough' to export only a writethrough cache
>>>> to Guest, but with new QEMU, it will actually get a writeback cache as
>>>> default.
>>>
>>> They get a writeback cache implementation-wise, but they get a
>>> writethrough cache safety-wise.  How the cache is implemented doesn't
>>> matter, as long as it "looks like" a writethrough cache.
>>
>>> In fact, consider a local disk that doesn't support FUA.  In old QEMU,
>>> images used to be opened with O_DSYNC and that splits each write into
>>> WRITE+FLUSH, just like new QEMU.  All that changes is _where_ the
>>> flushes are created.  Old QEMU changes it in the kernel, new QEMU
>>> changes it in userspace.
>>>
>>>> We don't need to communicate to the guest. I think 'cache=xxx' means
>>>> what kind of cache the users *expect* to export to Guest OS. So if
>>>> cache=writethrough set, Guest OS couldn't turn it to writeback cache
>>>> magically. This is like I bought a disk with 'writethrough' cache
>>>> built-in, I didn't expect that it turned to be a disk with writeback
>>>> cache under the hood which could possibly lose data when power outage
>>>> happened.
>>>
>>> It's not by magic.  It's by explicitly requesting the disk to do this.
>>>
>>> Perhaps it's a bug that the cache mode is not reset when the machine is
>>> reset.  I haven't checked that, but it would be a valid complaint.
>>
>> Ah I didn't get the current implementation right. I tried the 3.7 kernel
>> and it works as expected (cache=writethrough results in a 'writethrough'
>> cache in the guest).
>>
>> It looks fine to me to emulate writethrough as writeback + flush, since
>> the performance drop isn't big, though sheepdog itself supports a true
>> writethrough cache (no flush).
>
> Can we drop the SD_FLAG_CMD_CACHE flag from sheepdog write requests
> when bdrv_enable_write_cache() is false?  Then the requests behave
> like FUA writes and we can safely omit succeeding flush requests.

First we would need to make sure that on a writeback -> writethrough
switch a flush happens.  But once this is implemented, I think your
suggestion would work well.

Kevin
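The ordering Kevin asks for — flush once when the guest switches the write cache from writeback to writethrough — can be shown with a toy state machine. `SDCacheModel` and both functions are hypothetical and only illustrate the required ordering, not the real block-layer API:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a driver that tracks its write-cache mode and counts
 * the flushes it has sent to the server. */
typedef struct {
    bool write_cache_enabled;
    int flushes_sent;
} SDCacheModel;

static void sd_model_flush(SDCacheModel *s)
{
    s->flushes_sent++;
}

static void sd_model_set_write_cache(SDCacheModel *s, bool enable)
{
    /* Writeback -> writethrough: flush cached data before dropping
     * the cache flag, so nothing dirty is left behind. */
    if (s->write_cache_enabled && !enable) {
        sd_model_flush(s);
    }
    s->write_cache_enabled = enable;
}
```

Switching in the other direction (writethrough -> writeback) needs no flush, since a writethrough cache never holds dirty data.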
----- Original Message -----
> From: "Liu Yuan" <namei.unix@gmail.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: "Kevin Wolf" <kwolf@redhat.com>, "Stefan Hajnoczi" <stefanha@gmail.com>, qemu-devel@nongnu.org, "MORITA Kazutaka"
> <morita.kazutaka@lab.ntt.co.jp>
> Sent: Friday, 11 January 2013 10:04:23
> Subject: Re: [Qemu-devel] [PATCH] sheepdog: implement direct write semantics
> 
> On 01/11/2013 05:00 PM, Paolo Bonzini wrote:
> > That's not correct.  Try hdparm with an IDE disk, or sysfs with a
> > SCSI disk (recent kernels will also support sysfs with an IDE or
> > virtio-blk disk) and you'll see that it changes.
> 
> Okay, at least at startup, even with cache=writethrough set, I saw
> bdrv_enable_write_cache() return true from sd_open().  So you mean
> this will be reset later (after sd_open())?

It is always true for the protocol.  It is not always true for the
format.

This is correct.  qcow2 can do some writes in writeback mode, but they
will always be followed by a flush before the write is reported to the
guest.  It is a bit faster, and does not lose any correctness.

Paolo
On 01/11/2013 05:34 PM, Paolo Bonzini wrote:
> It is always true for the protocol.  It is not always true for the
> format.

So you mean bdrv_enable_write_cache() always returns true for Sheepdog
at any time?

Thanks,
Yuan
> On 01/11/2013 05:34 PM, Paolo Bonzini wrote:
> > It is always true for the protocol.  It is not always true for the
> > format.
> 
> So you mean bdrv_enable_write_cache() always returns true for Sheepdog
> at any time?

Yes, at least now.  You could in principle have a format that calls
bdrv_set_enable_write_cache(bs->file, ...), but nobody does it yet.

Paolo
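Paolo's point can be pictured with a toy two-layer structure: the format and protocol nodes each carry their own write-cache bit, and the guest's WCE toggle only reaches the format layer unless a driver explicitly propagates it down. `BDS` and the setter below are hypothetical stand-ins, not QEMU's real types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy two-layer block node: a format BDS sitting on a protocol BDS. */
typedef struct BDS {
    bool enable_write_cache;
    struct BDS *file;   /* protocol layer underneath, if any */
} BDS;

/* Set the write-cache bit on one node, optionally propagating it to
 * bs->file the way a format driver *could* (but at the time of this
 * thread, none did). */
static void bds_set_write_cache(BDS *bs, bool wce, bool propagate)
{
    bs->enable_write_cache = wce;
    if (propagate && bs->file) {
        bs->file->enable_write_cache = wce;
    }
}
```

Without propagation, the protocol node (e.g. the sheepdog driver) keeps seeing write cache enabled even while the format above it runs writethrough — which matches what Liu Yuan observed in sd_open().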
diff --git a/block/sheepdog.c b/block/sheepdog.c
index ceabc00..134329a 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -36,7 +36,8 @@
 
 #define SD_FLAG_CMD_WRITE 0x01
 #define SD_FLAG_CMD_COW 0x02
-#define SD_FLAG_CMD_CACHE 0x04
+#define SD_FLAG_CMD_CACHE 0x04 /* Writeback mode for cache */
+#define SD_FLAG_CMD_DIRECT 0x08 /* Don't use cache */
 
 #define SD_RES_SUCCESS 0x00 /* Success */
 #define SD_RES_UNKNOWN 0x01 /* Unknown error */
@@ -293,7 +294,7 @@ typedef struct BDRVSheepdogState {
     char name[SD_MAX_VDI_LEN];
     bool is_snapshot;
-    bool cache_enabled;
+    uint32_t cache_flags;
 
     char *addr;
     char *port;
@@ -977,8 +978,8 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
         hdr.flags = SD_FLAG_CMD_WRITE | flags;
     }
 
-    if (s->cache_enabled) {
-        hdr.flags |= SD_FLAG_CMD_CACHE;
+    if (s->cache_flags) {
+        hdr.flags |= s->cache_flags;
     }
 
     hdr.oid = oid;
@@ -1023,7 +1024,7 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
 
 static int read_write_object(int fd, char *buf, uint64_t oid, int copies,
                              unsigned int datalen, uint64_t offset,
-                             bool write, bool create, bool cache)
+                             bool write, bool create, uint32_t cache_flags)
 {
     SheepdogObjReq hdr;
     SheepdogObjRsp *rsp = (SheepdogObjRsp *)&hdr;
@@ -1047,9 +1048,7 @@ static int read_write_object(int fd, char *buf, uint64_t oid, int copies,
         hdr.opcode = SD_OP_READ_OBJ;
     }
 
-    if (cache) {
-        hdr.flags |= SD_FLAG_CMD_CACHE;
-    }
+    hdr.flags |= cache_flags;
 
     hdr.oid = oid;
     hdr.data_length = datalen;
@@ -1072,18 +1071,19 @@ static int read_write_object(int fd, char *buf, uint64_t oid, int copies,
 }
 
 static int read_object(int fd, char *buf, uint64_t oid, int copies,
-                       unsigned int datalen, uint64_t offset, bool cache)
+                       unsigned int datalen, uint64_t offset,
+                       uint32_t cache_flags)
 {
     return read_write_object(fd, buf, oid, copies, datalen, offset, false,
-                             false, cache);
+                             false, cache_flags);
 }
 
 static int write_object(int fd, char *buf, uint64_t oid, int copies,
                         unsigned int datalen, uint64_t offset, bool create,
-                        bool cache)
+                        uint32_t cache_flags)
 {
     return read_write_object(fd, buf, oid, copies, datalen, offset, true,
-                             create, cache);
+                             create, cache_flags);
 }
 
 static int sd_open(BlockDriverState *bs, const char *filename, int flags)
@@ -1118,12 +1118,19 @@ static int sd_open(BlockDriverState *bs, const char *filename, int flags)
         goto out;
     }
 
-    s->cache_enabled = true;
-    s->flush_fd = connect_to_sdog(s->addr, s->port);
-    if (s->flush_fd < 0) {
-        error_report("failed to connect");
-        ret = s->flush_fd;
-        goto out;
+    if (flags & BDRV_O_NOCACHE) {
+        s->cache_flags = SD_FLAG_CMD_DIRECT;
+    } else if (flags & BDRV_O_CACHE_WB) {
+        s->cache_flags = SD_FLAG_CMD_CACHE;
+    }
+
+    if (s->cache_flags != SD_FLAG_CMD_DIRECT) {
+        s->flush_fd = connect_to_sdog(s->addr, s->port);
+        if (s->flush_fd < 0) {
+            error_report("failed to connect");
+            ret = s->flush_fd;
+            goto out;
+        }
     }
 
     if (snapid || tag[0] != '\0') {
@@ -1140,7 +1147,7 @@ static int sd_open(BlockDriverState *bs, const char *filename, int flags)
 
     buf = g_malloc(SD_INODE_SIZE);
     ret = read_object(fd, buf, vid_to_vdi_oid(vid), 0, SD_INODE_SIZE, 0,
-                      s->cache_enabled);
+                      s->cache_flags);
 
     closesocket(fd);
 
@@ -1387,7 +1394,7 @@ static void sd_close(BlockDriverState *bs)
     qemu_aio_set_fd_handler(s->fd, NULL, NULL, NULL, NULL);
     closesocket(s->fd);
-    if (s->cache_enabled) {
+    if (s->cache_flags) {
         closesocket(s->flush_fd);
     }
     g_free(s->addr);
@@ -1423,7 +1430,7 @@ static int sd_truncate(BlockDriverState *bs, int64_t offset)
     datalen = SD_INODE_SIZE - sizeof(s->inode.data_vdi_id);
     s->inode.vdi_size = offset;
     ret = write_object(fd, (char *)&s->inode, vid_to_vdi_oid(s->inode.vdi_id),
-                       s->inode.nr_copies, datalen, 0, false, s->cache_enabled);
+                       s->inode.nr_copies, datalen, 0, false, s->cache_flags);
     close(fd);
 
     if (ret < 0) {
@@ -1506,7 +1513,7 @@ static int sd_create_branch(BDRVSheepdogState *s)
     }
 
     ret = read_object(fd, buf, vid_to_vdi_oid(vid), s->inode.nr_copies,
-                      SD_INODE_SIZE, 0, s->cache_enabled);
+                      SD_INODE_SIZE, 0, s->cache_flags);
 
     closesocket(fd);
 
@@ -1707,7 +1714,7 @@ static int coroutine_fn sd_co_flush_to_disk(BlockDriverState *bs)
     int ret;
     unsigned int wlen = 0, rlen = 0;
 
-    if (!s->cache_enabled) {
+    if (s->cache_flags == SD_FLAG_CMD_DIRECT) {
         return 0;
     }
 
@@ -1723,7 +1730,7 @@ static int coroutine_fn sd_co_flush_to_disk(BlockDriverState *bs)
     if (rsp->result == SD_RES_INVALID_PARMS) {
         dprintf("disable write cache since the server doesn't support it\n");
-        s->cache_enabled = false;
+        s->cache_flags = SD_FLAG_CMD_DIRECT;
         closesocket(s->flush_fd);
         return 0;
     }
@@ -1774,7 +1781,7 @@ static int sd_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
     }
 
     ret = write_object(fd, (char *)&s->inode, vid_to_vdi_oid(s->inode.vdi_id),
-                       s->inode.nr_copies, datalen, 0, false, s->cache_enabled);
+                       s->inode.nr_copies, datalen, 0, false, s->cache_flags);
     if (ret < 0) {
         error_report("failed to write snapshot's inode.");
         goto cleanup;
@@ -1791,7 +1798,7 @@ static int sd_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
     inode = (SheepdogInode *)g_malloc(datalen);
 
     ret = read_object(fd, (char *)inode, vid_to_vdi_oid(new_vid),
-                      s->inode.nr_copies, datalen, 0, s->cache_enabled);
+                      s->inode.nr_copies, datalen, 0, s->cache_flags);
 
     if (ret < 0) {
         error_report("failed to read new inode info. %s", strerror(errno));
@@ -1845,7 +1852,7 @@ static int sd_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
     buf = g_malloc(SD_INODE_SIZE);
     ret = read_object(fd, buf, vid_to_vdi_oid(vid), s->inode.nr_copies,
-                      SD_INODE_SIZE, 0, s->cache_enabled);
+                      SD_INODE_SIZE, 0, s->cache_flags);
 
     closesocket(fd);
 
@@ -1942,7 +1949,7 @@ static int sd_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)
         /* we don't need to read entire object */
         ret = read_object(fd, (char *)&inode, vid_to_vdi_oid(vid), 0,
                           SD_INODE_SIZE - sizeof(inode.data_vdi_id), 0,
-                          s->cache_enabled);
+                          s->cache_flags);
 
         if (ret) {
             continue;
@@ -2003,11 +2010,11 @@ static int do_load_save_vmstate(BDRVSheepdogState *s, uint8_t *data,
         if (load) {
             ret = read_object(fd, (char *)data, vmstate_oid,
                               s->inode.nr_copies, data_len, offset,
-                              s->cache_enabled);
+                              s->cache_flags);
         } else {
             ret = write_object(fd, (char *)data, vmstate_oid,
                                s->inode.nr_copies, data_len, offset, create,
-                               s->cache_enabled);
+                               s->cache_flags);
         }
 
         if (ret < 0) {
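For reference, the open-flag handling the patch adds to sd_open(), plus the `== SD_FLAG_CMD_CACHE` tightening MORITA suggested in review, can be condensed into a standalone sketch. The BDRV_O_* values here are illustrative placeholders, not the real block.h constants:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values from the patch */
#define SD_FLAG_CMD_CACHE  0x04  /* writeback mode for cache */
#define SD_FLAG_CMD_DIRECT 0x08  /* bypass the object cache */

/* Stand-ins for the qemu block-layer open flags (illustrative only). */
#define BDRV_O_NOCACHE  0x20
#define BDRV_O_CACHE_WB 0x40

/* Mirror of the flag selection in sd_open():
 * cache=none/directsync -> DIRECT, cache=writeback -> CACHE,
 * cache=writethrough -> 0 (uncached writes, flush is a no-op). */
static uint32_t sd_cache_flags(int open_flags)
{
    if (open_flags & BDRV_O_NOCACHE) {
        return SD_FLAG_CMD_DIRECT;
    } else if (open_flags & BDRV_O_CACHE_WB) {
        return SD_FLAG_CMD_CACHE;
    }
    return 0;
}

/* The flush socket is only useful when cached writes are issued.  The
 * patch as posted tests `!= SD_FLAG_CMD_DIRECT`; the review suggests
 * `== SD_FLAG_CMD_CACHE`, which this sketch adopts. */
static bool sd_needs_flush_fd(uint32_t cache_flags)
{
    return cache_flags == SD_FLAG_CMD_CACHE;
}
```

With the reviewed condition, the writethrough case (cache_flags == 0) skips both the flush socket and the flush requests, matching the discussion above.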