
[SRU,B,F,G,H,0/1] bcache: consider the fragmentation when update the writeback rate

Message ID 20210326031022.14814-1-dongdong.tao@canonical.com
Series bcache: consider the fragmentation when update the writeback rate

Message

Dongdong Tao March 26, 2021, 3:10 a.m. UTC
From: dongdong tao <dongdong.tao@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/1900438

SRU Justification:

[Impact]

This bug in bcache affects I/O performance on all versions of the kernel.
It is particularly harmful for Ceph when it is deployed on top of bcache.

When the issue is hit, write I/O latency suddenly jumps from around 10 ms
to around 1 second and can easily stay there for hours or even days.
This is especially bad for a Ceph-on-bcache architecture, where it makes
Ceph extremely slow and the entire cloud almost unusable.

The root cause is that the dirty buckets have reached the 70 percent threshold,
which forces all writes to go directly to the backing HDD device.
That would be acceptable if there really were a lot of dirty data, but it happens
while dirty data has not even reached 10 percent, because the cache buckets are
highly fragmented (many dirty buckets each holding only a little dirty data).
What makes it worse is that the writeback rate may still be at the minimum value (8)
because writeback_percent has not been reached, so it takes ages for bcache to
reclaim enough dirty buckets and get itself out of this situation.
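
As a rough illustration (numbers assumed for the example): if 70 percent of the
buckets are dirty while dirty data is below 10 percent of the cache, then each
dirty bucket holds on average less than 10/70, about 14 percent of its capacity,
e.g. roughly 73 KiB of a 512 KiB bucket. The cache is then "full" of dirty
buckets even though it contains very little dirty data.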

[Fix]

* 71dda2a5625f31bc3410cb69c3d31376a2b66f28 “bcache: consider the fragmentation when update the writeback rate”

The current way to calculate the writeback rate only considers the dirty sectors.
This usually works fine when fragmentation is not high, but it gives an
unreasonably low writeback rate when a few dirty sectors have consumed a lot of
dirty buckets. In some cases the dirty buckets reach CUTOFF_WRITEBACK_SYNC
(where writeback caching is cut off) while the dirty data (sectors) has not even
reached the writeback_percent threshold (where writeback starts). In that
situation the writeback rate is still the minimum value (8*512 = 4KB/s), so the
slow writeback leaves all writes stuck in non-writeback mode.

We accelerate the rate in 3 stages with different levels of aggressiveness:
the first stage starts when the dirty bucket percentage rises above
BCH_WRITEBACK_FRAGMENT_THRESHOLD_LOW (50),
the second at BCH_WRITEBACK_FRAGMENT_THRESHOLD_MID (57),
and the third at BCH_WRITEBACK_FRAGMENT_THRESHOLD_HIGH (64).

By default the first stage tries to write back the amount of dirty data held
in one bucket (on average) in (1 / (dirty_buckets_percent - 50)) seconds,
the second stage in (1 / (dirty_buckets_percent - 57)) * 100 milliseconds,
and the third stage in (1 / (dirty_buckets_percent - 64)) milliseconds.
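
As a worked example (illustration only): with 1024-sector buckets and a
fragment of 6, one bucket holds on average 1024/6, roughly 170 dirty sectors, so
at dirty_buckets_percent = 55 the first stage targets one bucket's worth of
dirty data every 1/(55-50) = 0.2 seconds, i.e. about 170 * 5 = 850 sectors/s.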

The initial rate at each stage can be controlled by 3 configurable
parameters: 

writeback_rate_fp_term_{low|mid|high}

They default to 1, 10 and 1000 respectively, chosen based on testing and production data as detailed below.

A. In the low stage we are still far from the 70% threshold, so we only
   want to give the rate a little push by setting the term to 1. With a
   fragment of 6 this gives an initial rate of 170 (calculated as
   bucket_size / fragment), which is still very small but much more
   reasonable than the minimum of 8.
   For a production bcache with a non-heavy workload and a cache device
   bigger than 1 TB, it may take hours to consume 1% of the buckets,
   so it is very likely that enough dirty buckets are reclaimed in this
   stage and the next stage is never entered.

B. If the dirty bucket ratio did not turn around during the first stage,
   we enter the mid stage. The mid stage needs to be more aggressive than
   the low stage, so its initial rate is chosen to be 10 times higher,
   which means an initial rate of 1700 with a fragment of 6. This is a
   rate we usually see for a normal workload when writeback is triggered
   by writeback_percent.

C. If the dirty bucket ratio did not turn around during the low and mid
   stages, we enter the third stage, the last chance to turn things around
   and avoid the horrible cutoff writeback sync issue. Here we choose to be
   100 times more aggressive than the mid stage, which means an initial
   rate of 170000 with a fragment of 6. This is also inferred from
   production data: from one week of writeback rate data collected on a
   production bcache with quite heavy workloads (again with writeback
   triggered by writeback_percent), the highest rate area is around 100000
   to 240000, so I believe this level of aggressiveness is reasonable for
   production. It should also be mostly enough, because this hint tries to
   reclaim about 1000 buckets per second, while that heavy production
   environment consumed on average only 50 buckets per second over the week.

The option writeback_consider_fragment controls whether this feature is
on or off; it is on by default.
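
To make the staging concrete, below is a minimal userspace sketch of the
calculation (illustration only, not the kernel patch itself; the function
and variable names are assumptions made for this example). With the default
fp terms it reproduces the initial rates mentioned above (170, 1700 and
170000 for a fragment of 6):

/*
 * Illustration only -- not the kernel code. Names, types and units are
 * assumptions made for this sketch.
 */
#include <stdio.h>
#include <stdint.h>

#define FRAG_THRESHOLD_LOW  50   /* BCH_WRITEBACK_FRAGMENT_THRESHOLD_LOW  */
#define FRAG_THRESHOLD_MID  57   /* BCH_WRITEBACK_FRAGMENT_THRESHOLD_MID  */
#define FRAG_THRESHOLD_HIGH 64   /* BCH_WRITEBACK_FRAGMENT_THRESHOLD_HIGH */

/* defaults for writeback_rate_fp_term_{low,mid,high} */
static const int64_t fp_term_low = 1, fp_term_mid = 10, fp_term_high = 1000;

/*
 * Extra writeback rate (in sectors/s) suggested by fragmentation.
 * bucket_size is in sectors; fragment = (dirty buckets * bucket_size) /
 * dirty sectors, so bucket_size / fragment is the average amount of dirty
 * data per dirty bucket.
 */
static int64_t fragmentation_rate(int64_t bucket_size, int64_t fragment,
                                  int64_t dirty_buckets_percent)
{
    int64_t per_bucket_dirty = bucket_size / fragment;

    if (dirty_buckets_percent <= FRAG_THRESHOLD_LOW)
        return 0;   /* feature not engaged below the low threshold */
    if (dirty_buckets_percent <= FRAG_THRESHOLD_MID)
        return per_bucket_dirty * fp_term_low *
               (dirty_buckets_percent - FRAG_THRESHOLD_LOW);
    if (dirty_buckets_percent <= FRAG_THRESHOLD_HIGH)
        return per_bucket_dirty * fp_term_mid *
               (dirty_buckets_percent - FRAG_THRESHOLD_MID);
    return per_bucket_dirty * fp_term_high *
           (dirty_buckets_percent - FRAG_THRESHOLD_HIGH);
}

int main(void)
{
    /* 1024-sector (512 KiB) buckets with a fragment of 6 */
    for (int p = 51; p <= 65; p += 7)
        printf("dirty buckets %2d%% -> +%lld sectors/s\n", p,
               (long long)fragmentation_rate(1024, 6, p));
    return 0;
}

In the actual patch this term is gated by writeback_consider_fragment and
folded into the existing writeback rate calculation; the sketch only
illustrates the staged scaling.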


[Test Plan]

I’ve put all my testing results in the Google document below;
the testing clearly shows a significant performance improvement.
https://docs.google.com/document/d/1AmbIEa_2MhB9bqhC3rfga9tp7n9YX9PLn0jSUxscVW0/edit?usp=sharing

As a second test, we built a kernel based on bionic 4.15.0-99.100 plus the patch
and put it into a production environment: an OpenStack cloud with Ceph on bcache as the storage.
It has been running for more than one month without showing any issues.

[Where problems could occur]

The patch only updates the writeback rate, so it has no impact on data safety.
The only potential regression I can think of is that the backing device might be
a bit busier once the dirty buckets exceed BCH_WRITEBACK_FRAGMENT_THRESHOLD_LOW
(50% by default), since the writeback rate is accelerated in this highly
fragmented situation; but that is intentional, as we are trying to prevent all
writes from hitting the writeback cutoff sync threshold.

[Other Info]

This SRU covers the Ubuntu B, F, G and H releases, with one patch for each of them.

dongdong tao (1):
  bcache: consider the fragmentation when update the writeback rate

 drivers/md/bcache/bcache.h    |  4 ++++
 drivers/md/bcache/sysfs.c     | 23 +++++++++++++++++++
 drivers/md/bcache/writeback.c | 42 +++++++++++++++++++++++++++++++++++
 drivers/md/bcache/writeback.h |  5 +++++
 4 files changed, 74 insertions(+)

Comments

Stefan Bader March 26, 2021, 8:40 a.m. UTC | #1
On 26.03.21 04:10, Dongdong Tao wrote:
> [... full SRU justification quoted above, snipped ...]
>
Upstream, limited to one driver, and good test results. Formally, the Groovy
task was missing from the bug report (added now), and for the backport patches
it always helps if there is some hint about what effort was made. Either as

(backported from ...)
[<who>: <what>]

Where <who> usually is one's short signature (I would use smb) and <what> is
something like "context adjustments" or "adjust for missing foo()" or the like.
This form will be part of the commit.

Alternatively freeform explanations might be inserted below the '---' and before 
the patch itself. That part automatically gets stripped when the patch gets applied.

Acked-by: Stefan Bader <stefan.bader@canonical.com>
Tim Gardner March 26, 2021, 12:04 p.m. UTC | #2
Acked-by: Tim Gardner <tim.gardner@canonical.com>

Nice work.

On 3/25/21 9:10 PM, Dongdong Tao wrote:
> [... full SRU justification quoted above, snipped ...]
>
Guilherme G. Piccoli March 31, 2021, 7:55 p.m. UTC | #3
On Fri, Mar 26, 2021 at 12:11 AM Dongdong Tao
<dongdong.tao@canonical.com> wrote:
>
> From: dongdong tao <dongdong.tao@canonical.com>
>
> BugLink: https://bugs.launchpad.net/bugs/1900438
>
> [... awesome description ...]
>

This is an impressive analysis...very thorough, complex and the GDocs
have great information about the tests and their results, with graphs
even.
Great job Dongdong, thanks for the fix!

Acked-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Kelsey Skunberg April 2, 2021, 10:39 p.m. UTC | #4
Applied to B/F/G master-next. Thank you!

-Kelsey

On 2021-03-26 11:10:18 , Dongdong Tao wrote:
> [... full SRU justification quoted above, snipped ...]