Message ID | 200812090057.33890.yur@emcraft.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Mon, Dec 8, 2008 at 2:57 PM, Yuri Tikhonov <yur@emcraft.com> wrote: > Rewrite handle_stripe_dirtying6 function to work asynchronously. > > Signed-off-by: Yuri Tikhonov <yur@emcraft.com> > Signed-off-by: Ilya Yanok <yanok@emcraft.com> > --- > drivers/md/raid5.c | 113 ++++++++++++++-------------------------------------- > 1 files changed, 30 insertions(+), 83 deletions(-) > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index e08ed4f..f0b47bd 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -2485,99 +2485,46 @@ static void handle_stripe_dirtying6(raid5_conf_t *conf, > struct stripe_head *sh, struct stripe_head_state *s, > struct r6_state *r6s, int disks) > { > - int rcw = 0, must_compute = 0, pd_idx = sh->pd_idx, i; > + int rcw = 0, pd_idx = sh->pd_idx, i; > int qd_idx = r6s->qd_idx; > + > + set_bit(STRIPE_HANDLE, &sh->state); > for (i = disks; i--; ) { > struct r5dev *dev = &sh->dev[i]; > - /* Would I have to read this buffer for reconstruct_write */ > - if (!test_bit(R5_OVERWRITE, &dev->flags) > - && i != pd_idx && i != qd_idx > - && (!test_bit(R5_LOCKED, &dev->flags) > - ) && > - !test_bit(R5_UPTODATE, &dev->flags)) { > - if (test_bit(R5_Insync, &dev->flags)) rcw++; > - else { > - pr_debug("raid6: must_compute: " > - "disk %d flags=%#lx\n", i, dev->flags); > - must_compute++; > + /* check if we haven't enough data */ > + if (!test_bit(R5_OVERWRITE, &dev->flags) && > + i != pd_idx && i != qd_idx && > + !test_bit(R5_LOCKED, &dev->flags) && > + !(test_bit(R5_UPTODATE, &dev->flags) || > + test_bit(R5_Wantcompute, &dev->flags))) { > + rcw++; > + if (!test_bit(R5_Insync, &dev->flags)) > + continue; /* it's a failed drive */ > + > + if ( > + test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) { > + pr_debug("Read_old stripe %llu " > + "block %d for Reconstruct\n", > + (unsigned long long)sh->sector, i); > + set_bit(R5_LOCKED, &dev->flags); > + set_bit(R5_Wantread, &dev->flags); > + s->locked++; > + } else { > + pr_debug("Request delayed stripe %llu " > + "block %d for Reconstruct\n", > + (unsigned long long)sh->sector, i); > + set_bit(STRIPE_DELAYED, &sh->state); > + set_bit(STRIPE_HANDLE, &sh->state); What's the reasoning behind changing the logic here, i.e. removing must_compute and such? I'd feel more comfortable seeing copy and paste where possible with cleanups separated out into their own patch. -- Dan
On Thu, Jan 15, 2009 at 2:51 PM, Dan Williams <dan.j.williams@intel.com> wrote: > On Mon, Dec 8, 2008 at 2:57 PM, Yuri Tikhonov <yur@emcraft.com> wrote: > What's the reasoning behind changing the logic here, i.e. removing > must_compute and such? I'd feel more comfortable seeing copy and > paste where possible with cleanups separated out into their own patch. > Ok, I now see why this change was made. Please make this changelog more descriptive than "Rewrite handle_stripe_dirtying6 function to work asynchronously." Regards, Dan
On Tue, Dec 9, 2008 at 5:57 AM, Yuri Tikhonov <yur@emcraft.com> wrote: > Rewrite handle_stripe_dirtying6 function to work asynchronously. On Fri, Jan 16, 2009 at 6:21 AM, Dan Williams <dan.j.williams@intel.com> wrote: > On Thu, Jan 15, 2009 at 2:51 PM, Dan Williams <dan.j.williams@intel.com> wrote: >> On Mon, Dec 8, 2008 at 2:57 PM, Yuri Tikhonov <yur@emcraft.com> wrote: >> What's the reasoning behind changing the logic here, i.e. removing >> must_compute and such? I'd feel more comfortable seeing copy and >> paste where possible with cleanups separated out into their own patch. > > Ok, I now see why this change was made. Please make this changelog > more descriptive than "Rewrite handle_stripe_dirtying6 function to > work asynchronously." Ack, could you please make the changelog more descriptive? and or add some of your benchmark results? Thanks, -- Cheng Renquan (程任全), Shenzhen, China
On Friday, January 16, 2009 you wrote: > On Thu, Jan 15, 2009 at 2:51 PM, Dan Williams <dan.j.williams@intel.com> wrote: >> On Mon, Dec 8, 2008 at 2:57 PM, Yuri Tikhonov <yur@emcraft.com> wrote: >> What's the reasoning behind changing the logic here, i.e. removing >> must_compute and such? I'd feel more comfortable seeing copy and >> paste where possible with cleanups separated out into their own patch. >> > Ok, I now see why this change was made. Please make this changelog > more descriptive than "Rewrite handle_stripe_dirtying6 function to > work asynchronously." Sure, how about the following: " md: rewrite handle_stripe_dirtying6 in asynchronous way Processing stripe dirtying in asynchronous way requires some changes to the handle_stripe_dirtying6() algorithm. In the synchronous implementation of the stripe dirtying we processed dirtying of a degraded stripe (with partially changed strip(s) located on the failed drive(s)) inside one handle_stripe_dirtying6() call: - we computed the missed strips from the old parities, and thus got the fully up-to-date stripe, then - we did reconstruction using the new data to write. In the asynchronous case of handle_stripe_dirtying6() we don't process anything right inside this function (since we under the lock), but only schedule the necessary operations with flags. Thus, if handle_stripe_dirtying6() is performed on the top of a degraded array we should schedule the reconstruction operation when the failed strips are marked (by previously called fetch_block6()) as to be computed (with the R5_Wantcompute flag), and all the other strips of the stripe are UPTODATE. The schedule_reconstruction() function will set the STRIPE_OP_POSTXOR flag [for new parity calculation], which is then handled in raid_run_ops() after the STRIPE_OP_COMPUTE_BLK one [which causes computing of the data missed]. " Regards, Yuri -- Yuri Tikhonov, Senior Software Engineer Emcraft Systems, www.emcraft.com
Hello Cheng, On Friday, January 16, 2009 you wrote: > Ack, could you please make the changelog more descriptive? > and or add some of your benchmark results? Of course. We did benchmarking using the Xdd tool like follows: # xdd -op write -kbytes $kbytes -reqsize $reqsize -dio-passes 2 –verbose -target $target_device where $kbytes = data disks * size of disk $reqsize= data disks * chunk size $target_device = /dev/md0 This way we did write of full array size, and thus achieved the maximum performance. The test cases were RAID-6 built on the top of 14 S-ATA drives connected to 2 LSI cards (7+7) inserted into the 800 MHz Katmai board (based on ppc440spe) equipped with 4GB of 800 MHz DRAM . Here are the results (Psw - write throughput with s/w RAID-6; Phw - write throughput with the h/w accelerated RAID-6): PAGE_SIZE=4KB, chunk=64/128/256 KB Psw = 71/72/74 MBps Phw = 128/136/139 MBps PAGE_SIZE=16KB, chunk=256/512/1024 KB Psw = 81/81/82 MBps Phw = 205/244/239 MBps PAGE_SIZE=64KB, chunk=1024/2048/4096 KB Psw = 84/84/85 MBps Phw = 258/253/258 MBps PAGE_SIZE=256KB, chunk=4096/8192/16384 KB Psw = 81/83/83 MBps Phw = 288/275/274 MBps Regards, Yuri -- Yuri Tikhonov, Senior Software Engineer Emcraft Systems, www.emcraft.com
On Fri, Jan 16, 2009 at 7:24 AM, Yuri Tikhonov <yur@emcraft.com> wrote: >> Ok, I now see why this change was made. Please make this changelog >> more descriptive than "Rewrite handle_stripe_dirtying6 function to >> work asynchronously." > > Sure, how about the following: > > " > > md: rewrite handle_stripe_dirtying6 in asynchronous way > > Processing stripe dirtying in asynchronous way requires some changes > to the handle_stripe_dirtying6() algorithm. > > In the synchronous implementation of the stripe dirtying we processed > dirtying of a degraded stripe (with partially changed strip(s) located > on the failed drive(s)) inside one handle_stripe_dirtying6() call: > - we computed the missed strips from the old parities, and thus got > the fully up-to-date stripe, then > - we did reconstruction using the new data to write. > > In the asynchronous case of handle_stripe_dirtying6() we don't > process anything right inside this function (since we under the lock), > but only schedule the necessary operations with flags. Thus, if > handle_stripe_dirtying6() is performed on the top of a degraded array > we should schedule the reconstruction operation when the failed strips > are marked (by previously called fetch_block6()) as to be computed > (with the R5_Wantcompute flag), and all the other strips of the stripe > are UPTODATE. The schedule_reconstruction() function will set the > STRIPE_OP_POSTXOR flag [for new parity calculation], which is then > handled in raid_run_ops() after the STRIPE_OP_COMPUTE_BLK one [which > causes computing of the data missed]. > > " Excellent! Thanks, Dan
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index e08ed4f..f0b47bd 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -2485,99 +2485,46 @@ static void handle_stripe_dirtying6(raid5_conf_t *conf, struct stripe_head *sh, struct stripe_head_state *s, struct r6_state *r6s, int disks) { - int rcw = 0, must_compute = 0, pd_idx = sh->pd_idx, i; + int rcw = 0, pd_idx = sh->pd_idx, i; int qd_idx = r6s->qd_idx; + + set_bit(STRIPE_HANDLE, &sh->state); for (i = disks; i--; ) { struct r5dev *dev = &sh->dev[i]; - /* Would I have to read this buffer for reconstruct_write */ - if (!test_bit(R5_OVERWRITE, &dev->flags) - && i != pd_idx && i != qd_idx - && (!test_bit(R5_LOCKED, &dev->flags) - ) && - !test_bit(R5_UPTODATE, &dev->flags)) { - if (test_bit(R5_Insync, &dev->flags)) rcw++; - else { - pr_debug("raid6: must_compute: " - "disk %d flags=%#lx\n", i, dev->flags); - must_compute++; + /* check if we haven't enough data */ + if (!test_bit(R5_OVERWRITE, &dev->flags) && + i != pd_idx && i != qd_idx && + !test_bit(R5_LOCKED, &dev->flags) && + !(test_bit(R5_UPTODATE, &dev->flags) || + test_bit(R5_Wantcompute, &dev->flags))) { + rcw++; + if (!test_bit(R5_Insync, &dev->flags)) + continue; /* it's a failed drive */ + + if ( + test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) { + pr_debug("Read_old stripe %llu " + "block %d for Reconstruct\n", + (unsigned long long)sh->sector, i); + set_bit(R5_LOCKED, &dev->flags); + set_bit(R5_Wantread, &dev->flags); + s->locked++; + } else { + pr_debug("Request delayed stripe %llu " + "block %d for Reconstruct\n", + (unsigned long long)sh->sector, i); + set_bit(STRIPE_DELAYED, &sh->state); + set_bit(STRIPE_HANDLE, &sh->state); } } } - pr_debug("for sector %llu, rcw=%d, must_compute=%d\n", - (unsigned long long)sh->sector, rcw, must_compute); - set_bit(STRIPE_HANDLE, &sh->state); - - if (rcw > 0) - /* want reconstruct write, but need to get some data */ - for (i = disks; i--; ) { - struct r5dev *dev = &sh->dev[i]; - if (!test_bit(R5_OVERWRITE, &dev->flags) - && !(s->failed == 0 && (i == pd_idx || i == qd_idx)) - && !test_bit(R5_LOCKED, &dev->flags) && - !test_bit(R5_UPTODATE, &dev->flags) && - test_bit(R5_Insync, &dev->flags)) { - if ( - test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) { - pr_debug("Read_old stripe %llu " - "block %d for Reconstruct\n", - (unsigned long long)sh->sector, i); - set_bit(R5_LOCKED, &dev->flags); - set_bit(R5_Wantread, &dev->flags); - s->locked++; - } else { - pr_debug("Request delayed stripe %llu " - "block %d for Reconstruct\n", - (unsigned long long)sh->sector, i); - set_bit(STRIPE_DELAYED, &sh->state); - set_bit(STRIPE_HANDLE, &sh->state); - } - } - } /* now if nothing is locked, and if we have enough data, we can start a * write request */ - if (s->locked == 0 && rcw == 0 && + if ((s->req_compute || !test_bit(STRIPE_COMPUTE_RUN, &sh->state)) && + s->locked == 0 && rcw == 0 && !test_bit(STRIPE_BIT_DELAY, &sh->state)) { - if (must_compute > 0) { - /* We have failed blocks and need to compute them */ - switch (s->failed) { - case 0: - BUG(); - case 1: - compute_block_1(sh, r6s->failed_num[0], 0); - break; - case 2: - compute_block_2(sh, r6s->failed_num[0], - r6s->failed_num[1]); - break; - default: /* This request should have been failed? */ - BUG(); - } - } - - pr_debug("Computing parity for stripe %llu\n", - (unsigned long long)sh->sector); - compute_parity6(sh, RECONSTRUCT_WRITE); - /* now every locked buffer is ready to be written */ - for (i = disks; i--; ) - if (test_bit(R5_LOCKED, &sh->dev[i].flags)) { - pr_debug("Writing stripe %llu block %d\n", - (unsigned long long)sh->sector, i); - s->locked++; - set_bit(R5_Wantwrite, &sh->dev[i].flags); - } - if (s->locked == disks) - if (!test_and_set_bit(STRIPE_FULL_WRITE, &sh->state)) - atomic_inc(&conf->pending_full_writes); - /* after a RECONSTRUCT_WRITE, the stripe MUST be in-sync */ - set_bit(STRIPE_INSYNC, &sh->state); - - if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) { - atomic_dec(&conf->preread_active_stripes); - if (atomic_read(&conf->preread_active_stripes) < - IO_THRESHOLD) - md_wakeup_thread(conf->mddev->thread); - } + schedule_reconstruction(sh, s, 1, 0); } }