From patchwork Sun Aug 16 13:59:32 2009
X-Patchwork-Submitter: Mark Lord
X-Patchwork-Id: 31490
X-Patchwork-Delegate: davem@davemloft.net
Message-ID: <4A8810C4.3050800@rtr.ca>
Date: Sun, 16 Aug 2009 09:59:32 -0400
From: Mark Lord
Organization: Real-Time Remedies Inc.
To: Christoph Hellwig
Cc: xfs@oss.sgi.com, linux-fsdevel@vger.kernel.org, linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org, jens.axboe@oracle.com, IDE/ATA development list
Subject: Re: [PATCH, RFC] xfs: batched discard support
References: <20090816004705.GA7347@infradead.org> <4A876255.10606@rtr.ca> <4A876CA9.20906@rtr.ca> <20090816022500.GA12392@infradead.org> <4A8802F3.6010908@rtr.ca>
In-Reply-To: <4A8802F3.6010908@rtr.ca>
X-Mailing-List: linux-ide@vger.kernel.org

Mark Lord wrote:
> Christoph Hellwig wrote:
>> On Sat, Aug 15, 2009 at 10:19:21PM -0400, Mark Lord wrote:
>>> Mark Lord wrote:
>>>> Christoph Hellwig wrote:
>>> ..
>>>>> Mark, any chance to try it?  Just create an XFS filesystem, age it a
>>>>> bit and then call the attached little trim.c program on the mountpoint
>>>>> (or any file inside the filesystem for that matter)
>>>> ..
>>>>
>>>> Looking at it now.  Thanks, Christoph!
>>> ..
>>>
>>> Fails to work on 64-bit kernel w/ 32-bit userspace (no compat ioctl).
>>> Rebuilding with 32-bit kernel now..
>>
>> The actual ioctl is compatible, just add the
>>
>> 	case XFS_IOC_TRIM:
>> 		return xfs_ioc_trim(mp, arg);
>>
>> to xfs_file_compat_ioctl().  I'll add this to the next spin of the patch.
> ..
>
> Okay, this gives me ENOSYS now --> discard/trim support is missing from
> the lower layers.
>
> What other patches do I need to make this work?
>
> The latest from Matthew's discard tree (May 2009) don't appear to be sufficient,
> even after updating them for 2.6.31-rc6.
..

Okay, I got Matthew's patches updated onto 2.6.31, and fixed the incompatibilities
between those and the XFS TRIM patch (from Christoph), plus a sector_t printk issue.

My apologies for the attachments, but I am attaching the updated Christoph patch,
as well as my hacked-up forward-port of Matthew's patches.
Not pretty, but they work.  :)

Now.. running Christoph's "xfs trim" on a mostly already-trimmed 4.6GB XFS partition
produced this on the first run:

[ 25.961891] Filesystem "sdb3": discarding sectors [0xc558-0x102328]
[ 27.814553] Filesystem "sdb3": discarding sectors [0x10ea78-0x10e688]
[ 29.771218] Filesystem "sdb3": discarding sectors [0x21d120-0x10e860]
[ 31.726444] Filesystem "sdb3": discarding sectors [0x32b9a0-0x10e860]
[ 33.679023] Filesystem "sdb3": discarding sectors [0x43f220-0x109860]
[ 35.629948] Filesystem "sdb3": discarding sectors [0x548aa0-0x10e860]
[ 37.583142] Filesystem "sdb3": discarding sectors [0x657320-0x10e860]
[ 39.531822] Filesystem "sdb3": discarding sectors [0x765ba0-0x10e860]

Slow (roughly two seconds per discard, judging by the timestamps), but presumably thorough.
Subsequent runs were equally slow.

The problem is that it still issues TRIMs to the LLD (low-level driver) one extent at a time.

Compare this with doing it all in a single TRIM command with the wiper.sh script
(filesystem unmounted):

[~] time wiper.sh /dev/sdb3 --commit

wiper.sh: Linux SATA SSD TRIM utility, version 1.9b, by Mark Lord.
Preparing for offline TRIM of free space on /dev/sdb3 (xfs non-mounted).
This operation could destroy your data.  Are you sure (y/N)? y
Syncing disks..
Beginning TRIM operations..
Trimming 168 free extents encompassing 8793136 sectors (4294 MB)
Done.

real    0m1.249s
user    0m0.110s
sys     0m0.063s

That includes the time for me to type 'y' and hit enter.
:)

Cheers

diff -u --recursive --new-file --exclude-from=linux-2.6.31-rc6//Documentation/dontdiff --exclude='*.lds' --exclude-from=linux-2.6.31-rc6//.gitignore linux-2.6.31-rc6/block/blk-barrier.c linux/block/blk-barrier.c
--- linux-2.6.31-rc6/block/blk-barrier.c	2009-08-16 09:36:36.431146680 -0400
+++ linux/block/blk-barrier.c	2009-08-16 09:20:15.164578531 -0400
@@ -425,3 +425,4 @@
 				DISCARD_BARRIER, NULL);
 }
 EXPORT_SYMBOL(blkdev_issue_discard);
+EXPORT_SYMBOL(__blkdev_issue_discard);
diff -u --recursive --new-file --exclude-from=linux-2.6.31-rc6//Documentation/dontdiff --exclude='*.lds' --exclude-from=linux-2.6.31-rc6//.gitignore linux-2.6.31-rc6/fs/xfs/linux-2.6/xfs_ioctl.c linux/fs/xfs/linux-2.6/xfs_ioctl.c
--- linux-2.6.31-rc6/fs/xfs/linux-2.6/xfs_ioctl.c	2009-08-16 09:16:39.000433070 -0400
+++ linux/fs/xfs/linux-2.6/xfs_ioctl.c	2009-08-16 09:30:38.973683042 -0400
@@ -1274,6 +1274,31 @@
 	return 0;
 }
 
+int
+xfs_ioc_trim(
+	struct xfs_mount	*mp,
+	__uint32_t		*argp)
+{
+	xfs_agnumber_t		agno;
+	int			error = 0;
+	__uint32_t		minlen;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+	if (get_user(minlen, argp))
+		return -EFAULT;
+
+	down_read(&mp->m_peraglock);
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		error = -xfs_trim_extents(mp, agno, minlen);
+		if (error)
+			break;
+	}
+	up_read(&mp->m_peraglock);
+
+	return error;
+}
+
 /*
  * Note: some of the ioctl's return positive numbers as a
  * byte count indicating success, such as readlink_by_handle.
@@ -1523,6 +1548,9 @@
 		error = xfs_errortag_clearall(mp, 1);
 		return -error;
 
+	case XFS_IOC_TRIM:
+		return xfs_ioc_trim(mp, arg);
+
 	default:
 		return -ENOTTY;
 	}
diff -u --recursive --new-file --exclude-from=linux-2.6.31-rc6//Documentation/dontdiff --exclude='*.lds' --exclude-from=linux-2.6.31-rc6//.gitignore linux-2.6.31-rc6/fs/xfs/linux-2.6/xfs_ioctl32.c linux/fs/xfs/linux-2.6/xfs_ioctl32.c
--- linux-2.6.31-rc6/fs/xfs/linux-2.6/xfs_ioctl32.c	2009-06-09 23:05:27.000000000 -0400
+++ linux/fs/xfs/linux-2.6/xfs_ioctl32.c	2009-08-16 09:31:21.005588977 -0400
@@ -539,6 +539,7 @@
 	void			__user *arg = (void __user *)p;
 	int			ioflags = 0;
 	int			error;
+	extern int xfs_ioc_trim(struct xfs_mount *mp, __uint32_t *argp);
 
 	if (filp->f_mode & FMODE_NOCMTIME)
 		ioflags |= IO_INVIS;
@@ -564,6 +565,8 @@
 	case XFS_IOC_ERROR_INJECTION:
 	case XFS_IOC_ERROR_CLEARALL:
 		return xfs_file_ioctl(filp, cmd, p);
+	case XFS_IOC_TRIM:
+		return xfs_ioc_trim(mp, arg);
 #ifndef BROKEN_X86_ALIGNMENT
 	/* These are handled fine if no alignment issues */
 	case XFS_IOC_ALLOCSP:
diff -u --recursive --new-file --exclude-from=linux-2.6.31-rc6//Documentation/dontdiff --exclude='*.lds' --exclude-from=linux-2.6.31-rc6//.gitignore linux-2.6.31-rc6/fs/xfs/xfs_alloc.h linux/fs/xfs/xfs_alloc.h
--- linux-2.6.31-rc6/fs/xfs/xfs_alloc.h	2009-06-09 23:05:27.000000000 -0400
+++ linux/fs/xfs/xfs_alloc.h	2009-08-16 09:20:15.167913313 -0400
@@ -215,4 +215,7 @@
 	xfs_fsblock_t	bno,	/* starting block number of extent */
 	xfs_extlen_t	len);	/* length of extent */
 
+int xfs_trim_extents(struct xfs_mount *mp, xfs_agnumber_t agno,
+		xfs_extlen_t minlen);
+
 #endif	/* __XFS_ALLOC_H__ */
diff -u --recursive --new-file --exclude-from=linux-2.6.31-rc6//Documentation/dontdiff --exclude='*.lds' --exclude-from=linux-2.6.31-rc6//.gitignore linux-2.6.31-rc6/fs/xfs/xfs_fs.h linux/fs/xfs/xfs_fs.h
--- linux-2.6.31-rc6/fs/xfs/xfs_fs.h	2009-08-16 09:16:39.017099926 -0400
+++ linux/fs/xfs/xfs_fs.h	2009-08-16 09:20:15.171246419 -0400
@@ -475,6 +475,7 @@
 #define XFS_IOC_ATTRMULTI_BY_HANDLE  _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
 #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 124, struct xfs_fsop_geom)
 #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, __uint32_t)
+#define XFS_IOC_TRIM		     _IOR ('X', 126, __uint32_t)
 /* XFS_IOC_GETFSUUID ---------- deprecated 140 */
 
 
--- linux-2.6.31-rc6/fs/xfs/xfs_alloc.c	2009-06-09 23:05:27.000000000 -0400
+++ linux/fs/xfs/xfs_alloc.c	2009-08-16 09:44:51.073580438 -0400
@@ -39,6 +39,9 @@
 #include "xfs_alloc.h"
 #include "xfs_error.h"
 
+int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask,
+		unsigned type, struct completion *completion);
 
 #define XFS_ABSDIFF(a,b)	(((a) <= (b)) ? ((b) - (a)) : ((a) - (b)))
 
@@ -2609,6 +2612,97 @@
 	return error;
 }
 
+STATIC int
+xfs_trim_extent(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		fbno,
+	xfs_extlen_t		flen)
+{
+	xfs_daddr_t		blkno = XFS_AGB_TO_DADDR(mp, agno, fbno);
+	sector_t		nblks = XFS_FSB_TO_BB(mp, flen);
+	int			error;
+	DECLARE_COMPLETION_ONSTACK(done);
+
+	xfs_fs_cmn_err(CE_NOTE, mp, "discarding sectors [0x%llx-0x%llx]",
+			blkno, (u64)nblks);
+
+	error = -__blkdev_issue_discard(mp->m_ddev_targp->bt_bdev,
+			blkno, nblks, GFP_NOFS, DISCARD_BARRIER, &done);
+	if (error && error != EOPNOTSUPP)
+		xfs_fs_cmn_err(CE_NOTE, mp, "discard failed, error %d", error);
+	return error;
+}
+
+/*
+ * Notify the underlying block device about our free extent map.
+ *
+ * This walks all free extents above a minimum threshold and notifies the
+ * underlying device that these blocks are unused.  That information is
+ * useful for SSDs or thinly provisioned storage in high end arrays or
+ * virtualization scenarios.
+ */
+int
+xfs_trim_extents(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_extlen_t		minlen)	/* minimum extent size to bother */
+{
+	struct xfs_btree_cur	*cur;	/* cursor for the by-block btree */
+	struct xfs_buf		*agbp;	/* AGF buffer pointer */
+	xfs_agblock_t		bno;	/* block the for next search */
+	xfs_agblock_t		fbno;	/* start block of found extent */
+	xfs_extlen_t		flen;	/* length of found extent */
+	int			error;
+	int			i;
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	if (error)
+		return error;
+
+	bno = 0;
+	for (;;) {
+		cur = xfs_allocbt_init_cursor(mp, NULL, agbp, agno,
+					      XFS_BTNUM_BNO);
+
+		error = xfs_alloc_lookup_ge(cur, bno, minlen, &i);
+		if (error)
+			goto error0;
+		if (!i) {
+			/*
+			 * No more free extents found: done.
+			 */
+			xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+			break;
+		}
+
+		error = xfs_alloc_get_rec(cur, &fbno, &flen, &i);
+		if (error)
+			goto error0;
+		XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
+
+		/*
+		 * Pass if the freespace extent isn't long enough to bother.
+		 */
+		if (flen >= minlen) {
+			error = xfs_trim_extent(mp, agno, fbno, flen);
+			if (error) {
+				xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+				break;
+			}
+		}
+
+		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+		bno = fbno + flen;
+	}
+
+out:
+	xfs_buf_relse(agbp);
+	return error;
+error0:
+	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+	goto out;
+}
 
 /*
  * AG Busy list management
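Christoph's trim.c is attached to the thread but not shown here. Below is a minimal sketch of what a caller of the new ioctl could look like, under a few assumptions. The kernel is assumed to carry this patch. The helper name xfs_trim_path() is hypothetical. XFS_IOC_TRIM is copied from the xfs_fs.h hunk, because no released header defines it. The request number encodes direction and argument size, so userspace has to use the same _IOR('X', 126, 32-bit) form as the patch. The argument is passed by pointer because xfs_ioc_trim() fetches minlen with get_user().

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Copied from the RFC patch's xfs_fs.h hunk; not in any released header. */
#ifndef XFS_IOC_TRIM
#define XFS_IOC_TRIM _IOR('X', 126, uint32_t)
#endif

/*
 * Hypothetical helper: ask the XFS filesystem containing `path` (the
 * mountpoint or any file inside it) to discard every free extent of at
 * least `minlen` filesystem blocks.
 * Returns 0 on success or a negative errno value.
 */
int xfs_trim_path(const char *path, uint32_t minlen)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	int ret = 0;
	/* The kernel side reads minlen via get_user(), so pass its address. */
	if (ioctl(fd, XFS_IOC_TRIM, &minlen) < 0)
		ret = -errno;	/* e.g. -ENOTTY on kernels without the patch */

	close(fd);
	return ret;
}
```

On a kernel built from 2.6.31 with these patches, xfs_trim_path("/mnt/xfs", 0) would walk every allocation group. A larger minlen skips short free extents. Without CAP_SYS_ADMIN the call fails with -EPERM.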
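In the dmesg output quoted earlier, the bracketed pair is easy to misread as a start-end range. However, xfs_trim_extent() prints (blkno, nblks), so the second number is a length in 512-byte basic blocks (XFS_FSB_TO_BB), not an end sector. Here is a small, hypothetical decoder for those lines. parse_discard_line() and struct discard_range are illustrative names, not part of the patch. It recovers the exclusive end sector and the byte count.

```c
#include <stdio.h>
#include <string.h>

/* One decoded "discarding sectors [start-length]" message. */
struct discard_range {
	unsigned long long start;	/* first 512-byte sector */
	unsigned long long nsect;	/* number of sectors (second field) */
	unsigned long long end;		/* one past the last sector */
	unsigned long long bytes;	/* nsect * 512 */
};

/*
 * Parse a line produced by the patch's xfs_fs_cmn_err() call:
 *   Filesystem "sdb3": discarding sectors [0x%llx-0x%llx]
 * Returns 1 on success, 0 if the line does not contain that message.
 */
int parse_discard_line(const char *line, struct discard_range *r)
{
	const char *p = strstr(line, "discarding sectors [");
	if (!p)
		return 0;
	if (sscanf(p, "discarding sectors [0x%llx-0x%llx]",
		   &r->start, &r->nsect) != 2)
		return 0;
	r->end = r->start + r->nsect;
	r->bytes = r->nsect * 512ULL;
	return 1;
}
```

For example, the second log line decodes to sectors 0x10ea78 up to (but not including) 0x21d100. That is just below the 0x21d120 start of the next extent, not a range that runs backwards.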
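The comment on xfs_trim_extents() describes a walk over one AG's by-block free-space btree. Below is a plain userspace model of that loop, built on a few assumptions:

- A sorted array stands in for the bnobt.
- An output array records what would be handed to xfs_trim_extent().
- xfs_alloc_lookup_ge() is modelled as "first record whose start block is >= bno".

trim_extents_model() and struct free_extent are hypothetical names. This is a sketch for reasoning about the control flow, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* A free-space record as kept in the by-block btree. */
struct free_extent {
	uint32_t start;	/* AG-relative start block (fbno) */
	uint32_t len;	/* length in filesystem blocks (flen), assumed > 0 */
};

/*
 * Model of the loop in xfs_trim_extents(): `recs` (sorted by start,
 * non-overlapping, non-empty) plays the bnobt, and each extent of at
 * least `minlen` blocks is copied to `out` (capacity >= n) where the
 * kernel would call xfs_trim_extent().  Returns the number of extents
 * that would be discarded.
 */
size_t trim_extents_model(const struct free_extent *recs, size_t n,
			  uint32_t minlen, struct free_extent *out)
{
	uint32_t bno = 0;	/* block for the next search */
	size_t trimmed = 0;

	for (;;) {
		/* Lookup-GE model: first record starting at or after bno. */
		size_t i = 0;
		while (i < n && recs[i].start < bno)
			i++;
		if (i == n)
			break;	/* no more free extents: done */

		/* Skip extents too short to bother with. */
		if (recs[i].len >= minlen)
			out[trimmed++] = recs[i];

		/* Resume the search just past this extent. */
		bno = recs[i].start + recs[i].len;
	}
	return trimmed;
}
```

As in the patch, each iteration redoes the lookup from bno instead of stepping a cursor. In the model this makes the scan quadratic. The real btree lookup is logarithmic, and creating a fresh cursor each time avoids holding one across the discard call.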
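The gap between the two approaches comes down to how many ranges travel in one command. The xfs trim path sends one discard per free extent, while wiper.sh hands many extents to the drive at once. As a rough illustration, here is a sketch of packing extents into TRIM range entries. It is not taken from either patch or from wiper.sh, and it assumes the ATA DATA SET MANAGEMENT (TRIM) payload layout described in the ATA Command Set (ACS) specification:

- Each 8-byte entry holds a 48-bit LBA and a 16-bit sector count.
- A 512-byte payload block holds 64 entries.

pack_trim_ranges(), trim_entry() and struct extent are hypothetical names. Byte-order handling for the little-endian wire format is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define TRIM_MAX_SECTORS	0xffffu	/* 16-bit range length per entry */
#define TRIM_ENTRIES_PER_BLOCK	64	/* 512-byte block / 8-byte entry */

struct extent {
	uint64_t start;	/* first sector */
	uint64_t nsect;	/* length in sectors */
};

/* Build one range entry (host order): LBA in bits 47:0, count in 63:48. */
static uint64_t trim_entry(uint64_t lba, uint16_t count)
{
	return (lba & 0xffffffffffffULL) | ((uint64_t)count << 48);
}

/*
 * Pack `n` extents into TRIM range entries, splitting any extent longer
 * than 65535 sectors.  At most `cap` entries are written to `out`; the
 * return value is the total number of entries needed.
 */
size_t pack_trim_ranges(const struct extent *ext, size_t n,
			uint64_t *out, size_t cap)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t lba = ext[i].start;
		uint64_t left = ext[i].nsect;

		while (left > 0) {
			uint16_t chunk = left > TRIM_MAX_SECTORS ?
					 TRIM_MAX_SECTORS : (uint16_t)left;
			if (used < cap)
				out[used] = trim_entry(lba, chunk);
			used++;
			lba += chunk;
			left -= chunk;
		}
	}
	return used;
}

/* Number of 512-byte payload blocks needed for `entries` entries. */
size_t trim_payload_blocks(size_t entries)
{
	return (entries + TRIM_ENTRIES_PER_BLOCK - 1) / TRIM_ENTRIES_PER_BLOCK;
}
```

Under these assumptions, the first extent in the log (0x102328 sectors) needs 17 entries and fits in one payload block. Many such extents could therefore share a single command instead of costing one round trip each.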