Message ID | 20121121020034.10225.51692.stgit@blackbox.djwong.org |
---|---|
State | Superseded, archived |
Headers | show |
On Tue, Nov 20, 2012 at 06:00:34PM -0800, Darrick J. Wong wrote: > This creates a per-backing-device counter that tracks the number of users which > require pages to be held immutable during writeout. Eventually it will be used > to waive wait_for_page_writeback() if nobody requires stable pages. Why are we going down this stupid route again now? Just let the block device say it needs stable writes and let the VM deal with it. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 21, 2012 at 02:54:35AM -0500, Christoph Hellwig wrote: > On Tue, Nov 20, 2012 at 06:00:34PM -0800, Darrick J. Wong wrote: > > This creates a per-backing-device counter that tracks the number of users which > > require pages to be held immutable during writeout. Eventually it will be used > > to waive wait_for_page_writeback() if nobody requires stable pages. > > Why are we going down this stupid route again now? Just let the block > device say it needs stable writes and let the VM deal with it. Seems like the series actually does that, and this paragraph was just left over from earlier versions. Sorry for complaining too quickly, but you really should update the description. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> +static inline void bdi_require_stable_pages(struct backing_dev_info *bdi) > +{ > + bdi->capabilities |= BDI_CAP_STABLE_WRITES; > +} > + > +static inline void bdi_unrequire_stable_pages(struct backing_dev_info *bdi) > +{ > + bdi->capabilities &= ~BDI_CAP_STABLE_WRITES; > +} Any reason to provide these wrappers while other BDI_CAP_ values don't have it/ Also what protects bdi->capabilities against concurrent updates now that it gets modified at runtime? > +static inline void queue_require_stable_pages(struct request_queue *q) > +{ > + bdi_require_stable_pages(&q->backing_dev_info); > +} > + > +static inline void queue_unrequire_stable_pages(struct request_queue *q) > +{ > + bdi_unrequire_stable_pages(&q->backing_dev_info); > +} > + > +static inline int queue_requires_stable_pages(struct request_queue *q) > +{ > + return bdi_cap_stable_pages_required(&q->backing_dev_info); > +} Independent of the above I see no point in these wrappers that just provide a single dereference. > +static ssize_t stable_pages_required_store(struct device *dev, > + struct device_attribute *attr, > + const char *buf, size_t count) Can you add a rationale on why we'd want to allow users to change the value? I can't really think of any. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Ok, I'll update the description a bit. On Wed, Nov 21, 2012 at 05:56:24AM -0500, Christoph Hellwig wrote: > > +static inline void bdi_require_stable_pages(struct backing_dev_info *bdi) > > +{ > > + bdi->capabilities |= BDI_CAP_STABLE_WRITES; > > +} > > + > > +static inline void bdi_unrequire_stable_pages(struct backing_dev_info *bdi) > > +{ > > + bdi->capabilities &= ~BDI_CAP_STABLE_WRITES; > > +} > > Any reason to provide these wrappers while other BDI_CAP_ values don't > have it/ > > Also what protects bdi->capabilities against concurrent updates now that > it gets modified at runtime? Nothing seems to update ->capabilities at run time. That said, if you're really worried about concurrent updates, I can always put a spinlock around all the updates. (Or revert to the atomic_t counter, but that seemed unpopular...) I think I can drop the wrappers. > > +static inline void queue_require_stable_pages(struct request_queue *q) > > +{ > > + bdi_require_stable_pages(&q->backing_dev_info); > > +} > > + > > +static inline void queue_unrequire_stable_pages(struct request_queue *q) > > +{ > > + bdi_unrequire_stable_pages(&q->backing_dev_info); > > +} > > + > > +static inline int queue_requires_stable_pages(struct request_queue *q) > > +{ > > + return bdi_cap_stable_pages_required(&q->backing_dev_info); > > +} > > Independent of the above I see no point in these wrappers that just > provide a single dereference. > > > +static ssize_t stable_pages_required_store(struct device *dev, > > + struct device_attribute *attr, > > + const char *buf, size_t count) > > Can you add a rationale on why we'd want to allow users to change the > value? I can't really think of any. I dislike the idea that if a program is dirtying pages that are being written out, then I don't really know whether the disk will write the before or after version. If the power goes out before the inevitable second write, how do you know which version you get? Sure would be nice if I could force on stable writes if I'm feeling paranoid. --D > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 21 Nov 2012 13:52:07 -0800 "Darrick J. Wong" <darrick.wong@oracle.com> wrote: > > Can you add a rationale on why we'd want to allow users to change the > > value? I can't really think of any. > > I dislike the idea that if a program is dirtying pages that are being written > out, then I don't really know whether the disk will write the before or after > version. If the power goes out before the inevitable second write, how do you > know which version you get? Sure would be nice if I could force on stable > writes if I'm feeling paranoid. I don't think this fear is at all rational (but then you did suggest paranoia). If the power goes out, then any write that has been requested, but for which an 'fsync' hasn't completed, may - or may not - have been written. Setting this flag doesn't really change that. The filesystem should provide some degree of certainty - i.e. either old data or new data and I believe they mostly do - though ext3 with journal=writeback explicitly doesn't promise very much. Beyond that, if you want any certainty then the app must provide that by using fsync. So I'm with Christoph here: I don't think the flag should be user-settable. NeilBrown
On Wed, Nov 21, 2012 at 06:33:43PM -0800, Darrick J. Wong wrote: > How's this look to everyone? No more user-writable flag, and most of the > helper functions are gone. Now fix up the subject line as well and we might have a deal :) > (Are these emails even getting through?) At least this one is. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/Documentation/ABI/testing/sysfs-class-bdi b/Documentation/ABI/testing/sysfs-class-bdi index 5f50097..218a618 100644 --- a/Documentation/ABI/testing/sysfs-class-bdi +++ b/Documentation/ABI/testing/sysfs-class-bdi @@ -48,3 +48,10 @@ max_ratio (read-write) most of the write-back cache. For example in case of an NFS mount that is prone to get stuck, or a FUSE mount which cannot be trusted to play fair. + +stable_pages_required (read-write) + + If set, the backing device requires that all pages comprising a write + request must not be changed until writeout is complete. The system + administrator can turn this on if the hardware does not do so already. + However, once enabled, this flag cannot be disabled. diff --git a/block/blk-integrity.c b/block/blk-integrity.c index da2a818..cf2dd95 100644 --- a/block/blk-integrity.c +++ b/block/blk-integrity.c @@ -420,6 +420,8 @@ int blk_integrity_register(struct gendisk *disk, struct blk_integrity *template) } else bi->name = bi_unsupported_name; + queue_require_stable_pages(disk->queue); + return 0; } EXPORT_SYMBOL(blk_integrity_register); @@ -438,6 +440,8 @@ void blk_integrity_unregister(struct gendisk *disk) if (!disk || !disk->integrity) return; + queue_unrequire_stable_pages(disk->queue); + bi = disk->integrity; kobject_uevent(&bi->kobj, KOBJ_REMOVE); diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 2a9a9ab..af19704 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -253,6 +253,7 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio); #define BDI_CAP_EXEC_MAP 0x00000040 #define BDI_CAP_NO_ACCT_WB 0x00000080 #define BDI_CAP_SWAP_BACKED 0x00000100 +#define BDI_CAP_STABLE_WRITES 0x00000200 #define BDI_CAP_VMFLAGS \ (BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP) @@ -307,6 +308,21 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout); int pdflush_proc_obsolete(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); +static inline void bdi_require_stable_pages(struct backing_dev_info *bdi) +{ + bdi->capabilities |= BDI_CAP_STABLE_WRITES; +} + +static inline void bdi_unrequire_stable_pages(struct backing_dev_info *bdi) +{ + bdi->capabilities &= ~BDI_CAP_STABLE_WRITES; +} + +static inline bool bdi_cap_stable_pages_required(struct backing_dev_info *bdi) +{ + return bdi->capabilities & BDI_CAP_STABLE_WRITES; +} + static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi) { return !(bdi->capabilities & BDI_CAP_NO_WRITEBACK); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 1756001..a1c6e91 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -458,6 +458,21 @@ struct request_queue { (1 << QUEUE_FLAG_SAME_COMP) | \ (1 << QUEUE_FLAG_ADD_RANDOM)) +static inline void queue_require_stable_pages(struct request_queue *q) +{ + bdi_require_stable_pages(&q->backing_dev_info); +} + +static inline void queue_unrequire_stable_pages(struct request_queue *q) +{ + bdi_unrequire_stable_pages(&q->backing_dev_info); +} + +static inline int queue_requires_stable_pages(struct request_queue *q) +{ + return bdi_cap_stable_pages_required(&q->backing_dev_info); +} + static inline void queue_lockdep_assert_held(struct request_queue *q) { if (q->queue_lock) diff --git a/mm/backing-dev.c b/mm/backing-dev.c index d3ca2b3..fd6e5b5 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -221,12 +221,48 @@ static ssize_t max_ratio_store(struct device *dev, } BDI_SHOW(max_ratio, bdi->max_ratio) +static ssize_t stable_pages_required_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct backing_dev_info *bdi = dev_get_drvdata(dev); + unsigned int spw; + ssize_t ret; + + ret = kstrtouint(buf, 10, &spw); + if (ret < 0) + return ret; + + /* + * SPW could be enabled due to hw requirement, so don't + * let users disable it. + */ + if (bdi_cap_stable_pages_required(bdi) && spw == 0) + return -EINVAL; + + if (spw != 0) + bdi_require_stable_pages(bdi); + + return count; +} + +static ssize_t stable_pages_required_show(struct device *dev, + struct device_attribute *attr, + char *page) +{ + struct backing_dev_info *bdi = dev_get_drvdata(dev); + + return snprintf(page, PAGE_SIZE-1, "%d\n", + bdi_cap_stable_pages_required(bdi) ? 1 : 0); +} + #define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store) static struct device_attribute bdi_dev_attrs[] = { __ATTR_RW(read_ahead_kb), __ATTR_RW(min_ratio), __ATTR_RW(max_ratio), + __ATTR_RW(stable_pages_required), __ATTR_NULL, };
This creates a per-backing-device counter that tracks the number of users which require pages to be held immutable during writeout. Eventually it will be used to waive wait_for_page_writeback() if nobody requires stable pages. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> --- Documentation/ABI/testing/sysfs-class-bdi | 7 ++++++ block/blk-integrity.c | 4 +++ include/linux/backing-dev.h | 16 +++++++++++++ include/linux/blkdev.h | 15 ++++++++++++ mm/backing-dev.c | 36 +++++++++++++++++++++++++++++ 5 files changed, 78 insertions(+) -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html