Message ID | 20180417110450.489-1-colin.king@canonical.com |
---|---|
State | New |
Series | [SRU,BIONIC] UBUNTU: SAUCE: (noup) Update zfs to 0.7.5-1ubuntu15 (LP: #1764690) |
On Tue, Apr 17, 2018 at 12:04:50PM +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> BugLink: http://bugs.launchpad.net/bugs/1764690
>
> This sync's SRU fixes in ZFS 0.7.5-1ubuntu15 to the kernel ZFS driver.
> Fixes zfsonlinux issues fix in upstream ZFS repository:

Acked-by: Seth Forshee <seth.forshee@canonical.com>
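The "Call commit callbacks from the tail of the list" fix changes dmu_tx_do_callbacks() to drain cb_list with list_tail() instead of list_head(), so callbacks registered on one transaction now run last-registered-first (the new dmu.h comment says this lets Lustre take the fast path of its commit callback handling). A minimal sketch of that ordering, assuming registration appends at the tail; callback_t, callback_list_t, callback_register(), callback_do_callbacks() and record_call() are hypothetical stand-ins, not the SPL list_t API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for dmu_tx_callback_t and list_t. */
typedef void callback_func_t(void *data, int error);

typedef struct callback {
	callback_func_t *func;
	void *data;
	struct callback *prev;
	struct callback *next;
} callback_t;

typedef struct {
	callback_t *head;
	callback_t *tail;
} callback_list_t;

/* Registration appends at the tail (assumed to match the real code). */
static void
callback_register(callback_list_t *list, callback_func_t *func, void *data)
{
	callback_t *cb = malloc(sizeof (*cb));

	assert(cb != NULL);
	cb->func = func;
	cb->data = data;
	cb->next = NULL;
	cb->prev = list->tail;
	if (list->tail != NULL)
		list->tail->next = cb;
	else
		list->head = cb;
	list->tail = cb;
}

/* Drain from the tail: the most recently registered callback runs first. */
static void
callback_do_callbacks(callback_list_t *list, int error)
{
	callback_t *cb;

	while ((cb = list->tail) != NULL) {
		list->tail = cb->prev;
		if (list->tail != NULL)
			list->tail->next = NULL;
		else
			list->head = NULL;
		cb->func(cb->data, error);
		free(cb);
	}
}

/* Test helper: records the integer id passed as callback data. */
static int call_log[8];
static int call_count;

static void
record_call(void *data, int error)
{
	(void) error;
	if (call_count < 8)
		call_log[call_count++] = *(int *)data;
}
```

Registering ids 1, 2 and 3 and then draining the list invokes the callbacks with 3, 2, 1.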
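In dmu_send.c the patch stops pre-truncating length in dump_free() and instead records DMU_OBJECT_END whenever offset + length wraps, and do_dump() skips a hole whose zb_blkid * span would exceed DMU_OBJECT_END. The two unsigned-overflow tests can be exercised on their own. OBJECT_END is assumed to be all-ones (the patch uses DMU_OBJECT_END as a drop-in replacement for -1ULL), and clamp_free_length() and hole_free_offset() are hypothetical helpers, not ZFS functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed all-ones, like the -1ULL literal that DMU_OBJECT_END replaces. */
#define OBJECT_END (~0ULL)

/*
 * Like the new dump_free() logic: if offset + length wraps past 2^64 the
 * record becomes "free to end of object" rather than a wrapped length.
 */
static uint64_t
clamp_free_length(uint64_t offset, uint64_t length)
{
	if (offset + length < offset)	/* unsigned addition wrapped */
		return (OBJECT_END);
	return (length);
}

/*
 * Like the do_dump() guard: only use blkid * span when the product cannot
 * exceed OBJECT_END; returns false for holes that must be skipped.
 */
static bool
hole_free_offset(uint64_t blkid, uint64_t span, uint64_t *offsetp)
{
	if (blkid != 0 && span > OBJECT_END / blkid)
		return (false);
	*offsetp = blkid * span;
	assert(blkid == 0 || *offsetp / blkid == span);	/* no wraparound */
	return (true);
}
```

Dividing OBJECT_END by blkid before multiplying is the usual way to test for multiplication overflow without performing the overflowing multiply first.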
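OpenZFS 7261 makes nvlist_add_common() reject a name whose size (strlen + 1) does not fit in the signed nvp_name_sz field, using 1ULL << (sizeof (nvp->nvp_name_sz) * NBBY - 1) as the bound. A stand-alone version of that check, assuming nvp_name_sz is an int16_t (so the bound is 32768 and at most 32766 visible characters are accepted); pair_hdr_t and check_name_len() are hypothetical, and CHAR_BIT stands in for NBBY:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header holding only the field that bounds the name. */
typedef struct {
	int16_t name_sz;	/* assumed to mirror nvp_name_sz */
} pair_hdr_t;

static int
check_name_len(pair_hdr_t *hdr, const char *name)
{
	size_t name_sz = strlen(name) + 1;	/* size includes the NUL */

	/* A signed N-bit field holds at most 2^(N-1) - 1. */
	if (name_sz >= 1ULL << (sizeof (hdr->name_sz) * CHAR_BIT - 1))
		return (EINVAL);
	hdr->name_sz = (int16_t)name_sz;
	assert(hdr->name_sz > 0);	/* stored size can no longer go negative */
	return (0);
}
```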
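For OpenZFS 8835, dmu_zfetch() now also accepts a stream whose zs_blkid is one past the first block of the access: a misaligned read re-touches the block where the previous access ended, so that block is dropped from the request, and if nothing remains the function returns without prefetching. The matching decision alone, with locking omitted, assuming zs_blkid is the block the stream expects next and nblks is at least 1; stream_match() and stream_match_t are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
	STREAM_NOMATCH,	/* try the next stream */
	STREAM_MATCH,	/* use this stream with the (possibly adjusted) range */
	STREAM_DONE	/* whole access was already prefetched */
} stream_match_t;

/*
 * zs_blkid: block the stream expects next.  zf_rwlock, zs_lock and the
 * re-check under zs_lock are omitted from this sketch.
 */
static stream_match_t
stream_match(uint64_t zs_blkid, uint64_t *blkid, uint64_t *nblks)
{
	assert(*nblks > 0);
	if (*blkid == zs_blkid)
		return (STREAM_MATCH);
	if (*blkid + 1 == zs_blkid) {
		/* First block overlaps the end of the previous access. */
		(*blkid)++;
		(*nblks)--;
		return (*nblks == 0 ? STREAM_DONE : STREAM_MATCH);
	}
	return (STREAM_NOMATCH);
}
```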
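"Fix abdstats kstat on 32-bit systems" converts to int before negating in the ABDSTAT_INCR() calls. On a 32-bit kernel size_t is 32 bits, so -sizeof (abd_t) evaluates to 2^32 - n, and widening that into a 64-bit counter adds roughly 4 GiB instead of subtracting n; on 64-bit kernels the value is 2^64 - n, which wraps back to the right result under modular addition. A sketch that models the 32-bit size_t with uint32_t so the difference shows on any host, assuming the stat is a 64-bit counter; size32_t and both helpers are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for size_t on a 32-bit kernel. */
typedef uint32_t size32_t;

/* Old form: negate while unsigned, then widen to a 64-bit delta. */
static int64_t
delta_negate_unsigned(size32_t sz)
{
	size32_t neg = (size32_t)-sz;	/* wraps to 2^32 - sz */

	return ((int64_t)neg);		/* zero-extended: large positive */
}

/* Fixed form, as in the patch: convert to int first, then negate. */
static int64_t
delta_negate_int(size32_t sz)
{
	assert(sz <= (size32_t)INT_MAX);	/* cast must not change value */
	return ((int64_t)(-(int)sz));
}
```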
On 2018-04-17 12:04:50 , Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> BugLink: http://bugs.launchpad.net/bugs/1764690
>
> This sync's SRU fixes in ZFS 0.7.5-1ubuntu15 to the kernel ZFS driver.
> Fixes zfsonlinux issues fix in upstream ZFS repository:
>
> - OpenZFS 8373 - TXG_WAIT in ZIL commit path
>   Closes zfsonlinux #6403
> - zfs promote|rename .../%recv should be an error
>   Closes zfsonlinux #4843, #6339
> - Fix parsable 'zfs get' for compressratios
>   Closes zfsonlinux #6436, #6449
> - Fix zpool events scripted mode tab separator
>   Closes zfsonlinux #6444, #6445
> - zv_suspend_lock in zvol_open()/zvol_release()
>   Closes zfsonlinux #6342
> - Allow longer SPA names in stats, allows bigger pool names
>   Closes zfsonlinux #6481
> - vdev_mirror: load balancing fixes
>   Closes zfsonlinux #6461
> - Fix zfs_ioc_pool_sync should not use fnvlist
>   Closes zfsonlinux #6529
> - OpenZFS 8375 - Kernel memory leak in nvpair code
>   Closes zfsonlinux #6578
> - OpenZFS 7261 - nvlist code should enforce name length limit
>   Closes zfsonlinux #6579
> - OpenZFS 5778 - nvpair_type_is_array() does not recognize
>   DATA_TYPE_INT8_ARRAY
>   Closes zfsonlinux #6580
> - dmu_objset: release bonus buffer in failure path
>   Closes zfsonlinux #6575
> - Fix false config_cache_write events
>   Closes zfsonlinux #6617
> - Fix printk() calls missing log level
>   Closes zfsonlinux #6672
> - Fix abdstats kstat on 32-bit systems
>   Closes zfsonlinux #6721
> - Relax ASSERT for #6526
>   Closes zfsonlinux #6526
> - Fix coverity defects: 147480, 147584 (Logically dead code)
>   Closes zfsonlinux #6745
> - Fix coverity defects: CID 161388 (Resource Leak)
>   Closes zfsonlinux #6755
> - Use ashift=12 by default on SSDSC2BW48 disks
>   Closes zfsonlinux #6774
> - OpenZFS 8558, 8602 - lwp_create() returns EAGAIN
>   Closes zfsonlinux #6779
> - ZFS send fails to dump objects larger than 128PiB
>   Closes zfsonlinux #6760
> - Sort output of tunables in arc_summary.py
>   Closes zfsonlinux #6828
> - Fix data on evict_skips in arc_summary.py
>   Closes zfsonlinux #6882, #6883
> - Fix segfault in zpool iostat when adding VDEVs
>   Closes zfsonlinux #6748, #6872
> - ZTS: Fix create-o_ashift test case
>   Closes zfsonlinux #6924, #6877
> - Handle invalid options in arc_summary
>   Closes zfsonlinux #6983
> - Call commit callbacks from the tail of the list
>   Closes zfsonlinux #6986
> - Fix 'zpool add' handling of nested interior VDEVs
>   Closes zfsonlinux #6678, #6996
> - Fix -fsanitize=address memory leak
>   kmem_alloc(0, ...) in userspace returns a leakable pointer.
>   Closes zfsonlinux #6941
> - Revert raidz_map and _col structure types
>   Closes zfsonlinux #6981, #7023
> - Use zap_count instead of cached z_size for unlink
>   Closes zfsonlinux #7019
> - OpenZFS 8897 - zpool online -e fails assertion when run on non-leaf
>   vdevs
>   Closes zfsonlinux #7030
> - OpenZFS 8898 - creating fs with checksum=skein on the boot pools
>   fails ungracefully
>   Closes zfsonlinux #7031
> - Emit an error message before MMP suspends pool
>   Closes zfsonlinux #7048
> - OpenZFS 8641 - "zpool clear" and "zinject" don't work on "spare"
>   or "replacing" vdevs
>   Closes zfsonlinux #7060
> - OpenZFS 8835 - Speculative prefetch in ZFS not working for
>   misaligned reads
>   Closes zfsonlinux #7062
> - OpenZFS 8972 - zfs holds: In scripted mode, do not pad columns with
>   spaces
>   Closes zfsonlinux #7063
> - Revert "Remove wrong ASSERT in annotate_ecksum"
>   Closes zfsonlinux #7079
> - OpenZFS 8731 - ASSERT3U(nui64s, <=, UINT16_MAX) fails for large
>   blocks
>   Closes zfsonlinux #7079
> - Prevent zdb(8) from occasionally hanging on I/O
>   Closes zfsonlinux #6999
> - Fix 'zfs receive -o' when used with '-e|-d'
>   Closes zfsonlinux #7088
> - Change movaps to movups in AES-NI code
>   Closes zfsonlinux #7065, #7108
> - tx_waited -> tx_dirty_delayed in trace_dmu.h
>   Closes zfsonlinux #7096
> - OpenZFS 8966 - Source file zfs_acl.c, function
>   Closes zfsonlinux #7141
> - Fix zdb -c traverse stop on damaged objset root
>   Closes zfsonlinux #7099
> - Fix zle_decompress out of bound access
>   Closes zfsonlinux #7099
> - Fix racy assignment of zcb.zcb_haderrors
>   Closes zfsonlinux #7099
> - Fix zdb -R decompression
>   Closes zfsonlinux #7099, #4984
> - Fix zdb -E segfault
>   Closes zfsonlinux #7099
> - Fix zdb -ed on objset for exported pool
>   Closes zfsonlinux #7099, #6464
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
> zfs/META | 2 +-
> zfs/include/sys/dmu.h | 5 ++
> zfs/include/sys/dmu_tx.h | 4 --
> zfs/include/sys/dsl_pool.h | 1 +
> zfs/include/sys/trace_dmu.h | 11 ++--
> zfs/include/sys/vdev.h | 3 +-
> zfs/include/sys/vdev_impl.h | 1 -
> zfs/include/sys/vdev_raidz_impl.h | 34 +++++------
> zfs/include/sys/zil_impl.h | 1 -
> zfs/module/icp/asm-x86_64/aes/aes_intel.S | 94 ++++++++++++++---------------
> zfs/module/icp/asm-x86_64/modes/gcm_intel.S | 2 +-
> zfs/module/icp/spi/kcf_spi.c | 11 ++--
> zfs/module/nvpair/nvpair.c | 9 ++-
> zfs/module/zfs/abd.c | 4 +-
> zfs/module/zfs/bpobj.c | 4 +-
> zfs/module/zfs/dmu.c | 2 +-
> zfs/module/zfs/dmu_objset.c | 1 +
> zfs/module/zfs/dmu_send.c | 33 +++++-----
> zfs/module/zfs/dmu_traverse.c | 17 +++++-
> zfs/module/zfs/dmu_tx.c | 2 +-
> zfs/module/zfs/dmu_zfetch.c | 24 ++++++--
> zfs/module/zfs/dsl_pool.c | 50 +++++++++++++++
> zfs/module/zfs/metaslab.c | 3 +-
> zfs/module/zfs/mmp.c | 5 ++
> zfs/module/zfs/spa.c | 6 +-
> zfs/module/zfs/spa_config.c | 5 ++
> zfs/module/zfs/spa_stats.c | 25 +++++---
> zfs/module/zfs/vdev_disk.c | 2 +-
> zfs/module/zfs/vdev_mirror.c | 36 +++++------
> zfs/module/zfs/vdev_queue.c | 21 +++----
> zfs/module/zfs/zfs_acl.c | 2 +-
> zfs/module/zfs/zfs_dir.c | 16 ++++-
> zfs/module/zfs/zfs_fm.c | 14 ++---
> zfs/module/zfs/zfs_ioctl.c | 24 ++++++--
> zfs/module/zfs/zil.c | 34 ++++++++---
> zfs/module/zfs/zle.c | 4 ++
> zfs/module/zfs/zvol.c | 64 +++++++++++++-------
> 37 files changed, 367 insertions(+), 209 deletions(-)
>
> diff --git a/zfs/META b/zfs/META
> index d624ae4..2110eef 100644
> --- a/zfs/META
> +++ b/zfs/META
> @@ -2,7 +2,7 @@ Meta: 1
> Name: zfs
> Branch: 1.0
> Version: 0.7.5
> -Release: 1ubuntu13
> +Release: 1ubuntu15
> Release-Tags: relext
> License: CDDL
> Author: OpenZFS on Linux
> diff --git a/zfs/include/sys/dmu.h b/zfs/include/sys/dmu.h
> index d246152..bcdf7d6 100644
> --- a/zfs/include/sys/dmu.h
> +++ b/zfs/include/sys/dmu.h
> @@ -713,11 +713,16 @@ void dmu_tx_mark_netfree(dmu_tx_t *tx);
> * to stable storage and will also be called if the dmu_tx is aborted.
> * If there is any error which prevents the transaction from being committed to
> * disk, the callback will be called with a value of error != 0.
> + *
> + * When multiple callbacks are registered to the transaction, the callbacks
> + * will be called in reverse order to let Lustre, the only user of commit
> + * callback currently, take the fast path of its commit callback handling.
> */
> typedef void dmu_tx_callback_func_t(void *dcb_data, int error);
>
> void dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *dcb_func,
> void *dcb_data);
> +void dmu_tx_do_callbacks(list_t *cb_list, int error);
>
> /*
> * Free up the data blocks for a defined range of a file. If size is
> diff --git a/zfs/include/sys/dmu_tx.h b/zfs/include/sys/dmu_tx.h
> index f16e1e8..d82a793 100644
> --- a/zfs/include/sys/dmu_tx.h
> +++ b/zfs/include/sys/dmu_tx.h
> @@ -145,10 +145,6 @@ uint64_t dmu_tx_get_txg(dmu_tx_t *tx);
> struct dsl_pool *dmu_tx_pool(dmu_tx_t *tx);
> void dmu_tx_wait(dmu_tx_t *tx);
>
> -void dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *dcb_func,
> - void *dcb_data);
> -void dmu_tx_do_callbacks(list_t *cb_list, int error);
> -
> /*
> * These routines are defined in dmu_spa.h, and are called by the SPA.
> */
> diff --git a/zfs/include/sys/dsl_pool.h b/zfs/include/sys/dsl_pool.h
> index d2dabda..7eb6cb0 100644
> --- a/zfs/include/sys/dsl_pool.h
> +++ b/zfs/include/sys/dsl_pool.h
> @@ -126,6 +126,7 @@ typedef struct dsl_pool {
> txg_list_t dp_dirty_dirs;
> txg_list_t dp_sync_tasks;
> taskq_t *dp_sync_taskq;
> + taskq_t *dp_zil_clean_taskq;
>
> /*
> * Protects administrative changes (properties, namespace)
> diff --git a/zfs/include/sys/trace_dmu.h b/zfs/include/sys/trace_dmu.h
> index 5ae59e5..24e57f5 100644
> --- a/zfs/include/sys/trace_dmu.h
> +++ b/zfs/include/sys/trace_dmu.h
> @@ -50,7 +50,7 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
> __field(uint64_t, tx_lastsnap_txg)
> __field(uint64_t, tx_lasttried_txg)
> __field(boolean_t, tx_anyobj)
> - __field(boolean_t, tx_waited)
> + __field(boolean_t, tx_dirty_delayed)
> __field(hrtime_t, tx_start)
> __field(boolean_t, tx_wait_dirty)
> __field(int, tx_err)
> @@ -62,7 +62,7 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
> __entry->tx_lastsnap_txg = tx->tx_lastsnap_txg;
> __entry->tx_lasttried_txg = tx->tx_lasttried_txg;
> __entry->tx_anyobj = tx->tx_anyobj;
> - __entry->tx_waited = tx->tx_waited;
> + __entry->tx_dirty_delayed = tx->tx_dirty_delayed;
> __entry->tx_start = tx->tx_start;
> __entry->tx_wait_dirty = tx->tx_wait_dirty;
> __entry->tx_err = tx->tx_err;
> @@ -70,11 +70,12 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
> __entry->min_tx_time = min_tx_time;
> ),
> TP_printk("tx { txg %llu lastsnap_txg %llu tx_lasttried_txg %llu "
> - "anyobj %d waited %d start %llu wait_dirty %d err %i "
> + "anyobj %d dirty_delayed %d start %llu wait_dirty %d err %i "
> "} dirty %llu min_tx_time %llu",
> __entry->tx_txg, __entry->tx_lastsnap_txg,
> - __entry->tx_lasttried_txg, __entry->tx_anyobj, __entry->tx_waited,
> - __entry->tx_start, __entry->tx_wait_dirty, __entry->tx_err,
> + __entry->tx_lasttried_txg, __entry->tx_anyobj,
> + __entry->tx_dirty_delayed, __entry->tx_start,
> + __entry->tx_wait_dirty, __entry->tx_err,
> __entry->dirty, __entry->min_tx_time)
> );
> /* END CSTYLED */
> diff --git a/zfs/include/sys/vdev.h b/zfs/include/sys/vdev.h
> index 7157ef4..473d269 100644
> --- a/zfs/include/sys/vdev.h
> +++ b/zfs/include/sys/vdev.h
> @@ -125,8 +125,7 @@ extern zio_t *vdev_queue_io(zio_t *zio);
> extern void vdev_queue_io_done(zio_t *zio);
>
> extern int vdev_queue_length(vdev_t *vd);
> -extern uint64_t vdev_queue_lastoffset(vdev_t *vd);
> -extern void vdev_queue_register_lastoffset(vdev_t *vd, zio_t *zio);
> +extern uint64_t vdev_queue_last_offset(vdev_t *vd);
>
> extern void vdev_config_dirty(vdev_t *vd);
> extern void vdev_config_clean(vdev_t *vd);
> diff --git a/zfs/include/sys/vdev_impl.h b/zfs/include/sys/vdev_impl.h
> index 7c5e54b..4c2e3cd 100644
> --- a/zfs/include/sys/vdev_impl.h
> +++ b/zfs/include/sys/vdev_impl.h
> @@ -127,7 +127,6 @@ struct vdev_queue {
> hrtime_t vq_io_delta_ts;
> zio_t vq_io_search; /* used as local for stack reduction */
> kmutex_t vq_lock;
> - uint64_t vq_lastoffset;
> };
>
> /*
> diff --git a/zfs/include/sys/vdev_raidz_impl.h b/zfs/include/sys/vdev_raidz_impl.h
> index 4bd15e3..0799ed1 100644
> --- a/zfs/include/sys/vdev_raidz_impl.h
> +++ b/zfs/include/sys/vdev_raidz_impl.h
> @@ -102,30 +102,30 @@ typedef struct raidz_impl_ops {
> } raidz_impl_ops_t;
>
> typedef struct raidz_col {
> - size_t rc_devidx; /* child device index for I/O */
> - size_t rc_offset; /* device offset */
> - size_t rc_size; /* I/O size */
> + uint64_t rc_devidx; /* child device index for I/O */
> + uint64_t rc_offset; /* device offset */
> + uint64_t rc_size; /* I/O size */
> abd_t *rc_abd; /* I/O data */
> void *rc_gdata; /* used to store the "good" version */
> int rc_error; /* I/O error for this device */
> - unsigned int rc_tried; /* Did we attempt this I/O column? */
> - unsigned int rc_skipped; /* Did we skip this I/O column? */
> + uint8_t rc_tried; /* Did we attempt this I/O column? */
> + uint8_t rc_skipped; /* Did we skip this I/O column? */
> } raidz_col_t;
>
> typedef struct raidz_map {
> - size_t rm_cols; /* Regular column count */
> - size_t rm_scols; /* Count including skipped columns */
> - size_t rm_bigcols; /* Number of oversized columns */
> - size_t rm_asize; /* Actual total I/O size */
> - size_t rm_missingdata; /* Count of missing data devices */
> - size_t rm_missingparity; /* Count of missing parity devices */
> - size_t rm_firstdatacol; /* First data column/parity count */
> - size_t rm_nskip; /* Skipped sectors for padding */
> - size_t rm_skipstart; /* Column index of padding start */
> + uint64_t rm_cols; /* Regular column count */
> + uint64_t rm_scols; /* Count including skipped columns */
> + uint64_t rm_bigcols; /* Number of oversized columns */
> + uint64_t rm_asize; /* Actual total I/O size */
> + uint64_t rm_missingdata; /* Count of missing data devices */
> + uint64_t rm_missingparity; /* Count of missing parity devices */
> + uint64_t rm_firstdatacol; /* First data column/parity count */
> + uint64_t rm_nskip; /* Skipped sectors for padding */
> + uint64_t rm_skipstart; /* Column index of padding start */
> abd_t *rm_abd_copy; /* rm_asize-buffer of copied data */
> - size_t rm_reports; /* # of referencing checksum reports */
> - unsigned int rm_freed; /* map no longer has referencing ZIO */
> - unsigned int rm_ecksuminjected; /* checksum error was injected */
> + uintptr_t rm_reports; /* # of referencing checksum reports */
> + uint8_t rm_freed; /* map no longer has referencing ZIO */
> + uint8_t rm_ecksuminjected; /* checksum error was injected */
> raidz_impl_ops_t *rm_ops; /* RAIDZ math operations */
> raidz_col_t rm_col[1]; /* Flexible array of I/O columns */
> } raidz_map_t;
> diff --git a/zfs/include/sys/zil_impl.h b/zfs/include/sys/zil_impl.h
> index 13ecca3..dd5304b 100644
> --- a/zfs/include/sys/zil_impl.h
> +++ b/zfs/include/sys/zil_impl.h
> @@ -124,7 +124,6 @@ struct zilog {
> list_t zl_lwb_list; /* in-flight log write list */
> kmutex_t zl_vdev_lock; /* protects zl_vdev_tree */
> avl_tree_t zl_vdev_tree; /* vdevs to flush in zil_commit() */
> - taskq_t *zl_clean_taskq; /* runs lwb and itx clean tasks */
> avl_tree_t zl_bp_tree; /* track bps during log parse */
> clock_t zl_replay_time; /* lbolt of when replay started */
> uint64_t zl_replay_blks; /* number of log blocks replayed */
> diff --git a/zfs/module/icp/asm-x86_64/aes/aes_intel.S b/zfs/module/icp/asm-x86_64/aes/aes_intel.S
> index ed0df75..a40e30f 100644
> --- a/zfs/module/icp/asm-x86_64/aes/aes_intel.S
> +++ b/zfs/module/icp/asm-x86_64/aes/aes_intel.S
> @@ -207,7 +207,7 @@ _key_expansion_256a_local:
> shufps $0b10001100, %xmm0, %xmm4
> pxor %xmm4, %xmm0
> pxor %xmm1, %xmm0
> - movaps %xmm0, (%rcx)
> + movups %xmm0, (%rcx)
> add $0x10, %rcx
> ret
> nop
> @@ -224,18 +224,18 @@ _key_expansion_192a_local:
> pxor %xmm4, %xmm0
> pxor %xmm1, %xmm0
>
> - movaps %xmm2, %xmm5
> - movaps %xmm2, %xmm6
> + movups %xmm2, %xmm5
> + movups %xmm2, %xmm6
> pslldq $4, %xmm5
> pshufd $0b11111111, %xmm0, %xmm3
> pxor %xmm3, %xmm2
> pxor %xmm5, %xmm2
>
> - movaps %xmm0, %xmm1
> + movups %xmm0, %xmm1
> shufps $0b01000100, %xmm0, %xmm6
> - movaps %xmm6, (%rcx)
> + movups %xmm6, (%rcx)
> shufps $0b01001110, %xmm2, %xmm1
> - movaps %xmm1, 0x10(%rcx)
> + movups %xmm1, 0x10(%rcx)
> add $0x20, %rcx
> ret
> SET_SIZE(_key_expansion_192a)
> @@ -250,13 +250,13 @@ _key_expansion_192b_local:
> pxor %xmm4, %xmm0
> pxor %xmm1, %xmm0
>
> - movaps %xmm2, %xmm5
> + movups %xmm2, %xmm5
> pslldq $4, %xmm5
> pshufd $0b11111111, %xmm0, %xmm3
> pxor %xmm3, %xmm2
> pxor %xmm5, %xmm2
>
> - movaps %xmm0, (%rcx)
> + movups %xmm0, (%rcx)
> add $0x10, %rcx
> ret
> SET_SIZE(_key_expansion_192b)
> @@ -270,7 +270,7 @@ _key_expansion_256b_local:
> shufps $0b10001100, %xmm2, %xmm4
> pxor %xmm4, %xmm2
> pxor %xmm1, %xmm2
> - movaps %xmm2, (%rcx)
> + movups %xmm2, (%rcx)
> add $0x10, %rcx
> ret
> SET_SIZE(_key_expansion_256b)
> @@ -327,7 +327,7 @@ rijndael_key_setup_enc_intel_local:
> jz .Lenc_key_invalid_param
>
> movups (%USERCIPHERKEY), %xmm0 // user key (first 16 bytes)
> - movaps %xmm0, (%AESKEY)
> + movups %xmm0, (%AESKEY)
> lea 0x10(%AESKEY), %rcx // key addr
> pxor %xmm4, %xmm4 // xmm4 is assumed 0 in _key_expansion_x
>
> @@ -341,7 +341,7 @@ rijndael_key_setup_enc_intel_local:
> #endif /* OPENSSL_INTERFACE */
>
> movups 0x10(%USERCIPHERKEY), %xmm2 // other user key (2nd 16 bytes)
> - movaps %xmm2, (%rcx)
> + movups %xmm2, (%rcx)
> add $0x10, %rcx
>
> aeskeygenassist $0x1, %xmm2, %xmm1 // expand the key
> @@ -525,10 +525,10 @@ FRAME_BEGIN
>
> .align 4
> .Ldec_key_reorder_loop:
> - movaps (%AESKEY), %xmm0
> - movaps (%ROUNDS64), %xmm1
> - movaps %xmm0, (%ROUNDS64)
> - movaps %xmm1, (%AESKEY)
> + movups (%AESKEY), %xmm0
> + movups (%ROUNDS64), %xmm1
> + movups %xmm0, (%ROUNDS64)
> + movups %xmm1, (%AESKEY)
> lea 0x10(%AESKEY), %AESKEY
> lea -0x10(%ROUNDS64), %ROUNDS64
> cmp %AESKEY, %ROUNDS64
> @@ -536,11 +536,11 @@ FRAME_BEGIN
>
> .align 4
> .Ldec_key_inv_loop:
> - movaps (%rcx), %xmm0
> + movups (%rcx), %xmm0
> // Convert an encryption round key to a form usable for decryption
> // with the "AES Inverse Mix Columns" instruction
> aesimc %xmm0, %xmm1
> - movaps %xmm1, (%rcx)
> + movups %xmm1, (%rcx)
> lea 0x10(%rcx), %rcx
> cmp %ENDAESKEY, %rcx
> jnz .Ldec_key_inv_loop
> @@ -602,7 +602,7 @@ FRAME_BEGIN
> ENTRY_NP(aes_encrypt_intel)
>
> movups (%INP), %STATE // input
> - movaps (%KEYP), %KEY // key
> + movups (%KEYP), %KEY // key
> #ifdef OPENSSL_INTERFACE
> mov 240(%KEYP), %NROUNDS32 // round count
> #else /* OpenSolaris Interface */
> @@ -618,41 +618,41 @@ ENTRY_NP(aes_encrypt_intel)
>
> // AES 256
> lea 0x20(%KEYP), %KEYP
> - movaps -0x60(%KEYP), %KEY
> + movups -0x60(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps -0x50(%KEYP), %KEY
> + movups -0x50(%KEYP), %KEY
> aesenc %KEY, %STATE
>
> .align 4
> .Lenc192:
> // AES 192 and 256
> - movaps -0x40(%KEYP), %KEY
> + movups -0x40(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps -0x30(%KEYP), %KEY
> + movups -0x30(%KEYP), %KEY
> aesenc %KEY, %STATE
>
> .align 4
> .Lenc128:
> // AES 128, 192, and 256
> - movaps -0x20(%KEYP), %KEY
> + movups -0x20(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps -0x10(%KEYP), %KEY
> + movups -0x10(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps (%KEYP), %KEY
> + movups (%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x10(%KEYP), %KEY
> + movups 0x10(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x20(%KEYP), %KEY
> + movups 0x20(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x30(%KEYP), %KEY
> + movups 0x30(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x40(%KEYP), %KEY
> + movups 0x40(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x50(%KEYP), %KEY
> + movups 0x50(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x60(%KEYP), %KEY
> + movups 0x60(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x70(%KEYP), %KEY
> + movups 0x70(%KEYP), %KEY
> aesenclast %KEY, %STATE // last round
> movups %STATE, (%OUTP) // output
>
> @@ -685,7 +685,7 @@ ENTRY_NP(aes_encrypt_intel)
> ENTRY_NP(aes_decrypt_intel)
>
> movups (%INP), %STATE // input
> - movaps (%KEYP), %KEY // key
> + movups (%KEYP), %KEY // key
> #ifdef OPENSSL_INTERFACE
> mov 240(%KEYP), %NROUNDS32 // round count
> #else /* OpenSolaris Interface */
> @@ -701,41 +701,41 @@ ENTRY_NP(aes_decrypt_intel)
>
> // AES 256
> lea 0x20(%KEYP), %KEYP
> - movaps -0x60(%KEYP), %KEY
> + movups -0x60(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps -0x50(%KEYP), %KEY
> + movups -0x50(%KEYP), %KEY
> aesdec %KEY, %STATE
>
> .align 4
> .Ldec192:
> // AES 192 and 256
> - movaps -0x40(%KEYP), %KEY
> + movups -0x40(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps -0x30(%KEYP), %KEY
> + movups -0x30(%KEYP), %KEY
> aesdec %KEY, %STATE
>
> .align 4
> .Ldec128:
> // AES 128, 192, and 256
> - movaps -0x20(%KEYP), %KEY
> + movups -0x20(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps -0x10(%KEYP), %KEY
> + movups -0x10(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps (%KEYP), %KEY
> + movups (%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x10(%KEYP), %KEY
> + movups 0x10(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x20(%KEYP), %KEY
> + movups 0x20(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x30(%KEYP), %KEY
> + movups 0x30(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x40(%KEYP), %KEY
> + movups 0x40(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x50(%KEYP), %KEY
> + movups 0x50(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x60(%KEYP), %KEY
> + movups 0x60(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x70(%KEYP), %KEY
> + movups 0x70(%KEYP), %KEY
> aesdeclast %KEY, %STATE // last round
> movups %STATE, (%OUTP) // output
>
> diff --git a/zfs/module/icp/asm-x86_64/modes/gcm_intel.S b/zfs/module/icp/asm-x86_64/modes/gcm_intel.S
> index a43b5eb..3aec0ee 100644
> --- a/zfs/module/icp/asm-x86_64/modes/gcm_intel.S
> +++ b/zfs/module/icp/asm-x86_64/modes/gcm_intel.S
> @@ -150,7 +150,7 @@ ENTRY_NP(gcm_mul_pclmulqdq)
> // Byte swap 16-byte input
> //
> lea .Lbyte_swap16_mask(%rip), %rax
> - movaps (%rax), %xmm10
> + movups (%rax), %xmm10
> pshufb %xmm10, %xmm0
> pshufb %xmm10, %xmm1
>
> diff --git a/zfs/module/icp/spi/kcf_spi.c b/zfs/module/icp/spi/kcf_spi.c
> index c2c2b54..0a6e38d 100644
> --- a/zfs/module/icp/spi/kcf_spi.c
> +++ b/zfs/module/icp/spi/kcf_spi.c
> @@ -111,7 +111,7 @@ int
> crypto_register_provider(crypto_provider_info_t *info,
> crypto_kcf_provider_handle_t *handle)
> {
> - char ks_name[KSTAT_STRLEN];
> + char *ks_name;
>
> kcf_provider_desc_t *prov_desc = NULL;
> int ret = CRYPTO_ARGUMENTS_BAD;
> @@ -238,12 +238,12 @@ crypto_register_provider(crypto_provider_info_t *info,
> * This kstat is deleted, when the provider unregisters.
> */
> if (prov_desc->pd_prov_type == CRYPTO_SW_PROVIDER) {
> - (void) snprintf(ks_name, KSTAT_STRLEN, "%s_%s",
> + ks_name = kmem_asprintf("%s_%s",
> "NONAME", "provider_stats");
> } else {
> - (void) snprintf(ks_name, KSTAT_STRLEN, "%s_%d_%u_%s",
> - "NONAME", 0,
> - prov_desc->pd_prov_id, "provider_stats");
> + ks_name = kmem_asprintf("%s_%d_%u_%s",
> + "NONAME", 0, prov_desc->pd_prov_id,
> + "provider_stats");
> }
>
> prov_desc->pd_kstat = kstat_create("kcf", 0, ks_name, "crypto",
> @@ -261,6 +261,7 @@ crypto_register_provider(crypto_provider_info_t *info,
> prov_desc->pd_kstat->ks_update = kcf_prov_kstat_update;
> kstat_install(prov_desc->pd_kstat);
> }
> + strfree(ks_name);
> }
>
> if (prov_desc->pd_prov_type == CRYPTO_HW_PROVIDER)
> diff --git a/zfs/module/nvpair/nvpair.c b/zfs/module/nvpair/nvpair.c
> index 249b7c9..abed33e 100644
> --- a/zfs/module/nvpair/nvpair.c
> +++ b/zfs/module/nvpair/nvpair.c
> @@ -21,7 +21,7 @@
>
> /*
> * Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
> - * Copyright (c) 2015, 2016 by Delphix. All rights reserved.
> + * Copyright (c) 2015, 2017 by Delphix. All rights reserved.
> */
>
> #include <sys/stropts.h>
> @@ -916,6 +916,8 @@ nvlist_add_common(nvlist_t *nvl, const char *name,
>
> /* calculate sizes of the nvpair elements and the nvpair itself */
> name_sz = strlen(name) + 1;
> + if (name_sz >= 1ULL << (sizeof (nvp->nvp_name_sz) * NBBY - 1))
> + return (EINVAL);
>
> nvp_sz = NVP_SIZE_CALC(name_sz, value_sz);
>
> @@ -1242,6 +1244,7 @@ nvpair_type_is_array(nvpair_t *nvp)
> data_type_t type = NVP_TYPE(nvp);
>
> if ((type == DATA_TYPE_BYTE_ARRAY) ||
> + (type == DATA_TYPE_INT8_ARRAY) ||
> (type == DATA_TYPE_UINT8_ARRAY) ||
> (type == DATA_TYPE_INT16_ARRAY) ||
> (type == DATA_TYPE_UINT16_ARRAY) ||
> @@ -2200,8 +2203,10 @@ nvs_embedded(nvstream_t *nvs, nvlist_t *embedded)
>
> nvlist_init(embedded, embedded->nvl_nvflag, priv);
>
> - if (nvs->nvs_recursion >= nvpair_max_recursion)
> + if (nvs->nvs_recursion >= nvpair_max_recursion) {
> + nvlist_free(embedded);
> return (EINVAL);
> + }
> nvs->nvs_recursion++;
> if ((err = nvs_operation(nvs, embedded, NULL)) != 0)
> nvlist_free(embedded);
> diff --git a/zfs/module/zfs/abd.c b/zfs/module/zfs/abd.c
> index 765ac7f..3c7893d 100644
> --- a/zfs/module/zfs/abd.c
> +++ b/zfs/module/zfs/abd.c
> @@ -571,7 +571,7 @@ static inline void
> abd_free_struct(abd_t *abd)
> {
> kmem_cache_free(abd_cache, abd);
> - ABDSTAT_INCR(abdstat_struct_size, -sizeof (abd_t));
> + ABDSTAT_INCR(abdstat_struct_size, -(int)sizeof (abd_t));
> }
>
> /*
> @@ -618,7 +618,7 @@ abd_free_scatter(abd_t *abd)
> ABDSTAT_BUMPDOWN(abdstat_scatter_cnt);
> ABDSTAT_INCR(abdstat_scatter_data_size, -(int)abd->abd_size);
> ABDSTAT_INCR(abdstat_scatter_chunk_waste,
> - abd->abd_size - P2ROUNDUP(abd->abd_size, PAGESIZE));
> + (int)abd->abd_size - (int)P2ROUNDUP(abd->abd_size, PAGESIZE));
>
> abd_free_struct(abd);
> }
> diff --git a/zfs/module/zfs/bpobj.c b/zfs/module/zfs/bpobj.c
> index 82ca94e..32459c9 100644
> --- a/zfs/module/zfs/bpobj.c
> +++ b/zfs/module/zfs/bpobj.c
> @@ -261,7 +261,7 @@ bpobj_iterate_impl(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx,
> }
> if (free) {
> VERIFY3U(0, ==, dmu_free_range(bpo->bpo_os, bpo->bpo_object,
> - (i + 1) * sizeof (blkptr_t), -1ULL, tx));
> + (i + 1) * sizeof (blkptr_t), DMU_OBJECT_END, tx));
> }
> if (err || !bpo->bpo_havesubobj || bpo->bpo_phys->bpo_subobjs == 0)
> goto out;
> @@ -339,7 +339,7 @@ bpobj_iterate_impl(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx,
> if (free) {
> VERIFY3U(0, ==, dmu_free_range(bpo->bpo_os,
> bpo->bpo_phys->bpo_subobjs,
> - (i + 1) * sizeof (uint64_t), -1ULL, tx));
> + (i + 1) * sizeof (uint64_t), DMU_OBJECT_END, tx));
> }
>
> out:
> diff --git a/zfs/module/zfs/dmu.c b/zfs/module/zfs/dmu.c
> index 6f09aa2..05c9fc3 100644
> --- a/zfs/module/zfs/dmu.c
> +++ b/zfs/module/zfs/dmu.c
> @@ -887,7 +887,7 @@ dmu_free_range(objset_t *os, uint64_t object, uint64_t offset,
> if (err)
> return (err);
> ASSERT(offset < UINT64_MAX);
> - ASSERT(size == -1ULL || size <= UINT64_MAX - offset);
> + ASSERT(size == DMU_OBJECT_END || size <= UINT64_MAX - offset);
> dnode_free_range(dn, offset, size, tx);
> dnode_rele(dn, FTAG);
> return (0);
> diff --git a/zfs/module/zfs/dmu_objset.c b/zfs/module/zfs/dmu_objset.c
> index 9a7a696..3425d54 100644
> --- a/zfs/module/zfs/dmu_objset.c
> +++ b/zfs/module/zfs/dmu_objset.c
> @@ -1853,6 +1853,7 @@ dmu_objset_space_upgrade(objset_t *os)
> dmu_tx_hold_bonus(tx, obj);
> objerr = dmu_tx_assign(tx, TXG_WAIT);
> if (objerr != 0) {
> + dmu_buf_rele(db, FTAG);
> dmu_tx_abort(tx);
> continue;
> }
> diff --git a/zfs/module/zfs/dmu_send.c b/zfs/module/zfs/dmu_send.c
> index 344e420..2e3d706 100644
> --- a/zfs/module/zfs/dmu_send.c
> +++ b/zfs/module/zfs/dmu_send.c
> @@ -224,9 +224,6 @@ dump_free(dmu_sendarg_t *dsp, uint64_t object, uint64_t offset,
> (object == dsp->dsa_last_data_object &&
> offset > dsp->dsa_last_data_offset));
>
> - if (length != -1ULL && offset + length < offset)
> - length = -1ULL;
> -
> /*
> * If there is a pending op, but it's not PENDING_FREE, push it out,
> * since free block aggregation can only be done for blocks of the
> @@ -243,19 +240,22 @@ dump_free(dmu_sendarg_t *dsp, uint64_t object, uint64_t offset,
>
> if (dsp->dsa_pending_op == PENDING_FREE) {
> /*
> - * There should never be a PENDING_FREE if length is -1
> - * (because dump_dnode is the only place where this
> - * function is called with a -1, and only after flushing
> - * any pending record).
> + * There should never be a PENDING_FREE if length is
> + * DMU_OBJECT_END (because dump_dnode is the only place where
> + * this function is called with a DMU_OBJECT_END, and only after
> + * flushing any pending record).
> */
> - ASSERT(length != -1ULL);
> + ASSERT(length != DMU_OBJECT_END);
> /*
> * Check to see whether this free block can be aggregated
> * with pending one.
> */
> if (drrf->drr_object == object && drrf->drr_offset +
> drrf->drr_length == offset) {
> - drrf->drr_length += length;
> + if (offset + length < offset)
> + drrf->drr_length = DMU_OBJECT_END;
> + else
> + drrf->drr_length += length;
> return (0);
> } else {
> /* not a continuation. Push out pending record */
> @@ -269,9 +269,12 @@ dump_free(dmu_sendarg_t *dsp, uint64_t object, uint64_t offset,
> dsp->dsa_drr->drr_type = DRR_FREE;
> drrf->drr_object = object;
> drrf->drr_offset = offset;
> - drrf->drr_length = length;
> + if (offset + length < offset)
> + drrf->drr_length = DMU_OBJECT_END;
> + else
> + drrf->drr_length = length;
> drrf->drr_toguid = dsp->dsa_toguid;
> - if (length == -1ULL) {
> + if (length == DMU_OBJECT_END) {
> if (dump_record(dsp, NULL, 0) != 0)
> return (SET_ERROR(EINTR));
> } else {
> @@ -530,7 +533,7 @@ dump_dnode(dmu_sendarg_t *dsp, uint64_t object, dnode_phys_t *dnp)
>
> /* Free anything past the end of the file. */
> if (dump_free(dsp, object, (dnp->dn_maxblkid + 1) *
> - (dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT), -1ULL) != 0)
> + (dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT), DMU_OBJECT_END) != 0)
> return (SET_ERROR(EINTR));
> if (dsp->dsa_err != 0)
> return (SET_ERROR(EINTR));
> @@ -666,7 +669,9 @@ do_dump(dmu_sendarg_t *dsa, struct send_block_record *data)
> } else if (BP_IS_HOLE(bp)) {
> uint64_t span = BP_SPAN(dblkszsec, indblkshift, zb->zb_level);
> uint64_t offset = zb->zb_blkid * span;
> - err = dump_free(dsa, zb->zb_object, offset, span);
> + /* Don't dump free records for offsets > DMU_OBJECT_END */
> + if (zb->zb_blkid == 0 || span <= DMU_OBJECT_END / zb->zb_blkid)
> + err = dump_free(dsa, zb->zb_object, offset, span);
> } else if (zb->zb_level > 0 || type == DMU_OT_OBJSET) {
> return (0);
> } else if (type == DMU_OT_DNODE) {
> @@ -2498,7 +2503,7 @@ receive_free(struct receive_writer_arg *rwa, struct drr_free *drrf)
> {
> int err;
>
> - if (drrf->drr_length != -1ULL &&
> + if (drrf->drr_length != DMU_OBJECT_END &&
> drrf->drr_offset + drrf->drr_length < drrf->drr_offset)
> return (SET_ERROR(EINVAL));
>
> diff --git a/zfs/module/zfs/dmu_traverse.c b/zfs/module/zfs/dmu_traverse.c
> index c78228d..62f770e 100644
> --- a/zfs/module/zfs/dmu_traverse.c
> +++ b/zfs/module/zfs/dmu_traverse.c
> @@ -609,9 +609,20 @@ traverse_impl(spa_t *spa, dsl_dataset_t *ds, uint64_t objset, blkptr_t *rootbp,
> if (err != 0)
> return (err);
>
> - osp = buf->b_data;
> - traverse_zil(td, &osp->os_zil_header);
> - arc_buf_destroy(buf, &buf);
> + if (err != 0) {
> + /*
> + * If both TRAVERSE_HARD and TRAVERSE_PRE are set,
> + * continue to visitbp so that td_func can be called
> + * in pre stage, and err will reset to zero.
> + */
> + if (!(td->td_flags & TRAVERSE_HARD) ||
> + !(td->td_flags & TRAVERSE_PRE))
> + return (err);
> + } else {
> + osp = buf->b_data;
> + traverse_zil(td, &osp->os_zil_header);
> + arc_buf_destroy(buf, &buf);
> + }
> }
>
> if (!(flags & TRAVERSE_PREFETCH_DATA) ||
> diff --git a/zfs/module/zfs/dmu_tx.c b/zfs/module/zfs/dmu_tx.c
> index 097fa77..c3cc03a 100644
> --- a/zfs/module/zfs/dmu_tx.c
> +++ b/zfs/module/zfs/dmu_tx.c
> @@ -1200,7 +1200,7 @@ dmu_tx_do_callbacks(list_t *cb_list, int error)
> {
> dmu_tx_callback_t *dcb;
>
> - while ((dcb = list_head(cb_list)) != NULL) {
> + while ((dcb = list_tail(cb_list)) != NULL) {
> list_remove(cb_list, dcb);
> dcb->dcb_func(dcb->dcb_data, error);
> kmem_free(dcb, sizeof (dmu_tx_callback_t));
> diff --git a/zfs/module/zfs/dmu_zfetch.c b/zfs/module/zfs/dmu_zfetch.c
> index 1bf5c4e..e72e9ef 100644
> --- a/zfs/module/zfs/dmu_zfetch.c
> +++ b/zfs/module/zfs/dmu_zfetch.c
> @@ -228,19 +228,33 @@ dmu_zfetch(zfetch_t *zf, uint64_t blkid, uint64_t nblks, boolean_t fetch_data)
>
> rw_enter(&zf->zf_rwlock, RW_READER);
>
> + /*
> + * Find matching prefetch stream. Depending on whether the accesses
> + * are block-aligned, first block of the new access may either follow
> + * the last block of the previous access, or be equal to it.
> + */
> for (zs = list_head(&zf->zf_stream); zs != NULL;
> zs = list_next(&zf->zf_stream, zs)) {
> - if (blkid == zs->zs_blkid) {
> + if (blkid == zs->zs_blkid || blkid + 1 == zs->zs_blkid) {
> mutex_enter(&zs->zs_lock);
> /*
> * zs_blkid could have changed before we
> * acquired zs_lock; re-check them here.
> */
> - if (blkid != zs->zs_blkid) {
> - mutex_exit(&zs->zs_lock);
> - continue;
> + if (blkid == zs->zs_blkid) {
> + break;
> + } else if (blkid + 1 == zs->zs_blkid) {
> + blkid++;
> + nblks--;
> + if (nblks == 0) {
> + /* Already prefetched this before. */
> + mutex_exit(&zs->zs_lock);
> + rw_exit(&zf->zf_rwlock);
> + return;
> + }
> + break;
> }
> - break;
> + mutex_exit(&zs->zs_lock);
> }
> }
>
> diff --git a/zfs/module/zfs/dsl_pool.c b/zfs/module/zfs/dsl_pool.c
> index c167080..0320d0e 100644
> --- a/zfs/module/zfs/dsl_pool.c
> +++ b/zfs/module/zfs/dsl_pool.c
> @@ -135,6 +135,36 @@ unsigned long zfs_delay_scale = 1000 * 1000 * 1000 / 2000;
> */
> int zfs_sync_taskq_batch_pct = 75;
>
> +/*
> + * These tunables determine the behavior of how zil_itxg_clean() is
> + * called via zil_clean() in the context of spa_sync(). When an itxg
> + * list needs to be cleaned, TQ_NOSLEEP will be used when dispatching.
> + * If the dispatch fails, the call to zil_itxg_clean() will occur
> + * synchronously in the context of spa_sync(), which can negatively
> + * impact the performance of spa_sync() (e.g. in the case of the itxg
> + * list having a large number of itxs that needs to be cleaned).
> + *
> + * Thus, these tunables can be used to manipulate the behavior of the
> + * taskq used by zil_clean(); they determine the number of taskq entries
> + * that are pre-populated when the taskq is first created (via the
> + * "zfs_zil_clean_taskq_minalloc" tunable) and the maximum number of
> + * taskq entries that are cached after an on-demand allocation (via the
> + * "zfs_zil_clean_taskq_maxalloc").
> + *
> + * The idea being, we want to try reasonably hard to ensure there will
> + * already be a taskq entry pre-allocated by the time that it is needed
> + * by zil_clean(). This way, we can avoid the possibility of an
> + * on-demand allocation of a new taskq entry from failing, which would
> + * result in zil_itxg_clean() being called synchronously from zil_clean()
> + * (which can adversely affect performance of spa_sync()).
> + *
> + * Additionally, the number of threads used by the taskq can be
> + * configured via the "zfs_zil_clean_taskq_nthr_pct" tunable.
> + */ > +int zfs_zil_clean_taskq_nthr_pct = 100; > +int zfs_zil_clean_taskq_minalloc = 1024; > +int zfs_zil_clean_taskq_maxalloc = 1024 * 1024; > + > int > dsl_pool_open_special_dir(dsl_pool_t *dp, const char *name, dsl_dir_t **ddp) > { > @@ -176,6 +206,12 @@ dsl_pool_open_impl(spa_t *spa, uint64_t txg) > zfs_sync_taskq_batch_pct, minclsyspri, 1, INT_MAX, > TASKQ_THREADS_CPU_PCT); > > + dp->dp_zil_clean_taskq = taskq_create("dp_zil_clean_taskq", > + zfs_zil_clean_taskq_nthr_pct, minclsyspri, > + zfs_zil_clean_taskq_minalloc, > + zfs_zil_clean_taskq_maxalloc, > + TASKQ_PREPOPULATE | TASKQ_THREADS_CPU_PCT); > + > mutex_init(&dp->dp_lock, NULL, MUTEX_DEFAULT, NULL); > cv_init(&dp->dp_spaceavail_cv, NULL, CV_DEFAULT, NULL); > > @@ -334,6 +370,7 @@ dsl_pool_close(dsl_pool_t *dp) > txg_list_destroy(&dp->dp_sync_tasks); > txg_list_destroy(&dp->dp_dirty_dirs); > > + taskq_destroy(dp->dp_zil_clean_taskq); > taskq_destroy(dp->dp_sync_taskq); > > /* > @@ -1142,5 +1179,18 @@ MODULE_PARM_DESC(zfs_delay_scale, "how quickly delay approaches infinity"); > module_param(zfs_sync_taskq_batch_pct, int, 0644); > MODULE_PARM_DESC(zfs_sync_taskq_batch_pct, > "max percent of CPUs that are used to sync dirty data"); > + > +module_param(zfs_zil_clean_taskq_nthr_pct, int, 0644); > +MODULE_PARM_DESC(zfs_zil_clean_taskq_nthr_pct, > + "max percent of CPUs that are used per dp_sync_taskq"); > + > +module_param(zfs_zil_clean_taskq_minalloc, int, 0644); > +MODULE_PARM_DESC(zfs_zil_clean_taskq_minalloc, > + "number of taskq entries that are pre-populated"); > + > +module_param(zfs_zil_clean_taskq_maxalloc, int, 0644); > +MODULE_PARM_DESC(zfs_zil_clean_taskq_maxalloc, > + "max number of taskq entries that are cached"); > + > /* END CSTYLED */ > #endif > diff --git a/zfs/module/zfs/metaslab.c b/zfs/module/zfs/metaslab.c > index 5e413c0..01e5234 100644 > --- a/zfs/module/zfs/metaslab.c > +++ b/zfs/module/zfs/metaslab.c > @@ -1937,7 +1937,8 @@ metaslab_passivate(metaslab_t *msp, uint64_t weight) > * 
this metaslab again. In that case, it had better be empty, > * or we would be leaving space on the table. > */ > - ASSERT(size >= SPA_MINBLOCKSIZE || > + ASSERT(!WEIGHT_IS_SPACEBASED(msp->ms_weight) || > + size >= SPA_MINBLOCKSIZE || > range_tree_space(msp->ms_tree) == 0); > ASSERT0(weight & METASLAB_ACTIVE_MASK); > > diff --git a/zfs/module/zfs/mmp.c b/zfs/module/zfs/mmp.c > index 6f2aa3f..e91ae62 100644 > --- a/zfs/module/zfs/mmp.c > +++ b/zfs/module/zfs/mmp.c > @@ -26,6 +26,7 @@ > #include <sys/mmp.h> > #include <sys/spa.h> > #include <sys/spa_impl.h> > +#include <sys/time.h> > #include <sys/vdev.h> > #include <sys/vdev_impl.h> > #include <sys/zfs_context.h> > @@ -428,6 +429,10 @@ mmp_thread(spa_t *spa) > */ > if (!suspended && mmp_fail_intervals && multihost && > (start - mmp->mmp_last_write) > max_fail_ns) { > + cmn_err(CE_WARN, "MMP writes to pool '%s' have not " > + "succeeded in over %llus; suspending pool", > + spa_name(spa), > + NSEC2SEC(start - mmp->mmp_last_write)); > zio_suspend(spa, NULL); > } > > diff --git a/zfs/module/zfs/spa.c b/zfs/module/zfs/spa.c > index a7a2f62..00587d8 100644 > --- a/zfs/module/zfs/spa.c > +++ b/zfs/module/zfs/spa.c > @@ -1561,7 +1561,7 @@ spa_load_spares(spa_t *spa) > static void > spa_load_l2cache(spa_t *spa) > { > - nvlist_t **l2cache; > + nvlist_t **l2cache = NULL; > uint_t nl2cache; > int i, j, oldnvdevs; > uint64_t guid; > @@ -1645,7 +1645,9 @@ spa_load_l2cache(spa_t *spa) > VERIFY(nvlist_remove(sav->sav_config, ZPOOL_CONFIG_L2CACHE, > DATA_TYPE_NVLIST_ARRAY) == 0); > > - l2cache = kmem_alloc(sav->sav_count * sizeof (void *), KM_SLEEP); > + if (sav->sav_count > 0) > + l2cache = kmem_alloc(sav->sav_count * sizeof (void *), > + KM_SLEEP); > for (i = 0; i < sav->sav_count; i++) > l2cache[i] = vdev_config_generate(spa, > sav->sav_vdevs[i], B_TRUE, VDEV_CONFIG_L2CACHE); > diff --git a/zfs/module/zfs/spa_config.c b/zfs/module/zfs/spa_config.c > index 5b792b8..5bbfb4a 100644 > --- a/zfs/module/zfs/spa_config.c > +++ 
b/zfs/module/zfs/spa_config.c > @@ -162,6 +162,11 @@ spa_config_write(spa_config_dirent_t *dp, nvlist_t *nvl) > */ > if (nvl == NULL) { > err = vn_remove(dp->scd_path, UIO_SYSSPACE, RMFILE); > + /* > + * Don't report an error when the cache file is already removed > + */ > + if (err == ENOENT) > + err = 0; > return (err); > } > > diff --git a/zfs/module/zfs/spa_stats.c b/zfs/module/zfs/spa_stats.c > index 7ca3598..8c4dba2 100644 > --- a/zfs/module/zfs/spa_stats.c > +++ b/zfs/module/zfs/spa_stats.c > @@ -142,7 +142,7 @@ static void > spa_read_history_init(spa_t *spa) > { > spa_stats_history_t *ssh = &spa->spa_stats.read_history; > - char name[KSTAT_STRLEN]; > + char *name; > kstat_t *ksp; > > mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL); > @@ -153,7 +153,7 @@ spa_read_history_init(spa_t *spa) > ssh->size = 0; > ssh->private = NULL; > > - (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa)); > + name = kmem_asprintf("zfs/%s", spa_name(spa)); > > ksp = kstat_create(name, 0, "reads", "misc", > KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL); > @@ -168,6 +168,7 @@ spa_read_history_init(spa_t *spa) > spa_read_history_data, spa_read_history_addr); > kstat_install(ksp); > } > + strfree(name); > } > > static void > @@ -365,7 +366,7 @@ static void > spa_txg_history_init(spa_t *spa) > { > spa_stats_history_t *ssh = &spa->spa_stats.txg_history; > - char name[KSTAT_STRLEN]; > + char *name; > kstat_t *ksp; > > mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL); > @@ -376,7 +377,7 @@ spa_txg_history_init(spa_t *spa) > ssh->size = 0; > ssh->private = NULL; > > - (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa)); > + name = kmem_asprintf("zfs/%s", spa_name(spa)); > > ksp = kstat_create(name, 0, "txgs", "misc", > KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL); > @@ -391,6 +392,7 @@ spa_txg_history_init(spa_t *spa) > spa_txg_history_data, spa_txg_history_addr); > kstat_install(ksp); > } > + strfree(name); > } > > static void > @@ -598,7 +600,7 @@ static void > 
spa_tx_assign_init(spa_t *spa) > { > spa_stats_history_t *ssh = &spa->spa_stats.tx_assign_histogram; > - char name[KSTAT_STRLEN]; > + char *name; > kstat_named_t *ks; > kstat_t *ksp; > int i; > @@ -609,7 +611,7 @@ spa_tx_assign_init(spa_t *spa) > ssh->size = ssh->count * sizeof (kstat_named_t); > ssh->private = kmem_alloc(ssh->size, KM_SLEEP); > > - (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa)); > + name = kmem_asprintf("zfs/%s", spa_name(spa)); > > for (i = 0; i < ssh->count; i++) { > ks = &((kstat_named_t *)ssh->private)[i]; > @@ -632,6 +634,7 @@ spa_tx_assign_init(spa_t *spa) > ksp->ks_update = spa_tx_assign_update; > kstat_install(ksp); > } > + strfree(name); > } > > static void > @@ -678,12 +681,12 @@ static void > spa_io_history_init(spa_t *spa) > { > spa_stats_history_t *ssh = &spa->spa_stats.io_history; > - char name[KSTAT_STRLEN]; > + char *name; > kstat_t *ksp; > > mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL); > > - (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa)); > + name = kmem_asprintf("zfs/%s", spa_name(spa)); > > ksp = kstat_create(name, 0, "io", "disk", KSTAT_TYPE_IO, 1, 0); > ssh->kstat = ksp; > @@ -694,6 +697,7 @@ spa_io_history_init(spa_t *spa) > ksp->ks_update = spa_io_history_update; > kstat_install(ksp); > } > + strfree(name); > } > > static void > @@ -806,7 +810,7 @@ static void > spa_mmp_history_init(spa_t *spa) > { > spa_stats_history_t *ssh = &spa->spa_stats.mmp_history; > - char name[KSTAT_STRLEN]; > + char *name; > kstat_t *ksp; > > mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL); > @@ -817,7 +821,7 @@ spa_mmp_history_init(spa_t *spa) > ssh->size = 0; > ssh->private = NULL; > > - (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa)); > + name = kmem_asprintf("zfs/%s", spa_name(spa)); > > ksp = kstat_create(name, 0, "multihost", "misc", > KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL); > @@ -832,6 +836,7 @@ spa_mmp_history_init(spa_t *spa) > spa_mmp_history_data, spa_mmp_history_addr); > 
kstat_install(ksp); > } > + strfree(name); > } > > static void > diff --git a/zfs/module/zfs/vdev_disk.c b/zfs/module/zfs/vdev_disk.c > index 5ae50a3..aecc351 100644 > --- a/zfs/module/zfs/vdev_disk.c > +++ b/zfs/module/zfs/vdev_disk.c > @@ -98,7 +98,7 @@ static void > vdev_disk_error(zio_t *zio) > { > #ifdef ZFS_DEBUG > - printk("ZFS: zio error=%d type=%d offset=%llu size=%llu " > + printk(KERN_WARNING "ZFS: zio error=%d type=%d offset=%llu size=%llu " > "flags=%x\n", zio->io_error, zio->io_type, > (u_longlong_t)zio->io_offset, (u_longlong_t)zio->io_size, > zio->io_flags); > diff --git a/zfs/module/zfs/vdev_mirror.c b/zfs/module/zfs/vdev_mirror.c > index 0439e4b..d230b4d 100644 > --- a/zfs/module/zfs/vdev_mirror.c > +++ b/zfs/module/zfs/vdev_mirror.c > @@ -116,7 +116,8 @@ static const zio_vsd_ops_t vdev_mirror_vsd_ops = { > static int > vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset) > { > - uint64_t lastoffset; > + uint64_t last_offset; > + int64_t offset_diff; > int load; > > /* All DVAs have equal weight at the root. */ > @@ -129,13 +130,17 @@ vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset) > * worse overall when resilvering with compared to without. > */ > > + /* Fix zio_offset for leaf vdevs */ > + if (vd->vdev_ops->vdev_op_leaf) > + zio_offset += VDEV_LABEL_START_SIZE; > + > /* Standard load based on pending queue length. */ > load = vdev_queue_length(vd); > - lastoffset = vdev_queue_lastoffset(vd); > + last_offset = vdev_queue_last_offset(vd); > > if (vd->vdev_nonrot) { > /* Non-rotating media. */ > - if (lastoffset == zio_offset) > + if (last_offset == zio_offset) > return (load + zfs_vdev_mirror_non_rotating_inc); > > /* > @@ -148,16 +153,16 @@ vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset) > } > > /* Rotating media I/O's which directly follow the last I/O. 
*/ > - if (lastoffset == zio_offset) > + if (last_offset == zio_offset) > return (load + zfs_vdev_mirror_rotating_inc); > > /* > * Apply half the seek increment to I/O's within seek offset > - * of the last I/O queued to this vdev as they should incur less > + * of the last I/O issued to this vdev as they should incur less > * of a seek increment. > */ > - if (ABS(lastoffset - zio_offset) < > - zfs_vdev_mirror_rotating_seek_offset) > + offset_diff = (int64_t)(last_offset - zio_offset); > + if (ABS(offset_diff) < zfs_vdev_mirror_rotating_seek_offset) > return (load + (zfs_vdev_mirror_rotating_seek_inc / 2)); > > /* Apply the full seek increment to all other I/O's. */ > @@ -382,29 +387,20 @@ vdev_mirror_child_select(zio_t *zio) > mm->mm_preferred_cnt++; > } > > - if (mm->mm_preferred_cnt == 1) { > - vdev_queue_register_lastoffset( > - mm->mm_child[mm->mm_preferred[0]].mc_vd, zio); > + if (mm->mm_preferred_cnt == 1) > return (mm->mm_preferred[0]); > - } > > - if (mm->mm_preferred_cnt > 1) { > - int c = vdev_mirror_preferred_child_randomize(zio); > > - vdev_queue_register_lastoffset(mm->mm_child[c].mc_vd, zio); > - return (c); > - } > + if (mm->mm_preferred_cnt > 1) > + return (vdev_mirror_preferred_child_randomize(zio)); > > /* > * Every device is either missing or has this txg in its DTL. > * Look for any child we haven't already tried before giving up. 
> */ > for (c = 0; c < mm->mm_children; c++) { > - if (!mm->mm_child[c].mc_tried) { > - vdev_queue_register_lastoffset(mm->mm_child[c].mc_vd, > - zio); > + if (!mm->mm_child[c].mc_tried) > return (c); > - } > } > > /* > diff --git a/zfs/module/zfs/vdev_queue.c b/zfs/module/zfs/vdev_queue.c > index 6b3e872..40cba34 100644 > --- a/zfs/module/zfs/vdev_queue.c > +++ b/zfs/module/zfs/vdev_queue.c > @@ -393,7 +393,7 @@ vdev_queue_init(vdev_t *vd) > sizeof (zio_t), offsetof(struct zio, io_queue_node)); > } > > - vq->vq_lastoffset = 0; > + vq->vq_last_offset = 0; > } > > void > @@ -699,9 +699,8 @@ vdev_queue_io_to_issue(vdev_queue_t *vq) > */ > tree = vdev_queue_class_tree(vq, p); > vq->vq_io_search.io_timestamp = 0; > - vq->vq_io_search.io_offset = vq->vq_last_offset + 1; > - VERIFY3P(avl_find(tree, &vq->vq_io_search, > - &idx), ==, NULL); > + vq->vq_io_search.io_offset = vq->vq_last_offset - 1; > + VERIFY3P(avl_find(tree, &vq->vq_io_search, &idx), ==, NULL); > zio = avl_nearest(tree, idx, AVL_AFTER); > if (zio == NULL) > zio = avl_first(tree); > @@ -728,7 +727,7 @@ vdev_queue_io_to_issue(vdev_queue_t *vq) > } > > vdev_queue_pending_add(vq, zio); > - vq->vq_last_offset = zio->io_offset; > + vq->vq_last_offset = zio->io_offset + zio->io_size; > > return (zio); > } > @@ -806,7 +805,7 @@ vdev_queue_io_done(zio_t *zio) > } > > /* > - * As these three methods are only used for load calculations we're not > + * As these two methods are only used for load calculations we're not > * concerned if we get an incorrect value on 32bit platforms due to lack of > * vq_lock mutex use here, instead we prefer to keep it lock free for > * performance. 
> @@ -818,15 +817,9 @@ vdev_queue_length(vdev_t *vd) > } > > uint64_t > -vdev_queue_lastoffset(vdev_t *vd) > +vdev_queue_last_offset(vdev_t *vd) > { > - return (vd->vdev_queue.vq_lastoffset); > -} > - > -void > -vdev_queue_register_lastoffset(vdev_t *vd, zio_t *zio) > -{ > - vd->vdev_queue.vq_lastoffset = zio->io_offset + zio->io_size; > + return (vd->vdev_queue.vq_last_offset); > } > > #if defined(_KERNEL) && defined(HAVE_SPL) > diff --git a/zfs/module/zfs/zfs_acl.c b/zfs/module/zfs/zfs_acl.c > index 7ddedea..1fcfca0 100644 > --- a/zfs/module/zfs/zfs_acl.c > +++ b/zfs/module/zfs/zfs_acl.c > @@ -1323,6 +1323,7 @@ zfs_aclset_common(znode_t *zp, zfs_acl_t *aclp, cred_t *cr, dmu_tx_t *tx) > sa_bulk_attr_t bulk[5]; > uint64_t ctime[2]; > int count = 0; > + zfs_acl_phys_t acl_phys; > > mode = zp->z_mode; > > @@ -1369,7 +1370,6 @@ zfs_aclset_common(znode_t *zp, zfs_acl_t *aclp, cred_t *cr, dmu_tx_t *tx) > } else { /* Painful legacy way */ > zfs_acl_node_t *aclnode; > uint64_t off = 0; > - zfs_acl_phys_t acl_phys; > uint64_t aoid; > > if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_ZNODE_ACL(zfsvfs), > diff --git a/zfs/module/zfs/zfs_dir.c b/zfs/module/zfs/zfs_dir.c > index c6ee302..9a8bbcc 100644 > --- a/zfs/module/zfs/zfs_dir.c > +++ b/zfs/module/zfs/zfs_dir.c > @@ -977,11 +977,25 @@ zfs_link_destroy(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag, > * Indicate whether the directory is empty. Works with or without z_lock > * held, but can only be consider a hint in the latter case. Returns true > * if only "." and ".." remain and there's no work in progress. > + * > + * The internal ZAP size, rather than zp->z_size, needs to be checked since > + * some consumers (Lustre) do not strictly maintain an accurate SA_ZPL_SIZE. 
> */ > boolean_t > zfs_dirempty(znode_t *dzp) > { > - return (dzp->z_size == 2 && dzp->z_dirlocks == 0); > + zfsvfs_t *zfsvfs = ZTOZSB(dzp); > + uint64_t count; > + int error; > + > + if (dzp->z_dirlocks != NULL) > + return (B_FALSE); > + > + error = zap_count(zfsvfs->z_os, dzp->z_id, &count); > + if (error != 0 || count != 0) > + return (B_FALSE); > + > + return (B_TRUE); > } > > int > diff --git a/zfs/module/zfs/zfs_fm.c b/zfs/module/zfs/zfs_fm.c > index 3986b39..1c66ed6 100644 > --- a/zfs/module/zfs/zfs_fm.c > +++ b/zfs/module/zfs/zfs_fm.c > @@ -455,8 +455,8 @@ zfs_ereport_start(nvlist_t **ereport_out, nvlist_t **detector_out, > > typedef struct zfs_ecksum_info { > /* histograms of set and cleared bits by bit number in a 64-bit word */ > - uint16_t zei_histogram_set[sizeof (uint64_t) * NBBY]; > - uint16_t zei_histogram_cleared[sizeof (uint64_t) * NBBY]; > + uint32_t zei_histogram_set[sizeof (uint64_t) * NBBY]; > + uint32_t zei_histogram_cleared[sizeof (uint64_t) * NBBY]; > > /* inline arrays of bits set and cleared. 
*/ > uint64_t zei_bits_set[ZFM_MAX_INLINE]; > @@ -481,7 +481,7 @@ typedef struct zfs_ecksum_info { > } zfs_ecksum_info_t; > > static void > -update_histogram(uint64_t value_arg, uint16_t *hist, uint32_t *count) > +update_histogram(uint64_t value_arg, uint32_t *hist, uint32_t *count) > { > size_t i; > size_t bits = 0; > @@ -490,8 +490,7 @@ update_histogram(uint64_t value_arg, uint16_t *hist, uint32_t *count) > /* We store the bits in big-endian (largest-first) order */ > for (i = 0; i < 64; i++) { > if (value & (1ull << i)) { > - if (hist[63 - i] < UINT16_MAX) > - hist[63 - i]++; > + hist[63 - i]++; > ++bits; > } > } > @@ -649,6 +648,7 @@ annotate_ecksum(nvlist_t *ereport, zio_bad_cksum_t *info, > if (badabd == NULL || goodabd == NULL) > return (eip); > > + ASSERT3U(nui64s, <=, UINT32_MAX); > ASSERT3U(size, ==, nui64s * sizeof (uint64_t)); > ASSERT3U(size, <=, SPA_MAXBLOCKSIZE); > ASSERT3U(size, <=, UINT32_MAX); > @@ -759,10 +759,10 @@ annotate_ecksum(nvlist_t *ereport, zio_bad_cksum_t *info, > } else { > fm_payload_set(ereport, > FM_EREPORT_PAYLOAD_ZFS_BAD_SET_HISTOGRAM, > - DATA_TYPE_UINT16_ARRAY, > + DATA_TYPE_UINT32_ARRAY, > NBBY * sizeof (uint64_t), eip->zei_histogram_set, > FM_EREPORT_PAYLOAD_ZFS_BAD_CLEARED_HISTOGRAM, > - DATA_TYPE_UINT16_ARRAY, > + DATA_TYPE_UINT32_ARRAY, > NBBY * sizeof (uint64_t), eip->zei_histogram_cleared, > NULL); > } > diff --git a/zfs/module/zfs/zfs_ioctl.c b/zfs/module/zfs/zfs_ioctl.c > index d195ede..f41e1b9 100644 > --- a/zfs/module/zfs/zfs_ioctl.c > +++ b/zfs/module/zfs/zfs_ioctl.c > @@ -3738,9 +3738,12 @@ zfs_ioc_rename(zfs_cmd_t *zc) > boolean_t recursive = zc->zc_cookie & 1; > char *at; > > + /* "zfs rename" from and to ...%recv datasets should both fail */ > + zc->zc_name[sizeof (zc->zc_name) - 1] = '\0'; > zc->zc_value[sizeof (zc->zc_value) - 1] = '\0'; > - if (dataset_namecheck(zc->zc_value, NULL, NULL) != 0 || > - strchr(zc->zc_value, '%')) > + if (dataset_namecheck(zc->zc_name, NULL, NULL) != 0 || > + 
dataset_namecheck(zc->zc_value, NULL, NULL) != 0 || > + strchr(zc->zc_name, '%') || strchr(zc->zc_value, '%')) > return (SET_ERROR(EINVAL)); > > at = strchr(zc->zc_name, '@'); > @@ -5002,6 +5005,11 @@ zfs_ioc_promote(zfs_cmd_t *zc) > char *cp; > int error; > > + zc->zc_name[sizeof (zc->zc_name) - 1] = '\0'; > + if (dataset_namecheck(zc->zc_name, NULL, NULL) != 0 || > + strchr(zc->zc_name, '%')) > + return (SET_ERROR(EINVAL)); > + > error = dsl_pool_hold(zc->zc_name, FTAG, &dp); > if (error != 0) > return (error); > @@ -5901,20 +5909,26 @@ static int > zfs_ioc_pool_sync(const char *pool, nvlist_t *innvl, nvlist_t *onvl) > { > int err; > - boolean_t force; > + boolean_t force = B_FALSE; > spa_t *spa; > > if ((err = spa_open(pool, &spa, FTAG)) != 0) > return (err); > > - force = fnvlist_lookup_boolean_value(innvl, "force"); > + if (innvl) { > + if (nvlist_lookup_boolean_value(innvl, "force", &force) != 0) { > + err = SET_ERROR(EINVAL); > + goto out; > + } > + } > + > if (force) { > spa_config_enter(spa, SCL_CONFIG, FTAG, RW_WRITER); > vdev_config_dirty(spa->spa_root_vdev); > spa_config_exit(spa, SCL_CONFIG, FTAG); > } > txg_wait_synced(spa_get_dsl(spa), 0); > - > +out: > spa_close(spa, FTAG); > > return (err); > diff --git a/zfs/module/zfs/zil.c b/zfs/module/zfs/zil.c > index 4d714ce..1e3e69d 100644 > --- a/zfs/module/zfs/zil.c > +++ b/zfs/module/zfs/zil.c > @@ -1009,7 +1009,24 @@ zil_lwb_write_start(zilog_t *zilog, lwb_t *lwb) > * to clean up in the event of allocation failure or I/O failure. > */ > tx = dmu_tx_create(zilog->zl_os); > - VERIFY(dmu_tx_assign(tx, TXG_WAIT) == 0); > + > + /* > + * Since we are not going to create any new dirty data and we can even > + * help with clearing the existing dirty data, we should not be subject > + * to the dirty data based delays. > + * We (ab)use TXG_WAITED to bypass the delay mechanism. > + * One side effect from using TXG_WAITED is that dmu_tx_assign() can > + * fail if the pool is suspended. 
Those are dramatic circumstances, > + * so we return NULL to signal that the normal ZIL processing is not > + * possible and txg_wait_synced() should be used to ensure that the data > + * is on disk. > + */ > + error = dmu_tx_assign(tx, TXG_WAITED); > + if (error != 0) { > + ASSERT3S(error, ==, EIO); > + dmu_tx_abort(tx); > + return (NULL); > + } > dsl_dataset_dirty(dmu_objset_ds(zilog->zl_os), tx); > txg = dmu_tx_get_txg(tx); > > @@ -1435,8 +1452,7 @@ zil_clean(zilog_t *zilog, uint64_t synced_txg) > return; > } > ASSERT3U(itxg->itxg_txg, <=, synced_txg); > - ASSERT(itxg->itxg_txg != 0); > - ASSERT(zilog->zl_clean_taskq != NULL); > + ASSERT3U(itxg->itxg_txg, !=, 0); > clean_me = itxg->itxg_itxs; > itxg->itxg_itxs = NULL; > itxg->itxg_txg = 0; > @@ -1447,8 +1463,11 @@ zil_clean(zilog_t *zilog, uint64_t synced_txg) > * free it in-line. This should be rare. Note, using TQ_SLEEP > * created a bad performance problem. > */ > - if (taskq_dispatch(zilog->zl_clean_taskq, > - (void (*)(void *))zil_itxg_clean, clean_me, TQ_NOSLEEP) == 0) > + ASSERT3P(zilog->zl_dmu_pool, !=, NULL); > + ASSERT3P(zilog->zl_dmu_pool->dp_zil_clean_taskq, !=, NULL); > + taskqid_t id = taskq_dispatch(zilog->zl_dmu_pool->dp_zil_clean_taskq, > + (void (*)(void *))zil_itxg_clean, clean_me, TQ_NOSLEEP); > + if (id == TASKQID_INVALID) > zil_itxg_clean(clean_me); > } > > @@ -1921,13 +1940,10 @@ zil_open(objset_t *os, zil_get_data_t *get_data) > { > zilog_t *zilog = dmu_objset_zil(os); > > - ASSERT(zilog->zl_clean_taskq == NULL); > ASSERT(zilog->zl_get_data == NULL); > ASSERT(list_is_empty(&zilog->zl_lwb_list)); > > zilog->zl_get_data = get_data; > - zilog->zl_clean_taskq = taskq_create("zil_clean", 1, defclsyspri, > - 2, 2, TASKQ_PREPOPULATE); > > return (zilog); > } > @@ -1962,8 +1978,6 @@ zil_close(zilog_t *zilog) > if (txg < spa_freeze_txg(zilog->zl_spa)) > VERIFY(!zilog_is_dirty(zilog)); > > - taskq_destroy(zilog->zl_clean_taskq); > - zilog->zl_clean_taskq = NULL; > zilog->zl_get_data = NULL; > > /* 
> diff --git a/zfs/module/zfs/zle.c b/zfs/module/zfs/zle.c > index 13c5673..613607f 100644 > --- a/zfs/module/zfs/zle.c > +++ b/zfs/module/zfs/zle.c > @@ -74,10 +74,14 @@ zle_decompress(void *s_start, void *d_start, size_t s_len, size_t d_len, int n) > while (src < s_end && dst < d_end) { > int len = 1 + *src++; > if (len <= n) { > + if (src + len > s_end || dst + len > d_end) > + return (-1); > while (len-- != 0) > *dst++ = *src++; > } else { > len -= n; > + if (dst + len > d_end) > + return (-1); > while (len-- != 0) > *dst++ = 0; > } > diff --git a/zfs/module/zfs/zvol.c b/zfs/module/zfs/zvol.c > index 5293f95..5b62bf94 100644 > --- a/zfs/module/zfs/zvol.c > +++ b/zfs/module/zfs/zvol.c > @@ -1347,9 +1347,9 @@ zvol_open(struct block_device *bdev, fmode_t flag) > { > zvol_state_t *zv; > int error = 0; > - boolean_t drop_suspend = B_FALSE; > + boolean_t drop_suspend = B_TRUE; > > - ASSERT(!mutex_owned(&zvol_state_lock)); > + ASSERT(!MUTEX_HELD(&zvol_state_lock)); > > mutex_enter(&zvol_state_lock); > /* > @@ -1364,23 +1364,31 @@ zvol_open(struct block_device *bdev, fmode_t flag) > return (SET_ERROR(-ENXIO)); > } > > - /* take zv_suspend_lock before zv_state_lock */ > - rw_enter(&zv->zv_suspend_lock, RW_READER); > - > mutex_enter(&zv->zv_state_lock); > - > /* > * make sure zvol is not suspended during first open > - * (hold zv_suspend_lock), otherwise, drop the lock > + * (hold zv_suspend_lock) and respect proper lock acquisition > + * ordering - zv_suspend_lock before zv_state_lock > */ > if (zv->zv_open_count == 0) { > - drop_suspend = B_TRUE; > + if (!rw_tryenter(&zv->zv_suspend_lock, RW_READER)) { > + mutex_exit(&zv->zv_state_lock); > + rw_enter(&zv->zv_suspend_lock, RW_READER); > + mutex_enter(&zv->zv_state_lock); > + /* check to see if zv_suspend_lock is needed */ > + if (zv->zv_open_count != 0) { > + rw_exit(&zv->zv_suspend_lock); > + drop_suspend = B_FALSE; > + } > + } > } else { > - rw_exit(&zv->zv_suspend_lock); > + drop_suspend = B_FALSE; > } > - > 
mutex_exit(&zvol_state_lock); > > + ASSERT(MUTEX_HELD(&zv->zv_state_lock)); > + ASSERT(zv->zv_open_count != 0 || RW_READ_HELD(&zv->zv_suspend_lock)); > > if (zv->zv_open_count == 0) { > error = zvol_first_open(zv); > if (error) > @@ -1417,28 +1425,38 @@ static int > zvol_release(struct gendisk *disk, fmode_t mode) > { > zvol_state_t *zv; > - boolean_t drop_suspend = B_FALSE; > + boolean_t drop_suspend = B_TRUE; > > - ASSERT(!mutex_owned(&zvol_state_lock)); > + ASSERT(!MUTEX_HELD(&zvol_state_lock)); > > mutex_enter(&zvol_state_lock); > zv = disk->private_data; > - ASSERT(zv && zv->zv_open_count > 0); > - > - /* take zv_suspend_lock before zv_state_lock */ > - rw_enter(&zv->zv_suspend_lock, RW_READER); > > mutex_enter(&zv->zv_state_lock); > - mutex_exit(&zvol_state_lock); > - > + ASSERT(zv->zv_open_count > 0); > /* > * make sure zvol is not suspended during last close > - * (hold zv_suspend_lock), otherwise, drop the lock > + * (hold zv_suspend_lock) and respect proper lock acquisition > + * ordering - zv_suspend_lock before zv_state_lock > */ > - if (zv->zv_open_count == 1) > - drop_suspend = B_TRUE; > - else > - rw_exit(&zv->zv_suspend_lock); > + if (zv->zv_open_count == 1) { > + if (!rw_tryenter(&zv->zv_suspend_lock, RW_READER)) { > + mutex_exit(&zv->zv_state_lock); > + rw_enter(&zv->zv_suspend_lock, RW_READER); > + mutex_enter(&zv->zv_state_lock); > + /* check to see if zv_suspend_lock is needed */ > + if (zv->zv_open_count != 1) { > + rw_exit(&zv->zv_suspend_lock); > + drop_suspend = B_FALSE; > + } > + } > + } else { > + drop_suspend = B_FALSE; > + } > + mutex_exit(&zvol_state_lock); > + > + ASSERT(MUTEX_HELD(&zv->zv_state_lock)); > + ASSERT(zv->zv_open_count != 1 || RW_READ_HELD(&zv->zv_suspend_lock)); > > zv->zv_open_count--; > if (zv->zv_open_count == 0)

Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
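The zle.c hunk quoted above ("Fix zle_decompress out of bound access") makes zle_decompress() return -1 instead of copying a run that would read past the end of the source buffer or write past the end of the destination. Below is a minimal, self-contained sketch of that decoding loop. The function name zle_decompress_sketch, the `assert(n > 0)` precondition, the use of remaining-length comparisons instead of `src + len > s_end`, and the final "output filled exactly" success check are assumptions of this sketch, not the ZFS source.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of ZLE (zero-length encoding) decompression with
 * the bounds checks from the zle.c hunk. Each control byte c gives
 * len = 1 + c. If len <= n, the next len input bytes are copied
 * literally; otherwise len - n zero bytes are written.
 * Assumption: success means the output buffer was filled exactly.
 */
int
zle_decompress_sketch(const unsigned char *src, unsigned char *dst,
    size_t s_len, size_t d_len, int n)
{
	const unsigned char *s_end = src + s_len;
	unsigned char *d_end = dst + d_len;

	assert(n > 0);	/* precondition: literal-run threshold */

	while (src < s_end && dst < d_end) {
		size_t len = 1 + (size_t)*src++;

		if (len <= (size_t)n) {
			/* Literal run: input and output must both hold len bytes. */
			if (len > (size_t)(s_end - src) ||
			    len > (size_t)(d_end - dst))
				return (-1);
			while (len-- != 0)
				*dst++ = *src++;
		} else {
			/* Zero run of len - n bytes: only the output needs room. */
			len -= (size_t)n;
			if (len > (size_t)(d_end - dst))
				return (-1);
			while (len-- != 0)
				*dst++ = 0;
		}
	}
	return (dst == d_end ? 0 : -1);
}
```

For example, with n = 64 the input bytes {1, 'A', 'B', 65} decode into a 4-byte buffer as "AB" followed by two zero bytes and return 0. The input {5, 'A'} returns -1, because its control byte promises a six-byte literal run but only one input byte follows. Comparing lengths against the space remaining, rather than forming `src + len`, also avoids computing a pointer more than one past the end of an array, which is undefined behaviour in C.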
On 17/04/18 12:04, Colin King wrote: > From: Colin Ian King <colin.king@canonical.com> > > BugLink: http://bugs.launchpad.net/bugs/1764690 > > This sync's SRU fixes in ZFS 0.7.5-1ubuntu15 to the kernel ZFS driver. > Fixes zfsonlinux issues fix in upstream ZFS repository: > > - OpenZFS 8373 - TXG_WAIT in ZIL commit path > Closes zfsonlinux #6403 > - zfs promote|rename .../%recv should be an error > Closes zfsonlinux #4843, #6339 > - Fix parsable 'zfs get' for compressratios > Closes zfsonlinux #6436, #6449 > - Fix zpool events scripted mode tab separator > Closes zfsonlinux #6444, #6445 > - zv_suspend_lock in zvol_open()/zvol_release() > Closes zfsonlinux #6342 > - Allow longer SPA names in stats, allows bigger pool names > Closes zfsonlinux #6481 > - vdev_mirror: load balancing fixes > Closes zfsonlinux #6461 > - Fix zfs_ioc_pool_sync should not use fnvlist > Closes zfsonlinux #6529 > - OpenZFS 8375 - Kernel memory leak in nvpair code > Closes zfsonlinux #6578 > - OpenZFS 7261 - nvlist code should enforce name length limit > Closes zfsonlinux #6579 > - OpenZFS 5778 - nvpair_type_is_array() does not recognize > DATA_TYPE_INT8_ARRAY > Closes zfsonlinux #6580 > - dmu_objset: release bonus buffer in failure path > Closes zfsonlinux #6575 > - Fix false config_cache_write events > Closes zfsonlinux #6617 > - Fix printk() calls missing log level > Closes zfsonlinux #6672 > - Fix abdstats kstat on 32-bit systems > Closes zfsonlinux #6721 > - Relax ASSERT for #6526 > Closes zfsonlinux #6526 > - Fix coverity defects: 147480, 147584 (Logically dead code) > Closes zfsonlinux #6745 > - Fix coverity defects: CID 161388 (Resource Leak) > Closes zfsonlinux #6755 > - Use ashift=12 by default on SSDSC2BW48 disks > Closes zfsonlinux #6774 > - OpenZFS 8558, 8602 - lwp_create() returns EAGAIN > Closes zfsonlinux #6779 > - ZFS send fails to dump objects larger than 128PiB > Closes zfsonlinux #6760 > - Sort output of tunables in arc_summary.py > Closes zfsonlinux #6828 > - Fix data on 
evict_skips in arc_summary.py > Closes zfsonlinux #6882, #6883 > - Fix segfault in zpool iostat when adding VDEVs > Closes zfsonlinux #6748, #6872 > - ZTS: Fix create-o_ashift test case > Closes zfsonlinux #6924, #6877 > - Handle invalid options in arc_summary > Closes zfsonlinux #6983 > - Call commit callbacks from the tail of the list > Closes zfsonlinux #6986 > - Fix 'zpool add' handling of nested interior VDEVs > Closes zfsonlinux #6678, #6996 > - Fix -fsanitize=address memory leak > kmem_alloc(0, ...) in userspace returns a leakable pointer. > Closes zfsonlinux #6941 > - Revert raidz_map and _col structure types > Closes zfsonlinux #6981, #7023 > - Use zap_count instead of cached z_size for unlink > Closes zfsonlinux #7019 > - OpenZFS 8897 - zpool online -e fails assertion when run on non-leaf > vdevs > Closes zfsonlinux #7030 > - OpenZFS 8898 - creating fs with checksum=skein on the boot pools > fails ungracefully > Closes zfsonlinux #7031 > - Emit an error message before MMP suspends pool > Closes zfsonlinux #7048 > - OpenZFS 8641 - "zpool clear" and "zinject" don't work on "spare" > or "replacing" vdevs > Closes zfsonlinux #7060 > - OpenZFS 8835 - Speculative prefetch in ZFS not working for > misaligned reads > Closes zfsonlinux #7062 > - OpenZFS 8972 - zfs holds: In scripted mode, do not pad columns with > spaces > Closes zfsonlinux #7063 > - Revert "Remove wrong ASSERT in annotate_ecksum" > Closes zfsonlinux #7079 > - OpenZFS 8731 - ASSERT3U(nui64s, <=, UINT16_MAX) fails for large > blocks > Closes zfsonlinux #7079 > - Prevent zdb(8) from occasionally hanging on I/O > Closes zfsonlinux #6999 > - Fix 'zfs receive -o' when used with '-e|-d' > Closes zfsonlinux #7088 > - Change movaps to movups in AES-NI code > Closes zfsonlinux #7065, #7108 > - tx_waited -> tx_dirty_delayed in trace_dmu.h > Closes zfsonlinux #7096 > - OpenZFS 8966 - Source file zfs_acl.c, function > Closes zfsonlinux #7141 > - Fix zdb -c traverse stop on damaged objset root > Closes 
zfsonlinux #7099 > - Fix zle_decompress out of bound access > Closes zfsonlinux #7099 > - Fix racy assignment of zcb.zcb_haderrors > Closes zfsonlinux #7099 > - Fix zdb -R decompression > Closes zfsonlinux #7099, #4984 > - Fix zdb -E segfault > Closes zfsonlinux #7099 > - Fix zdb -ed on objset for exported pool > Closes zfsonlinux #7099, #6464 > > Signed-off-by: Colin Ian King <colin.king@canonical.com> > --- > zfs/META | 2 +- > zfs/include/sys/dmu.h | 5 ++ > zfs/include/sys/dmu_tx.h | 4 -- > zfs/include/sys/dsl_pool.h | 1 + > zfs/include/sys/trace_dmu.h | 11 ++-- > zfs/include/sys/vdev.h | 3 +- > zfs/include/sys/vdev_impl.h | 1 - > zfs/include/sys/vdev_raidz_impl.h | 34 +++++------ > zfs/include/sys/zil_impl.h | 1 - > zfs/module/icp/asm-x86_64/aes/aes_intel.S | 94 ++++++++++++++--------------- > zfs/module/icp/asm-x86_64/modes/gcm_intel.S | 2 +- > zfs/module/icp/spi/kcf_spi.c | 11 ++-- > zfs/module/nvpair/nvpair.c | 9 ++- > zfs/module/zfs/abd.c | 4 +- > zfs/module/zfs/bpobj.c | 4 +- > zfs/module/zfs/dmu.c | 2 +- > zfs/module/zfs/dmu_objset.c | 1 + > zfs/module/zfs/dmu_send.c | 33 +++++----- > zfs/module/zfs/dmu_traverse.c | 17 +++++- > zfs/module/zfs/dmu_tx.c | 2 +- > zfs/module/zfs/dmu_zfetch.c | 24 ++++++-- > zfs/module/zfs/dsl_pool.c | 50 +++++++++++++++ > zfs/module/zfs/metaslab.c | 3 +- > zfs/module/zfs/mmp.c | 5 ++ > zfs/module/zfs/spa.c | 6 +- > zfs/module/zfs/spa_config.c | 5 ++ > zfs/module/zfs/spa_stats.c | 25 +++++--- > zfs/module/zfs/vdev_disk.c | 2 +- > zfs/module/zfs/vdev_mirror.c | 36 +++++------ > zfs/module/zfs/vdev_queue.c | 21 +++---- > zfs/module/zfs/zfs_acl.c | 2 +- > zfs/module/zfs/zfs_dir.c | 16 ++++- > zfs/module/zfs/zfs_fm.c | 14 ++--- > zfs/module/zfs/zfs_ioctl.c | 24 ++++++-- > zfs/module/zfs/zil.c | 34 ++++++++--- > zfs/module/zfs/zle.c | 4 ++ > zfs/module/zfs/zvol.c | 64 +++++++++++++------- > 37 files changed, 367 insertions(+), 209 deletions(-) > > diff --git a/zfs/META b/zfs/META > index d624ae4..2110eef 100644 > --- a/zfs/META > 
+++ b/zfs/META
> @@ -2,7 +2,7 @@ Meta: 1
> Name: zfs
> Branch: 1.0
> Version: 0.7.5
> -Release: 1ubuntu13
> +Release: 1ubuntu15
> Release-Tags: relext
> License: CDDL
> Author: OpenZFS on Linux
> diff --git a/zfs/include/sys/dmu.h b/zfs/include/sys/dmu.h
> index d246152..bcdf7d6 100644
> --- a/zfs/include/sys/dmu.h
> +++ b/zfs/include/sys/dmu.h
> @@ -713,11 +713,16 @@ void dmu_tx_mark_netfree(dmu_tx_t *tx);
> * to stable storage and will also be called if the dmu_tx is aborted.
> * If there is any error which prevents the transaction from being committed to
> * disk, the callback will be called with a value of error != 0.
> + *
> + * When multiple callbacks are registered to the transaction, the callbacks
> + * will be called in reverse order to let Lustre, the only user of commit
> + * callback currently, take the fast path of its commit callback handling.
> */
> typedef void dmu_tx_callback_func_t(void *dcb_data, int error);
>
> void dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *dcb_func,
> void *dcb_data);
> +void dmu_tx_do_callbacks(list_t *cb_list, int error);
>
> /*
> * Free up the data blocks for a defined range of a file. If size is
> diff --git a/zfs/include/sys/dmu_tx.h b/zfs/include/sys/dmu_tx.h
> index f16e1e8..d82a793 100644
> --- a/zfs/include/sys/dmu_tx.h
> +++ b/zfs/include/sys/dmu_tx.h
> @@ -145,10 +145,6 @@ uint64_t dmu_tx_get_txg(dmu_tx_t *tx);
> struct dsl_pool *dmu_tx_pool(dmu_tx_t *tx);
> void dmu_tx_wait(dmu_tx_t *tx);
>
> -void dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *dcb_func,
> - void *dcb_data);
> -void dmu_tx_do_callbacks(list_t *cb_list, int error);
> -
> /*
> * These routines are defined in dmu_spa.h, and are called by the SPA.
> */
> diff --git a/zfs/include/sys/dsl_pool.h b/zfs/include/sys/dsl_pool.h
> index d2dabda..7eb6cb0 100644
> --- a/zfs/include/sys/dsl_pool.h
> +++ b/zfs/include/sys/dsl_pool.h
> @@ -126,6 +126,7 @@ typedef struct dsl_pool {
> txg_list_t dp_dirty_dirs;
> txg_list_t dp_sync_tasks;
> taskq_t *dp_sync_taskq;
> + taskq_t *dp_zil_clean_taskq;
>
> /*
> * Protects administrative changes (properties, namespace)
> diff --git a/zfs/include/sys/trace_dmu.h b/zfs/include/sys/trace_dmu.h
> index 5ae59e5..24e57f5 100644
> --- a/zfs/include/sys/trace_dmu.h
> +++ b/zfs/include/sys/trace_dmu.h
> @@ -50,7 +50,7 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
> __field(uint64_t, tx_lastsnap_txg)
> __field(uint64_t, tx_lasttried_txg)
> __field(boolean_t, tx_anyobj)
> - __field(boolean_t, tx_waited)
> + __field(boolean_t, tx_dirty_delayed)
> __field(hrtime_t, tx_start)
> __field(boolean_t, tx_wait_dirty)
> __field(int, tx_err)
> @@ -62,7 +62,7 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
> __entry->tx_lastsnap_txg = tx->tx_lastsnap_txg;
> __entry->tx_lasttried_txg = tx->tx_lasttried_txg;
> __entry->tx_anyobj = tx->tx_anyobj;
> - __entry->tx_waited = tx->tx_waited;
> + __entry->tx_dirty_delayed = tx->tx_dirty_delayed;
> __entry->tx_start = tx->tx_start;
> __entry->tx_wait_dirty = tx->tx_wait_dirty;
> __entry->tx_err = tx->tx_err;
> @@ -70,11 +70,12 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class,
> __entry->min_tx_time = min_tx_time;
> ),
> TP_printk("tx { txg %llu lastsnap_txg %llu tx_lasttried_txg %llu "
> - "anyobj %d waited %d start %llu wait_dirty %d err %i "
> + "anyobj %d dirty_delayed %d start %llu wait_dirty %d err %i "
> "} dirty %llu min_tx_time %llu",
> __entry->tx_txg, __entry->tx_lastsnap_txg,
> - __entry->tx_lasttried_txg, __entry->tx_anyobj, __entry->tx_waited,
> - __entry->tx_start, __entry->tx_wait_dirty, __entry->tx_err,
> + __entry->tx_lasttried_txg, __entry->tx_anyobj,
> + __entry->tx_dirty_delayed, __entry->tx_start,
> + __entry->tx_wait_dirty, __entry->tx_err,
> __entry->dirty, __entry->min_tx_time)
> );
> /* END CSTYLED */
> diff --git a/zfs/include/sys/vdev.h b/zfs/include/sys/vdev.h
> index 7157ef4..473d269 100644
> --- a/zfs/include/sys/vdev.h
> +++ b/zfs/include/sys/vdev.h
> @@ -125,8 +125,7 @@ extern zio_t *vdev_queue_io(zio_t *zio);
> extern void vdev_queue_io_done(zio_t *zio);
>
> extern int vdev_queue_length(vdev_t *vd);
> -extern uint64_t vdev_queue_lastoffset(vdev_t *vd);
> -extern void vdev_queue_register_lastoffset(vdev_t *vd, zio_t *zio);
> +extern uint64_t vdev_queue_last_offset(vdev_t *vd);
>
> extern void vdev_config_dirty(vdev_t *vd);
> extern void vdev_config_clean(vdev_t *vd);
> diff --git a/zfs/include/sys/vdev_impl.h b/zfs/include/sys/vdev_impl.h
> index 7c5e54b..4c2e3cd 100644
> --- a/zfs/include/sys/vdev_impl.h
> +++ b/zfs/include/sys/vdev_impl.h
> @@ -127,7 +127,6 @@ struct vdev_queue {
> hrtime_t vq_io_delta_ts;
> zio_t vq_io_search; /* used as local for stack reduction */
> kmutex_t vq_lock;
> - uint64_t vq_lastoffset;
> };
>
> /*
> diff --git a/zfs/include/sys/vdev_raidz_impl.h b/zfs/include/sys/vdev_raidz_impl.h
> index 4bd15e3..0799ed1 100644
> --- a/zfs/include/sys/vdev_raidz_impl.h
> +++ b/zfs/include/sys/vdev_raidz_impl.h
> @@ -102,30 +102,30 @@ typedef struct raidz_impl_ops {
> } raidz_impl_ops_t;
>
> typedef struct raidz_col {
> - size_t rc_devidx; /* child device index for I/O */
> - size_t rc_offset; /* device offset */
> - size_t rc_size; /* I/O size */
> + uint64_t rc_devidx; /* child device index for I/O */
> + uint64_t rc_offset; /* device offset */
> + uint64_t rc_size; /* I/O size */
> abd_t *rc_abd; /* I/O data */
> void *rc_gdata; /* used to store the "good" version */
> int rc_error; /* I/O error for this device */
> - unsigned int rc_tried; /* Did we attempt this I/O column? */
> - unsigned int rc_skipped; /* Did we skip this I/O column? */
> + uint8_t rc_tried; /* Did we attempt this I/O column? */
> + uint8_t rc_skipped; /* Did we skip this I/O column?
*/
> } raidz_col_t;
>
> typedef struct raidz_map {
> - size_t rm_cols; /* Regular column count */
> - size_t rm_scols; /* Count including skipped columns */
> - size_t rm_bigcols; /* Number of oversized columns */
> - size_t rm_asize; /* Actual total I/O size */
> - size_t rm_missingdata; /* Count of missing data devices */
> - size_t rm_missingparity; /* Count of missing parity devices */
> - size_t rm_firstdatacol; /* First data column/parity count */
> - size_t rm_nskip; /* Skipped sectors for padding */
> - size_t rm_skipstart; /* Column index of padding start */
> + uint64_t rm_cols; /* Regular column count */
> + uint64_t rm_scols; /* Count including skipped columns */
> + uint64_t rm_bigcols; /* Number of oversized columns */
> + uint64_t rm_asize; /* Actual total I/O size */
> + uint64_t rm_missingdata; /* Count of missing data devices */
> + uint64_t rm_missingparity; /* Count of missing parity devices */
> + uint64_t rm_firstdatacol; /* First data column/parity count */
> + uint64_t rm_nskip; /* Skipped sectors for padding */
> + uint64_t rm_skipstart; /* Column index of padding start */
> abd_t *rm_abd_copy; /* rm_asize-buffer of copied data */
> - size_t rm_reports; /* # of referencing checksum reports */
> - unsigned int rm_freed; /* map no longer has referencing ZIO */
> - unsigned int rm_ecksuminjected; /* checksum error was injected */
> + uintptr_t rm_reports; /* # of referencing checksum reports */
> + uint8_t rm_freed; /* map no longer has referencing ZIO */
> + uint8_t rm_ecksuminjected; /* checksum error was injected */
> raidz_impl_ops_t *rm_ops; /* RAIDZ math operations */
> raidz_col_t rm_col[1]; /* Flexible array of I/O columns */
> } raidz_map_t;
> diff --git a/zfs/include/sys/zil_impl.h b/zfs/include/sys/zil_impl.h
> index 13ecca3..dd5304b 100644
> --- a/zfs/include/sys/zil_impl.h
> +++ b/zfs/include/sys/zil_impl.h
> @@ -124,7 +124,6 @@ struct zilog {
> list_t zl_lwb_list; /* in-flight log write list */
> kmutex_t zl_vdev_lock; /*
protects zl_vdev_tree */
> avl_tree_t zl_vdev_tree; /* vdevs to flush in zil_commit() */
> - taskq_t *zl_clean_taskq; /* runs lwb and itx clean tasks */
> avl_tree_t zl_bp_tree; /* track bps during log parse */
> clock_t zl_replay_time; /* lbolt of when replay started */
> uint64_t zl_replay_blks; /* number of log blocks replayed */
> diff --git a/zfs/module/icp/asm-x86_64/aes/aes_intel.S b/zfs/module/icp/asm-x86_64/aes/aes_intel.S
> index ed0df75..a40e30f 100644
> --- a/zfs/module/icp/asm-x86_64/aes/aes_intel.S
> +++ b/zfs/module/icp/asm-x86_64/aes/aes_intel.S
> @@ -207,7 +207,7 @@ _key_expansion_256a_local:
> shufps $0b10001100, %xmm0, %xmm4
> pxor %xmm4, %xmm0
> pxor %xmm1, %xmm0
> - movaps %xmm0, (%rcx)
> + movups %xmm0, (%rcx)
> add $0x10, %rcx
> ret
> nop
> @@ -224,18 +224,18 @@ _key_expansion_192a_local:
> pxor %xmm4, %xmm0
> pxor %xmm1, %xmm0
>
> - movaps %xmm2, %xmm5
> - movaps %xmm2, %xmm6
> + movups %xmm2, %xmm5
> + movups %xmm2, %xmm6
> pslldq $4, %xmm5
> pshufd $0b11111111, %xmm0, %xmm3
> pxor %xmm3, %xmm2
> pxor %xmm5, %xmm2
>
> - movaps %xmm0, %xmm1
> + movups %xmm0, %xmm1
> shufps $0b01000100, %xmm0, %xmm6
> - movaps %xmm6, (%rcx)
> + movups %xmm6, (%rcx)
> shufps $0b01001110, %xmm2, %xmm1
> - movaps %xmm1, 0x10(%rcx)
> + movups %xmm1, 0x10(%rcx)
> add $0x20, %rcx
> ret
> SET_SIZE(_key_expansion_192a)
> @@ -250,13 +250,13 @@ _key_expansion_192b_local:
> pxor %xmm4, %xmm0
> pxor %xmm1, %xmm0
>
> - movaps %xmm2, %xmm5
> + movups %xmm2, %xmm5
> pslldq $4, %xmm5
> pshufd $0b11111111, %xmm0, %xmm3
> pxor %xmm3, %xmm2
> pxor %xmm5, %xmm2
>
> - movaps %xmm0, (%rcx)
> + movups %xmm0, (%rcx)
> add $0x10, %rcx
> ret
> SET_SIZE(_key_expansion_192b)
> @@ -270,7 +270,7 @@ _key_expansion_256b_local:
> shufps $0b10001100, %xmm2, %xmm4
> pxor %xmm4, %xmm2
> pxor %xmm1, %xmm2
> - movaps %xmm2, (%rcx)
> + movups %xmm2, (%rcx)
> add $0x10, %rcx
> ret
> SET_SIZE(_key_expansion_256b)
> @@ -327,7 +327,7 @@ rijndael_key_setup_enc_intel_local:
> jz .Lenc_key_invalid_param
>
> movups (%USERCIPHERKEY), %xmm0 // user key (first 16 bytes)
> - movaps %xmm0, (%AESKEY)
> + movups %xmm0, (%AESKEY)
> lea 0x10(%AESKEY), %rcx // key addr
> pxor %xmm4, %xmm4 // xmm4 is assumed 0 in _key_expansion_x
>
> @@ -341,7 +341,7 @@ rijndael_key_setup_enc_intel_local:
> #endif /* OPENSSL_INTERFACE */
>
> movups 0x10(%USERCIPHERKEY), %xmm2 // other user key (2nd 16 bytes)
> - movaps %xmm2, (%rcx)
> + movups %xmm2, (%rcx)
> add $0x10, %rcx
>
> aeskeygenassist $0x1, %xmm2, %xmm1 // expand the key
> @@ -525,10 +525,10 @@ FRAME_BEGIN
>
> .align 4
> .Ldec_key_reorder_loop:
> - movaps (%AESKEY), %xmm0
> - movaps (%ROUNDS64), %xmm1
> - movaps %xmm0, (%ROUNDS64)
> - movaps %xmm1, (%AESKEY)
> + movups (%AESKEY), %xmm0
> + movups (%ROUNDS64), %xmm1
> + movups %xmm0, (%ROUNDS64)
> + movups %xmm1, (%AESKEY)
> lea 0x10(%AESKEY), %AESKEY
> lea -0x10(%ROUNDS64), %ROUNDS64
> cmp %AESKEY, %ROUNDS64
> @@ -536,11 +536,11 @@ FRAME_BEGIN
>
> .align 4
> .Ldec_key_inv_loop:
> - movaps (%rcx), %xmm0
> + movups (%rcx), %xmm0
> // Convert an encryption round key to a form usable for decryption
> // with the "AES Inverse Mix Columns" instruction
> aesimc %xmm0, %xmm1
> - movaps %xmm1, (%rcx)
> + movups %xmm1, (%rcx)
> lea 0x10(%rcx), %rcx
> cmp %ENDAESKEY, %rcx
> jnz .Ldec_key_inv_loop
> @@ -602,7 +602,7 @@ FRAME_BEGIN
> ENTRY_NP(aes_encrypt_intel)
>
> movups (%INP), %STATE // input
> - movaps (%KEYP), %KEY // key
> + movups (%KEYP), %KEY // key
> #ifdef OPENSSL_INTERFACE
> mov 240(%KEYP), %NROUNDS32 // round count
> #else /* OpenSolaris Interface */
> @@ -618,41 +618,41 @@ ENTRY_NP(aes_encrypt_intel)
>
> // AES 256
> lea 0x20(%KEYP), %KEYP
> - movaps -0x60(%KEYP), %KEY
> + movups -0x60(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps -0x50(%KEYP), %KEY
> + movups -0x50(%KEYP), %KEY
> aesenc %KEY, %STATE
>
> .align 4
> .Lenc192:
> // AES 192 and 256
> - movaps -0x40(%KEYP), %KEY
> + movups -0x40(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps -0x30(%KEYP), %KEY
> + movups -0x30(%KEYP),
%KEY
> aesenc %KEY, %STATE
>
> .align 4
> .Lenc128:
> // AES 128, 192, and 256
> - movaps -0x20(%KEYP), %KEY
> + movups -0x20(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps -0x10(%KEYP), %KEY
> + movups -0x10(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps (%KEYP), %KEY
> + movups (%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x10(%KEYP), %KEY
> + movups 0x10(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x20(%KEYP), %KEY
> + movups 0x20(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x30(%KEYP), %KEY
> + movups 0x30(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x40(%KEYP), %KEY
> + movups 0x40(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x50(%KEYP), %KEY
> + movups 0x50(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x60(%KEYP), %KEY
> + movups 0x60(%KEYP), %KEY
> aesenc %KEY, %STATE
> - movaps 0x70(%KEYP), %KEY
> + movups 0x70(%KEYP), %KEY
> aesenclast %KEY, %STATE // last round
> movups %STATE, (%OUTP) // output
>
> @@ -685,7 +685,7 @@ ENTRY_NP(aes_encrypt_intel)
> ENTRY_NP(aes_decrypt_intel)
>
> movups (%INP), %STATE // input
> - movaps (%KEYP), %KEY // key
> + movups (%KEYP), %KEY // key
> #ifdef OPENSSL_INTERFACE
> mov 240(%KEYP), %NROUNDS32 // round count
> #else /* OpenSolaris Interface */
> @@ -701,41 +701,41 @@ ENTRY_NP(aes_decrypt_intel)
>
> // AES 256
> lea 0x20(%KEYP), %KEYP
> - movaps -0x60(%KEYP), %KEY
> + movups -0x60(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps -0x50(%KEYP), %KEY
> + movups -0x50(%KEYP), %KEY
> aesdec %KEY, %STATE
>
> .align 4
> .Ldec192:
> // AES 192 and 256
> - movaps -0x40(%KEYP), %KEY
> + movups -0x40(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps -0x30(%KEYP), %KEY
> + movups -0x30(%KEYP), %KEY
> aesdec %KEY, %STATE
>
> .align 4
> .Ldec128:
> // AES 128, 192, and 256
> - movaps -0x20(%KEYP), %KEY
> + movups -0x20(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps -0x10(%KEYP), %KEY
> + movups -0x10(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps (%KEYP), %KEY
> + movups (%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x10(%KEYP), %KEY
> + movups 0x10(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x20(%KEYP), %KEY
> + movups 0x20(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x30(%KEYP), %KEY
> + movups 0x30(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x40(%KEYP), %KEY
> + movups 0x40(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x50(%KEYP), %KEY
> + movups 0x50(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x60(%KEYP), %KEY
> + movups 0x60(%KEYP), %KEY
> aesdec %KEY, %STATE
> - movaps 0x70(%KEYP), %KEY
> + movups 0x70(%KEYP), %KEY
> aesdeclast %KEY, %STATE // last round
> movups %STATE, (%OUTP) // output
>
> diff --git a/zfs/module/icp/asm-x86_64/modes/gcm_intel.S b/zfs/module/icp/asm-x86_64/modes/gcm_intel.S
> index a43b5eb..3aec0ee 100644
> --- a/zfs/module/icp/asm-x86_64/modes/gcm_intel.S
> +++ b/zfs/module/icp/asm-x86_64/modes/gcm_intel.S
> @@ -150,7 +150,7 @@ ENTRY_NP(gcm_mul_pclmulqdq)
> // Byte swap 16-byte input
> //
> lea .Lbyte_swap16_mask(%rip), %rax
> - movaps (%rax), %xmm10
> + movups (%rax), %xmm10
> pshufb %xmm10, %xmm0
> pshufb %xmm10, %xmm1
>
> diff --git a/zfs/module/icp/spi/kcf_spi.c b/zfs/module/icp/spi/kcf_spi.c
> index c2c2b54..0a6e38d 100644
> --- a/zfs/module/icp/spi/kcf_spi.c
> +++ b/zfs/module/icp/spi/kcf_spi.c
> @@ -111,7 +111,7 @@ int
> crypto_register_provider(crypto_provider_info_t *info,
> crypto_kcf_provider_handle_t *handle)
> {
> - char ks_name[KSTAT_STRLEN];
> + char *ks_name;
>
> kcf_provider_desc_t *prov_desc = NULL;
> int ret = CRYPTO_ARGUMENTS_BAD;
> @@ -238,12 +238,12 @@ crypto_register_provider(crypto_provider_info_t *info,
> * This kstat is deleted, when the provider unregisters.
> */
> if (prov_desc->pd_prov_type == CRYPTO_SW_PROVIDER) {
> - (void) snprintf(ks_name, KSTAT_STRLEN, "%s_%s",
> + ks_name = kmem_asprintf("%s_%s",
> "NONAME", "provider_stats");
> } else {
> - (void) snprintf(ks_name, KSTAT_STRLEN, "%s_%d_%u_%s",
> - "NONAME", 0,
> - prov_desc->pd_prov_id, "provider_stats");
> + ks_name = kmem_asprintf("%s_%d_%u_%s",
> + "NONAME", 0, prov_desc->pd_prov_id,
> + "provider_stats");
> }
>
> prov_desc->pd_kstat = kstat_create("kcf", 0, ks_name, "crypto",
> @@ -261,6 +261,7 @@ crypto_register_provider(crypto_provider_info_t *info,
> prov_desc->pd_kstat->ks_update = kcf_prov_kstat_update;
> kstat_install(prov_desc->pd_kstat);
> }
> + strfree(ks_name);
> }
>
> if (prov_desc->pd_prov_type == CRYPTO_HW_PROVIDER)
> diff --git a/zfs/module/nvpair/nvpair.c b/zfs/module/nvpair/nvpair.c
> index 249b7c9..abed33e 100644
> --- a/zfs/module/nvpair/nvpair.c
> +++ b/zfs/module/nvpair/nvpair.c
> @@ -21,7 +21,7 @@
>
> /*
> * Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
> - * Copyright (c) 2015, 2016 by Delphix. All rights reserved.
> + * Copyright (c) 2015, 2017 by Delphix. All rights reserved.
> */
>
> #include <sys/stropts.h>
> @@ -916,6 +916,8 @@ nvlist_add_common(nvlist_t *nvl, const char *name,
>
> /* calculate sizes of the nvpair elements and the nvpair itself */
> name_sz = strlen(name) + 1;
> + if (name_sz >= 1ULL << (sizeof (nvp->nvp_name_sz) * NBBY - 1))
> + return (EINVAL);
>
> nvp_sz = NVP_SIZE_CALC(name_sz, value_sz);
>
> @@ -1242,6 +1244,7 @@ nvpair_type_is_array(nvpair_t *nvp)
> data_type_t type = NVP_TYPE(nvp);
>
> if ((type == DATA_TYPE_BYTE_ARRAY) ||
> + (type == DATA_TYPE_INT8_ARRAY) ||
> (type == DATA_TYPE_UINT8_ARRAY) ||
> (type == DATA_TYPE_INT16_ARRAY) ||
> (type == DATA_TYPE_UINT16_ARRAY) ||
> @@ -2200,8 +2203,10 @@ nvs_embedded(nvstream_t *nvs, nvlist_t *embedded)
>
> nvlist_init(embedded, embedded->nvl_nvflag, priv);
>
> - if (nvs->nvs_recursion >= nvpair_max_recursion)
> + if (nvs->nvs_recursion >= nvpair_max_recursion) {
> + nvlist_free(embedded);
> return (EINVAL);
> + }
> nvs->nvs_recursion++;
> if ((err = nvs_operation(nvs, embedded, NULL)) != 0)
> nvlist_free(embedded);
> diff --git a/zfs/module/zfs/abd.c b/zfs/module/zfs/abd.c
> index 765ac7f..3c7893d 100644
> --- a/zfs/module/zfs/abd.c
> +++ b/zfs/module/zfs/abd.c
> @@ -571,7 +571,7 @@ static inline void
> abd_free_struct(abd_t *abd)
> {
> kmem_cache_free(abd_cache, abd);
> - ABDSTAT_INCR(abdstat_struct_size, -sizeof (abd_t));
> + ABDSTAT_INCR(abdstat_struct_size, -(int)sizeof (abd_t));
> }
>
> /*
> @@ -618,7 +618,7 @@ abd_free_scatter(abd_t *abd)
> ABDSTAT_BUMPDOWN(abdstat_scatter_cnt);
> ABDSTAT_INCR(abdstat_scatter_data_size, -(int)abd->abd_size);
> ABDSTAT_INCR(abdstat_scatter_chunk_waste,
> - abd->abd_size - P2ROUNDUP(abd->abd_size, PAGESIZE));
> + (int)abd->abd_size - (int)P2ROUNDUP(abd->abd_size, PAGESIZE));
>
> abd_free_struct(abd);
> }
> diff --git a/zfs/module/zfs/bpobj.c b/zfs/module/zfs/bpobj.c
> index 82ca94e..32459c9 100644
> --- a/zfs/module/zfs/bpobj.c
> +++ b/zfs/module/zfs/bpobj.c
> @@ -261,7 +261,7 @@ bpobj_iterate_impl(bpobj_t *bpo, bpobj_itor_t func,
void *arg, dmu_tx_t *tx,
> }
> if (free) {
> VERIFY3U(0, ==, dmu_free_range(bpo->bpo_os, bpo->bpo_object,
> - (i + 1) * sizeof (blkptr_t), -1ULL, tx));
> + (i + 1) * sizeof (blkptr_t), DMU_OBJECT_END, tx));
> }
> if (err || !bpo->bpo_havesubobj || bpo->bpo_phys->bpo_subobjs == 0)
> goto out;
> @@ -339,7 +339,7 @@ bpobj_iterate_impl(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx,
> if (free) {
> VERIFY3U(0, ==, dmu_free_range(bpo->bpo_os,
> bpo->bpo_phys->bpo_subobjs,
> - (i + 1) * sizeof (uint64_t), -1ULL, tx));
> + (i + 1) * sizeof (uint64_t), DMU_OBJECT_END, tx));
> }
>
> out:
> diff --git a/zfs/module/zfs/dmu.c b/zfs/module/zfs/dmu.c
> index 6f09aa2..05c9fc3 100644
> --- a/zfs/module/zfs/dmu.c
> +++ b/zfs/module/zfs/dmu.c
> @@ -887,7 +887,7 @@ dmu_free_range(objset_t *os, uint64_t object, uint64_t offset,
> if (err)
> return (err);
> ASSERT(offset < UINT64_MAX);
> - ASSERT(size == -1ULL || size <= UINT64_MAX - offset);
> + ASSERT(size == DMU_OBJECT_END || size <= UINT64_MAX - offset);
> dnode_free_range(dn, offset, size, tx);
> dnode_rele(dn, FTAG);
> return (0);
> diff --git a/zfs/module/zfs/dmu_objset.c b/zfs/module/zfs/dmu_objset.c
> index 9a7a696..3425d54 100644
> --- a/zfs/module/zfs/dmu_objset.c
> +++ b/zfs/module/zfs/dmu_objset.c
> @@ -1853,6 +1853,7 @@ dmu_objset_space_upgrade(objset_t *os)
> dmu_tx_hold_bonus(tx, obj);
> objerr = dmu_tx_assign(tx, TXG_WAIT);
> if (objerr != 0) {
> + dmu_buf_rele(db, FTAG);
> dmu_tx_abort(tx);
> continue;
> }
> diff --git a/zfs/module/zfs/dmu_send.c b/zfs/module/zfs/dmu_send.c
> index 344e420..2e3d706 100644
> --- a/zfs/module/zfs/dmu_send.c
> +++ b/zfs/module/zfs/dmu_send.c
> @@ -224,9 +224,6 @@ dump_free(dmu_sendarg_t *dsp, uint64_t object, uint64_t offset,
> (object == dsp->dsa_last_data_object &&
> offset > dsp->dsa_last_data_offset));
>
> - if (length != -1ULL && offset + length < offset)
> - length = -1ULL;
> -
> /*
> * If there is a pending op, but it's not PENDING_FREE, push it out,
> * since free block
aggregation can only be done for blocks of the
> @@ -243,19 +240,22 @@ dump_free(dmu_sendarg_t *dsp, uint64_t object, uint64_t offset,
>
> if (dsp->dsa_pending_op == PENDING_FREE) {
> /*
> - * There should never be a PENDING_FREE if length is -1
> - * (because dump_dnode is the only place where this
> - * function is called with a -1, and only after flushing
> - * any pending record).
> + * There should never be a PENDING_FREE if length is
> + * DMU_OBJECT_END (because dump_dnode is the only place where
> + * this function is called with a DMU_OBJECT_END, and only after
> + * flushing any pending record).
> */
> - ASSERT(length != -1ULL);
> + ASSERT(length != DMU_OBJECT_END);
> /*
> * Check to see whether this free block can be aggregated
> * with pending one.
> */
> if (drrf->drr_object == object && drrf->drr_offset +
> drrf->drr_length == offset) {
> - drrf->drr_length += length;
> + if (offset + length < offset)
> + drrf->drr_length = DMU_OBJECT_END;
> + else
> + drrf->drr_length += length;
> return (0);
> } else {
> /* not a continuation. Push out pending record */
> @@ -269,9 +269,12 @@ dump_free(dmu_sendarg_t *dsp, uint64_t object, uint64_t offset,
> dsp->dsa_drr->drr_type = DRR_FREE;
> drrf->drr_object = object;
> drrf->drr_offset = offset;
> - drrf->drr_length = length;
> + if (offset + length < offset)
> + drrf->drr_length = DMU_OBJECT_END;
> + else
> + drrf->drr_length = length;
> drrf->drr_toguid = dsp->dsa_toguid;
> - if (length == -1ULL) {
> + if (length == DMU_OBJECT_END) {
> if (dump_record(dsp, NULL, 0) != 0)
> return (SET_ERROR(EINTR));
> } else {
> @@ -530,7 +533,7 @@ dump_dnode(dmu_sendarg_t *dsp, uint64_t object, dnode_phys_t *dnp)
>
> /* Free anything past the end of the file.
*/
> if (dump_free(dsp, object, (dnp->dn_maxblkid + 1) *
> - (dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT), -1ULL) != 0)
> + (dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT), DMU_OBJECT_END) != 0)
> return (SET_ERROR(EINTR));
> if (dsp->dsa_err != 0)
> return (SET_ERROR(EINTR));
> @@ -666,7 +669,9 @@ do_dump(dmu_sendarg_t *dsa, struct send_block_record *data)
> } else if (BP_IS_HOLE(bp)) {
> uint64_t span = BP_SPAN(dblkszsec, indblkshift, zb->zb_level);
> uint64_t offset = zb->zb_blkid * span;
> - err = dump_free(dsa, zb->zb_object, offset, span);
> + /* Don't dump free records for offsets > DMU_OBJECT_END */
> + if (zb->zb_blkid == 0 || span <= DMU_OBJECT_END / zb->zb_blkid)
> + err = dump_free(dsa, zb->zb_object, offset, span);
> } else if (zb->zb_level > 0 || type == DMU_OT_OBJSET) {
> return (0);
> } else if (type == DMU_OT_DNODE) {
> @@ -2498,7 +2503,7 @@ receive_free(struct receive_writer_arg *rwa, struct drr_free *drrf)
> {
> int err;
>
> - if (drrf->drr_length != -1ULL &&
> + if (drrf->drr_length != DMU_OBJECT_END &&
> drrf->drr_offset + drrf->drr_length < drrf->drr_offset)
> return (SET_ERROR(EINVAL));
>
> diff --git a/zfs/module/zfs/dmu_traverse.c b/zfs/module/zfs/dmu_traverse.c
> index c78228d..62f770e 100644
> --- a/zfs/module/zfs/dmu_traverse.c
> +++ b/zfs/module/zfs/dmu_traverse.c
> @@ -609,9 +609,20 @@ traverse_impl(spa_t *spa, dsl_dataset_t *ds, uint64_t objset, blkptr_t *rootbp,
> if (err != 0)
> return (err);
>
> - osp = buf->b_data;
> - traverse_zil(td, &osp->os_zil_header);
> - arc_buf_destroy(buf, &buf);
> + if (err != 0) {
> + /*
> + * If both TRAVERSE_HARD and TRAVERSE_PRE are set,
> + * continue to visitbp so that td_func can be called
> + * in pre stage, and err will reset to zero.
> + */
> + if (!(td->td_flags & TRAVERSE_HARD) ||
> + !(td->td_flags & TRAVERSE_PRE))
> + return (err);
> + } else {
> + osp = buf->b_data;
> + traverse_zil(td, &osp->os_zil_header);
> + arc_buf_destroy(buf, &buf);
> + }
> }
>
> if (!(flags & TRAVERSE_PREFETCH_DATA) ||
> diff --git a/zfs/module/zfs/dmu_tx.c b/zfs/module/zfs/dmu_tx.c
> index 097fa77..c3cc03a 100644
> --- a/zfs/module/zfs/dmu_tx.c
> +++ b/zfs/module/zfs/dmu_tx.c
> @@ -1200,7 +1200,7 @@ dmu_tx_do_callbacks(list_t *cb_list, int error)
> {
> dmu_tx_callback_t *dcb;
>
> - while ((dcb = list_head(cb_list)) != NULL) {
> + while ((dcb = list_tail(cb_list)) != NULL) {
> list_remove(cb_list, dcb);
> dcb->dcb_func(dcb->dcb_data, error);
> kmem_free(dcb, sizeof (dmu_tx_callback_t));
> diff --git a/zfs/module/zfs/dmu_zfetch.c b/zfs/module/zfs/dmu_zfetch.c
> index 1bf5c4e..e72e9ef 100644
> --- a/zfs/module/zfs/dmu_zfetch.c
> +++ b/zfs/module/zfs/dmu_zfetch.c
> @@ -228,19 +228,33 @@ dmu_zfetch(zfetch_t *zf, uint64_t blkid, uint64_t nblks, boolean_t fetch_data)
>
> rw_enter(&zf->zf_rwlock, RW_READER);
>
> + /*
> + * Find matching prefetch stream. Depending on whether the accesses
> + * are block-aligned, first block of the new access may either follow
> + * the last block of the previous access, or be equal to it.
> + */
> for (zs = list_head(&zf->zf_stream); zs != NULL;
> zs = list_next(&zf->zf_stream, zs)) {
> - if (blkid == zs->zs_blkid) {
> + if (blkid == zs->zs_blkid || blkid + 1 == zs->zs_blkid) {
> mutex_enter(&zs->zs_lock);
> /*
> * zs_blkid could have changed before we
> * acquired zs_lock; re-check them here.
> */
> - if (blkid != zs->zs_blkid) {
> - mutex_exit(&zs->zs_lock);
> - continue;
> + if (blkid == zs->zs_blkid) {
> + break;
> + } else if (blkid + 1 == zs->zs_blkid) {
> + blkid++;
> + nblks--;
> + if (nblks == 0) {
> + /* Already prefetched this before.
*/
> + mutex_exit(&zs->zs_lock);
> + rw_exit(&zf->zf_rwlock);
> + return;
> + }
> + break;
> }
> - break;
> + mutex_exit(&zs->zs_lock);
> }
> }
>
> diff --git a/zfs/module/zfs/dsl_pool.c b/zfs/module/zfs/dsl_pool.c
> index c167080..0320d0e 100644
> --- a/zfs/module/zfs/dsl_pool.c
> +++ b/zfs/module/zfs/dsl_pool.c
> @@ -135,6 +135,36 @@ unsigned long zfs_delay_scale = 1000 * 1000 * 1000 / 2000;
> */
> int zfs_sync_taskq_batch_pct = 75;
>
> +/*
> + * These tunables determine the behavior of how zil_itxg_clean() is
> + * called via zil_clean() in the context of spa_sync(). When an itxg
> + * list needs to be cleaned, TQ_NOSLEEP will be used when dispatching.
> + * If the dispatch fails, the call to zil_itxg_clean() will occur
> + * synchronously in the context of spa_sync(), which can negatively
> + * impact the performance of spa_sync() (e.g. in the case of the itxg
> + * list having a large number of itxs that needs to be cleaned).
> + *
> + * Thus, these tunables can be used to manipulate the behavior of the
> + * taskq used by zil_clean(); they determine the number of taskq entries
> + * that are pre-populated when the taskq is first created (via the
> + * "zfs_zil_clean_taskq_minalloc" tunable) and the maximum number of
> + * taskq entries that are cached after an on-demand allocation (via the
> + * "zfs_zil_clean_taskq_maxalloc").
> + *
> + * The idea being, we want to try reasonably hard to ensure there will
> + * already be a taskq entry pre-allocated by the time that it is needed
> + * by zil_clean(). This way, we can avoid the possibility of an
> + * on-demand allocation of a new taskq entry from failing, which would
> + * result in zil_itxg_clean() being called synchronously from zil_clean()
> + * (which can adversely affect performance of spa_sync()).
> + *
> + * Additionally, the number of threads used by the taskq can be
> + * configured via the "zfs_zil_clean_taskq_nthr_pct" tunable.
> + */
> +int zfs_zil_clean_taskq_nthr_pct = 100;
> +int zfs_zil_clean_taskq_minalloc = 1024;
> +int zfs_zil_clean_taskq_maxalloc = 1024 * 1024;
> +
> int
> dsl_pool_open_special_dir(dsl_pool_t *dp, const char *name, dsl_dir_t **ddp)
> {
> @@ -176,6 +206,12 @@ dsl_pool_open_impl(spa_t *spa, uint64_t txg)
> zfs_sync_taskq_batch_pct, minclsyspri, 1, INT_MAX,
> TASKQ_THREADS_CPU_PCT);
>
> + dp->dp_zil_clean_taskq = taskq_create("dp_zil_clean_taskq",
> + zfs_zil_clean_taskq_nthr_pct, minclsyspri,
> + zfs_zil_clean_taskq_minalloc,
> + zfs_zil_clean_taskq_maxalloc,
> + TASKQ_PREPOPULATE | TASKQ_THREADS_CPU_PCT);
> +
> mutex_init(&dp->dp_lock, NULL, MUTEX_DEFAULT, NULL);
> cv_init(&dp->dp_spaceavail_cv, NULL, CV_DEFAULT, NULL);
>
> @@ -334,6 +370,7 @@ dsl_pool_close(dsl_pool_t *dp)
> txg_list_destroy(&dp->dp_sync_tasks);
> txg_list_destroy(&dp->dp_dirty_dirs);
>
> + taskq_destroy(dp->dp_zil_clean_taskq);
> taskq_destroy(dp->dp_sync_taskq);
>
> /*
> @@ -1142,5 +1179,18 @@ MODULE_PARM_DESC(zfs_delay_scale, "how quickly delay approaches infinity");
> module_param(zfs_sync_taskq_batch_pct, int, 0644);
> MODULE_PARM_DESC(zfs_sync_taskq_batch_pct,
> "max percent of CPUs that are used to sync dirty data");
> +
> +module_param(zfs_zil_clean_taskq_nthr_pct, int, 0644);
> +MODULE_PARM_DESC(zfs_zil_clean_taskq_nthr_pct,
> + "max percent of CPUs that are used per dp_sync_taskq");
> +
> +module_param(zfs_zil_clean_taskq_minalloc, int, 0644);
> +MODULE_PARM_DESC(zfs_zil_clean_taskq_minalloc,
> + "number of taskq entries that are pre-populated");
> +
> +module_param(zfs_zil_clean_taskq_maxalloc, int, 0644);
> +MODULE_PARM_DESC(zfs_zil_clean_taskq_maxalloc,
> + "max number of taskq entries that are cached");
> +
> /* END CSTYLED */
> #endif
> diff --git a/zfs/module/zfs/metaslab.c b/zfs/module/zfs/metaslab.c
> index 5e413c0..01e5234 100644
> --- a/zfs/module/zfs/metaslab.c
> +++ b/zfs/module/zfs/metaslab.c
> @@ -1937,7 +1937,8 @@ metaslab_passivate(metaslab_t *msp, uint64_t weight)
> *
this metaslab again. In that case, it had better be empty,
> * or we would be leaving space on the table.
> */
> - ASSERT(size >= SPA_MINBLOCKSIZE ||
> + ASSERT(!WEIGHT_IS_SPACEBASED(msp->ms_weight) ||
> + size >= SPA_MINBLOCKSIZE ||
> range_tree_space(msp->ms_tree) == 0);
> ASSERT0(weight & METASLAB_ACTIVE_MASK);
>
> diff --git a/zfs/module/zfs/mmp.c b/zfs/module/zfs/mmp.c
> index 6f2aa3f..e91ae62 100644
> --- a/zfs/module/zfs/mmp.c
> +++ b/zfs/module/zfs/mmp.c
> @@ -26,6 +26,7 @@
> #include <sys/mmp.h>
> #include <sys/spa.h>
> #include <sys/spa_impl.h>
> +#include <sys/time.h>
> #include <sys/vdev.h>
> #include <sys/vdev_impl.h>
> #include <sys/zfs_context.h>
> @@ -428,6 +429,10 @@ mmp_thread(spa_t *spa)
> */
> if (!suspended && mmp_fail_intervals && multihost &&
> (start - mmp->mmp_last_write) > max_fail_ns) {
> + cmn_err(CE_WARN, "MMP writes to pool '%s' have not "
> + "succeeded in over %llus; suspending pool",
> + spa_name(spa),
> + NSEC2SEC(start - mmp->mmp_last_write));
> zio_suspend(spa, NULL);
> }
>
> diff --git a/zfs/module/zfs/spa.c b/zfs/module/zfs/spa.c
> index a7a2f62..00587d8 100644
> --- a/zfs/module/zfs/spa.c
> +++ b/zfs/module/zfs/spa.c
> @@ -1561,7 +1561,7 @@ spa_load_spares(spa_t *spa)
> static void
> spa_load_l2cache(spa_t *spa)
> {
> - nvlist_t **l2cache;
> + nvlist_t **l2cache = NULL;
> uint_t nl2cache;
> int i, j, oldnvdevs;
> uint64_t guid;
> @@ -1645,7 +1645,9 @@ spa_load_l2cache(spa_t *spa)
> VERIFY(nvlist_remove(sav->sav_config, ZPOOL_CONFIG_L2CACHE,
> DATA_TYPE_NVLIST_ARRAY) == 0);
>
> - l2cache = kmem_alloc(sav->sav_count * sizeof (void *), KM_SLEEP);
> + if (sav->sav_count > 0)
> + l2cache = kmem_alloc(sav->sav_count * sizeof (void *),
> + KM_SLEEP);
> for (i = 0; i < sav->sav_count; i++)
> l2cache[i] = vdev_config_generate(spa,
> sav->sav_vdevs[i], B_TRUE, VDEV_CONFIG_L2CACHE);
> diff --git a/zfs/module/zfs/spa_config.c b/zfs/module/zfs/spa_config.c
> index 5b792b8..5bbfb4a 100644
> --- a/zfs/module/zfs/spa_config.c
> +++
b/zfs/module/zfs/spa_config.c
> @@ -162,6 +162,11 @@ spa_config_write(spa_config_dirent_t *dp, nvlist_t *nvl)
> */
> if (nvl == NULL) {
> err = vn_remove(dp->scd_path, UIO_SYSSPACE, RMFILE);
> + /*
> + * Don't report an error when the cache file is already removed
> + */
> + if (err == ENOENT)
> + err = 0;
> return (err);
> }
>
> diff --git a/zfs/module/zfs/spa_stats.c b/zfs/module/zfs/spa_stats.c
> index 7ca3598..8c4dba2 100644
> --- a/zfs/module/zfs/spa_stats.c
> +++ b/zfs/module/zfs/spa_stats.c
> @@ -142,7 +142,7 @@ static void
> spa_read_history_init(spa_t *spa)
> {
> spa_stats_history_t *ssh = &spa->spa_stats.read_history;
> - char name[KSTAT_STRLEN];
> + char *name;
> kstat_t *ksp;
>
> mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL);
> @@ -153,7 +153,7 @@ spa_read_history_init(spa_t *spa)
> ssh->size = 0;
> ssh->private = NULL;
>
> - (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa));
> + name = kmem_asprintf("zfs/%s", spa_name(spa));
>
> ksp = kstat_create(name, 0, "reads", "misc",
> KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL);
> @@ -168,6 +168,7 @@ spa_read_history_init(spa_t *spa)
> spa_read_history_data, spa_read_history_addr);
> kstat_install(ksp);
> }
> + strfree(name);
> }
>
> static void
> @@ -365,7 +366,7 @@ static void
> spa_txg_history_init(spa_t *spa)
> {
> spa_stats_history_t *ssh = &spa->spa_stats.txg_history;
> - char name[KSTAT_STRLEN];
> + char *name;
> kstat_t *ksp;
>
> mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL);
> @@ -376,7 +377,7 @@ spa_txg_history_init(spa_t *spa)
> ssh->size = 0;
> ssh->private = NULL;
>
> - (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa));
> + name = kmem_asprintf("zfs/%s", spa_name(spa));
>
> ksp = kstat_create(name, 0, "txgs", "misc",
> KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL);
> @@ -391,6 +392,7 @@ spa_txg_history_init(spa_t *spa)
> spa_txg_history_data, spa_txg_history_addr);
> kstat_install(ksp);
> }
> + strfree(name);
> }
>
> static void
> @@ -598,7 +600,7 @@ static void
spa_tx_assign_init(spa_t *spa) > { > spa_stats_history_t *ssh = &spa->spa_stats.tx_assign_histogram; > - char name[KSTAT_STRLEN]; > + char *name; > kstat_named_t *ks; > kstat_t *ksp; > int i; > @@ -609,7 +611,7 @@ spa_tx_assign_init(spa_t *spa) > ssh->size = ssh->count * sizeof (kstat_named_t); > ssh->private = kmem_alloc(ssh->size, KM_SLEEP); > > - (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa)); > + name = kmem_asprintf("zfs/%s", spa_name(spa)); > > for (i = 0; i < ssh->count; i++) { > ks = &((kstat_named_t *)ssh->private)[i]; > @@ -632,6 +634,7 @@ spa_tx_assign_init(spa_t *spa) > ksp->ks_update = spa_tx_assign_update; > kstat_install(ksp); > } > + strfree(name); > } > > static void > @@ -678,12 +681,12 @@ static void > spa_io_history_init(spa_t *spa) > { > spa_stats_history_t *ssh = &spa->spa_stats.io_history; > - char name[KSTAT_STRLEN]; > + char *name; > kstat_t *ksp; > > mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL); > > - (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa)); > + name = kmem_asprintf("zfs/%s", spa_name(spa)); > > ksp = kstat_create(name, 0, "io", "disk", KSTAT_TYPE_IO, 1, 0); > ssh->kstat = ksp; > @@ -694,6 +697,7 @@ spa_io_history_init(spa_t *spa) > ksp->ks_update = spa_io_history_update; > kstat_install(ksp); > } > + strfree(name); > } > > static void > @@ -806,7 +810,7 @@ static void > spa_mmp_history_init(spa_t *spa) > { > spa_stats_history_t *ssh = &spa->spa_stats.mmp_history; > - char name[KSTAT_STRLEN]; > + char *name; > kstat_t *ksp; > > mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL); > @@ -817,7 +821,7 @@ spa_mmp_history_init(spa_t *spa) > ssh->size = 0; > ssh->private = NULL; > > - (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa)); > + name = kmem_asprintf("zfs/%s", spa_name(spa)); > > ksp = kstat_create(name, 0, "multihost", "misc", > KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL); > @@ -832,6 +836,7 @@ spa_mmp_history_init(spa_t *spa) > spa_mmp_history_data, spa_mmp_history_addr); > 
kstat_install(ksp); > } > + strfree(name); > } > > static void > diff --git a/zfs/module/zfs/vdev_disk.c b/zfs/module/zfs/vdev_disk.c > index 5ae50a3..aecc351 100644 > --- a/zfs/module/zfs/vdev_disk.c > +++ b/zfs/module/zfs/vdev_disk.c > @@ -98,7 +98,7 @@ static void > vdev_disk_error(zio_t *zio) > { > #ifdef ZFS_DEBUG > - printk("ZFS: zio error=%d type=%d offset=%llu size=%llu " > + printk(KERN_WARNING "ZFS: zio error=%d type=%d offset=%llu size=%llu " > "flags=%x\n", zio->io_error, zio->io_type, > (u_longlong_t)zio->io_offset, (u_longlong_t)zio->io_size, > zio->io_flags); > diff --git a/zfs/module/zfs/vdev_mirror.c b/zfs/module/zfs/vdev_mirror.c > index 0439e4b..d230b4d 100644 > --- a/zfs/module/zfs/vdev_mirror.c > +++ b/zfs/module/zfs/vdev_mirror.c > @@ -116,7 +116,8 @@ static const zio_vsd_ops_t vdev_mirror_vsd_ops = { > static int > vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset) > { > - uint64_t lastoffset; > + uint64_t last_offset; > + int64_t offset_diff; > int load; > > /* All DVAs have equal weight at the root. */ > @@ -129,13 +130,17 @@ vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset) > * worse overall when resilvering with compared to without. > */ > > + /* Fix zio_offset for leaf vdevs */ > + if (vd->vdev_ops->vdev_op_leaf) > + zio_offset += VDEV_LABEL_START_SIZE; > + > /* Standard load based on pending queue length. */ > load = vdev_queue_length(vd); > - lastoffset = vdev_queue_lastoffset(vd); > + last_offset = vdev_queue_last_offset(vd); > > if (vd->vdev_nonrot) { > /* Non-rotating media. */ > - if (lastoffset == zio_offset) > + if (last_offset == zio_offset) > return (load + zfs_vdev_mirror_non_rotating_inc); > > /* > @@ -148,16 +153,16 @@ vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset) > } > > /* Rotating media I/O's which directly follow the last I/O. 
*/ > - if (lastoffset == zio_offset) > + if (last_offset == zio_offset) > return (load + zfs_vdev_mirror_rotating_inc); > > /* > * Apply half the seek increment to I/O's within seek offset > - * of the last I/O queued to this vdev as they should incur less > + * of the last I/O issued to this vdev as they should incur less > * of a seek increment. > */ > - if (ABS(lastoffset - zio_offset) < > - zfs_vdev_mirror_rotating_seek_offset) > + offset_diff = (int64_t)(last_offset - zio_offset); > + if (ABS(offset_diff) < zfs_vdev_mirror_rotating_seek_offset) > return (load + (zfs_vdev_mirror_rotating_seek_inc / 2)); > > /* Apply the full seek increment to all other I/O's. */ > @@ -382,29 +387,20 @@ vdev_mirror_child_select(zio_t *zio) > mm->mm_preferred_cnt++; > } > > - if (mm->mm_preferred_cnt == 1) { > - vdev_queue_register_lastoffset( > - mm->mm_child[mm->mm_preferred[0]].mc_vd, zio); > + if (mm->mm_preferred_cnt == 1) > return (mm->mm_preferred[0]); > - } > > - if (mm->mm_preferred_cnt > 1) { > - int c = vdev_mirror_preferred_child_randomize(zio); > > - vdev_queue_register_lastoffset(mm->mm_child[c].mc_vd, zio); > - return (c); > - } > + if (mm->mm_preferred_cnt > 1) > + return (vdev_mirror_preferred_child_randomize(zio)); > > /* > * Every device is either missing or has this txg in its DTL. > * Look for any child we haven't already tried before giving up. 
> */ > for (c = 0; c < mm->mm_children; c++) { > - if (!mm->mm_child[c].mc_tried) { > - vdev_queue_register_lastoffset(mm->mm_child[c].mc_vd, > - zio); > + if (!mm->mm_child[c].mc_tried) > return (c); > - } > } > > /* > diff --git a/zfs/module/zfs/vdev_queue.c b/zfs/module/zfs/vdev_queue.c > index 6b3e872..40cba34 100644 > --- a/zfs/module/zfs/vdev_queue.c > +++ b/zfs/module/zfs/vdev_queue.c > @@ -393,7 +393,7 @@ vdev_queue_init(vdev_t *vd) > sizeof (zio_t), offsetof(struct zio, io_queue_node)); > } > > - vq->vq_lastoffset = 0; > + vq->vq_last_offset = 0; > } > > void > @@ -699,9 +699,8 @@ vdev_queue_io_to_issue(vdev_queue_t *vq) > */ > tree = vdev_queue_class_tree(vq, p); > vq->vq_io_search.io_timestamp = 0; > - vq->vq_io_search.io_offset = vq->vq_last_offset + 1; > - VERIFY3P(avl_find(tree, &vq->vq_io_search, > - &idx), ==, NULL); > + vq->vq_io_search.io_offset = vq->vq_last_offset - 1; > + VERIFY3P(avl_find(tree, &vq->vq_io_search, &idx), ==, NULL); > zio = avl_nearest(tree, idx, AVL_AFTER); > if (zio == NULL) > zio = avl_first(tree); > @@ -728,7 +727,7 @@ vdev_queue_io_to_issue(vdev_queue_t *vq) > } > > vdev_queue_pending_add(vq, zio); > - vq->vq_last_offset = zio->io_offset; > + vq->vq_last_offset = zio->io_offset + zio->io_size; > > return (zio); > } > @@ -806,7 +805,7 @@ vdev_queue_io_done(zio_t *zio) > } > > /* > - * As these three methods are only used for load calculations we're not > + * As these two methods are only used for load calculations we're not > * concerned if we get an incorrect value on 32bit platforms due to lack of > * vq_lock mutex use here, instead we prefer to keep it lock free for > * performance. 
> @@ -818,15 +817,9 @@ vdev_queue_length(vdev_t *vd) > } > > uint64_t > -vdev_queue_lastoffset(vdev_t *vd) > +vdev_queue_last_offset(vdev_t *vd) > { > - return (vd->vdev_queue.vq_lastoffset); > -} > - > -void > -vdev_queue_register_lastoffset(vdev_t *vd, zio_t *zio) > -{ > - vd->vdev_queue.vq_lastoffset = zio->io_offset + zio->io_size; > + return (vd->vdev_queue.vq_last_offset); > } > > #if defined(_KERNEL) && defined(HAVE_SPL) > diff --git a/zfs/module/zfs/zfs_acl.c b/zfs/module/zfs/zfs_acl.c > index 7ddedea..1fcfca0 100644 > --- a/zfs/module/zfs/zfs_acl.c > +++ b/zfs/module/zfs/zfs_acl.c > @@ -1323,6 +1323,7 @@ zfs_aclset_common(znode_t *zp, zfs_acl_t *aclp, cred_t *cr, dmu_tx_t *tx) > sa_bulk_attr_t bulk[5]; > uint64_t ctime[2]; > int count = 0; > + zfs_acl_phys_t acl_phys; > > mode = zp->z_mode; > > @@ -1369,7 +1370,6 @@ zfs_aclset_common(znode_t *zp, zfs_acl_t *aclp, cred_t *cr, dmu_tx_t *tx) > } else { /* Painful legacy way */ > zfs_acl_node_t *aclnode; > uint64_t off = 0; > - zfs_acl_phys_t acl_phys; > uint64_t aoid; > > if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_ZNODE_ACL(zfsvfs), > diff --git a/zfs/module/zfs/zfs_dir.c b/zfs/module/zfs/zfs_dir.c > index c6ee302..9a8bbcc 100644 > --- a/zfs/module/zfs/zfs_dir.c > +++ b/zfs/module/zfs/zfs_dir.c > @@ -977,11 +977,25 @@ zfs_link_destroy(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag, > * Indicate whether the directory is empty. Works with or without z_lock > * held, but can only be consider a hint in the latter case. Returns true > * if only "." and ".." remain and there's no work in progress. > + * > + * The internal ZAP size, rather than zp->z_size, needs to be checked since > + * some consumers (Lustre) do not strictly maintain an accurate SA_ZPL_SIZE. 
> */ > boolean_t > zfs_dirempty(znode_t *dzp) > { > - return (dzp->z_size == 2 && dzp->z_dirlocks == 0); > + zfsvfs_t *zfsvfs = ZTOZSB(dzp); > + uint64_t count; > + int error; > + > + if (dzp->z_dirlocks != NULL) > + return (B_FALSE); > + > + error = zap_count(zfsvfs->z_os, dzp->z_id, &count); > + if (error != 0 || count != 0) > + return (B_FALSE); > + > + return (B_TRUE); > } > > int > diff --git a/zfs/module/zfs/zfs_fm.c b/zfs/module/zfs/zfs_fm.c > index 3986b39..1c66ed6 100644 > --- a/zfs/module/zfs/zfs_fm.c > +++ b/zfs/module/zfs/zfs_fm.c > @@ -455,8 +455,8 @@ zfs_ereport_start(nvlist_t **ereport_out, nvlist_t **detector_out, > > typedef struct zfs_ecksum_info { > /* histograms of set and cleared bits by bit number in a 64-bit word */ > - uint16_t zei_histogram_set[sizeof (uint64_t) * NBBY]; > - uint16_t zei_histogram_cleared[sizeof (uint64_t) * NBBY]; > + uint32_t zei_histogram_set[sizeof (uint64_t) * NBBY]; > + uint32_t zei_histogram_cleared[sizeof (uint64_t) * NBBY]; > > /* inline arrays of bits set and cleared. 
*/ > uint64_t zei_bits_set[ZFM_MAX_INLINE]; > @@ -481,7 +481,7 @@ typedef struct zfs_ecksum_info { > } zfs_ecksum_info_t; > > static void > -update_histogram(uint64_t value_arg, uint16_t *hist, uint32_t *count) > +update_histogram(uint64_t value_arg, uint32_t *hist, uint32_t *count) > { > size_t i; > size_t bits = 0; > @@ -490,8 +490,7 @@ update_histogram(uint64_t value_arg, uint16_t *hist, uint32_t *count) > /* We store the bits in big-endian (largest-first) order */ > for (i = 0; i < 64; i++) { > if (value & (1ull << i)) { > - if (hist[63 - i] < UINT16_MAX) > - hist[63 - i]++; > + hist[63 - i]++; > ++bits; > } > } > @@ -649,6 +648,7 @@ annotate_ecksum(nvlist_t *ereport, zio_bad_cksum_t *info, > if (badabd == NULL || goodabd == NULL) > return (eip); > > + ASSERT3U(nui64s, <=, UINT32_MAX); > ASSERT3U(size, ==, nui64s * sizeof (uint64_t)); > ASSERT3U(size, <=, SPA_MAXBLOCKSIZE); > ASSERT3U(size, <=, UINT32_MAX); > @@ -759,10 +759,10 @@ annotate_ecksum(nvlist_t *ereport, zio_bad_cksum_t *info, > } else { > fm_payload_set(ereport, > FM_EREPORT_PAYLOAD_ZFS_BAD_SET_HISTOGRAM, > - DATA_TYPE_UINT16_ARRAY, > + DATA_TYPE_UINT32_ARRAY, > NBBY * sizeof (uint64_t), eip->zei_histogram_set, > FM_EREPORT_PAYLOAD_ZFS_BAD_CLEARED_HISTOGRAM, > - DATA_TYPE_UINT16_ARRAY, > + DATA_TYPE_UINT32_ARRAY, > NBBY * sizeof (uint64_t), eip->zei_histogram_cleared, > NULL); > } > diff --git a/zfs/module/zfs/zfs_ioctl.c b/zfs/module/zfs/zfs_ioctl.c > index d195ede..f41e1b9 100644 > --- a/zfs/module/zfs/zfs_ioctl.c > +++ b/zfs/module/zfs/zfs_ioctl.c > @@ -3738,9 +3738,12 @@ zfs_ioc_rename(zfs_cmd_t *zc) > boolean_t recursive = zc->zc_cookie & 1; > char *at; > > + /* "zfs rename" from and to ...%recv datasets should both fail */ > + zc->zc_name[sizeof (zc->zc_name) - 1] = '\0'; > zc->zc_value[sizeof (zc->zc_value) - 1] = '\0'; > - if (dataset_namecheck(zc->zc_value, NULL, NULL) != 0 || > - strchr(zc->zc_value, '%')) > + if (dataset_namecheck(zc->zc_name, NULL, NULL) != 0 || > + 
dataset_namecheck(zc->zc_value, NULL, NULL) != 0 || > + strchr(zc->zc_name, '%') || strchr(zc->zc_value, '%')) > return (SET_ERROR(EINVAL)); > > at = strchr(zc->zc_name, '@'); > @@ -5002,6 +5005,11 @@ zfs_ioc_promote(zfs_cmd_t *zc) > char *cp; > int error; > > + zc->zc_name[sizeof (zc->zc_name) - 1] = '\0'; > + if (dataset_namecheck(zc->zc_name, NULL, NULL) != 0 || > + strchr(zc->zc_name, '%')) > + return (SET_ERROR(EINVAL)); > + > error = dsl_pool_hold(zc->zc_name, FTAG, &dp); > if (error != 0) > return (error); > @@ -5901,20 +5909,26 @@ static int > zfs_ioc_pool_sync(const char *pool, nvlist_t *innvl, nvlist_t *onvl) > { > int err; > - boolean_t force; > + boolean_t force = B_FALSE; > spa_t *spa; > > if ((err = spa_open(pool, &spa, FTAG)) != 0) > return (err); > > - force = fnvlist_lookup_boolean_value(innvl, "force"); > + if (innvl) { > + if (nvlist_lookup_boolean_value(innvl, "force", &force) != 0) { > + err = SET_ERROR(EINVAL); > + goto out; > + } > + } > + > if (force) { > spa_config_enter(spa, SCL_CONFIG, FTAG, RW_WRITER); > vdev_config_dirty(spa->spa_root_vdev); > spa_config_exit(spa, SCL_CONFIG, FTAG); > } > txg_wait_synced(spa_get_dsl(spa), 0); > - > +out: > spa_close(spa, FTAG); > > return (err); > diff --git a/zfs/module/zfs/zil.c b/zfs/module/zfs/zil.c > index 4d714ce..1e3e69d 100644 > --- a/zfs/module/zfs/zil.c > +++ b/zfs/module/zfs/zil.c > @@ -1009,7 +1009,24 @@ zil_lwb_write_start(zilog_t *zilog, lwb_t *lwb) > * to clean up in the event of allocation failure or I/O failure. > */ > tx = dmu_tx_create(zilog->zl_os); > - VERIFY(dmu_tx_assign(tx, TXG_WAIT) == 0); > + > + /* > + * Since we are not going to create any new dirty data and we can even > + * help with clearing the existing dirty data, we should not be subject > + * to the dirty data based delays. > + * We (ab)use TXG_WAITED to bypass the delay mechanism. > + * One side effect from using TXG_WAITED is that dmu_tx_assign() can > + * fail if the pool is suspended. 
Those are dramatic circumstances, > + * so we return NULL to signal that the normal ZIL processing is not > + * possible and txg_wait_synced() should be used to ensure that the data > + * is on disk. > + */ > + error = dmu_tx_assign(tx, TXG_WAITED); > + if (error != 0) { > + ASSERT3S(error, ==, EIO); > + dmu_tx_abort(tx); > + return (NULL); > + } > dsl_dataset_dirty(dmu_objset_ds(zilog->zl_os), tx); > txg = dmu_tx_get_txg(tx); > > @@ -1435,8 +1452,7 @@ zil_clean(zilog_t *zilog, uint64_t synced_txg) > return; > } > ASSERT3U(itxg->itxg_txg, <=, synced_txg); > - ASSERT(itxg->itxg_txg != 0); > - ASSERT(zilog->zl_clean_taskq != NULL); > + ASSERT3U(itxg->itxg_txg, !=, 0); > clean_me = itxg->itxg_itxs; > itxg->itxg_itxs = NULL; > itxg->itxg_txg = 0; > @@ -1447,8 +1463,11 @@ zil_clean(zilog_t *zilog, uint64_t synced_txg) > * free it in-line. This should be rare. Note, using TQ_SLEEP > * created a bad performance problem. > */ > - if (taskq_dispatch(zilog->zl_clean_taskq, > - (void (*)(void *))zil_itxg_clean, clean_me, TQ_NOSLEEP) == 0) > + ASSERT3P(zilog->zl_dmu_pool, !=, NULL); > + ASSERT3P(zilog->zl_dmu_pool->dp_zil_clean_taskq, !=, NULL); > + taskqid_t id = taskq_dispatch(zilog->zl_dmu_pool->dp_zil_clean_taskq, > + (void (*)(void *))zil_itxg_clean, clean_me, TQ_NOSLEEP); > + if (id == TASKQID_INVALID) > zil_itxg_clean(clean_me); > } > > @@ -1921,13 +1940,10 @@ zil_open(objset_t *os, zil_get_data_t *get_data) > { > zilog_t *zilog = dmu_objset_zil(os); > > - ASSERT(zilog->zl_clean_taskq == NULL); > ASSERT(zilog->zl_get_data == NULL); > ASSERT(list_is_empty(&zilog->zl_lwb_list)); > > zilog->zl_get_data = get_data; > - zilog->zl_clean_taskq = taskq_create("zil_clean", 1, defclsyspri, > - 2, 2, TASKQ_PREPOPULATE); > > return (zilog); > } > @@ -1962,8 +1978,6 @@ zil_close(zilog_t *zilog) > if (txg < spa_freeze_txg(zilog->zl_spa)) > VERIFY(!zilog_is_dirty(zilog)); > > - taskq_destroy(zilog->zl_clean_taskq); > - zilog->zl_clean_taskq = NULL; > zilog->zl_get_data = NULL; > > /* 
> diff --git a/zfs/module/zfs/zle.c b/zfs/module/zfs/zle.c > index 13c5673..613607f 100644 > --- a/zfs/module/zfs/zle.c > +++ b/zfs/module/zfs/zle.c > @@ -74,10 +74,14 @@ zle_decompress(void *s_start, void *d_start, size_t s_len, size_t d_len, int n) > while (src < s_end && dst < d_end) { > int len = 1 + *src++; > if (len <= n) { > + if (src + len > s_end || dst + len > d_end) > + return (-1); > while (len-- != 0) > *dst++ = *src++; > } else { > len -= n; > + if (dst + len > d_end) > + return (-1); > while (len-- != 0) > *dst++ = 0; > } > diff --git a/zfs/module/zfs/zvol.c b/zfs/module/zfs/zvol.c > index 5293f95..5b62bf94 100644 > --- a/zfs/module/zfs/zvol.c > +++ b/zfs/module/zfs/zvol.c > @@ -1347,9 +1347,9 @@ zvol_open(struct block_device *bdev, fmode_t flag) > { > zvol_state_t *zv; > int error = 0; > - boolean_t drop_suspend = B_FALSE; > + boolean_t drop_suspend = B_TRUE; > > - ASSERT(!mutex_owned(&zvol_state_lock)); > + ASSERT(!MUTEX_HELD(&zvol_state_lock)); > > mutex_enter(&zvol_state_lock); > /* > @@ -1364,23 +1364,31 @@ zvol_open(struct block_device *bdev, fmode_t flag) > return (SET_ERROR(-ENXIO)); > } > > - /* take zv_suspend_lock before zv_state_lock */ > - rw_enter(&zv->zv_suspend_lock, RW_READER); > - > mutex_enter(&zv->zv_state_lock); > - > /* > * make sure zvol is not suspended during first open > - * (hold zv_suspend_lock), otherwise, drop the lock > + * (hold zv_suspend_lock) and respect proper lock acquisition > + * ordering - zv_suspend_lock before zv_state_lock > */ > if (zv->zv_open_count == 0) { > - drop_suspend = B_TRUE; > + if (!rw_tryenter(&zv->zv_suspend_lock, RW_READER)) { > + mutex_exit(&zv->zv_state_lock); > + rw_enter(&zv->zv_suspend_lock, RW_READER); > + mutex_enter(&zv->zv_state_lock); > + /* check to see if zv_suspend_lock is needed */ > + if (zv->zv_open_count != 0) { > + rw_exit(&zv->zv_suspend_lock); > + drop_suspend = B_FALSE; > + } > + } > } else { > - rw_exit(&zv->zv_suspend_lock); > + drop_suspend = B_FALSE; > } > - > 
mutex_exit(&zvol_state_lock); > > + ASSERT(MUTEX_HELD(&zv->zv_state_lock)); > + ASSERT(zv->zv_open_count != 0 || RW_READ_HELD(&zv->zv_suspend_lock)); > + > if (zv->zv_open_count == 0) { > error = zvol_first_open(zv); > if (error) > @@ -1417,28 +1425,38 @@ static int > zvol_release(struct gendisk *disk, fmode_t mode) > { > zvol_state_t *zv; > - boolean_t drop_suspend = B_FALSE; > + boolean_t drop_suspend = B_TRUE; > > - ASSERT(!mutex_owned(&zvol_state_lock)); > + ASSERT(!MUTEX_HELD(&zvol_state_lock)); > > mutex_enter(&zvol_state_lock); > zv = disk->private_data; > - ASSERT(zv && zv->zv_open_count > 0); > - > - /* take zv_suspend_lock before zv_state_lock */ > - rw_enter(&zv->zv_suspend_lock, RW_READER); > > mutex_enter(&zv->zv_state_lock); > - mutex_exit(&zvol_state_lock); > - > + ASSERT(zv->zv_open_count > 0); > /* > * make sure zvol is not suspended during last close > - * (hold zv_suspend_lock), otherwise, drop the lock > + * (hold zv_suspend_lock) and respect proper lock acquisition > + * ordering - zv_suspend_lock before zv_state_lock > */ > - if (zv->zv_open_count == 1) > - drop_suspend = B_TRUE; > - else > - rw_exit(&zv->zv_suspend_lock); > + if (zv->zv_open_count == 1) { > + if (!rw_tryenter(&zv->zv_suspend_lock, RW_READER)) { > + mutex_exit(&zv->zv_state_lock); > + rw_enter(&zv->zv_suspend_lock, RW_READER); > + mutex_enter(&zv->zv_state_lock); > + /* check to see if zv_suspend_lock is needed */ > + if (zv->zv_open_count != 1) { > + rw_exit(&zv->zv_suspend_lock); > + drop_suspend = B_FALSE; > + } > + } > + } else { > + drop_suspend = B_FALSE; > + } > + mutex_exit(&zvol_state_lock); > + > + ASSERT(MUTEX_HELD(&zv->zv_state_lock)); > + ASSERT(zv->zv_open_count != 1 || RW_READ_HELD(&zv->zv_suspend_lock)); > > zv->zv_open_count--; > if (zv->zv_open_count == 0) > Is is possible to get this fix in on the next Bionic release?
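The zle.c hunk in this patch makes `zle_decompress()` check both run types before touching memory. Without those checks, a corrupt stream could make the decoder read past the end of the source buffer or write past the end of the destination.

The block below is a minimal standalone sketch of that check. It assumes the control-byte format the loop implies: a byte `b` means `1 + b` literal bytes when `1 + b <= n`, and otherwise means `1 + b - n` zero bytes. Several parts are hypothetical and not the module's API:

- the name `zle_decode_sketch()`;
- its return convention (bytes written, or -1 on overrun);
- the `zle_decode_demo()` helper.

The bounds tests compare remaining lengths rather than using the patch's `src + len > s_end` form, so no out-of-range pointer is ever formed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical ZLE-style decoder mirroring the loop in zle_decompress(),
 * with the overrun checks from this patch. Returns the number of bytes
 * written to dst, or -1 if a run would read past src or write past dst.
 */
static long
zle_decode_sketch(const unsigned char *src, size_t s_len,
    unsigned char *dst, size_t d_len, int n)
{
	const unsigned char *s_end = src + s_len;
	unsigned char *d_start = dst;
	unsigned char *d_end = dst + d_len;

	while (src < s_end && dst < d_end) {
		int len = 1 + *src++;

		if (len <= n) {
			/* Literal run: copy len bytes straight from the input. */
			if ((size_t)(s_end - src) < (size_t)len ||
			    (size_t)(d_end - dst) < (size_t)len)
				return (-1);
			while (len-- != 0)
				*dst++ = *src++;
		} else {
			/* Zero run: emit len - n zeros, consuming no input. */
			len -= n;
			if ((size_t)(d_end - dst) < (size_t)len)
				return (-1);
			while (len-- != 0)
				*dst++ = 0;
		}
	}
	return ((long)(dst - d_start));
}

/* Example inputs, decoded with n = 64 into an 8-byte buffer. */
static void
zle_decode_demo(void)
{
	unsigned char out[8];
	/* 0x02: three literals "abc"; 0x42 (66): 67 - 64 = 3 zeros. */
	const unsigned char ok[] = { 0x02, 'a', 'b', 'c', 0x42 };
	/* 0x0a: eleven literals claimed, only one supplied. */
	const unsigned char short_src[] = { 0x0a, 'a' };
	/* 0xc8 (200): 201 - 64 = 137 zeros, far more than 8 bytes. */
	const unsigned char big_zero[] = { 0xc8 };

	assert(zle_decode_sketch(ok, sizeof (ok), out, sizeof (out), 64) == 6);
	assert(out[0] == 'a' && out[2] == 'c' && out[5] == 0);
	assert(zle_decode_sketch(short_src, sizeof (short_src),
	    out, sizeof (out), 64) == -1);
	assert(zle_decode_sketch(big_zero, sizeof (big_zero),
	    out, sizeof (out), 64) == -1);
}
```

Consider the `{ 0x0a, 'a' }` input. The unguarded loop would copy eleven bytes, ten of them from beyond the end of the source buffer. The guarded loop rejects it instead.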
On Sun, Apr 22, 2018 at 11:48:28AM +0100, Colin Ian King wrote:
<snip>
> Is it possible to get this fix in on the next Bionic release?
It should be included in the first bionic SRU, since it was sent after
kernel freeze. Since kernel freeze we've only been applying fixes
considered important for release, and I didn't see any indication that
this was urgently needed in the release kernel.
Thanks,
Seth
On 23/04/18 14:15, Seth Forshee wrote:
> On Sun, Apr 22, 2018 at 11:48:28AM +0100, Colin Ian King wrote:
>
> <snip>
>
>> Is it possible to get this fix in on the next Bionic release?
>
> It should be included in the first bionic SRU, since it was sent after
> kernel freeze. Since kernel freeze we've only been applying fixes
> considered important for release, and I didn't see any indication that
> this was urgently needed in the release kernel.

Sure, it's SRU non-urgent, so that's good for me, apologies for the noise.

Colin

>
> Thanks,
> Seth
>
Applied to bionic/master-next branch.

Thanks,
Kleber
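The `vdev_mirror_load()` hunk in this series stores `last_offset - zio_offset` in an `int64_t` before applying `ABS()`. The block below is a minimal sketch of why that matters. It rests on two assumptions:

- `ABS()` is the usual `((a) < 0 ? -(a) : (a))` macro;
- `zfs_vdev_mirror_rotating_seek_offset` is at an assumed default of 1 MiB.

Both helper names are hypothetical. With `uint64_t` operands, the subtraction wraps whenever the candidate I/O lies after the last one. The `< 0` test can then never be true, so the half seek increment is silently skipped for nearby forward I/O.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the ABS() macro used by vdev_mirror.c. */
#define SKETCH_ABS(a)		((a) < 0 ? -(a) : (a))

/* Assumed default for zfs_vdev_mirror_rotating_seek_offset (1 MiB). */
#define SKETCH_SEEK_OFFSET	(1 * 1024 * 1024)

/*
 * Pre-patch comparison: the difference is computed in uint64_t, so ABS()
 * cannot change it (an unsigned value is never < 0). The plain difference
 * is compared here to show the same result without a compiler warning.
 */
static int
near_last_io_unsigned(uint64_t last_offset, uint64_t zio_offset)
{
	uint64_t diff = last_offset - zio_offset;

	return (diff < SKETCH_SEEK_OFFSET);
}

/*
 * Post-patch comparison: reinterpret the wrapped difference as signed.
 * The out-of-range conversion is implementation-defined in ISO C but
 * wraps on two's complement targets; INT64_MIN is not handled.
 */
static int
near_last_io_signed(uint64_t last_offset, uint64_t zio_offset)
{
	int64_t offset_diff = (int64_t)(last_offset - zio_offset);

	return (SKETCH_ABS(offset_diff) < SKETCH_SEEK_OFFSET);
}

static void
near_last_io_demo(void)
{
	/* Candidate 500 bytes after the last I/O: only the signed form sees it. */
	assert(near_last_io_unsigned(1000, 1500) == 0);
	assert(near_last_io_signed(1000, 1500) == 1);
	/* Candidate 500 bytes before the last I/O: both forms agree. */
	assert(near_last_io_unsigned(1500, 1000) == 1);
	assert(near_last_io_signed(1500, 1000) == 1);
}
```

The same patch also makes `vq_last_offset` record the end of the last issued I/O (`io_offset + io_size`). In addition, it shifts leaf-vdev offsets by `VDEV_LABEL_START_SIZE`. Together these appear intended to put both operands in the leaf queue's coordinate space before this comparison is made.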
diff --git a/zfs/META b/zfs/META index d624ae4..2110eef 100644 --- a/zfs/META +++ b/zfs/META @@ -2,7 +2,7 @@ Meta: 1 Name: zfs Branch: 1.0 Version: 0.7.5 -Release: 1ubuntu13 +Release: 1ubuntu15 Release-Tags: relext License: CDDL Author: OpenZFS on Linux diff --git a/zfs/include/sys/dmu.h b/zfs/include/sys/dmu.h index d246152..bcdf7d6 100644 --- a/zfs/include/sys/dmu.h +++ b/zfs/include/sys/dmu.h @@ -713,11 +713,16 @@ void dmu_tx_mark_netfree(dmu_tx_t *tx); * to stable storage and will also be called if the dmu_tx is aborted. * If there is any error which prevents the transaction from being committed to * disk, the callback will be called with a value of error != 0. + * + * When multiple callbacks are registered to the transaction, the callbacks + * will be called in reverse order to let Lustre, the only user of commit + * callback currently, take the fast path of its commit callback handling. */ typedef void dmu_tx_callback_func_t(void *dcb_data, int error); void dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *dcb_func, void *dcb_data); +void dmu_tx_do_callbacks(list_t *cb_list, int error); /* * Free up the data blocks for a defined range of a file. If size is diff --git a/zfs/include/sys/dmu_tx.h b/zfs/include/sys/dmu_tx.h index f16e1e8..d82a793 100644 --- a/zfs/include/sys/dmu_tx.h +++ b/zfs/include/sys/dmu_tx.h @@ -145,10 +145,6 @@ uint64_t dmu_tx_get_txg(dmu_tx_t *tx); struct dsl_pool *dmu_tx_pool(dmu_tx_t *tx); void dmu_tx_wait(dmu_tx_t *tx); -void dmu_tx_callback_register(dmu_tx_t *tx, dmu_tx_callback_func_t *dcb_func, - void *dcb_data); -void dmu_tx_do_callbacks(list_t *cb_list, int error); - /* * These routines are defined in dmu_spa.h, and are called by the SPA. 
*/ diff --git a/zfs/include/sys/dsl_pool.h b/zfs/include/sys/dsl_pool.h index d2dabda..7eb6cb0 100644 --- a/zfs/include/sys/dsl_pool.h +++ b/zfs/include/sys/dsl_pool.h @@ -126,6 +126,7 @@ typedef struct dsl_pool { txg_list_t dp_dirty_dirs; txg_list_t dp_sync_tasks; taskq_t *dp_sync_taskq; + taskq_t *dp_zil_clean_taskq; /* * Protects administrative changes (properties, namespace) diff --git a/zfs/include/sys/trace_dmu.h b/zfs/include/sys/trace_dmu.h index 5ae59e5..24e57f5 100644 --- a/zfs/include/sys/trace_dmu.h +++ b/zfs/include/sys/trace_dmu.h @@ -50,7 +50,7 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class, __field(uint64_t, tx_lastsnap_txg) __field(uint64_t, tx_lasttried_txg) __field(boolean_t, tx_anyobj) - __field(boolean_t, tx_waited) + __field(boolean_t, tx_dirty_delayed) __field(hrtime_t, tx_start) __field(boolean_t, tx_wait_dirty) __field(int, tx_err) @@ -62,7 +62,7 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class, __entry->tx_lastsnap_txg = tx->tx_lastsnap_txg; __entry->tx_lasttried_txg = tx->tx_lasttried_txg; __entry->tx_anyobj = tx->tx_anyobj; - __entry->tx_waited = tx->tx_waited; + __entry->tx_dirty_delayed = tx->tx_dirty_delayed; __entry->tx_start = tx->tx_start; __entry->tx_wait_dirty = tx->tx_wait_dirty; __entry->tx_err = tx->tx_err; @@ -70,11 +70,12 @@ DECLARE_EVENT_CLASS(zfs_delay_mintime_class, __entry->min_tx_time = min_tx_time; ), TP_printk("tx { txg %llu lastsnap_txg %llu tx_lasttried_txg %llu " - "anyobj %d waited %d start %llu wait_dirty %d err %i " + "anyobj %d dirty_delayed %d start %llu wait_dirty %d err %i " "} dirty %llu min_tx_time %llu", __entry->tx_txg, __entry->tx_lastsnap_txg, - __entry->tx_lasttried_txg, __entry->tx_anyobj, __entry->tx_waited, - __entry->tx_start, __entry->tx_wait_dirty, __entry->tx_err, + __entry->tx_lasttried_txg, __entry->tx_anyobj, + __entry->tx_dirty_delayed, __entry->tx_start, + __entry->tx_wait_dirty, __entry->tx_err, __entry->dirty, __entry->min_tx_time) ); /* END CSTYLED */ diff --git a/zfs/include/sys/vdev.h 
b/zfs/include/sys/vdev.h index 7157ef4..473d269 100644 --- a/zfs/include/sys/vdev.h +++ b/zfs/include/sys/vdev.h @@ -125,8 +125,7 @@ extern zio_t *vdev_queue_io(zio_t *zio); extern void vdev_queue_io_done(zio_t *zio); extern int vdev_queue_length(vdev_t *vd); -extern uint64_t vdev_queue_lastoffset(vdev_t *vd); -extern void vdev_queue_register_lastoffset(vdev_t *vd, zio_t *zio); +extern uint64_t vdev_queue_last_offset(vdev_t *vd); extern void vdev_config_dirty(vdev_t *vd); extern void vdev_config_clean(vdev_t *vd); diff --git a/zfs/include/sys/vdev_impl.h b/zfs/include/sys/vdev_impl.h index 7c5e54b..4c2e3cd 100644 --- a/zfs/include/sys/vdev_impl.h +++ b/zfs/include/sys/vdev_impl.h @@ -127,7 +127,6 @@ struct vdev_queue { hrtime_t vq_io_delta_ts; zio_t vq_io_search; /* used as local for stack reduction */ kmutex_t vq_lock; - uint64_t vq_lastoffset; }; /* diff --git a/zfs/include/sys/vdev_raidz_impl.h b/zfs/include/sys/vdev_raidz_impl.h index 4bd15e3..0799ed1 100644 --- a/zfs/include/sys/vdev_raidz_impl.h +++ b/zfs/include/sys/vdev_raidz_impl.h @@ -102,30 +102,30 @@ typedef struct raidz_impl_ops { } raidz_impl_ops_t; typedef struct raidz_col { - size_t rc_devidx; /* child device index for I/O */ - size_t rc_offset; /* device offset */ - size_t rc_size; /* I/O size */ + uint64_t rc_devidx; /* child device index for I/O */ + uint64_t rc_offset; /* device offset */ + uint64_t rc_size; /* I/O size */ abd_t *rc_abd; /* I/O data */ void *rc_gdata; /* used to store the "good" version */ int rc_error; /* I/O error for this device */ - unsigned int rc_tried; /* Did we attempt this I/O column? */ - unsigned int rc_skipped; /* Did we skip this I/O column? */ + uint8_t rc_tried; /* Did we attempt this I/O column? */ + uint8_t rc_skipped; /* Did we skip this I/O column? 
*/ } raidz_col_t; typedef struct raidz_map { - size_t rm_cols; /* Regular column count */ - size_t rm_scols; /* Count including skipped columns */ - size_t rm_bigcols; /* Number of oversized columns */ - size_t rm_asize; /* Actual total I/O size */ - size_t rm_missingdata; /* Count of missing data devices */ - size_t rm_missingparity; /* Count of missing parity devices */ - size_t rm_firstdatacol; /* First data column/parity count */ - size_t rm_nskip; /* Skipped sectors for padding */ - size_t rm_skipstart; /* Column index of padding start */ + uint64_t rm_cols; /* Regular column count */ + uint64_t rm_scols; /* Count including skipped columns */ + uint64_t rm_bigcols; /* Number of oversized columns */ + uint64_t rm_asize; /* Actual total I/O size */ + uint64_t rm_missingdata; /* Count of missing data devices */ + uint64_t rm_missingparity; /* Count of missing parity devices */ + uint64_t rm_firstdatacol; /* First data column/parity count */ + uint64_t rm_nskip; /* Skipped sectors for padding */ + uint64_t rm_skipstart; /* Column index of padding start */ abd_t *rm_abd_copy; /* rm_asize-buffer of copied data */ - size_t rm_reports; /* # of referencing checksum reports */ - unsigned int rm_freed; /* map no longer has referencing ZIO */ - unsigned int rm_ecksuminjected; /* checksum error was injected */ + uintptr_t rm_reports; /* # of referencing checksum reports */ + uint8_t rm_freed; /* map no longer has referencing ZIO */ + uint8_t rm_ecksuminjected; /* checksum error was injected */ raidz_impl_ops_t *rm_ops; /* RAIDZ math operations */ raidz_col_t rm_col[1]; /* Flexible array of I/O columns */ } raidz_map_t; diff --git a/zfs/include/sys/zil_impl.h b/zfs/include/sys/zil_impl.h index 13ecca3..dd5304b 100644 --- a/zfs/include/sys/zil_impl.h +++ b/zfs/include/sys/zil_impl.h @@ -124,7 +124,6 @@ struct zilog { list_t zl_lwb_list; /* in-flight log write list */ kmutex_t zl_vdev_lock; /* protects zl_vdev_tree */ avl_tree_t zl_vdev_tree; /* vdevs to flush in zil_commit() 
*/ - taskq_t *zl_clean_taskq; /* runs lwb and itx clean tasks */ avl_tree_t zl_bp_tree; /* track bps during log parse */ clock_t zl_replay_time; /* lbolt of when replay started */ uint64_t zl_replay_blks; /* number of log blocks replayed */ diff --git a/zfs/module/icp/asm-x86_64/aes/aes_intel.S b/zfs/module/icp/asm-x86_64/aes/aes_intel.S index ed0df75..a40e30f 100644 --- a/zfs/module/icp/asm-x86_64/aes/aes_intel.S +++ b/zfs/module/icp/asm-x86_64/aes/aes_intel.S @@ -207,7 +207,7 @@ _key_expansion_256a_local: shufps $0b10001100, %xmm0, %xmm4 pxor %xmm4, %xmm0 pxor %xmm1, %xmm0 - movaps %xmm0, (%rcx) + movups %xmm0, (%rcx) add $0x10, %rcx ret nop @@ -224,18 +224,18 @@ _key_expansion_192a_local: pxor %xmm4, %xmm0 pxor %xmm1, %xmm0 - movaps %xmm2, %xmm5 - movaps %xmm2, %xmm6 + movups %xmm2, %xmm5 + movups %xmm2, %xmm6 pslldq $4, %xmm5 pshufd $0b11111111, %xmm0, %xmm3 pxor %xmm3, %xmm2 pxor %xmm5, %xmm2 - movaps %xmm0, %xmm1 + movups %xmm0, %xmm1 shufps $0b01000100, %xmm0, %xmm6 - movaps %xmm6, (%rcx) + movups %xmm6, (%rcx) shufps $0b01001110, %xmm2, %xmm1 - movaps %xmm1, 0x10(%rcx) + movups %xmm1, 0x10(%rcx) add $0x20, %rcx ret SET_SIZE(_key_expansion_192a) @@ -250,13 +250,13 @@ _key_expansion_192b_local: pxor %xmm4, %xmm0 pxor %xmm1, %xmm0 - movaps %xmm2, %xmm5 + movups %xmm2, %xmm5 pslldq $4, %xmm5 pshufd $0b11111111, %xmm0, %xmm3 pxor %xmm3, %xmm2 pxor %xmm5, %xmm2 - movaps %xmm0, (%rcx) + movups %xmm0, (%rcx) add $0x10, %rcx ret SET_SIZE(_key_expansion_192b) @@ -270,7 +270,7 @@ _key_expansion_256b_local: shufps $0b10001100, %xmm2, %xmm4 pxor %xmm4, %xmm2 pxor %xmm1, %xmm2 - movaps %xmm2, (%rcx) + movups %xmm2, (%rcx) add $0x10, %rcx ret SET_SIZE(_key_expansion_256b) @@ -327,7 +327,7 @@ rijndael_key_setup_enc_intel_local: jz .Lenc_key_invalid_param movups (%USERCIPHERKEY), %xmm0 // user key (first 16 bytes) - movaps %xmm0, (%AESKEY) + movups %xmm0, (%AESKEY) lea 0x10(%AESKEY), %rcx // key addr pxor %xmm4, %xmm4 // xmm4 is assumed 0 in _key_expansion_x @@ -341,7 
+341,7 @@ rijndael_key_setup_enc_intel_local: #endif /* OPENSSL_INTERFACE */ movups 0x10(%USERCIPHERKEY), %xmm2 // other user key (2nd 16 bytes) - movaps %xmm2, (%rcx) + movups %xmm2, (%rcx) add $0x10, %rcx aeskeygenassist $0x1, %xmm2, %xmm1 // expand the key @@ -525,10 +525,10 @@ FRAME_BEGIN .align 4 .Ldec_key_reorder_loop: - movaps (%AESKEY), %xmm0 - movaps (%ROUNDS64), %xmm1 - movaps %xmm0, (%ROUNDS64) - movaps %xmm1, (%AESKEY) + movups (%AESKEY), %xmm0 + movups (%ROUNDS64), %xmm1 + movups %xmm0, (%ROUNDS64) + movups %xmm1, (%AESKEY) lea 0x10(%AESKEY), %AESKEY lea -0x10(%ROUNDS64), %ROUNDS64 cmp %AESKEY, %ROUNDS64 @@ -536,11 +536,11 @@ FRAME_BEGIN .align 4 .Ldec_key_inv_loop: - movaps (%rcx), %xmm0 + movups (%rcx), %xmm0 // Convert an encryption round key to a form usable for decryption // with the "AES Inverse Mix Columns" instruction aesimc %xmm0, %xmm1 - movaps %xmm1, (%rcx) + movups %xmm1, (%rcx) lea 0x10(%rcx), %rcx cmp %ENDAESKEY, %rcx jnz .Ldec_key_inv_loop @@ -602,7 +602,7 @@ FRAME_BEGIN ENTRY_NP(aes_encrypt_intel) movups (%INP), %STATE // input - movaps (%KEYP), %KEY // key + movups (%KEYP), %KEY // key #ifdef OPENSSL_INTERFACE mov 240(%KEYP), %NROUNDS32 // round count #else /* OpenSolaris Interface */ @@ -618,41 +618,41 @@ ENTRY_NP(aes_encrypt_intel) // AES 256 lea 0x20(%KEYP), %KEYP - movaps -0x60(%KEYP), %KEY + movups -0x60(%KEYP), %KEY aesenc %KEY, %STATE - movaps -0x50(%KEYP), %KEY + movups -0x50(%KEYP), %KEY aesenc %KEY, %STATE .align 4 .Lenc192: // AES 192 and 256 - movaps -0x40(%KEYP), %KEY + movups -0x40(%KEYP), %KEY aesenc %KEY, %STATE - movaps -0x30(%KEYP), %KEY + movups -0x30(%KEYP), %KEY aesenc %KEY, %STATE .align 4 .Lenc128: // AES 128, 192, and 256 - movaps -0x20(%KEYP), %KEY + movups -0x20(%KEYP), %KEY aesenc %KEY, %STATE - movaps -0x10(%KEYP), %KEY + movups -0x10(%KEYP), %KEY aesenc %KEY, %STATE - movaps (%KEYP), %KEY + movups (%KEYP), %KEY aesenc %KEY, %STATE - movaps 0x10(%KEYP), %KEY + movups 0x10(%KEYP), %KEY aesenc %KEY, %STATE - 
movaps 0x20(%KEYP), %KEY
+ movups 0x20(%KEYP), %KEY
 aesenc %KEY, %STATE
- movaps 0x30(%KEYP), %KEY
+ movups 0x30(%KEYP), %KEY
 aesenc %KEY, %STATE
- movaps 0x40(%KEYP), %KEY
+ movups 0x40(%KEYP), %KEY
 aesenc %KEY, %STATE
- movaps 0x50(%KEYP), %KEY
+ movups 0x50(%KEYP), %KEY
 aesenc %KEY, %STATE
- movaps 0x60(%KEYP), %KEY
+ movups 0x60(%KEYP), %KEY
 aesenc %KEY, %STATE
- movaps 0x70(%KEYP), %KEY
+ movups 0x70(%KEYP), %KEY
 aesenclast %KEY, %STATE // last round
 movups %STATE, (%OUTP) // output
@@ -685,7 +685,7 @@ ENTRY_NP(aes_encrypt_intel)
 ENTRY_NP(aes_decrypt_intel)
 movups (%INP), %STATE // input
- movaps (%KEYP), %KEY // key
+ movups (%KEYP), %KEY // key
 #ifdef OPENSSL_INTERFACE
 mov 240(%KEYP), %NROUNDS32 // round count
 #else /* OpenSolaris Interface */
@@ -701,41 +701,41 @@ ENTRY_NP(aes_decrypt_intel)
 // AES 256
 lea 0x20(%KEYP), %KEYP
- movaps -0x60(%KEYP), %KEY
+ movups -0x60(%KEYP), %KEY
 aesdec %KEY, %STATE
- movaps -0x50(%KEYP), %KEY
+ movups -0x50(%KEYP), %KEY
 aesdec %KEY, %STATE
 .align 4
 .Ldec192:
 // AES 192 and 256
- movaps -0x40(%KEYP), %KEY
+ movups -0x40(%KEYP), %KEY
 aesdec %KEY, %STATE
- movaps -0x30(%KEYP), %KEY
+ movups -0x30(%KEYP), %KEY
 aesdec %KEY, %STATE
 .align 4
 .Ldec128:
 // AES 128, 192, and 256
- movaps -0x20(%KEYP), %KEY
+ movups -0x20(%KEYP), %KEY
 aesdec %KEY, %STATE
- movaps -0x10(%KEYP), %KEY
+ movups -0x10(%KEYP), %KEY
 aesdec %KEY, %STATE
- movaps (%KEYP), %KEY
+ movups (%KEYP), %KEY
 aesdec %KEY, %STATE
- movaps 0x10(%KEYP), %KEY
+ movups 0x10(%KEYP), %KEY
 aesdec %KEY, %STATE
- movaps 0x20(%KEYP), %KEY
+ movups 0x20(%KEYP), %KEY
 aesdec %KEY, %STATE
- movaps 0x30(%KEYP), %KEY
+ movups 0x30(%KEYP), %KEY
 aesdec %KEY, %STATE
- movaps 0x40(%KEYP), %KEY
+ movups 0x40(%KEYP), %KEY
 aesdec %KEY, %STATE
- movaps 0x50(%KEYP), %KEY
+ movups 0x50(%KEYP), %KEY
 aesdec %KEY, %STATE
- movaps 0x60(%KEYP), %KEY
+ movups 0x60(%KEYP), %KEY
 aesdec %KEY, %STATE
- movaps 0x70(%KEYP), %KEY
+ movups 0x70(%KEYP), %KEY
 aesdeclast %KEY, %STATE // last round
 movups %STATE, (%OUTP) // output
diff --git a/zfs/module/icp/asm-x86_64/modes/gcm_intel.S b/zfs/module/icp/asm-x86_64/modes/gcm_intel.S
index a43b5eb..3aec0ee 100644
--- a/zfs/module/icp/asm-x86_64/modes/gcm_intel.S
+++ b/zfs/module/icp/asm-x86_64/modes/gcm_intel.S
@@ -150,7 +150,7 @@ ENTRY_NP(gcm_mul_pclmulqdq)
 // Byte swap 16-byte input
 //
 lea .Lbyte_swap16_mask(%rip), %rax
- movaps (%rax), %xmm10
+ movups (%rax), %xmm10
 pshufb %xmm10, %xmm0
 pshufb %xmm10, %xmm1
diff --git a/zfs/module/icp/spi/kcf_spi.c b/zfs/module/icp/spi/kcf_spi.c
index c2c2b54..0a6e38d 100644
--- a/zfs/module/icp/spi/kcf_spi.c
+++ b/zfs/module/icp/spi/kcf_spi.c
@@ -111,7 +111,7 @@ int
 crypto_register_provider(crypto_provider_info_t *info,
 crypto_kcf_provider_handle_t *handle)
 {
- char ks_name[KSTAT_STRLEN];
+ char *ks_name;
 kcf_provider_desc_t *prov_desc = NULL;
 int ret = CRYPTO_ARGUMENTS_BAD;
@@ -238,12 +238,12 @@ crypto_register_provider(crypto_provider_info_t *info,
 * This kstat is deleted, when the provider unregisters.
 */
 if (prov_desc->pd_prov_type == CRYPTO_SW_PROVIDER) {
- (void) snprintf(ks_name, KSTAT_STRLEN, "%s_%s",
+ ks_name = kmem_asprintf("%s_%s",
 "NONAME", "provider_stats");
 } else {
- (void) snprintf(ks_name, KSTAT_STRLEN, "%s_%d_%u_%s",
- "NONAME", 0,
- prov_desc->pd_prov_id, "provider_stats");
+ ks_name = kmem_asprintf("%s_%d_%u_%s",
+ "NONAME", 0, prov_desc->pd_prov_id,
+ "provider_stats");
 }
 prov_desc->pd_kstat = kstat_create("kcf", 0, ks_name, "crypto",
@@ -261,6 +261,7 @@ crypto_register_provider(crypto_provider_info_t *info,
 prov_desc->pd_kstat->ks_update = kcf_prov_kstat_update;
 kstat_install(prov_desc->pd_kstat);
 }
+ strfree(ks_name);
 }
 if (prov_desc->pd_prov_type == CRYPTO_HW_PROVIDER)
diff --git a/zfs/module/nvpair/nvpair.c b/zfs/module/nvpair/nvpair.c
index 249b7c9..abed33e 100644
--- a/zfs/module/nvpair/nvpair.c
+++ b/zfs/module/nvpair/nvpair.c
@@ -21,7 +21,7 @@
 /*
 * Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright (c) 2015, 2016 by Delphix. All rights reserved.
+ * Copyright (c) 2015, 2017 by Delphix. All rights reserved.
 */
 #include <sys/stropts.h>
@@ -916,6 +916,8 @@ nvlist_add_common(nvlist_t *nvl, const char *name,
 /* calculate sizes of the nvpair elements and the nvpair itself */
 name_sz = strlen(name) + 1;
+ if (name_sz >= 1ULL << (sizeof (nvp->nvp_name_sz) * NBBY - 1))
+ return (EINVAL);
 nvp_sz = NVP_SIZE_CALC(name_sz, value_sz);
@@ -1242,6 +1244,7 @@ nvpair_type_is_array(nvpair_t *nvp)
 data_type_t type = NVP_TYPE(nvp);
 if ((type == DATA_TYPE_BYTE_ARRAY) ||
+ (type == DATA_TYPE_INT8_ARRAY) ||
 (type == DATA_TYPE_UINT8_ARRAY) ||
 (type == DATA_TYPE_INT16_ARRAY) ||
 (type == DATA_TYPE_UINT16_ARRAY) ||
@@ -2200,8 +2203,10 @@ nvs_embedded(nvstream_t *nvs, nvlist_t *embedded)
 nvlist_init(embedded, embedded->nvl_nvflag, priv);
- if (nvs->nvs_recursion >= nvpair_max_recursion)
+ if (nvs->nvs_recursion >= nvpair_max_recursion) {
+ nvlist_free(embedded);
 return (EINVAL);
+ }
 nvs->nvs_recursion++;
 if ((err = nvs_operation(nvs, embedded, NULL)) != 0)
 nvlist_free(embedded);
diff --git a/zfs/module/zfs/abd.c b/zfs/module/zfs/abd.c
index 765ac7f..3c7893d 100644
--- a/zfs/module/zfs/abd.c
+++ b/zfs/module/zfs/abd.c
@@ -571,7 +571,7 @@ static inline void
 abd_free_struct(abd_t *abd)
 {
 kmem_cache_free(abd_cache, abd);
- ABDSTAT_INCR(abdstat_struct_size, -sizeof (abd_t));
+ ABDSTAT_INCR(abdstat_struct_size, -(int)sizeof (abd_t));
 }
 /*
@@ -618,7 +618,7 @@ abd_free_scatter(abd_t *abd)
 ABDSTAT_BUMPDOWN(abdstat_scatter_cnt);
 ABDSTAT_INCR(abdstat_scatter_data_size, -(int)abd->abd_size);
 ABDSTAT_INCR(abdstat_scatter_chunk_waste,
- abd->abd_size - P2ROUNDUP(abd->abd_size, PAGESIZE));
+ (int)abd->abd_size - (int)P2ROUNDUP(abd->abd_size, PAGESIZE));
 abd_free_struct(abd);
 }
diff --git a/zfs/module/zfs/bpobj.c b/zfs/module/zfs/bpobj.c
index 82ca94e..32459c9 100644
--- a/zfs/module/zfs/bpobj.c
+++ b/zfs/module/zfs/bpobj.c
@@ -261,7 +261,7 @@ bpobj_iterate_impl(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx,
 }
 if (free) {
 VERIFY3U(0, ==, dmu_free_range(bpo->bpo_os, bpo->bpo_object,
- (i + 1) * sizeof (blkptr_t), -1ULL, tx));
+ (i + 1) * sizeof (blkptr_t), DMU_OBJECT_END, tx));
 }
 if (err || !bpo->bpo_havesubobj || bpo->bpo_phys->bpo_subobjs == 0)
 goto out;
@@ -339,7 +339,7 @@ bpobj_iterate_impl(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx,
 if (free) {
 VERIFY3U(0, ==, dmu_free_range(bpo->bpo_os,
 bpo->bpo_phys->bpo_subobjs,
- (i + 1) * sizeof (uint64_t), -1ULL, tx));
+ (i + 1) * sizeof (uint64_t), DMU_OBJECT_END, tx));
 }
 out:
diff --git a/zfs/module/zfs/dmu.c b/zfs/module/zfs/dmu.c
index 6f09aa2..05c9fc3 100644
--- a/zfs/module/zfs/dmu.c
+++ b/zfs/module/zfs/dmu.c
@@ -887,7 +887,7 @@ dmu_free_range(objset_t *os, uint64_t object, uint64_t offset,
 if (err)
 return (err);
 ASSERT(offset < UINT64_MAX);
- ASSERT(size == -1ULL || size <= UINT64_MAX - offset);
+ ASSERT(size == DMU_OBJECT_END || size <= UINT64_MAX - offset);
 dnode_free_range(dn, offset, size, tx);
 dnode_rele(dn, FTAG);
 return (0);
diff --git a/zfs/module/zfs/dmu_objset.c b/zfs/module/zfs/dmu_objset.c
index 9a7a696..3425d54 100644
--- a/zfs/module/zfs/dmu_objset.c
+++ b/zfs/module/zfs/dmu_objset.c
@@ -1853,6 +1853,7 @@ dmu_objset_space_upgrade(objset_t *os)
 dmu_tx_hold_bonus(tx, obj);
 objerr = dmu_tx_assign(tx, TXG_WAIT);
 if (objerr != 0) {
+ dmu_buf_rele(db, FTAG);
 dmu_tx_abort(tx);
 continue;
 }
diff --git a/zfs/module/zfs/dmu_send.c b/zfs/module/zfs/dmu_send.c
index 344e420..2e3d706 100644
--- a/zfs/module/zfs/dmu_send.c
+++ b/zfs/module/zfs/dmu_send.c
@@ -224,9 +224,6 @@ dump_free(dmu_sendarg_t *dsp, uint64_t object, uint64_t offset,
 (object == dsp->dsa_last_data_object &&
 offset > dsp->dsa_last_data_offset));
- if (length != -1ULL && offset + length < offset)
- length = -1ULL;
-
 /*
 * If there is a pending op, but it's not PENDING_FREE, push it out,
 * since free block aggregation can only be done for blocks of the
@@ -243,19 +240,22 @@ dump_free(dmu_sendarg_t *dsp, uint64_t object, uint64_t offset,
 if (dsp->dsa_pending_op == PENDING_FREE) {
 /*
- * There should never be a PENDING_FREE if length is -1
- * (because dump_dnode is the only place where this
- * function is called with a -1, and only after flushing
- * any pending record).
+ * There should never be a PENDING_FREE if length is
+ * DMU_OBJECT_END (because dump_dnode is the only place where
+ * this function is called with a DMU_OBJECT_END, and only after
+ * flushing any pending record).
 */
- ASSERT(length != -1ULL);
+ ASSERT(length != DMU_OBJECT_END);
 /*
 * Check to see whether this free block can be aggregated
 * with pending one.
 */
 if (drrf->drr_object == object && drrf->drr_offset +
 drrf->drr_length == offset) {
- drrf->drr_length += length;
+ if (offset + length < offset)
+ drrf->drr_length = DMU_OBJECT_END;
+ else
+ drrf->drr_length += length;
 return (0);
 } else {
 /* not a continuation. Push out pending record */
@@ -269,9 +269,12 @@ dump_free(dmu_sendarg_t *dsp, uint64_t object, uint64_t offset,
 dsp->dsa_drr->drr_type = DRR_FREE;
 drrf->drr_object = object;
 drrf->drr_offset = offset;
- drrf->drr_length = length;
+ if (offset + length < offset)
+ drrf->drr_length = DMU_OBJECT_END;
+ else
+ drrf->drr_length = length;
 drrf->drr_toguid = dsp->dsa_toguid;
- if (length == -1ULL) {
+ if (length == DMU_OBJECT_END) {
 if (dump_record(dsp, NULL, 0) != 0)
 return (SET_ERROR(EINTR));
 } else {
@@ -530,7 +533,7 @@ dump_dnode(dmu_sendarg_t *dsp, uint64_t object, dnode_phys_t *dnp)
 /* Free anything past the end of the file. */
 if (dump_free(dsp, object, (dnp->dn_maxblkid + 1) *
- (dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT), -1ULL) != 0)
+ (dnp->dn_datablkszsec << SPA_MINBLOCKSHIFT), DMU_OBJECT_END) != 0)
 return (SET_ERROR(EINTR));
 if (dsp->dsa_err != 0)
 return (SET_ERROR(EINTR));
@@ -666,7 +669,9 @@ do_dump(dmu_sendarg_t *dsa, struct send_block_record *data)
 } else if (BP_IS_HOLE(bp)) {
 uint64_t span = BP_SPAN(dblkszsec, indblkshift, zb->zb_level);
 uint64_t offset = zb->zb_blkid * span;
- err = dump_free(dsa, zb->zb_object, offset, span);
+ /* Don't dump free records for offsets > DMU_OBJECT_END */
+ if (zb->zb_blkid == 0 || span <= DMU_OBJECT_END / zb->zb_blkid)
+ err = dump_free(dsa, zb->zb_object, offset, span);
 } else if (zb->zb_level > 0 || type == DMU_OT_OBJSET) {
 return (0);
 } else if (type == DMU_OT_DNODE) {
@@ -2498,7 +2503,7 @@ receive_free(struct receive_writer_arg *rwa, struct drr_free *drrf)
 {
 int err;
- if (drrf->drr_length != -1ULL &&
+ if (drrf->drr_length != DMU_OBJECT_END &&
 drrf->drr_offset + drrf->drr_length < drrf->drr_offset)
 return (SET_ERROR(EINVAL));
diff --git a/zfs/module/zfs/dmu_traverse.c b/zfs/module/zfs/dmu_traverse.c
index c78228d..62f770e 100644
--- a/zfs/module/zfs/dmu_traverse.c
+++ b/zfs/module/zfs/dmu_traverse.c
@@ -609,9 +609,20 @@ traverse_impl(spa_t *spa, dsl_dataset_t *ds, uint64_t objset, blkptr_t *rootbp,
 if (err != 0)
 return (err);
- osp = buf->b_data;
- traverse_zil(td, &osp->os_zil_header);
- arc_buf_destroy(buf, &buf);
+ if (err != 0) {
+ /*
+ * If both TRAVERSE_HARD and TRAVERSE_PRE are set,
+ * continue to visitbp so that td_func can be called
+ * in pre stage, and err will reset to zero.
+ */
+ if (!(td->td_flags & TRAVERSE_HARD) ||
+ !(td->td_flags & TRAVERSE_PRE))
+ return (err);
+ } else {
+ osp = buf->b_data;
+ traverse_zil(td, &osp->os_zil_header);
+ arc_buf_destroy(buf, &buf);
+ }
 }
 if (!(flags & TRAVERSE_PREFETCH_DATA) ||
diff --git a/zfs/module/zfs/dmu_tx.c b/zfs/module/zfs/dmu_tx.c
index 097fa77..c3cc03a 100644
--- a/zfs/module/zfs/dmu_tx.c
+++ b/zfs/module/zfs/dmu_tx.c
@@ -1200,7 +1200,7 @@ dmu_tx_do_callbacks(list_t *cb_list, int error)
 {
 dmu_tx_callback_t *dcb;
- while ((dcb = list_head(cb_list)) != NULL) {
+ while ((dcb = list_tail(cb_list)) != NULL) {
 list_remove(cb_list, dcb);
 dcb->dcb_func(dcb->dcb_data, error);
 kmem_free(dcb, sizeof (dmu_tx_callback_t));
diff --git a/zfs/module/zfs/dmu_zfetch.c b/zfs/module/zfs/dmu_zfetch.c
index 1bf5c4e..e72e9ef 100644
--- a/zfs/module/zfs/dmu_zfetch.c
+++ b/zfs/module/zfs/dmu_zfetch.c
@@ -228,19 +228,33 @@ dmu_zfetch(zfetch_t *zf, uint64_t blkid, uint64_t nblks, boolean_t fetch_data)
 rw_enter(&zf->zf_rwlock, RW_READER);
+ /*
+ * Find matching prefetch stream. Depending on whether the accesses
+ * are block-aligned, first block of the new access may either follow
+ * the last block of the previous access, or be equal to it.
+ */
 for (zs = list_head(&zf->zf_stream); zs != NULL;
 zs = list_next(&zf->zf_stream, zs)) {
- if (blkid == zs->zs_blkid) {
+ if (blkid == zs->zs_blkid || blkid + 1 == zs->zs_blkid) {
 mutex_enter(&zs->zs_lock);
 /*
 * zs_blkid could have changed before we
 * acquired zs_lock; re-check them here.
 */
- if (blkid != zs->zs_blkid) {
- mutex_exit(&zs->zs_lock);
- continue;
+ if (blkid == zs->zs_blkid) {
+ break;
+ } else if (blkid + 1 == zs->zs_blkid) {
+ blkid++;
+ nblks--;
+ if (nblks == 0) {
+ /* Already prefetched this before. */
+ mutex_exit(&zs->zs_lock);
+ rw_exit(&zf->zf_rwlock);
+ return;
+ }
+ break;
 }
- break;
+ mutex_exit(&zs->zs_lock);
 }
 }
diff --git a/zfs/module/zfs/dsl_pool.c b/zfs/module/zfs/dsl_pool.c
index c167080..0320d0e 100644
--- a/zfs/module/zfs/dsl_pool.c
+++ b/zfs/module/zfs/dsl_pool.c
@@ -135,6 +135,36 @@ unsigned long zfs_delay_scale = 1000 * 1000 * 1000 / 2000;
 */
 int zfs_sync_taskq_batch_pct = 75;
+/*
+ * These tunables determine the behavior of how zil_itxg_clean() is
+ * called via zil_clean() in the context of spa_sync(). When an itxg
+ * list needs to be cleaned, TQ_NOSLEEP will be used when dispatching.
+ * If the dispatch fails, the call to zil_itxg_clean() will occur
+ * synchronously in the context of spa_sync(), which can negatively
+ * impact the performance of spa_sync() (e.g. in the case of the itxg
+ * list having a large number of itxs that needs to be cleaned).
+ *
+ * Thus, these tunables can be used to manipulate the behavior of the
+ * taskq used by zil_clean(); they determine the number of taskq entries
+ * that are pre-populated when the taskq is first created (via the
+ * "zfs_zil_clean_taskq_minalloc" tunable) and the maximum number of
+ * taskq entries that are cached after an on-demand allocation (via the
+ * "zfs_zil_clean_taskq_maxalloc").
+ *
+ * The idea being, we want to try reasonably hard to ensure there will
+ * already be a taskq entry pre-allocated by the time that it is needed
+ * by zil_clean(). This way, we can avoid the possibility of an
+ * on-demand allocation of a new taskq entry from failing, which would
+ * result in zil_itxg_clean() being called synchronously from zil_clean()
+ * (which can adversely affect performance of spa_sync()).
+ *
+ * Additionally, the number of threads used by the taskq can be
+ * configured via the "zfs_zil_clean_taskq_nthr_pct" tunable.
+ */
+int zfs_zil_clean_taskq_nthr_pct = 100;
+int zfs_zil_clean_taskq_minalloc = 1024;
+int zfs_zil_clean_taskq_maxalloc = 1024 * 1024;
+
 int
 dsl_pool_open_special_dir(dsl_pool_t *dp, const char *name, dsl_dir_t **ddp)
 {
@@ -176,6 +206,12 @@ dsl_pool_open_impl(spa_t *spa, uint64_t txg)
 zfs_sync_taskq_batch_pct, minclsyspri, 1, INT_MAX,
 TASKQ_THREADS_CPU_PCT);
+ dp->dp_zil_clean_taskq = taskq_create("dp_zil_clean_taskq",
+ zfs_zil_clean_taskq_nthr_pct, minclsyspri,
+ zfs_zil_clean_taskq_minalloc,
+ zfs_zil_clean_taskq_maxalloc,
+ TASKQ_PREPOPULATE | TASKQ_THREADS_CPU_PCT);
+
 mutex_init(&dp->dp_lock, NULL, MUTEX_DEFAULT, NULL);
 cv_init(&dp->dp_spaceavail_cv, NULL, CV_DEFAULT, NULL);
@@ -334,6 +370,7 @@ dsl_pool_close(dsl_pool_t *dp)
 txg_list_destroy(&dp->dp_sync_tasks);
 txg_list_destroy(&dp->dp_dirty_dirs);
+ taskq_destroy(dp->dp_zil_clean_taskq);
 taskq_destroy(dp->dp_sync_taskq);
 /*
@@ -1142,5 +1179,18 @@ MODULE_PARM_DESC(zfs_delay_scale, "how quickly delay approaches infinity");
 module_param(zfs_sync_taskq_batch_pct, int, 0644);
 MODULE_PARM_DESC(zfs_sync_taskq_batch_pct,
 "max percent of CPUs that are used to sync dirty data");
+
+module_param(zfs_zil_clean_taskq_nthr_pct, int, 0644);
+MODULE_PARM_DESC(zfs_zil_clean_taskq_nthr_pct,
+ "max percent of CPUs that are used per dp_sync_taskq");
+
+module_param(zfs_zil_clean_taskq_minalloc, int, 0644);
+MODULE_PARM_DESC(zfs_zil_clean_taskq_minalloc,
+ "number of taskq entries that are pre-populated");
+
+module_param(zfs_zil_clean_taskq_maxalloc, int, 0644);
+MODULE_PARM_DESC(zfs_zil_clean_taskq_maxalloc,
+ "max number of taskq entries that are cached");
+
 /* END CSTYLED */
 #endif
diff --git a/zfs/module/zfs/metaslab.c b/zfs/module/zfs/metaslab.c
index 5e413c0..01e5234 100644
--- a/zfs/module/zfs/metaslab.c
+++ b/zfs/module/zfs/metaslab.c
@@ -1937,7 +1937,8 @@ metaslab_passivate(metaslab_t *msp, uint64_t weight)
 * this metaslab again. In that case, it had better be empty,
 * or we would be leaving space on the table.
 */
- ASSERT(size >= SPA_MINBLOCKSIZE ||
+ ASSERT(!WEIGHT_IS_SPACEBASED(msp->ms_weight) ||
+ size >= SPA_MINBLOCKSIZE ||
 range_tree_space(msp->ms_tree) == 0);
 ASSERT0(weight & METASLAB_ACTIVE_MASK);
diff --git a/zfs/module/zfs/mmp.c b/zfs/module/zfs/mmp.c
index 6f2aa3f..e91ae62 100644
--- a/zfs/module/zfs/mmp.c
+++ b/zfs/module/zfs/mmp.c
@@ -26,6 +26,7 @@
 #include <sys/mmp.h>
 #include <sys/spa.h>
 #include <sys/spa_impl.h>
+#include <sys/time.h>
 #include <sys/vdev.h>
 #include <sys/vdev_impl.h>
 #include <sys/zfs_context.h>
@@ -428,6 +429,10 @@ mmp_thread(spa_t *spa)
 */
 if (!suspended && mmp_fail_intervals && multihost &&
 (start - mmp->mmp_last_write) > max_fail_ns) {
+ cmn_err(CE_WARN, "MMP writes to pool '%s' have not "
+ "succeeded in over %llus; suspending pool",
+ spa_name(spa),
+ NSEC2SEC(start - mmp->mmp_last_write));
 zio_suspend(spa, NULL);
 }
diff --git a/zfs/module/zfs/spa.c b/zfs/module/zfs/spa.c
index a7a2f62..00587d8 100644
--- a/zfs/module/zfs/spa.c
+++ b/zfs/module/zfs/spa.c
@@ -1561,7 +1561,7 @@ spa_load_spares(spa_t *spa)
 static void
 spa_load_l2cache(spa_t *spa)
 {
- nvlist_t **l2cache;
+ nvlist_t **l2cache = NULL;
 uint_t nl2cache;
 int i, j, oldnvdevs;
 uint64_t guid;
@@ -1645,7 +1645,9 @@ spa_load_l2cache(spa_t *spa)
 VERIFY(nvlist_remove(sav->sav_config, ZPOOL_CONFIG_L2CACHE,
 DATA_TYPE_NVLIST_ARRAY) == 0);
- l2cache = kmem_alloc(sav->sav_count * sizeof (void *), KM_SLEEP);
+ if (sav->sav_count > 0)
+ l2cache = kmem_alloc(sav->sav_count * sizeof (void *),
+ KM_SLEEP);
 for (i = 0; i < sav->sav_count; i++)
 l2cache[i] = vdev_config_generate(spa,
 sav->sav_vdevs[i], B_TRUE, VDEV_CONFIG_L2CACHE);
diff --git a/zfs/module/zfs/spa_config.c b/zfs/module/zfs/spa_config.c
index 5b792b8..5bbfb4a 100644
--- a/zfs/module/zfs/spa_config.c
+++ b/zfs/module/zfs/spa_config.c
@@ -162,6 +162,11 @@ spa_config_write(spa_config_dirent_t *dp, nvlist_t *nvl)
 */
 if (nvl == NULL) {
 err = vn_remove(dp->scd_path, UIO_SYSSPACE, RMFILE);
+ /*
+ * Don't report an error when the cache file is already removed
+ */
+ if (err == ENOENT)
+ err = 0;
 return (err);
 }
diff --git a/zfs/module/zfs/spa_stats.c b/zfs/module/zfs/spa_stats.c
index 7ca3598..8c4dba2 100644
--- a/zfs/module/zfs/spa_stats.c
+++ b/zfs/module/zfs/spa_stats.c
@@ -142,7 +142,7 @@ static void
 spa_read_history_init(spa_t *spa)
 {
 spa_stats_history_t *ssh = &spa->spa_stats.read_history;
- char name[KSTAT_STRLEN];
+ char *name;
 kstat_t *ksp;
 mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL);
@@ -153,7 +153,7 @@ spa_read_history_init(spa_t *spa)
 ssh->size = 0;
 ssh->private = NULL;
- (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa));
+ name = kmem_asprintf("zfs/%s", spa_name(spa));
 ksp = kstat_create(name, 0, "reads", "misc",
 KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL);
@@ -168,6 +168,7 @@ spa_read_history_init(spa_t *spa)
 spa_read_history_data, spa_read_history_addr);
 kstat_install(ksp);
 }
+ strfree(name);
 }
 static void
@@ -365,7 +366,7 @@ static void
 spa_txg_history_init(spa_t *spa)
 {
 spa_stats_history_t *ssh = &spa->spa_stats.txg_history;
- char name[KSTAT_STRLEN];
+ char *name;
 kstat_t *ksp;
 mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL);
@@ -376,7 +377,7 @@ spa_txg_history_init(spa_t *spa)
 ssh->size = 0;
 ssh->private = NULL;
- (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa));
+ name = kmem_asprintf("zfs/%s", spa_name(spa));
 ksp = kstat_create(name, 0, "txgs", "misc",
 KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL);
@@ -391,6 +392,7 @@ spa_txg_history_init(spa_t *spa)
 spa_txg_history_data, spa_txg_history_addr);
 kstat_install(ksp);
 }
+ strfree(name);
 }
 static void
@@ -598,7 +600,7 @@ static void
 spa_tx_assign_init(spa_t *spa)
 {
 spa_stats_history_t *ssh = &spa->spa_stats.tx_assign_histogram;
- char name[KSTAT_STRLEN];
+ char *name;
 kstat_named_t *ks;
 kstat_t *ksp;
 int i;
@@ -609,7 +611,7 @@ spa_tx_assign_init(spa_t *spa)
 ssh->size = ssh->count * sizeof (kstat_named_t);
 ssh->private = kmem_alloc(ssh->size, KM_SLEEP);
- (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa));
+ name = kmem_asprintf("zfs/%s", spa_name(spa));
 for (i = 0; i < ssh->count; i++) {
 ks = &((kstat_named_t *)ssh->private)[i];
@@ -632,6 +634,7 @@ spa_tx_assign_init(spa_t *spa)
 ksp->ks_update = spa_tx_assign_update;
 kstat_install(ksp);
 }
+ strfree(name);
 }
 static void
@@ -678,12 +681,12 @@ static void
 spa_io_history_init(spa_t *spa)
 {
 spa_stats_history_t *ssh = &spa->spa_stats.io_history;
- char name[KSTAT_STRLEN];
+ char *name;
 kstat_t *ksp;
 mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL);
- (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa));
+ name = kmem_asprintf("zfs/%s", spa_name(spa));
 ksp = kstat_create(name, 0, "io", "disk", KSTAT_TYPE_IO, 1, 0);
 ssh->kstat = ksp;
@@ -694,6 +697,7 @@ spa_io_history_init(spa_t *spa)
 ksp->ks_update = spa_io_history_update;
 kstat_install(ksp);
 }
+ strfree(name);
 }
 static void
@@ -806,7 +810,7 @@ static void
 spa_mmp_history_init(spa_t *spa)
 {
 spa_stats_history_t *ssh = &spa->spa_stats.mmp_history;
- char name[KSTAT_STRLEN];
+ char *name;
 kstat_t *ksp;
 mutex_init(&ssh->lock, NULL, MUTEX_DEFAULT, NULL);
@@ -817,7 +821,7 @@ spa_mmp_history_init(spa_t *spa)
 ssh->size = 0;
 ssh->private = NULL;
- (void) snprintf(name, KSTAT_STRLEN, "zfs/%s", spa_name(spa));
+ name = kmem_asprintf("zfs/%s", spa_name(spa));
 ksp = kstat_create(name, 0, "multihost", "misc",
 KSTAT_TYPE_RAW, 0, KSTAT_FLAG_VIRTUAL);
@@ -832,6 +836,7 @@ spa_mmp_history_init(spa_t *spa)
 spa_mmp_history_data, spa_mmp_history_addr);
 kstat_install(ksp);
 }
+ strfree(name);
 }
 static void
diff --git a/zfs/module/zfs/vdev_disk.c b/zfs/module/zfs/vdev_disk.c
index 5ae50a3..aecc351 100644
--- a/zfs/module/zfs/vdev_disk.c
+++ b/zfs/module/zfs/vdev_disk.c
@@ -98,7 +98,7 @@ static void
 vdev_disk_error(zio_t *zio)
 {
 #ifdef ZFS_DEBUG
- printk("ZFS: zio error=%d type=%d offset=%llu size=%llu "
+ printk(KERN_WARNING "ZFS: zio error=%d type=%d offset=%llu size=%llu "
 "flags=%x\n", zio->io_error, zio->io_type,
 (u_longlong_t)zio->io_offset, (u_longlong_t)zio->io_size,
 zio->io_flags);
diff --git a/zfs/module/zfs/vdev_mirror.c b/zfs/module/zfs/vdev_mirror.c
index 0439e4b..d230b4d 100644
--- a/zfs/module/zfs/vdev_mirror.c
+++ b/zfs/module/zfs/vdev_mirror.c
@@ -116,7 +116,8 @@ static const zio_vsd_ops_t vdev_mirror_vsd_ops = {
 static int
 vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset)
 {
- uint64_t lastoffset;
+ uint64_t last_offset;
+ int64_t offset_diff;
 int load;
 /* All DVAs have equal weight at the root. */
@@ -129,13 +130,17 @@ vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset)
 * worse overall when resilvering with compared to without.
 */
+ /* Fix zio_offset for leaf vdevs */
+ if (vd->vdev_ops->vdev_op_leaf)
+ zio_offset += VDEV_LABEL_START_SIZE;
+
 /* Standard load based on pending queue length. */
 load = vdev_queue_length(vd);
- lastoffset = vdev_queue_lastoffset(vd);
+ last_offset = vdev_queue_last_offset(vd);
 if (vd->vdev_nonrot) {
 /* Non-rotating media. */
- if (lastoffset == zio_offset)
+ if (last_offset == zio_offset)
 return (load + zfs_vdev_mirror_non_rotating_inc);
 /*
@@ -148,16 +153,16 @@ vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset)
 }
 /* Rotating media I/O's which directly follow the last I/O. */
- if (lastoffset == zio_offset)
+ if (last_offset == zio_offset)
 return (load + zfs_vdev_mirror_rotating_inc);
 /*
 * Apply half the seek increment to I/O's within seek offset
- * of the last I/O queued to this vdev as they should incur less
+ * of the last I/O issued to this vdev as they should incur less
 * of a seek increment.
 */
- if (ABS(lastoffset - zio_offset) <
- zfs_vdev_mirror_rotating_seek_offset)
+ offset_diff = (int64_t)(last_offset - zio_offset);
+ if (ABS(offset_diff) < zfs_vdev_mirror_rotating_seek_offset)
 return (load + (zfs_vdev_mirror_rotating_seek_inc / 2));
 /* Apply the full seek increment to all other I/O's. */
@@ -382,29 +387,20 @@ vdev_mirror_child_select(zio_t *zio)
 mm->mm_preferred_cnt++;
 }
- if (mm->mm_preferred_cnt == 1) {
- vdev_queue_register_lastoffset(
- mm->mm_child[mm->mm_preferred[0]].mc_vd, zio);
+ if (mm->mm_preferred_cnt == 1)
 return (mm->mm_preferred[0]);
- }
- if (mm->mm_preferred_cnt > 1) {
- int c = vdev_mirror_preferred_child_randomize(zio);
- vdev_queue_register_lastoffset(mm->mm_child[c].mc_vd, zio);
- return (c);
- }
+ if (mm->mm_preferred_cnt > 1)
+ return (vdev_mirror_preferred_child_randomize(zio));
 /*
 * Every device is either missing or has this txg in its DTL.
 * Look for any child we haven't already tried before giving up.
 */
 for (c = 0; c < mm->mm_children; c++) {
- if (!mm->mm_child[c].mc_tried) {
- vdev_queue_register_lastoffset(mm->mm_child[c].mc_vd,
- zio);
+ if (!mm->mm_child[c].mc_tried)
 return (c);
- }
 }
 /*
diff --git a/zfs/module/zfs/vdev_queue.c b/zfs/module/zfs/vdev_queue.c
index 6b3e872..40cba34 100644
--- a/zfs/module/zfs/vdev_queue.c
+++ b/zfs/module/zfs/vdev_queue.c
@@ -393,7 +393,7 @@ vdev_queue_init(vdev_t *vd)
 sizeof (zio_t), offsetof(struct zio, io_queue_node));
 }
- vq->vq_lastoffset = 0;
+ vq->vq_last_offset = 0;
 }
 void
@@ -699,9 +699,8 @@ vdev_queue_io_to_issue(vdev_queue_t *vq)
 */
 tree = vdev_queue_class_tree(vq, p);
 vq->vq_io_search.io_timestamp = 0;
- vq->vq_io_search.io_offset = vq->vq_last_offset + 1;
- VERIFY3P(avl_find(tree, &vq->vq_io_search,
- &idx), ==, NULL);
+ vq->vq_io_search.io_offset = vq->vq_last_offset - 1;
+ VERIFY3P(avl_find(tree, &vq->vq_io_search, &idx), ==, NULL);
 zio = avl_nearest(tree, idx, AVL_AFTER);
 if (zio == NULL)
 zio = avl_first(tree);
@@ -728,7 +727,7 @@ vdev_queue_io_to_issue(vdev_queue_t *vq)
 }
 vdev_queue_pending_add(vq, zio);
- vq->vq_last_offset = zio->io_offset;
+ vq->vq_last_offset = zio->io_offset + zio->io_size;
 return (zio);
 }
@@ -806,7 +805,7 @@ vdev_queue_io_done(zio_t *zio)
 }
 /*
- * As these three methods are only used for load calculations we're not
+ * As these two methods are only used for load calculations we're not
 * concerned if we get an incorrect value on 32bit platforms due to lack of
 * vq_lock mutex use here, instead we prefer to keep it lock free for
 * performance.
@@ -818,15 +817,9 @@ vdev_queue_length(vdev_t *vd)
 }
 uint64_t
-vdev_queue_lastoffset(vdev_t *vd)
+vdev_queue_last_offset(vdev_t *vd)
 {
- return (vd->vdev_queue.vq_lastoffset);
-}
-
-void
-vdev_queue_register_lastoffset(vdev_t *vd, zio_t *zio)
-{
- vd->vdev_queue.vq_lastoffset = zio->io_offset + zio->io_size;
+ return (vd->vdev_queue.vq_last_offset);
 }
 #if defined(_KERNEL) && defined(HAVE_SPL)
diff --git a/zfs/module/zfs/zfs_acl.c b/zfs/module/zfs/zfs_acl.c
index 7ddedea..1fcfca0 100644
--- a/zfs/module/zfs/zfs_acl.c
+++ b/zfs/module/zfs/zfs_acl.c
@@ -1323,6 +1323,7 @@ zfs_aclset_common(znode_t *zp, zfs_acl_t *aclp, cred_t *cr, dmu_tx_t *tx)
 sa_bulk_attr_t bulk[5];
 uint64_t ctime[2];
 int count = 0;
+ zfs_acl_phys_t acl_phys;
 mode = zp->z_mode;
@@ -1369,7 +1370,6 @@ zfs_aclset_common(znode_t *zp, zfs_acl_t *aclp, cred_t *cr, dmu_tx_t *tx)
 } else { /* Painful legacy way */
 zfs_acl_node_t *aclnode;
 uint64_t off = 0;
- zfs_acl_phys_t acl_phys;
 uint64_t aoid;
 if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_ZNODE_ACL(zfsvfs),
diff --git a/zfs/module/zfs/zfs_dir.c b/zfs/module/zfs/zfs_dir.c
index c6ee302..9a8bbcc 100644
--- a/zfs/module/zfs/zfs_dir.c
+++ b/zfs/module/zfs/zfs_dir.c
@@ -977,11 +977,25 @@ zfs_link_destroy(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag,
 * Indicate whether the directory is empty. Works with or without z_lock
 * held, but can only be consider a hint in the latter case. Returns true
 * if only "." and ".." remain and there's no work in progress.
+ *
+ * The internal ZAP size, rather than zp->z_size, needs to be checked since
+ * some consumers (Lustre) do not strictly maintain an accurate SA_ZPL_SIZE.
 */
 boolean_t
 zfs_dirempty(znode_t *dzp)
 {
- return (dzp->z_size == 2 && dzp->z_dirlocks == 0);
+ zfsvfs_t *zfsvfs = ZTOZSB(dzp);
+ uint64_t count;
+ int error;
+
+ if (dzp->z_dirlocks != NULL)
+ return (B_FALSE);
+
+ error = zap_count(zfsvfs->z_os, dzp->z_id, &count);
+ if (error != 0 || count != 0)
+ return (B_FALSE);
+
+ return (B_TRUE);
 }
 int
diff --git a/zfs/module/zfs/zfs_fm.c b/zfs/module/zfs/zfs_fm.c
index 3986b39..1c66ed6 100644
--- a/zfs/module/zfs/zfs_fm.c
+++ b/zfs/module/zfs/zfs_fm.c
@@ -455,8 +455,8 @@ zfs_ereport_start(nvlist_t **ereport_out, nvlist_t **detector_out,
 typedef struct zfs_ecksum_info {
 /* histograms of set and cleared bits by bit number in a 64-bit word */
- uint16_t zei_histogram_set[sizeof (uint64_t) * NBBY];
- uint16_t zei_histogram_cleared[sizeof (uint64_t) * NBBY];
+ uint32_t zei_histogram_set[sizeof (uint64_t) * NBBY];
+ uint32_t zei_histogram_cleared[sizeof (uint64_t) * NBBY];
 /* inline arrays of bits set and cleared. */
 uint64_t zei_bits_set[ZFM_MAX_INLINE];
@@ -481,7 +481,7 @@ typedef struct zfs_ecksum_info {
 } zfs_ecksum_info_t;
 static void
-update_histogram(uint64_t value_arg, uint16_t *hist, uint32_t *count)
+update_histogram(uint64_t value_arg, uint32_t *hist, uint32_t *count)
 {
 size_t i;
 size_t bits = 0;
@@ -490,8 +490,7 @@ update_histogram(uint64_t value_arg, uint16_t *hist, uint32_t *count)
 /* We store the bits in big-endian (largest-first) order */
 for (i = 0; i < 64; i++) {
 if (value & (1ull << i)) {
- if (hist[63 - i] < UINT16_MAX)
- hist[63 - i]++;
+ hist[63 - i]++;
 ++bits;
 }
 }
@@ -649,6 +648,7 @@ annotate_ecksum(nvlist_t *ereport, zio_bad_cksum_t *info,
 if (badabd == NULL || goodabd == NULL)
 return (eip);
+ ASSERT3U(nui64s, <=, UINT32_MAX);
 ASSERT3U(size, ==, nui64s * sizeof (uint64_t));
 ASSERT3U(size, <=, SPA_MAXBLOCKSIZE);
 ASSERT3U(size, <=, UINT32_MAX);
@@ -759,10 +759,10 @@ annotate_ecksum(nvlist_t *ereport, zio_bad_cksum_t *info,
 } else {
 fm_payload_set(ereport,
 FM_EREPORT_PAYLOAD_ZFS_BAD_SET_HISTOGRAM,
- DATA_TYPE_UINT16_ARRAY,
+ DATA_TYPE_UINT32_ARRAY,
 NBBY * sizeof (uint64_t), eip->zei_histogram_set,
 FM_EREPORT_PAYLOAD_ZFS_BAD_CLEARED_HISTOGRAM,
- DATA_TYPE_UINT16_ARRAY,
+ DATA_TYPE_UINT32_ARRAY,
 NBBY * sizeof (uint64_t), eip->zei_histogram_cleared,
 NULL);
 }
diff --git a/zfs/module/zfs/zfs_ioctl.c b/zfs/module/zfs/zfs_ioctl.c
index d195ede..f41e1b9 100644
--- a/zfs/module/zfs/zfs_ioctl.c
+++ b/zfs/module/zfs/zfs_ioctl.c
@@ -3738,9 +3738,12 @@ zfs_ioc_rename(zfs_cmd_t *zc)
 boolean_t recursive = zc->zc_cookie & 1;
 char *at;
+ /* "zfs rename" from and to ...%recv datasets should both fail */
+ zc->zc_name[sizeof (zc->zc_name) - 1] = '\0';
 zc->zc_value[sizeof (zc->zc_value) - 1] = '\0';
- if (dataset_namecheck(zc->zc_value, NULL, NULL) != 0 ||
- strchr(zc->zc_value, '%'))
+ if (dataset_namecheck(zc->zc_name, NULL, NULL) != 0 ||
+ dataset_namecheck(zc->zc_value, NULL, NULL) != 0 ||
+ strchr(zc->zc_name, '%') || strchr(zc->zc_value, '%'))
 return (SET_ERROR(EINVAL));
 at = strchr(zc->zc_name, '@');
@@ -5002,6 +5005,11 @@ zfs_ioc_promote(zfs_cmd_t *zc)
 char *cp;
 int error;
+ zc->zc_name[sizeof (zc->zc_name) - 1] = '\0';
+ if (dataset_namecheck(zc->zc_name, NULL, NULL) != 0 ||
+ strchr(zc->zc_name, '%'))
+ return (SET_ERROR(EINVAL));
+
 error = dsl_pool_hold(zc->zc_name, FTAG, &dp);
 if (error != 0)
 return (error);
@@ -5901,20 +5909,26 @@ static int
 zfs_ioc_pool_sync(const char *pool, nvlist_t *innvl, nvlist_t *onvl)
 {
 int err;
- boolean_t force;
+ boolean_t force = B_FALSE;
 spa_t *spa;
 if ((err = spa_open(pool, &spa, FTAG)) != 0)
 return (err);
- force = fnvlist_lookup_boolean_value(innvl, "force");
+ if (innvl) {
+ if (nvlist_lookup_boolean_value(innvl, "force", &force) != 0) {
+ err = SET_ERROR(EINVAL);
+ goto out;
+ }
+ }
+
 if (force) {
 spa_config_enter(spa, SCL_CONFIG, FTAG, RW_WRITER);
 vdev_config_dirty(spa->spa_root_vdev);
 spa_config_exit(spa, SCL_CONFIG, FTAG);
 }
 txg_wait_synced(spa_get_dsl(spa), 0);
-
+out:
 spa_close(spa, FTAG);
 return (err);
diff --git a/zfs/module/zfs/zil.c b/zfs/module/zfs/zil.c
index 4d714ce..1e3e69d 100644
--- a/zfs/module/zfs/zil.c
+++ b/zfs/module/zfs/zil.c
@@ -1009,7 +1009,24 @@ zil_lwb_write_start(zilog_t *zilog, lwb_t *lwb)
 * to clean up in the event of allocation failure or I/O failure.
 */
 tx = dmu_tx_create(zilog->zl_os);
- VERIFY(dmu_tx_assign(tx, TXG_WAIT) == 0);
+
+ /*
+ * Since we are not going to create any new dirty data and we can even
+ * help with clearing the existing dirty data, we should not be subject
+ * to the dirty data based delays.
+ * We (ab)use TXG_WAITED to bypass the delay mechanism.
+ * One side effect from using TXG_WAITED is that dmu_tx_assign() can
+ * fail if the pool is suspended. Those are dramatic circumstances,
+ * so we return NULL to signal that the normal ZIL processing is not
+ * possible and txg_wait_synced() should be used to ensure that the data
+ * is on disk.
+ */
+ error = dmu_tx_assign(tx, TXG_WAITED);
+ if (error != 0) {
+ ASSERT3S(error, ==, EIO);
+ dmu_tx_abort(tx);
+ return (NULL);
+ }
 dsl_dataset_dirty(dmu_objset_ds(zilog->zl_os), tx);
 txg = dmu_tx_get_txg(tx);
@@ -1435,8 +1452,7 @@ zil_clean(zilog_t *zilog, uint64_t synced_txg)
 return;
 }
 ASSERT3U(itxg->itxg_txg, <=, synced_txg);
- ASSERT(itxg->itxg_txg != 0);
- ASSERT(zilog->zl_clean_taskq != NULL);
+ ASSERT3U(itxg->itxg_txg, !=, 0);
 clean_me = itxg->itxg_itxs;
 itxg->itxg_itxs = NULL;
 itxg->itxg_txg = 0;
@@ -1447,8 +1463,11 @@ zil_clean(zilog_t *zilog, uint64_t synced_txg)
 * free it in-line. This should be rare. Note, using TQ_SLEEP
 * created a bad performance problem.
 */
- if (taskq_dispatch(zilog->zl_clean_taskq,
- (void (*)(void *))zil_itxg_clean, clean_me, TQ_NOSLEEP) == 0)
+ ASSERT3P(zilog->zl_dmu_pool, !=, NULL);
+ ASSERT3P(zilog->zl_dmu_pool->dp_zil_clean_taskq, !=, NULL);
+ taskqid_t id = taskq_dispatch(zilog->zl_dmu_pool->dp_zil_clean_taskq,
+ (void (*)(void *))zil_itxg_clean, clean_me, TQ_NOSLEEP);
+ if (id == TASKQID_INVALID)
 zil_itxg_clean(clean_me);
 }
@@ -1921,13 +1940,10 @@ zil_open(objset_t *os, zil_get_data_t *get_data)
 {
 zilog_t *zilog = dmu_objset_zil(os);
- ASSERT(zilog->zl_clean_taskq == NULL);
 ASSERT(zilog->zl_get_data == NULL);
 ASSERT(list_is_empty(&zilog->zl_lwb_list));
 zilog->zl_get_data = get_data;
- zilog->zl_clean_taskq = taskq_create("zil_clean", 1, defclsyspri,
- 2, 2, TASKQ_PREPOPULATE);
 return (zilog);
 }
@@ -1962,8 +1978,6 @@ zil_close(zilog_t *zilog)
 if (txg < spa_freeze_txg(zilog->zl_spa))
 VERIFY(!zilog_is_dirty(zilog));
- taskq_destroy(zilog->zl_clean_taskq);
- zilog->zl_clean_taskq = NULL;
 zilog->zl_get_data = NULL;
 /*
diff --git a/zfs/module/zfs/zle.c b/zfs/module/zfs/zle.c
index 13c5673..613607f 100644
--- a/zfs/module/zfs/zle.c
+++ b/zfs/module/zfs/zle.c
@@ -74,10 +74,14 @@ zle_decompress(void *s_start, void *d_start, size_t s_len, size_t d_len, int n)
 while (src < s_end && dst < d_end) {
 int len = 1 + *src++;
 if (len <= n) {
+ if (src + len > s_end || dst + len > d_end)
+ return (-1);
 while (len-- != 0)
 *dst++ = *src++;
 } else {
 len -= n;
+ if (dst + len > d_end)
+ return (-1);
 while (len-- != 0)
 *dst++ = 0;
 }
diff --git a/zfs/module/zfs/zvol.c b/zfs/module/zfs/zvol.c
index 5293f95..5b62bf94 100644
--- a/zfs/module/zfs/zvol.c
+++ b/zfs/module/zfs/zvol.c
@@ -1347,9 +1347,9 @@ zvol_open(struct block_device *bdev, fmode_t flag)
 {
 zvol_state_t *zv;
 int error = 0;
- boolean_t drop_suspend = B_FALSE;
+ boolean_t drop_suspend = B_TRUE;
- ASSERT(!mutex_owned(&zvol_state_lock));
+ ASSERT(!MUTEX_HELD(&zvol_state_lock));
 mutex_enter(&zvol_state_lock);
 /*
@@ -1364,23 +1364,31 @@ zvol_open(struct block_device *bdev, fmode_t flag)
 return (SET_ERROR(-ENXIO));
 }
- /* take zv_suspend_lock before zv_state_lock */
- rw_enter(&zv->zv_suspend_lock, RW_READER);
-
 mutex_enter(&zv->zv_state_lock);
-
 /*
 * make sure zvol is not suspended during first open
- * (hold zv_suspend_lock), otherwise, drop the lock
+ * (hold zv_suspend_lock) and respect proper lock acquisition
+ * ordering - zv_suspend_lock before zv_state_lock
 */
 if (zv->zv_open_count == 0) {
- drop_suspend = B_TRUE;
+ if (!rw_tryenter(&zv->zv_suspend_lock, RW_READER)) {
+ mutex_exit(&zv->zv_state_lock);
+ rw_enter(&zv->zv_suspend_lock, RW_READER);
+ mutex_enter(&zv->zv_state_lock);
+ /* check to see if zv_suspend_lock is needed */
+ if (zv->zv_open_count != 0) {
+ rw_exit(&zv->zv_suspend_lock);
+ drop_suspend = B_FALSE;
+ }
+ }
 } else {
- rw_exit(&zv->zv_suspend_lock);
+ drop_suspend = B_FALSE;
 }
-
 mutex_exit(&zvol_state_lock);
+ ASSERT(MUTEX_HELD(&zv->zv_state_lock));
+ ASSERT(zv->zv_open_count != 0 || RW_READ_HELD(&zv->zv_suspend_lock));
+
 if (zv->zv_open_count == 0) {
 error = zvol_first_open(zv);
 if (error)
@@ -1417,28 +1425,38 @@ static int
 zvol_release(struct gendisk *disk, fmode_t mode)
 {
 zvol_state_t *zv;
- boolean_t drop_suspend = B_FALSE;
+ boolean_t drop_suspend = B_TRUE;
- ASSERT(!mutex_owned(&zvol_state_lock));
+ ASSERT(!MUTEX_HELD(&zvol_state_lock));
 mutex_enter(&zvol_state_lock);
 zv = disk->private_data;
- ASSERT(zv && zv->zv_open_count > 0);
-
- /* take zv_suspend_lock before zv_state_lock */
- rw_enter(&zv->zv_suspend_lock, RW_READER);
 mutex_enter(&zv->zv_state_lock);
- mutex_exit(&zvol_state_lock);
-
+ ASSERT(zv->zv_open_count > 0);
 /*
 * make sure zvol is not suspended during last close
- * (hold zv_suspend_lock), otherwise, drop the lock
+ * (hold zv_suspend_lock) and respect proper lock acquisition
+ * ordering - zv_suspend_lock before zv_state_lock
 */
- if (zv->zv_open_count == 1)
- drop_suspend = B_TRUE;
- else
- rw_exit(&zv->zv_suspend_lock);
+ if (zv->zv_open_count == 1) {
+ if (!rw_tryenter(&zv->zv_suspend_lock, RW_READER)) {
+ mutex_exit(&zv->zv_state_lock);
+ rw_enter(&zv->zv_suspend_lock, RW_READER);
+ mutex_enter(&zv->zv_state_lock);
+ /* check to see if zv_suspend_lock is needed */
+ if (zv->zv_open_count != 1) {
+ rw_exit(&zv->zv_suspend_lock);
+ drop_suspend = B_FALSE;
+ }
+ }
+ } else {
+ drop_suspend = B_FALSE;
+ }
+ mutex_exit(&zvol_state_lock);
+
+ ASSERT(MUTEX_HELD(&zv->zv_state_lock));
+ ASSERT(zv->zv_open_count != 1 || RW_READ_HELD(&zv->zv_suspend_lock));
 zv->zv_open_count--;
 if (zv->zv_open_count == 0)