diff mbox series

[v3] 9pfs: use GHashTable for fid table

Message ID 20220908112353.289267-1-git@sphalerite.org
State New
Headers show
Series [v3] 9pfs: use GHashTable for fid table | expand

Commit Message

Linus Heckemann Sept. 8, 2022, 11:23 a.m. UTC
The previous implementation would iterate over the fid table for
lookup operations, resulting in an operation with O(n) complexity on
the number of open files and poor cache locality -- for every open,
stat, read, write, etc operation.

This change uses a hashtable for this instead, significantly improving
the performance of the 9p filesystem. The runtime of NixOS's simple
installer test, which copies ~122k files totalling ~1.8GiB from 9p,
decreased by a factor of about 10.

Signed-off-by: Linus Heckemann <git@sphalerite.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
---

Greg Kurz writes:
> The comment above should be adapted to the new situation : no need

I've removed it completely, since the logic is simple enough that only
the shortened comment below remains necessary.

> With the new logic, this should just be:

now is :)

> g_hash_table_steal_extended() [1] actually allows to do just that.

g_hash_table_steal_extended unfortunately isn't available since it was
introduced in glib 2.58 and we're maintaining compatibility to 2.56.

> You could just call g_hash_table_iter_remove(&iter) here

Applied this suggestion, thanks!


> Well... finding at least one clunked fid state in this table is
> definitely a bug so I'll keep the BUG_ON() anyway.

Christian Schoenebeck writes:
> Yeah, I think you are right, it would feel odd. Just drop BUG_ON() for
> now.

I still prefer dropping it, but if we were to keep it I think it should
be in v9fs_reclaim_fd where we iterate and can thus check the whole
table.


Greg Kurz and Philippe Mathieu-Daudé write:
> [patch versioning]

Whoops. I used -v2 on git send-email, which just ignored the option,
rather than git format-patch, by accident. This one _should_ now be v3!


 hw/9pfs/9p.c | 140 +++++++++++++++++++++++++--------------------------
 hw/9pfs/9p.h |   2 +-
 2 files changed, 70 insertions(+), 72 deletions(-)

Comments

Greg Kurz Sept. 8, 2022, 12:18 p.m. UTC | #1
On Thu,  8 Sep 2022 13:23:53 +0200
Linus Heckemann <git@sphalerite.org> wrote:

> The previous implementation would iterate over the fid table for
> lookup operations, resulting in an operation with O(n) complexity on
> the number of open files and poor cache locality -- for every open,
> stat, read, write, etc operation.
> 
> This change uses a hashtable for this instead, significantly improving
> the performance of the 9p filesystem. The runtime of NixOS's simple
> installer test, which copies ~122k files totalling ~1.8GiB from 9p,
> decreased by a factor of about 10.
> 
> Signed-off-by: Linus Heckemann <git@sphalerite.org>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Reviewed-by: Greg Kurz <groug@kaod.org>
> ---
> 
> Greg Kurz writes:
> > The comment above should be adapted to the new situation : no need
> 
> I've removed it completely, since the logic is simple enough that only
> the shortened comment below remains necessary.
> 
> > With the new logic, this should just be:
> 
> now is :)
> 
> > g_hash_table_steal_extended() [1] actually allows to do just that.
> 
> g_hash_table_steal_extended unfortunately isn't available since it was
> introduced in glib 2.58 and we're maintaining compatibility to 2.56.
> 

Ha... this could be addressed through conditional compilation, e.g.:

static V9fsFidState *clunk_fid(V9fsState *s, int32_t fid)
{
    V9fsFidState *fidp;

#if GLIB_CHECK_VERSION(2,56,0)
    if (!g_hash_table_steal_extended(s->fids, GINT_TO_POINTER(fid),
                                     NULL, (gpointer *) &fidp)) {
        return NULL;
    }
#else
    fidp = g_hash_table_lookup(s->fids, GINT_TO_POINTER(fid));
    if (fidp) {
        g_hash_table_remove(s->fids, GINT_TO_POINTER(fid));
    } else {
        return NULL;
    }
#endif

    fidp->clunked = true;
    return fidp;
}

or simply leave a TODO comment so that we don't forget.


> > You could just call g_hash_table_iter_remove(&iter) here
> 
> Applied this suggestion, thanks!
> 
> 
> > Well... finding at least one clunked fid state in this table is
> > definitely a bug so I'll keep the BUG_ON() anyway.
> 
> Christian Schoenebeck writes:
> > Yeah, I think you are right, it would feel odd. Just drop BUG_ON() for
> > now.
> 
> I still prefer dropping it, but if we were to keep it I think it should
> be in v9fs_reclaim_fd where we iterate and can thus check the whole
> table.
> 

IMO the relevant aspect isn't really about checking the whole table, but
rather not to get a clunked fid out of this table and pass it over.

> 
> Greg Kurz and Philippe Mathieu-Daudé write:
> > [patch versioning]
> 
> Whoops. I used -v2 on git send-email, which just ignored the option,
> rather than git format-patch, by accident. This one _should_ now be v3!
> 
> 

v3 it is and LGTM ! No big deal with the BUG_ON(), given the improvement.

My R-b stands. Thanks Linus !

>  hw/9pfs/9p.c | 140 +++++++++++++++++++++++++--------------------------
>  hw/9pfs/9p.h |   2 +-
>  2 files changed, 70 insertions(+), 72 deletions(-)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index aebadeaa03..98a475e560 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -282,33 +282,31 @@ static V9fsFidState *coroutine_fn get_fid(V9fsPDU *pdu, int32_t fid)
>      V9fsFidState *f;
>      V9fsState *s = pdu->s;
>  
> -    QSIMPLEQ_FOREACH(f, &s->fid_list, next) {
> -        BUG_ON(f->clunked);
> -        if (f->fid == fid) {
> -            /*
> -             * Update the fid ref upfront so that
> -             * we don't get reclaimed when we yield
> -             * in open later.
> -             */
> -            f->ref++;
> -            /*
> -             * check whether we need to reopen the
> -             * file. We might have closed the fd
> -             * while trying to free up some file
> -             * descriptors.
> -             */
> -            err = v9fs_reopen_fid(pdu, f);
> -            if (err < 0) {
> -                f->ref--;
> -                return NULL;
> -            }
> -            /*
> -             * Mark the fid as referenced so that the LRU
> -             * reclaim won't close the file descriptor
> -             */
> -            f->flags |= FID_REFERENCED;
> -            return f;
> +    f = g_hash_table_lookup(s->fids, GINT_TO_POINTER(fid));
> +    if (f) {
> +        /*
> +         * Update the fid ref upfront so that
> +         * we don't get reclaimed when we yield
> +         * in open later.
> +         */
> +        f->ref++;
> +        /*
> +         * check whether we need to reopen the
> +         * file. We might have closed the fd
> +         * while trying to free up some file
> +         * descriptors.
> +         */
> +        err = v9fs_reopen_fid(pdu, f);
> +        if (err < 0) {
> +            f->ref--;
> +            return NULL;
>          }
> +        /*
> +         * Mark the fid as referenced so that the LRU
> +         * reclaim won't close the file descriptor
> +         */
> +        f->flags |= FID_REFERENCED;
> +        return f;
>      }
>      return NULL;
>  }
> @@ -317,12 +315,9 @@ static V9fsFidState *alloc_fid(V9fsState *s, int32_t fid)
>  {
>      V9fsFidState *f;
>  
> -    QSIMPLEQ_FOREACH(f, &s->fid_list, next) {
> +    if (g_hash_table_contains(s->fids, GINT_TO_POINTER(fid))) {
>          /* If fid is already there return NULL */
> -        BUG_ON(f->clunked);
> -        if (f->fid == fid) {
> -            return NULL;
> -        }
> +        return NULL;
>      }
>      f = g_new0(V9fsFidState, 1);
>      f->fid = fid;
> @@ -333,7 +328,7 @@ static V9fsFidState *alloc_fid(V9fsState *s, int32_t fid)
>       * reclaim won't close the file descriptor
>       */
>      f->flags |= FID_REFERENCED;
> -    QSIMPLEQ_INSERT_TAIL(&s->fid_list, f, next);
> +    g_hash_table_insert(s->fids, GINT_TO_POINTER(fid), f);
>  
>      v9fs_readdir_init(s->proto_version, &f->fs.dir);
>      v9fs_readdir_init(s->proto_version, &f->fs_reclaim.dir);
> @@ -424,12 +419,11 @@ static V9fsFidState *clunk_fid(V9fsState *s, int32_t fid)
>  {
>      V9fsFidState *fidp;
>  
> -    QSIMPLEQ_FOREACH(fidp, &s->fid_list, next) {
> -        if (fidp->fid == fid) {
> -            QSIMPLEQ_REMOVE(&s->fid_list, fidp, V9fsFidState, next);
> -            fidp->clunked = true;
> -            return fidp;
> -        }
> +    fidp = g_hash_table_lookup(s->fids, GINT_TO_POINTER(fid));
> +    if (fidp) {
> +        g_hash_table_remove(s->fids, GINT_TO_POINTER(fid));
> +        fidp->clunked = true;
> +        return fidp;
>      }
>      return NULL;
>  }
> @@ -439,10 +433,15 @@ void coroutine_fn v9fs_reclaim_fd(V9fsPDU *pdu)
>      int reclaim_count = 0;
>      V9fsState *s = pdu->s;
>      V9fsFidState *f;
> +    GHashTableIter iter;
> +    gpointer fid;
> +
> +    g_hash_table_iter_init(&iter, s->fids);
> +
>      QSLIST_HEAD(, V9fsFidState) reclaim_list =
>          QSLIST_HEAD_INITIALIZER(reclaim_list);
>  
> -    QSIMPLEQ_FOREACH(f, &s->fid_list, next) {
> +    while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &f)) {
>          /*
>           * Unlink fids cannot be reclaimed. Check
>           * for them and skip them. Also skip fids
> @@ -518,23 +517,19 @@ static int coroutine_fn v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path)
>  {
>      int err;
>      V9fsState *s = pdu->s;
> -    V9fsFidState *fidp, *fidp_next;
> +    V9fsFidState *fidp;
> +    gpointer fid;
> +    GHashTableIter iter;
>  
> -    fidp = QSIMPLEQ_FIRST(&s->fid_list);
> -    if (!fidp) {
> -        return 0;
> -    }
> +    g_hash_table_iter_init(&iter, s->fids);
> +
> +    while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &fidp)) {
> +        /*
> +         * Ensure the fid survives a potential clunk request during
> +         * v9fs_reopen_fid.
> +         */
> +        fidp->ref++;
>  
> -    /*
> -     * v9fs_reopen_fid() can yield : a reference on the fid must be held
> -     * to ensure its pointer remains valid and we can safely pass it to
> -     * QSIMPLEQ_NEXT(). The corresponding put_fid() can also yield so
> -     * we must keep a reference on the next fid as well. So the logic here
> -     * is to get a reference on a fid and only put it back during the next
> -     * iteration after we could get a reference on the next fid. Start with
> -     * the first one.
> -     */
> -    for (fidp->ref++; fidp; fidp = fidp_next) {
>          if (fidp->path.size == path->size &&
>              !memcmp(fidp->path.data, path->data, path->size)) {
>              /* Mark the fid non reclaimable. */
> @@ -548,16 +543,6 @@ static int coroutine_fn v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path)
>              }
>          }
>  
> -        fidp_next = QSIMPLEQ_NEXT(fidp, next);
> -
> -        if (fidp_next) {
> -            /*
> -             * Ensure the next fid survives a potential clunk request during
> -             * put_fid() below and v9fs_reopen_fid() in the next iteration.
> -             */
> -            fidp_next->ref++;
> -        }
> -
>          /* We're done with this fid */
>          put_fid(pdu, fidp);
>      }
> @@ -569,18 +554,20 @@ static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
>  {
>      V9fsState *s = pdu->s;
>      V9fsFidState *fidp;
> +    gpointer fid;
> +    GHashTableIter iter;
> +
> +    g_hash_table_iter_init(&iter, s->fids);
>  
>      /* Free all fids */
> -    while (!QSIMPLEQ_EMPTY(&s->fid_list)) {
> -        /* Get fid */
> -        fidp = QSIMPLEQ_FIRST(&s->fid_list);
> +    while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &fidp)) {
>          fidp->ref++;
>  
>          /* Clunk fid */
> -        QSIMPLEQ_REMOVE(&s->fid_list, fidp, V9fsFidState, next);
>          fidp->clunked = true;
>  
>          put_fid(pdu, fidp);
> +        g_hash_table_iter_remove(&iter);
>      }
>  }
>  
> @@ -3205,6 +3192,8 @@ static int coroutine_fn v9fs_complete_rename(V9fsPDU *pdu, V9fsFidState *fidp,
>      V9fsFidState *tfidp;
>      V9fsState *s = pdu->s;
>      V9fsFidState *dirfidp = NULL;
> +    GHashTableIter iter;
> +    gpointer fid;
>  
>      v9fs_path_init(&new_path);
>      if (newdirfid != -1) {
> @@ -3238,11 +3227,13 @@ static int coroutine_fn v9fs_complete_rename(V9fsPDU *pdu, V9fsFidState *fidp,
>      if (err < 0) {
>          goto out;
>      }
> +
>      /*
>       * Fixup fid's pointing to the old name to
>       * start pointing to the new name
>       */
> -    QSIMPLEQ_FOREACH(tfidp, &s->fid_list, next) {
> +    g_hash_table_iter_init(&iter, s->fids);
> +    while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &tfidp)) {
>          if (v9fs_path_is_ancestor(&fidp->path, &tfidp->path)) {
>              /* replace the name */
>              v9fs_fix_path(&tfidp->path, &new_path, strlen(fidp->path.data));
> @@ -3320,6 +3311,8 @@ static int coroutine_fn v9fs_fix_fid_paths(V9fsPDU *pdu, V9fsPath *olddir,
>      V9fsPath oldpath, newpath;
>      V9fsState *s = pdu->s;
>      int err;
> +    GHashTableIter iter;
> +    gpointer fid;
>  
>      v9fs_path_init(&oldpath);
>      v9fs_path_init(&newpath);
> @@ -3336,7 +3329,8 @@ static int coroutine_fn v9fs_fix_fid_paths(V9fsPDU *pdu, V9fsPath *olddir,
>       * Fixup fid's pointing to the old name to
>       * start pointing to the new name
>       */
> -    QSIMPLEQ_FOREACH(tfidp, &s->fid_list, next) {
> +    g_hash_table_iter_init(&iter, s->fids);
> +    while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &tfidp)) {
>          if (v9fs_path_is_ancestor(&oldpath, &tfidp->path)) {
>              /* replace the name */
>              v9fs_fix_path(&tfidp->path, &newpath, strlen(oldpath.data));
> @@ -4226,7 +4220,7 @@ int v9fs_device_realize_common(V9fsState *s, const V9fsTransport *t,
>      s->ctx.fmode = fse->fmode;
>      s->ctx.dmode = fse->dmode;
>  
> -    QSIMPLEQ_INIT(&s->fid_list);
> +    s->fids = g_hash_table_new(NULL, NULL);
>      qemu_co_rwlock_init(&s->rename_lock);
>  
>      if (s->ops->init(&s->ctx, errp) < 0) {
> @@ -4286,6 +4280,10 @@ void v9fs_device_unrealize_common(V9fsState *s)
>      if (s->ctx.fst) {
>          fsdev_throttle_cleanup(s->ctx.fst);
>      }
> +    if (s->fids) {
> +        g_hash_table_destroy(s->fids);
> +        s->fids = NULL;
> +    }
>      g_free(s->tag);
>      qp_table_destroy(&s->qpd_table);
>      qp_table_destroy(&s->qpp_table);
> diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
> index 994f952600..10fd2076c2 100644
> --- a/hw/9pfs/9p.h
> +++ b/hw/9pfs/9p.h
> @@ -339,7 +339,7 @@ typedef struct {
>  struct V9fsState {
>      QLIST_HEAD(, V9fsPDU) free_list;
>      QLIST_HEAD(, V9fsPDU) active_list;
> -    QSIMPLEQ_HEAD(, V9fsFidState) fid_list;
> +    GHashTable *fids;
>      FileOperations *ops;
>      FsContext ctx;
>      char *tag;
Linus Heckemann Sept. 8, 2022, 4:10 p.m. UTC | #2
(sorry for the dup @Greg, forgot to reply-all)

Greg Kurz <groug@kaod.org> writes:
>> > g_hash_table_steal_extended() [1] actually allows to do just that.
>> 
>> g_hash_table_steal_extended unfortunately isn't available since it was
>> introduced in glib 2.58 and we're maintaining compatibility to 2.56.
>> 
>
> Ha... this could be addressed through conditional compilation, e.g.:

It still won't compile, because we set GLIB_VERSION_MAX_ALLOWED in
glib-compat.h and it would require a compat wrapper as described
there. I think that's a bit much for this far more marginal performance
change. I'm happy to resubmit with the TODO comment though if you like?
Greg Kurz Sept. 8, 2022, 4:14 p.m. UTC | #3
On Thu, 08 Sep 2022 18:10:28 +0200
Linus Heckemann <git@sphalerite.org> wrote:

> (sorry for the dup @Greg, forgot to reply-all)
> 
> Greg Kurz <groug@kaod.org> writes:
> >> > g_hash_table_steal_extended() [1] actually allows to do just that.
> >> 
> >> g_hash_table_steal_extended unfortunately isn't available since it was
> >> introduced in glib 2.58 and we're maintaining compatibility to 2.56.
> >> 
> >
> > Ha... this could be addressed through conditional compilation, e.g.:
> 
> It still won't compile, because we set GLIB_VERSION_MAX_ALLOWED in
> glib-compat.h and it would require a compat wrapper as described

ah drat, you're right !

> there. I think that's a bit much for this far more marginal performance
> change. I'm happy to resubmit with the TODO comment though if you like?

Either that or Christian may add it when merging.

Cheers,

--
Greg
Christian Schoenebeck Sept. 9, 2022, 1:10 p.m. UTC | #4
On Donnerstag, 8. September 2022 13:23:53 CEST Linus Heckemann wrote:
> The previous implementation would iterate over the fid table for
> lookup operations, resulting in an operation with O(n) complexity on
> the number of open files and poor cache locality -- for every open,
> stat, read, write, etc operation.
> 
> This change uses a hashtable for this instead, significantly improving
> the performance of the 9p filesystem. The runtime of NixOS's simple
> installer test, which copies ~122k files totalling ~1.8GiB from 9p,
> decreased by a factor of about 10.
> 
> Signed-off-by: Linus Heckemann <git@sphalerite.org>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Reviewed-by: Greg Kurz <groug@kaod.org>
> ---

Queued on 9p.next:
https://github.com/cschoenebeck/qemu/commits/9p.next

I retained the BUG_ON() in get_fid(), Greg had a point there that continuing 
to work on a clunked fid would still be a bug.

I also added the suggested TODO comment for g_hash_table_steal_extended(), the 
actual change would be outside the scope of this patch.

And finally I gave this patch a whirl, and what can I say: that's just sick! 
Compiling sources with 9p is boosted by around factor 6..7 here! And running 
9p as root fs also no longer feels sluggish as before. I mean I knew that this 
fid list traversal performance issue existed and had it on my TODO list, but 
the actual impact exceeded my expectation by far.

Thanks!

Best regards,
Christian Schoenebeck
Christian Schoenebeck Sept. 19, 2022, 5:34 p.m. UTC | #5
On Freitag, 9. September 2022 15:10:48 CEST Christian Schoenebeck wrote:
> On Donnerstag, 8. September 2022 13:23:53 CEST Linus Heckemann wrote:
> > The previous implementation would iterate over the fid table for
> > lookup operations, resulting in an operation with O(n) complexity on
> > the number of open files and poor cache locality -- for every open,
> > stat, read, write, etc operation.
> > 
> > This change uses a hashtable for this instead, significantly improving
> > the performance of the 9p filesystem. The runtime of NixOS's simple
> > installer test, which copies ~122k files totalling ~1.8GiB from 9p,
> > decreased by a factor of about 10.
> > 
> > Signed-off-by: Linus Heckemann <git@sphalerite.org>
> > Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> > Reviewed-by: Greg Kurz <groug@kaod.org>
> > ---
> 
> Queued on 9p.next:
> https://github.com/cschoenebeck/qemu/commits/9p.next
> 
> I retained the BUG_ON() in get_fid(), Greg had a point there that continuing
> to work on a clunked fid would still be a bug.
> 
> I also added the suggested TODO comment for g_hash_table_steal_extended(),
> the actual change would be outside the scope of this patch.
> 
> And finally I gave this patch a whirl, and what can I say: that's just sick!
> Compiling sources with 9p is boosted by around factor 6..7 here! And
> running 9p as root fs also no longer feels sluggish as before. I mean I
> knew that this fid list traversal performance issue existed and had it on
> my TODO list, but the actual impact exceeded my expectation by far.

Linus, there is still something cheesy. After more testing, at a certain point
running the VM, the terminal is spilled with this message:

  GLib: g_hash_table_iter_next: assertion 'ri->version == ri->hash_table->version' failed

Looking at the glib sources, I think this warning means the iterator got
invalidated. Setting a breakpoint at glib function g_return_if_fail_warning I
got:

  Thread 1 "qemu-system-x86" hit Breakpoint 1, 0x00007ffff7aa9d80 in g_return_if_fail_warning () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
  (gdb) bt
  #0  0x00007ffff7aa9d80 in g_return_if_fail_warning () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #1  0x00007ffff7a8ea18 in g_hash_table_iter_next () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #2  0x0000555555998a7a in v9fs_mark_fids_unreclaim (pdu=0x555557a34c90, path=0x7ffba8ceff30) at ../hw/9pfs/9p.c:528
  #3  0x000055555599f7a0 in v9fs_unlinkat (opaque=0x555557a34c90) at ../hw/9pfs/9p.c:3170
  #4  0x000055555606dc4b in coroutine_trampoline (i0=1463900480, i1=21845) at ../util/coroutine-ucontext.c:177
  #5  0x00007ffff7749d40 in __start_context () at /lib/x86_64-linux-gnu/libc.so.6
  #6  0x00007fffffffd5f0 in  ()
  #7  0x0000000000000000 in  ()
  (gdb)

The while loop in v9fs_mark_fids_unreclaim() holds the hash table iterator
while the hash table is modified during the loop.

Would you please fix this? If you do, please use my already queued patch
version as basis.

Best regards,
Christian Schoenebeck
Linus Heckemann Sept. 22, 2022, 11:43 a.m. UTC | #6
Christian Schoenebeck <qemu_oss@crudebyte.com> writes:

> On Freitag, 9. September 2022 15:10:48 CEST Christian Schoenebeck wrote:
>> On Donnerstag, 8. September 2022 13:23:53 CEST Linus Heckemann wrote:
>> > The previous implementation would iterate over the fid table for
>> > lookup operations, resulting in an operation with O(n) complexity on
>> > the number of open files and poor cache locality -- for every open,
>> > stat, read, write, etc operation.
>> > 
>> > This change uses a hashtable for this instead, significantly improving
>> > the performance of the 9p filesystem. The runtime of NixOS's simple
>> > installer test, which copies ~122k files totalling ~1.8GiB from 9p,
>> > decreased by a factor of about 10.
>> > 
>> > Signed-off-by: Linus Heckemann <git@sphalerite.org>
>> > Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> > Reviewed-by: Greg Kurz <groug@kaod.org>
>> > ---
>> 
>> Queued on 9p.next:
>> https://github.com/cschoenebeck/qemu/commits/9p.next
>> 
>> I retained the BUG_ON() in get_fid(), Greg had a point there that continuing
>> to work on a clunked fid would still be a bug.
>> 
>> I also added the suggested TODO comment for g_hash_table_steal_extended(),
>> the actual change would be outside the scope of this patch.
>> 
>> And finally I gave this patch a whirl, and what can I say: that's just sick!
>> Compiling sources with 9p is boosted by around factor 6..7 here! And
>> running 9p as root fs also no longer feels sluggish as before. I mean I
>> knew that this fid list traversal performance issue existed and had it on
>> my TODO list, but the actual impact exceeded my expectation by far.
>
> Linus, there is still something cheesy. After more testing, at a certain point
> running the VM, the terminal is spilled with this message:
>
>   GLib: g_hash_table_iter_next: assertion 'ri->version == ri->hash_table->version' failed
>
> Looking at the glib sources, I think this warning means the iterator got
> invalidated. Setting a breakpoint at glib function g_return_if_fail_warning I
> got:
>
>   Thread 1 "qemu-system-x86" hit Breakpoint 1, 0x00007ffff7aa9d80 in g_return_if_fail_warning () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>   (gdb) bt
>   #0  0x00007ffff7aa9d80 in g_return_if_fail_warning () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
>   #1  0x00007ffff7a8ea18 in g_hash_table_iter_next () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
>   #2  0x0000555555998a7a in v9fs_mark_fids_unreclaim (pdu=0x555557a34c90, path=0x7ffba8ceff30) at ../hw/9pfs/9p.c:528
>   #3  0x000055555599f7a0 in v9fs_unlinkat (opaque=0x555557a34c90) at ../hw/9pfs/9p.c:3170
>   #4  0x000055555606dc4b in coroutine_trampoline (i0=1463900480, i1=21845) at ../util/coroutine-ucontext.c:177
>   #5  0x00007ffff7749d40 in __start_context () at /lib/x86_64-linux-gnu/libc.so.6
>   #6  0x00007fffffffd5f0 in  ()
>   #7  0x0000000000000000 in  ()
>   (gdb)
>
> The while loop in v9fs_mark_fids_unreclaim() holds the hash table iterator
> while the hash table is modified during the loop.
>
> Would you please fix this? If you do, please use my already queued patch
> version as basis.
>
> Best regards,
> Christian Schoenebeck

Hi Christian,

Thanks for finding this!

I think I understand the problem, but I can't reproduce it at all (I've
been trying by hammering the filesystem with thousands of opens/closes
across several processes). Do you have a reliable way?

Cheers
Linus
Christian Schoenebeck Sept. 22, 2022, 1:24 p.m. UTC | #7
On Donnerstag, 22. September 2022 13:43:56 CEST Linus Heckemann wrote:
> Christian Schoenebeck <qemu_oss@crudebyte.com> writes:
> > On Freitag, 9. September 2022 15:10:48 CEST Christian Schoenebeck wrote:
> >> On Donnerstag, 8. September 2022 13:23:53 CEST Linus Heckemann wrote:
> >> > The previous implementation would iterate over the fid table for
> >> > lookup operations, resulting in an operation with O(n) complexity on
> >> > the number of open files and poor cache locality -- for every open,
> >> > stat, read, write, etc operation.
> >> > 
> >> > This change uses a hashtable for this instead, significantly improving
> >> > the performance of the 9p filesystem. The runtime of NixOS's simple
> >> > installer test, which copies ~122k files totalling ~1.8GiB from 9p,
> >> > decreased by a factor of about 10.
> >> > 
> >> > Signed-off-by: Linus Heckemann <git@sphalerite.org>
> >> > Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> >> > Reviewed-by: Greg Kurz <groug@kaod.org>
> >> > ---
> >> 
> >> Queued on 9p.next:
> >> https://github.com/cschoenebeck/qemu/commits/9p.next
> >> 
> >> I retained the BUG_ON() in get_fid(), Greg had a point there that
> >> continuing to work on a clunked fid would still be a bug.
> >> 
> >> I also added the suggested TODO comment for
> >> g_hash_table_steal_extended(),
> >> the actual change would be outside the scope of this patch.
> >> 
> >> And finally I gave this patch a whirl, and what can I say: that's just
> >> sick! Compiling sources with 9p is boosted by around factor 6..7 here!
> >> And running 9p as root fs also no longer feels sluggish as before. I
> >> mean I knew that this fid list traversal performance issue existed and
> >> had it on my TODO list, but the actual impact exceeded my expectation by
> >> far.> 
> > Linus, there is still something cheesy. After more testing, at a certain
> > point> 
> > running the VM, the terminal is spilled with this message:
> >   GLib: g_hash_table_iter_next: assertion 'ri->version ==
> >   ri->hash_table->version' failed> 
> > Looking at the glib sources, I think this warning means the iterator got
> > invalidated. Setting a breakpoint at glib function
> > g_return_if_fail_warning I> 
> > got:
> >   Thread 1 "qemu-system-x86" hit Breakpoint 1, 0x00007ffff7aa9d80 in
> >   g_return_if_fail_warning () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> >   (gdb) bt
> >   #0  0x00007ffff7aa9d80 in g_return_if_fail_warning () at
> >   /lib/x86_64-linux-gnu/libglib-2.0.so.0 #1  0x00007ffff7a8ea18 in
> >   g_hash_table_iter_next () at /lib/x86_64-linux-gnu/libglib-2.0.so.0 #2 
> >   0x0000555555998a7a in v9fs_mark_fids_unreclaim (pdu=0x555557a34c90,
> >   path=0x7ffba8ceff30) at ../hw/9pfs/9p.c:528 #3  0x000055555599f7a0 in
> >   v9fs_unlinkat (opaque=0x555557a34c90) at ../hw/9pfs/9p.c:3170 #4 
> >   0x000055555606dc4b in coroutine_trampoline (i0=1463900480, i1=21845) at
> >   ../util/coroutine-ucontext.c:177 #5  0x00007ffff7749d40 in
> >   __start_context () at /lib/x86_64-linux-gnu/libc.so.6 #6 
> >   0x00007fffffffd5f0 in  ()
> >   #7  0x0000000000000000 in  ()
> >   (gdb)
> > 
> > The while loop in v9fs_mark_fids_unreclaim() holds the hash table iterator
> > while the hash table is modified during the loop.
> > 
> > Would you please fix this? If you do, please use my already queued patch
> > version as basis.
> > 
> > Best regards,
> > Christian Schoenebeck
> 
> Hi Christian,
> 
> Thanks for finding this!
> 
> I think I understand the problem, but I can't reproduce it at all (I've
> been trying by hammering the filesystem with thousands of opens/closes
> across several processes). Do you have a reliable way?

I also didn't hit this issue in my initial tests. I do hit this after about 
just a minute though when running 9p as root fs [1] and then starting to 
compile KDE [2] inside the VM.

[1] https://wiki.qemu.org/Documentation/9p_root_fs
[2] https://community.kde.org/Get_Involved/development

Independent of reproduction, let me elaborate what's going on, as this issue 
is probably not that obvious:

1.) Like with any ordered container, the glib hash iterator becomes invalid as 
soon as the hash table got modified.

Fortunately glib detects this case by maintaining a "version" field on the 
hash table which is bumped whenever the hash table was modified, and likewise 
a "version" field on the iterator structure and detecting an invalid iterator 
by comparing [3] the two "version" fields when calling 
g_hash_table_iter_next() for instance:

  g_return_val_if_fail (ri->version == ri->hash_table->version, FALSE);

and the g_return_val_if_fail() macro calls g_return_if_fail_warning() to print 
the warning message in this case and then just immediately returns from 
g_hash_table_iter_next().

So this is not nitpicking, it will actually start severe 9p server 
misbehaviour at this point.

[3] https://github.com/GNOME/glib/blob/main/glib/ghash.c#L1182

2.) The hash table loop inside v9fs_mark_fids_unreclaim() does not directly 
modify the "fids" hash table. But inside that loop we get contention, because 
even though basically everything inside 9p.c is executed on QEMU main thread 
only, all the 9p filesystem driver calls are dispatched [4] to a worker thread 
and then after fs driver task completion, dispatched back to main thread. And 
specifically in v9fs_mark_fids_unreclaim() these are the two functions 
put_fid() and v9fs_reopen_fid() that are endangered to this possibility (i.e. 
those two may "yield" [4] the coroutine).

So what happens is, the coroutine is dispatched to the fs worker thread 
(inside those two functions), in the meantime main thread picks & continues to 
run another coroutine (i.e. processes another active 9p request), and that in 
turn might of course modify the 'fids' hash table, depending on what kind of 
9p request that is. Then main thread gets back to the original couroutine 
inside 9fs_mark_fids_unreclaim(), and eventually continues the hash table loop 
there, and bam.

[4] https://wiki.qemu.org/Documentation/9p#Coroutines

---

As for a solution: in contrast to the previous list based implementation, I 
think we can't (or shouldn't) recover from an invalidated hash table iterator, 
even though we could detect this case by checking the return value of 
g_hash_table_iter_next().

I think it would make sense instead to first collect the respective fids 
inside that loop with a local container (e.g. a local list on the stack), and 
then putting the fids subsequently in a separate loop below.

Does this make sense?

Best regards,
Christian Schoenebeck
diff mbox series

Patch

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index aebadeaa03..98a475e560 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -282,33 +282,31 @@  static V9fsFidState *coroutine_fn get_fid(V9fsPDU *pdu, int32_t fid)
     V9fsFidState *f;
     V9fsState *s = pdu->s;
 
-    QSIMPLEQ_FOREACH(f, &s->fid_list, next) {
-        BUG_ON(f->clunked);
-        if (f->fid == fid) {
-            /*
-             * Update the fid ref upfront so that
-             * we don't get reclaimed when we yield
-             * in open later.
-             */
-            f->ref++;
-            /*
-             * check whether we need to reopen the
-             * file. We might have closed the fd
-             * while trying to free up some file
-             * descriptors.
-             */
-            err = v9fs_reopen_fid(pdu, f);
-            if (err < 0) {
-                f->ref--;
-                return NULL;
-            }
-            /*
-             * Mark the fid as referenced so that the LRU
-             * reclaim won't close the file descriptor
-             */
-            f->flags |= FID_REFERENCED;
-            return f;
+    f = g_hash_table_lookup(s->fids, GINT_TO_POINTER(fid));
+    if (f) {
+        /*
+         * Update the fid ref upfront so that
+         * we don't get reclaimed when we yield
+         * in open later.
+         */
+        f->ref++;
+        /*
+         * check whether we need to reopen the
+         * file. We might have closed the fd
+         * while trying to free up some file
+         * descriptors.
+         */
+        err = v9fs_reopen_fid(pdu, f);
+        if (err < 0) {
+            f->ref--;
+            return NULL;
         }
+        /*
+         * Mark the fid as referenced so that the LRU
+         * reclaim won't close the file descriptor
+         */
+        f->flags |= FID_REFERENCED;
+        return f;
     }
     return NULL;
 }
@@ -317,12 +315,9 @@  static V9fsFidState *alloc_fid(V9fsState *s, int32_t fid)
 {
     V9fsFidState *f;
 
-    QSIMPLEQ_FOREACH(f, &s->fid_list, next) {
+    if (g_hash_table_contains(s->fids, GINT_TO_POINTER(fid))) {
         /* If fid is already there return NULL */
-        BUG_ON(f->clunked);
-        if (f->fid == fid) {
-            return NULL;
-        }
+        return NULL;
     }
     f = g_new0(V9fsFidState, 1);
     f->fid = fid;
@@ -333,7 +328,7 @@  static V9fsFidState *alloc_fid(V9fsState *s, int32_t fid)
      * reclaim won't close the file descriptor
      */
     f->flags |= FID_REFERENCED;
-    QSIMPLEQ_INSERT_TAIL(&s->fid_list, f, next);
+    g_hash_table_insert(s->fids, GINT_TO_POINTER(fid), f);
 
     v9fs_readdir_init(s->proto_version, &f->fs.dir);
     v9fs_readdir_init(s->proto_version, &f->fs_reclaim.dir);
@@ -424,12 +419,11 @@  static V9fsFidState *clunk_fid(V9fsState *s, int32_t fid)
 {
     V9fsFidState *fidp;
 
-    QSIMPLEQ_FOREACH(fidp, &s->fid_list, next) {
-        if (fidp->fid == fid) {
-            QSIMPLEQ_REMOVE(&s->fid_list, fidp, V9fsFidState, next);
-            fidp->clunked = true;
-            return fidp;
-        }
+    fidp = g_hash_table_lookup(s->fids, GINT_TO_POINTER(fid));
+    if (fidp) {
+        g_hash_table_remove(s->fids, GINT_TO_POINTER(fid));
+        fidp->clunked = true;
+        return fidp;
     }
     return NULL;
 }
@@ -439,10 +433,15 @@  void coroutine_fn v9fs_reclaim_fd(V9fsPDU *pdu)
     int reclaim_count = 0;
     V9fsState *s = pdu->s;
     V9fsFidState *f;
+    GHashTableIter iter;
+    gpointer fid;
+
+    g_hash_table_iter_init(&iter, s->fids);
+
     QSLIST_HEAD(, V9fsFidState) reclaim_list =
         QSLIST_HEAD_INITIALIZER(reclaim_list);
 
-    QSIMPLEQ_FOREACH(f, &s->fid_list, next) {
+    while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &f)) {
         /*
          * Unlink fids cannot be reclaimed. Check
          * for them and skip them. Also skip fids
@@ -518,23 +517,19 @@  static int coroutine_fn v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path)
 {
     int err;
     V9fsState *s = pdu->s;
-    V9fsFidState *fidp, *fidp_next;
+    V9fsFidState *fidp;
+    gpointer fid;
+    GHashTableIter iter;
 
-    fidp = QSIMPLEQ_FIRST(&s->fid_list);
-    if (!fidp) {
-        return 0;
-    }
+    g_hash_table_iter_init(&iter, s->fids);
+
+    while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &fidp)) {
+        /*
+         * Ensure the fid survives a potential clunk request during
+         * v9fs_reopen_fid.
+         */
+        fidp->ref++;
 
-    /*
-     * v9fs_reopen_fid() can yield : a reference on the fid must be held
-     * to ensure its pointer remains valid and we can safely pass it to
-     * QSIMPLEQ_NEXT(). The corresponding put_fid() can also yield so
-     * we must keep a reference on the next fid as well. So the logic here
-     * is to get a reference on a fid and only put it back during the next
-     * iteration after we could get a reference on the next fid. Start with
-     * the first one.
-     */
-    for (fidp->ref++; fidp; fidp = fidp_next) {
         if (fidp->path.size == path->size &&
             !memcmp(fidp->path.data, path->data, path->size)) {
             /* Mark the fid non reclaimable. */
@@ -548,16 +543,6 @@  static int coroutine_fn v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path)
             }
         }
 
-        fidp_next = QSIMPLEQ_NEXT(fidp, next);
-
-        if (fidp_next) {
-            /*
-             * Ensure the next fid survives a potential clunk request during
-             * put_fid() below and v9fs_reopen_fid() in the next iteration.
-             */
-            fidp_next->ref++;
-        }
-
         /* We're done with this fid */
         put_fid(pdu, fidp);
     }
@@ -569,18 +554,20 @@  static void coroutine_fn virtfs_reset(V9fsPDU *pdu)
 {
     V9fsState *s = pdu->s;
     V9fsFidState *fidp;
+    gpointer fid;
+    GHashTableIter iter;
+
+    g_hash_table_iter_init(&iter, s->fids);
 
     /* Free all fids */
-    while (!QSIMPLEQ_EMPTY(&s->fid_list)) {
-        /* Get fid */
-        fidp = QSIMPLEQ_FIRST(&s->fid_list);
+    while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &fidp)) {
         fidp->ref++;
 
         /* Clunk fid */
-        QSIMPLEQ_REMOVE(&s->fid_list, fidp, V9fsFidState, next);
         fidp->clunked = true;
 
         put_fid(pdu, fidp);
+        g_hash_table_iter_remove(&iter);
     }
 }
 
@@ -3205,6 +3192,8 @@  static int coroutine_fn v9fs_complete_rename(V9fsPDU *pdu, V9fsFidState *fidp,
     V9fsFidState *tfidp;
     V9fsState *s = pdu->s;
     V9fsFidState *dirfidp = NULL;
+    GHashTableIter iter;
+    gpointer fid;
 
     v9fs_path_init(&new_path);
     if (newdirfid != -1) {
@@ -3238,11 +3227,13 @@  static int coroutine_fn v9fs_complete_rename(V9fsPDU *pdu, V9fsFidState *fidp,
     if (err < 0) {
         goto out;
     }
+
     /*
      * Fixup fid's pointing to the old name to
      * start pointing to the new name
      */
-    QSIMPLEQ_FOREACH(tfidp, &s->fid_list, next) {
+    g_hash_table_iter_init(&iter, s->fids);
+    while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &tfidp)) {
         if (v9fs_path_is_ancestor(&fidp->path, &tfidp->path)) {
             /* replace the name */
             v9fs_fix_path(&tfidp->path, &new_path, strlen(fidp->path.data));
@@ -3320,6 +3311,8 @@  static int coroutine_fn v9fs_fix_fid_paths(V9fsPDU *pdu, V9fsPath *olddir,
     V9fsPath oldpath, newpath;
     V9fsState *s = pdu->s;
     int err;
+    GHashTableIter iter;
+    gpointer fid;
 
     v9fs_path_init(&oldpath);
     v9fs_path_init(&newpath);
@@ -3336,7 +3329,8 @@  static int coroutine_fn v9fs_fix_fid_paths(V9fsPDU *pdu, V9fsPath *olddir,
      * Fixup fid's pointing to the old name to
      * start pointing to the new name
      */
-    QSIMPLEQ_FOREACH(tfidp, &s->fid_list, next) {
+    g_hash_table_iter_init(&iter, s->fids);
+    while (g_hash_table_iter_next(&iter, &fid, (gpointer *) &tfidp)) {
         if (v9fs_path_is_ancestor(&oldpath, &tfidp->path)) {
             /* replace the name */
             v9fs_fix_path(&tfidp->path, &newpath, strlen(oldpath.data));
@@ -4226,7 +4220,7 @@  int v9fs_device_realize_common(V9fsState *s, const V9fsTransport *t,
     s->ctx.fmode = fse->fmode;
     s->ctx.dmode = fse->dmode;
 
-    QSIMPLEQ_INIT(&s->fid_list);
+    s->fids = g_hash_table_new(NULL, NULL);
     qemu_co_rwlock_init(&s->rename_lock);
 
     if (s->ops->init(&s->ctx, errp) < 0) {
@@ -4286,6 +4280,10 @@  void v9fs_device_unrealize_common(V9fsState *s)
     if (s->ctx.fst) {
         fsdev_throttle_cleanup(s->ctx.fst);
     }
+    if (s->fids) {
+        g_hash_table_destroy(s->fids);
+        s->fids = NULL;
+    }
     g_free(s->tag);
     qp_table_destroy(&s->qpd_table);
     qp_table_destroy(&s->qpp_table);
diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
index 994f952600..10fd2076c2 100644
--- a/hw/9pfs/9p.h
+++ b/hw/9pfs/9p.h
@@ -339,7 +339,7 @@  typedef struct {
 struct V9fsState {
     QLIST_HEAD(, V9fsPDU) free_list;
     QLIST_HEAD(, V9fsPDU) active_list;
-    QSIMPLEQ_HEAD(, V9fsFidState) fid_list;
+    GHashTable *fids;
     FileOperations *ops;
     FsContext ctx;
     char *tag;