diff mbox series

vfio/common: Do not g_free in vfio_get_iommu_info

Message ID 20220910004245.2878-1-nicolinc@nvidia.com
State New
Headers show
Series vfio/common: Do not g_free in vfio_get_iommu_info | expand

Commit Message

Nicolin Chen Sept. 10, 2022, 12:42 a.m. UTC
Its caller vfio_connect_container() assigns a default value
to info->iova_pgsizes, even if vfio_get_iommu_info() fails.
This would result in a "Segmentation fault" error, when the
VFIO_IOMMU_GET_INFO ioctl errors out.

Since the caller has g_free already, drop the g_free in its
rollback routine and add a line of comments to highlight it.

Fixes: 87ea529c50 ("vfio: Get migration capability flags for container")
Cc: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 hw/vfio/common.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Cornelia Huck Sept. 12, 2022, 12:38 p.m. UTC | #1
On Fri, Sep 09 2022, Nicolin Chen <nicolinc@nvidia.com> wrote:

> Its caller vfio_connect_container() assigns a default value
> to info->iova_pgsizes, even if vfio_get_iommu_info() fails.
> This would result in a "Segmentation fault" error, when the
> VFIO_IOMMU_GET_INFO ioctl errors out.
>
> Since the caller has g_free already, drop the g_free in its
> rollback routine and add a line of comments to highlight it.

There's basically two ways to fix this:

- return *info in any case, even on error
- free *info on error, and make sure that the caller doesn't try to
  access *info if the function returned !0

The problem with the first option is that the caller will access invalid
information if it neglects to check the return code, and that might lead
to not-that-obvious errors; in the second case, a broken caller would at
least fail quickly with a segfault. The current code is easier to fix
with the first option.

I think I'd prefer the second option; but obviously maintainer's choice.

>
> Fixes: 87ea529c50 ("vfio: Get migration capability flags for container")
> Cc: Kirti Wankhede <kwankhede@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  hw/vfio/common.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ace9562a9b..51b2e05c76 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1940,6 +1940,7 @@ static int vfio_init_container(VFIOContainer *container, int group_fd,
>      return 0;
>  }
>  
> +/* The caller is responsible for g_free(*info) */
>  static int vfio_get_iommu_info(VFIOContainer *container,
>                                 struct vfio_iommu_type1_info **info)
>  {
> @@ -1951,8 +1952,6 @@ again:
>      (*info)->argsz = argsz;
>  
>      if (ioctl(container->fd, VFIO_IOMMU_GET_INFO, *info)) {
> -        g_free(*info);
> -        *info = NULL;
>          return -errno;
>      }
>
Nicolin Chen Sept. 12, 2022, 8:51 p.m. UTC | #2
On Mon, Sep 12, 2022 at 02:38:52PM +0200, Cornelia Huck wrote:
> External email: Use caution opening links or attachments
> 
> 
> On Fri, Sep 09 2022, Nicolin Chen <nicolinc@nvidia.com> wrote:
> 
> > Its caller vfio_connect_container() assigns a default value
> > to info->iova_pgsizes, even if vfio_get_iommu_info() fails.
> > This would result in a "Segmentation fault" error, when the
> > VFIO_IOMMU_GET_INFO ioctl errors out.
> >
> > Since the caller has g_free already, drop the g_free in its
> > rollback routine and add a line of comments to highlight it.
> 
> There's basically two ways to fix this:
> 
> - return *info in any case, even on error
> - free *info on error, and make sure that the caller doesn't try to
>   access *info if the function returned !0
> 
> The problem with the first option is that the caller will access invalid
> information if it neglects to check the return code, and that might lead
> to not-that-obvious errors; in the second case, a broken caller would at
> least fail quickly with a segfault. The current code is easier to fix
> with the first option.
> 
> I think I'd prefer the second option; but obviously maintainer's choice.

The caller does check rc all the time. So I made a smaller fix
(the first option). Attaching the git-diff for the second one.

Alex, please let me know which one you prefer. Thanks!

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 51b2e05c76..74431411ab 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -2109,6 +2109,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     case VFIO_TYPE1_IOMMU:
     {
         struct vfio_iommu_type1_info *info;
+        uint64_t iova_pgsizes;
 
         /*
          * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
@@ -2119,20 +2120,22 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
          */
         ret = vfio_get_iommu_info(container, &info);
 
-        if (ret || !(info->flags & VFIO_IOMMU_INFO_PGSIZES)) {
+        if (info && (info->flags & VFIO_IOMMU_INFO_PGSIZES)) {
+            iova_pgsizes = info->iova_pgsizes;
+        } else {
             /* Assume 4k IOVA page size */
-            info->iova_pgsizes = 4096;
+            iova_pgsizes = 4096;
         }
-        vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
-        container->pgsizes = info->iova_pgsizes;
+        vfio_host_win_add(container, 0, (hwaddr)-1, iova_pgsizes);
+        container->pgsizes = iova_pgsizes;
 
         /* The default in the kernel ("dma_entry_limit") is 65535. */
         container->dma_max_mappings = 65535;
-        if (!ret) {
+        if (info) {
             vfio_get_info_dma_avail(info, &container->dma_max_mappings);
             vfio_get_iommu_info_migration(container, info);
+            g_free(info);
         }
-        g_free(info);
         break;
     }
     case VFIO_SPAPR_TCE_v2_IOMMU:
Alex Williamson Sept. 14, 2022, 6:10 p.m. UTC | #3
On Mon, 12 Sep 2022 13:51:30 -0700
Nicolin Chen <nicolinc@nvidia.com> wrote:

> On Mon, Sep 12, 2022 at 02:38:52PM +0200, Cornelia Huck wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Fri, Sep 09 2022, Nicolin Chen <nicolinc@nvidia.com> wrote:
> >   
> > > Its caller vfio_connect_container() assigns a default value
> > > to info->iova_pgsizes, even if vfio_get_iommu_info() fails.
> > > This would result in a "Segmentation fault" error, when the
> > > VFIO_IOMMU_GET_INFO ioctl errors out.
> > >
> > > Since the caller has g_free already, drop the g_free in its
> > > rollback routine and add a line of comments to highlight it.  
> > 
> > There's basically two ways to fix this:
> > 
> > - return *info in any case, even on error
> > - free *info on error, and make sure that the caller doesn't try to
> >   access *info if the function returned !0
> > 
> > The problem with the first option is that the caller will access invalid
> > information if it neglects to check the return code, and that might lead
> > to not-that-obvious errors; in the second case, a broken caller would at
> > least fail quickly with a segfault. The current code is easier to fix
> > with the first option.
> > 
> > I think I'd prefer the second option; but obviously maintainer's choice.  
> 
> The caller does check rc all the time. So I made a smaller fix
> (the first option). Attaching the git-diff for the second one.
> 
> Alex, please let me know which one you prefer. Thanks!
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 51b2e05c76..74431411ab 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
[snip]

I think we can do better than that, I don't think we need to maintain
the existing grouping, and that FIXME comment is outdated and has
drifted from the relevant line of code.  What about:

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ace9562a9ba1..8d8c54d59083 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -2111,29 +2111,31 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     {
         struct vfio_iommu_type1_info *info;
 
-        /*
-         * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
-         * IOVA whatsoever.  That's not actually true, but the current
-         * kernel interface doesn't tell us what it can map, and the
-         * existing Type1 IOMMUs generally support any IOVA we're
-         * going to actually try in practice.
+	/*
+         * Setup defaults for container pgsizes and dma_max_mappings if not
+         * provided by kernel below.
          */
-        ret = vfio_get_iommu_info(container, &info);
-
-        if (ret || !(info->flags & VFIO_IOMMU_INFO_PGSIZES)) {
-            /* Assume 4k IOVA page size */
-            info->iova_pgsizes = 4096;
-        }
-        vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
-        container->pgsizes = info->iova_pgsizes;
-
-        /* The default in the kernel ("dma_entry_limit") is 65535. */
+        container->pgsizes = 4096;
         container->dma_max_mappings = 65535;
+
+        ret = vfio_get_iommu_info(container, &info);
         if (!ret) {
+            if (info->flags & VFIO_IOMMU_INFO_PGSIZES) {
+                container->pgsizes = info->iova_pgsizes;
+            }
+
             vfio_get_info_dma_avail(info, &container->dma_max_mappings);
             vfio_get_iommu_info_migration(container, info);
+            g_free(info);
         }
-        g_free(info);
+
+        /*
+         * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
+         * information to get the actual window extent rather than assume
+         * a 64-bit IOVA address space.
+         */
+        vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
+
         break;
     }
     case VFIO_SPAPR_TCE_v2_IOMMU:
Alex Williamson Sept. 14, 2022, 6:30 p.m. UTC | #4
On Wed, 14 Sep 2022 12:10:29 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Mon, 12 Sep 2022 13:51:30 -0700
> Nicolin Chen <nicolinc@nvidia.com> wrote:
> 
> > On Mon, Sep 12, 2022 at 02:38:52PM +0200, Cornelia Huck wrote:  
> > > External email: Use caution opening links or attachments
> > > 
> > > 
> > > On Fri, Sep 09 2022, Nicolin Chen <nicolinc@nvidia.com> wrote:
> > >     
> > > > Its caller vfio_connect_container() assigns a default value
> > > > to info->iova_pgsizes, even if vfio_get_iommu_info() fails.
> > > > This would result in a "Segmentation fault" error, when the
> > > > VFIO_IOMMU_GET_INFO ioctl errors out.
> > > >
> > > > Since the caller has g_free already, drop the g_free in its
> > > > rollback routine and add a line of comments to highlight it.    
> > > 
> > > There's basically two ways to fix this:
> > > 
> > > - return *info in any case, even on error
> > > - free *info on error, and make sure that the caller doesn't try to
> > >   access *info if the function returned !0
> > > 
> > > The problem with the first option is that the caller will access invalid
> > > information if it neglects to check the return code, and that might lead
> > > to not-that-obvious errors; in the second case, a broken caller would at
> > > least fail quickly with a segfault. The current code is easier to fix
> > > with the first option.
> > > 
> > > I think I'd prefer the second option; but obviously maintainer's choice.    
> > 
> > The caller does check rc all the time. So I made a smaller fix
> > (the first option). Attaching the git-diff for the second one.
> > 
> > Alex, please let me know which one you prefer. Thanks!
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 51b2e05c76..74431411ab 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c  
> [snip]
> 
> I think we can do better than that, I don't think we need to maintain
> the existing grouping, and that FIXME comment is outdated and has
> drifted from the relevant line of code.  What about:
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ace9562a9ba1..8d8c54d59083 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -2111,29 +2111,31 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>      {
>          struct vfio_iommu_type1_info *info;
>  
> -        /*
> -         * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
> -         * IOVA whatsoever.  That's not actually true, but the current
> -         * kernel interface doesn't tell us what it can map, and the
> -         * existing Type1 IOMMUs generally support any IOVA we're
> -         * going to actually try in practice.
> +	/*
> +         * Setup defaults for container pgsizes and dma_max_mappings if not
> +         * provided by kernel below.
>           */
> -        ret = vfio_get_iommu_info(container, &info);
> -
> -        if (ret || !(info->flags & VFIO_IOMMU_INFO_PGSIZES)) {
> -            /* Assume 4k IOVA page size */
> -            info->iova_pgsizes = 4096;
> -        }
> -        vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
> -        container->pgsizes = info->iova_pgsizes;
> -
> -        /* The default in the kernel ("dma_entry_limit") is 65535. */
> +        container->pgsizes = 4096;
>          container->dma_max_mappings = 65535;
> +
> +        ret = vfio_get_iommu_info(container, &info);
>          if (!ret) {

Or better still:

            if (!vfio_get_iommu_info(container, &info)) {

> +            if (info->flags & VFIO_IOMMU_INFO_PGSIZES) {
> +                container->pgsizes = info->iova_pgsizes;
> +            }
> +
>              vfio_get_info_dma_avail(info, &container->dma_max_mappings);
>              vfio_get_iommu_info_migration(container, info);
> +            g_free(info);
>          }
> -        g_free(info);
> +
> +        /*
> +         * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
> +         * information to get the actual window extent rather than assume
> +         * a 64-bit IOVA address space.
> +         */
> +        vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
> +
>          break;
>      }
>      case VFIO_SPAPR_TCE_v2_IOMMU:
>
Nicolin Chen Sept. 14, 2022, 7:02 p.m. UTC | #5
Hi Alex,

On Wed, Sep 14, 2022 at 12:10:29PM -0600, Alex Williamson wrote:

> > > > Its caller vfio_connect_container() assigns a default value
> > > > to info->iova_pgsizes, even if vfio_get_iommu_info() fails.
> > > > This would result in a "Segmentation fault" error, when the
> > > > VFIO_IOMMU_GET_INFO ioctl errors out.
> > > >
> > > > Since the caller has g_free already, drop the g_free in its
> > > > rollback routine and add a line of comments to highlight it.
> > >
> > > There's basically two ways to fix this:
> > >
> > > - return *info in any case, even on error
> > > - free *info on error, and make sure that the caller doesn't try to
> > >   access *info if the function returned !0
> > >
> > > The problem with the first option is that the caller will access invalid
> > > information if it neglects to check the return code, and that might lead
> > > to not-that-obvious errors; in the second case, a broken caller would at
> > > least fail quickly with a segfault. The current code is easier to fix
> > > with the first option.
> > >
> > > I think I'd prefer the second option; but obviously maintainer's choice.
> >
> > The caller does check rc all the time. So I made a smaller fix
> > (the first option). Attaching the git-diff for the second one.
> >
> > Alex, please let me know which one you prefer. Thanks!

> I think we can do better than that, I don't think we need to maintain
> the existing grouping, and that FIXME comment is outdated and has
> drifted from the relevant line of code.  What about:

Your patch looks good and clean to me (some nits inline).

Would you please send and merge it, replacing mine?

> +       /*
> +         * Setup defaults for container pgsizes and dma_max_mappings if not
> +         * provided by kernel below.
>           */

Indentation is misaligned at the first line.

> +        container->pgsizes = 4096;

This might be a separate question/issue: I wonder if we should use
"sysconf(_SC_PAGE_SIZE)" here instead of 4096.

With a kernel using a larger page size, e.g. CONFIG_ARM64_64K_PAGES,
the IO page size is likely to be 64K too. If the ioctl fails, this
default 4K setup won't work.

Thanks!
Nic
Alex Williamson Sept. 14, 2022, 7:53 p.m. UTC | #6
On Wed, 14 Sep 2022 12:02:56 -0700
Nicolin Chen <nicolinc@nvidia.com> wrote:

> Hi Alex,
> 
> On Wed, Sep 14, 2022 at 12:10:29PM -0600, Alex Williamson wrote:
> 
> > > > > Its caller vfio_connect_container() assigns a default value
> > > > > to info->iova_pgsizes, even if vfio_get_iommu_info() fails.
> > > > > This would result in a "Segmentation fault" error, when the
> > > > > VFIO_IOMMU_GET_INFO ioctl errors out.
> > > > >
> > > > > Since the caller has g_free already, drop the g_free in its
> > > > > rollback routine and add a line of comments to highlight it.  
> > > >
> > > > There's basically two ways to fix this:
> > > >
> > > > - return *info in any case, even on error
> > > > - free *info on error, and make sure that the caller doesn't try to
> > > >   access *info if the function returned !0
> > > >
> > > > The problem with the first option is that the caller will access invalid
> > > > information if it neglects to check the return code, and that might lead
> > > > to not-that-obvious errors; in the second case, a broken caller would at
> > > > least fail quickly with a segfault. The current code is easier to fix
> > > > with the first option.
> > > >
> > > > I think I'd prefer the second option; but obviously maintainer's choice.  
> > >
> > > The caller does check rc all the time. So I made a smaller fix
> > > (the first option). Attaching the git-diff for the second one.
> > >
> > > Alex, please let me know which one you prefer. Thanks!  
> 
> > I think we can do better than that, I don't think we need to maintain
> > the existing grouping, and that FIXME comment is outdated and has
> > drifted from the relevant line of code.  What about:  
> 
> Your patch looks good and clean to me (some nits inline).
> 
> Would you please send and merge it, replacing mine?
> 
> > +       /*
> > +         * Setup defaults for container pgsizes and dma_max_mappings if not
> > +         * provided by kernel below.
> >           */  
> 
> Indentation is misaligned at the first line.

Oops, will run through checkpatch.

> 
> > +        container->pgsizes = 4096;  
> 
> This might be a separate question/issue: I wonder if we should use
> "sysconf(_SC_PAGE_SIZE)" here instead of 4096.
> 
> With a kernel using a larger page size, e.g. CONFIG_ARM64_64K_PAGES,
> the IO page size is likely to be 64K too. If the ioctl fails, this
> default 4K setup won't work.

Perhaps, but IIRC this solution came about because we originally forgot
to expose the IOMMU_INFO flag to indicate the pgsize field was valid.
At the time we only supported 4K systems, so it made sense to provide
this default, though it is indeed dated.

TBH, I don't really see why we should try to continue if the ioctl
itself fails, so maybe this should be:

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ace9562a9ba1..ad188b7649e6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -2111,29 +2111,31 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     {
         struct vfio_iommu_type1_info *info;
 
-        /*
-         * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
-         * IOVA whatsoever.  That's not actually true, but the current
-         * kernel interface doesn't tell us what it can map, and the
-         * existing Type1 IOMMUs generally support any IOVA we're
-         * going to actually try in practice.
-         */
         ret = vfio_get_iommu_info(container, &info);
+        if (ret) {
+            error_setg_errno(errp, -ret, "Failed to get VFIO IOMMU info");
+            goto enable_discards_exit:;
+        }
 
-        if (ret || !(info->flags & VFIO_IOMMU_INFO_PGSIZES)) {
-            /* Assume 4k IOVA page size */
-            info->iova_pgsizes = 4096;
+        if (info->flags & VFIO_IOMMU_INFO_PGSIZES) {
+            container->pgsizes = info->iova_pgsizes;
+        } else {
+            container->pgsizes = qemu_real_host_page_size();
         }
-        vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
-        container->pgsizes = info->iova_pgsizes;
 
-        /* The default in the kernel ("dma_entry_limit") is 65535. */
-        container->dma_max_mappings = 65535;
-        if (!ret) {
-            vfio_get_info_dma_avail(info, &container->dma_max_mappings);
-            vfio_get_iommu_info_migration(container, info);
+        if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
+            container->dma_max_mappings = 65535;
         }
+        vfio_get_iommu_info_migration(container, info);
         g_free(info);
+
+        /*
+         * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
+         * information to get the actual window extent rather than assume
+         * a 64-bit IOVA address space.
+         */
+        vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
+
         break;
     }
     case VFIO_SPAPR_TCE_v2_IOMMU:
Alex Williamson Sept. 14, 2022, 8:03 p.m. UTC | #7
On Wed, 14 Sep 2022 13:53:39 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Wed, 14 Sep 2022 12:02:56 -0700
> Nicolin Chen <nicolinc@nvidia.com> wrote:
> 
> > Hi Alex,
> > 
> > On Wed, Sep 14, 2022 at 12:10:29PM -0600, Alex Williamson wrote:
> >   
> > > > > > Its caller vfio_connect_container() assigns a default value
> > > > > > to info->iova_pgsizes, even if vfio_get_iommu_info() fails.
> > > > > > This would result in a "Segmentation fault" error, when the
> > > > > > VFIO_IOMMU_GET_INFO ioctl errors out.
> > > > > >
> > > > > > Since the caller has g_free already, drop the g_free in its
> > > > > > rollback routine and add a line of comments to highlight it.    
> > > > >
> > > > > There's basically two ways to fix this:
> > > > >
> > > > > - return *info in any case, even on error
> > > > > - free *info on error, and make sure that the caller doesn't try to
> > > > >   access *info if the function returned !0
> > > > >
> > > > > The problem with the first option is that the caller will access invalid
> > > > > information if it neglects to check the return code, and that might lead
> > > > > to not-that-obvious errors; in the second case, a broken caller would at
> > > > > least fail quickly with a segfault. The current code is easier to fix
> > > > > with the first option.
> > > > >
> > > > > I think I'd prefer the second option; but obviously maintainer's choice.    
> > > >
> > > > The caller does check rc all the time. So I made a smaller fix
> > > > (the first option). Attaching the git-diff for the second one.
> > > >
> > > > Alex, please let me know which one you prefer. Thanks!    
> >   
> > > I think we can do better than that, I don't think we need to maintain
> > > the existing grouping, and that FIXME comment is outdated and has
> > > drifted from the relevant line of code.  What about:    
> > 
> > Your patch looks good and clean to me (some nits inline).
> > 
> > Would you please send and merge it, replacing mine?
> >   
> > > +       /*
> > > +         * Setup defaults for container pgsizes and dma_max_mappings if not
> > > +         * provided by kernel below.
> > >           */    
> > 
> > Indentation is misaligned at the first line.  
> 
> Oops, will run through checkpatch.
> 
> >   
> > > +        container->pgsizes = 4096;    
> > 
> > This might be a separate question/issue: I wonder if we should use
> > "sysconf(_SC_PAGE_SIZE)" here instead of 4096.
> > 
> > With a kernel using a larger page size, e.g. CONFIG_ARM64_64K_PAGES,
> > the IO page size is likely to be 64K too. If the ioctl fails, this
> > default 4K setup won't work.  
> 
> Perhaps, but IIRC this solution came about because we originally forgot
> to expose the IOMMU_INFO flag to indicate the pgsize field was valid.
> At the time we only supported 4K systems, so it made sense to provide
> this default, though it is indeed dated.
> 
> TBH, I don't really see why we should try to continue if the ioctl
> itself fails, so maybe this should be:
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ace9562a9ba1..ad188b7649e6 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -2111,29 +2111,31 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>      {
>          struct vfio_iommu_type1_info *info;
>  
> -        /*
> -         * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
> -         * IOVA whatsoever.  That's not actually true, but the current
> -         * kernel interface doesn't tell us what it can map, and the
> -         * existing Type1 IOMMUs generally support any IOVA we're
> -         * going to actually try in practice.
> -         */
>          ret = vfio_get_iommu_info(container, &info);
> +        if (ret) {

Clearly untested,

               ret = -errno;

> +            error_setg_errno(errp, -ret, "Failed to get VFIO IOMMU info");
> +            goto enable_discards_exit:;
> +        }
>  
> -        if (ret || !(info->flags & VFIO_IOMMU_INFO_PGSIZES)) {
> -            /* Assume 4k IOVA page size */
> -            info->iova_pgsizes = 4096;
> +        if (info->flags & VFIO_IOMMU_INFO_PGSIZES) {
> +            container->pgsizes = info->iova_pgsizes;
> +        } else {
> +            container->pgsizes = qemu_real_host_page_size();
>          }
> -        vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes);
> -        container->pgsizes = info->iova_pgsizes;
>  
> -        /* The default in the kernel ("dma_entry_limit") is 65535. */
> -        container->dma_max_mappings = 65535;
> -        if (!ret) {
> -            vfio_get_info_dma_avail(info, &container->dma_max_mappings);
> -            vfio_get_iommu_info_migration(container, info);
> +        if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
> +            container->dma_max_mappings = 65535;
>          }
> +        vfio_get_iommu_info_migration(container, info);
>          g_free(info);
> +
> +        /*
> +         * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
> +         * information to get the actual window extent rather than assume
> +         * a 64-bit IOVA address space.
> +         */
> +        vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
> +
>          break;
>      }
>      case VFIO_SPAPR_TCE_v2_IOMMU:
Nicolin Chen Sept. 14, 2022, 8:16 p.m. UTC | #8
On Wed, Sep 14, 2022 at 02:03:25PM -0600, Alex Williamson wrote:

> > > > +        container->pgsizes = 4096;
> > >
> > > This might be a separate question/issue: I wonder if we should use
> > > "sysconf(_SC_PAGE_SIZE)" here instead of 4096.
> > >
> > > With a kernel using a larger page size, e.g. CONFIG_ARM64_64K_PAGES,
> > > the IO page size is likely to be 64K too. If the ioctl fails, this
> > > default 4K setup won't work.
> >
> > Perhaps, but IIRC this solution came about because we originally forgot
> > to expose the IOMMU_INFO flag to indicate the pgsize field was valid.
> > At the time we only supported 4K systems, so it made sense to provide
> > this default, though it is indeed dated.
> >
> > TBH, I don't really see why we should try to continue if the ioctl
> > itself fails, so maybe this should be:

OK. Makes sense to me.

> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index ace9562a9ba1..ad188b7649e6 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -2111,29 +2111,31 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
> >      {
> >          struct vfio_iommu_type1_info *info;
> >
> > -        /*
> > -         * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
> > -         * IOVA whatsoever.  That's not actually true, but the current
> > -         * kernel interface doesn't tell us what it can map, and the
> > -         * existing Type1 IOMMUs generally support any IOVA we're
> > -         * going to actually try in practice.
> > -         */
> >          ret = vfio_get_iommu_info(container, &info);
> > +        if (ret) {
> 
> Clearly untested,
> 
>                ret = -errno;
> 
> > +            error_setg_errno(errp, -ret, "Failed to get VFIO IOMMU info");
> > +            goto enable_discards_exit:;

There is a ":" in-between :)
diff mbox series

Patch

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ace9562a9b..51b2e05c76 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1940,6 +1940,7 @@  static int vfio_init_container(VFIOContainer *container, int group_fd,
     return 0;
 }
 
+/* The caller is responsible for g_free(*info) */
 static int vfio_get_iommu_info(VFIOContainer *container,
                                struct vfio_iommu_type1_info **info)
 {
@@ -1951,8 +1952,6 @@  again:
     (*info)->argsz = argsz;
 
     if (ioctl(container->fd, VFIO_IOMMU_GET_INFO, *info)) {
-        g_free(*info);
-        *info = NULL;
         return -errno;
     }