diff mbox series

[ovs-dev,python] Avoid pre-allocating column defaults

Message ID 20211012195316.2413675-1-twilson@redhat.com
State Changes Requested
Headers show
Series [ovs-dev,python] Avoid pre-allocating column defaults | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed

Commit Message

Terry Wilson Oct. 12, 2021, 7:53 p.m. UTC
Many python implementations pre-allocate space for multiple
objects in empty dicts and lists. Using a custom dict-like object
that only generates these objects when they are accessed can save
memory.

On a fairly pathological case where the DB has 1000 networks each
with 100 ports, with only 'name' fields set, this saves around
300MB of memory.

One could argue that if values are not going to change from their
defaults, then users should not be monitoring those columns, but
it's also probably good to not waste memory even if user code is
sub-optimal.

Signed-off-by: Terry Wilson <twilson@redhat.com>
---
 python/ovs/db/custom_index.py |  1 -
 python/ovs/db/idl.py          | 39 +++++++++++++++++++++++++++++------
 2 files changed, 33 insertions(+), 7 deletions(-)

Comments

Dumitru Ceara Nov. 5, 2021, 10:42 a.m. UTC | #1
Hi Terry,

Nit: I'd prefix the commit subject line with "python: idl: ".

On 10/12/21 9:53 PM, Terry Wilson wrote:
> Many python implementations pre-allocate space for multiple
> objects in empty dicts and lists. Using a custom dict-like object
> that only generates these objects when they are accessed can save
> memory.
> 
> On a fairly pathological case where the DB has 1000 networks each
> with 100 ports, with only 'name' fields set, this saves around
> 300MB of memory.
> 
> One could argue that if values are not going to change from their
> defaults, then users should not be monitoring those columns, but
> it's also probably good to not waste memory even if user code is
> sub-optimal.

I agree, seems important to have this change.

I'm definitely no Python expert but the change looks OK to me; I just
have a couple of minor comments/questions below.

> 
> Signed-off-by: Terry Wilson <twilson@redhat.com>
> ---
>  python/ovs/db/custom_index.py |  1 -
>  python/ovs/db/idl.py          | 39 +++++++++++++++++++++++++++++------
>  2 files changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/python/ovs/db/custom_index.py b/python/ovs/db/custom_index.py
> index 587caf5e3..cf6c0b8e1 100644
> --- a/python/ovs/db/custom_index.py
> +++ b/python/ovs/db/custom_index.py
> @@ -18,7 +18,6 @@ OVSDB_INDEX_DESC = "DESC"
>  ColumnIndex = collections.namedtuple('ColumnIndex',
>                                       ['column', 'direction', 'key'])
>  
> -

This is unrelated and makes flake8 fail:

2021-11-05T10:23:25.4632044Z ../../python/ovs/db/custom_index.py:21:1:
E302 expected 2 blank lines, found 1
2021-11-05T10:23:25.5069538Z Makefile:5831: recipe for target
'flake8-check' failed

>  class MultiColumnIndex(object):
>      def __init__(self, name):
>          self.name = name
> diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
> index ecae5e143..de78bc79f 100644
> --- a/python/ovs/db/idl.py
> +++ b/python/ovs/db/idl.py
> @@ -45,6 +45,36 @@ Notice = collections.namedtuple('Notice', ('event', 'row', 'updates'))
>  Notice.__new__.__defaults__ = (None,)  # default updates=None
>  
>  
> +class ColumnDefaultDict(dict):
> +    """A column dictionary with on-demand generated default values
> +
> +    This object acts like the Row.__data__ column dictionary, but without the
> +    neccessity of populating columnn default values. These values are generated

Typos: "neccessity" and "columnn".

> +    on-demand and therefor only use memory once they are accessed.

Shouldn't this be "therefore"?

> +    """
> +    __slots__ = ('_table', )
> +
> +    def __init__(self, table, *args, **kwargs):
> +        self._table = table
> +        super().__init__(*args, **kwargs)

Do you foresee a use case for passing custom columns that are not part
of the table definition?

> +
> +    def __missing__(self, column):
> +        column = self._table.columns[column]
> +        return ovs.db.data.Datum.default(column.type)
> +
> +    def keys(self):
> +        return super().keys() | self._table.columns.keys()

Based on the answer to the question above we might not need this union.

Thanks,
Dumitru
Terry Wilson Nov. 5, 2021, 2 p.m. UTC | #2
Thanks for the review!

On Fri, Nov 5, 2021 at 5:43 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> Hi Terry,
>
> Nit: I'd prefix the commit subject line with "python: idl: ".
>
> On 10/12/21 9:53 PM, Terry Wilson wrote:
> > Many python implementations pre-allocate space for multiple
> > objects in empty dicts and lists. Using a custom dict-like object
> > that only generates these objects when they are accessed can save
> > memory.
> >
> > On a fairly pathological case where the DB has 1000 networks each
> > with 100 ports, with only 'name' fields set, this saves around
> > 300MB of memory.
> >
> > One could argue that if values are not going to change from their
> > defaults, then users should not be monitoring those columns, but
> > it's also probably good to not waste memory even if user code is
> > sub-optimal.
>
> I agree, seems important to have this change.
>
> I'm definitely no Python expert but the change looks OK to me; I just
> have a couple of minor comments/questions below.
>
> >
> > Signed-off-by: Terry Wilson <twilson@redhat.com>
> > ---
> >  python/ovs/db/custom_index.py |  1 -
> >  python/ovs/db/idl.py          | 39 +++++++++++++++++++++++++++++------
> >  2 files changed, 33 insertions(+), 7 deletions(-)
> >
> > diff --git a/python/ovs/db/custom_index.py b/python/ovs/db/custom_index.py
> > index 587caf5e3..cf6c0b8e1 100644
> > --- a/python/ovs/db/custom_index.py
> > +++ b/python/ovs/db/custom_index.py
> > @@ -18,7 +18,6 @@ OVSDB_INDEX_DESC = "DESC"
> >  ColumnIndex = collections.namedtuple('ColumnIndex',
> >                                       ['column', 'direction', 'key'])
> >
> > -
>
> This is unrelated and makes flake8 fail:

Well that's embarrassing...

> 2021-11-05T10:23:25.4632044Z ../../python/ovs/db/custom_index.py:21:1:
> E302 expected 2 blank lines, found 1
> 2021-11-05T10:23:25.5069538Z Makefile:5831: recipe for target
> 'flake8-check' failed
>
> >  class MultiColumnIndex(object):
> >      def __init__(self, name):
> >          self.name = name
> > diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
> > index ecae5e143..de78bc79f 100644
> > --- a/python/ovs/db/idl.py
> > +++ b/python/ovs/db/idl.py
> > @@ -45,6 +45,36 @@ Notice = collections.namedtuple('Notice', ('event', 'row', 'updates'))
> >  Notice.__new__.__defaults__ = (None,)  # default updates=None
> >
> >
> > +class ColumnDefaultDict(dict):
> > +    """A column dictionary with on-demand generated default values
> > +
> > +    This object acts like the Row.__data__ column dictionary, but without the
> > +    neccessity of populating columnn default values. These values are generated
>
> Typos: "neccessity" and "columnn".
>
> > +    on-demand and therefor only use memory once they are accessed.
>
> Shouldn't this be "therefore"?

It should.

> > +    """
> > +    __slots__ = ('_table', )
> > +
> > +    def __init__(self, table, *args, **kwargs):
> > +        self._table = table
> > +        super().__init__(*args, **kwargs)
>
> Do you foresee a use case for passing custom columns that are not part
> of the table definition?

Not really. It was more just that I was replacing a dict and defaulted
to making it behave as dict-like as possible.

> > +
> > +    def __missing__(self, column):
> > +        column = self._table.columns[column]
> > +        return ovs.db.data.Datum.default(column.type)
> > +
> > +    def keys(self):
> > +        return super().keys() | self._table.columns.keys()
>
> Based on the answer to the question above we might not need this union.

True. I don't see any good reason to make this behave as
general-purpose as a normal dict. It'd be wrong to store non-column
data there.

> Thanks,
> Dumitru
>
diff mbox series

Patch

diff --git a/python/ovs/db/custom_index.py b/python/ovs/db/custom_index.py
index 587caf5e3..cf6c0b8e1 100644
--- a/python/ovs/db/custom_index.py
+++ b/python/ovs/db/custom_index.py
@@ -18,7 +18,6 @@  OVSDB_INDEX_DESC = "DESC"
 ColumnIndex = collections.namedtuple('ColumnIndex',
                                      ['column', 'direction', 'key'])
 
-
 class MultiColumnIndex(object):
     def __init__(self, name):
         self.name = name
diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
index ecae5e143..de78bc79f 100644
--- a/python/ovs/db/idl.py
+++ b/python/ovs/db/idl.py
@@ -45,6 +45,36 @@  Notice = collections.namedtuple('Notice', ('event', 'row', 'updates'))
 Notice.__new__.__defaults__ = (None,)  # default updates=None
 
 
+class ColumnDefaultDict(dict):
+    """A column dictionary with on-demand generated default values
+
+    This object acts like the Row.__data__ column dictionary, but without the
+    neccessity of populating columnn default values. These values are generated
+    on-demand and therefor only use memory once they are accessed.
+    """
+    __slots__ = ('_table', )
+
+    def __init__(self, table, *args, **kwargs):
+        self._table = table
+        super().__init__(*args, **kwargs)
+
+    def __missing__(self, column):
+        column = self._table.columns[column]
+        return ovs.db.data.Datum.default(column.type)
+
+    def keys(self):
+        return super().keys() | self._table.columns.keys()
+
+    def values(self):
+        return iter(self[k] for k in self)
+
+    def __iter__(self):
+        return iter(self.keys())
+
+    def __contains__(self, item):
+        return item in self.keys()
+
+
 class Idl(object):
     """Open vSwitch Database Interface Definition Language (OVSDB IDL).
 
@@ -908,10 +938,7 @@  class Idl(object):
         return changed
 
     def __create_row(self, table, uuid):
-        data = {}
-        for column in table.columns.values():
-            data[column.name] = ovs.db.data.Datum.default(column.type)
-        return Row(self, table, uuid, data)
+        return Row(self, table, uuid, ColumnDefaultDict(table))
 
     def __error(self):
         self._session.force_reconnect()
@@ -1249,7 +1276,7 @@  class Row(object):
         A transaction must be in progress."""
         assert self._idl.txn
         assert self._changes is not None
-        if not self._data or column_name in self._changes:
+        if self._data is None or column_name in self._changes:
             return
 
         self._prereqs[column_name] = None
@@ -1777,7 +1804,7 @@  class Transaction(object):
         # transaction only does writes of existing values, without making any
         # real changes, we will drop the whole transaction later in
         # ovsdb_idl_txn_commit().)
-        if (not column.alert and row._data and
+        if (not column.alert and row._data is not None and
                 row._data.get(column.name) == datum):
             new_value = row._changes.get(column.name)
             if new_value is None or new_value == datum: