Message ID | 20211012195316.2413675-1-twilson@redhat.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev,python] Avoid pre-allocating column defaults | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
Hi Terry, Nit: I'd prefix the commit subject line with "python: idl: ". On 10/12/21 9:53 PM, Terry Wilson wrote: > Many python implementations pre-allocate space for multiple > objects in empty dicts and lists. Using a custom dict-like object > that only generates these objects when they are accessed can save > memory. > > On a fairly pathological case where the DB has 1000 networks each > with 100 ports, with only 'name' fields set, this saves around > 300MB of memory. > > One could argue that if values are not going to change from their > defaults, then users should not be monitoring those columns, but > it's also probably good to not waste memory even if user code is > sub-optimal. I agree, seems important to have this change. I'm definitely no Python expert but the change looks OK to me; I just have a couple of minor comments/questions below. > > Signed-off-by: Terry Wilson <twilson@redhat.com> > --- > python/ovs/db/custom_index.py | 1 - > python/ovs/db/idl.py | 39 +++++++++++++++++++++++++++++------ > 2 files changed, 33 insertions(+), 7 deletions(-) > > diff --git a/python/ovs/db/custom_index.py b/python/ovs/db/custom_index.py > index 587caf5e3..cf6c0b8e1 100644 > --- a/python/ovs/db/custom_index.py > +++ b/python/ovs/db/custom_index.py > @@ -18,7 +18,6 @@ OVSDB_INDEX_DESC = "DESC" > ColumnIndex = collections.namedtuple('ColumnIndex', > ['column', 'direction', 'key']) > > - This is unrelated and makes flake8 fail: 2021-11-05T10:23:25.4632044Z ../../python/ovs/db/custom_index.py:21:1: E302 expected 2 blank lines, found 1 2021-11-05T10:23:25.5069538Z Makefile:5831: recipe for target 'flake8-check' failed > class MultiColumnIndex(object): > def __init__(self, name): > self.name = name > diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py > index ecae5e143..de78bc79f 100644 > --- a/python/ovs/db/idl.py > +++ b/python/ovs/db/idl.py > @@ -45,6 +45,36 @@ Notice = collections.namedtuple('Notice', ('event', 'row', 'updates')) > Notice.__new__.__defaults__ = (None,) # default updates=None > > > +class ColumnDefaultDict(dict): > + """A column dictionary with on-demand generated default values > + > + This object acts like the Row.__data__ column dictionary, but without the > + neccessity of populating columnn default values. These values are generated Typos: "neccessity" and "columnn". > + on-demand and therefor only use memory once they are accessed. Shouldn't this be "therefore"? > + """ > + __slots__ = ('_table', ) > + > + def __init__(self, table, *args, **kwargs): > + self._table = table > + super().__init__(*args, **kwargs) Do you foresee a use case for passing custom columns that are not part of the table definition? > + > + def __missing__(self, column): > + column = self._table.columns[column] > + return ovs.db.data.Datum.default(column.type) > + > + def keys(self): > + return super().keys() | self._table.columns.keys() Based on the answer to the question above we might not need this union. Thanks, Dumitru
Thanks for the review! On Fri, Nov 5, 2021 at 5:43 AM Dumitru Ceara <dceara@redhat.com> wrote: > > Hi Terry, > > Nit: I'd prefix the commit subject line with "python: idl: ". > > On 10/12/21 9:53 PM, Terry Wilson wrote: > > Many python implementations pre-allocate space for multiple > > objects in empty dicts and lists. Using a custom dict-like object > > that only generates these objects when they are accessed can save > > memory. > > > > On a fairly pathological case where the DB has 1000 networks each > > with 100 ports, with only 'name' fields set, this saves around > > 300MB of memory. > > > > One could argue that if values are not going to change from their > > defaults, then users should not be monitoring those columns, but > > it's also probably good to not waste memory even if user code is > > sub-optimal. > > I agree, seems important to have this change. > > I'm definitely no Python expert but the change looks OK to me; I just > have a couple of minor comments/questions below. > > > > > Signed-off-by: Terry Wilson <twilson@redhat.com> > > --- > > python/ovs/db/custom_index.py | 1 - > > python/ovs/db/idl.py | 39 +++++++++++++++++++++++++++++------ > > 2 files changed, 33 insertions(+), 7 deletions(-) > > > > diff --git a/python/ovs/db/custom_index.py b/python/ovs/db/custom_index.py > > index 587caf5e3..cf6c0b8e1 100644 > > --- a/python/ovs/db/custom_index.py > > +++ b/python/ovs/db/custom_index.py > > @@ -18,7 +18,6 @@ OVSDB_INDEX_DESC = "DESC" > > ColumnIndex = collections.namedtuple('ColumnIndex', > > ['column', 'direction', 'key']) > > > > - > > This is unrelated and makes flake8 fail: Well that's embarrassing... > 2021-11-05T10:23:25.4632044Z ../../python/ovs/db/custom_index.py:21:1: > E302 expected 2 blank lines, found 1 > 2021-11-05T10:23:25.5069538Z Makefile:5831: recipe for target > 'flake8-check' failed > > > class MultiColumnIndex(object): > > def __init__(self, name): > > self.name = name > > diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py > > index ecae5e143..de78bc79f 100644 > > --- a/python/ovs/db/idl.py > > +++ b/python/ovs/db/idl.py > > @@ -45,6 +45,36 @@ Notice = collections.namedtuple('Notice', ('event', 'row', 'updates')) > > Notice.__new__.__defaults__ = (None,) # default updates=None > > > > > > +class ColumnDefaultDict(dict): > > + """A column dictionary with on-demand generated default values > > + > > + This object acts like the Row.__data__ column dictionary, but without the > > + neccessity of populating columnn default values. These values are generated > > Typos: "neccessity" and "columnn". > > > + on-demand and therefor only use memory once they are accessed. > > Shouldn't this be "therefore"? It should. > > + """ > > + __slots__ = ('_table', ) > > + > > + def __init__(self, table, *args, **kwargs): > > + self._table = table > > + super().__init__(*args, **kwargs) > > Do you foresee a use case for passing custom columns that are not part > of the table definition? Not really. It was more just that I was replacing a dict and defaulted to making it behave as dict-like as possible. > > + > > + def __missing__(self, column): > > + column = self._table.columns[column] > > + return ovs.db.data.Datum.default(column.type) > > + > > + def keys(self): > > + return super().keys() | self._table.columns.keys() > > Based on the answer to the question above we might not need this union. True. I don't see any good reason to make this behave as general-purpose as a normal dict. It'd be wrong to store non-column data there. > Thanks, > Dumitru >
diff --git a/python/ovs/db/custom_index.py b/python/ovs/db/custom_index.py index 587caf5e3..cf6c0b8e1 100644 --- a/python/ovs/db/custom_index.py +++ b/python/ovs/db/custom_index.py @@ -18,7 +18,6 @@ OVSDB_INDEX_DESC = "DESC" ColumnIndex = collections.namedtuple('ColumnIndex', ['column', 'direction', 'key']) - class MultiColumnIndex(object): def __init__(self, name): self.name = name diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py index ecae5e143..de78bc79f 100644 --- a/python/ovs/db/idl.py +++ b/python/ovs/db/idl.py @@ -45,6 +45,36 @@ Notice = collections.namedtuple('Notice', ('event', 'row', 'updates')) Notice.__new__.__defaults__ = (None,) # default updates=None +class ColumnDefaultDict(dict): + """A column dictionary with on-demand generated default values + + This object acts like the Row.__data__ column dictionary, but without the + neccessity of populating columnn default values. These values are generated + on-demand and therefor only use memory once they are accessed. + """ + __slots__ = ('_table', ) + + def __init__(self, table, *args, **kwargs): + self._table = table + super().__init__(*args, **kwargs) + + def __missing__(self, column): + column = self._table.columns[column] + return ovs.db.data.Datum.default(column.type) + + def keys(self): + return super().keys() | self._table.columns.keys() + + def values(self): + return iter(self[k] for k in self) + + def __iter__(self): + return iter(self.keys()) + + def __contains__(self, item): + return item in self.keys() + + class Idl(object): """Open vSwitch Database Interface Definition Language (OVSDB IDL). @@ -908,10 +938,7 @@ class Idl(object): return changed def __create_row(self, table, uuid): - data = {} - for column in table.columns.values(): - data[column.name] = ovs.db.data.Datum.default(column.type) - return Row(self, table, uuid, data) + return Row(self, table, uuid, ColumnDefaultDict(table)) def __error(self): self._session.force_reconnect() @@ -1249,7 +1276,7 @@ class Row(object): A transaction must be in progress.""" assert self._idl.txn assert self._changes is not None - if not self._data or column_name in self._changes: + if self._data is None or column_name in self._changes: return self._prereqs[column_name] = None @@ -1777,7 +1804,7 @@ class Transaction(object): # transaction only does writes of existing values, without making any # real changes, we will drop the whole transaction later in # ovsdb_idl_txn_commit().) - if (not column.alert and row._data and + if (not column.alert and row._data is not None and row._data.get(column.name) == datum): new_value = row._changes.get(column.name) if new_value is None or new_value == datum:
Many python implementations pre-allocate space for multiple objects in empty dicts and lists. Using a custom dict-like object that only generates these objects when they are accessed can save memory. On a fairly pathological case where the DB has 1000 networks each with 100 ports, with only 'name' fields set, this saves around 300MB of memory. One could argue that if values are not going to change from their defaults, then users should not be monitoring those columns, but it's also probably good to not waste memory even if user code is sub-optimal. Signed-off-by: Terry Wilson <twilson@redhat.com> --- python/ovs/db/custom_index.py | 1 - python/ovs/db/idl.py | 39 +++++++++++++++++++++++++++++------ 2 files changed, 33 insertions(+), 7 deletions(-)