diff mbox

[ovs-dev,v5,3/4] ovsdb-idl: idl compound indexes implementation

Message ID 20170624210152.8491-4-lrichard@redhat.com
State Superseded
Headers show

Commit Message

Lance Richardson June 24, 2017, 9:01 p.m. UTC
This patch adds support for the creation of multicolumn indexes
in the C IDL to enable for efficient search and retrieval of database
rows by key.

Signed-off-by: Esteban Rodriguez Betancourt <estebarb@hpe.com>
Co-authored-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
---
  v5: - Coding style fixes (checkpatch.py)
      - Fixed memory leak (missing ovsdb_datum_destroy() in
        ovsdb_idl_index_destroy_row__()).
      - Some polishing of comment and log message text.

 lib/ovsdb-idl-provider.h |  29 +++
 lib/ovsdb-idl.c          | 461 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/ovsdb-idl.h          |  59 ++++++
 3 files changed, 549 insertions(+)

Comments

Ben Pfaff July 11, 2017, 9:05 p.m. UTC | #1
On Sat, Jun 24, 2017 at 05:01:51PM -0400, Lance Richardson wrote:
> This patch adds support for the creation of multicolumn indexes
> in the C IDL to enable for efficient search and retrieval of database
> rows by key.
> 
> Signed-off-by: Esteban Rodriguez Betancourt <estebarb@hpe.com>
> Co-authored-by: Lance Richardson <lrichard@redhat.com>
> Signed-off-by: Lance Richardson <lrichard@redhat.com>
> ---
>   v5: - Coding style fixes (checkpatch.py)
>       - Fixed memory leak (missing ovsdb_datum_destroy() in
>         ovsdb_idl_index_destroy_row__()).
>       - Some polishing of comment and log message text.

Thanks for reviving this series.

I don't understand ovsdb_idl_index_read().  It is almost the same as
ovsdb_idl_read().  It looks like ovsdb_idl_read() could be implemented
as a wrapper around it, but I'm also not sure why ovsdb_idl_read() can't
be used directly.  Also, I don't understand its comment about "index_set
functions", since there are no functions with index_set in their names.

ovsdb_idl_index_write_() is mostly copy-paste of a part of
ovsdb_idl_txn_write__(), so to avoid code duplication it would be best
to factor that code out of ovsdb_idl_txn_write__() and call it from both
places.

I don't understand the behavior of ovsdb_idl_index_find() and
ovsdb_idl_index_forward_to() for a null 'value' argument.  Is there some
idiomatic usage where the null and nonnull behaviors work out nicely?

The row_sync behavior is confusing.  I remember it slightly from the
previous iteration but I don't remember being convinced it was the best
way to do things.

Thanks,

Ben.
Lance Richardson July 11, 2017, 9:49 p.m. UTC | #2
> From: "Ben Pfaff" <blp@ovn.org>
> To: "Lance Richardson" <lrichard@redhat.com>
> Cc: dev@openvswitch.org, estebarb@hpe.com, "javier albornoz" <javier.albornoz@hpe.com>, "jorge sauma"
> <jorge.sauma@hpe.com>, "arnoldo lutz guevara" <arnoldo.lutz.guevara@hpe.com>
> Sent: Tuesday, 11 July, 2017 5:05:32 PM
> Subject: Re: [ovs-dev] [PATCH v5 3/4] ovsdb-idl: idl compound indexes implementation
> 
> On Sat, Jun 24, 2017 at 05:01:51PM -0400, Lance Richardson wrote:
> > This patch adds support for the creation of multicolumn indexes
> > in the C IDL to enable for efficient search and retrieval of database
> > rows by key.
> > 
> > Signed-off-by: Esteban Rodriguez Betancourt <estebarb@hpe.com>
> > Co-authored-by: Lance Richardson <lrichard@redhat.com>
> > Signed-off-by: Lance Richardson <lrichard@redhat.com>
> > ---
> >   v5: - Coding style fixes (checkpatch.py)
> >       - Fixed memory leak (missing ovsdb_datum_destroy() in
> >         ovsdb_idl_index_destroy_row__()).
> >       - Some polishing of comment and log message text.
> 
> Thanks for reviving this series.
> 
> I don't understand ovsdb_idl_index_read().  It is almost the same as
> ovsdb_idl_read().  It looks like ovsdb_idl_read() could be implemented
> as a wrapper around it, but I'm also not sure why ovsdb_idl_read() can't
> be used directly.  Also, I don't understand its comment about "index_set
> functions", since there are no functions with index_set in their names.
> 

I would guess that the original reason had to do with the fact that
ovsdb_idl_index_read() needs to deal with "real" rows in the db replica as
well as external row instances created by ovsdb_idl_index_init_row() to
contain the key values being searched for. Looking at the code a little
more closely, I see that ovsdb_idl_index_init_row() sets row->table, so
the key row structure would not be considered "synthetic" according
to ovsdb_idl_row_is_synthetic().

So: I think you are correct, we should be able to use ovsdb_idl_read() and
eliminate ovsdb_idl_index_read().

> ovsdb_idl_index_write_() is mostly copy-paste of a part of
> ovsdb_idl_txn_write__(), so to avoid code duplication it would be best
> to factor that code out of ovsdb_idl_txn_write__() and call it from both
> places.
> 

Hmm, I just noticed that this is v5, v6 seems to be missing in patchwork.
The v6 version of this function is a bit more streamlined, and avoids
calling ovsdb_datum_destroy() so that a row structure containing a key
with a string type can avoid having to xstrdup() key strings:

void
ovsdb_idl_index_write_(struct ovsdb_idl_row *const_row,
                       const struct ovsdb_idl_column *column,
                       struct ovsdb_datum *datum,
                       const struct ovsdb_idl_table_class *class)
{
    struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, const_row);
    size_t column_idx = column - class->columns;

    if (bitmap_is_set(row->written, column_idx)) {
        free(row->new[column_idx].values);
        free(row->new[column_idx].keys);
    } else {
        bitmap_set1(row->written, column_idx);
     }
    row->new[column_idx] = *datum;
    (column->unparse)(row);
    (column->parse)(row, &row->new[column_idx]);
}

So there's a little less commonality between the two. 

> I don't understand the behavior of ovsdb_idl_index_find() and
> ovsdb_idl_index_forward_to() for a null 'value' argument.  Is there some
> idiomatic usage where the null and nonnull behaviors work out nicely?
> 

I don't know what the intended use was, but there doesn't currently seem
to be a use of the null 'value' case (generated code always passes
&data->header_ for this parameter, which could be null if 'data' is null,
but that seems pretty fragile if intentional).

I'll see if the special-case handling of null values can be eliminated.

> The row_sync behavior is confusing.  I remember it slightly from the
> previous iteration but I don't remember being convinced it was the best
> way to do things.
> 

Hmm, that does seem strange/confusing. I will have to spend more time
studying that bit of conde.

> Thanks,
> 
> Ben.
>
Lance Richardson July 13, 2017, 8:29 p.m. UTC | #3
----- Original Message -----
> From: "Ben Pfaff" <blp@ovn.org>
> To: "Lance Richardson" <lrichard@redhat.com>
> Cc: dev@openvswitch.org, estebarb@hpe.com, "javier albornoz" <javier.albornoz@hpe.com>, "jorge sauma"
> <jorge.sauma@hpe.com>, "arnoldo lutz guevara" <arnoldo.lutz.guevara@hpe.com>
> Sent: Tuesday, 11 July, 2017 5:05:32 PM
> Subject: Re: [ovs-dev] [PATCH v5 3/4] ovsdb-idl: idl compound indexes implementation
> 
> On Sat, Jun 24, 2017 at 05:01:51PM -0400, Lance Richardson wrote:
> > This patch adds support for the creation of multicolumn indexes
> > in the C IDL to enable for efficient search and retrieval of database
> > rows by key.
> > 
> > Signed-off-by: Esteban Rodriguez Betancourt <estebarb@hpe.com>
> > Co-authored-by: Lance Richardson <lrichard@redhat.com>
> > Signed-off-by: Lance Richardson <lrichard@redhat.com>
> > ---
> >   v5: - Coding style fixes (checkpatch.py)
> >       - Fixed memory leak (missing ovsdb_datum_destroy() in
> >         ovsdb_idl_index_destroy_row__()).
> >       - Some polishing of comment and log message text.
> 
> Thanks for reviving this series.
> 
> I don't understand ovsdb_idl_index_read().  It is almost the same as
> ovsdb_idl_read().  It looks like ovsdb_idl_read() could be implemented
> as a wrapper around it, but I'm also not sure why ovsdb_idl_read() can't
> be used directly.  Also, I don't understand its comment about "index_set
> functions", since there are no functions with index_set in their names.

Hi Ben,

I neglected to respond to the comment about "index_set functions" in my
previous response, these are automatically generated for the IDL by the
next patch. I had deleted that comment in v6 of this series (which is
in the ml archive but not patchwork for some reason), it didn't seem
useful to me.

> 
> ovsdb_idl_index_write_() is mostly copy-paste of a part of
> ovsdb_idl_txn_write__(), so to avoid code duplication it would be best
> to factor that code out of ovsdb_idl_txn_write__() and call it from both
> places.
> 
> I don't understand the behavior of ovsdb_idl_index_find() and
> ovsdb_idl_index_forward_to() for a null 'value' argument.  Is there some
> idiomatic usage where the null and nonnull behaviors work out nicely?
> 
> The row_sync behavior is confusing.  I remember it slightly from the
> previous iteration but I don't remember being convinced it was the best
> way to do things.
> 

The "row_sync" stuff seems to be a matter of using additional keys (row
uuid values first, then memory addresses) when inserting or deleting a
row in order to handle duplicate keys.

This still seems a bit strange, but after thinking about it a bit I believe
it is a reasonable approach; these extra keys aren't used when searching,
so we will always get the first entry in the list when there are multiple
rows with the same key, and when inserting/deleting we still get O(log N)
performance when there are many entries with identical keys by searching
on the UUID value.

But "row_sync" isn't at all useful in figuring out the above, nor are
any of the comments in this area.

I also wonder if the comparison on memory address is needed... is there
any real possibility that two rows in a table will have the same UUID?

Do you think reworking the "row_sync" name and adding better comments
would be an acceptable way forward here?

Thanks,

   Lance


> Thanks,
> 
> Ben.
>
Rodriguez Betancourt, Esteban July 13, 2017, 9:22 p.m. UTC | #4
Hello,
The UUID comparison was added to guarantee that always rows with the same keys are 
sorted in the same way (between executions of a command, for example). Before that we
 used the pointer comparison (which gives us some randomness in sorting) but we kept it
to be really really sure that we are deleting/inserting the correct row.

The row sync effectively was intended for finding the correct row to delete when there are duplicated "index keys" and there are
deletions/updates/inserts (note that the updates are really handled as a delete-and-reinsert).

We copied ovsdb_idl_index_read from ovsdb_idl_read because we were concerned of what would happen if somebody
access the indexes when a transaction is being made. If the index read the new values instead of the old one then
the rows could be lost/wrongly sorted, etc. It is the same story with ovsdb_idl_index_write_() and index_set: we 
wanted the indexes to behave correctly during a transaction with new changes.

ovsdb_idl_index_{find,forward_to} are used inside the autogenerated functions for iterating over the indexes.
We thought that it would be nice to be able to say things like:

OVSREC_EXAMPLE_FOR_EACH_RANGE(row, cursor, start, end) /* [start, end] */
OVSREC_EXAMPLE_FOR_EACH_RANGE(row, cursor, NULL, end) /* [0, end] */
OVSREC_EXAMPLE_FOR_EACH_RANGE(row, cursor, start, NULL) /* [start, "+infty"] */
OVSREC_EXAMPLE_FOR_EACH_RANGE(row, cursor, NULL, NULL) /* everything */

Regards,
Esteban

-----Original Message-----
From: Lance Richardson [mailto:lrichard@redhat.com] 
Sent: jueves, 13 de julio de 2017 14:29
To: Ben Pfaff <blp@ovn.org>
Cc: dev@openvswitch.org; Rodriguez Betancourt, Esteban <estebarb@hpe.com>; Albornoz, Javier <javier.albornoz@hpe.com>; jorge sauma <jorge.sauma@hpe.com>; Lutz, Arnoldo <arnoldo.lutz.guevara@hpe.com>
Subject: Re: [ovs-dev] [PATCH v5 3/4] ovsdb-idl: idl compound indexes implementation



----- Original Message -----
> From: "Ben Pfaff" <blp@ovn.org>
> To: "Lance Richardson" <lrichard@redhat.com>
> Cc: dev@openvswitch.org, estebarb@hpe.com, "javier albornoz" <javier.albornoz@hpe.com>, "jorge sauma"
> <jorge.sauma@hpe.com>, "arnoldo lutz guevara" 
> <arnoldo.lutz.guevara@hpe.com>
> Sent: Tuesday, 11 July, 2017 5:05:32 PM
> Subject: Re: [ovs-dev] [PATCH v5 3/4] ovsdb-idl: idl compound indexes 
> implementation
> 
> On Sat, Jun 24, 2017 at 05:01:51PM -0400, Lance Richardson wrote:
> > This patch adds support for the creation of multicolumn indexes in 
> > the C IDL to enable for efficient search and retrieval of database 
> > rows by key.
> > 
> > Signed-off-by: Esteban Rodriguez Betancourt <estebarb@hpe.com>
> > Co-authored-by: Lance Richardson <lrichard@redhat.com>
> > Signed-off-by: Lance Richardson <lrichard@redhat.com>
> > ---
> >   v5: - Coding style fixes (checkpatch.py)
> >       - Fixed memory leak (missing ovsdb_datum_destroy() in
> >         ovsdb_idl_index_destroy_row__()).
> >       - Some polishing of comment and log message text.
> 
> Thanks for reviving this series.
> 
> I don't understand ovsdb_idl_index_read().  It is almost the same as 
> ovsdb_idl_read().  It looks like ovsdb_idl_read() could be implemented 
> as a wrapper around it, but I'm also not sure why ovsdb_idl_read() 
> can't be used directly.  Also, I don't understand its comment about 
> "index_set functions", since there are no functions with index_set in their names.

Hi Ben,

I neglected to respond to the comment about "index_set functions" in my previous response, these are automatically generated for the IDL by the next patch. I had deleted that comment in v6 of this series (which is in the ml archive but not patchwork for some reason), it didn't seem useful to me.

> 
> ovsdb_idl_index_write_() is mostly copy-paste of a part of 
> ovsdb_idl_txn_write__(), so to avoid code duplication it would be best 
> to factor that code out of ovsdb_idl_txn_write__() and call it from 
> both places.
> 
> I don't understand the behavior of ovsdb_idl_index_find() and
> ovsdb_idl_index_forward_to() for a null 'value' argument.  Is there 
> some idiomatic usage where the null and nonnull behaviors work out nicely?
> 
> The row_sync behavior is confusing.  I remember it slightly from the 
> previous iteration but I don't remember being convinced it was the 
> best way to do things.
> 

The "row_sync" stuff seems to be a matter of using additional keys (row uuid values first, then memory addresses) when inserting or deleting a row in order to handle duplicate keys.

This still seems a bit strange, but after thinking about it a bit I believe it is a reasonable approach; these extra keys aren't used when searching, so we will always get the first entry in the list when there are multiple rows with the same key, and when inserting/deleting we still get O(log N) performance when there are many entries with identical keys by searching on the UUID value.

But "row_sync" isn't at all useful in figuring out the above, nor are any of the comments in this area.

I also wonder if the comparison on memory address is needed... is there any real possibility that two rows in a table will have the same UUID?

Do you think reworking the "row_sync" name and adding better comments would be an acceptable way forward here?

Thanks,

   Lance


> Thanks,
> 
> Ben.
>
Lance Richardson July 13, 2017, 10:47 p.m. UTC | #5
> From: "Rodriguez Betancourt, Esteban" <estebarb@hpe.com>
> To: "Lance Richardson" <lrichard@redhat.com>, "Ben Pfaff" <blp@ovn.org>
> Cc: dev@openvswitch.org, "Javier Albornoz" <javier.albornoz@hpe.com>, "Arnoldo Lutz" <arnoldo.lutz.guevara@hpe.com>
> Sent: Thursday, 13 July, 2017 5:22:42 PM
> Subject: RE: [ovs-dev] [PATCH v5 3/4] ovsdb-idl: idl compound indexes implementation
> 
> Hello,
> The UUID comparison was added to guarantee that always rows with the same
> keys are
> sorted in the same way (between executions of a command, for example). Before
> that we
>  used the pointer comparison (which gives us some randomness in sorting) but
>  we kept it
> to be really really sure that we are deleting/inserting the correct row.
> 
> The row sync effectively was intended for finding the correct row to delete
> when there are duplicated "index keys" and there are
> deletions/updates/inserts (note that the updates are really handled as a
> delete-and-reinsert).

Hi Esteban,

Thanks, I think all of that makes sense. I do wonder, though, since UUIDs
should be unique whether it would make just as much to assert that the addresses
are equal if the UUIDs are equal.

> 
> We copied ovsdb_idl_index_read from ovsdb_idl_read because we were concerned
> of what would happen if somebody
> access the indexes when a transaction is being made. If the index read the
> new values instead of the old one then
> the rows could be lost/wrongly sorted, etc. It is the same story with
> ovsdb_idl_index_write_() and index_set: we
> wanted the indexes to behave correctly during a transaction with new changes.
> 

Comparing the v4 version of ovsdb_idl_index_read() against the current
ovsdb_idl_read(), the only difference between the two that I can find is
that ovsdb_idl_index_read() takes the table class as a function parameter
while ovsdb_idl_read() expects row->table to be non-NULL and takes the
table class from row->table->class.  Since ovsdb_idl_index_init_row()
initializes row->table appropriately, ovsdb_idl_read() should produce exactly
the same result as ovsdb_idl_index_read(), whether a transaction on that
row is in progress or not. So it seems to me we should be able to get rid
of ovsdb_idl_index_read().

> ovsdb_idl_index_{find,forward_to} are used inside the autogenerated functions
> for iterating over the indexes.
> We thought that it would be nice to be able to say things like:
> 
> OVSREC_EXAMPLE_FOR_EACH_RANGE(row, cursor, start, end) /* [start, end] */
> OVSREC_EXAMPLE_FOR_EACH_RANGE(row, cursor, NULL, end) /* [0, end] */
> OVSREC_EXAMPLE_FOR_EACH_RANGE(row, cursor, start, NULL) /* [start, "+infty"]
> */
> OVSREC_EXAMPLE_FOR_EACH_RANGE(row, cursor, NULL, NULL) /* everything */
> 
> Regards,
> Esteban

Thanks, that makes sense.  I think it would be good to add tests for these
cases.

Thanks again!

   Lance

> 
> -----Original Message-----
> From: Lance Richardson [mailto:lrichard@redhat.com]
> Sent: jueves, 13 de julio de 2017 14:29
> To: Ben Pfaff <blp@ovn.org>
> Cc: dev@openvswitch.org; Rodriguez Betancourt, Esteban <estebarb@hpe.com>;
> Albornoz, Javier <javier.albornoz@hpe.com>; jorge sauma
> <jorge.sauma@hpe.com>; Lutz, Arnoldo <arnoldo.lutz.guevara@hpe.com>
> Subject: Re: [ovs-dev] [PATCH v5 3/4] ovsdb-idl: idl compound indexes
> implementation
> 
> 
> 
> ----- Original Message -----
> > From: "Ben Pfaff" <blp@ovn.org>
> > To: "Lance Richardson" <lrichard@redhat.com>
> > Cc: dev@openvswitch.org, estebarb@hpe.com, "javier albornoz"
> > <javier.albornoz@hpe.com>, "jorge sauma"
> > <jorge.sauma@hpe.com>, "arnoldo lutz guevara"
> > <arnoldo.lutz.guevara@hpe.com>
> > Sent: Tuesday, 11 July, 2017 5:05:32 PM
> > Subject: Re: [ovs-dev] [PATCH v5 3/4] ovsdb-idl: idl compound indexes
> > implementation
> > 
> > On Sat, Jun 24, 2017 at 05:01:51PM -0400, Lance Richardson wrote:
> > > This patch adds support for the creation of multicolumn indexes in
> > > the C IDL to enable for efficient search and retrieval of database
> > > rows by key.
> > > 
> > > Signed-off-by: Esteban Rodriguez Betancourt <estebarb@hpe.com>
> > > Co-authored-by: Lance Richardson <lrichard@redhat.com>
> > > Signed-off-by: Lance Richardson <lrichard@redhat.com>
> > > ---
> > >   v5: - Coding style fixes (checkpatch.py)
> > >       - Fixed memory leak (missing ovsdb_datum_destroy() in
> > >         ovsdb_idl_index_destroy_row__()).
> > >       - Some polishing of comment and log message text.
> > 
> > Thanks for reviving this series.
> > 
> > I don't understand ovsdb_idl_index_read().  It is almost the same as
> > ovsdb_idl_read().  It looks like ovsdb_idl_read() could be implemented
> > as a wrapper around it, but I'm also not sure why ovsdb_idl_read()
> > can't be used directly.  Also, I don't understand its comment about
> > "index_set functions", since there are no functions with index_set in their
> > names.
> 
> Hi Ben,
> 
> I neglected to respond to the comment about "index_set functions" in my
> previous response, these are automatically generated for the IDL by the next
> patch. I had deleted that comment in v6 of this series (which is in the ml
> archive but not patchwork for some reason), it didn't seem useful to me.
> 
> > 
> > ovsdb_idl_index_write_() is mostly copy-paste of a part of
> > ovsdb_idl_txn_write__(), so to avoid code duplication it would be best
> > to factor that code out of ovsdb_idl_txn_write__() and call it from
> > both places.
> > 
> > I don't understand the behavior of ovsdb_idl_index_find() and
> > ovsdb_idl_index_forward_to() for a null 'value' argument.  Is there
> > some idiomatic usage where the null and nonnull behaviors work out nicely?
> > 
> > The row_sync behavior is confusing.  I remember it slightly from the
> > previous iteration but I don't remember being convinced it was the
> > best way to do things.
> > 
> 
> The "row_sync" stuff seems to be a matter of using additional keys (row uuid
> values first, then memory addresses) when inserting or deleting a row in
> order to handle duplicate keys.
> 
> This still seems a bit strange, but after thinking about it a bit I believe
> it is a reasonable approach; these extra keys aren't used when searching, so
> we will always get the first entry in the list when there are multiple rows
> with the same key, and when inserting/deleting we still get O(log N)
> performance when there are many entries with identical keys by searching on
> the UUID value.
> 
> But "row_sync" isn't at all useful in figuring out the above, nor are any of
> the comments in this area.
> 
> I also wonder if the comparison on memory address is needed... is there any
> real possibility that two rows in a table will have the same UUID?
> 
> Do you think reworking the "row_sync" name and adding better comments would
> be an acceptable way forward here?
> 
> Thanks,
> 
>    Lance
> 
> 
> > Thanks,
> > 
> > Ben.
> > 
>
Rodriguez Betancourt, Esteban July 14, 2017, 9:27 p.m. UTC | #6
I think that is ok to remove that last pointer check in the generic comparison function. UUID is more than enough.
Regards,
Esteban

> -----Original Message-----
> From: Lance Richardson [mailto:lrichard@redhat.com]
> Sent: jueves, 13 de julio de 2017 16:48
> To: Rodriguez Betancourt, Esteban <estebarb@hpe.com>
> Cc: Ben Pfaff <blp@ovn.org>; dev@openvswitch.org; Albornoz, Javier
> <javier.albornoz@hpe.com>; Lutz, Arnoldo
> <arnoldo.lutz.guevara@hpe.com>
> Subject: Re: [ovs-dev] [PATCH v5 3/4] ovsdb-idl: idl compound indexes
> implementation
> 
> > From: "Rodriguez Betancourt, Esteban" <estebarb@hpe.com>
> > To: "Lance Richardson" <lrichard@redhat.com>, "Ben Pfaff"
> > <blp@ovn.org>
> > Cc: dev@openvswitch.org, "Javier Albornoz" <javier.albornoz@hpe.com>,
> > "Arnoldo Lutz" <arnoldo.lutz.guevara@hpe.com>
> > Sent: Thursday, 13 July, 2017 5:22:42 PM
> > Subject: RE: [ovs-dev] [PATCH v5 3/4] ovsdb-idl: idl compound indexes
> > implementation
> >
> > Hello,
> > The UUID comparison was added to guarantee that always rows with the
> > same keys are sorted in the same way (between executions of a
> command,
> > for example). Before that we  used the pointer comparison (which gives
> > us some randomness in sorting) but  we kept it to be really really
> > sure that we are deleting/inserting the correct row.
> >
> > The row sync effectively was intended for finding the correct row to
> > delete when there are duplicated "index keys" and there are
> > deletions/updates/inserts (note that the updates are really handled as
> > a delete-and-reinsert).
> 
> Hi Esteban,
> 
> Thanks, I think all of that makes sense. I do wonder, though, since UUIDs
> should be unique whether it would make just as much to assert that the
> addresses are equal if the UUIDs are equal.
> 
> >
> > We copied ovsdb_idl_index_read from ovsdb_idl_read because we were
> > concerned of what would happen if somebody access the indexes when a
> > transaction is being made. If the index read the new values instead of
> > the old one then the rows could be lost/wrongly sorted, etc. It is the
> > same story with
> > ovsdb_idl_index_write_() and index_set: we wanted the indexes to
> > behave correctly during a transaction with new changes.
> >
> 
> Comparing the v4 version of ovsdb_idl_index_read() against the current
> ovsdb_idl_read(), the only difference between the two that I can find is that
> ovsdb_idl_index_read() takes the table class as a function parameter while
> ovsdb_idl_read() expects row->table to be non-NULL and takes the table
> class from row->table->class.  Since ovsdb_idl_index_init_row() initializes
> row->table appropriately, ovsdb_idl_read() should produce exactly the same
> result as ovsdb_idl_index_read(), whether a transaction on that row is in
> progress or not. So it seems to me we should be able to get rid of
> ovsdb_idl_index_read().
> 
> > ovsdb_idl_index_{find,forward_to} are used inside the autogenerated
> > functions for iterating over the indexes.
> > We thought that it would be nice to be able to say things like:
> >
> > OVSREC_EXAMPLE_FOR_EACH_RANGE(row, cursor, start, end) /* [start,
> end]
> > */ OVSREC_EXAMPLE_FOR_EACH_RANGE(row, cursor, NULL, end) /* [0,
> end]
> > */ OVSREC_EXAMPLE_FOR_EACH_RANGE(row, cursor, start, NULL) /*
> [start,
> > "+infty"] */ OVSREC_EXAMPLE_FOR_EACH_RANGE(row, cursor, NULL,
> NULL) /*
> > everything */
> >
> > Regards,
> > Esteban
> 
> Thanks, that makes sense.  I think it would be good to add tests for these
> cases.
> 
> Thanks again!
> 
>    Lance
> 
> >
> > -----Original Message-----
> > From: Lance Richardson [mailto:lrichard@redhat.com]
> > Sent: jueves, 13 de julio de 2017 14:29
> > To: Ben Pfaff <blp@ovn.org>
> > Cc: dev@openvswitch.org; Rodriguez Betancourt, Esteban
> > <estebarb@hpe.com>; Albornoz, Javier <javier.albornoz@hpe.com>; jorge
> > sauma <jorge.sauma@hpe.com>; Lutz, Arnoldo
> > <arnoldo.lutz.guevara@hpe.com>
> > Subject: Re: [ovs-dev] [PATCH v5 3/4] ovsdb-idl: idl compound indexes
> > implementation
> >
> >
> >
> > ----- Original Message -----
> > > From: "Ben Pfaff" <blp@ovn.org>
> > > To: "Lance Richardson" <lrichard@redhat.com>
> > > Cc: dev@openvswitch.org, estebarb@hpe.com, "javier albornoz"
> > > <javier.albornoz@hpe.com>, "jorge sauma"
> > > <jorge.sauma@hpe.com>, "arnoldo lutz guevara"
> > > <arnoldo.lutz.guevara@hpe.com>
> > > Sent: Tuesday, 11 July, 2017 5:05:32 PM
> > > Subject: Re: [ovs-dev] [PATCH v5 3/4] ovsdb-idl: idl compound
> > > indexes implementation
> > >
> > > On Sat, Jun 24, 2017 at 05:01:51PM -0400, Lance Richardson wrote:
> > > > This patch adds support for the creation of multicolumn indexes in
> > > > the C IDL to enable for efficient search and retrieval of database
> > > > rows by key.
> > > >
> > > > Signed-off-by: Esteban Rodriguez Betancourt <estebarb@hpe.com>
> > > > Co-authored-by: Lance Richardson <lrichard@redhat.com>
> > > > Signed-off-by: Lance Richardson <lrichard@redhat.com>
> > > > ---
> > > >   v5: - Coding style fixes (checkpatch.py)
> > > >       - Fixed memory leak (missing ovsdb_datum_destroy() in
> > > >         ovsdb_idl_index_destroy_row__()).
> > > >       - Some polishing of comment and log message text.
> > >
> > > Thanks for reviving this series.
> > >
> > > I don't understand ovsdb_idl_index_read().  It is almost the same as
> > > ovsdb_idl_read().  It looks like ovsdb_idl_read() could be
> > > implemented as a wrapper around it, but I'm also not sure why
> > > ovsdb_idl_read() can't be used directly.  Also, I don't understand
> > > its comment about "index_set functions", since there are no
> > > functions with index_set in their names.
> >
> > Hi Ben,
> >
> > I neglected to respond to the comment about "index_set functions" in
> > my previous response, these are automatically generated for the IDL by
> > the next patch. I had deleted that comment in v6 of this series (which
> > is in the ml archive but not patchwork for some reason), it didn't seem
> useful to me.
> >
> > >
> > > ovsdb_idl_index_write_() is mostly copy-paste of a part of
> > > ovsdb_idl_txn_write__(), so to avoid code duplication it would be
> > > best to factor that code out of ovsdb_idl_txn_write__() and call it
> > > from both places.
> > >
> > > I don't understand the behavior of ovsdb_idl_index_find() and
> > > ovsdb_idl_index_forward_to() for a null 'value' argument.  Is there
> > > some idiomatic usage where the null and nonnull behaviors work out
> nicely?
> > >
> > > The row_sync behavior is confusing.  I remember it slightly from the
> > > previous iteration but I don't remember being convinced it was the
> > > best way to do things.
> > >
> >
> > The "row_sync" stuff seems to be a matter of using additional keys
> > (row uuid values first, then memory addresses) when inserting or
> > deleting a row in order to handle duplicate keys.
> >
> > This still seems a bit strange, but after thinking about it a bit I
> > believe it is a reasonable approach; these extra keys aren't used when
> > searching, so we will always get the first entry in the list when
> > there are multiple rows with the same key, and when inserting/deleting
> > we still get O(log N) performance when there are many entries with
> > identical keys by searching on the UUID value.
> >
> > But "row_sync" isn't at all useful in figuring out the above, nor are
> > any of the comments in this area.
> >
> > I also wonder if the comparison on memory address is needed... is
> > there any real possibility that two rows in a table will have the same UUID?
> >
> > Do you think reworking the "row_sync" name and adding better comments
> > would be an acceptable way forward here?
> >
> > Thanks,
> >
> >    Lance
> >
> >
> > > Thanks,
> > >
> > > Ben.
> > >
> >
Lance Richardson July 14, 2017, 10 p.m. UTC | #7
> From: "Rodriguez Betancourt, Esteban" <estebarb@hpe.com>
> To: "Lance Richardson" <lrichard@redhat.com>
> Cc: "Ben Pfaff" <blp@ovn.org>, dev@openvswitch.org, "Javier Albornoz" <javier.albornoz@hpe.com>, "Arnoldo Lutz"
> <arnoldo.lutz.guevara@hpe.com>
> Sent: Friday, 14 July, 2017 5:27:44 PM
> Subject: RE: [ovs-dev] [PATCH v5 3/4] ovsdb-idl: idl compound indexes implementation
> 
> I think that is ok to remove that last pointer check in the generic
> comparison function. UUID is more than enough.
> Regards,
> Esteban
> 

Thanks, Esteban. I think this will be a very useful facility, many thanks to
you and the others who have worked on it.

I've posted a v7 of the patch series a little while ago, in patchwork here (removing
the pointer check in the comparison function will be a v8 item):

   https://patchwork.ozlabs.org/project/openvswitch/list/?submitter=67850

Regards,

   Lance
diff mbox

Patch

diff --git a/lib/ovsdb-idl-provider.h b/lib/ovsdb-idl-provider.h
index 8cfbb95..48a732d 100644
--- a/lib/ovsdb-idl-provider.h
+++ b/lib/ovsdb-idl-provider.h
@@ -1,4 +1,5 @@ 
 /* Copyright (c) 2009, 2010, 2011, 2012, 2016 Nicira, Inc.
+ * Copyright (C) 2016 Hewlett Packard Enterprise Development LP
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -111,6 +112,7 @@  struct ovsdb_idl_table {
     struct hmap rows;        /* Contains "struct ovsdb_idl_row"s. */
     struct ovsdb_idl *idl;   /* Containing idl. */
     unsigned int change_seqno[OVSDB_IDL_CHANGE_MAX];
+    struct shash indexes;    /* Contains "struct ovsdb_idl_index"s */
     struct ovs_list track_list; /* Tracked rows (ovsdb_idl_row.track_node). */
     struct ovsdb_idl_condition condition;
     bool cond_changed;
@@ -122,6 +124,33 @@  struct ovsdb_idl_class {
     size_t n_tables;
 };
 
+/*
+ * Contains the configuration per column of the index
+ */
+struct ovsdb_idl_index_column {
+    const struct ovsdb_idl_column *column;
+    column_comparator *comparer;
+    int sorting_order;
+};
+
+/*
+ * Defines a IDL compound index
+ */
+struct ovsdb_idl_index {
+    struct skiplist *skiplist;                /* Skiplist with pointer to
+                                               * rows*/
+    struct ovsdb_idl_index_column *columns;   /* Columns configuration */
+    size_t n_columns;                         /* Number of columns in index */
+    size_t alloc_columns;                     /* Size allocated memory for
+                                               * columns, comparers and
+                                               * sorting order */
+    bool row_sync;                            /* Determines if the replica
+                                               * is modifying its content or
+                                               * not */
+    const struct ovsdb_idl_table *table;      /* Table that owns this index */
+    const char *index_name;                   /* The name of this index */
+};
+
 struct ovsdb_idl_row *ovsdb_idl_get_row_arc(
     struct ovsdb_idl_row *src,
     const struct ovsdb_idl_table_class *dst_table,
diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
index 893143c..6b51826 100644
--- a/lib/ovsdb-idl.c
+++ b/lib/ovsdb-idl.c
@@ -1,4 +1,5 @@ 
 /* Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 Nicira, Inc.
+ * Copyright (C) 2016 Hewlett Packard Enterprise Development LP
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -37,8 +38,10 @@ 
 #include "ovsdb-parser.h"
 #include "poll-loop.h"
 #include "openvswitch/shash.h"
+#include "skiplist.h"
 #include "sset.h"
 #include "util.h"
+#include "uuid.h"
 #include "openvswitch/vlog.h"
 
 VLOG_DEFINE_THIS_MODULE(ovsdb_idl);
@@ -228,6 +231,14 @@  ovsdb_idl_table_from_class(const struct ovsdb_idl *,
 static bool ovsdb_idl_track_is_set(struct ovsdb_idl_table *table);
 static void ovsdb_idl_send_cond_change(struct ovsdb_idl *idl);
 
+static struct ovsdb_idl_index *ovsdb_idl_create_index_(const struct
+                                                       ovsdb_idl_table *table,
+                                                       size_t allocated_cols);
+static void
+ ovsdb_idl_destroy_indexes(struct ovsdb_idl_table *table);
+static void ovsdb_idl_add_to_indexes(const struct ovsdb_idl_row *);
+static void ovsdb_idl_remove_from_indexes(const struct ovsdb_idl_row *);
+
 /* Creates and returns a connection to database 'remote', which should be in a
  * form acceptable to jsonrpc_session_open().  The connection will maintain an
  * in-memory replica of the remote database whose schema is described by
@@ -274,6 +285,7 @@  ovsdb_idl_create(const char *remote, const struct ovsdb_idl_class *class,
         memset(table->modes, default_mode, tc->n_columns);
         table->need_table = false;
         shash_init(&table->columns);
+        shash_init(&table->indexes);
         for (j = 0; j < tc->n_columns; j++) {
             const struct ovsdb_idl_column *column = &tc->columns[j];
 
@@ -329,6 +341,7 @@  ovsdb_idl_destroy(struct ovsdb_idl *idl)
         for (i = 0; i < idl->class->n_tables; i++) {
             struct ovsdb_idl_table *table = &idl->tables[i];
             ovsdb_idl_condition_destroy(&table->condition);
+            ovsdb_idl_destroy_indexes(table);
             shash_destroy(&table->columns);
             hmap_destroy(&table->rows);
             free(table->modes);
@@ -364,6 +377,7 @@  ovsdb_idl_clear(struct ovsdb_idl *idl)
             struct ovsdb_idl_arc *arc, *next_arc;
 
             if (!ovsdb_idl_row_is_orphan(row)) {
+                ovsdb_idl_remove_from_indexes(row);
                 ovsdb_idl_row_unparse(row);
             }
             LIST_FOR_EACH_SAFE (arc, next_arc, src_node, &row->src_arcs) {
@@ -2027,6 +2041,449 @@  ovsdb_idl_row_unparse(struct ovsdb_idl_row *row)
     }
 }
 
+/*
+ * The OVSDB-IDL Compound Indexes feature allows to create custom
+ * indexes over several columns in the IDL.
+ * With those index, the developer can retrieve rows matching a search
+ * criteria, or show the rows sorted in a specific way.
+ */
+
+/* Reads and returns the value of 'column' within 'row'.  If the value
+ * was set with index_set functions then the modified value is returned.
+ *
+ * The caller must not modify or free the returned value.
+ *
+ */
+static const struct ovsdb_datum *
+ovsdb_idl_index_read(const struct ovsdb_idl_row *row,
+               const struct ovsdb_idl_column *column,
+               const struct ovsdb_idl_table_class *class)
+{
+    size_t column_idx;
+
+    column_idx = column - class->columns;
+
+    ovs_assert(row->new != NULL);
+    ovs_assert(column_idx < class->n_columns);
+
+    if (row->written && bitmap_is_set(row->written, column_idx)) {
+        return &row->new[column_idx];
+    } else if (row->old) {
+        return &row->old[column_idx];
+    } else {
+        return ovsdb_datum_default(&column->type);
+    }
+}
+
+/*
+ * Creates a new index, that is attached to the given idl and table.
+ * The index has the given name.
+ * All the indexes must be created before the first ovsdb_idl_run is
+ * executed.
+ */
+struct ovsdb_idl_index *
+ovsdb_idl_create_index(struct ovsdb_idl *idl,
+                       const struct ovsdb_idl_table_class *tc,
+                       const char *index_name)
+{
+    struct ovsdb_idl_index *index;
+    size_t i;
+
+    for (i = 0; i < idl->class->n_tables; i++) {
+        struct ovsdb_idl_table *table = &idl->tables[i];
+
+        if (table->class == tc) {
+            index = ovsdb_idl_create_index_(table, 1);
+            if (!shash_add_once(&table->indexes, index_name, index)) {
+                VLOG_ERR("Duplicate index name '%s' in table %s",
+                         index_name, table->class->name);
+                return NULL;
+            }
+            index->index_name = index_name;
+            return index;
+        }
+    }
+    OVS_NOT_REACHED();
+    return NULL;
+}
+
+/*
+ * Generic comparator that can compare each index, using the custom
+ * configuration (an struct ovsdb_idl_index) passed to it.
+ * Not intended for direct usage.
+ */
+static int
+ovsdb_idl_index_generic_comparer(const void *a,
+                                 const void *b, const void *conf)
+{
+    const struct ovsdb_idl_column *column;
+    const struct ovsdb_idl_index *index;
+    size_t i;
+
+    index = CONST_CAST(struct ovsdb_idl_index *, conf);
+
+    if (a == b) {
+        return 0;
+    }
+
+    for (i = 0; i < index->n_columns; i++) {
+        int val;
+        if (index->columns[i].comparer) {
+            val = index->columns[i].comparer(a, b);
+        } else {
+            column = index->columns[i].column;
+            const struct ovsdb_idl_row *row_a, *row_b;
+            row_a = CONST_CAST(struct ovsdb_idl_row *, a);
+            row_b = CONST_CAST(struct ovsdb_idl_row *, b);
+            const struct ovsdb_datum *datum_a, *datum_b;
+            datum_a = ovsdb_idl_index_read(row_a, column, index->table->class);
+            datum_b = ovsdb_idl_index_read(row_b, column, index->table->class);
+            val = ovsdb_datum_compare_3way(datum_a, datum_b, &column->type);
+        }
+
+        if (val) {
+            return val * index->columns[i].sorting_order;
+        }
+    }
+
+    /*
+     * If row_sync is true then the IDL is synchronizing the replica's
+     * rows with the ones stored in the index. In this case is necessary
+     * to compare also by pointer value (eg: so the correct row is removed).
+     * In any other case (the user is doing a search) the values are
+     * already equal, so return 0.
+     * Also, the pointers obviously are random, so in different IDLs of the
+     * same OVSDB instance the index could have different ordering.
+     * Comparing first by UUID can guarantee the same order at any IDL.
+     */
+    if (index->row_sync) {
+        const struct ovsdb_idl_row *row_a, *row_b;
+
+        row_a = (const struct ovsdb_idl_row *) a;
+        row_b = (const struct ovsdb_idl_row *) b;
+        int value = uuid_compare_3way(&row_a->uuid, &row_b->uuid);
+
+        return value ? value : (a < b) - (a > b);
+    } else {
+        return 0;
+    }
+}
+
+static struct ovsdb_idl_index *
+ovsdb_idl_create_index_(const struct ovsdb_idl_table *table,
+                        size_t allocated_cols)
+{
+    struct ovsdb_idl_index *index;
+
+    index = xmalloc(sizeof (struct ovsdb_idl_index));
+    index->n_columns = 0;
+    index->alloc_columns = allocated_cols;
+    index->skiplist = skiplist_create(ovsdb_idl_index_generic_comparer, index);
+    index->columns = xmalloc(allocated_cols *
+                             sizeof (struct ovsdb_idl_index_column));
+    index->row_sync = false;
+    index->table = table;
+    return index;
+}
+
+static void
+ovsdb_idl_destroy_indexes(struct ovsdb_idl_table *table)
+{
+    struct ovsdb_idl_index *index;
+    struct shash_node *node;
+
+    SHASH_FOR_EACH (node, &(table->indexes)) {
+        index = (struct ovsdb_idl_index *) node->data;
+        skiplist_destroy(index->skiplist, NULL);
+        free(index->columns);
+    }
+}
+
+static void
+ovsdb_idl_add_to_indexes(const struct ovsdb_idl_row *row)
+{
+    struct ovsdb_idl_table *table = row->table;
+    struct ovsdb_idl_index *index;
+    struct shash_node *node;
+
+    SHASH_FOR_EACH (node, &(table->indexes)) {
+        index = (struct ovsdb_idl_index *) node->data;
+        index->row_sync = true;
+        skiplist_insert(index->skiplist, row);
+        index->row_sync = false;
+    }
+}
+
+static void
+ovsdb_idl_remove_from_indexes(const struct ovsdb_idl_row *row)
+{
+    struct ovsdb_idl_table *table = row->table;
+    struct ovsdb_idl_index *index;
+    struct shash_node *node;
+
+    SHASH_FOR_EACH (node, &(table->indexes)) {
+        index = (struct ovsdb_idl_index *) node->data;
+        index->row_sync = true;
+        skiplist_delete(index->skiplist, row);
+        index->row_sync = false;
+    }
+}
+
+/*
+ * Adds a column to an existing index (all columns must be inserted before
+ * the first ovsdb_idl_run is executed).
+ * In "order", accepts the values OVSDB_INDEX_ASC or OVSDB_INDEX_DESC
+ * (OVSDB_INDEX_ASC by default).
+ * In "custom_comparer" it accepts a custom comparison function. If given NULL
+ * it will use the default comparator for the column (only available for
+ * string, numeric or real columns).
+ */
+void
+ovsdb_idl_index_add_column(struct ovsdb_idl_index *index,
+                           const struct ovsdb_idl_column *column,
+                           int order, column_comparator * custom_comparer)
+{
+    /* Check that the column or table is tracked */
+    if (!index->table->need_table &&
+        !((OVSDB_IDL_MONITOR | OVSDB_IDL_ALERT) &
+          *ovsdb_idl_get_mode(index->table->idl, column))) {
+        VLOG_ERR("Can't add unmonitored column '%s' at index '%s' in "
+                 "table '%s'.",
+                 column->name, index->index_name, index->table->class->name);
+    }
+    if (!ovsdb_type_is_scalar(&column->type) && !custom_comparer) {
+        VLOG_WARN("Comparing non-scalar values.");
+    }
+
+    /* Allocate more memory for column configuration */
+    if (index->n_columns == index->alloc_columns) {
+        index->alloc_columns++;
+        index->columns = xrealloc(index->columns,
+                                  index->alloc_columns *
+                                  sizeof(struct ovsdb_idl_index_column));
+    }
+
+    /* Append column to index */
+    int i = index->n_columns;
+
+    index->columns[i].column = column;
+    index->columns[i].comparer = custom_comparer ? custom_comparer : NULL;
+    if (order == OVSDB_INDEX_ASC) {
+        index->columns[i].sorting_order = OVSDB_INDEX_ASC;
+    } else {
+        index->columns[i].sorting_order = OVSDB_INDEX_DESC;
+    }
+    index->n_columns++;
+}
+
+/*
+ * Initializes a index cursor
+ */
+bool
+ovsdb_idl_initialize_cursor(struct ovsdb_idl *idl,
+                            const struct ovsdb_idl_table_class *tc,
+                            const char *index_name,
+                            struct ovsdb_idl_index_cursor *cursor)
+{
+    size_t i;
+
+    for (i = 0; i < idl->class->n_tables; i++) {
+        struct ovsdb_idl_table *table = &idl->tables[i];
+
+        if (table->class == tc) {
+            cursor->index =
+                (struct ovsdb_idl_index *) shash_find(&table->indexes,
+                                                      index_name)->data;
+            if (!cursor->index) {
+                VLOG_ERR("Cursor initialization failed, "
+                         "index %s at table %s does not exist.",
+                         index_name, tc->name);
+                cursor->index = NULL;
+                cursor->position = NULL;
+                return false;
+            }
+            cursor->position = skiplist_first(cursor->index->skiplist);
+            return true;
+        }
+    }
+    VLOG_ERR("Cursor initialization failed, "
+             "index %s at table %s does not exist.", index_name, tc->name);
+    return false;
+}
+
+/*
+ * ovsdb_idl_index_write_ writes a datum in an ovsdb_idl_row,
+ * and updates the corresponding field in the table record.
+ * Not intended for direct usage.
+ */
+void
+ovsdb_idl_index_write_(struct ovsdb_idl_row *const_row,
+                       const struct ovsdb_idl_column *column,
+                       struct ovsdb_datum *datum,
+                       const struct ovsdb_idl_table_class *class)
+{
+    struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, const_row);
+    size_t column_idx = column - class->columns;
+
+    /* Extracted from ovsdb_idl_txn_write__ */
+    if (row->old == row->new) {
+        row->new = xmalloc(class->n_columns * sizeof *row->new);
+    }
+    if (!row->written) {
+        row->written = bitmap_allocate(class->n_columns);
+    }
+    if (bitmap_is_set(row->written, column_idx)) {
+        ovsdb_datum_destroy(&row->new[column_idx], &column->type);
+    } else {
+        bitmap_set1(row->written, column_idx);
+     }
+    row->new[column_idx] = *datum;
+    (column->unparse)(row);
+    (column->parse)(row, &row->new[column_idx]);
+}
+
+/*
+ * Initializes a row for usage with the compound indexes query.
+ * Not intended for direct usage.
+ */
+struct ovsdb_idl_row *
+ovsdb_idl_index_init_row(struct ovsdb_idl * idl,
+                         const struct ovsdb_idl_table_class *class)
+{
+    struct ovsdb_idl_row *row = xzalloc(class->allocation_size);
+    class->row_init(row);
+    row->written = bitmap_allocate(class->n_columns);
+    row->table = ovsdb_idl_table_from_class(idl, class);
+    return row;
+}
+
+/* Not intended for direct ussage.
+ * Destroys 'row_' and frees associated memory
+ * This function is intended to be used indirectly through one of the
+ * "index_destroy_row" functions generated by ovsdb-idlc.
+ */
+void
+ovsdb_idl_index_destroy_row__(const struct ovsdb_idl_row *row_)
+{
+    struct ovsdb_idl_row *row = CONST_CAST(struct ovsdb_idl_row *, row_);
+    const struct ovsdb_idl_table_class *class = row->table->class;
+    size_t i;
+
+    if (row->new) {
+        const struct ovsdb_idl_column *c;
+        if (row->written) {
+            BITMAP_FOR_EACH_1 (i, class->n_columns, row->written) {
+                c = &class->columns[i];
+                (c->unparse) (row);
+                ovsdb_datum_destroy(&row->new[i], &c->type);
+            }
+        }
+        if (row->old) {
+            free(row->old);
+        }
+        free(row->new);
+        free(row->written);
+        free(row);
+    }
+}
+
+/*
+ * Moves the cursor to the first entry in the index.
+ * Returns a pointer to the corresponding ovsdb_idl_row, or NULL.
+ */
+struct ovsdb_idl_row *
+ovsdb_idl_index_first(struct ovsdb_idl_index_cursor *cursor)
+{
+    cursor->position = skiplist_first(cursor->index->skiplist);
+    return ovsdb_idl_index_data(cursor);
+}
+
+/*
+ * Moves the cursor to the following record in the index.
+ */
+struct ovsdb_idl_row *
+ovsdb_idl_index_next(struct ovsdb_idl_index_cursor *cursor)
+{
+    if (!cursor->position) {
+        return NULL;
+    }
+    cursor->position = skiplist_next(cursor->position);
+    return ovsdb_idl_index_data(cursor);
+ }
+
+/*
+ * Returns the ovsdb_idl_row pointer corresponding to the current record
+ */
+struct ovsdb_idl_row *
+ovsdb_idl_index_data(struct ovsdb_idl_index_cursor *cursor)
+{
+    return skiplist_get_data(cursor->position);
+}
+
+/*
+ * Moves the cursor to the first entry with a value equal to the given value.
+ * If the value given is NULL then the function will behave like
+ * ovsdb_idl_index_first.
+ * Returns a pointer to the corresponding ovsdb_idl_row (that can be casted
+ * to a ovsrec) or NULL.
+ */
+struct ovsdb_idl_row *
+ovsdb_idl_index_find(struct ovsdb_idl_index_cursor *cursor,
+                     struct ovsdb_idl_row *value)
+{
+    if (value) {
+        cursor->position = skiplist_find(cursor->index->skiplist, value);
+    } else {
+        cursor->position = skiplist_first(cursor->index->skiplist);
+    }
+    return ovsdb_idl_index_data(cursor);
+}
+
+/*
+ * Moves the cursor to the first entry with a value greater or equal
+ * to the given value.
+ * If the value given is NULL then the function will behave like
+ * ovsdb_idl_index_first.
+ * Returns a pointer to the corresponding ovsdb_idl_row (that can be casted
+ * to a ovsrec) or NULL.
+ */
+struct ovsdb_idl_row *
+ovsdb_idl_index_forward_to(struct ovsdb_idl_index_cursor *cursor,
+                           struct ovsdb_idl_row *value)
+{
+    if (value) {
+        cursor->position = skiplist_forward_to(cursor->index->skiplist, value);
+    } else {
+        cursor->position = skiplist_first(cursor->index->skiplist);
+    }
+    return ovsdb_idl_index_data(cursor);
+}
+
+/*
+ * Returns the result of comparing two ovsrecs (casted to ovsdb_idl_row),
+ * using the comparer defined in the index.
+ * Returns:
+ * < 0 if a < b
+ * 0 if a == b
+ * > 0 if a > b
+ * When some input is NULL this function considers NULL to be greater than
+ * any other value, and NULL == NULL.
+ */
+int
+ovsdb_idl_index_compare(struct ovsdb_idl_index_cursor *cursor,
+                        struct ovsdb_idl_row *a, struct ovsdb_idl_row *b)
+{
+    if (a && b) {
+        return ovsdb_idl_index_generic_comparer(a, b, cursor->index);
+    } else if (!a && !b) {
+        return 0;
+    } else if (a) {
+        return -1;
+    } else {
+        return 1;
+    }
+}
+
 static void
 ovsdb_idl_row_clear_old(struct ovsdb_idl_row *row)
 {
@@ -2235,11 +2692,13 @@  ovsdb_idl_insert_row(struct ovsdb_idl_row *row, const struct json *row_json)
     ovsdb_idl_row_parse(row);
 
     ovsdb_idl_row_reparse_backrefs(row);
+    ovsdb_idl_add_to_indexes(row);
 }
 
 static void
 ovsdb_idl_delete_row(struct ovsdb_idl_row *row)
 {
+    ovsdb_idl_remove_from_indexes(row);
     ovsdb_idl_row_unparse(row);
     ovsdb_idl_row_clear_arcs(row, true);
     ovsdb_idl_row_clear_old(row);
@@ -2257,10 +2716,12 @@  ovsdb_idl_modify_row(struct ovsdb_idl_row *row, const struct json *row_json)
 {
     bool changed;
 
+    ovsdb_idl_remove_from_indexes(row);
     ovsdb_idl_row_unparse(row);
     ovsdb_idl_row_clear_arcs(row, true);
     changed = ovsdb_idl_row_update(row, row_json, OVSDB_IDL_CHANGE_MODIFY);
     ovsdb_idl_row_parse(row);
+    ovsdb_idl_add_to_indexes(row);
 
     return changed;
 }
diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h
index 281873d..77a01a6 100644
--- a/lib/ovsdb-idl.h
+++ b/lib/ovsdb-idl.h
@@ -1,4 +1,5 @@ 
 /* Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc.
+ * Copyright (C) 2016 Hewlett Packard Enterprise Development LP
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -41,6 +42,7 @@ 
 #include "ovsdb-data.h"
 #include "openvswitch/list.h"
 #include "ovsdb-condition.h"
+#include "skiplist.h"
 
 struct json;
 struct ovsdb_datum;
@@ -358,4 +360,61 @@  unsigned int ovsdb_idl_set_condition(struct ovsdb_idl *,
                                      const struct ovsdb_idl_condition *);
 
 unsigned int ovsdb_idl_get_condition_seqno(const struct ovsdb_idl *);
+
+/*
+ * The OVSDB-IDL Compound Indexes feature allows the creation of
+ * custom indexes over several columns in the IDL.
+ * With these custom indexes, the user can retrieve rows matching a search
+ * criteria or iterate of rows sorted in a specific order.
+ */
+
+struct ovsdb_idl_index *
+ovsdb_idl_create_index(struct ovsdb_idl *idl,
+                       const struct ovsdb_idl_table_class *tc,
+                       const char *index_name);
+
+#define OVSDB_INDEX_DESC -1
+#define OVSDB_INDEX_ASC 1
+
+/*
+ * Skiplist comparison function. Allows to store sorted data.
+ */
+typedef int
+(column_comparator)(const void *a, const void *b);
+
+struct ovsdb_idl_index_cursor {
+    struct ovsdb_idl_index *index;    /* Index used by this cursor */
+    struct skiplist_node *position;   /* Current position in the index */
+};
+
+int ovsdb_idl_index_strcmp(char *, char *);
+int ovsdb_idl_index_intcmp(int64_t, int64_t);
+int ovsdb_idl_index_doublecmp(double, double);
+void ovsdb_idl_index_add_column(struct ovsdb_idl_index *,
+                                const struct ovsdb_idl_column *,
+                                int order,
+                                column_comparator *custom_comparer
+                                );
+bool ovsdb_idl_initialize_cursor(struct ovsdb_idl *,
+                            const struct ovsdb_idl_table_class *tc,
+                            const char *index_name,
+                            struct ovsdb_idl_index_cursor *cursor);
+void ovsdb_idl_index_write_(struct ovsdb_idl_row *,
+                            const struct ovsdb_idl_column *,
+                            struct ovsdb_datum *,
+                            const struct ovsdb_idl_table_class *);
+struct ovsdb_idl_row *ovsdb_idl_index_init_row(struct ovsdb_idl *,
+                                       const struct ovsdb_idl_table_class *);
+void ovsdb_idl_index_destroy_row__(const struct ovsdb_idl_row *);
+struct ovsdb_idl_row *ovsdb_idl_index_first(struct ovsdb_idl_index_cursor *);
+struct ovsdb_idl_row *ovsdb_idl_index_next(struct ovsdb_idl_index_cursor *);
+struct ovsdb_idl_row *ovsdb_idl_index_data(struct ovsdb_idl_index_cursor *);
+struct ovsdb_idl_row *ovsdb_idl_index_find(struct ovsdb_idl_index_cursor *,
+                                           struct ovsdb_idl_row *);
+struct ovsdb_idl_row *ovsdb_idl_index_forward_to(
+        struct ovsdb_idl_index_cursor *,
+        struct ovsdb_idl_row *);
+int ovsdb_idl_index_compare(struct ovsdb_idl_index_cursor *,
+                            struct ovsdb_idl_row *a,
+                            struct ovsdb_idl_row *b);
 #endif /* ovsdb-idl.h */