mbox series

[0/4] Add 'Event.payload' field

Message ID 20180325151731.29959-1-stephen@that.guru
Headers show
Series Add 'Event.payload' field | expand

Message

Stephen Finucane March 25, 2018, 3:17 p.m. UTC
The '/event' API endpoint is really slow due to the amount of JOINs
necessary to retrieve records from the database. Resolve this by
a static JSON representation of any embedded data in the database. This
has some disadvantages, noted in the patches, but the performance
improvement is huge and users will not notice a thing.

As a necessary side-effect of this change, events now require the REST
API be enabled. To be honest, this should have been the case from day 1
as events are currently only exposed over this API.

Stephen Finucane (4):
  REST: Support embedded serializers without context
  signals: Only enable events when REST API enabled
  models: Migrate event fields to JSON field
  REST: Only fetch required fields event filtering

 patchwork/api/embedded.py                          |  60 ++++++++---
 patchwork/api/event.py                             | 110 +++++++++++++--------
 patchwork/api/filters.py                           |   7 ++
 patchwork/fields.py                                |  32 ++++++
 .../migrations/0025_add_event_payload_field.py     |  21 ++++
 ...26_migrate_data_from_event_fields_to_payload.py |  63 ++++++++++++
 .../migrations/0027_remove_old_event_fields.py     |  34 +++++++
 patchwork/models.py                                |  28 +-----
 patchwork/signals.py                               |  70 +++++++++++--
 patchwork/tests/test_events.py                     | 110 +++++++++++++--------
 .../events-require-rest-api-47eab4a3be745f75.yaml  |   5 +
 11 files changed, 413 insertions(+), 127 deletions(-)
 create mode 100644 patchwork/migrations/0025_add_event_payload_field.py
 create mode 100644 patchwork/migrations/0026_migrate_data_from_event_fields_to_payload.py
 create mode 100644 patchwork/migrations/0027_remove_old_event_fields.py
 create mode 100644 releasenotes/notes/events-require-rest-api-47eab4a3be745f75.yaml

Comments

Daniel Axtens March 27, 2018, 11:02 p.m. UTC | #1
Stephen Finucane <stephen@that.guru> writes:

Hi Stephen,

> The '/event' API endpoint is really slow due to the amount of JOINs
> necessary to retrieve records from the database. Resolve this by
> a static JSON representation of any embedded data in the database. This
> has some disadvantages, noted in the patches, but the performance
> improvement is huge and users will not notice a thing.
>
> As a necessary side-effect of this change, events now require the REST
> API be enabled. To be honest, this should have been the case from day 1
> as events are currently only exposed over this API.

How does perfomance of this compare to
https://patchwork.ozlabs.org/project/patchwork/list/?series=35029 ?

This is a much bigger change than that series, so if the performance
impact is similar perhaps the smaller, migration-free change would be
preferable?

Regards,
Daniel

>
> Stephen Finucane (4):
>   REST: Support embedded serializers without context
>   signals: Only enable events when REST API enabled
>   models: Migrate event fields to JSON field
>   REST: Only fetch required fields event filtering
>
>  patchwork/api/embedded.py                          |  60 ++++++++---
>  patchwork/api/event.py                             | 110 +++++++++++++--------
>  patchwork/api/filters.py                           |   7 ++
>  patchwork/fields.py                                |  32 ++++++
>  .../migrations/0025_add_event_payload_field.py     |  21 ++++
>  ...26_migrate_data_from_event_fields_to_payload.py |  63 ++++++++++++
>  .../migrations/0027_remove_old_event_fields.py     |  34 +++++++
>  patchwork/models.py                                |  28 +-----
>  patchwork/signals.py                               |  70 +++++++++++--
>  patchwork/tests/test_events.py                     | 110 +++++++++++++--------
>  .../events-require-rest-api-47eab4a3be745f75.yaml  |   5 +
>  11 files changed, 413 insertions(+), 127 deletions(-)
>  create mode 100644 patchwork/migrations/0025_add_event_payload_field.py
>  create mode 100644 patchwork/migrations/0026_migrate_data_from_event_fields_to_payload.py
>  create mode 100644 patchwork/migrations/0027_remove_old_event_fields.py
>  create mode 100644 releasenotes/notes/events-require-rest-api-47eab4a3be745f75.yaml
>
> -- 
> 2.14.3
>
> _______________________________________________
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
Stephen Finucane March 28, 2018, 11:36 a.m. UTC | #2
On Wed, 2018-03-28 at 10:02 +1100, Daniel Axtens wrote:
> Stephen Finucane <stephen@that.guru> writes:
> 
> Hi Stephen,
> 
> > The '/event' API endpoint is really slow due to the amount of JOINs
> > necessary to retrieve records from the database. Resolve this by
> > a static JSON representation of any embedded data in the database. This
> > has some disadvantages, noted in the patches, but the performance
> > improvement is huge and users will not notice a thing.
> > 
> > As a necessary side-effect of this change, events now require the REST
> > API be enabled. To be honest, this should have been the case from day 1
> > as events are currently only exposed over this API.
> 
> How does perfomance of this compare to
> https://patchwork.ozlabs.org/project/patchwork/list/?series=35029 ?

I've no idea. I know that it removes all but one (I think) JOIN so in
theory it should massively improve performance. It's impossible for me
to test this stuff as I simply can't get my hands on a big enough
archive to test it. If you have an idea on how to test this, I'd
appreciate the help.

> This is a much bigger change than that series, so if the performance
> impact is similar perhaps the smaller, migration-free change would be
> preferable?

From what I'm hearing, these large JOINs are generally killing us and
only get worse the larger the instance is. I was envisioning the above
series as something that would fix performance for 2.0, while this as a
longer term solution than the above series that could be applied for
2.1. As noted above though, I need some way to test this so any help
here would be appreciated.

Stephen

> Regards,
> Daniel
> 
> > 
> > Stephen Finucane (4):
> >   REST: Support embedded serializers without context
> >   signals: Only enable events when REST API enabled
> >   models: Migrate event fields to JSON field
> >   REST: Only fetch required fields event filtering
> > 
> >  patchwork/api/embedded.py                          |  60 ++++++++---
> >  patchwork/api/event.py                             | 110 +++++++++++++--------
> >  patchwork/api/filters.py                           |   7 ++
> >  patchwork/fields.py                                |  32 ++++++
> >  .../migrations/0025_add_event_payload_field.py     |  21 ++++
> >  ...26_migrate_data_from_event_fields_to_payload.py |  63 ++++++++++++
> >  .../migrations/0027_remove_old_event_fields.py     |  34 +++++++
> >  patchwork/models.py                                |  28 +-----
> >  patchwork/signals.py                               |  70 +++++++++++--
> >  patchwork/tests/test_events.py                     | 110 +++++++++++++--------
> >  .../events-require-rest-api-47eab4a3be745f75.yaml  |   5 +
> >  11 files changed, 413 insertions(+), 127 deletions(-)
> >  create mode 100644 patchwork/migrations/0025_add_event_payload_field.py
> >  create mode 100644 patchwork/migrations/0026_migrate_data_from_event_fields_to_payload.py
> >  create mode 100644 patchwork/migrations/0027_remove_old_event_fields.py
> >  create mode 100644 releasenotes/notes/events-require-rest-api-47eab4a3be745f75.yaml
> > 
> > -- 
> > 2.14.3
> > 
> > _______________________________________________
> > Patchwork mailing list
> > Patchwork@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/patchwork
Daniel Axtens March 28, 2018, 3:36 p.m. UTC | #3
Stephen Finucane <stephen@that.guru> writes:

> On Wed, 2018-03-28 at 10:02 +1100, Daniel Axtens wrote:
>> Stephen Finucane <stephen@that.guru> writes:
>> 
>> Hi Stephen,
>> 
>> > The '/event' API endpoint is really slow due to the amount of JOINs
>> > necessary to retrieve records from the database. Resolve this by
>> > a static JSON representation of any embedded data in the database. This
>> > has some disadvantages, noted in the patches, but the performance
>> > improvement is huge and users will not notice a thing.
>> > 
>> > As a necessary side-effect of this change, events now require the REST
>> > API be enabled. To be honest, this should have been the case from day 1
>> > as events are currently only exposed over this API.
>> 
>> How does perfomance of this compare to
>> https://patchwork.ozlabs.org/project/patchwork/list/?series=35029 ?
>
> I've no idea. I know that it removes all but one (I think) JOIN so in
> theory it should massively improve performance. It's impossible for me
> to test this stuff as I simply can't get my hands on a big enough
> archive to test it. If you have an idea on how to test this, I'd
> appreciate the help.

What I did to test it was to download the Ubuntu kernel team archive -
https://lists.ubuntu.com/archives/kernel-team/ - the full raw archive is
available from the top of the page and is currently 608MB. You can
import it multiple times into different projects too, so you can scale
up 1.2 GB pretty easily. This is what I did for my postgres tests.

I then used Django Debug Toolbar to tell me how long the SQL queries
were taking and how much time was spent in processing. I just tested
this on my laptop but it was enough to see the big, measurable
differences I included in my mail.

>> This is a much bigger change than that series, so if the performance
>> impact is similar perhaps the smaller, migration-free change would be
>> preferable?
>
> From what I'm hearing, these large JOINs are generally killing us and
> only get worse the larger the instance is. I was envisioning the above
> series as something that would fix performance for 2.0, while this as a
> longer term solution than the above series that could be applied for
> 2.1. As noted above though, I need some way to test this so any help
> here would be appreciated.

On much reflection, my concern with this series is how we would make a
change to the events API. As I understand it, we're going to have one
static blob stored in the db, so if we change what constitues an event
we're going to get a discontinuity in what's stored in the database.

For example if we add a field to the event - say for the sake of the
argument the SHA1 the message contents - we'd need to figure something
out. I can think of a few options, none of which thrill me:

 - We unversion the events API and push the burden to consumers to deal
   with changes to time-series data.

 - post-process the json coming out of the database on every query. For
   v1 consumers of new data, strip out the new field. For v2 consumers
   of old data, we calculate the new data and supply it (possibly saving
   the combined data so we only have to do this once per event.)

 - Or we have to do scary migrations where we read-modify-write all of
   the many, many, previous events.

What would be your plan for dealing with a change to the event model?

Regards,
Daniel
> Stephen
>
>> Regards,
>> Daniel
>> 
>> > 
>> > Stephen Finucane (4):
>> >   REST: Support embedded serializers without context
>> >   signals: Only enable events when REST API enabled
>> >   models: Migrate event fields to JSON field
>> >   REST: Only fetch required fields event filtering
>> > 
>> >  patchwork/api/embedded.py                          |  60 ++++++++---
>> >  patchwork/api/event.py                             | 110 +++++++++++++--------
>> >  patchwork/api/filters.py                           |   7 ++
>> >  patchwork/fields.py                                |  32 ++++++
>> >  .../migrations/0025_add_event_payload_field.py     |  21 ++++
>> >  ...26_migrate_data_from_event_fields_to_payload.py |  63 ++++++++++++
>> >  .../migrations/0027_remove_old_event_fields.py     |  34 +++++++
>> >  patchwork/models.py                                |  28 +-----
>> >  patchwork/signals.py                               |  70 +++++++++++--
>> >  patchwork/tests/test_events.py                     | 110 +++++++++++++--------
>> >  .../events-require-rest-api-47eab4a3be745f75.yaml  |   5 +
>> >  11 files changed, 413 insertions(+), 127 deletions(-)
>> >  create mode 100644 patchwork/migrations/0025_add_event_payload_field.py
>> >  create mode 100644 patchwork/migrations/0026_migrate_data_from_event_fields_to_payload.py
>> >  create mode 100644 patchwork/migrations/0027_remove_old_event_fields.py
>> >  create mode 100644 releasenotes/notes/events-require-rest-api-47eab4a3be745f75.yaml
>> > 
>> > -- 
>> > 2.14.3
>> > 
>> > _______________________________________________
>> > Patchwork mailing list
>> > Patchwork@lists.ozlabs.org
>> > https://lists.ozlabs.org/listinfo/patchwork
Daniel Axtens April 2, 2018, 7:10 a.m. UTC | #4
Hi Stephen,

I tried applying this and migrating, and I'm getting the following
exception trying to list events:

web_1  | Traceback (most recent call last):
web_1  |   File "/usr/local/lib/python3.6/dist-packages/django/core/handlers/exception.py", line 41, in inner
web_1  |     response = get_response(request)
web_1  |   File "/usr/local/lib/python3.6/dist-packages/django/core/handlers/base.py", line 187, in _get_response
web_1  |     response = self.process_exception_by_middleware(e, request)
web_1  |   File "/usr/local/lib/python3.6/dist-packages/django/core/handlers/base.py", line 185, in _get_response
web_1  |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
web_1  |   File "/usr/local/lib/python3.6/dist-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
web_1  |     return view_func(*args, **kwargs)
web_1  |   File "/usr/local/lib/python3.6/dist-packages/django/views/generic/base.py", line 68, in view
web_1  |     return self.dispatch(request, *args, **kwargs)
web_1  |   File "/usr/local/lib/python3.6/dist-packages/rest_framework/views.py", line 489, in dispatch
web_1  |     response = self.handle_exception(exc)
web_1  |   File "/usr/local/lib/python3.6/dist-packages/rest_framework/views.py", line 449, in handle_exception
web_1  |     self.raise_uncaught_exception(exc)
web_1  |   File "/usr/local/lib/python3.6/dist-packages/rest_framework/views.py", line 486, in dispatch
web_1  |     response = handler(request, *args, **kwargs)
web_1  |   File "/usr/local/lib/python3.6/dist-packages/rest_framework/generics.py", line 201, in get
web_1  |     return self.list(request, *args, **kwargs)
web_1  |   File "/usr/local/lib/python3.6/dist-packages/rest_framework/mixins.py", line 45, in list
web_1  |     return self.get_paginated_response(serializer.data)
web_1  |   File "/usr/local/lib/python3.6/dist-packages/rest_framework/serializers.py", line 739, in data
web_1  |     ret = super(ListSerializer, self).data
web_1  |   File "/usr/local/lib/python3.6/dist-packages/rest_framework/serializers.py", line 263, in data
web_1  |     self._data = self.to_representation(self.instance)
web_1  |   File "/usr/local/lib/python3.6/dist-packages/rest_framework/serializers.py", line 657, in to_representation
web_1  |     self.child.to_representation(item) for item in iterable
web_1  |   File "/usr/local/lib/python3.6/dist-packages/rest_framework/serializers.py", line 657, in <listcomp>
web_1  |     self.child.to_representation(item) for item in iterable
web_1  |   File "/home/patchwork/patchwork/patchwork/api/event.py", line 75, in to_representation
web_1  |     if not payload.get(field):
web_1  | AttributeError: 'str' object has no attribute 'get'
web_1  | "GET /api/events/ HTTP/1.1" 500 158617

It seems that data['payload'] is a string - is that supposed to be the
case? It looks like the migrated data is in order, here's data:

web_1  | OrderedDict([('id', 317126), ('category', 'series-completed'), ('project', OrderedDict([('id', 2), ('url', 'http://localhost:8000/api/projects/2/'), ('name', 'Kernel Team'), ('link_name', 'kernel-team'), ('list_id', 'kernel-team.lists.ubuntu.com'), ('list_email', 'kernel-team@lists.ubuntu.com'), ('web_url', ''), ('scm_url', ''), ('webscm_url', '')])), ('date', '2018-03-08T12:07:19.184178'), ('payload', '{"series": {"id": 55063, "url": null, "date": "2017-08-29T23:18:10", "name": "[1/2,Artful] clocksource/drivers/arm_arch_timer: Fix mem frame loop initialization", "version": 1, "mbox": null}}')])

Looking at that in the database looks fine too:

patchwork=# select * from patchwork_event where id = 317126;
   id   |     category     |             date              | cover_id | patch_id | project_id | series_id |                                                                                                       payload     
                                                                                                  
--------+------------------+-------------------------------+----------+----------+------------+-----------+-------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
 317126 | series-completed | 2018-03-08 01:07:19.184178+00 |          |          |          2 |     55063 | "{\"series\": {\"id\": 55063, \"url\": null, \"date\": \"2017-08-29T23:18:10\", \"name\": \"[1/2,Artful] clocksour
ce/drivers/arm_arch_timer: Fix mem frame loop initialization\", \"version\": 1, \"mbox\": null}}"
(1 row)

I had to do a merge migration so it's somewhat possible that went wrong,
but it all looks in order to me. Any ideas?

Regards,
Daniel
Stephen Finucane April 9, 2018, 4:50 p.m. UTC | #5
On Mon, 2018-04-02 at 17:10 +1000, Daniel Axtens wrote:
> Hi Stephen,
> 
> I tried applying this and migrating, and I'm getting the following
> exception trying to list events:
> 
> [snip]
> 
> I had to do a merge migration so it's somewhat possible that went wrong,
> but it all looks in order to me. Any ideas?

This was a mistake in the migration script, where I was saving escaped
JSON instead of raw JSON. I didn't see this because I was creating
enough events on the fly to cause the migrated events to drop down.
I've fixed this in v2 and validated said fix. I should post this later
this evening.

Stephen
Stephen Finucane April 10, 2018, 7:01 p.m. UTC | #6
On Thu, 2018-03-29 at 02:36 +1100, Daniel Axtens wrote:
> Stephen Finucane <stephen@that.guru> writes:
> 
> > On Wed, 2018-03-28 at 10:02 +1100, Daniel Axtens wrote:
> > > Stephen Finucane <stephen@that.guru> writes:
> > > 
> > > Hi Stephen,
> > > 
> > > > The '/event' API endpoint is really slow due to the amount of JOINs
> > > > necessary to retrieve records from the database. Resolve this by
> > > > a static JSON representation of any embedded data in the database. This
> > > > has some disadvantages, noted in the patches, but the performance
> > > > improvement is huge and users will not notice a thing.
> > > > 
> > > > As a necessary side-effect of this change, events now require the REST
> > > > API be enabled. To be honest, this should have been the case from day 1
> > > > as events are currently only exposed over this API.
> > > 
> > > How does perfomance of this compare to
> > > https://patchwork.ozlabs.org/project/patchwork/list/?series=35029 ?
> > 
> > I've no idea. I know that it removes all but one (I think) JOIN so in
> > theory it should massively improve performance. It's impossible for me
> > to test this stuff as I simply can't get my hands on a big enough
> > archive to test it. If you have an idea on how to test this, I'd
> > appreciate the help.
> 
> What I did to test it was to download the Ubuntu kernel team archive -
> https://lists.ubuntu.com/archives/kernel-team/ - the full raw archive is
> available from the top of the page and is currently 608MB. You can
> import it multiple times into different projects too, so you can scale
> up 1.2 GB pretty easily. This is what I did for my postgres tests.
> 
> I then used Django Debug Toolbar to tell me how long the SQL queries
> were taking and how much time was spent in processing. I just tested
> this on my laptop but it was enough to see the big, measurable
> differences I included in my mail.
> 
> > > This is a much bigger change than that series, so if the performance
> > > impact is similar perhaps the smaller, migration-free change would be
> > > preferable?
> > 
> > From what I'm hearing, these large JOINs are generally killing us and
> > only get worse the larger the instance is. I was envisioning the above
> > series as something that would fix performance for 2.0, while this as a
> > longer term solution than the above series that could be applied for
> > 2.1. As noted above though, I need some way to test this so any help
> > here would be appreciated.
> 
> On much reflection, my concern with this series is how we would make a
> change to the events API. As I understand it, we're going to have one
> static blob stored in the db, so if we change what constitues an event
> we're going to get a discontinuity in what's stored in the database.
> 
> For example if we add a field to the event - say for the sake of the
> argument the SHA1 the message contents - we'd need to figure something
> out. I can think of a few options, none of which thrill me:
> 
>  - We unversion the events API and push the burden to consumers to deal
>    with changes to time-series data.
> 
>  - post-process the json coming out of the database on every query. For
>    v1 consumers of new data, strip out the new field. For v2 consumers
>    of old data, we calculate the new data and supply it (possibly saving
>    the combined data so we only have to do this once per event.)
> 
>  - Or we have to do scary migrations where we read-modify-write all of
>    the many, many, previous events.
> 
> What would be your plan for dealing with a change to the event model?

I don't really have one, to be honest. However, I've avoided storing
things that will change (urls) and in v2 I am using JSON types for the
database column, where possible. This means we can use the various JSON
features of the database backend to do whatever migration steps are
necessary.

Personally, if we _did_ have to make changes to the events, I would add
a new column, 'payload_b' or similar, which either contain the slightly
modified version of the 'payload' field or would only be populated for
new events. This would mean we could tailor our serializers for various
different versions. If not this, then I'd go for option (a) above
(don't worry about versioning it).

Stephen

> Regards,
> Daniel
> > Stephen
> > 
> > > Regards,
> > > Daniel
> > > 
> > > > 
> > > > Stephen Finucane (4):
> > > >   REST: Support embedded serializers without context
> > > >   signals: Only enable events when REST API enabled
> > > >   models: Migrate event fields to JSON field
> > > >   REST: Only fetch required fields event filtering
> > > > 
> > > >  patchwork/api/embedded.py                          |  60 ++++++++---
> > > >  patchwork/api/event.py                             | 110 +++++++++++++--------
> > > >  patchwork/api/filters.py                           |   7 ++
> > > >  patchwork/fields.py                                |  32 ++++++
> > > >  .../migrations/0025_add_event_payload_field.py     |  21 ++++
> > > >  ...26_migrate_data_from_event_fields_to_payload.py |  63 ++++++++++++
> > > >  .../migrations/0027_remove_old_event_fields.py     |  34 +++++++
> > > >  patchwork/models.py                                |  28 +-----
> > > >  patchwork/signals.py                               |  70 +++++++++++--
> > > >  patchwork/tests/test_events.py                     | 110 +++++++++++++--------
> > > >  .../events-require-rest-api-47eab4a3be745f75.yaml  |   5 +
> > > >  11 files changed, 413 insertions(+), 127 deletions(-)
> > > >  create mode 100644 patchwork/migrations/0025_add_event_payload_field.py
> > > >  create mode 100644 patchwork/migrations/0026_migrate_data_from_event_fields_to_payload.py
> > > >  create mode 100644 patchwork/migrations/0027_remove_old_event_fields.py
> > > >  create mode 100644 releasenotes/notes/events-require-rest-api-47eab4a3be745f75.yaml
> > > > 
> > > > -- 
> > > > 2.14.3
> > > > 
> > > > _______________________________________________
> > > > Patchwork mailing list
> > > > Patchwork@lists.ozlabs.org
> > > > https://lists.ozlabs.org/listinfo/patchwork