Message ID | 20180325151731.29959-1-stephen@that.guru |
---|---|
Headers | show |
Series | Add 'Event.payload' field | expand |
Stephen Finucane <stephen@that.guru> writes: Hi Stephen, > The '/event' API endpoint is really slow due to the amount of JOINs > necessary to retrieve records from the database. Resolve this by > a static JSON representation of any embedded data in the database. This > has some disadvantages, noted in the patches, but the performance > improvement is huge and users will not notice a thing. > > As a necessary side-effect of this change, events now require the REST > API be enabled. To be honest, this should have been the case from day 1 > as events are currently only exposed over this API. How does perfomance of this compare to https://patchwork.ozlabs.org/project/patchwork/list/?series=35029 ? This is a much bigger change than that series, so if the performance impact is similar perhaps the smaller, migration-free change would be preferable? Regards, Daniel > > Stephen Finucane (4): > REST: Support embedded serializers without context > signals: Only enable events when REST API enabled > models: Migrate event fields to JSON field > REST: Only fetch required fields event filtering > > patchwork/api/embedded.py | 60 ++++++++--- > patchwork/api/event.py | 110 +++++++++++++-------- > patchwork/api/filters.py | 7 ++ > patchwork/fields.py | 32 ++++++ > .../migrations/0025_add_event_payload_field.py | 21 ++++ > ...26_migrate_data_from_event_fields_to_payload.py | 63 ++++++++++++ > .../migrations/0027_remove_old_event_fields.py | 34 +++++++ > patchwork/models.py | 28 +----- > patchwork/signals.py | 70 +++++++++++-- > patchwork/tests/test_events.py | 110 +++++++++++++-------- > .../events-require-rest-api-47eab4a3be745f75.yaml | 5 + > 11 files changed, 413 insertions(+), 127 deletions(-) > create mode 100644 patchwork/migrations/0025_add_event_payload_field.py > create mode 100644 patchwork/migrations/0026_migrate_data_from_event_fields_to_payload.py > create mode 100644 patchwork/migrations/0027_remove_old_event_fields.py > create mode 100644 releasenotes/notes/events-require-rest-api-47eab4a3be745f75.yaml > > -- > 2.14.3 > > _______________________________________________ > Patchwork mailing list > Patchwork@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/patchwork
On Wed, 2018-03-28 at 10:02 +1100, Daniel Axtens wrote: > Stephen Finucane <stephen@that.guru> writes: > > Hi Stephen, > > > The '/event' API endpoint is really slow due to the amount of JOINs > > necessary to retrieve records from the database. Resolve this by > > a static JSON representation of any embedded data in the database. This > > has some disadvantages, noted in the patches, but the performance > > improvement is huge and users will not notice a thing. > > > > As a necessary side-effect of this change, events now require the REST > > API be enabled. To be honest, this should have been the case from day 1 > > as events are currently only exposed over this API. > > How does perfomance of this compare to > https://patchwork.ozlabs.org/project/patchwork/list/?series=35029 ? I've no idea. I know that it removes all but one (I think) JOIN so in theory it should massively improve performance. It's impossible for me to test this stuff as I simply can't get my hands on a big enough archive to test it. If you have an idea on how to test this, I'd appreciate the help. > This is a much bigger change than that series, so if the performance > impact is similar perhaps the smaller, migration-free change would be > preferable? From what I'm hearing, these large JOINs are generally killing us and only get worse the larger the instance is. I was envisioning the above series as something that would fix performance for 2.0, while this as a longer term solution than the above series that could be applied for 2.1. As noted above though, I need some way to test this so any help here would be appreciated. Stephen > Regards, > Daniel > > > > > Stephen Finucane (4): > > REST: Support embedded serializers without context > > signals: Only enable events when REST API enabled > > models: Migrate event fields to JSON field > > REST: Only fetch required fields event filtering > > > > patchwork/api/embedded.py | 60 ++++++++--- > > patchwork/api/event.py | 110 +++++++++++++-------- > > patchwork/api/filters.py | 7 ++ > > patchwork/fields.py | 32 ++++++ > > .../migrations/0025_add_event_payload_field.py | 21 ++++ > > ...26_migrate_data_from_event_fields_to_payload.py | 63 ++++++++++++ > > .../migrations/0027_remove_old_event_fields.py | 34 +++++++ > > patchwork/models.py | 28 +----- > > patchwork/signals.py | 70 +++++++++++-- > > patchwork/tests/test_events.py | 110 +++++++++++++-------- > > .../events-require-rest-api-47eab4a3be745f75.yaml | 5 + > > 11 files changed, 413 insertions(+), 127 deletions(-) > > create mode 100644 patchwork/migrations/0025_add_event_payload_field.py > > create mode 100644 patchwork/migrations/0026_migrate_data_from_event_fields_to_payload.py > > create mode 100644 patchwork/migrations/0027_remove_old_event_fields.py > > create mode 100644 releasenotes/notes/events-require-rest-api-47eab4a3be745f75.yaml > > > > -- > > 2.14.3 > > > > _______________________________________________ > > Patchwork mailing list > > Patchwork@lists.ozlabs.org > > https://lists.ozlabs.org/listinfo/patchwork
Stephen Finucane <stephen@that.guru> writes: > On Wed, 2018-03-28 at 10:02 +1100, Daniel Axtens wrote: >> Stephen Finucane <stephen@that.guru> writes: >> >> Hi Stephen, >> >> > The '/event' API endpoint is really slow due to the amount of JOINs >> > necessary to retrieve records from the database. Resolve this by >> > a static JSON representation of any embedded data in the database. This >> > has some disadvantages, noted in the patches, but the performance >> > improvement is huge and users will not notice a thing. >> > >> > As a necessary side-effect of this change, events now require the REST >> > API be enabled. To be honest, this should have been the case from day 1 >> > as events are currently only exposed over this API. >> >> How does perfomance of this compare to >> https://patchwork.ozlabs.org/project/patchwork/list/?series=35029 ? > > I've no idea. I know that it removes all but one (I think) JOIN so in > theory it should massively improve performance. It's impossible for me > to test this stuff as I simply can't get my hands on a big enough > archive to test it. If you have an idea on how to test this, I'd > appreciate the help. What I did to test it was to download the Ubuntu kernel team archive - https://lists.ubuntu.com/archives/kernel-team/ - the full raw archive is available from the top of the page and is currently 608MB. You can import it multiple times into different projects too, so you can scale up 1.2 GB pretty easily. This is what I did for my postgres tests. I then used Django Debug Toolbar to tell me how long the SQL queries were taking and how much time was spent in processing. I just tested this on my laptop but it was enough to see the big, measurable differences I included in my mail. >> This is a much bigger change than that series, so if the performance >> impact is similar perhaps the smaller, migration-free change would be >> preferable? > > From what I'm hearing, these large JOINs are generally killing us and > only get worse the larger the instance is. I was envisioning the above > series as something that would fix performance for 2.0, while this as a > longer term solution than the above series that could be applied for > 2.1. As noted above though, I need some way to test this so any help > here would be appreciated. On much reflection, my concern with this series is how we would make a change to the events API. As I understand it, we're going to have one static blob stored in the db, so if we change what constitues an event we're going to get a discontinuity in what's stored in the database. For example if we add a field to the event - say for the sake of the argument the SHA1 the message contents - we'd need to figure something out. I can think of a few options, none of which thrill me: - We unversion the events API and push the burden to consumers to deal with changes to time-series data. - post-process the json coming out of the database on every query. For v1 consumers of new data, strip out the new field. For v2 consumers of old data, we calculate the new data and supply it (possibly saving the combined data so we only have to do this once per event.) - Or we have to do scary migrations where we read-modify-write all of the many, many, previous events. What would be your plan for dealing with a change to the event model? Regards, Daniel > Stephen > >> Regards, >> Daniel >> >> > >> > Stephen Finucane (4): >> > REST: Support embedded serializers without context >> > signals: Only enable events when REST API enabled >> > models: Migrate event fields to JSON field >> > REST: Only fetch required fields event filtering >> > >> > patchwork/api/embedded.py | 60 ++++++++--- >> > patchwork/api/event.py | 110 +++++++++++++-------- >> > patchwork/api/filters.py | 7 ++ >> > patchwork/fields.py | 32 ++++++ >> > .../migrations/0025_add_event_payload_field.py | 21 ++++ >> > ...26_migrate_data_from_event_fields_to_payload.py | 63 ++++++++++++ >> > .../migrations/0027_remove_old_event_fields.py | 34 +++++++ >> > patchwork/models.py | 28 +----- >> > patchwork/signals.py | 70 +++++++++++-- >> > patchwork/tests/test_events.py | 110 +++++++++++++-------- >> > .../events-require-rest-api-47eab4a3be745f75.yaml | 5 + >> > 11 files changed, 413 insertions(+), 127 deletions(-) >> > create mode 100644 patchwork/migrations/0025_add_event_payload_field.py >> > create mode 100644 patchwork/migrations/0026_migrate_data_from_event_fields_to_payload.py >> > create mode 100644 patchwork/migrations/0027_remove_old_event_fields.py >> > create mode 100644 releasenotes/notes/events-require-rest-api-47eab4a3be745f75.yaml >> > >> > -- >> > 2.14.3 >> > >> > _______________________________________________ >> > Patchwork mailing list >> > Patchwork@lists.ozlabs.org >> > https://lists.ozlabs.org/listinfo/patchwork
Hi Stephen, I tried applying this and migrating, and I'm getting the following exception trying to list events: web_1 | Traceback (most recent call last): web_1 | File "/usr/local/lib/python3.6/dist-packages/django/core/handlers/exception.py", line 41, in inner web_1 | response = get_response(request) web_1 | File "/usr/local/lib/python3.6/dist-packages/django/core/handlers/base.py", line 187, in _get_response web_1 | response = self.process_exception_by_middleware(e, request) web_1 | File "/usr/local/lib/python3.6/dist-packages/django/core/handlers/base.py", line 185, in _get_response web_1 | response = wrapped_callback(request, *callback_args, **callback_kwargs) web_1 | File "/usr/local/lib/python3.6/dist-packages/django/views/decorators/csrf.py", line 58, in wrapped_view web_1 | return view_func(*args, **kwargs) web_1 | File "/usr/local/lib/python3.6/dist-packages/django/views/generic/base.py", line 68, in view web_1 | return self.dispatch(request, *args, **kwargs) web_1 | File "/usr/local/lib/python3.6/dist-packages/rest_framework/views.py", line 489, in dispatch web_1 | response = self.handle_exception(exc) web_1 | File "/usr/local/lib/python3.6/dist-packages/rest_framework/views.py", line 449, in handle_exception web_1 | self.raise_uncaught_exception(exc) web_1 | File "/usr/local/lib/python3.6/dist-packages/rest_framework/views.py", line 486, in dispatch web_1 | response = handler(request, *args, **kwargs) web_1 | File "/usr/local/lib/python3.6/dist-packages/rest_framework/generics.py", line 201, in get web_1 | return self.list(request, *args, **kwargs) web_1 | File "/usr/local/lib/python3.6/dist-packages/rest_framework/mixins.py", line 45, in list web_1 | return self.get_paginated_response(serializer.data) web_1 | File "/usr/local/lib/python3.6/dist-packages/rest_framework/serializers.py", line 739, in data web_1 | ret = super(ListSerializer, self).data web_1 | File "/usr/local/lib/python3.6/dist-packages/rest_framework/serializers.py", line 263, in data web_1 | self._data = self.to_representation(self.instance) web_1 | File "/usr/local/lib/python3.6/dist-packages/rest_framework/serializers.py", line 657, in to_representation web_1 | self.child.to_representation(item) for item in iterable web_1 | File "/usr/local/lib/python3.6/dist-packages/rest_framework/serializers.py", line 657, in <listcomp> web_1 | self.child.to_representation(item) for item in iterable web_1 | File "/home/patchwork/patchwork/patchwork/api/event.py", line 75, in to_representation web_1 | if not payload.get(field): web_1 | AttributeError: 'str' object has no attribute 'get' web_1 | "GET /api/events/ HTTP/1.1" 500 158617 It seems that data['payload'] is a string - is that supposed to be the case? It looks like the migrated data is in order, here's data: web_1 | OrderedDict([('id', 317126), ('category', 'series-completed'), ('project', OrderedDict([('id', 2), ('url', 'http://localhost:8000/api/projects/2/'), ('name', 'Kernel Team'), ('link_name', 'kernel-team'), ('list_id', 'kernel-team.lists.ubuntu.com'), ('list_email', 'kernel-team@lists.ubuntu.com'), ('web_url', ''), ('scm_url', ''), ('webscm_url', '')])), ('date', '2018-03-08T12:07:19.184178'), ('payload', '{"series": {"id": 55063, "url": null, "date": "2017-08-29T23:18:10", "name": "[1/2,Artful] clocksource/drivers/arm_arch_timer: Fix mem frame loop initialization", "version": 1, "mbox": null}}')]) Looking at that in the database looks fine too: patchwork=# select * from patchwork_event where id = 317126; id | category | date | cover_id | patch_id | project_id | series_id | payload --------+------------------+-------------------------------+----------+----------+------------+-----------+------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------- 317126 | series-completed | 2018-03-08 01:07:19.184178+00 | | | 2 | 55063 | "{\"series\": {\"id\": 55063, \"url\": null, \"date\": \"2017-08-29T23:18:10\", \"name\": \"[1/2,Artful] clocksour ce/drivers/arm_arch_timer: Fix mem frame loop initialization\", \"version\": 1, \"mbox\": null}}" (1 row) I had to do a merge migration so it's somewhat possible that went wrong, but it all looks in order to me. Any ideas? Regards, Daniel
On Mon, 2018-04-02 at 17:10 +1000, Daniel Axtens wrote: > Hi Stephen, > > I tried applying this and migrating, and I'm getting the following > exception trying to list events: > > [snip] > > I had to do a merge migration so it's somewhat possible that went wrong, > but it all looks in order to me. Any ideas? This was a mistake in the migration script, where I was saving escaped JSON instead of raw JSON. I didn't see this because I was creating enough events on the fly to cause the migrated events to drop down. I've fixed this in v2 and validated said fix. I should post this later this evening. Stephen
On Thu, 2018-03-29 at 02:36 +1100, Daniel Axtens wrote: > Stephen Finucane <stephen@that.guru> writes: > > > On Wed, 2018-03-28 at 10:02 +1100, Daniel Axtens wrote: > > > Stephen Finucane <stephen@that.guru> writes: > > > > > > Hi Stephen, > > > > > > > The '/event' API endpoint is really slow due to the amount of JOINs > > > > necessary to retrieve records from the database. Resolve this by > > > > a static JSON representation of any embedded data in the database. This > > > > has some disadvantages, noted in the patches, but the performance > > > > improvement is huge and users will not notice a thing. > > > > > > > > As a necessary side-effect of this change, events now require the REST > > > > API be enabled. To be honest, this should have been the case from day 1 > > > > as events are currently only exposed over this API. > > > > > > How does perfomance of this compare to > > > https://patchwork.ozlabs.org/project/patchwork/list/?series=35029 ? > > > > I've no idea. I know that it removes all but one (I think) JOIN so in > > theory it should massively improve performance. It's impossible for me > > to test this stuff as I simply can't get my hands on a big enough > > archive to test it. If you have an idea on how to test this, I'd > > appreciate the help. > > What I did to test it was to download the Ubuntu kernel team archive - > https://lists.ubuntu.com/archives/kernel-team/ - the full raw archive is > available from the top of the page and is currently 608MB. You can > import it multiple times into different projects too, so you can scale > up 1.2 GB pretty easily. This is what I did for my postgres tests. > > I then used Django Debug Toolbar to tell me how long the SQL queries > were taking and how much time was spent in processing. I just tested > this on my laptop but it was enough to see the big, measurable > differences I included in my mail. > > > > This is a much bigger change than that series, so if the performance > > > impact is similar perhaps the smaller, migration-free change would be > > > preferable? > > > > From what I'm hearing, these large JOINs are generally killing us and > > only get worse the larger the instance is. I was envisioning the above > > series as something that would fix performance for 2.0, while this as a > > longer term solution than the above series that could be applied for > > 2.1. As noted above though, I need some way to test this so any help > > here would be appreciated. > > On much reflection, my concern with this series is how we would make a > change to the events API. As I understand it, we're going to have one > static blob stored in the db, so if we change what constitues an event > we're going to get a discontinuity in what's stored in the database. > > For example if we add a field to the event - say for the sake of the > argument the SHA1 the message contents - we'd need to figure something > out. I can think of a few options, none of which thrill me: > > - We unversion the events API and push the burden to consumers to deal > with changes to time-series data. > > - post-process the json coming out of the database on every query. For > v1 consumers of new data, strip out the new field. For v2 consumers > of old data, we calculate the new data and supply it (possibly saving > the combined data so we only have to do this once per event.) > > - Or we have to do scary migrations where we read-modify-write all of > the many, many, previous events. > > What would be your plan for dealing with a change to the event model? I don't really have one, to be honest. However, I've avoided storing things that will change (urls) and in v2 I am using JSON types for the database column, where possible. This means we can use the various JSON features of the database backend to do whatever migration steps are necessary. Personally, if we _did_ have to make changes to the events, I would add a new column, 'payload_b' or similar, which either contain the slightly modified version of the 'payload' field or would only be populated for new events. This would mean we could tailor our serializers for various different versions. If not this, then I'd go for option (a) above (don't worry about versioning it). Stephen > Regards, > Daniel > > Stephen > > > > > Regards, > > > Daniel > > > > > > > > > > > Stephen Finucane (4): > > > > REST: Support embedded serializers without context > > > > signals: Only enable events when REST API enabled > > > > models: Migrate event fields to JSON field > > > > REST: Only fetch required fields event filtering > > > > > > > > patchwork/api/embedded.py | 60 ++++++++--- > > > > patchwork/api/event.py | 110 +++++++++++++-------- > > > > patchwork/api/filters.py | 7 ++ > > > > patchwork/fields.py | 32 ++++++ > > > > .../migrations/0025_add_event_payload_field.py | 21 ++++ > > > > ...26_migrate_data_from_event_fields_to_payload.py | 63 ++++++++++++ > > > > .../migrations/0027_remove_old_event_fields.py | 34 +++++++ > > > > patchwork/models.py | 28 +----- > > > > patchwork/signals.py | 70 +++++++++++-- > > > > patchwork/tests/test_events.py | 110 +++++++++++++-------- > > > > .../events-require-rest-api-47eab4a3be745f75.yaml | 5 + > > > > 11 files changed, 413 insertions(+), 127 deletions(-) > > > > create mode 100644 patchwork/migrations/0025_add_event_payload_field.py > > > > create mode 100644 patchwork/migrations/0026_migrate_data_from_event_fields_to_payload.py > > > > create mode 100644 patchwork/migrations/0027_remove_old_event_fields.py > > > > create mode 100644 releasenotes/notes/events-require-rest-api-47eab4a3be745f75.yaml > > > > > > > > -- > > > > 2.14.3 > > > > > > > > _______________________________________________ > > > > Patchwork mailing list > > > > Patchwork@lists.ozlabs.org > > > > https://lists.ozlabs.org/listinfo/patchwork