Message ID | 20180709161022.32622-1-vkabatov@redhat.com |
---|---|
Headers | show |
Series | Rework tagging infrastructure | expand |
On Mon, 2018-07-09 at 18:10 +0200, vkabatov@redhat.com wrote: > From: Veronika Kabatova <vkabatov@redhat.com> > > I was trying to send the full patch for a while now, but never saw it arrive to > the mailing list. Using a different email (from entirely different domain) > didn't help, so I'm thinking it's because of the size of the message. Hence I'm > now sending it again, this time split into two parts, in hopes it will arrive > (feel free to merge the patches when applying). About time we got going on this :) I've left my review comments but I wasn't able to run this locally and inspect the impact on the DB due to merge conflicts. Could I ask that you address what you can and rebase this onto the 'Convert Series-Patch relationship to 1:N' series I posted recently, as that will impact this series significantly. If you can, I'll apply locally and help you work through the DB issues that I'm pretty sure we're going to see with the patch as-is right now. Cheers, Stephen > The original cover letter for context follows. > > This patch attempts to solve GitHub issues #57 [1] and #113 [2] as well as some > other things we encountered. > > Right now, tags are extracted from all the comments on the patch and the patch > itself, and they are reextracted from all the sources every time a comment is > added or removed. It makes saving slower, and might contribute to races with > writes to database when we are parsing multiple emails at the same time. This > gets even more prevalent if we want to solve the issue #113 (tags on cover > letter should increment counters on every patch in series) -- for each added > comment on the cover letter, we would reextract tags from all the other sources, > for each patch in series; and for a change on comments related to the patch > directly, we would need to take the tags on the cover letter and it's comments > into account as well (I implemented this solution in my fork but I really > don't like it). > > The current approach has several other issues as well, some of which are > mentioned in the issue #57, eg duplicate tags are counted more times. Taking > into account the tags on cover letters would also be easier if we could store > tags against them and just query them on demand, instead of reextracting > everything all over again. > > Solutions for some other things we found missing solve the issues mentioned > above too. If we want to determine if the tag is duplicate, we need to save the > associated value. Having the value would help us to use arbitrary strings as > tags (for example links to issue trackers, like `Bugzilla: <link>` if the patch > solves a known bug). The key-values approach to storing tags is mentioned [3], > this email additionally mentions a comments REST API (currently worked on). For > the comments API we would also find it very useful to have the tags extracted > from the comments available directly so we can query for them, which means we > would either need to reextract the tags on every API call, or we could store > the comment data with the tags as they are extracted and only query them as > needed. Altogether, we would get rid of the `patch_responses` property used > when converting comments to mbox (we finally get all the custom tags there > instead of only the few hardcoded ones too). > > TL;DR: > Our goals: > - Avoid tag reextraction with each added comment > - Fix issues #57 and #113 > - Prepare tags addition to comments in the API > - Add tags to patch (currently returns {}) and cover letter APIs > > > [1] https://github.com/getpatchwork/patchwork/issues/57 > [2] https://github.com/getpatchwork/patchwork/issues/113 > [3] https://lists.ozlabs.org/pipermail/patchwork/2018-January/004741.html >
On Tue, 2018-09-11 at 12:03 -0600, Stephen Finucane wrote: > On Mon, 2018-07-09 at 18:10 +0200, vkabatov@redhat.com wrote: > > From: Veronika Kabatova <vkabatov@redhat.com> > > > > I was trying to send the full patch for a while now, but never saw it arrive to > > the mailing list. Using a different email (from entirely different domain) > > didn't help, so I'm thinking it's because of the size of the message. Hence I'm > > now sending it again, this time split into two parts, in hopes it will arrive > > (feel free to merge the patches when applying). > > About time we got going on this :) I've left my review comments but I > wasn't able to run this locally and inspect the impact on the DB due to > merge conflicts. Could I ask that you address what you can and rebase > this onto the 'Convert Series-Patch relationship to 1:N' series I > posted recently, as that will impact this series significantly. If you > can, I'll apply locally and help you work through the DB issues that > I'm pretty sure we're going to see with the patch as-is right now. I've actually gone and rebased this + reworked per the aforementioned series. You can find the end result here: https://github.com/stephenfin/patchwork/tree/review/rework-tagging-infrastructure This should be a good starting point, assuming you want to avoid having to rebase months old code yourself :) However, as expected, the current implementation hammers the DB when listing patches, so I think we need to rework this further. I'm still thinking about how to do this and would welcome input. Stephen > Cheers, > Stephen > > > The original cover letter for context follows. > > > > This patch attempts to solve GitHub issues #57 [1] and #113 [2] as well as some > > other things we encountered. > > > > Right now, tags are extracted from all the comments on the patch and the patch > > itself, and they are reextracted from all the sources every time a comment is > > added or removed. It makes saving slower, and might contribute to races with > > writes to database when we are parsing multiple emails at the same time. This > > gets even more prevalent if we want to solve the issue #113 (tags on cover > > letter should increment counters on every patch in series) -- for each added > > comment on the cover letter, we would reextract tags from all the other sources, > > for each patch in series; and for a change on comments related to the patch > > directly, we would need to take the tags on the cover letter and it's comments > > into account as well (I implemented this solution in my fork but I really > > don't like it). > > > > The current approach has several other issues as well, some of which are > > mentioned in the issue #57, eg duplicate tags are counted more times. Taking > > into account the tags on cover letters would also be easier if we could store > > tags against them and just query them on demand, instead of reextracting > > everything all over again. > > > > Solutions for some other things we found missing solve the issues mentioned > > above too. If we want to determine if the tag is duplicate, we need to save the > > associated value. Having the value would help us to use arbitrary strings as > > tags (for example links to issue trackers, like `Bugzilla: <link>` if the patch > > solves a known bug). The key-values approach to storing tags is mentioned [3], > > this email additionally mentions a comments REST API (currently worked on). For > > the comments API we would also find it very useful to have the tags extracted > > from the comments available directly so we can query for them, which means we > > would either need to reextract the tags on every API call, or we could store > > the comment data with the tags as they are extracted and only query them as > > needed. Altogether, we would get rid of the `patch_responses` property used > > when converting comments to mbox (we finally get all the custom tags there > > instead of only the few hardcoded ones too). > > > > TL;DR: > > Our goals: > > - Avoid tag reextraction with each added comment > > - Fix issues #57 and #113 > > - Prepare tags addition to comments in the API > > - Add tags to patch (currently returns {}) and cover letter APIs > > > > > > [1] https://github.com/getpatchwork/patchwork/issues/57 > > [2] https://github.com/getpatchwork/patchwork/issues/113 > > [3] https://lists.ozlabs.org/pipermail/patchwork/2018-January/004741.html > > > > > _______________________________________________ > Patchwork mailing list > Patchwork@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/patchwork
From: Veronika Kabatova <vkabatov@redhat.com> I was trying to send the full patch for a while now, but never saw it arrive to the mailing list. Using a different email (from entirely different domain) didn't help, so I'm thinking it's because of the size of the message. Hence I'm now sending it again, this time split into two parts, in hopes it will arrive (feel free to merge the patches when applying). The original cover letter for context follows. This patch attempts to solve GitHub issues #57 [1] and #113 [2] as well as some other things we encountered. Right now, tags are extracted from all the comments on the patch and the patch itself, and they are reextracted from all the sources every time a comment is added or removed. It makes saving slower, and might contribute to races with writes to database when we are parsing multiple emails at the same time. This gets even more prevalent if we want to solve the issue #113 (tags on cover letter should increment counters on every patch in series) -- for each added comment on the cover letter, we would reextract tags from all the other sources, for each patch in series; and for a change on comments related to the patch directly, we would need to take the tags on the cover letter and it's comments into account as well (I implemented this solution in my fork but I really don't like it). The current approach has several other issues as well, some of which are mentioned in the issue #57, eg duplicate tags are counted more times. Taking into account the tags on cover letters would also be easier if we could store tags against them and just query them on demand, instead of reextracting everything all over again. Solutions for some other things we found missing solve the issues mentioned above too. If we want to determine if the tag is duplicate, we need to save the associated value. Having the value would help us to use arbitrary strings as tags (for example links to issue trackers, like `Bugzilla: <link>` if the patch solves a known bug). The key-values approach to storing tags is mentioned [3], this email additionally mentions a comments REST API (currently worked on). For the comments API we would also find it very useful to have the tags extracted from the comments available directly so we can query for them, which means we would either need to reextract the tags on every API call, or we could store the comment data with the tags as they are extracted and only query them as needed. Altogether, we would get rid of the `patch_responses` property used when converting comments to mbox (we finally get all the custom tags there instead of only the few hardcoded ones too). TL;DR: Our goals: - Avoid tag reextraction with each added comment - Fix issues #57 and #113 - Prepare tags addition to comments in the API - Add tags to patch (currently returns {}) and cover letter APIs [1] https://github.com/getpatchwork/patchwork/issues/57 [2] https://github.com/getpatchwork/patchwork/issues/113 [3] https://lists.ozlabs.org/pipermail/patchwork/2018-January/004741.html