Message ID | 20200611183455.18240-1-rohitsarkar5398@gmail.com |
---|---|
Headers | show |
Series | add parserelations management command | expand |
Hey, On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote: > The parserelations command will be used to bulk import patch relations into > Patchwork using a "patch groups" file. > The patch groups file is structured such that each line contains a > relation. > Eg patch groups file contents. > 1 3 4 > 2 > 5 9 10 > > In this case 2 relations will be ingested into Patchwork, (1,3,4) and > (5,9,10). Further group (5,9,10) also points to two upstream commit > references. > Note that before ingesting the relations the existing relations in the > DB are removed. > > v1..v2 > - remove commit references from the patch groups file. > - Update documentation > - rename some variables to remove the overloading. > - use filter and update instead of get and save to reduce the db calls. > (Visible performance enhancement) > > Leaving the copyright untouched till Ralf and Lukas comment on how to > proceed. > > Rohit Sarkar (1): > management: introduce parserelations command > > docs/deployment/management.rst | 26 +++++++ > .../management/commands/parserelations.py | 71 +++++++++++++++++++ > patchwork/tests/test_management.py | 7 ++ > 3 files changed, 104 insertions(+) > create mode 100644 patchwork/management/commands/parserelations.py > > -- > 2.23.0.385.gbc12974a89 > Just wanted to follow up on this. Does this look good? Thanks, Rohit
Rohit Sarkar <rohitsarkar5398@gmail.com> writes: > Hey, > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote: >> The parserelations command will be used to bulk import patch relations into >> Patchwork using a "patch groups" file. >> The patch groups file is structured such that each line contains a >> relation. >> Eg patch groups file contents. >> 1 3 4 >> 2 >> 5 9 10 >> >> In this case 2 relations will be ingested into Patchwork, (1,3,4) and >> (5,9,10). Further group (5,9,10) also points to two upstream commit >> references. >> Note that before ingesting the relations the existing relations in the >> DB are removed. >> >> v1..v2 >> - remove commit references from the patch groups file. >> - Update documentation >> - rename some variables to remove the overloading. >> - use filter and update instead of get and save to reduce the db calls. >> (Visible performance enhancement) >> >> Leaving the copyright untouched till Ralf and Lukas comment on how to >> proceed. >> >> Rohit Sarkar (1): >> management: introduce parserelations command >> >> docs/deployment/management.rst | 26 +++++++ >> .../management/commands/parserelations.py | 71 +++++++++++++++++++ >> patchwork/tests/test_management.py | 7 ++ >> 3 files changed, 104 insertions(+) >> create mode 100644 patchwork/management/commands/parserelations.py >> >> -- >> 2.23.0.385.gbc12974a89 >> > > Just wanted to follow up on this. Does this look good? I was thinking about this as I washed the dishes last night. Purging all relations in the database means that if any API-based user of relations emerges before the big public servers adopt a release with this, then they'll never be able to use it. What's the speed impact of doing two passes through the data, with pass 1 collecting all the projects that this touches and pass 2 actually installing the relations? Regards, Daniel > Thanks, > Rohit
Hey Daniel, > I was thinking about this as I washed the dishes last night. > > Purging all relations in the database means that if any API-based user > of relations emerges before the big public servers adopt a release with > this, then they'll never be able to use it. > > What's the speed impact of doing two passes through the data, with pass > 1 collecting all the projects that this touches and pass 2 actually > installing the relations? > > Regards, > Daniel Dont know if you read my previous email [1], where I had essentially outlined a non destructive approach to this. This way, I dont think we will need 2 passes. Thanks, Rohit [1]: https://lists.ozlabs.org/pipermail/patchwork/2020-June/006631.html
Hey, On Wed, Jun 17, 2020 at 07:55:37AM +1000, Daniel Axtens wrote: > Rohit Sarkar <rohitsarkar5398@gmail.com> writes: > > > Hey, > > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote: > >> The parserelations command will be used to bulk import patch relations into > >> Patchwork using a "patch groups" file. > >> The patch groups file is structured such that each line contains a > >> relation. > >> Eg patch groups file contents. > >> 1 3 4 > >> 2 > >> 5 9 10 > >> > >> In this case 2 relations will be ingested into Patchwork, (1,3,4) and > >> (5,9,10). Further group (5,9,10) also points to two upstream commit > >> references. > >> Note that before ingesting the relations the existing relations in the > >> DB are removed. > >> > >> v1..v2 > >> - remove commit references from the patch groups file. > >> - Update documentation > >> - rename some variables to remove the overloading. > >> - use filter and update instead of get and save to reduce the db calls. > >> (Visible performance enhancement) > >> > >> Leaving the copyright untouched till Ralf and Lukas comment on how to > >> proceed. > >> > >> Rohit Sarkar (1): > >> management: introduce parserelations command > >> > >> docs/deployment/management.rst | 26 +++++++ > >> .../management/commands/parserelations.py | 71 +++++++++++++++++++ > >> patchwork/tests/test_management.py | 7 ++ > >> 3 files changed, 104 insertions(+) > >> create mode 100644 patchwork/management/commands/parserelations.py > >> > >> -- > >> 2.23.0.385.gbc12974a89 > >> > > > > Just wanted to follow up on this. Does this look good? > > I was thinking about this as I washed the dishes last night. > > Purging all relations in the database means that if any API-based user > of relations emerges before the big public servers adopt a release with > this, then they'll never be able to use it. > > What's the speed impact of doing two passes through the data, with pass > 1 collecting all the projects that this touches and pass 2 actually > installing the relations? > > Regards, > Daniel I wanted to reopen the discussion on this with the aim being to get everyone on the same page, and either go with this approach or decide on an alternate approach. The primary goal is to have a "parserelations" command that can parse a "patch-groups" file that contains a relation on each line. There are 2 approaches: a. Refresh the relations table. The parserelations command will remove the existing relations before inserting the newly parsed relations. This avoids any conflicts between existing and new relations. However other users of the API might be affected. b. Insert relations in a non destructive way. We need a way to handle conflicts between existing and incoming relations. So the solution is not really clear in this case. What should have precedence in the case of conflicts? The parserelations script or the existing relations. Thanks, Rohit > > Thanks, > > Rohit
Rohit Sarkar <rohitsarkar5398@gmail.com> writes: > Hey, > On Wed, Jun 17, 2020 at 07:55:37AM +1000, Daniel Axtens wrote: >> Rohit Sarkar <rohitsarkar5398@gmail.com> writes: >> >> > Hey, >> > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote: >> >> The parserelations command will be used to bulk import patch relations into >> >> Patchwork using a "patch groups" file. >> >> The patch groups file is structured such that each line contains a >> >> relation. >> >> Eg patch groups file contents. >> >> 1 3 4 >> >> 2 >> >> 5 9 10 >> >> >> >> In this case 2 relations will be ingested into Patchwork, (1,3,4) and >> >> (5,9,10). Further group (5,9,10) also points to two upstream commit >> >> references. >> >> Note that before ingesting the relations the existing relations in the >> >> DB are removed. >> >> >> >> v1..v2 >> >> - remove commit references from the patch groups file. >> >> - Update documentation >> >> - rename some variables to remove the overloading. >> >> - use filter and update instead of get and save to reduce the db calls. >> >> (Visible performance enhancement) >> >> >> >> Leaving the copyright untouched till Ralf and Lukas comment on how to >> >> proceed. >> >> >> >> Rohit Sarkar (1): >> >> management: introduce parserelations command >> >> >> >> docs/deployment/management.rst | 26 +++++++ >> >> .../management/commands/parserelations.py | 71 +++++++++++++++++++ >> >> patchwork/tests/test_management.py | 7 ++ >> >> 3 files changed, 104 insertions(+) >> >> create mode 100644 patchwork/management/commands/parserelations.py >> >> >> >> -- >> >> 2.23.0.385.gbc12974a89 >> >> >> > >> > Just wanted to follow up on this. Does this look good? >> >> I was thinking about this as I washed the dishes last night. >> >> Purging all relations in the database means that if any API-based user >> of relations emerges before the big public servers adopt a release with >> this, then they'll never be able to use it. >> >> What's the speed impact of doing two passes through the data, with pass >> 1 collecting all the projects that this touches and pass 2 actually >> installing the relations? >> >> Regards, >> Daniel > I wanted to reopen the discussion on this with the aim being to get > everyone on the same page, and either go with this approach or decide on > an alternate approach. > > The primary goal is to have a "parserelations" command that can parse a > "patch-groups" file that contains a relation on each line. There are 2 > approaches: > > a. Refresh the relations table. The parserelations command will remove > the existing relations before inserting the newly parsed relations. This > avoids any conflicts between existing and new relations. However other > users of the API might be affected. > > b. Insert relations in a non destructive way. We need a way to handle > conflicts between existing and incoming relations. So the solution is > not really clear in this case. What should have precedence in the case > of conflicts? The parserelations script or the existing relations. > Right, sorry for dropping this, work got a bit crazy. It occurs to me that I have an embedded assumption that may not be true here. Are you trying to build a set of relations that includes cross-project relations, or are you just trying to build a set of relations entirely or mostly within a single patchwork project? I've been assuming that your relations are entirely or predominantly intra-project, but I'm starting to suspect that maybe my understanding is not right. If you are expecting a non-trivial quantity of cross-project relations, then my suggested approach earlier (caller of the script specifies the projects that patches may belong to) doesn't make sense. It probably makes most sense to go with approach (a) in that case - it's really difficult to figure out how to do (b) in a way that preserves the meaning that any other API user, or the PaStA tool, is trying to create. If we do go down route (a), I want to make it really hard for users to accidentally shoot themselves in the foot here and wipe out data. So probably I would say that parserelations should only insert relations if the table is empty when the script begins, and there should be a separate command that with a suitably scary name that purges all existing relations. Have I correctly understood things here? Regards, Daniel > > Thanks, > Rohit >> > Thanks, >> > Rohit
On Tue, Jun 30, 2020 at 11:59:52AM +1000, Daniel Axtens wrote: > Rohit Sarkar <rohitsarkar5398@gmail.com> writes: > > > Hey, > > On Wed, Jun 17, 2020 at 07:55:37AM +1000, Daniel Axtens wrote: > >> Rohit Sarkar <rohitsarkar5398@gmail.com> writes: > >> > >> > Hey, > >> > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote: > >> >> The parserelations command will be used to bulk import patch relations into > >> >> Patchwork using a "patch groups" file. > >> >> The patch groups file is structured such that each line contains a > >> >> relation. > >> >> Eg patch groups file contents. > >> >> 1 3 4 > >> >> 2 > >> >> 5 9 10 > >> >> > >> >> In this case 2 relations will be ingested into Patchwork, (1,3,4) and > >> >> (5,9,10). Further group (5,9,10) also points to two upstream commit > >> >> references. > >> >> Note that before ingesting the relations the existing relations in the > >> >> DB are removed. > >> >> > >> >> v1..v2 > >> >> - remove commit references from the patch groups file. > >> >> - Update documentation > >> >> - rename some variables to remove the overloading. > >> >> - use filter and update instead of get and save to reduce the db calls. > >> >> (Visible performance enhancement) > >> >> > >> >> Leaving the copyright untouched till Ralf and Lukas comment on how to > >> >> proceed. > >> >> > >> >> Rohit Sarkar (1): > >> >> management: introduce parserelations command > >> >> > >> >> docs/deployment/management.rst | 26 +++++++ > >> >> .../management/commands/parserelations.py | 71 +++++++++++++++++++ > >> >> patchwork/tests/test_management.py | 7 ++ > >> >> 3 files changed, 104 insertions(+) > >> >> create mode 100644 patchwork/management/commands/parserelations.py > >> >> > >> >> -- > >> >> 2.23.0.385.gbc12974a89 > >> >> > >> > > >> > Just wanted to follow up on this. Does this look good? > >> > >> I was thinking about this as I washed the dishes last night. > >> > >> Purging all relations in the database means that if any API-based user > >> of relations emerges before the big public servers adopt a release with > >> this, then they'll never be able to use it. > >> > >> What's the speed impact of doing two passes through the data, with pass > >> 1 collecting all the projects that this touches and pass 2 actually > >> installing the relations? > >> > >> Regards, > >> Daniel > > I wanted to reopen the discussion on this with the aim being to get > > everyone on the same page, and either go with this approach or decide on > > an alternate approach. > > > > The primary goal is to have a "parserelations" command that can parse a > > "patch-groups" file that contains a relation on each line. There are 2 > > approaches: > > > > a. Refresh the relations table. The parserelations command will remove > > the existing relations before inserting the newly parsed relations. This > > avoids any conflicts between existing and new relations. However other > > users of the API might be affected. > > > > b. Insert relations in a non destructive way. We need a way to handle > > conflicts between existing and incoming relations. So the solution is > > not really clear in this case. What should have precedence in the case > > of conflicts? The parserelations script or the existing relations. > > > > > Right, sorry for dropping this, work got a bit crazy. > > It occurs to me that I have an embedded assumption that may not be true > here. Are you trying to build a set of relations that includes > cross-project relations, or are you just trying to build a set of > relations entirely or mostly within a single patchwork project? > > I've been assuming that your relations are entirely or predominantly > intra-project, but I'm starting to suspect that maybe my understanding > is not right. The relations can be inter project too. PaStA is configured with a list of Patchwork projects for which it will pull all the corresponding patches and perform it's analyses. PaStA can relate patches across projects too. > If you are expecting a non-trivial quantity of cross-project relations, > then my suggested approach earlier (caller of the script specifies the > projects that patches may belong to) doesn't make sense. It probably > makes most sense to go with approach (a) in that case - it's really > difficult to figure out how to do (b) in a way that preserves the > meaning that any other API user, or the PaStA tool, is trying to create. > > If we do go down route (a), I want to make it really hard for users to > accidentally shoot themselves in the foot here and wipe out data. So > probably I would say that parserelations should only insert relations if > the table is empty when the script begins, and there should be a > separate command that with a suitably scary name that purges all > existing relations. I agree that we should make parserelations actions very explicit to prevent any loss of data. However, if we use parserelations to differentially import relations we will need to be able to insert relations with a non empty table. Let me explain the complete flow between PaStA and Patchwork so that things are a bit clear. We plan to run a script, maybe as a cron job at regular time intervals which does the following: 1. Exports patches from Patchwork into PaStA 2. Runs PaStA's analyses 3. Runs the parserelations script to parse the patch-groups file generated by PaStA and inserts the relations into Patchwork. In this step we refresh the db and insert all the relations afresh. A non destructive approach would involve thinking of a way to resolve conflicts between existing and new relations. > > Have I correctly understood things here? > > Regards, > Daniel > Thanks, Rohit
Rohit Sarkar <rohitsarkar5398@gmail.com> writes: > On Tue, Jun 30, 2020 at 11:59:52AM +1000, Daniel Axtens wrote: >> Rohit Sarkar <rohitsarkar5398@gmail.com> writes: >> >> > Hey, >> > On Wed, Jun 17, 2020 at 07:55:37AM +1000, Daniel Axtens wrote: >> >> Rohit Sarkar <rohitsarkar5398@gmail.com> writes: >> >> >> >> > Hey, >> >> > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote: >> >> >> The parserelations command will be used to bulk import patch relations into >> >> >> Patchwork using a "patch groups" file. >> >> >> The patch groups file is structured such that each line contains a >> >> >> relation. >> >> >> Eg patch groups file contents. >> >> >> 1 3 4 >> >> >> 2 >> >> >> 5 9 10 >> >> >> >> >> >> In this case 2 relations will be ingested into Patchwork, (1,3,4) and >> >> >> (5,9,10). Further group (5,9,10) also points to two upstream commit >> >> >> references. >> >> >> Note that before ingesting the relations the existing relations in the >> >> >> DB are removed. >> >> >> >> >> >> v1..v2 >> >> >> - remove commit references from the patch groups file. >> >> >> - Update documentation >> >> >> - rename some variables to remove the overloading. >> >> >> - use filter and update instead of get and save to reduce the db calls. >> >> >> (Visible performance enhancement) >> >> >> >> >> >> Leaving the copyright untouched till Ralf and Lukas comment on how to >> >> >> proceed. >> >> >> >> >> >> Rohit Sarkar (1): >> >> >> management: introduce parserelations command >> >> >> >> >> >> docs/deployment/management.rst | 26 +++++++ >> >> >> .../management/commands/parserelations.py | 71 +++++++++++++++++++ >> >> >> patchwork/tests/test_management.py | 7 ++ >> >> >> 3 files changed, 104 insertions(+) >> >> >> create mode 100644 patchwork/management/commands/parserelations.py >> >> >> >> >> >> -- >> >> >> 2.23.0.385.gbc12974a89 >> >> >> >> >> > >> >> > Just wanted to follow up on this. Does this look good? >> >> >> >> I was thinking about this as I washed the dishes last night. >> >> >> >> Purging all relations in the database means that if any API-based user >> >> of relations emerges before the big public servers adopt a release with >> >> this, then they'll never be able to use it. >> >> >> >> What's the speed impact of doing two passes through the data, with pass >> >> 1 collecting all the projects that this touches and pass 2 actually >> >> installing the relations? >> >> >> >> Regards, >> >> Daniel >> > I wanted to reopen the discussion on this with the aim being to get >> > everyone on the same page, and either go with this approach or decide on >> > an alternate approach. >> > >> > The primary goal is to have a "parserelations" command that can parse a >> > "patch-groups" file that contains a relation on each line. There are 2 >> > approaches: >> > >> > a. Refresh the relations table. The parserelations command will remove >> > the existing relations before inserting the newly parsed relations. This >> > avoids any conflicts between existing and new relations. However other >> > users of the API might be affected. >> > >> > b. Insert relations in a non destructive way. We need a way to handle >> > conflicts between existing and incoming relations. So the solution is >> > not really clear in this case. What should have precedence in the case >> > of conflicts? The parserelations script or the existing relations. >> > >> >> >> Right, sorry for dropping this, work got a bit crazy. >> >> It occurs to me that I have an embedded assumption that may not be true >> here. Are you trying to build a set of relations that includes >> cross-project relations, or are you just trying to build a set of >> relations entirely or mostly within a single patchwork project? >> >> I've been assuming that your relations are entirely or predominantly >> intra-project, but I'm starting to suspect that maybe my understanding >> is not right. > The relations can be inter project too. PaStA is configured with a list > of Patchwork projects for which it will pull all the corresponding > patches and perform it's analyses. PaStA can relate patches across > projects too. > >> If you are expecting a non-trivial quantity of cross-project relations, >> then my suggested approach earlier (caller of the script specifies the >> projects that patches may belong to) doesn't make sense. It probably >> makes most sense to go with approach (a) in that case - it's really >> difficult to figure out how to do (b) in a way that preserves the >> meaning that any other API user, or the PaStA tool, is trying to create. >> >> If we do go down route (a), I want to make it really hard for users to >> accidentally shoot themselves in the foot here and wipe out data. So >> probably I would say that parserelations should only insert relations if >> the table is empty when the script begins, and there should be a >> separate command that with a suitably scary name that purges all >> existing relations. > > I agree that we should make parserelations actions very explicit to > prevent any loss of data. However, if we use parserelations to > differentially import relations we will need to be able to insert > relations with a non empty table. Let me explain the complete flow > between PaStA and Patchwork so that things are a bit clear. > We plan to run a script, maybe as a cron job at regular time intervals > which does the following: > 1. Exports patches from Patchwork into PaStA > 2. Runs PaStA's analyses > 3. Runs the parserelations script to parse the patch-groups file > generated by PaStA and inserts the relations into Patchwork. In this > step we refresh the db and insert all the relations afresh. A non > destructive approach would involve thinking of a way to resolve > conflicts between existing and new relations. OK. I am happy for you to stick with a destructive approach for the managment command, so long as it's clear to the user that data is potentially about to be destroyed. I'm happy for that to be: - two commands: a command that purges relations, and then a parse command that checks if the relations table is empty and refuses to run if it is not empty. - a single command that requires you to pass a command line option like '--really-delete-old-relations' - a single command with a name that captures the idea that data will be destroyed. I don't know what that would be, maybe 'replace...'. I'm not fussy which option you go with, I just don't want a command named 'parse...' to delete data without the user being properly warned. (You will at some point need to consider how to do incremental updates over the API for a publicly usable version, but that can definitely wait and might be out of scope for your GSoC project.) Regards, Daniel > >> >> Have I correctly understood things here? >> >> Regards, >> Daniel >> > Thanks, > Rohit
On Wed, Jul 01, 2020 at 05:33:03PM +1000, Daniel Axtens wrote: > Rohit Sarkar <rohitsarkar5398@gmail.com> writes: > > > On Tue, Jun 30, 2020 at 11:59:52AM +1000, Daniel Axtens wrote: > >> Rohit Sarkar <rohitsarkar5398@gmail.com> writes: > >> > >> > Hey, > >> > On Wed, Jun 17, 2020 at 07:55:37AM +1000, Daniel Axtens wrote: > >> >> Rohit Sarkar <rohitsarkar5398@gmail.com> writes: > >> >> > >> >> > Hey, > >> >> > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote: > >> >> >> The parserelations command will be used to bulk import patch relations into > >> >> >> Patchwork using a "patch groups" file. > >> >> >> The patch groups file is structured such that each line contains a > >> >> >> relation. > >> >> >> Eg patch groups file contents. > >> >> >> 1 3 4 > >> >> >> 2 > >> >> >> 5 9 10 > >> >> >> > >> >> >> In this case 2 relations will be ingested into Patchwork, (1,3,4) and > >> >> >> (5,9,10). Further group (5,9,10) also points to two upstream commit > >> >> >> references. > >> >> >> Note that before ingesting the relations the existing relations in the > >> >> >> DB are removed. > >> >> >> > >> >> >> v1..v2 > >> >> >> - remove commit references from the patch groups file. > >> >> >> - Update documentation > >> >> >> - rename some variables to remove the overloading. > >> >> >> - use filter and update instead of get and save to reduce the db calls. > >> >> >> (Visible performance enhancement) > >> >> >> > >> >> >> Leaving the copyright untouched till Ralf and Lukas comment on how to > >> >> >> proceed. > >> >> >> > >> >> >> Rohit Sarkar (1): > >> >> >> management: introduce parserelations command > >> >> >> > >> >> >> docs/deployment/management.rst | 26 +++++++ > >> >> >> .../management/commands/parserelations.py | 71 +++++++++++++++++++ > >> >> >> patchwork/tests/test_management.py | 7 ++ > >> >> >> 3 files changed, 104 insertions(+) > >> >> >> create mode 100644 patchwork/management/commands/parserelations.py > >> >> >> > >> >> >> -- > >> >> >> 2.23.0.385.gbc12974a89 > >> >> >> > >> >> > > >> >> > Just wanted to follow up on this. Does this look good? > >> >> > >> >> I was thinking about this as I washed the dishes last night. > >> >> > >> >> Purging all relations in the database means that if any API-based user > >> >> of relations emerges before the big public servers adopt a release with > >> >> this, then they'll never be able to use it. > >> >> > >> >> What's the speed impact of doing two passes through the data, with pass > >> >> 1 collecting all the projects that this touches and pass 2 actually > >> >> installing the relations? > >> >> > >> >> Regards, > >> >> Daniel > >> > I wanted to reopen the discussion on this with the aim being to get > >> > everyone on the same page, and either go with this approach or decide on > >> > an alternate approach. > >> > > >> > The primary goal is to have a "parserelations" command that can parse a > >> > "patch-groups" file that contains a relation on each line. There are 2 > >> > approaches: > >> > > >> > a. Refresh the relations table. The parserelations command will remove > >> > the existing relations before inserting the newly parsed relations. This > >> > avoids any conflicts between existing and new relations. However other > >> > users of the API might be affected. > >> > > >> > b. Insert relations in a non destructive way. We need a way to handle > >> > conflicts between existing and incoming relations. So the solution is > >> > not really clear in this case. What should have precedence in the case > >> > of conflicts? The parserelations script or the existing relations. > >> > > >> > >> > >> Right, sorry for dropping this, work got a bit crazy. > >> > >> It occurs to me that I have an embedded assumption that may not be true > >> here. Are you trying to build a set of relations that includes > >> cross-project relations, or are you just trying to build a set of > >> relations entirely or mostly within a single patchwork project? > >> > >> I've been assuming that your relations are entirely or predominantly > >> intra-project, but I'm starting to suspect that maybe my understanding > >> is not right. > > The relations can be inter project too. PaStA is configured with a list > > of Patchwork projects for which it will pull all the corresponding > > patches and perform it's analyses. PaStA can relate patches across > > projects too. > > > >> If you are expecting a non-trivial quantity of cross-project relations, > >> then my suggested approach earlier (caller of the script specifies the > >> projects that patches may belong to) doesn't make sense. It probably > >> makes most sense to go with approach (a) in that case - it's really > >> difficult to figure out how to do (b) in a way that preserves the > >> meaning that any other API user, or the PaStA tool, is trying to create. > >> > >> If we do go down route (a), I want to make it really hard for users to > >> accidentally shoot themselves in the foot here and wipe out data. So > >> probably I would say that parserelations should only insert relations if > >> the table is empty when the script begins, and there should be a > >> separate command that with a suitably scary name that purges all > >> existing relations. > > > > I agree that we should make parserelations actions very explicit to > > prevent any loss of data. However, if we use parserelations to > > differentially import relations we will need to be able to insert > > relations with a non empty table. Let me explain the complete flow > > between PaStA and Patchwork so that things are a bit clear. > > We plan to run a script, maybe as a cron job at regular time intervals > > which does the following: > > 1. Exports patches from Patchwork into PaStA > > 2. Runs PaStA's analyses > > 3. Runs the parserelations script to parse the patch-groups file > > generated by PaStA and inserts the relations into Patchwork. In this > > step we refresh the db and insert all the relations afresh. A non > > destructive approach would involve thinking of a way to resolve > > conflicts between existing and new relations. > > OK. > > I am happy for you to stick with a destructive approach for the > managment command, so long as it's clear to the user that data is > potentially about to be destroyed. I'm happy for that to be: > > - two commands: a command that purges relations, and then a parse > command that checks if the relations table is empty and refuses to > run if it is not empty. > > - a single command that requires you to pass a command line option like > '--really-delete-old-relations' > > - a single command with a name that captures the idea that data will be > destroyed. I don't know what that would be, maybe 'replace...'. > > I'm not fussy which option you go with, I just don't want a command > named 'parse...' to delete data without the user being properly warned. Yeah, I agree. The name "parserelations" doesnt make it explicit enough. I am fine with replacerelations. > (You will at some point need to consider how to do incremental updates > over the API for a publicly usable version, but that can definitely wait > and might be out of scope for your GSoC project.) Agreed. > Regards, > Daniel > > > > >> > >> Have I correctly understood things here? > >> > >> Regards, > >> Daniel > >> > > Thanks, > > Rohit Thanks Rohit
Rohit Sarkar <rohitsarkar5398@gmail.com> writes: > On Wed, Jul 01, 2020 at 05:33:03PM +1000, Daniel Axtens wrote: >> Rohit Sarkar <rohitsarkar5398@gmail.com> writes: >> >> > On Tue, Jun 30, 2020 at 11:59:52AM +1000, Daniel Axtens wrote: >> >> Rohit Sarkar <rohitsarkar5398@gmail.com> writes: >> >> >> >> > Hey, >> >> > On Wed, Jun 17, 2020 at 07:55:37AM +1000, Daniel Axtens wrote: >> >> >> Rohit Sarkar <rohitsarkar5398@gmail.com> writes: >> >> >> >> >> >> > Hey, >> >> >> > On Fri, Jun 12, 2020 at 12:04:54AM +0530, Rohit Sarkar wrote: >> >> >> >> The parserelations command will be used to bulk import patch relations into >> >> >> >> Patchwork using a "patch groups" file. >> >> >> >> The patch groups file is structured such that each line contains a >> >> >> >> relation. >> >> >> >> Eg patch groups file contents. >> >> >> >> 1 3 4 >> >> >> >> 2 >> >> >> >> 5 9 10 >> >> >> >> >> >> >> >> In this case 2 relations will be ingested into Patchwork, (1,3,4) and >> >> >> >> (5,9,10). Further group (5,9,10) also points to two upstream commit >> >> >> >> references. >> >> >> >> Note that before ingesting the relations the existing relations in the >> >> >> >> DB are removed. >> >> >> >> >> >> >> >> v1..v2 >> >> >> >> - remove commit references from the patch groups file. >> >> >> >> - Update documentation >> >> >> >> - rename some variables to remove the overloading. >> >> >> >> - use filter and update instead of get and save to reduce the db calls. >> >> >> >> (Visible performance enhancement) >> >> >> >> >> >> >> >> Leaving the copyright untouched till Ralf and Lukas comment on how to >> >> >> >> proceed. >> >> >> >> >> >> >> >> Rohit Sarkar (1): >> >> >> >> management: introduce parserelations command >> >> >> >> >> >> >> >> docs/deployment/management.rst | 26 +++++++ >> >> >> >> .../management/commands/parserelations.py | 71 +++++++++++++++++++ >> >> >> >> patchwork/tests/test_management.py | 7 ++ >> >> >> >> 3 files changed, 104 insertions(+) >> >> >> >> create mode 100644 patchwork/management/commands/parserelations.py >> >> >> >> >> >> >> >> -- >> >> >> >> 2.23.0.385.gbc12974a89 >> >> >> >> >> >> >> > >> >> >> > Just wanted to follow up on this. Does this look good? >> >> >> >> >> >> I was thinking about this as I washed the dishes last night. >> >> >> >> >> >> Purging all relations in the database means that if any API-based user >> >> >> of relations emerges before the big public servers adopt a release with >> >> >> this, then they'll never be able to use it. >> >> >> >> >> >> What's the speed impact of doing two passes through the data, with pass >> >> >> 1 collecting all the projects that this touches and pass 2 actually >> >> >> installing the relations? >> >> >> >> >> >> Regards, >> >> >> Daniel >> >> > I wanted to reopen the discussion on this with the aim being to get >> >> > everyone on the same page, and either go with this approach or decide on >> >> > an alternate approach. >> >> > >> >> > The primary goal is to have a "parserelations" command that can parse a >> >> > "patch-groups" file that contains a relation on each line. There are 2 >> >> > approaches: >> >> > >> >> > a. Refresh the relations table. The parserelations command will remove >> >> > the existing relations before inserting the newly parsed relations. This >> >> > avoids any conflicts between existing and new relations. However other >> >> > users of the API might be affected. >> >> > >> >> > b. Insert relations in a non destructive way. We need a way to handle >> >> > conflicts between existing and incoming relations. So the solution is >> >> > not really clear in this case. What should have precedence in the case >> >> > of conflicts? The parserelations script or the existing relations. >> >> > >> >> >> >> >> >> Right, sorry for dropping this, work got a bit crazy. >> >> >> >> It occurs to me that I have an embedded assumption that may not be true >> >> here. Are you trying to build a set of relations that includes >> >> cross-project relations, or are you just trying to build a set of >> >> relations entirely or mostly within a single patchwork project? >> >> >> >> I've been assuming that your relations are entirely or predominantly >> >> intra-project, but I'm starting to suspect that maybe my understanding >> >> is not right. >> > The relations can be inter project too. PaStA is configured with a list >> > of Patchwork projects for which it will pull all the corresponding >> > patches and perform it's analyses. PaStA can relate patches across >> > projects too. >> > >> >> If you are expecting a non-trivial quantity of cross-project relations, >> >> then my suggested approach earlier (caller of the script specifies the >> >> projects that patches may belong to) doesn't make sense. It probably >> >> makes most sense to go with approach (a) in that case - it's really >> >> difficult to figure out how to do (b) in a way that preserves the >> >> meaning that any other API user, or the PaStA tool, is trying to create. >> >> >> >> If we do go down route (a), I want to make it really hard for users to >> >> accidentally shoot themselves in the foot here and wipe out data. So >> >> probably I would say that parserelations should only insert relations if >> >> the table is empty when the script begins, and there should be a >> >> separate command that with a suitably scary name that purges all >> >> existing relations. >> > >> > I agree that we should make parserelations actions very explicit to >> > prevent any loss of data. However, if we use parserelations to >> > differentially import relations we will need to be able to insert >> > relations with a non empty table. Let me explain the complete flow >> > between PaStA and Patchwork so that things are a bit clear. >> > We plan to run a script, maybe as a cron job at regular time intervals >> > which does the following: >> > 1. Exports patches from Patchwork into PaStA >> > 2. Runs PaStA's analyses >> > 3. Runs the parserelations script to parse the patch-groups file >> > generated by PaStA and inserts the relations into Patchwork. In this >> > step we refresh the db and insert all the relations afresh. A non >> > destructive approach would involve thinking of a way to resolve >> > conflicts between existing and new relations. >> >> OK. >> >> I am happy for you to stick with a destructive approach for the >> managment command, so long as it's clear to the user that data is >> potentially about to be destroyed. I'm happy for that to be: >> >> - two commands: a command that purges relations, and then a parse >> command that checks if the relations table is empty and refuses to >> run if it is not empty. >> >> - a single command that requires you to pass a command line option like >> '--really-delete-old-relations' >> >> - a single command with a name that captures the idea that data will be >> destroyed. I don't know what that would be, maybe 'replace...'. >> >> I'm not fussy which option you go with, I just don't want a command >> named 'parse...' to delete data without the user being properly warned. > Yeah, I agree. The name "parserelations" doesnt make it explicit enough. > I am fine with replacerelations. > Ok cool! If you send a patch with this name and with tests, then - assuming I don't have any code-quality nits - I'd be happy to take it. Regards, Daneil >> (You will at some point need to consider how to do incremental updates >> over the API for a publicly usable version, but that can definitely wait >> and might be out of scope for your GSoC project.) > Agreed. > >> Regards, >> Daniel >> >> > >> >> >> >> Have I correctly understood things here? >> >> >> >> Regards, >> >> Daniel >> >> >> > Thanks, >> > Rohit > Thanks > Rohit