Message ID: 20200729185024.121766-1-abologna@redhat.com
State: New
Series: schemas: Add vim modeline
Andrea Bolognani <abologna@redhat.com> writes:

> The various schemas included in QEMU use a JSON-based format which
> is, however, strictly speaking not valid JSON.
>
> As a consequence, when vim tries to apply syntax highlighting rules
> for JSON (as guessed from the file name), the result is an unreadable
> mess which mostly consists of red markers pointing out supposed errors
> in, well, pretty much everything.
>
> Using Python syntax highlighting produces much better results, and
> in fact these files already start with specially-formatted comments
> that instruct Emacs to process them as if they were Python files.
>
> This commit adds the equivalent special comments for vim.
>
> Signed-off-by: Andrea Bolognani <abologna@redhat.com>

Naming QAPI schema files .json even though their contents isn't JSON
was a mistake.  Correcting it would be a pain.  If we correct it, then
the sooner the better.

Renaming them to .py gives decent editor support out of the box.  Their
contents isn't quite Python, though: true vs. True, false vs. False.
Do we care?  Only a few dozen occurrences; they could be adjusted.

Renaming them to .qapi would perhaps be less confusing, for the price of
losing "out of the box" support.

Thoughts?
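For context, the special comments under discussion are one-line editor directives at the top of each schema file; with the existing Emacs line plus the proposed vim modeline they look roughly like this (the exact wording in the patch may differ):

```
# -*- Mode: Python -*-
# vim: filetype=python
```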
On Thu, Jul 30, 2020 at 11:07:26AM +0200, Markus Armbruster wrote:
> Andrea Bolognani <abologna@redhat.com> writes:
>
> > The various schemas included in QEMU use a JSON-based format which
> > is, however, strictly speaking not valid JSON.
> >
> > As a consequence, when vim tries to apply syntax highlighting rules
> > for JSON (as guessed from the file name), the result is an unreadable
> > mess which mostly consists of red markers pointing out supposed errors
> > in, well, pretty much everything.
> >
> > Using Python syntax highlighting produces much better results, and
> > in fact these files already start with specially-formatted comments
> > that instruct Emacs to process them as if they were Python files.
> >
> > This commit adds the equivalent special comments for vim.
> >
> > Signed-off-by: Andrea Bolognani <abologna@redhat.com>

Given that we already have Emacs mode lines, I see no reason not to also
have vim mode lines, so regardless of the deeper discussion I think this
patch is fine to merge in the short term.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

> Naming QAPI schema files .json even though their contents isn't JSON
> was a mistake.  Correcting it would be a pain.  If we correct it, then
> the sooner the better.
>
> Renaming them to .py gives decent editor support out of the box.  Their
> contents isn't quite Python, though: true vs. True, false vs. False.
> Do we care?  Only a few dozen occurrences; they could be adjusted.
>
> Renaming them to .qapi would perhaps be less confusing, for the price of
> losing "out of the box" support.

IMHO, the critical rule is that if you pick a particular file extension
associated with an existing language, you absolutely MUST BE compliant
with that language.

We fail at compliance with both JSON and Python because we're actually
using our own DSL (domain-specific language).

IOW, if we're going to stick with our current file format, then we
should be naming them .qapi.
We can still use an editor mode line if we want to claim we're
approximately equivalent to another language, but we can't be surprised
if editors get upset.

The bigger question is whether having our own DSL is justified?

I'm *really* sceptical that it is.

We can't use JSON because it lacks comments.  So we invented our own
pseudo-JSON parser that supported comments, and used ' instead of " for
some reason.  We also needed to be able to parse a sequence of multiple
JSON documents in one file.  We should have just picked a different
language because JSON clearly didn't do what we needed.

You suggest naming them .py.  If we do that, we must declare that they
are literally Python code and modify them so that we can load the files
straight into the Python interpreter as code, and not parse them as
data.  I feel unhappy about treating data as code, though.

While JSON doesn't do what we need, its second cousin YAML is a more
flexible format.  Taking one example:

    ---
    ##
    # @ImageInfoSpecificQCow2:
    #
    # @compat: compatibility level
    #
    # ...snip...
    #
    # Since: 1.7
    ##
    struct: ImageInfoSpecificQCow2
    data:
      compat: str
      "*data-file": str
      "*data-file-raw": bool
      "*lazy-refcounts": bool
      "*corrupt": bool
      refcount-bits: int
      "*encrypt": ImageInfoSpecificQCow2Encryption
      "*bitmaps":
        - Qcow2BitmapInfo
      compression-type: Qcow2CompressionType

Then we could use a regular off-the-shelf YAML parser in Python.

The ugliness with quotes is due to the use of "*".  Slightly less ugly
if we simply declare that quotes are always used, even where they're not
strictly required.

    struct: ImageInfoSpecificQCow2
    data:
      "compat": "str"
      "*data-file": "str"
      "*data-file-raw": "bool"
      "*lazy-refcounts": "bool"
      "*corrupt": "bool"
      "refcount-bits": "int"
      "*encrypt": "ImageInfoSpecificQCow2Encryption"
      "*bitmaps":
        - "Qcow2BitmapInfo"
      "compression-type": "Qcow2CompressionType"

With the use of "---" to denote the start of a document, we have no
trouble parsing our files, which would actually be a concatenation of
multiple documents.
The python YAML library provides the easy yaml.load_all() method. Regards, Daniel
Daniel P. Berrangé <berrange@redhat.com> writes: > On Thu, Jul 30, 2020 at 11:07:26AM +0200, Markus Armbruster wrote: >> Andrea Bolognani <abologna@redhat.com> writes: >> >> > The various schemas included in QEMU use a JSON-based format which >> > is, however, strictly speaking not valid JSON. >> > >> > As a consequence, when vim tries to apply syntax highlight rules >> > for JSON (as guessed from the file name), the result is an unreadable >> > mess which mostly consist of red markers pointing out supposed errors >> > in, well, pretty much everything. >> > >> > Using Python syntax highlighting produces much better results, and >> > in fact these files already start with specially-formatted comments >> > that instruct Emacs to process them as if they were Python files. >> > >> > This commit adds the equivalent special comments for vim. >> > >> > Signed-off-by: Andrea Bolognani <abologna@redhat.com> > > Given that we already have emacs mode-lines, I see no reason to > not also have vim mode-lines, so regardless of the deeper discussion > I think this is patch is fine to merge in the short term > > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> > > >> Naming QAPI schema files .json even though their contents isn't was a >> mistake. Correcting it would be a pain. If we correct it, then the >> sooner the better. >> >> Renaming them to .py gives decent editor support out of the box. Their >> contents isn't quite Python, though: true vs. True, false vs. False. Do >> we care? Only a few dozen occurences; they could be adjusted. >> >> Renaming them to .qapi would perhaps be less confusing, for the price of >> "out of the box". > > IMHO, the critical rule is that if you a pick a particular file extension > associated with an existing language, you absolutely MUST BE compliant > with that language. Can't argue with that :) > We fail at compliance with both JSON and Python because we're actually > using our own DSL (domain specific language). 
> > IOW, if we're going to stick with our current file format, then we
> > should be naming them .qapi.  We can still use an editor mode line
> > if we want to claim we're approximately equivalent to another
> > language, but we can't be surprised if editors get upset.
> >
> > The bigger question is whether having our own DSL is justified?
> >
> > I'm *really* sceptical that it is.

To be precise: our own DSL *syntax*.  Semantics is a separate question
to be skeptical about.

> We can't use JSON because it lacks comments.  So we invented our own
> pseudo-JSON parser that supported comments, and used ' instead of "
> for some reason.  We also needed to be able to parse a sequence of
> multiple JSON documents in one file.  We should have just picked a
> different language because JSON clearly didn't do what we needed.

JSON is an exceptionally poor choice for a DSL, or even a configuration
language.

Correcting our mistake involves a flag day and a re-learn.  We need to
weigh costs against benefits.

The QAPI schema language has two layers:

* JSON, with a lexical and a syntactical sub-layer (both in parser.py)

* QAPI, with a context-free and a context-dependent sub-layer (in
  expr.py and schema.py, respectively)

Replacing the JSON layer is possible as long as the replacement is
sufficiently expressive (not a tall order).

> You suggest naming them .py.  If we do that, we must declare that they
> are literally Python code

Agree.

> and modify them so that we can load the files straight into the
> Python interpreter as code, and not parse them as data.  I feel
> unhappy about treating data as code though.

Stress on *can* load.  Doesn't mean we should.

Ancient prior art: Lisp programs routinely use s-expressions as
configuration file syntax.  They don't load them as code, they read
them as data.

With Python, it's ast.parse(), I think.

> While JSON doesn't do what we need, its second-cousin YAML is a more
> flexible format.
> Taking one example
>
>     ---
>     ##
>     # @ImageInfoSpecificQCow2:
>     #
>     # @compat: compatibility level
>     #
>     # ...snip...
>     #
>     # Since: 1.7
>     ##
>     struct: ImageInfoSpecificQCow2
>     data:
>       compat: str
>       "*data-file": str
>       "*data-file-raw": bool
>       "*lazy-refcounts": bool
>       "*corrupt": bool
>       refcount-bits: int
>       "*encrypt": ImageInfoSpecificQCow2Encryption
>       "*bitmaps":
>         - Qcow2BitmapInfo
>       compression-type: Qcow2CompressionType
>
> Then we could use a regular off-the-shelf YAML parser in Python.
>
> The ugliness with quotes is due to the use of "*".  Slightly less ugly
> if we simply declare that quotes are always used, even where they're
> not strictly required.

StrictYAML insists on quotes.

I hate having to quote identifiers.  There's a reason we don't write

    'int'
    'main'('int', 'argc', 'char' *'argv'[])
    {
        'printf'("hello world\n");
        return 0;
    }

> struct: ImageInfoSpecificQCow2
> data:
>   "compat": "str"
>   "*data-file": "str"
>   "*data-file-raw": "bool"
>   "*lazy-refcounts": "bool"
>   "*corrupt": "bool"
>   "refcount-bits": "int"
>   "*encrypt": "ImageInfoSpecificQCow2Encryption"
>   "*bitmaps":
>     - "Qcow2BitmapInfo"
>   "compression-type": "Qcow2CompressionType"
>
> With the use of "---" to denote the start of a document, we have no
> trouble parsing our files, which would actually be a concatenation of
> multiple documents.  The Python YAML library provides the easy
> yaml.load_all() method.

Required reading on YAML:
https://www.arp242.net/yaml-config.html

Some of the criticism there doesn't matter for our use case.
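Markus's ast.parse() suggestion can be sketched in a few lines. The closely related helper ast.literal_eval() evaluates literal expressions only, so Python-syntax schema text could be read as data without ever being executed (the schema snippet below is illustrative, not real QEMU code):

```python
import ast

# A QAPI-style definition written as a Python literal (note it would
# use True/False rather than JSON's true/false; '#' comments come free).
schema_text = """\
{'struct': 'ImageInfoSpecificQCow2',
 'data': {'compat': 'str',
          '*corrupt': 'bool'}}
"""

# literal_eval accepts only literals: no calls, no imports, no
# arbitrary code execution -- the "code" is never run, only parsed.
expr = ast.literal_eval(schema_text)
print(expr['struct'])            # -> ImageInfoSpecificQCow2
print(expr['data']['*corrupt'])  # -> bool
```

This is reading data that happens to use Python syntax, not importing a module, which addresses Daniel's "treating data as code" worry.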
Andrea Bolognani <abologna@redhat.com> writes:

> The various schemas included in QEMU use a JSON-based format which
> is, however, strictly speaking not valid JSON.
>
> As a consequence, when vim tries to apply syntax highlighting rules
> for JSON (as guessed from the file name), the result is an unreadable
> mess which mostly consists of red markers pointing out supposed errors
> in, well, pretty much everything.
>
> Using Python syntax highlighting produces much better results, and
> in fact these files already start with specially-formatted comments
> that instruct Emacs to process them as if they were Python files.
>
> This commit adds the equivalent special comments for vim.
>
> Signed-off-by: Andrea Bolognani <abologna@redhat.com>

Fixing the real mistake would be better, but until then, mitigating it
helps, like the existing Emacs modelines do.

Queued, thanks!
On Thu, Jul 30, 2020 at 01:51:10PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
>
> > modify them so that we can load the files straight into the Python
> > interpreter as code, and not parse them as data.  I feel unhappy
> > about treating data as code though.
>
> Stress on *can* load.  Doesn't mean we should.
>
> Ancient prior art: Lisp programs routinely use s-expressions as
> configuration file syntax.  They don't load them as code, they read
> them as data.
>
> With Python, it's ast.parse(), I think.

Yes, that could work.

> > struct: ImageInfoSpecificQCow2
> > data:
> >   compat: str
> >   "*data-file": str
> >   "*data-file-raw": bool
> >   "*lazy-refcounts": bool
> >   "*corrupt": bool
> >   refcount-bits: int
> >   "*encrypt": ImageInfoSpecificQCow2Encryption
> >   "*bitmaps":
> >     - Qcow2BitmapInfo
> >   compression-type: Qcow2CompressionType
> >
> > Then we could use a regular off-the-shelf YAML parser in Python.
> >
> > The ugliness with quotes is due to the use of "*".  Slightly less
> > ugly if we simply declare that quotes are always used, even where
> > they're not strictly required.
>
> StrictYAML insists on quotes.

I wouldn't suggest StrictYAML; plain YAML is what pretty much everyone
uses.

If we came up with a different way to mark a field as optional instead
of using the magic "*", then we wouldn't need to quote anything.

> I hate having to quote identifiers.
> There's a reason we don't write
>
>     'int'
>     'main'('int', 'argc', 'char' *'argv'[])
>     {
>         'printf'("hello world\n");
>         return 0;
>     }

> > struct: ImageInfoSpecificQCow2
> > data:
> >   "compat": "str"
> >   "*data-file": "str"
> >   "*data-file-raw": "bool"
> >   "*lazy-refcounts": "bool"
> >   "*corrupt": "bool"
> >   "refcount-bits": "int"
> >   "*encrypt": "ImageInfoSpecificQCow2Encryption"
> >   "*bitmaps":
> >     - "Qcow2BitmapInfo"
> >   "compression-type": "Qcow2CompressionType"
> >
> > With the use of "---" to denote the start of a document, we have no
> > trouble parsing our files, which would actually be a concatenation
> > of multiple documents.  The Python YAML library provides the easy
> > yaml.load_all() method.
>
> Required reading on YAML:
> https://www.arp242.net/yaml-config.html

I don't think this is especially helpful to our evaluation.  You can
write such blog posts about pretty much anything if you want to pick
holes in a proposal.  Certainly there's plenty of awful stuff you can
write about JSON, and Python.

> Some of the criticism there doesn't matter for our use case.

Yeah, what matters is whether it can do the job we need in a way that
is better than what we have today, and whether there are any further
options to consider that might be viable alternatives.

Regards,
Daniel
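Daniel's multi-document idea is easy to try with PyYAML (a third-party dependency, `pip install pyyaml`); the two definitions below are made up for illustration, and safe_load_all() is the safer variant of the load_all() he mentions:

```python
import yaml  # PyYAML, third-party

# Two QAPI-style definitions as a stream of YAML documents,
# each introduced by "---".
schema = """\
---
struct: ImageInfoSpecificQCow2
data:
  compat: str
  "*corrupt": bool
---
enum: Qcow2CompressionType
data: [zlib, zstd]
"""

# safe_load_all() yields one parsed document per "---" section.
docs = list(yaml.safe_load_all(schema))
print(docs[0]["struct"])  # -> ImageInfoSpecificQCow2
print(docs[1]["data"])    # -> ['zlib', 'zstd']
```

Note that only the keys carrying the "*" optional marker need quoting; plain names and the right-hand sides parse fine unquoted (both come back as ordinary strings).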
On 7/30/20 6:51 AM, Markus Armbruster wrote: >> Given that we already have emacs mode-lines, I see no reason to >> not also have vim mode-lines, so regardless of the deeper discussion >> I think this is patch is fine to merge in the short term >> >> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Agreed on that front. >> >> >>> Naming QAPI schema files .json even though their contents isn't was a >>> mistake. Correcting it would be a pain. If we correct it, then the >>> sooner the better. >>> >>> Renaming them to .py gives decent editor support out of the box. Their >>> contents isn't quite Python, though: true vs. True, false vs. False. Do >>> we care? Only a few dozen occurences; they could be adjusted. >>> >>> Renaming them to .qapi would perhaps be less confusing, for the price of >>> "out of the box". I've tried that patch in the past, but it went nowhere at the time. https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg03173.html Rebasing it to the present would be a complete rewrite, but I'm still willing to do it if we think it will go somewhere this time. >> >> IMHO, the critical rule is that if you a pick a particular file extension >> associated with an existing language, you absolutely MUST BE compliant >> with that language. > > Can't argue with that :) Also in violent agreement. > >> We fail at compliance with both JSON and Python because we're actually >> using our own DSL (domain specific language). >> >> IOW if we're going to stick with our current file format, then we should >> be naming them .qapi. We can still use an editor mode line if we want to >> claim we're approximately equiv to another language, but we can't be >> surprised if editors get upset. >> >> >> The bigger question is whether having our own DSL is justified ? >> >> I'm *really* sceptical that it is. > > To be precise: our own DSL *syntax*. Semantics is a separate question > to be skeptical about. > >> We can't use JSON because it lacks comments. 
>> So we invented our own pseudo-JSON parser that supported comments,
>> and used ' instead of " for some reason.  We also needed to be able
>> to parse a sequence of multiple JSON documents in one file.  We
>> should have just picked a different language because JSON clearly
>> didn't do what we needed.
>
> JSON is an exceptionally poor choice for a DSL, or even a
> configuration language.
>
> Correcting our mistake involves a flag day and a re-learn.  We need to
> weigh costs against benefits.

How much of that can be automated with tooling?  Yes, a re-learn is
needed, but if a tool can convert between the two syntaxes with minimal
pain, the re-learn (in both directions: rebasing patches written
pre-change for merge after the change lands upstream, and backporting
patches post-change to trees that are pre-change) is not as painful.

> The QAPI schema language has two layers:
>
> * JSON, with a lexical and a syntactical sub-layer (both in parser.py)
>
> * QAPI, with a context-free and a context-dependent sub-layer (in
>   expr.py and schema.py, respectively)
>
> Replacing the JSON layer is possible as long as the replacement is
> sufficiently expressive (not a tall order).

I'm open to the idea, if we want to attempt it, and agree with the
assessment that it is not a tall order.  In fact, if we were to go with
the JSON5 language instead of JSON, we are already that much closer to
being sufficiently expressive (although JSON5 does not seem to be as
popular in terms of pre-written libraries).

>> You suggest naming them .py.  If we do that, we must declare that
>> they are literally Python code
>
> Agree.

I'm not necessarily a fan of .py for QAPI files; it makes me think of
executable Python code, even if we chose it merely for something that
Python could parse natively instead of rolling our own parser.

>> modify them so that we can load the files straight into the Python
>> interpreter as code, and not parse them as data.
I feel unhappy about treating data as code though. > > Stress on *can* load. Doesn't mean we should. > > Ancient prior art: Lisp programs routinely use s-expressions as > configuration file syntax. They don't load them as code, they read them > as data. > > With Python, it's ast.parse(), I think. > >> While JSON doesn't do what we need, its second-cousin YAML is a more >> flexible format. Taking one example >> > StrictYAML insists on quotes. > > I hate having to quote identifiers. There's a reason we don't write > > 'int' > 'main'('int', 'argc', 'char' *'argv'[]) > { > 'printf'("hello world\n"); > return 0; > } > JSON5 would also let us get rid of some quotes, if that is considered a desirable goal of the representation (although I'm not sure that quote avoidance should be driving our decision, so much as automated conversion).
On 7/30/20 11:11 AM, Eric Blake wrote: > > Agreed on that front. Thirded: Reviewed-by: John Snow <jsnow@redhat.com>
On 7/30/20 11:11 AM, Eric Blake wrote: > JSON5 would also let us get rid of some quotes, if that is considered a > desirable goal of the representation (although I'm not sure that quote > avoidance should be driving our decision, so much as automated conversion). There's no JSON5 parser built in to Python. OK, there's no built-in YAML parser either, but I am not keen on fighting the built-in json module.
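Eric's reluctance to fight the built-in json module is easy to illustrate: the QAPI dialect's single quotes (let alone its comments) are rejected outright by strict JSON parsing (toy snippet, not actual schema content):

```python
import json

# A QAPI-style expression: single quotes, which strict JSON forbids.
qapi_text = "{ 'enum': 'Qcow2CompressionType', 'data': ['zlib', 'zstd'] }"

try:
    json.loads(qapi_text)
except json.JSONDecodeError as err:
    # json gives up at the first single-quoted key.
    print("not JSON:", err.msg)
```

Supporting the dialect therefore means either a custom parser (the status quo) or preprocessing the text into valid JSON before handing it to json.loads().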
On 7/30/20 9:24 AM, Daniel P. Berrangé wrote: > On Thu, Jul 30, 2020 at 01:51:10PM +0200, Markus Armbruster wrote: >> Daniel P. Berrangé <berrange@redhat.com> writes: >> >>> modify them so that we can load the >>> files straight into the python intepretor as code, and not parse >>> them as data. I feel unhappy about treating data as code though. >> >> Stress on *can* load. Doesn't mean we should. >> >> Ancient prior art: Lisp programs routinely use s-expressions as >> configuration file syntax. They don't load them as code, they read them >> as data. >> >> With Python, it's ast.parse(), I think. > > Yes, that could work > I use a similar trick for parsing "Fuzzy JSON" inside of qmp-shell. It's cute, and I'm not really proud of it. > >>> struct: ImageInfoSpecificQCow2 >>> data: >>> compat: str >>> "*data-file": str >>> "*data-file-raw": bool >>> "*lazy-refcounts": bool >>> "*corrupt": bool >>> refcount-bits: int >>> "*encrypt": ImageInfoSpecificQCow2Encryption >>> "*bitmaps": >>> - Qcow2BitmapInfo >>> compression-type: Qcow2CompressionType >>> >>> >>> Then we could use a regular off the shelf YAML parser in python. >>> I have a prototype where I started this, but I use "---" as a document separator to allow us multiple definitions per file so that the nesting remains pleasant. (YAML does not allow you to duplicate field names.) >>> The uglyiness with quotes is due to the use of "*". Slightly less ugly >>> if we simply declare that quotes are always used, even where they're >>> not strictly required. >> >> StrictYAML insists on quotes. > > I wouldn't suggest StrictYAML, just normal YAML is what pretty much > everyone uses. > > If we came up with a different way to mark a field as optional > instead of using the magic "*" then we wouldn't need to quote > anything > I have a YAML prototype branch where I use `?field` to indicate optional syntax. It works just fine, at the expense of being slightly new to people. 
I tested with normal YAML, but I was thinking about adopting StrictYAML
because Markus wanted some assurance we wouldn't get lost in the weeds
using complex features of YAML.

(Or, shoot ourselves entirely by accident.)

My prototype doesn't use anything that StrictYAML prohibits, so I
thought it was a good idea.

IF -- IF IF IF IF IF we decide that actually we need the crazy
horsepower of standard YAML, or that StrictYAML is too buggy -- we
could always just replace it.  No real big deal.

>> I hate having to quote identifiers.  There's a reason we don't write
>>
>>     'int'
>>     'main'('int', 'argc', 'char' *'argv'[])
>>     {
>>         'printf'("hello world\n");
>>         return 0;
>>     }

Fair enough ... but there's no special meaning to quoting or not
quoting the RHS in YAML, so maybe it's best to avoid pretending like
there's a structural semantic difference between an identifier and a
string there.

(Since they're both just strings, and the semantic difference is picked
up inside the QAPI generator post-parse.)

>>> struct: ImageInfoSpecificQCow2
>>> data:
>>>   "compat": "str"
>>>   "*data-file": "str"
>>>   "*data-file-raw": "bool"
>>>   "*lazy-refcounts": "bool"
>>>   "*corrupt": "bool"
>>>   "refcount-bits": "int"
>>>   "*encrypt": "ImageInfoSpecificQCow2Encryption"
>>>   "*bitmaps":
>>>     - "Qcow2BitmapInfo"
>>>   "compression-type": "Qcow2CompressionType"
>>>
>>> With the use of "---" to denote the start of a document, we have no
>>> trouble parsing our files, which would actually be a concatenation
>>> of multiple documents.  The Python YAML library provides the easy
>>> yaml.load_all() method.

Nevermind the earlier comment, then.

>> Required reading on YAML:
>> https://www.arp242.net/yaml-config.html
>
> I don't think this is especially helpful to our evaluation.  You can
> write such blog posts about pretty much anything if you want to pick
> holes in a proposal.  Certainly there's plenty of awful stuff you can
> write about JSON, and Python.
>
>> Some of the criticism there doesn't matter for our use case.
> > Yeah, what matters is whether it can do the job we need in a way that is > better than what we have today, and whether there are any further options > to consider that might be viable alternatives. > > Regards, > Daniel > I guess I'll dust off the work I have already to show the class. --js
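John's `?field` marker would move the optionality flag out of YAML's way entirely; a hypothetical post-parse step (the names and the convention are assumed from his description, not taken from actual QEMU code) might look like:

```python
def parse_member(name: str) -> tuple[str, bool]:
    """Split a schema member name into (name, is_optional).

    Hypothetical handling of the '?field' optional-marker convention:
    unlike '*', a leading '?' followed by a non-space character is a
    valid plain (unquoted) scalar in YAML block mappings.
    """
    if name.startswith("?"):
        return name[1:], True
    return name, False

members = ["compat", "?data-file", "?corrupt", "refcount-bits"]
print([parse_member(m) for m in members])
# -> [('compat', False), ('data-file', True), ('corrupt', True),
#     ('refcount-bits', False)]
```

The generator would run this over parsed member names, so the schema files themselves stay free of quoting noise.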
On 30.07.2020 17:11, Eric Blake wrote:
> > The QAPI schema language has two layers:
> >
> > * JSON, with a lexical and a syntactical sub-layer (both in
> >   parser.py)
> >
> > * QAPI, with a context-free and a context-dependent sub-layer (in
> >   expr.py and schema.py, respectively)
> >
> > Replacing the JSON layer is possible as long as the replacement is
> > sufficiently expressive (not a tall order).
>
> I'm open to the idea, if we want to attempt it, and agree with the
> assessment that it is not a tall order.

I'm not so sure about that.  I mean, it certainly sounds doable if need
be, but getting better syntax highlighting by default in some editors
feels like a pretty weak reason to switch out the complete schema
language.

At first I was going to say "but if you don't have anything else to do
with your time...", but it's actually not only your time, but the time
of everyone who has development branches or downstream repositories and
will suffer rather nasty merge conflicts.  So this will likely end up
having a non-negligible cost.

So is there more to it, or are we really considering doing this just
because editors can tell more easily what to do with a different syntax?

Kevin
On Fri, Jul 31, 2020 at 09:15:13AM +0200, Kevin Wolf wrote:
> On 30.07.2020 17:11, Eric Blake wrote:
> > > The QAPI schema language has two layers:
> > >
> > > * JSON, with a lexical and a syntactical sub-layer (both in
> > >   parser.py)
> > >
> > > * QAPI, with a context-free and a context-dependent sub-layer (in
> > >   expr.py and schema.py, respectively)
> > >
> > > Replacing the JSON layer is possible as long as the replacement is
> > > sufficiently expressive (not a tall order).
> >
> > I'm open to the idea, if we want to attempt it, and agree with the
> > assessment that it is not a tall order.
>
> I'm not so sure about that.  I mean, it certainly sounds doable if
> need be, but getting better syntax highlighting by default in some
> editors feels like a pretty weak reason to switch out the complete
> schema language.

The main compelling reason to me is that we get to throw away our
custom-written parsing code and use a standard Python library.
Actually following a standardized document format is a benefit to
contributors, over something that looks like a standard format but is
actually slightly different because it is our own parser.

Automatic correct syntax highlighting is a happy side effect.  More
interesting is if the editor automatically does the right thing with
indentation/formatting when writing schemas.

I won't miss the "not allowed a trailing comma after the last field"
part of JSON, as I always get it wrong.

Regards,
Daniel
Kevin Wolf <kwolf@redhat.com> writes:

> On 30.07.2020 17:11, Eric Blake wrote:
>> > JSON is an exceptionally poor choice for a DSL, or even a
>> > configuration language.
>> >
>> > Correcting our mistake involves a flag day and a re-learn.  We need
>> > to weigh costs against benefits.
>> >
>> > The QAPI schema language has two layers:
>> >
>> > * JSON, with a lexical and a syntactical sub-layer (both in
>> >   parser.py)

An incompatible dialect of JSON with a doc comment language, actually.
The need to keep doc generation working could complicate replacing the
lower layer.

>> >
>> > * QAPI, with a context-free and a context-dependent sub-layer (in
>> >   expr.py and schema.py, respectively)
>> >
>> > Replacing the JSON layer is possible as long as the replacement is
>> > sufficiently expressive (not a tall order).
>>
>> I'm open to the idea, if we want to attempt it, and agree with the
>> assessment that it is not a tall order.

Careful: "not a tall order" is meant to apply to the "sufficiently
expressive" requirement for a replacement syntax.  On actually
replacing the lower layer, I wrote "we need to weigh costs against
benefits."

> I'm not so sure about that.  I mean, it certainly sounds doable if
> need be, but getting better syntax highlighting by default in some
> editors feels like a pretty weak reason to switch out the complete
> schema language.
>
> At first I was going to say "but if you don't have anything else to do
> with your time...", but it's actually not only your time, but the time
> of everyone who has development branches or downstream repositories
> and will suffer rather nasty merge conflicts.  So this will likely end
> up having a non-negligible cost.

Yup.

> So is there more to it or are we really considering doing this just
> because editors can tell more easily what to do with a different
> syntax?

If memory serves, the following arguments have been raised:

1.
A chance to improve ergonomics for developers

   Pain points include

   - Confusion: it claims to be JSON, but it's not.

   - Need to learn another syntax.  Sunk cost for old hands, but it's a
     valid point all the same.

   - Poor tool support: JSON tools don't work.  Python tools do, but
     you may have to work around the issue of true, false.

   - Excessive quoting.

   - Verbosity.  When all you have is KEY: VALUE, defining things with
     multiple properties becomes verbose, like

         'status': { 'type': 'DirtyBitmapStatus',
                     'features': [ 'deprecated' ] }

     We need syntactic sugar to keep verbosity in check for the most
     common cases.  More complexity.

   - No trailing comma in arrays and objects.

   - No way to split long strings for legibility.

   - The doc comment language is poorly specified.

   - Parse error reporting could be better (JSON part) / could hardly
     be worse (doc comment part).

2. Not having to maintain our own code for the lower layer

   I consider this argument quite weak.  parser.py has some 400 SLOC.
   Writing and rewriting it is sunk cost.  Keeping it working has been
   cheap.  Keeping the glue for some off-the-shelf parser working isn't
   free, either.  No big savings to be had here, sorry.

   Almost half of parser.py is about doc comments, and it's the hairier
   part by far.  Peter has patches to drag the doc comment language
   closer to rST.  I don't remember whether they shrink parser.py.

3. Make the schema more easily consumable by other programs

   Use of a "standard" syntax instead of our funky dialect of JSON
   means other programs can use an off-the-shelf parser instead of
   using or reimplementing parser.py.

   Valid point for programs that parse the lower layer, and no more,
   say for basic syntax highlighting.

   Pretty much irrelevant for programs that need to go beyond the lower
   layer.  Parsing the lower layer is the easy part.  The code dealing
   with the upper layer is much larger (expr.py and schema.py), and it
   actually changes as we add features to the schema language.
   Duplicating it would be a Bad Idea.
Reuse the existing frontend instead.
John Snow <jsnow@redhat.com> writes: > On 7/30/20 9:24 AM, Daniel P. Berrangé wrote: >> On Thu, Jul 30, 2020 at 01:51:10PM +0200, Markus Armbruster wrote: >>> Daniel P. Berrangé <berrange@redhat.com> writes: >>> >>>> modify them so that we can load the >>>> files straight into the python intepretor as code, and not parse >>>> them as data. I feel unhappy about treating data as code though. >>> >>> Stress on *can* load. Doesn't mean we should. >>> >>> Ancient prior art: Lisp programs routinely use s-expressions as >>> configuration file syntax. They don't load them as code, they read them >>> as data. >>> >>> With Python, it's ast.parse(), I think. >> >> Yes, that could work >> > > I use a similar trick for parsing "Fuzzy JSON" inside of qmp-shell. > > It's cute, and I'm not really proud of it. > >> >>>> struct: ImageInfoSpecificQCow2 >>>> data: >>>> compat: str >>>> "*data-file": str >>>> "*data-file-raw": bool >>>> "*lazy-refcounts": bool >>>> "*corrupt": bool >>>> refcount-bits: int >>>> "*encrypt": ImageInfoSpecificQCow2Encryption >>>> "*bitmaps": >>>> - Qcow2BitmapInfo >>>> compression-type: Qcow2CompressionType >>>> >>>> >>>> Then we could use a regular off the shelf YAML parser in python. >>>> > > I have a prototype where I started this, but I use "---" as a document > separator to allow us multiple definitions per file so that the > nesting remains pleasant. > > (YAML does not allow you to duplicate field names.) > >>>> The uglyiness with quotes is due to the use of "*". Slightly less ugly >>>> if we simply declare that quotes are always used, even where they're >>>> not strictly required. >>> >>> StrictYAML insists on quotes. >> >> I wouldn't suggest StrictYAML, just normal YAML is what pretty much >> everyone uses. >> > If we came up with a different way to mark a field as optional >> instead of using the magic "*" then we wouldn't need to quote >> anything >> > > I have a YAML prototype branch where I use `?field` to indicate > optional syntax. 
It works just fine, at the expense of being slightly > new to people. > > I tested with normal YAML, but I was thinking about adopting strict > YAML because Markus wanted some assurance we wouldn't get lost in the > weeds using complex features of YAML. > > (Or, shoot ourselves entirely by accident.) > > My prototype doesn't use anything that Strict YAML prohibits, so I > thought it was a good idea. > > IF -- IF IF IF IF IF we decide that actually we need the crazy > horsepower of standard YAML, or that strict YAML is too buggy -- we > could always just replace it. No real big deal. > >>> I hate having to quote identifiers. There's a reason we don't write >>> >>> 'int' >>> 'main'('int', 'argc', 'char' *'argv'[]) >>> { >>> 'printf'("hello world\n"); >>> return 0; >>> } >>> > > Fair enough ... but there's no special meaning to quoting or not > quoting the RHS in YAML, so maybe it's best to avoid pretending like > there's a structural semantic between an identifier and a string > there. > > (Since they're both just strings, and the semantic difference is > picked up inside the QAPI generator post-parse.) You wish... An unquoted right hand side is a string, unless it can be interpreted as something else. For instance, when something else is one of the eleven(?) ways to say false, you have a variation of YAML's Norway problem: https://hitchdev.com/strictyaml/why/implicit-typing-removed/ >>>> struct: ImageInfoSpecificQCow2 >>>> data: >>>> "compat": "str" >>>> "*data-file": "str" >>>> "*data-file-raw": "bool" >>>> "*lazy-refcounts": "bool" >>>> "*corrupt": "bool" >>>> "refcount-bits": "int" >>>> "*encrypt": "ImageInfoSpecificQCow2Encryption" >>>> "*bitmaps": >>>> - "Qcow2BitmapInfo" >>>> "compression-type": "Qcow2CompressionType" >>>> >>>> With the use of "---" to denote the start of document, we have no trouble >>>> parsing our files which would actually be a concatenation of multiple >>>> documents. The python YAML library provides the easy yaml.load_all() >>>> method.
>>> > > Nevermind the earlier comment, then. > >>> Required reading on YAML: >>> https://www.arp242.net/yaml-config.html >> >> I don't think this is especially helpful to our evaluation. You can write >> such blog posts about pretty much anything if you want to pick holes in a >> proposal. Certainly there's plenty of awful stuff you can write about >> JSON, and Python. >> >>> Some of the criticism there doesn't matter for our use case. >> >> Yeah, what matters is whether it can do the job we need in a way that is >> better than what we have today, and whether there are any further options >> to consider that might be viable alternatives. The sheer complexity of YAML puts me off. The spec exceeds 20k words. > I guess I'll dust off the work I have already to show the class. Prototype code should beat guesswork.
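The ast.parse() idea mentioned above — read the schema as data, never execute it — can be sketched with nothing but the standard library. This is only an illustration, not QEMU code; note that the real schema files spell booleans true/false, which ast.literal_eval would reject, so those spellings would need adjusting first:

```python
import ast

# A QAPI-style expression, read as *data* rather than executed.
# Caveat: ast.literal_eval only accepts Python's True/False, not
# the schema's 'true'/'false'.
expr_src = """
{'struct': 'ImageInfoSpecificQCow2',
 'data': {'compat': 'str',
          '*data-file': 'str',
          '*lazy-refcounts': 'bool'}}
"""

expr = ast.literal_eval(expr_src)
print(expr['struct'])   # -> ImageInfoSpecificQCow2
```

Unlike exec() or import, literal_eval only accepts literal containers and constants, so a stray function call in a schema file raises ValueError instead of running.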
On Fri, Jul 31, 2020 at 11:21:35AM +0200, Markus Armbruster wrote: > John Snow <jsnow@redhat.com> writes: > > > On 7/30/20 9:24 AM, Daniel P. Berrangé wrote: > >>> Some of the criticism there doesn't matter for our use case. > >> > >> Yeah, what matters is whether it can do the job we need in a way that is > >> better than what we have today, and whether there are any further options > >> to consider that might be viable alternatives. > > The sheer complexity of YAML puts me off. The spec exceeds 20k words. Just because complexity exists doesn't mean it has any negative impact on us as users. A large spec is a concern if writing a parser, but we'd be using an off the shelf parser so not a concern. Or if you're trying to do obscure stuff that pushes the boundaries of what YAML can do in terms of data representation, but QAPI schema needs are really trivial so won't ever hit anything complex unless we actively went looking for it. In all the times I've used YAML I can't say "complexity" has been a word that ever came to mind, quite the opposite, it felt simple. Regards, Daniel
Am 31.07.2020 um 11:01 hat Markus Armbruster geschrieben: > Kevin Wolf <kwolf@redhat.com> writes: > > > Am 30.07.2020 um 17:11 hat Eric Blake geschrieben: > >> > JSON is an exceptionally poor choice for a DSL, or even a configuration > >> > language. > >> > > >> > Correcting our mistake involves a flag day and a re-learn. We need to > >> > weigh costs against benefits. > >> > > >> > The QAPI schema language has two layers: > >> > > >> > * JSON, with a lexical and a syntactical sub-layer (both in parser.py) > > An incompatible dialect of JSON with a doc comment language, actually. > > The need to keep doc generation working could complicate replacing the > lower layer. Good point, we would have to keep the comment parser either way to be used on top of the YAML (or whatever) parser. Whatever parser we use would have to actually make comments available rather than immediately filtering them out. This might exist for most languages, but it will probably not be the most commonly used parser either (or at least it will not allow using a simple interface like json.loads() in Python). > >> > > >> > * QAPI, with a context-free and a context-dependent sub-layer (in > >> > expr.py and schema.py, respectively) > >> > > >> > Replacing the JSON layer is possible as long as the replacement is > >> > sufficiently expressive (not a tall order). > >> > >> I'm open to the idea, if we want to attempt it, and agree with the > >> assessment that it is not a tall order. > > Careful, "not a tall order" is meant to apply to the "sufficiently > expressive" requirement for a replacement syntax. > > On actually replacing the lower layer, I wrote "we need to weigh costs > against benefits." > > > I'm not so sure about that. I mean, it certainly sounds doable if need > > be, but getting better syntax highlighting by default in some editors > > feels like a pretty weak reason to switch out the complete schema > > language.
> > > > At first I was going to say "but if you don't have anything else to do > > with your time...", but it's actually not only your time, but the time > > of everyone who has development branches or downstream repositories and > > will suffer rather nasty merge conflicts. So this will likely end up > > having a non-negligible cost. > > Yup. > > > So is there more to it or are we really considering doing this just > > because editors can tell more easily what to do with a different syntax? > > If memory serves, the following arguments have been raised: > > 1. A chance to improve ergonomics for developers > > Pain points include > > - Confusion > > It claims to be JSON, but it's not. > > - Need to learn another syntax > > Sunk cost for old hands, but it's a valid point all the same. We use a similar (same?) form of "almost JSON" for QMP which will still exist. So we'd be moving from having to learn one (non-standard) language to two languages (one still non-standard and another one that is hopefully more standard). > - Poor tool support > > JSON tools don't work. Python tools do, but you may have to work > around the issue of true, false. This is mostly the editor question this patch was about, right? Or are people trying to use more sophisticated tools on it? > - Excessive quoting > > - Verbosity > > When all you have is KEY: VALUE, defining things with multiple > properties becomes verbose like > > 'status': { 'type': 'DirtyBitmapStatus', > 'features': [ 'deprecated' ] } > > We need syntactic sugar to keep verbosity in check for the most > common cases. More complexity. I don't think this is something any of the suggested languages would address.
> - No trailing comma in arrays and objects > > - No way to split long strings for legibility > > - The doc comment language is poorly specified > > - Parse error reporting could be better (JSON part) / could hardly be > worse (doc comment part) Has anyone looked into what error messages are like for the suggested alternatives? "error reporting could be better" is something that is true for a lot of software. The doc comment part is not going to change unless we get rid of comments and actually make documentation part of the objects themselves. This might not be very readable. Or I should rather say, making the doc comment part change is possible, but would require the same changes as with our current lower layer language and parser. > 2. Not having to maintain our own code for the lower layer > > I consider this argument quite weak. parser.py has some 400 SLOC. > Writing and rewriting it is sunk cost. Keeping it working has been > cheap. Keeping the glue for some off-the-shelf parser working isn't > free, either. No big savings to be had here, sorry. > > Almost half of parser.py is about doc comments, and it's the hairier > part by far. Peter has patches to drag the doc comment language > closer to rST. I don't remember whether they shrink parser.py. > > 3. Make the schema more easily consumable by other programs > > Use of a "standard" syntax instead of our funky dialect of JSON means > other programs can use an off-the-shelf parser instead of using or > reimplementing parser.py. > > Valid point for programs that parse the lower layer, and no more, say > for basic syntax highlighting. > > Pretty much irrelevant for programs that need to go beyond the lower > layer. Parsing the lower layer is the easy part. The code dealing > with the upper layer is much larger (expr.py and schema.py), and it > actually changes as we add features to the schema language. > Duplicating it would be a Bad Idea. Reuse the existing frontend > instead.
Do other programs that go beyond syntax highlighting actually get to parse our QAPI schema definitions? Or would they rather deal with the return value of query-qmp-schema? Neither the QAPI schema nor a YAML file with the same structure is a standard approach to describe JSON documents. So even if we replace JSON in the lower layer, the whole thing (and as you say, the upper layer is the more interesting part) still stays non-standard in a way and more advanced tools can't be used with it. And of course, even if we did use something more standard like JSON Schema or whatever exists for YAML, we would still have to massively extend it because the QAPI schema contains much more information than just what would be needed to validate some input. We control all aspects of generated C code with it. Kevin
Daniel P. Berrangé <berrange@redhat.com> writes: > On Thu, Jul 30, 2020 at 01:51:10PM +0200, Markus Armbruster wrote: >> Daniel P. Berrangé <berrange@redhat.com> writes: >> >> > modify them so that we can load the >> > files straight into the python interpreter as code, and not parse >> > them as data. I feel unhappy about treating data as code though. >> >> Stress on *can* load. Doesn't mean we should. >> >> Ancient prior art: Lisp programs routinely use s-expressions as >> configuration file syntax. They don't load them as code, they read them >> as data. >> >> With Python, it's ast.parse(), I think. > > Yes, that could work > > >> > struct: ImageInfoSpecificQCow2 >> > data: >> > compat: str >> > "*data-file": str >> > "*data-file-raw": bool >> > "*lazy-refcounts": bool >> > "*corrupt": bool >> > refcount-bits: int >> > "*encrypt": ImageInfoSpecificQCow2Encryption >> > "*bitmaps": >> > - Qcow2BitmapInfo >> > compression-type: Qcow2CompressionType >> > >> > >> > Then we could use a regular off the shelf YAML parser in python. >> > >> > The ugliness with quotes is due to the use of "*". Slightly less ugly >> > if we simply declare that quotes are always used, even where they're >> > not strictly required. >> >> StrictYAML insists on quotes. > > I wouldn't suggest StrictYAML, just normal YAML is what pretty much > everyone uses. > > If we came up with a different way to mark a field as optional > instead of using the magic "*" then we wouldn't need to quote > anything Nope.
Stupid test script: $ cat test-yaml.py #!/usr/bin/python3 import sys, yaml data = yaml.load(sys.stdin, Loader=yaml.CLoader) print(data) Example taken from block.json: $ ./test-yaml.py <<EOF enum: FloppyDriveType data: - 144 - 288 - 120 - none - auto EOF {'enum': 'FloppyDriveType', 'data': [144, 288, 120, 'none', 'auto']} The upper layer will choke on this: qapi/block.yaml: In enum 'FloppyDriveType': qapi/block.yaml:61: 'data' member requires a string name You could propose to provide "wouldn't need to quote anything" by silently converting numbers back to strings. I got two issues with that. 1. What's the point in switching to an off-the-shelf parser to replace less than 400 SLOC if I then have to write non-trivial glue code to make the syntax less surprising? 2. Let me trot out the tired Norway problem again. Say we QAPIfy -k so that its argument is an enum, for nice introspection. Something like $ ./test-yaml.py <<EOF enum: Keymap data: - ar - cz - de - de-ch - en-gb - en-us - no - pt - pt-br EOF {'enum': 'Keymap', 'data': ['ar', 'cz', 'de', 'de-ch', 'en-gb', 'en-us', False, 'pt', 'pt-br']} To which of the eleven ways to say False in YAML should we convert? YAML and the schema language are fundamentally at odds here: they fight over types. YAML lets the value determine the type regardless of context. But in the schema, the context determines the type. >> I hate having to quote identifiers. 
There's a reason we don't write >> >> 'int' >> 'main'('int', 'argc', 'char' *'argv'[]) >> { >> 'printf'("hello world\n"); >> return 0; >> } >> >> > struct: ImageInfoSpecificQCow2 >> > data: >> > "compat": "str" >> > "*data-file": "str" >> > "*data-file-raw": "bool" >> > "*lazy-refcounts": "bool" >> > "*corrupt": "bool" >> > "refcount-bits": "int" >> > "*encrypt": "ImageInfoSpecificQCow2Encryption" >> > "*bitmaps": >> > - "Qcow2BitmapInfo" >> > "compression-type": "Qcow2CompressionType" >> > >> > With the use of "---" to denote the start of document, we have no trouble >> > parsing our files which would actually be a concatenation of multiple >> > documents. The python YAML library provides the easy yaml.load_all() >> > method. >> >> Required reading on YAML: >> https://www.arp242.net/yaml-config.html > > I don't think this is especially helpful to our evaluation. You can write > such blog posts about pretty much anything if you want to pick holes in a > proposal. Certainly there's plenty of awful stuff you can write about > JSON, and Python. Picking holes in a proposal is precisely what we need to do before we act on it and rebase our QAPI schema DSL. That's expensive and painful, so we better don't screw it up *again*. Here's my superficial five minute assessment of the essay's main points for our use case: * Insecure by default Valid criticism of existing YAML tools, but hardly relevant for us, because 1. "don't do that then", and 2. the QAPI schema is trusted input. * Can be hard to edit, especially for large files Valid when you use YAML to describe data. We would use it as an IDL, though. If parts of the interface description get too deeply nested or too long for comfort, we better provide means to rearrange it in more pleasant ways. However, if our new base language got uncomfortable earlier than the old one, and the existing means to rearrange prove too weak (they are pretty weak), then we'd create additional work.
I'm cautiously optimistic that YAML would do okay here. * It’s pretty complex If you go "we'll use only a simple subset", then I go "define the subset, and tell me how to enforce the subset. I have no interest in teaching contributors one by one which of the nine ways to write a multiline string to avoid, or which of the eleven ways to say False to use." * Surprising behavior See "fight over types" above. * It’s not portable Probably irrelevant, because feeding the schema to another YAML parser is so unlikely to be useful. >> Some of the criticism there doesn't matter for our use case. > > Yeah, what matters is whether it can do the job we need in a way that is > better than what we have today, and whether there are any further options > to consider that might be viable alternatives. Would it improve things enough to be worth the switching pain?
On Fri, Jul 31, 2020 at 02:55:34PM +0200, Markus Armbruster wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > >> Some of the criticism there doesn't matter for our use case. > > > > Yeah, what matters is whether it can do the job we need in a way that is > > better than what we have today, and whether there are any further options > > to consider that might be viable alternatives. > > Would it improve things enough to be worth the switching pain? The short answer is that I don't think that question matters. We should do the conversion regardless, as our JSON-but-not file format has no compelling reason to exist as a thing when there's a variety of standard file formats that could do the job. I'd explicitly ignore the sunk costs and minor amount of work to convert to a new format. The long answer is that as a general philosophy I'm in favour of aggressively eliminating anything that is custom to a project and isn't offering a compelling benefit over a functionally equivalent, commonly used / standard solution. Any time a project re-invents the wheel, that is one more piece of custom knowledge a contributor has to learn. Each one may seem insignificant on its own, but cumulatively they result in death by a 1000 cuts. This makes a project increasingly less attractive to contribute to over the long term. Measuring the long term benefit of the change is generally quite difficult, because while you can see what impact a change will have today on current code, it is hard to usefully evaluate future benefits as you're trying to imagine the impact on things that don't even exist. Overall my POV is not to think too hard about measuring improvements, and discard any concern about sunk costs. Instead have a general presumption in favour of eliminating any examples of wheel re-invention in a project.
Even if regular contributors don't want to spend time on such work, this kind of thing is pretty amenable to new contributors looking for tasks to start their involvement. The QAPI JSON-but-not file format is a case where I think we should just adopt a standard file format no matter what. A conversion will have some short term work, but this is really simple data to deal with and the code involved is nicely self contained. Again I'm not saying QAPI maintainers must do it, just put the idea out there as a piece of work that would be welcomed if someone is interested in working on it. Another example would be elimination of anything in QEMU code that is duplicating functionality in GLib, even if there is zero functional difference between the two impls. Another example would be adopting a standard code style and using a tool like clang-format to enforce this for the entire existing code base and future contributions and throwing away our checkpatch.pl which nearly everyone is scared of touching as it is Perl code. Regards, Daniel
On 7/31/20 11:07 AM, Daniel P. Berrangé wrote: > On Fri, Jul 31, 2020 at 02:55:34PM +0200, Markus Armbruster wrote: >> Daniel P. Berrangé <berrange@redhat.com> writes: > >>>> Some of the criticism there doesn't matter for our use case. >>> >>> Yeah, what matters is whether it can do the job we need in a way that is >>> better than what we have today, and whether there are any further options >>> to consider that might be viable alternatives. >> >> Would it improve things enough to be worth the switching pain? > > The short answer is that I don't think that question matters. We should > do the conversion regardless, as our JSON-but-not file format has no > compelling reason to exist as a thing when there's a variety of standard > file formats that could do the job. I'd explicitly ignore the sunk costs > and minor amount of work to convert to a new format. > This is not a language we spend a lot of time editing, either. Is it really such a big cost to re-learn a tiny language like this? That concern seems overplayed, in my view. > The long answer is that as a general philosophy I'm in favour of aggressively > eliminating anything that is custom to a project and isn't offering a > compelling benefit over a functionally equivalent, commonly used / standard > solution. > I agree as violently as I know how. The purpose of this is not for us, it's for the ecosystem. I saw the critique that we still use JSON-ish for the runtime QMP protocol, and moving the QAPI IDL to a standard wouldn't remove all instances of a custom format from our tree. This is actually a great argument for doing the other thing: moving QMP onto a standard protocol. The point is not to complicate matters for QEMU developers, but to simplify matters for developers from other projects who are integrating QEMU. > Any time a project re-invents the wheel, that is one more piece of custom > knowledge a contributor has to learn.
Each one may seem insignificant on > its own, but cumulatively they result in death by a 1000 cuts. This makes > a project increasingly less attractive to contribute to over the long term. > > Measuring the long term benefit of the change is generally quite difficult, > because while you can see what impact a change will have today on current > code, it is hard to usefully evaluate future benefits as you're trying to > imagine the impact on things that don't even exist. > > Overall my POV is not to think too hard about measuring improvements, and > discard any concern about sunk costs. Instead have a general presumption > in favour of eliminating any examples of wheel re-invention in a project. > Even if regular contributors don't want to spend time on such work, this > kind of thing is pretty amenable to new contributors looking for tasks to > start their involvement. > > The QAPI JSON-but-not file format is a case where I think we should just > adopt a standard file format no matter what. A conversion will have some > short term work, but this is really simple data to deal with and the code > involved is nicely self contained. Again I'm not saying QAPI maintainers > must do it, just put the idea out there as a piece of work that would > be welcomed if someone is interested in working on it. > I have a lot of code trying to pylint and mypy-ify the QAPI parser, and a prototype YAML parser. The work that remains to be done is hooking up my new YAML parser into the code generator. Assuming I am able to continue working on this, an RFC could be soon. > Another example would be elimination of anything in QEMU code that is > duplicating functionality in GLib, even if there is zero functional > difference between the two impls.
> > Another example would be adopting a standard code style and using a > tool like clang-format to enforce this for the entire existing code > base and future contributions and throwing away our checkpatch.pl > which nearly everyone is scared of touching as it is Perl code. > > Regards, > Daniel >
On Fri, Jul 31, 2020 at 11:26:28AM -0400, John Snow wrote: > > The long answer is that as a general philosophy I'm in favour of aggressively > > eliminating anything that is custom to a project and isn't offering a > > compelling benefit over a functionally equivalent, commonly used / standard > > solution. > > > > I agree as violently as I know how. The purpose of this is not for us, it's > for the ecosystem. > > I saw the critique that we still use JSON-ish for the runtime QMP protocol, > and moving the QAPI IDL to a standard wouldn't remove all instances of a > custom format from our tree. I'd consider the runtime protocol separately. In terms of what's on the wire, we use genuine JSON format. The runtime problem is simply that the JSON standard is useless when it comes to integers, leaving behaviour undefined in the standard if you exceed 53 bits of precision. So there's no way to reliably parse unsigned 64-bit integers. Given that QEMU needs to pass uint64 values, JSON was simply the wrong choice of format for QMP. There's a 3rd aspect which is our source code that deals with JSON, where we defined some JSON extensions to make it easier for C code to construct JSON documents for sending over the wire. Back when we did this, it was a reasonably good idea as no obvious alternative existed for this problem. Today, I would just suggest using GLib's GVariant feature, which solves the same problem for GLib's DBus APIs. It is a shame we didn't just use DBus back in the day as that's a well specified, simple protocol that would have done everything we needed, including the ability to actually represent integers reliably. We would be able to trivially talk to QEMU from any programming language, and use common DBus code-generator tools instead of writing code generators ourselves. I wish that libvirt had picked DBus all that time ago too, instead of creating our own RPC based on XDR, which is 95% identical to what DBus provides but with a massive maint burden for ourselves.
Back then DBus didn't seem like it was good enough as it didn't offer TLS or SASL support and that looked like such a big deal. With hindsight the right answer was to add TLS + SASL to DBus, and not invent our own protocol. We would have lacked TLS/SASL support in libvirt for 6 to 12 months or so, but then would have had 10 years benefitting from the DBus ecosystem. Life would have been much easier for mgmt tools using libvirt too, as they could have used native DBus APIs instead of having to use the C libvirt.so client. This is one of my biggest regrets with libvirt's architecture, and even after 10 years it is still probably worth fixing this mistake and adopting DBus. Regards, Daniel
On 31/07/20 17:07, Daniel P. Berrangé wrote: > The QAPI JSON-but-not file format is a case where I think we should just > adopt a standard file format no matter what. A conversion will have some > short term work, but this is really simple data to deal with and the code > involved is nicely self contained. Again I'm not saying QAPI maintainers > must do it, just put the idea out there as a piece of work that would > be welcomed if someone is interested in working on it. The main issues with JSON-but-not in QEMU are: - the 64-bit integers, which do not apply to the QAPI schema though - the comments, which are a common extension to JSON (see JSON5, NodeJS config files, json_verify's -c option, etc.) so I find it quite surprising that no off-the-shelf Python component can parse JSON + comments - the single-quote strings, which are not particularly useful in QAPI schema If we changed the single-quote strings to double-quote, jsonc.vim (https://github.com/neoclide/jsonc.vim) seems to support JSON + comments. In Emacs you'd probably add an epilogue like (defconst json-mode-comments-re (rx (group "#" (zero-or-more nonl) line-end))) (push (list json-mode-comments-re 1 font-lock-comment-face) json-font-lock-keywords-1) Did I miss anything? Besides that, why are we using Python and not JavaScript in the mode line? > Another example would be elimination of anything in QEMU code that is > duplicating functionality in GLib, even if there is zero functional > difference between the two impls. Would you consider intrusive lists vs. GList/GSList to be duplicating functionality? I don't think we have that many duplicates at this point. We have qemu_strto*, but unfortunately the GLib function g_ascii_strtoll does nothing to fix the awful design of the C standard strto* functions (especially the overloading of the return value for error and result). If there are clear-cut cases against adopting the GLib version, I think patches would be easily accepted.
> Another example would be adopting a standard code style and using a > tool like clang-format to enforce this for the entire existing code > base and future contributions and throwing away our checkpatch.pl > which nearly everyone is scared of touching as it is Perl code. Checkpatch does much more than that, though the scary part is indeed the one that enforces coding style. I wouldn't have a problem with using clang-format and calling it a day. Paolo
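The "JSON + comments + single quotes" gap described above is small enough to approximate in a few lines of stdlib Python. This is a deliberately naive sketch (a hypothetical helper, not anything in qapi/) that ignores the corner case of '#' or an apostrophe appearing inside a string literal, which the real schema's strings could violate:

```python
import json
import re


def naive_qapi_to_json(text):
    """Very rough conversion of QAPI-schema syntax to strict JSON.

    Hypothetical and naive: assumes '#' and single quotes never occur
    inside string literals.
    """
    text = re.sub(r'#.*', '', text)   # drop '#' comments to end of line
    return text.replace("'", '"')     # single -> double quotes


src = """
# A comment in the QAPI style
{ 'enum': 'FloppyDriveType',
  'data': [ '144', '288', '120', 'none', 'auto' ] }
"""

doc = json.loads(naive_qapi_to_json(src))
print(doc['enum'])   # FloppyDriveType
```

Note this also sidesteps the Norway problem from earlier in the thread: after conversion every quoted value stays a string, so '144' and 'no' cannot be reinterpreted as numbers or booleans.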
On 31/07/20 17:26, John Snow wrote: > I saw the critique that we still use JSON-ish for the runtime QMP > protocol, and moving the QAPI IDL to a standard wouldn't remove all > instances of a custom format from our tree. Sorry, but "still using JSON" is not a critique that makes any sense. 99% of the websites you use daily use JSON as their RPC frontend-to-backend language; OpenAPI is essentially JSON over HTTP. There must be something good in JSON. Whenever you hear a complaint about "using JSON", it's actually a complaint about bindings for your language, which can be a sensible critique: gRPC is essentially {protobuf,FlatBuffers} over HTTP/2 plus a boatload of language mappings. Unfortunately C is not one of those mappings. Paolo
Am 31.07.2020 um 17:07 hat Daniel P. Berrangé geschrieben: > On Fri, Jul 31, 2020 at 02:55:34PM +0200, Markus Armbruster wrote: > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > >> Some of the criticism there doesn't matter for our use case. > > > > > > Yeah, what matters is whether it can do the job we need in a way that is > > > better than what we have today, and whether there are any further options > > > to consider that might be viable alternatives. > > > > Would it improve things enough to be worth the switching pain? > > The short answer is that I don't think that question matters. We should > do the conversion regardless, as our JSON-but-not file format has no > compelling reason to exist as a thing when there's a variety of standard > file formats that could do the job. I think the question does matter. Reusing existing code is not an end in itself, but it has to actually improve something. Usually this would be simplifying the code because a lot of the work is done by something external now. "The job" is defining APIs in a way that they describe JSON messages and we can generate C boilerplate code and documentation from the description. If we look for a standard format, then we should start with this rather than keeping the non-standard thing that we have, but wrapping a new syntax around it. > I'd explicitly ignore the sunk costs and minor amount of work to > convert to a new format. I'm not worried about the work to convert the QAPI generator. QAPISchemaParser is 264 lines of code, presumably not a lot of work to rewrite. This is also the upper limit of code that could be improved. Given the discussion so far, I wouldn't even be sure that the diffstat after pulling in an external dependency would be at least minimally favourable. More time will be spent dealing with the results of the actual conversion of the schema file because people will get lots of merge conflicts.
Sometimes this is worth it, but then the benefit has to be a little more than just that we could say we've reused an external library. > The long answer is that as a general philosophy I'm in favour of aggressively > eliminating anything that is custom to a project and isn't offering a > compelling benefit over a functionally equivalent, commonly used / standard > solution. The differences between JSON and the QAPI schema lower layer language are structured documentation comments (I don't think any of the proposed alternatives have this - might be a compelling benefit) and having multiple objects in a single file (I would rather put brackets around the whole file and commas between each definition and live with that ugliness than taking the pain of merge conflicts). > Any time a project re-invents the wheel, that is one more piece of custom > knowledge a contributor has to learn. Each one may seem insignificant on > its own, but cumulatively they result in death by a 1000 cuts. This makes > a project increasingly less attractive to contribute to over the long term. I doubt documentation comments and having multiple objects in a single file makes QEMU less attractive to contribute to. > Measuring the long term benefit of the change is generally quite difficult, > because while you can see what impact a change will have today on current > code, it is hard to usefully evaluate future benefits as you're trying to > imagine the impact on things that don't even exist. > > Overall my POV is not to think too hard about measuring improvements, and > discard any concern about sunk costs. Instead have a general presumption > in favour of eliminating any examples of wheel re-invention in a project. > Even if regular contributors don't want to spend time on such work, this > kind of thing is pretty amenable to new contributors looking for tasks to > start their involvement.
> > The QAPI JSON-but-not file format is a case where I think we should just > adopt a standard file format no matter what. A conversion will have some > short term work, but this is really simple data to deal with and the code > involved is nicely self contained. Almost every subsystem has some QAPI parts that would be affected. > Again I'm not saying QAPI maintainers must do it, just put the idea > out there as a piece of work that would be welcomed if someone is > interested in working on it. > > Another example would be elimination of anything in QEMU code that is > duplicating functionality in GLib, even if there is zero functional > difference between the two impls. If it's an actual duplicate and an exact match, sure. But if using the GLib functions results in double the code size compared to implementing the functionality from scratch, it becomes questionable in my opinion. Kevin
* Paolo Bonzini (pbonzini@redhat.com) wrote: > On 31/07/20 17:26, John Snow wrote: > > I saw the critique that we still use JSON-ish for the runtime QMP > > protocol, and moving the QAPI IDL to a standard wouldn't remove all > > instances of a custom format from our tree. > > Sorry, but "still using JSON" is not a critique that makes any sense. > > 99% of the websites you use daily use JSON as their RPC > frontend-to-backend language; OpenAPI is essentially JSON over HTTP. > There must be something good in JSON. If there is, I've not found it: a) Its integer definitions are a mess b) You can't require ordering c) No two parsers agree with each other and those are the only ones I've hit in my very limited JSON wrangling. Its possible attractions are that no one has anything widely used that's better, and that it's easy to use from JS. But it seems popular to try and find replacements; e.g. Amazon Ion, which landed a few weeks ago (like JSON but... not quite, and optionally with a binary format). Dave > Whenever you hear a complaint about "using JSON", it's actually a > complaint about bindings for your language, which can be a sensible > critique: gRPC is essentially {protobuf,FlatBuffers} over HTTP/2 plus a > boatload of language mappings. Unfortunately C is not one of those > mappings. > > Paolo > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
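Dave's complaint (a) is easy to demonstrate with nothing but the standard library: JSON's interoperable integer range is bounded by IEEE-754 double precision, which is why QMP cannot rely on uint64 values surviving every parser. A minimal sketch of the hazard (the variable names are illustrative, not anything in QEMU):

```python
import json

u64_max = 2**64 - 1  # the kind of value QEMU needs to pass over QMP

# Python's own json module round-trips arbitrary-precision integers...
assert json.loads(json.dumps(u64_max)) == u64_max

# ...but the interoperable range is only what an IEEE double can hold,
# since many parsers (notably JavaScript's) decode numbers as doubles.
assert int(float(u64_max)) != u64_max         # u64_max is mangled
assert int(float(2**53)) == 2**53             # still exactly representable
assert int(float(2**53 + 1)) == 2**53         # precision loss begins here
```

So whether a 64-bit value survives depends entirely on which parser sits at the other end of the wire, exactly the interoperability pitfall discussed below.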
On Fri, Jul 31, 2020 at 06:28:06PM +0200, Paolo Bonzini wrote: > On 31/07/20 17:07, Daniel P. Berrangé wrote: > > The QAPI JSON-but-not file format is a case where I think we should just > > adopt a standard file format no matter what. A conversion will have some > > short term work, but this is really simple data to deal with and the code > > involved is nicely self contained. Again I'm not saying QAPI maintainers > > must do it, just put the idea out there as a piece of work that would > > be welcomed if someone is interested in working ont. > > The main issues with JSON-but-not in QEMU are: > > - the 64-bit integers, which does not apply to the QAPI schema though > > - the comments, which are a common extension to JSON (see JSON5, NodeJS > config files, json_verify's -c option, etc.) so I find it quite surprising > that no off-the-shelf Python component can parse JSON + comments > > - the single-quote strings, which are not particularly useful in QAPI schema NB our files are not JSON documents, they are a concatenation of a list of JSON documents. > > If we changed the single-quote strings to double-quote, jsonc.vim > (https://github.com/neoclide/jsonc.vim) seems to support JSON + comments. > In Emacs you'd probably add an epilogue like > > (defconst json-mode-comments-re (rx (group "#" (zero-or-more nonl) line-end))) > (push (list json-mode-comments-re 1 font-lock-comment-face) json-font-lock-keywords-1) > > Did I miss anything? > > Besides that, why are we using Python and not JavaScript in the mode line? If you use javascript mode, then emacs will highlight all the javascript language keywords in the comments because it doesn't recognise "#" as a comment - you need // or /* .. */ for comments in JavaScript We could of course convert to genuinely use Javascript comment syntax if we want to consider these files to be JavaScript code instead of JSON data. 
> > Another example would be elimination of anything in QEMU code that is > > duplicating functionality in GLib, even if there is zero functional > > difference between the two impls. > > Would you consider intrusive lists vs. GList/GSList to be duplicating > functionality? I don't think we have that many duplicates at this > point. Yep, I'd have a preference for use of GList / GSList. Anything which uses G_DEFINE_AUTOPTR_CLEANUP_FUNC() automatically gets support for auto cleanup of GList, GSList and GQueue, deep-freeing each element. https://developer.gnome.org/glib/stable/glib-Miscellaneous-Macros.html#g-autolist Some of the io/ subsystem could start making more use of GIO APIs. We didn't use GIO originally as our min-GLib version at the time meant we couldn't use the functionality we needed. I'd also suggest replacing fprintf/printf/sprintf with the g*printf* equivalents. The benefit here is that GLib guarantees that its variants work the same way on Windows as on Linux. They pulled in the GNULIB printf impl to replace Microsoft's broken impl. There's various silly little things like ARRAY_SIZE vs G_N_ELEMENTS, and __attribute__ macros - QEMU_NORETURN vs G_GNUC_NORETURN, QEMU_BUILD_BUG_ON vs G_STATIC_ASSERT. There's QEMU's threads-related wrappers. I vaguely recall there were a couple of cases where GLib's threads APIs didn't do what we needed. Even if we can't eliminate all our threads APIs, I expect we can cull a lot. There's util/buffer.{c,h} that can be replaced by GString or GArray. There's some places where we have misc utility functions that we should just contribute back to GLib upstream, e.g. our util/base64.c which offers improved error checking on base64 decoding. GLib would benefit from this, and while it won't help QEMU immediately, in the long term it will. 
> We have qemu_strto*, but unfortunately the GLib function g_ascii_strtoll > does nothing to fix the awful design of the C standard strto* functions > (especially the overloading of the return value for error and result). > If there are cases that are clear cut against adopting the GLib version, > I think patches would be easily accepted. The g_ascii_strto* functions aren't a drop-in replacement for qemu_strto*, as they have intentionally different semantics, so not an example I was considering here. The GLib functions explicitly always use the C locale, while QEMU's current code honours the current locale. There are some places in QEMU that really should be forcing the C locale, and so g_ascii_strtoll would actually be a bug fix in those cases. Other places simply have to stick with what they're currently using to honour the locale. > > Another example would be adopting a standard code style and using a > > tool like clang-format to enforce this for entire of existing code > > base and future contributions and throwing away our checkpatch.pl > > which nearly everyone is scared of touching as it is Perl code. > > Checkpatch does much more than that, though the scary part is indeed the > one that enforces coding style. I wouldn't have a problem with using > clang-format and calling it a day. If there are things missing that we consider important, a long term better strategy would be to use the Python binding to libclang to detect the problem, instead of trying to parse C code with Perl and regexes. Regards, Daniel
On 31/07/20 19:05, Daniel P. Berrangé wrote: > NB our files are not JSON documents, they are a concatenation of a list > of JSON documents. This is not something that editors generally have problems with. > If you use javascript mode, then emacs will highlight all the javascript > language keywords in the comments because it doesn't recognise "#" as a > comment - you need // or /* .. */ for comments in JavaScript Oops, good point. So yeah, using // comments and double quotes would fix the editor problems and be much kinder to downstreams. > I'd also suggest replacing fprintf/printf/sprintf with the g*printf* > equivalents. The benefit here is that GLib guarantees that its variants > work the same way on Windows as on Linux. They pulled in the GNULIB > printf impl to replace Microsoft's broken impl. That should not be an issue to replace. Most of our printf uses are wrapped within error_report anyway, or they are g_strdup_printf, which we already prefer to the messy libc asprintf for obvious reasons. > There's various silly little things like ARRAY_SIZE vs G_N_ELEMENTS, > and __attribute__ macros - QEMU_NORETURN vs G_GNUC_NORETURN. > QEMU_BUILD_BUG_ON vs G_STATIC_ASSERT > There's util/buffer.{c.h} that can be replaced by GString or GArray. Some of these are no-brainers too. QEMU_BUILD_BUG_ON reverses the logic; I'm not sure if Coccinelle can apply De Morgan's laws. >>> Another example would be adopting a standard code style and using a >>> tool like clang-format to enforce this for entire of existing code >>> base and future contributions and throwing away our checkpatch.pl >>> which nearly everyone is scared of touching as it is Perl code. >> Checkpatch does much more than that, though the scary part is indeed the >> one that enforces coding style. I wouldn't have a problem with using >> clang-format and calling it a day. 
> If there are things missing that we consider important, a long term > better strategy would be to use the Python binding to libclang to > detect the problem, instead of trying to parse C code with Perl and > regexes. Most of it is simply "use this function instead of this one" or "place a comment to explain why you're using this". The main feature of checkpatch.pl however is that it works on patches, not just files, but still there would be a substantial advantage in employing clang-format. Paolo
On Fri, Jul 31, 2020 at 06:35:23PM +0200, Paolo Bonzini wrote: > On 31/07/20 17:26, John Snow wrote: > > I saw the critique that we still use JSON-ish for the runtime QMP > > protocol, and moving the QAPI IDL to a standard wouldn't remove all > > instances of a custom format from our tree. > > Sorry, but "still using JSON" is not a critique that makes any sense. > > 99% of the websites you use daily use JSON as their RPC > frontend-to-backend language; OpenAPI is essentially JSON over HTTP. > There must be something good in JSON. I think it is more about the way we're using JSON to define QMP. We take a raw socket (or really any reliable stream, no matter what its transport), and define the full RPC protocol on it. We define the initial QMP handshake, the separation of commands/replies/events and of course the data format for the content sent. We've invented the whole stack above the raw sockets layer. In the case of typical REST APIs, HTTP provides the core protocol with handshake, and separation of commands/replies. The application is merely declaring JSON to be the data format for the messages. So in the case of REST APIs with JSON, you can use any standard HTTP / REST client library for the protocol part and any standard JSON library for the data (de)serialization. Talking to QEMU, you get to build your own client from first principles. QMP isn't especially complicated, so this isn't a massive burden, but it doesn't exactly give a good first impression either. It also means QMP isn't easily extensible. e.g. if we used HTTP as our base, then we'd get remote TLS support for free from whatever library we used. We could do TLS with QMP, but again we get to build the pieces for this on both client/server side. Using a standard like HTTP would open the door to other interesting ideas, like putting a HTTP proxy on a host, so you can have 1 HTTP server fronting all 1000 VMs on the host, meaning you only need a single port instead of 1000 ports in the firewall. 
Again you can build such an architecture on top of QMP, but it's all wheel reinvention. Regards, Daniel
On Fri, Jul 31, 2020 at 07:16:54PM +0200, Paolo Bonzini wrote: > >>> Another example would be adopting a standard code style and using a > >>> tool like clang-format to enforce this for entire of existing code > >>> base and future contributions and throwing away our checkpatch.pl > >>> which nearly everyone is scared of touching as it is Perl code. > >> Checkpatch does much more than that, though the scary part is indeed the > >> one that enforces coding style. I wouldn't have a problem with using > >> clang-format and calling it a day. > > If there are things missing that we consider important, a long term > > better strategy would be to use the Python binding to libclang to > > detect the problem, instead of trying to parse C code with Perl and > > regexes. > > Most of it is simply "use this function instead of this one" or "place a > comment to explain why you're using this". The main feature of > checkpatch.pl however is that it works on patches, not just files, but > still there would be a substantial advantage in employing clang-format. You say "main feature", I say "biggest flaw" ;-P Doing checks on patches is the single worst thing about the way we do code style validation, as it means the bulk of committed code is never in compliance. The need to check patches is precisely because the committed code is unclean and so can't be checked without raising pages of problems. Once clang-format forces the entire codebase to be in compliance then there is (almost) no reason to check patches at all. Simply apply the patch and check the resulting tree. You do still want a check on the patch to validate Signed-off-by, but that can be done as a standalone script, e.g. in libvirt when using GitLab CI for validating patch series, we have this as a job: https://gitlab.com/libvirt/libvirt-ci/-/blob/master/containers/check-dco/check-dco.py Regards, Daniel
On 31/07/20 19:27, Daniel P. Berrangé wrote: > You say "main feature", I say "biggest flaw" ;-P > > Doing checks on patches is the single worst thing about the way > we do code style validation, at it means the bulk of committed code > is never in compliance. The need to check patches is precisely because > the committed code is unclean and so can't be checked without raising > pages of problems. This is true for code formatting but not for style warnings. A stupid example is that you need to use strtol to implement the recommended replacement qemu_strtol. We could invent our own "allow-this" lint syntax but it would be a much bigger job. Paolo
On 31/07/20 19:20, Daniel P. Berrangé wrote: > It also means QMP isn't easily extensible. eg if we used > HTTP as our base, then we'd get remote TLS support for free from > whatever library we used. ... and we would lose events, unless we do something with HTTP/2 and streaming responses. We would also have to pass the TLS certificates to whatever library we used (which might even be using openssl instead of gnutls). So it's not that simple, and that's why I'm hesitant to see things as a universal improvement without seeing the code. Paolo > We could do TLS with QMP, but again we > get to build the pieces for this on both client/server side. > Using a standard like HTTP would open door to other interesting > ideas, like putting a HTTP proxy on a host, so you can have 1 > HTTP server fronting all 1000 VMs on the host, meaning only need > a single port instead of 1000 ports in the firewall. Again you > can build such an architecture on top of QMP but its all wheel > reinvention.
On 7/31/20 12:35 PM, Paolo Bonzini wrote: > On 31/07/20 17:26, John Snow wrote: >> I saw the critique that we still use JSON-ish for the runtime QMP >> protocol, and moving the QAPI IDL to a standard wouldn't remove all >> instances of a custom format from our tree. > > Sorry, but "still using JSON" is not a critique that makes any sense. > > 99% of the websites you use daily use JSON as their RPC > frontend-to-backend language; OpenAPI is essentially JSON over HTTP. > There must be something good in JSON. > > Whenever you hear a complaint about "using JSON", it's actually a > complaint about bindings for your language, which can be a sensible > critique: gRPC is essentially {protobuf,FlatBuffers} over HTTP/2 plus a > boatload of language mappings. Unfortunately C is not one of those > mappings. > > Paolo > You have misunderstood me. The critique I am relaying, but not raising, is that we already use a custom JSON parser in two or more places, and so replacing one instance of this with a new format actually complicates QEMU instead of simplifies it. I disagree with this concern on the premise that moving one non-standard JSON usage to a standard usage is a win because it reduces the total number of instances of proprietary formats. Further, if we remove ALL instances of proprietary JSON, then we're back to the same level of complexity internally, but with a reduced level of complexity for outside observers. This is why I said "JSON-ish", not "JSON". --js
On 31/07/20 19:53, John Snow wrote: > You have misunderstood me. > > The critique I am relaying, but not raising, is that we already use a > custom JSON parser in two or more places, and so replacing one instance > of this with a new format actually complicates QEMU instead of > simplifies it. > > I disagree with this concern on the premise that moving one non-standard > JSON usage to a standard usage is a win because it reduces the total > number of instances of proprietary formats. > > Further, if we remove ALL instances of proprietary JSON, then we're back > to the same level of complexity internally, but with a reduced level of > complexity for outside observers. I think we should first build a consensus on using "real" JSON (plus Javascript comments) for the schema, which is easy, and then somebody can try his hands at removing the custom JSON parser. I wouldn't conflate the QMP and schema parsers. For example, QMP does not need comments and schemas don't need either bigints or printf-style % interpolation. Paolo
On Thu, Jul 30, 2020 at 12:38 PM Daniel P. Berrangé <berrange@redhat.com> wrote: > > On Thu, Jul 30, 2020 at 11:07:26AM +0200, Markus Armbruster wrote: > > Andrea Bolognani <abologna@redhat.com> writes: > > > > > The various schemas included in QEMU use a JSON-based format which > > > is, however, strictly speaking not valid JSON. > > > > > > As a consequence, when vim tries to apply syntax highlight rules > > > for JSON (as guessed from the file name), the result is an unreadable > > > mess which mostly consist of red markers pointing out supposed errors > > > in, well, pretty much everything. > > > > > > Using Python syntax highlighting produces much better results, and > > > in fact these files already start with specially-formatted comments > > > that instruct Emacs to process them as if they were Python files. > > > > > > This commit adds the equivalent special comments for vim. > > > > > > Signed-off-by: Andrea Bolognani <abologna@redhat.com> > > Given that we already have emacs mode-lines, I see no reason to > not also have vim mode-lines, so regardless of the deeper discussion > I think this patch is fine to merge in the short term > > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> > > > > Naming QAPI schema files .json even though their contents isn't was a > > mistake. Correcting it would be a pain. If we correct it, then the > > sooner the better. > > > > Renaming them to .py gives decent editor support out of the box. Their > > contents isn't quite Python, though: true vs. True, false vs. False. Do > > we care? Only a few dozen occurrences; they could be adjusted. > > > > Renaming them to .qapi would perhaps be less confusing, for the price of > > "out of the box". > > IMHO, the critical rule is that if you pick a particular file extension > associated with an existing language, you absolutely MUST BE compliant > with that language. 
> > We fail at compliance with both JSON and Python because we're actually > using our own DSL (domain specific language). > > IOW if we're going to stick with our current file format, then we should > be naming them .qapi. We can still use an editor mode line if we want to > claim we're approximately equiv to another language, but we can't be > surprised if editors get upset. > > > The bigger question is whether having our own DSL is justified? > > I'm *really* sceptical that it is. > > > We can't use JSON because it lacks comments. So we invented our own > pseudo-JSON parser that supported comments, and used ' instead of " > for some reason. We also needed to be able to parse a sequence of > multiple JSON documents in one file. We should have just picked a > different language because JSON clearly didn't do what we needed. > > You suggest naming them .py. If we do that, we must declare that they > are literally Python code and modify them so that we can load the > files straight into the Python interpreter as code, and not parse > them as data. I feel unhappy about treating data as code though. > > > While JSON doesn't do what we need, its second-cousin YAML is a more > flexible format. Taking one example > > --- > ## > # @ImageInfoSpecificQCow2: > # > # @compat: compatibility level > # > # ...snip... > # > # Since: 1.7 > ## > struct: ImageInfoSpecificQCow2 > data: > compat: str > "*data-file": str > "*data-file-raw": bool > "*lazy-refcounts": bool > "*corrupt": bool > refcount-bits: int > "*encrypt": ImageInfoSpecificQCow2Encryption > "*bitmaps": > - Qcow2BitmapInfo > compression-type: Qcow2CompressionType > > > Then we could use a regular off the shelf YAML parser in python. > > The ugliness with quotes is due to the use of "*". Slightly less ugly > if we simply declare that quotes are always used, even where they're > not strictly required. 
> > struct: ImageInfoSpecificQCow2 > data: > "compat": "str" > "*data-file": "str" > "*data-file-raw": "bool" > "*lazy-refcounts": "bool" > "*corrupt": "bool" > "refcount-bits": "int" > "*encrypt": "ImageInfoSpecificQCow2Encryption" > "*bitmaps": > - "Qcow2BitmapInfo" > "compression-type": "Qcow2CompressionType" > > With the use of "---" to denote the start of document, we have no trouble > parsing our files which would actually be a concatenation of multiple > documents. The python YAML library provides the easy yaml.load_all() > method. We had the same issue in vdsm. Someone ported the qemu "json" schema to vdsm, probably when the plan was to add a C API to vdsm, which never happened. My first patch to vdsm was fixing the parser for this "json" format, because it used to get into an endless loop if an unknown token was found. We hated this format, and finally replaced it with yaml. But we did not keep the comments, since they duplicate data which is already in the json part, and are not portable to other formats. Here is the patch adding a schema converter from the qemu "json" format to standard yaml: https://github.com/oVirt/vdsm/commit/e57b69e72987c0929b20306c454835b52b5eb7ee The current version of the new yaml based schema: https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml We don't use comments, so the yaml is portable to json or a regular python dict. In fact, we use a pickle of the parsed schema for 5 times faster loading, which is important since we use the schema in the command line client. Having the comments as part of the schema allows nice things like verifying requests and generating help messages directly from the schema. 
This is not a good example because our implementation is poor, but: $ vdsm-client Host getDeviceList -h usage: vdsm-client Host getDeviceList [-h] [arg=value [arg=value ...]] positional arguments: arg=value storageType: Only return devices of this type guids: Only return info on specific list of block device GUIDs checkStatus: Indicates if device status should be checked JSON representation: { "storageType": { "BlockDeviceType": "enum ['FCP', 'MIXED', 'iSCSI']" }, "guids": [ "string", {} ], "checkStatus": "boolean" } optional arguments: -h, --help show this help message and exit vdsm-client knows nothing about the vdsm API and we never have to change it, because it generates the command line interface and the help messages from the schema on the fly, and its input and output is json. vdsm/client.py is similar, providing the vdsm API without knowing anything about the API, or requiring changes when APIs are added or modified, because everything is done by inspecting the schema: >>> from vdsm import client >>> c = client.connect("localhost") >>> c.Host.getDeviceList(storageType="FCP", checkStatus=False) [] >>> print(c.Host) <vdsm.client.Namespace object at 0x7fcda017fa58> >>> print(c.Host.getDeviceList) functools.partial(<bound method _Client._call of <vdsm.client._Client object at 0x7fcda757e0f0>>, 'Host', 'getDeviceList') I think inventing DSLs and developing tools is wrong. Use standard formats and tools and spend the time on the core of the project. Nir
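To make the YAML proposal above concrete: a stock parser already handles the "---"-separated concatenation of documents that the current QAPI parser implements by hand. A minimal sketch, assuming PyYAML is available (the third-party dependency this proposal would add); the field names are borrowed from Daniel's example, shortened for brevity:

```python
import yaml  # PyYAML; a new third-party dependency under this proposal

documents = """\
---
struct: ImageInfoSpecificQCow2
data:
  "compat": "str"
  "*data-file": "str"
  "*lazy-refcounts": "bool"
---
enum: Qcow2CompressionType
data:
  - "zlib"
  - "zstd"
"""

# safe_load_all() iterates over the '---'-separated documents, yielding
# one plain dict per schema expression - no custom lexer needed.
exprs = list(yaml.safe_load_all(documents))
```

Each element of `exprs` is then an ordinary dict such as `{'struct': 'ImageInfoSpecificQCow2', 'data': {...}}`, i.e. the same shape QAPISchemaParser hands to the upper layer today.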
Daniel P. Berrangé <berrange@redhat.com> writes: > On Fri, Jul 31, 2020 at 11:26:28AM -0400, John Snow wrote: >> > The long answer is that as a general philosophy I'm in favour of aggressively >> > eliminating anything that is custom to a project and isn't offering a >> > compelling benefit over a functionally equivalent, commonly used / standard >> > solution. >> > >> >> I agree as violently as I know how. The purpose of this is not for us, it's >> for the ecosystem. >> >> I saw the critique that we still use JSON-ish for the runtime QMP protocol, >> and moving the QAPI IDL to a standard wouldn't remove all instances of a >> custom format from our tree. > > I'd consider the runtime protocol separately. In terms of what's on the > wire, we use genuine JSON format. Yes. Fine print: QMP accepts (but does not emit) minor extensions[*]. RFC 8259 actually permits this: "A JSON parser MAY accept non-JSON forms or extensions." This is due to lazy implementation, not due to an actual need. We could deprecate and remove. > The Runtime problem is simply that JSON > standard is useless when it comes to integers, leaving behaviour undefined > in the standard if you exceed 53 bits of precision. So there's no way to > reliably parse unsigned 64-bit integers. Given that QEMU needs to pass > uint64 values, JSON was simply the wrong choice of format for QMP. To be precise: "An implementation may set limits on the range and precision of numbers." Not quite undefined behavior. Useless all the same. There are more interoperability pitfalls due to JSON's notoriously useless minimal requirements. Range of numbers is the most relevant one, because so many implementations set useless numeric limits, and there is no good way to work around the issue. > There's a 3rd aspect which is our source code that deals with JSON, where > we defined some JSON extensions to make it easier for C code to construct > JSON documents for sending over the wire. 
Example: qmp_error_response() uses the interpolation extension to safely insert values into a JSON syntax tree. rsp = qdict_from_jsonf_nofail("{ 'error': { 'class': %s, 'desc': %s } }", QapiErrorClass_str(error_get_class(err)), error_get_pretty(err)); Two extensions, actually: 1. Interpolation: tokens starting with % get replaced by the corresponding variable argument converted to a QObject. 2. Single-quoted strings to avoid leaning toothpick syndrome. Without interpolation, we'd have to construct the tree by hand, like this: error = qdict_new(); qdict_put_str(error, "class", QapiErrorClass_str(error_get_class(err))); qdict_put_str(error, "desc", error_get_pretty(err)); rsp = qdict_new(); qdict_put(rsp, "error", error); Much harder to read even for such a simple example. > Back when we did this, it was a > reasonably good idea as no obvious alternative existed for this problem. > Today, I would just suggest using GLib's GVariant feature, which solves > the same problem for GLib's DBus APIs. > > It is a shame we didn't just use DBus back in the day as that's a well > specified, simple protocol that would have done everything we needed, > including the ability to actually represent integers reliably. We > would be able to trivially talk to QEMU from any programming language, > and use common DBus code-generator tools instead of writing code > generators ourselves. Water under the bridge. [...] [*] Single-quoted strings and \' in strings.
Paolo Bonzini <pbonzini@redhat.com> writes: > On 31/07/20 19:53, John Snow wrote: >> You have misunderstood me. >> >> The critique I am relaying, but not raising, is that we already use a >> custom JSON parser in two or more places, and so replacing one instance >> of this with a new format actually complicates QEMU instead of >> simplifies it. >> >> I disagree with this concern on the premise that moving one non-standard >> JSON usage to a standard usage is a win because it reduces the total >> number of instances of proprietary formats. >> >> Further, if we remove ALL instances of proprietary JSON, then we're back >> to the same level of complexity internally, but with a reduced level of >> complexity for outside observers. > > I think we should first build a consensus on using "real" JSON (plus > Javascript comments) for the schema, which is easy, and then somebody > can try his hands at removing the custom JSON parser. > > I wouldn't conflate the QMP and schema parsers. For example, QMP does > not need comments and schemas don't need either bigints or printf-style > % interpolation. Seconded. QAPI schema syntax and the QMP syntax are totally separate. Heck, the whole *languages* are. They happen to use vaguely similar concrete syntax, a bit like C, Java and JavaScript do. That's all. Let's keep this thread focused on the QAPI schema language.
Paolo Bonzini <pbonzini@redhat.com> writes: > On 31/07/20 17:07, Daniel P. Berrangé wrote: >> The QAPI JSON-but-not file format is a case where I think we should just >> adopt a standard file format no matter what. A conversion will have some >> short term work, but this is really simple data to deal with and the code >> involved is nicely self contained. Again I'm not saying QAPI maintainers >> must do it, just put the idea out there as a piece of work that would >> be welcomed if someone is interested in working ont. > > The main issues with JSON-but-not in QEMU are: > > - the 64-bit integers, which does not apply to the QAPI schema though QMP only. The QAPI schema does not use JSON numbers at all so far. > - the comments, which are a common extension to JSON (see JSON5, NodeJS > config files, json_verify's -c option, etc.) so I find it quite surprising > that no off-the-shelf Python component can parse JSON + comments QAPI schema only. QMP does not support comments. > - the single-quote strings, which are not particularly useful in QAPI schema Every single string in the QAPI schema uses them, though. I have no idea why Anthony put them in the QAPI schema language. We could remove them from the QAPI schema language. Flag day, and git-blame becomes pretty much useless for a couple of years. Removing them from QMP would be painless for QEMU itself, but could upset QMP clients that (unwisely) use them. > If we changed the single-quote strings to double-quote, jsonc.vim > (https://github.com/neoclide/jsonc.vim) seems to support JSON + comments. > In Emacs you'd probably add an epilogue like > > (defconst json-mode-comments-re (rx (group "#" (zero-or-more nonl) line-end))) > (push (list json-mode-comments-re 1 font-lock-comment-face) json-font-lock-keywords-1) > > Did I miss anything? Let me reiterate that parsing the lower layer of the QAPI schema language is the trivial part (because we made it trivial, accepting a considerable hit to ergonomics). 
For basic editor support, parsing the lower layer is all you need. But to truly work with the schema, you need to grok the (less than trivial!) upper layer. Making the lower layer slightly more trivial is not going to help with that. We can of course indulge in buyer's remorse on our choice to develop our own schema language for QAPI. But that's separate discussion. As long as we have our own QAPI schema language, we should use a single frontend for working with it. > Besides that, why are we using Python and not JavaScript in the mode line? Falls apart for # comments. JavaScript uses // and /* */. [...]
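Markus's point that the lower layer is trivial can be illustrated with a stdlib-only sketch. This is a hypothetical simplification, not QAPISchemaParser itself; it naively assumes '#' and single quotes never occur inside string literals, which the real parser handles properly:

```python
import json
import re

def parse_qapi_lower_layer(text):
    """Parse a concatenation of JSON-with-extensions documents.

    Handles the two lower-layer extensions discussed in this thread:
    '#' comments and single-quoted strings (naively, see above).
    """
    # Strip '#' comments line by line (assumes no '#' inside strings).
    stripped = "\n".join(re.sub(r"#.*", "", line)
                         for line in text.splitlines())
    # Map the single-quote extension onto standard JSON (assumes no
    # embedded quotes or escapes).
    stripped = stripped.replace("'", '"')

    decoder = json.JSONDecoder()
    exprs = []
    idx = 0
    while True:
        # Skip the whitespace between concatenated documents.
        while idx < len(stripped) and stripped[idx].isspace():
            idx += 1
        if idx >= len(stripped):
            return exprs
        # raw_decode() parses one document and reports where it ended,
        # which is what makes the concatenation format easy to handle.
        expr, idx = decoder.raw_decode(stripped, idx)
        exprs.append(expr)

schema = """
# Example definitions in the QAPI lower-layer syntax.
{ 'struct': 'Point',
  'data': { 'x': 'int', 'y': 'int' } }

{ 'enum': 'Color', 'data': [ 'red', 'green' ] }
"""

exprs = parse_qapi_lower_layer(schema)
```

The result is a list of plain dicts; everything hard (doc comments, the upper-layer semantics) happens after this point, which is exactly the argument being made here.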
On 31/07/20 17:44, Daniel P. Berrangé wrote: > I'd consider the runtime protocol separately. In terms of what's on the > wire, we use genuine JSON format. The runtime problem is simply that JSON > standard is useless when it comes to integers, leaving behaviour undefined > in the standard if you exceed 53 bits of precision. So there's no way to > reliably parse unsigned 64-bit integers. Given that QEMU needs to pass > uint64 values, JSON was simply the wrong choice of format for QMP. JSON's 53-bit precision was not part of RFC4627, which was the JSON specification in 2010. They say hindsight is 20/20, but referring to RFC7159 which would come 4 years later is rewriting history, not hindsight. > There's a 3rd aspect which is our source code that deals with JSON, where > we defined some JSON extensions to make it easier for C code to construct > JSON documents for sending over the wire. Back when we did this, it was a > reasonably good idea as no obvious alternative existed for this problem. > Today, I would just suggest using GLib's GVariant feature, which solves > the same problem for GLib's DBus APIs. Many years ago actually I tried replacing QObject with GVariant. I'm pretty sure the code for that experiment is lost but it took me just a couple days so it could be redone. The only issue was that QObjects are mutable so some instances of QString had to be replaced with GString. (A small part of it was merged as commit 9bada8971173345ceb37ed1a47b00a01a4dd48cf for unrelated reasons). > It is a shame we didn't just use DBus back in the day as that's a well > specified, simple protocol that would have done everything we needed, > including the ability to actually represent integers reliably. We > would be able to trivially talk to QEMU from any programming language, > and use common DBus code-generator tools instead of writing code > generators ourselves. 
Not really, DBus doesn't provide the extensibility that we get from optional arguments in commands and optional fields in structs. Again, we may discuss the QMP protocol itself, but JSON *was chosen for a reason*. Paolo
On 03/08/20 10:18, Markus Armbruster wrote: >> - the single-quote strings, which are not particularly useful in QAPI schema > Every single string in the QAPI schema uses them, though. > > I have no idea why Anthony put them in the QAPI schema language. > > We could remove them from the QAPI schema language. Flag day, and > git-blame becomes pretty much useless for a couple of years. Is that a nack or a "whatever"? Paolo > Removing them from QMP would be painless for QEMU itself, but could > upset QMP clients that (unwisely) use them. >
Kevin Wolf <kwolf@redhat.com> writes: > On 31.07.2020 at 11:01, Markus Armbruster wrote: >> Kevin Wolf <kwolf@redhat.com> writes: >> >> > On 30.07.2020 at 17:11, Eric Blake wrote: >> >> > JSON is an exceptionally poor choice for a DSL, or even a configuration >> >> > language. >> >> > >> >> > Correcting our mistake involves a flag day and a re-learn. We need to >> >> > weigh costs against benefits. >> >> > >> >> > The QAPI schema language has two layers: >> >> > >> >> > * JSON, with a lexical and a syntactical sub-layer (both in parser.py) >> >> An incompatible dialect of JSON with a doc comment language, actually. >> >> The need to keep doc generation working could complicate replacing the >> lower layer. > > Good point, we would have to keep the comment parser either way to be > used on top of the YAML (or whatever) parser. > > Whatever parser we use would have to actually make comments available > rather than immediately filtering them out. This might exist for most > languages, but it will probably not be the most commonly used parser > either (or at least it will not allow using a simple interface like > json.loads() in Python). > >> >> > >> >> > * QAPI, with a context-free and a context-dependent sub-layer (in >> >> > expr.py and schema.py, respectively) >> >> > >> >> > Replacing the JSON layer is possible as long as the replacement is >> >> > sufficiently expressive (not a tall order). >> >> >> >> I'm open to the idea, if we want to attempt it, and agree with the >> >> assessment that it is not a tall order. >> >> Careful, "not a tall order" is meant to apply to the "sufficiently >> expressive" requirement for a replacement syntax. >> >> On actually replacing the lower layer, I wrote "we need to weigh costs >> against benefits." >> >> > I'm not so sure about that. 
I mean, it certainly sounds doable if need >> > be, but getting better syntax highlighting by default in some editors >> > feels like a pretty weak reason to switch out the complete schema >> > language. >> > >> > At first I was going to say "but if you don't have anything else to do >> > with your time...", but it's actually not only your time, but the time >> > of everyone who has development branches or downstream repositories and >> > will suffer rather nasty merge conflicts. So this will likely end up >> > having a non-negligible cost. >> >> Yup. >> >> > So is there more to it or are we really considering doing this just >> > because editors can tell more easily what to do with a different syntax? >> >> If memory serves, the following arguments have been raised: >> >> 1. A chance to improve ergonomics for developers >> >> Pain points include >> >> - Confusion >> >> It claims to be JSON, but it's not. >> >> - Need to learn another syntax >> >> Sunk cost for old hands, but it's a valid point all the same. > > We use a similar (same?) form of "almost JSON" for QMP which will still > exist. So we'd be moving from having to learn one (non-standard) > language to two languages (one still non-standard and another one that > is hopefully more standard). QMP is JSON (no almost). It accepts single-quoted strings as an extension (but does not produce them). This is permitted by the RFC. We can get rid of the extension if it irks us. There's also the QMP-generating language in the template string of qobject_from_jsonf_nofail() & friends. Helps keeping C code readable. This template language is definitely not JSON (not even almost). >> - Poor tool support >> >> JSON tools don't work. Python tools do, but you may have to work >> around the issue of true, false. > > This is mostly the editor question this patch was about, right? Or are > people trying to use more sophisticated tools on it? 
I occasionally paste schema bits into Python (working around the true/false issue), for quick ad hoc hackery, where hooking into the real frontend would be overkill. Anything more advanced than that would be a bad idea, in my opinion. Use the real frontend. >> - Excessive quoting >> >> - Verbosity >> >> When all you have is KEY: VALUE, defining things with multiple >> properties becomes verbose like >> >> 'status': { 'type': 'DirtyBitmapStatus', >> 'features': [ 'deprecated' ] } >> >> We need syntactic sugar to keep verbosity in check for the most >> common cases. More complexity. > I don't think this is something any of the suggested languages would > address. Correct. >> - No trailing comma in arrays and objects >> >> - No way to split long strings for legibility >> >> - The doc comment language is poorly specified >> >> - Parse error reporting could be better (JSON part) / could hardly be >> worse (doc comment part) > Has anyone looked into what error messages are like for the suggested > alternatives? "error reporting could be better" is something that is > true for a lot of software. > > The doc comment part is not going to change unless we get rid of > comments and actually make documentation part of the objects themselves. > This might not be very readable. With decent string syntax, the doc comment blocks could be turned into strings. But then we'd parse the strings instead, so no real change. > Or I should rather say, making the doc comment part change is possible, > but would require the same changes as with our current lower layer > language and parser. > >> 2. Not having to maintain our own code for the lower layer >> >> I consider this argument quite weak. parser.py has some 400 SLOC. >> Writing and rewriting it is sunk cost. Keeping it working has been >> cheap. Keeping the glue for some off-the-shelf parser working isn't >> free, either. No big savings to be had here, sorry. 
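The paste-into-Python trick works because the dialect's only non-Python tokens are the bool (and null) literals; binding them first is enough. A small sketch of the workaround (the pasted snippet is illustrative, not taken verbatim from the QEMU schema):

```python
# Map the JSON literals onto Python's before pasting schema bits.
true, false, null = True, False, None

# A pasted schema expression is then a plain Python dict:
snippet = { 'command': 'quit', 'allow-preconfig': true }
```

After this, the expression can be poked at interactively like any other Python data.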
>> >> Almost half of parser.py is about doc comments, and it's the hairier >> part by far. Peter has patches to drag the doc comment language >> closer to rST. I don't remember whether they shrink parser.py. >> >> 3. Make the schema more easily consumable by other programs >> >> Use of a "standard" syntax instead of our funky dialect of JSON means >> other programs can use an off-the-shelf parser instead of using or >> reimplementing parser.py. >> >> Valid point for programs that parse the lower layer, and no more, say >> for basic syntax highlighting. >> >> Pretty much irrelevant for programs that need to go beyond the lower >> layer. Parsing the lower layer is the easy part. The code dealing >> with the upper layer is much larger (expr.py and schema.py), and it >> actually changes as we add features to the schema language. >> Duplicating it would be a Bad Idea. Reuse the existing frontend >> instead. > Do other programs that go beyond syntax highlighting actually get to > parse our QAPI schema definitions? Or would they rather deal with the > return value of query-qmp-schema? query-qmp-schema is for introspecting QMP. It tells you what *this* build of QEMU's QMP can do. The schema tells you what QMP can do in *any* build of this version of QEMU, and more. To introspect QMP, process output of query-qmp-schema. To work with the QAPI schema, interface with the frontend from scripts/qapi/. > Neither the QAPI schema nor a YAML file with the same structure are a > standard approach to describe JSON documents. So even if we replace JSON > in the lower layer, the whole thing (and as you say, the upper layer is > the more interesting part) still stays non-standard in a way and more > advanced tools can't be used with it. 
> > And of course, even if we did use something more standard like JSON > Schema or whatever exists for YAML, we would still have to massively > extend it because the QAPI schema contains much more information than > just what would be needed to validate some input. We control all aspects > of generated C code with it. Yup. "IDL for QMP" is just one aspect of QAPI.
On Fri, Jul 31, 2020 at 07:42:52PM +0200, Paolo Bonzini wrote: > On 31/07/20 19:27, Daniel P. Berrangé wrote: > > You say "main feature", I say "biggest flaw" ;-P > > > > Doing checks on patches is the single worst thing about the way > > we do code style validation, as it means the bulk of committed code > > is never in compliance. The need to check patches is precisely because > > the committed code is unclean and so can't be checked without raising > > pages of problems. > > This is true for code formatting but not for style warnings. A stupid > example is that you need to use strtol to implement the recommended > replacement qemu_strtol. We could invent our own "allow-this" lint > syntax but it would be a much bigger job. Yes, I assumed use of a mechanism for identifying exceptions, as that is pretty standard practice for any tool that's doing whole source tree validation. This would still be better than what we have today because developers reading or copying the code can actually see whether the style violation was intentional, as opposed to historical accident, and thus are less likely to submit patches with style violations. Regards, Daniel
On Fri, Jul 31, 2020 at 07:47:01PM +0200, Paolo Bonzini wrote: > On 31/07/20 19:20, Daniel P. Berrangé wrote: > > It also means QMP isn't easily extensible. eg if we used > > HTTP as our base, then we'd get remote TLS support for free from > > whatever library we used. > > ... and we would lose events, unless we do something with HTTP/2 and > streaming responses. We would also have to pass the TLS certificates to > whatever library we used (which might even be using openssl instead of > gnutls). So it's not that simple, and that's why I'm hesitant to see > things as a universal improvement without seeing the code. I didn't mean to suggest that QEMU should use HTTP, I was just comparing QMP use of JSON vs the common webservices using REST with JSON. HTTP/2 streaming is not required for async events though, there's a variety of ways to do that with HTTP/1.1. Open an HTTP connection and issue GET for /events, and the server will then simply not respond until it has an event ready. Once an event is received, that request is complete. With connection reuse enabled though, another GET request can be made to wait for the next event without the overhead of re-establishing the connection. Alternatively a single HTTP request can be used, but with the response using chunked encoding so that it can send back an arbitrary unbounded amount of data spread over time. The client can receive and process this data on the fly without waiting for the request to complete. Regards, Daniel
On Mon, Aug 03, 2020 at 10:18:29AM +0200, Markus Armbruster wrote: > Paolo Bonzini <pbonzini@redhat.com> writes: > > > - the single-quote strings, which are not particularly useful in QAPI schema > > Every single string in the QAPI schema uses them, though. > > I have no idea why Anthony put them in the QAPI schema language. > > We could remove them from the QAPI schema language. Flag day, and > git-blame becomes pretty much useless for a couple of years. I don't think the git-blame issue is a big deal, just a minor inconvenience. Say you find the line of code you are examining hits the commit which did the refactoring. You merely have to run git blame a second time passing SHA-OF-REFACTOR^1 as an arg. Regards, Daniel
Paolo Bonzini <pbonzini@redhat.com> writes: > On 31/07/20 17:44, Daniel P. Berrangé wrote: >> I'd consider the runtime protocol separately. In terms of what's on the >> wire, we use genuine JSON format. The runtime problem is simply that JSON >> standard is useless when it comes to integers, leaving behaviour undefined >> in the standard if you exceed 53 bits of precision. So there's no way to >> reliably parse unsigned 64-bit integers. Given that QEMU needs to pass >> uint64 values, JSON was simply the wrong choice of format for QMP. > > JSON's 53-bit precision was not part of RFC4627, which was the JSON > specification in 2010. They say hindsight is 20/20, but referring to > RFC7159 which would come 4 years later is rewriting history, not hindsight. RFC 4627's original sin: 4. Parsers [...] An implementation may set limits on the size of texts that it accepts. An implementation may set limits on the maximum depth of nesting. An implementation may set limits on the range of numbers. An implementation may set limits on the length and character contents of strings. Giving license to set limits is not a sin, absence of minimal limits is. Became an actual problem only when many implementations followed the early JavaScript implementation(s), which had inherited JavaScript's quick-and-dirty approach to numerical towers: double should be enough for anybody. >> There's a 3rd aspect which is our source code that deals with JSON, where >> we defined some JSON extensions to make it easier for C code to construct >> JSON documents for sending over the wire. Back when we did this, it was a >> reasonably good idea as no obvious alternative existed for this problem. >> Today, I would just suggest using GLib's GVariant feature, which solves >> the same problem for GLib's DBus APIs. > > Many years ago actually I tried replacing QObject with GVariant. I'm > pretty sure the code for that experiment is lost but it took me just a > couple days so it could be redone. 
The only issue was that QObjects are > mutable so some instances of QString had to be replaced with GString. > > (A small part of it was merged as commit > 9bada8971173345ceb37ed1a47b00a01a4dd48cf for unrelated reasons). > >> It is a shame we didn't just use DBus back in the day as that's a well >> specified, simple protocol that would have done everything we needed, >> including the ability to actually represent integers reliably. We >> would be able to trivially talk to QEMU from any programming language, >> and use common DBus code-generator tools instead of writing code >> generators ourselves. > > Not really, DBus doesn't provide the extensibility that we get from > optional arguments in commands and optional fields in structs. Enables compatible evolution of interfaces, which has been a massive win for us. > Again, > we may discuss the QMP protocol itself, We can discuss everything, but please not everything in the same thread. This one is / tries to be about the QAPI schema language. > but JSON *was chosen for a reason*. In our field, dissing prior technical choices without making an effort to understand the tradeoffs that led to them is an ancient tradition :)
Paolo Bonzini <pbonzini@redhat.com> writes: > On 03/08/20 10:18, Markus Armbruster wrote: >>> - the single-quote strings, which are not particularly useful in QAPI schema >> Every single string in the QAPI schema uses them, though. >> >> I have no idea why Anthony put them in the QAPI schema language. >> >> We could remove them from the QAPI schema language. Flag day, and >> git-blame becomes pretty much useless for a couple of years. > > Is that a nack or a "whatever"? It's "is this really worth the trouble?" I guess that's halfway between NAK and whatever, ready to be moved in either direction by arguments :) [...]
Daniel P. Berrangé <berrange@redhat.com> writes: > On Mon, Aug 03, 2020 at 10:18:29AM +0200, Markus Armbruster wrote: >> Paolo Bonzini <pbonzini@redhat.com> writes: >> >> > - the single-quote strings, which are not particularly useful in QAPI schema >> >> Every single string in the QAPI schema uses them, though. >> >> I have no idea why Anthony put them in the QAPI schema language. >> >> We could remove them from the QAPI schema language. Flag day, and >> git-blame becomes pretty much useless for a couple of years. > > I don't think the git-blame issue is a big deal, just a minor inconvenience. > Say you find the line of code you are examining hits the commit which did > the refactoring. You merely have to run git blame a second time passing > SHA-OF-REFACTOR^1 as an arg. I do that all the time, and it's bloody annoying, especially when I have to recurse more than once. Show-stopper? Nah. Still, I insist it is put on the scales when we weigh the tradeoffs.
On Mon, Aug 03, 2020 at 10:41:22AM +0200, Paolo Bonzini wrote: > On 31/07/20 17:44, Daniel P. Berrangé wrote: > > I'd consider the runtime protocol separately. In terms of what's on the > > wire, we use genuine JSON format. The runtime problem is simply that JSON > > standard is useless when it comes to integers, leaving behaviour undefined > > in the standard if you exceed 53 bits of precision. So there's no way to > > reliably parse unsigned 64-bit integers. Given that QEMU needs to pass > > uint64 values, JSON was simply the wrong choice of format for QMP. > > JSON's 53-bit precision was not part of RFC4627, which was the JSON > specification in 2010. They say hindsight is 20/20, but referring to > RFC7159 which would come 4 years later is rewriting history, not hindsight. I wasn't referring to RFC7159. The problem of undefined integer precision with JSON came up right at the very start when QMP was first designed and implemented, and has come up again periodically ever since then. libvirt needed to do workarounds right at the start in 2009, in order to fully handle signed/unsigned 64-bit integers with QMP. Regards, Daniel
On 03/08/20 13:28, Markus Armbruster wrote: >>> We could remove them from the QAPI schema language. Flag day, and >>> git-blame becomes pretty much useless for a couple of years. >> Is that a nack or a "whatever"? > It's "is this really worth the trouble?" I guess that's halfway between > NAK and whatever, ready to be moved in either direction by arguments :) In general it seems like a good idea to use a standard file format and not "a standard file format except for two characters". :) We also wouldn't be having discussions on editors. That said, after a bit more research I'm skeptical about the possibility of using an off-the-shelf parser because most of them either don't support comments, or are based on YAJL which simply discards comments. Since '//' comments are harder to parse than "#" comments, this would actually _add_ code instead of removing it. Also since our doc comment syntax uses "##" as a delimiter, we'd have to bikeshed what the doc comments would look like ("//!", "///", etc.). This means the two parts might be considered separately: - replacing single-quote with double-quote strings - replacing # comments with // Paolo
On 01/08/20 01:12, Nir Soffer wrote: > I think inventing DSLs and developing tools is wrong. Use standard > format and tools and spend time on the core of the project. Please don't apply 2020 standards to choices that were made in 2009. Or if you do, be ready to contribute code. Paolo
On 03/08/20 13:36, Daniel P. Berrangé wrote: >>> Given that QEMU needs to pass >>> uint64 values, JSON was simply the wrong choice of format for QMP. > > I wasn't refering to RFC7159. The problem of undefined integer precision > with JSON came up right at the very start when QMP was first designed and > implemented, and has come up again periodically ever since then. libvirt > needed to do workarounds right at the start in 2009, in order to fully > handle signed/unsigned 64-bit integers with QMP. I assume the workaround you refer to is to store the number as a string and converting it lazily to either an integer or a floating-point type in whoever uses the JSON API. It may not be pretty but probably it would have been the same for any text-based, schema-less protocol. For example, it didn't require writing your own parser. It could be avoided by using a schema in Libvirt, just like QEMU has no problem with it on the other side; it's just a different design choice with different trade-offs, I don't think it's enough of an issue to declare JSON "the wrong choice of format for QMP". Paolo
On Mon, Aug 03, 2020 at 02:16:19PM +0200, Paolo Bonzini wrote: > On 03/08/20 13:36, Daniel P. Berrangé wrote: > >>> Given that QEMU needs to pass > >>> uint64 values, JSON was simply the wrong choice of format for QMP. > > > > I wasn't refering to RFC7159. The problem of undefined integer precision > > with JSON came up right at the very start when QMP was first designed and > > implemented, and has come up again periodically ever since then. libvirt > > needed to do workarounds right at the start in 2009, in order to fully > > handle signed/unsigned 64-bit integers with QMP. > > I assume the workaround you refer to is to store the number as a string > and converting it lazily to either an integer or a floating-point type > in whoever uses the JSON API. It may not be pretty but probably it > would have been the same for any text-based, schema-less protocol. For > example, it didn't require writing your own parser. Yes, we get the raw values as a string, but not all parsers for C allow you to do this. We'd really love to move off YAJL for JSON parsing, but the most viable alternatives don't have a way to let you get the string before they parse it as an integer and report errors. > It could be avoided by using a schema in Libvirt, just like QEMU has no > problem with it on the other side; it's just a different design choice > with different trade-offs, I don't think it's enough of an issue to > declare JSON "the wrong choice of format for QMP". The schema doesn't help - the problem is many JSON parsers don't allow use of full uint64 values when parsing - a lot will simply report an error for anything bigger than LLONG_MAX and offer no workaround. Regards, Daniel
On 03/08/20 14:23, Daniel P. Berrangé wrote: > We'd really love to move off YAJL for JSON parsing What are the issues with YAJL? >> It could be avoided by using a schema in Libvirt, just like QEMU has no >> problem with it on the other side; it's just a different design choice >> with different trade-offs, I don't think it's enough of an issue to >> declare JSON "the wrong choice of format for QMP". > > The schema doesn't help - the problem is many JSON parsers don't allow > use of full uint64 values when parsing - alot will simply report an > error for anything bigger than LLONG_MAX and offer no workaround. Sure, but this problem is not at all unique to QEMU and JSON parsers have a way to support large integers in pretty much every language (including Javascript). In some of them like Python or Ruby it's even the default behavior. Paolo
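Paolo's claim about defaults is easy to check for Python: the stock json module parses integer tokens into Python's arbitrary-precision int, so UINT64_MAX round-trips exactly, while any parser that funnels numbers through a C double cannot represent it. A quick sketch (mine, not from the thread):

```python
import json

v = json.loads("18446744073709551615")   # UINT64_MAX
assert v == 2**64 - 1                    # parsed exactly, no precision loss

# A double has a 53-bit mantissa, so the same value is silently
# rounded up to the next power of two when forced through float:
assert int(float(v)) == 2**64
```

This is exactly the "loosing precision behind the scenes" failure mode when a parser converts to float instead of erroring out.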
On Mon, Aug 03, 2020 at 02:33:36PM +0200, Paolo Bonzini wrote: > On 03/08/20 14:23, Daniel P. Berrangé wrote: > > We'd really love to move off YAJL for JSON parsing > > What are the issues with YAJL? It is abandonware for more than 5 years now, with an ever growing list of bug reports and pull requests being ignored. This isn't good when the JSON parser is a security critical part of the interface between libvirt and QEMU. We tried to switch to Jansson but that raises hard errors for values above LLONG_MAX and has no backdoor to work around this. There are various other C JSON parsers which have their own flaws such as C namespace pollution in headers, or the same integer parsing problems. > >> It could be avoided by using a schema in Libvirt, just like QEMU has no > >> problem with it on the other side; it's just a different design choice > >> with different trade-offs, I don't think it's enough of an issue to > >> declare JSON "the wrong choice of format for QMP". > > > > The schema doesn't help - the problem is many JSON parsers don't allow > > use of full uint64 values when parsing - alot will simply report an > > error for anything bigger than LLONG_MAX and offer no workaround. > > Sure, but this problem is not at all unique to QEMU and JSON parsers > have a way to support large integers in pretty much every language > (including Javascript). In some of them like Python or Ruby it's even > the default behavior. Some parsers can do it, some cannot, and in some it isn't obvious that you are losing precision behind the scenes due to conversion to float. Regards, Daniel
Daniel P. Berrangé <berrange@redhat.com> writes: > On Mon, Aug 03, 2020 at 02:16:19PM +0200, Paolo Bonzini wrote: >> On 03/08/20 13:36, Daniel P. Berrangé wrote: >> >>> Given that QEMU needs to pass >> >>> uint64 values, JSON was simply the wrong choice of format for QMP. >> > >> > I wasn't refering to RFC7159. The problem of undefined integer precision >> > with JSON came up right at the very start when QMP was first designed and >> > implemented, and has come up again periodically ever since then. libvirt >> > needed to do workarounds right at the start in 2009, in order to fully >> > handle signed/unsigned 64-bit integers with QMP. >> >> I assume the workaround you refer to is to store the number as a string >> and converting it lazily to either an integer or a floating-point type >> in whoever uses the JSON API. It may not be pretty but probably it >> would have been the same for any text-based, schema-less protocol. For >> example, it didn't require writing your own parser. > > Yes, we get the raw values as a string, but not all parsers for C > allow you to do this. We'd really love to move off YAJL for JSON > parsing, but the most viable alternatives don't have a way to let > you get the string before they parse it as an integer and report > errors. You know, if I had to write a *general purpose* parser for a language that obviously supports numbers of arbitrary precision (but permits implementations to support less), then I'd try *hard* to stay as general as practical. At the very least, provide a way to retrieve number tokens as strings, so the decision to limit precision devolves to the client. Also, GMP exists. The fact that "most viable alternatives" to YAJL are unable to support uint64_t tempts me to condemn the whole lot as toys :) I suspect the simplicity of JSON not only lowers the barrier for toys, but also dampens the demand for general purpose non-toys. 
Writing and maintaining yet another JSON parser is quite possibly easier than getting one of the toys fixed up properly. [...]
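For what it's worth, Python's stock parser is an existence proof for what Markus asks of a general-purpose one: the parse_int and parse_float hooks of json.loads hand each raw number token to the client as a string, so the decision to limit precision devolves to the caller. A minimal sketch:

```python
import json

# parse_int receives the string of every JSON integer token;
# returning it unconverted defers the precision decision entirely
# to the client, sidestepping any double-precision round-trip.
doc = json.loads('{"offset": 18446744073709551615}', parse_int=str)
```

The client can then convert the token with int(), GMP, or whatever numerical tower it trusts.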
Paolo Bonzini <pbonzini@redhat.com> writes: > On 03/08/20 13:28, Markus Armbruster wrote: >>>> We could remove them from the QAPI schema language. Flag day, and >>>> git-blame becomes pretty much useless for a couple of years. >>> Is that a nack or a "whatever"? >> It's "is this really worth the trouble?" I guess that's halfway between >> NAK and whatever, ready to be moved in either direction by arguments :) > > In general it seems like a good idea to use a standard file format and > not "a standard file format except for two characters". :) We also > wouldn't be having discussions on editors. No argument. But towards which standard file format should the schema evolve? * Standard JSON: no comments, no go * JSON with # comments: need to change strings from ' to " * JavaScript: need to change comments from # to // * Python: may want to switch bool literals from true, false to True, False > That said, after a bit more research I'm skeptical about the possibility > of using an off-the-shelf parser because most of them either don't > support comments, or are based on YAJL which simply discards comments. > > Since '//' comments are harder to parse than "#" comments, this would > actually _add_ code instead of removing it. Also since our doc comment > syntax uses "##" as a delimiter, we'd have to bikeshed what the doc > comments would look like ("//!", "///", etc.). Doc comments don't have to be comments in the schema language. They could be doc strings. Requires decent support for long strings, which JSON does not provide. > This means the two parts might be considered separately: > > - replacing single-quote with double-quote strings > > - replacing # comments with // If all we want is decent editor support out of the box, then rename to .py, and drop the modelines. No merge conflicts, no git-blame pollution. To make the .py files actual Python, additionally rename the bool literals. Much, much less churn than massaging all strings or all comments.
On 03.08.2020 at 18:03, Markus Armbruster wrote: > Paolo Bonzini <pbonzini@redhat.com> writes: > > This means the two parts might be considered separately: > > > > - replacing single-quote with double-quote strings > > > > - replacing # comments with // > > If all we want is decent editor support out of the box, then rename to > .py, and drop the modelines. No merge conflicts, no git-blame > pollution. > > To make the .py files actual Python, additionally rename the bool > literals. Much, much less churn than massaging all strings or all > comments. I guess I could get behind this one. File renames still have a cost, but it feels like it wouldn't be absurdly high at least. And that you actually occasionally paste schema parts into real Python code suggests that there would be even a small benefit in addition to the good syntax highlighting out of the box. I fully expect that we'd keep our existing parser instead of using an actual Python parser, because the existing code (a) exists and (b) is probably simpler than the resulting code. Kevin
On 03/08/20 18:03, Markus Armbruster wrote: >> In general it seems like a good idea to use a standard file format and >> not "a standard file format except for two characters". :) We also >> wouldn't be having discussions on editors. > > No argument. But towards which standard file format should the schema > evolve? > > * Standard JSON: no comments, no go > > * JSON with # comments: need to change strings from ' to " > > * JavaScript: need to change comments from # to // > > * Python: may want to switch bool literals from true, false to True, > False Second or third, I'd say. I dislike using .py because a stream of Python objects doesn't really have a meaning in Python: that's the difference between .js and .json. Third requires someone to do the work in the parser. Unlikely. >> That said, after a bit more research I'm skeptical about the possibility >> of using an off-the-shelf parser because most of them either don't >> support comments, or are based on YAJL which simply discards comments. >> >> Since '//' comments are harder to parse than "#" comments, this would >> actually _add_ code instead of removing it. Also since our doc comment >> syntax uses "##" as a delimiter, we'd have to bikeshed what the doc >> comments would look like ("//!", "///", etc.). > > Doc comments don't have to be comments in the schema language. They > could be doc strings. Requires decent support for long strings, which > JSON does not provide. Exactly. This was the appeal of YAML (or StrictYAML so that Norwegians don't turn into Falsians) as far as I understood. But if we were to go YAML, I'd rather have make doc strings part of the YAML document too. That is what Nir suggested, it makes sense but someone has to write the conversion code. > If all we want is decent editor support out of the box, then rename to > .py, and drop the modelines. No merge conflicts, no git-blame > pollution. Another possibility is to rename to .qapi and keep Python modelines as a hack that does work. Paolo
On 8/3/20 8:01 AM, Paolo Bonzini wrote: > That said, after a bit more research I'm skeptical about the possibility > of using an off-the-shelf parser because most of them either don't > support comments, or are based on YAJL which simply discards comments. Heresy: Docstrings could become part of the data format so they can be parsed, analyzed and validated. Parsers largely treat comments like non-semantic information and discard it. Round-trip parsers that preserve comments in any language are extremely rare. If the docstrings are relevant to the generator and aren't discardable, they should be fully-fledged data members. In a prototype I had for a YAML format, I just promoted docstrings directly to fields, so I could allow clients to query help text for individual commands. --js
On 03/08/20 20:10, John Snow wrote:
> Heresy:
>
> Docstrings could become part of the data format so they can be parsed,
> analyzed and validated. Parsers largely treat comments like non-semantic
> information and discard it. Round-trip parsers that preserve comments in
> any language are extremely rare.
>
> If the docstrings are relevant to the generator and aren't discardable,
> they should be fully-fledged data members.
>
> In a prototype I had for a YAML format, I just promoted docstrings
> directly to fields, so I could allow clients to query help text for
> individual commands.

This would actually be a good idea, but somebody has to write the code.
Each field's docstring should be attached to the field, however---no
parsing needed, only looking at the tree. Take a look at what Nir posted:

> Here is the patch adding schema convertor from qemu "json" format to
> standard yaml:
> https://github.com/oVirt/vdsm/commit/e57b69e72987c0929b20306c454835b52b5eb7ee
>
> The current version of the new yaml based schema:
> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml

    VmDiskDevice: &VmDiskDevice
        added: '3.1'
        description: Properties of a VM disk device.
        name: VmDiskDevice
        properties:
        -   description: Indicates if writes are prohibited for the device
            name: readonly
            type: boolean

        -   description: The size of the disk (in bytes)
            name: apparentsize
            type: uint

etc.

Paolo
On 8/3/20 2:16 PM, Paolo Bonzini wrote:
> On 03/08/20 20:10, John Snow wrote:
>> Heresy:
>>
>> Docstrings could become part of the data format so they can be parsed,
>> analyzed and validated. Parsers largely treat comments like non-semantic
>> information and discard it. Round-trip parsers that preserve comments in
>> any language are extremely rare.
>>
>> If the docstrings are relevant to the generator and aren't discardable,
>> they should be fully-fledged data members.
>>
>> In a prototype I had for a YAML format, I just promoted docstrings
>> directly to fields, so I could allow clients to query help text for
>> individual commands.
>
> This would be actually a good idea, but somebody has to write the code.
> Each field's docstring should be attached to the field, however---no
> parsing needed only looking at the tree. Take a look at what Nir posted:
>
>> Here is the patch adding schema convertor from qemu "json" format to
>> standard yaml:
>> https://github.com/oVirt/vdsm/commit/e57b69e72987c0929b20306c454835b52b5eb7ee
>>
>> The current version of the new yaml based schema:
>> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml
>
> VmDiskDevice: &VmDiskDevice
>     added: '3.1'
>     description: Properties of a VM disk device.
>     name: VmDiskDevice
>     properties:
>     -   description: Indicates if writes are prohibited for the device
>         name: readonly
>         type: boolean
>
>     -   description: The size of the disk (in bytes)
>         name: apparentsize
>         type: uint
>
> etc.
>
> Paolo

I was working on a small prototype that used something that looked like
this; the "*opt" format was traded for "?opt", but otherwise:

struct:
  name: AudiodevPerDirectionOptions
  doc: >
    General audio backend options that are used for both
    playback and recording.
  since: '4.0'
  members:

    ?mixing-engine:
      type: bool
      default: 'true'
      since: '4.2'
      doc: |
        Use QEMU's mixing engine to mix all streams inside QEMU and
        convert audio formats when not supported by the backend.
        When set to off, fixed-settings must be also off.

    ?fixed-settings:
      type: bool
      default: 'true'
      doc: >-
        Use fixed settings for host input/output.
        When off, frequency, channels and format must not be specified.

    ?frequency:
      type: bool
      default: '44100'
      doc: >-
        frequency to use when using fixed settings.

    ?channels:
      type: 'uint32'
      default: 2
      doc: >-
        Number of channels when using fixed settings.

    ?voices:
      type: 'uint32'
      default: 1
      doc: "Number of voices to use."

    ?format:
      type: 'AudioFormat'
      default: 's16'
      doc: "Sample format to use when using fixed settings."

    ?buffer-length:
      type: 'uint32'
      doc: 'The buffer length, in microseconds.'

  features:
    my-cool-feature:
      since: '6.0'
      doc: 'This is, no doubt, an extremely cool feature.'

    my-bad-feature:
      doc: 'This is a very bad feature. I am sorry for making it.'
      since: '1.0'
      deprecated: '5.9'
On Mon, Aug 3, 2020 at 9:19 PM John Snow <jsnow@redhat.com> wrote: > > On 8/3/20 2:16 PM, Paolo Bonzini wrote: > > On 03/08/20 20:10, John Snow wrote: > >> Heresy: > >> > >> Docstrings could become part of the data format so they can be parsed, > >> analyzed and validated. Parsers largely treat comments like non-semantic > >> information and discard it. Round-trip parsers that preserve comments in > >> any language are extremely rare. > >> > >> If the docstrings are relevant to the generator and aren't discardable, > >> they should be fully-fledged data members. > >> > >> In a prototype I had for a YAML format, I just promoted docstrings > >> directly to fields, so I could allow clients to query help text for > >> individual commands. > > > > This would be actually a good idea, but somebody has to write the code. > > Each field's docstring should be attached to the field, however---no > > parsing needed only looking at the tree. Take a look at what Nir posted: > > > >> Here is the patch adding schema convertor from qemu "json" format to > >> standard yaml: > >> https://github.com/oVirt/vdsm/commit/e57b69e72987c0929b20306c454835b52b5eb7ee > >> > >> The current version of the new yaml based schema: > >> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml > > > > > > VmDiskDevice: &VmDiskDevice > > added: '3.1' > > description: Properties of a VM disk device. > > name: VmDiskDevice > > properties: > > - description: Indicates if writes are prohibited for the > > device > > name: readonly > > type: boolean > > > > - description: The size of the disk (in bytes) > > name: apparentsize > > type: uint > > > > etc. > > > > Paolo > > > > I was working on a small prototype that used something that looked like > this; the "*opt" format was traded for "?opt", but otherwise: > > > struct: > name: AudiodevPerDirectionOptions > doc: > > General audio backend options that are used for both > playback and recording. 
> since: '4.0'
> members:
>
> ?mixing-engine:

This optimizes for writing instead of reading.

    optional: true

Would be nicer to read, but more important is all the tools parsing
this schema in multiple languages that will have code like:

    def is_optional(node):
        return node.name.startswith("?")

Instead of:

    if node.optional:
        ...

Or maybe better:

    if node.required:

Because it seems that more nodes are optional, so focusing on the required
items will make the schema shorter and more clear.

> type: bool
> default: 'true'
> since: '4.2'
> doc: |
>     Use QEMU's mixing engine to mix all streams inside QEMU and
>     convert audio formats when not supported by the backend.

Using | is nicer than >-. Not sure what is the difference. In vdsm we don't
use anything and I think it causes trouble when indenting text.

> When set to off, fixed-settings must be also off.
>
> ?fixed-settings:
>     type: bool
>     default: 'true'

Why is the default a string and not the actual type?

> doc: >-
>     Use fixed settings for host input/output.
>     When off, frequency, channels and format must not be specified.
>
> ?frequency:
>     type: bool
>     default: '44100'
>     doc: >-
>         frequency to use when using fixed settings.
>
> ?channels:
>     type: 'uint32'
>     default: 2

Here you use the real type, and this is nicer.

> doc: >-
>     Number of channels when using fixed settings.
>
> ?voices:
>     type: 'uint32'
>     default: 1
>     doc: "Number of voices to use."
>
> ?format:
>     type: 'AudioFormat'
>     default: 's16'
>     doc: "Sample format to use when using fixed settings."
>
> ?buffer-length:
>     type: 'uint32'
>     doc: 'The buffer length, in microseconds.'
>
> features:
>     my-cool-feature:
>         since: '6.0'
>         doc: 'This is, no doubt, an extremely cool feature.'
>
>     my-bad-feature:
>         doc: 'This is a very bad feature. I am sorry for making it.'
>         since: '1.0'
>         deprecated: '5.9'

Good example :-)
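Nir's point can be made concrete with a short, runnable sketch (the function and field names here are illustrative only, not QAPI's actual internals): a loader can normalize the "?name" prefix into an explicit flag once, after which every consumer tests a field instead of re-implementing the prefix convention:

```python
# Illustrative sketch, not QAPI's actual code: turn "?name"-prefixed
# member names into an explicit 'optional' field at load time.
def parse_members(members: dict) -> dict:
    parsed = {}
    for name, spec in members.items():
        optional = name.startswith('?')
        parsed[name.lstrip('?')] = dict(spec, optional=optional)
    return parsed

members = {
    '?frequency': {'type': 'uint32'},
    'format': {'type': 'AudioFormat'},
}
normalized = parse_members(members)

# Consumers now test the field instead of the name prefix:
assert normalized['frequency']['optional']
assert not normalized['format']['optional']
```

Whichever spelling the schema uses, doing the normalization in exactly one place keeps the "tools parsing this schema in multiple languages" from each hard-coding the prefix rule.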
On 8/3/20 3:54 PM, Nir Soffer wrote: > On Mon, Aug 3, 2020 at 9:19 PM John Snow <jsnow@redhat.com> wrote: >> >> On 8/3/20 2:16 PM, Paolo Bonzini wrote: >>> On 03/08/20 20:10, John Snow wrote: >>>> Heresy: >>>> >>>> Docstrings could become part of the data format so they can be parsed, >>>> analyzed and validated. Parsers largely treat comments like non-semantic >>>> information and discard it. Round-trip parsers that preserve comments in >>>> any language are extremely rare. >>>> >>>> If the docstrings are relevant to the generator and aren't discardable, >>>> they should be fully-fledged data members. >>>> >>>> In a prototype I had for a YAML format, I just promoted docstrings >>>> directly to fields, so I could allow clients to query help text for >>>> individual commands. >>> >>> This would be actually a good idea, but somebody has to write the code. >>> Each field's docstring should be attached to the field, however---no >>> parsing needed only looking at the tree. Take a look at what Nir posted: >>> >>>> Here is the patch adding schema convertor from qemu "json" format to >>>> standard yaml: >>>> https://github.com/oVirt/vdsm/commit/e57b69e72987c0929b20306c454835b52b5eb7ee >>>> >>>> The current version of the new yaml based schema: >>>> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml >>> >>> >>> VmDiskDevice: &VmDiskDevice >>> added: '3.1' >>> description: Properties of a VM disk device. >>> name: VmDiskDevice >>> properties: >>> - description: Indicates if writes are prohibited for the >>> device >>> name: readonly >>> type: boolean >>> >>> - description: The size of the disk (in bytes) >>> name: apparentsize >>> type: uint >>> >>> etc. >>> >>> Paolo >>> >> >> I was working on a small prototype that used something that looked like >> this; the "*opt" format was traded for "?opt", but otherwise: >> >> >> struct: >> name: AudiodevPerDirectionOptions >> doc: > >> General audio backend options that are used for both >> playback and recording. 
>> since: '4.0'
>> members:
>>
>> ?mixing-engine:
>
> This optimizes for writing instead of reading.

Following a "path of least resistance" from the existing QAPI language,
clearly carrying over the '*optional' syntax.

> optional: true
>
> Would be nicer to read, but more important is all the tools parsing
> this schema in multiple languages that will have code like:
>
> def is_optional(node):
>     return node.name.startswith("?")
>
> Instead of:
>
> if node.optional:
>     ...
>
> Or maybe better:
>
> if node.required:
>
> Because it seems that more nodes are optional, so focusing on the required
> items will make the schema shorter and more clear.
>
>> type: bool
>> default: 'true'
>> since: '4.2'
>> doc: |
>>     Use QEMU's mixing engine to mix all streams inside QEMU and
>>     convert audio formats when not supported by the backend.
>
> Using | is nicer than >-. Not sure what is the difference. In vdsm we don't
> use anything and I think it causes trouble when indenting text.

I believe when I wrote this example I was trying to highlight the different
space consumption styles for the purposes of demonstrating what it would do
to Sphinx document generation support.

Ultimately, there's not really a way to enforce one or the other style
post-parse.

>> When set to off, fixed-settings must be also off.
>>
>> ?fixed-settings:
>>     type: bool
>>     default: 'true'
>
> Why is the default a string and not the actual type?

I'm going to be honest: I forget. I was playing around with the idea of
documenting defaults for the purposes of documentation, but not necessarily
for performing the actual code generation of those defaults.

I believe I specified this field as a string in my grammar and `5` would get
promoted to "5", but `true` caused a type error. Doing something in a
type-safe way seemed ... harder. So I didn't.

>> doc: >-
>>     Use fixed settings for host input/output.
>>     When off, frequency, channels and format must not be specified.
>> >> ?frequency: >> type: bool >> default: '44100' >> doc: >- >> frequency to use when using fixed settings. >> >> ?channels: >> type: 'uint32' >> default: 2 > > Here you use the real type, and this is nicer. > >> doc: >- >> Number of channels when using fixed settings. >> >> ?voices: >> type: 'uint32' >> default: 1 >> doc: "Number of voices to use." >> >> ?format: >> type: 'AudioFormat' >> default: 's16' >> doc: "Sample format to use when using fixed settings." >> >> ?buffer-length: >> type: 'uint32' >> doc: 'The buffer length, in microseconds.' >> >> features: >> my-cool-feature: >> since: '6.0' >> doc: 'This is, no doubt, an extremely cool feature.' >> >> my-bad-feature: >> doc: 'This is a very bad feature. I am sorry for making it.' >> since: '1.0' >> deprecated: '5.9' > > Good example :-) > >> >> >
On Mon, Aug 3, 2020 at 3:23 PM Daniel P. Berrangé <berrange@redhat.com> wrote: > > On Mon, Aug 03, 2020 at 02:16:19PM +0200, Paolo Bonzini wrote: > > On 03/08/20 13:36, Daniel P. Berrangé wrote: > > >>> Given that QEMU needs to pass > > >>> uint64 values, JSON was simply the wrong choice of format for QMP. > > > > > > I wasn't refering to RFC7159. The problem of undefined integer precision > > > with JSON came up right at the very start when QMP was first designed and > > > implemented, and has come up again periodically ever since then. libvirt > > > needed to do workarounds right at the start in 2009, in order to fully > > > handle signed/unsigned 64-bit integers with QMP. > > > > I assume the workaround you refer to is to store the number as a string > > and converting it lazily to either an integer or a floating-point type > > in whoever uses the JSON API. It may not be pretty but probably it > > would have been the same for any text-based, schema-less protocol. For > > example, it didn't require writing your own parser. > > Yes, we get the raw values as a string, but not all parsers for C > allow you to do this. We'd really love to move off YAJL for JSON > parsing, but the most viable alternatives don't have a way to let > you get the string before they parse it as an integer and report > errors. > > > It could be avoided by using a schema in Libvirt, just like QEMU has no > > problem with it on the other side; it's just a different design choice > > with different trade-offs, I don't think it's enough of an issue to > > declare JSON "the wrong choice of format for QMP". > > The schema doesn't help - the problem is many JSON parsers don't allow > use of full uint64 values when parsing - alot will simply report an > error for anything bigger than LLONG_MAX and offer no workaround. Jansson is considering this: https://github.com/akheron/jansson/issues/69 The issue is 4 years old, but there is some activity lately. 
I guess someone who wants this needs to invest the time. > > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > >
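The workaround described above (get the raw values as strings, convert lazily) can be sketched with Python's json module, whose `parse_int` hook hands the parser's raw digit string to a callable of your choice. Python integers are arbitrary-precision anyway, so the hook is used here purely to illustrate the technique a C parser would need; the document is a made-up QMP-ish example:

```python
import json

# A made-up document carrying a full uint64 value (2**64 - 1), which
# many C JSON parsers would reject or clamp at LLONG_MAX.
doc = '{"actual-size": 18446744073709551615}'

# Keep numbers as raw strings at parse time...
raw = json.loads(doc, parse_int=str)
assert raw == {"actual-size": "18446744073709551615"}

# ...and convert lazily at the point of use, where the caller knows
# whether the value should be treated as signed or unsigned.
size = int(raw["actual-size"])
assert size == 2**64 - 1
```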
Kevin Wolf <kwolf@redhat.com> writes: > Am 03.08.2020 um 18:03 hat Markus Armbruster geschrieben: >> Paolo Bonzini <pbonzini@redhat.com> writes: >> > This means the two parts might be considered separately: >> > >> > - replacing single-quote with double-quote strings >> > >> > - replacing # comments with // >> >> If all we want is decent editor support out of the box, then rename to >> .py, and drop the modelines. No merge conflicts, no git-blame >> pollution. >> >> To make the .py files actual Python, additionally rename the bool >> literals. Much, much less churn than massaging all strings or all >> comments. > > I guess I could get behind this one. File renames still have a cost, but > it feels like it wouldn't be absurdly high at least. > > And that you actually occasionally paste schema parts into real Python > code suggests that there would be even a small benefit in addition to > the good syntax highlighting out of the box. > > I fully expect that we'd keep our existing parser instead of using an > actual Python parser, because the existing code (a) exists and (b) is > probably simpler than the resulting code. Replacing the part of parser.py that deals with JSON by off-the-shelf code is a non-goal for me. I got better things to do than replacing[*] a tiny parser that works by glue for another parser, moving QAPI sideways instead of forward. Any messing with the lower syntax layer in non-trivial ways needs to bring benefits that make it worth our while. [*] Includes reviewing patches.
Paolo Bonzini <pbonzini@redhat.com> writes: > On 01/08/20 01:12, Nir Soffer wrote: >> I think inventing DSLs and developing tools is wrong. Use standard >> format and tools and spend time on the core of the project. > > Please don't apply 2020 standards to choices that were made in 2009. Or > if you do, be ready to contribute code. Is it still a good choice today? For that question, we'd have to look beyond syntax. Syntax has been the most boring and least expensive part of QAPI.
Paolo Bonzini <pbonzini@redhat.com> writes: > On 03/08/20 18:03, Markus Armbruster wrote: >>> In general it seems like a good idea to use a standard file format and >>> not "a standard file format except for two characters". :) We also >>> wouldn't be having discussions on editors. >> >> No argument. But towards which standard file format should the schema >> evolve? >> >> * Standard JSON: no comments, no go >> >> * JSON with # comments: need to change strings from ' to " >> >> * JavaScript: need to change comments from # to // >> >> * Python: may want to switch bool literals from true, false to True, >> False > > Second or third, I'd say. I dislike using .py because a stream of > Python objects doesn't really have a meaning in Python: It does have a meaning: compute a bunch of dictionaries and throw them away. It's useless as a program, but it's not meaningless. > that's the > difference between .js and .json. True. RFC 4627: "JSON is a subset of JavaScript, but it is a safe subset that excludes assignment and invocation."[*] An analogous subset of Python is possible, but has not been formally defined as far as I know. > Third requires someone to do the work in the parser. Unlikely. The pain of tweaking the parser is likely dwarfed several times over by the pain of the flag day. >>> That said, after a bit more research I'm skeptical about the possibility >>> of using an off-the-shelf parser because most of them either don't >>> support comments, or are based on YAJL which simply discards comments. >>> >>> Since '//' comments are harder to parse than "#" comments, this would >>> actually _add_ code instead of removing it. Also since our doc comment >>> syntax uses "##" as a delimiter, we'd have to bikeshed what the doc >>> comments would look like ("//!", "///", etc.). >> >> Doc comments don't have to be comments in the schema language. They >> could be doc strings. Requires decent support for long strings, which >> JSON does not provide. > > Exactly.
This was the appeal of YAML (or StrictYAML so that Norwegians > don't turn into Falsians) as far as I understood. But if we were to go > YAML, I'd rather have make doc strings part of the YAML document too. > That is what Nir suggested, it makes sense but someone has to write the > conversion code. To write a converter, you first have to understand the doc comment language. It's a bit of a mess, because it was fitted to existing conventions to reduce churn. Peter Maydell has patches to generate rST instead of Texinfo. They affect the doc comment language. I expect to merge them for 5.2. >> If all we want is decent editor support out of the box, then rename to >> .py, and drop the modelines. No merge conflicts, no git-blame >> pollution. > > Another possibility is to rename to .qapi and keep Python modelines as a > hack that does work. Yes. [*] Unfortunately, this has become a statement of intent, not a description of reality, due to JSON design accidents.
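For what it's worth, a "safe subset of Python" along the lines Markus sketches is roughly what the standard library's `ast.literal_eval` already accepts: literals only, with assignment and invocation rejected outright. A minimal sketch, assuming the schema's true/false literals had first been adjusted to True/False (the expression below is an invented QAPI-style example):

```python
import ast

# A QAPI-style expression written as a Python literal (note False, not false).
expr = """
{'command': 'query-status',
 'returns': 'StatusInfo',
 'allow-oob': False}
"""
parsed = ast.literal_eval(expr)
assert parsed['returns'] == 'StatusInfo'
assert parsed['allow-oob'] is False

# Anything beyond plain literals is rejected, which is the "safe subset":
try:
    ast.literal_eval("__import__('os')")
except ValueError:
    pass
else:
    raise AssertionError("calls should have been rejected")
```

This parses one expression at a time, so reading a whole schema file (a stream of such expressions, possibly with doc comments between them) would still need the existing lexer to split the stream first.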
Daniel P. Berrangé <berrange@redhat.com> writes: > On Thu, Jul 30, 2020 at 11:07:26AM +0200, Markus Armbruster wrote: >> Andrea Bolognani <abologna@redhat.com> writes: >> >> > The various schemas included in QEMU use a JSON-based format which >> > is, however, strictly speaking not valid JSON. >> > >> > As a consequence, when vim tries to apply syntax highlight rules >> > for JSON (as guessed from the file name), the result is an unreadable >> > mess which mostly consist of red markers pointing out supposed errors >> > in, well, pretty much everything. >> > >> > Using Python syntax highlighting produces much better results, and >> > in fact these files already start with specially-formatted comments >> > that instruct Emacs to process them as if they were Python files. >> > >> > This commit adds the equivalent special comments for vim. >> > >> > Signed-off-by: Andrea Bolognani <abologna@redhat.com> > > Given that we already have emacs mode-lines, I see no reason to > not also have vim mode-lines, so regardless of the deeper discussion > I think this is patch is fine to merge in the short term > > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> We could remove them all and defer to .editorconfig but I'm not going to get in the way of making our Vim brethren's lives a little easier.
On 8/4/20 4:03 AM, Markus Armbruster wrote: > The pain of tweaking the parser is likely dwarved several times over by > the pain of the flag day. You mention this often; I wonder if I misunderstand the critique, because the pain of a "flag day" for a new file format seems negligible to me. I don't think we edit these .json files very often. Generally, we add a new command when we need one. The edits are usually one or two lines plus docstrings. If anyone has patches in-flight, I genuinely doubt it will take more than a few minutes to rewrite for the new file format. No?
John Snow <jsnow@redhat.com> writes: > On 8/4/20 4:03 AM, Markus Armbruster wrote: >> The pain of tweaking the parser is likely dwarved several times over by >> the pain of the flag day. > > You mention this often; I wonder if I misunderstand the critique, > because the pain of a "flag day" for a new file format seems > negligible to me. > > I don't think we edit these .json files very often. Generally, we add > a new command when we need one. The edits are usually one or two lines > plus docstrings. > > If anyone has patches in-flight, I genuinely doubt it will take more > than a few minutes to rewrite for the new file format. > > No? You describe the flag day's one-time pain. There's also the longer term pain of having to work around git-blame unable to see beyond the flag day. I'm not claiming the pain is prohibitive (if I thought it was, I would've tried to strangle this thread in its crib), I am claiming it'll be much more painful (read: expensive) than a parser tweak.
On 05/08/20 09:36, Markus Armbruster wrote: > There's also the longer term pain of having to work around git-blame > unable to see beyond the flag day. Do you really use "git blame" that much? "git log -S" does more or less the same function (in a different way) and is not affected as much by large code movement and transformation patches. Paolo
* Paolo Bonzini (pbonzini@redhat.com) wrote: > On 05/08/20 09:36, Markus Armbruster wrote: > > There's also the longer term pain of having to work around git-blame > > unable to see beyond the flag day. > > Do you really use "git blame" that much? "git log -S" does more or less > the same function (in a different way) and is not affected as much by > large code movement and transformation patches. I use it a lot! Following stuff back to find where a change came from and then asking people. Dave > Paolo > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Markus Armbruster <armbru@redhat.com> writes: > Paolo Bonzini <pbonzini@redhat.com> writes: [...] >> That said, after a bit more research I'm skeptical about the possibility >> of using an off-the-shelf parser because most of them either don't >> support comments, or are based on YAJL which simply discards comments. >> >> Since '//' comments are harder to parse than "#" comments, this would >> actually _add_ code instead of removing it. Also since our doc comment >> syntax uses "##" as a delimiter, we'd have to bikeshed what the doc >> comments would look like ("//!", "///", etc.). > > Doc comments don't have to be comments in the schema language. They > could be doc strings. Requires decent support for long strings, which > JSON does not provide. There's another complication besides multi-line strings: funny characters. Since QAPI schema strings are all names, and names are restricted to ASCII letters, digits, hyphen, and underscore, we limit strings to printable ASCII, so we don't have to deal with control characters, escape sequences, surrogate pairs, and all that crap. Comments are UTF-8. [...]
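The restriction is simple enough to state as a check; the helper below is hypothetical (not QAPI's actual validator), but it shows what "limit strings to printable ASCII" amounts to in practice:

```python
import re

# Hypothetical helper: accept only printable ASCII (space through tilde),
# the range Markus says QAPI schema strings are limited to. Control
# characters, escapes and anything non-ASCII are rejected.
PRINTABLE_ASCII = re.compile(r'\A[ -~]*\Z')

def is_valid_schema_string(s: str) -> bool:
    return PRINTABLE_ASCII.match(s) is not None

assert is_valid_schema_string('block-dirty-bitmap-add')
assert not is_valid_schema_string('caf\u00e9')   # non-ASCII
assert not is_valid_schema_string('tab\there')   # control character
```

Moving doc comments into doc strings would mean either relaxing this to full UTF-8 for that one string kind, or keeping two string rules in the grammar.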
On Wed, 5 Aug 2020 10:25:30 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > On 05/08/20 09:36, Markus Armbruster wrote: > > There's also the longer term pain of having to work around git-blame > > unable to see beyond the flag day. > > Do you really use "git blame" that much? "git log -S" does more or less > the same function (in a different way) and is not affected as much by > large code movement and transformation patches. I'm not sure the two of them really perform the same function. FWIW, I like using git {blame|annotate} to find out when/why some code areas were changed, and it's often not "when was this line introduced", but "I see some commits changing this function, let's find out more about them." And yes, I use that quite regularly.
On 05/08/20 10:39, Dr. David Alan Gilbert wrote: >> Do you really use "git blame" that much? "git log -S" does more or less >> the same function (in a different way) and is not affected as much by >> large code movement and transformation patches. > > I use it a lot! Following stuff back to find where a change came > from and then asking people. Indeed, but I use "git log -S" instead. :) Another possibility is to just do "git log -p" and search for a relevant line of the code I'm "blaming". Paolo
Paolo Bonzini <pbonzini@redhat.com> writes: > On 05/08/20 09:36, Markus Armbruster wrote: >> There's also the longer term pain of having to work around git-blame >> unable to see beyond the flag day. > > Do you really use "git blame" that much? "git log -S" does more or less > the same function (in a different way) and is not affected as much by > large code movement and transformation patches. C-x v g When that doesn't work, I fall back to git-log -S in a shell.
On Wed, Aug 05, 2020 at 10:49:35AM +0200, Paolo Bonzini wrote: > On 05/08/20 10:39, Dr. David Alan Gilbert wrote: > >> Do you really use "git blame" that much? "git log -S" does more or less > >> the same function (in a different way) and is not affected as much by > >> large code movement and transformation patches. > > > > I use it a lot! Following stuff back to find where a change came > > from and then asking people. > > Indeed, but I use "git log -S" instead. :) Another possibility is to > just do "git log -p" and search for a relevant line of the code I'm > "blaming". I use git blame a lot too, but I don't think it's a reason to not do the cleanups. It is easy enough to just tell blame to use an earlier commit if you see it displaying a refactor. I don't think such mild inconvenience should stop us making otherwise desirable code changes. Regards, Daniel
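Daniel's "tell blame to use an earlier commit" can also be made automatic: git 2.23 and newer let blame skip named commits, which is the usual answer to flag-day reformats. A sketch in a throwaway repository (file contents and author names are invented for the demo):

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name  Alice && git config user.email alice@example.com

# Original commit: the change we want blame to keep pointing at.
printf "{ 'command': 'query-status' }\n" > schema.json
git add schema.json && git commit -qm 'add query-status'
orig=$(git rev-parse HEAD)

# Flag-day commit: a whitespace-only reformat by somebody else.
git config user.name  Bob && git config user.email bob@example.com
printf "{'command': 'query-status'}\n" > schema.json
git commit -qam 'reformat schema'
flag=$(git rev-parse HEAD)

# Plain blame pins the line on the reformat; --ignore-rev sees through it.
git blame --porcelain schema.json | head -n1
git blame --porcelain --ignore-rev "$flag" schema.json | head -n1

# Persistent variant: list flag-day commits in a file once, then make
# blame consult it by default in this clone.
echo "$flag" > .git-blame-ignore-revs
git config blame.ignoreRevsFile .git-blame-ignore-revs
git blame schema.json
```

With the ignore-revs file committed to the tree, anyone can opt in with one `git config` line, so a one-time reformat need not pollute blame output forever.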
On Wed, 5 Aug 2020 10:05:40 +0100 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Wed, Aug 05, 2020 at 10:49:35AM +0200, Paolo Bonzini wrote: > > On 05/08/20 10:39, Dr. David Alan Gilbert wrote: > > >> Do you really use "git blame" that much? "git log -S" does more or less > > >> the same function (in a different way) and is not affected as much by > > >> large code movement and transformation patches. > > > > > > I use it a lot! Following stuff back to find where a change came > > > from and then asking people. > > > > Indeed, but I use "git log -S" instead. :) Another possibility is to > > just do "git log -p" and search for a relevant line of the code I'm > > "blaming". > > I used git blame alot too, but I don't think its a reason to not do the > cleanups. It is easy enough to just tell blame to use an earlier commit > if you see it displaying a refactor. I don't think such mild inconvenience > should stop us making otherwise desirable code changes I don't think people argue that it should block changes; it is simply another thing to consider when weighing benefits vs. drawbacks.
On Wed, Aug 05, 2020 at 11:11:55AM +0200, Cornelia Huck wrote: > On Wed, 5 Aug 2020 10:05:40 +0100 > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > On Wed, Aug 05, 2020 at 10:49:35AM +0200, Paolo Bonzini wrote: > > > On 05/08/20 10:39, Dr. David Alan Gilbert wrote: > > > >> Do you really use "git blame" that much? "git log -S" does more or less > > > >> the same function (in a different way) and is not affected as much by > > > >> large code movement and transformation patches. > > > > > > > > I use it a lot! Following stuff back to find where a change came > > > > from and then asking people. > > > > > > Indeed, but I use "git log -S" instead. :) Another possibility is to > > > just do "git log -p" and search for a relevant line of the code I'm > > > "blaming". > > > > I used git blame alot too, but I don't think its a reason to not do the > > cleanups. It is easy enough to just tell blame to use an earlier commit > > if you see it displaying a refactor. I don't think such mild inconvenience > > should stop us making otherwise desirable code changes > > I don't think people argue that it should block changes; it it simply > another thing to consider when weighing benefits vs. drawbacks. Actually, I'm saying that including "git blame" in such a weighing is the mechanism for blocking otherwise beneficial change to the code. Or to put it another way, I believe the value assigned to "git blame" in such a comparison is minuscule / effectively zero. The only time "git blame" should win is if the change in question is purely the bike shed colour and has no technical benefits at all. If there's any technical benefit to making the change it should always win. We shouldn't preserve technical debt in the code merely to avoid an impact on "git blame". Regards, Daniel
Markus Armbruster <armbru@redhat.com> writes: > Paolo Bonzini <pbonzini@redhat.com> writes: > >> On 05/08/20 09:36, Markus Armbruster wrote: >>> There's also the longer term pain of having to work around git-blame >>> unable to see beyond the flag day. >> >> Do you really use "git blame" that much? "git log -S" does more or less >> the same function (in a different way) and is not affected as much by >> large code movement and transformation patches. > > C-x v g > > When that doesn't work, I fall back to git-log -S in a shell. Yep, I'm another that uses blame in the first instance (or magit-blame inside emacs). My usual fallback after that is git log -p and liberal use of / which is very inefficient.
On Wed, 5 Aug 2020 11:08:02 +0100 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Wed, Aug 05, 2020 at 11:11:55AM +0200, Cornelia Huck wrote: > > On Wed, 5 Aug 2020 10:05:40 +0100 > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > On Wed, Aug 05, 2020 at 10:49:35AM +0200, Paolo Bonzini wrote: > > > > On 05/08/20 10:39, Dr. David Alan Gilbert wrote: > > > > >> Do you really use "git blame" that much? "git log -S" does more or less > > > > >> the same function (in a different way) and is not affected as much by > > > > >> large code movement and transformation patches. > > > > > > > > > > I use it a lot! Following stuff back to find where a change came > > > > > from and then asking people. > > > > > > > > Indeed, but I use "git log -S" instead. :) Another possibility is to > > > > just do "git log -p" and search for a relevant line of the code I'm > > > > "blaming". > > > > > > I used git blame alot too, but I don't think its a reason to not do the > > > cleanups. It is easy enough to just tell blame to use an earlier commit > > > if you see it displaying a refactor. I don't think such mild inconvenience > > > should stop us making otherwise desirable code changes > > > > I don't think people argue that it should block changes; it it simply > > another thing to consider when weighing benefits vs. drawbacks. > > Actually, I'm saying that including "git blame" in such a weighing is > the mechanism for blocking otherwise beneficial change to the code. Or > to put it another way, I believe the value assigned to "git blame" in > such a comparison is miniscule / effectively zero. The only time > "git blame" should win is if the change in question is purely the > bike shed colour and has no technical benefits at all. If there's any > technical benefit to making the change it should always win. We > shouldn't preserve technical debt in the code merely to avoid an impact > on "git blame". I think we have to agree to disagree on this. 
Making "git blame" harder to use is annoying, and at least for me there's a threshold of usefulness for the code change that should be considered. (No judgment on the proposed change here; I don't really have a horse in that race.)
On 8/5/20 3:36 AM, Markus Armbruster wrote: > John Snow <jsnow@redhat.com> writes: > >> On 8/4/20 4:03 AM, Markus Armbruster wrote: >>> The pain of tweaking the parser is likely dwarved several times over by >>> the pain of the flag day. >> >> You mention this often; I wonder if I misunderstand the critique, >> because the pain of a "flag day" for a new file format seems >> negligible to me. >> >> I don't think we edit these .json files very often. Generally, we add >> a new command when we need one. The edits are usually one or two lines >> plus docstrings. >> >> If anyone has patches in-flight, I genuinely doubt it will take more >> than a few minutes to rewrite for the new file format. >> >> No? > > You describe the the flag day's one-time pain. > > There's also the longer term pain of having to work around git-blame > unable to see beyond the flag day. > So it's not really the "flag day" we're worried about anymore, it's the ongoing ease-of-use for vcs history. > I'm not claiming the pain is prohibitive (if I thought it was, I > would've tried to strange this thread in its crib), I am claiming it'll > be much more painful (read: expensive) than a parser tweak. > I do use `git blame` quite a lot, but with a project as old as QEMU, most of my trips through history do involve jumping across a few refactor gaps as a normal part of that process. As Dan points out, I often have to back out and add refactorSHA^ to my invocation, and just keep hopping backwards until I find what I am truly after. It just feels like a fact of programmer life for me at this point. I've not used Paolo's invocation before, but it looks like it might be useful. I'll try to remember to try it out.
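The "back out and add refactorSHA^" dance John describes can be automated: since git 2.23, blame can be told to look through specific commits with `--ignore-rev` or, persistently, `blame.ignoreRevsFile`. A self-contained sketch (the file name, schema line, and commit messages below are invented for the demo):

```shell
set -eu

# Build a throwaway repo with one "real" commit and one flag-day commit.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name demo
git config user.email demo@example.com

printf "{ 'command': 'query-status' }\n" > schema.json
git add schema.json
git commit -qm 'add query-status'

# A "flag day" commit that touches every line, e.g. a quote-style change.
sed -i.bak "s/'/\"/g" schema.json && rm schema.json.bak
git commit -qam 'flag day: reformat'
flagday=$(git rev-parse HEAD)

# Plain blame pins every line on the flag-day commit:
git blame -s schema.json

# --ignore-rev lets blame look through it to the earlier commit:
git blame --ignore-rev "$flagday" -s schema.json

# To make this permanent, list such commits in a file and point
# blame.ignoreRevsFile at it:
#   echo "$flagday" >> .git-blame-ignore-revs
#   git config blame.ignoreRevsFile .git-blame-ignore-revs
```

With the config variable set repo-wide, nobody has to remember the flag-day revision, which addresses much of the "lasting aftereffects" concern.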
Am 05.08.2020 um 12:08 hat Daniel P. Berrangé geschrieben: > On Wed, Aug 05, 2020 at 11:11:55AM +0200, Cornelia Huck wrote: > > On Wed, 5 Aug 2020 10:05:40 +0100 > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > On Wed, Aug 05, 2020 at 10:49:35AM +0200, Paolo Bonzini wrote: > > > > On 05/08/20 10:39, Dr. David Alan Gilbert wrote: > > > > >> Do you really use "git blame" that much? "git log -S" does more or less > > > > >> the same function (in a different way) and is not affected as much by > > > > >> large code movement and transformation patches. > > > > > > > > > > I use it a lot! Following stuff back to find where a change came > > > > > from and then asking people. > > > > > > > > Indeed, but I use "git log -S" instead. :) Another possibility is to > > > > just do "git log -p" and search for a relevant line of the code I'm > > > > "blaming". > > > > > > I used git blame alot too, but I don't think its a reason to not do the > > > cleanups. It is easy enough to just tell blame to use an earlier commit > > > if you see it displaying a refactor. I don't think such mild inconvenience > > > should stop us making otherwise desirable code changes > > > > I don't think people argue that it should block changes; it it simply > > another thing to consider when weighing benefits vs. drawbacks. > > Actually, I'm saying that including "git blame" in such a weighing is > the mechanism for blocking otherwise beneficial change to the code. Or > to put it another way, I believe the value assigned to "git blame" in > such a comparison is miniscule / effectively zero. The only time > "git blame" should win is if the change in question is purely the > bike shed colour and has no technical benefits at all. If there's any > technical benefit to making the change it should always win. We > shouldn't preserve technical debt in the code merely to avoid an impact > on "git blame". We're basically weighing "git blame" against syntax highlighting defaults. 
I don't think the latter has an obviously higher weight. Kevin
On Wed, Aug 05, 2020 at 06:23:23PM +0200, Kevin Wolf wrote: > Am 05.08.2020 um 12:08 hat Daniel P. Berrangé geschrieben: > > On Wed, Aug 05, 2020 at 11:11:55AM +0200, Cornelia Huck wrote: > > > On Wed, 5 Aug 2020 10:05:40 +0100 > > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > > > > On Wed, Aug 05, 2020 at 10:49:35AM +0200, Paolo Bonzini wrote: > > > > > On 05/08/20 10:39, Dr. David Alan Gilbert wrote: > > > > > >> Do you really use "git blame" that much? "git log -S" does more or less > > > > > >> the same function (in a different way) and is not affected as much by > > > > > >> large code movement and transformation patches. > > > > > > > > > > > > I use it a lot! Following stuff back to find where a change came > > > > > > from and then asking people. > > > > > > > > > > Indeed, but I use "git log -S" instead. :) Another possibility is to > > > > > just do "git log -p" and search for a relevant line of the code I'm > > > > > "blaming". > > > > > > > > I used git blame alot too, but I don't think its a reason to not do the > > > > cleanups. It is easy enough to just tell blame to use an earlier commit > > > > if you see it displaying a refactor. I don't think such mild inconvenience > > > > should stop us making otherwise desirable code changes > > > > > > I don't think people argue that it should block changes; it it simply > > > another thing to consider when weighing benefits vs. drawbacks. > > > > Actually, I'm saying that including "git blame" in such a weighing is > > the mechanism for blocking otherwise beneficial change to the code. Or > > to put it another way, I believe the value assigned to "git blame" in > > such a comparison is miniscule / effectively zero. The only time > > "git blame" should win is if the change in question is purely the > > bike shed colour and has no technical benefits at all. If there's any > > technical benefit to making the change it should always win. 
We > > shouldn't preserve technical debt in the code merely to avoid an impact > > on "git blame". > > We're basically weighing "git blame" against syntax highlighting > defaults. I don't think the latter has an obviously higher weight. I think "syntax highlight defaults" is far from being an accurate description of the motivations behind this discussion.
John Snow <jsnow@redhat.com> writes: > On 8/5/20 3:36 AM, Markus Armbruster wrote: >> John Snow <jsnow@redhat.com> writes: >> >>> On 8/4/20 4:03 AM, Markus Armbruster wrote: >>>> The pain of tweaking the parser is likely dwarved several times over by >>>> the pain of the flag day. >>> >>> You mention this often; I wonder if I misunderstand the critique, >>> because the pain of a "flag day" for a new file format seems >>> negligible to me. >>> >>> I don't think we edit these .json files very often. Generally, we add >>> a new command when we need one. The edits are usually one or two lines >>> plus docstrings. >>> >>> If anyone has patches in-flight, I genuinely doubt it will take more >>> than a few minutes to rewrite for the new file format. >>> >>> No? >> >> You describe the the flag day's one-time pain. >> >> There's also the longer term pain of having to work around git-blame >> unable to see beyond the flag day. >> > > So it's not really the "flag day" we're worried about anymore, it's > the ongoing ease-of-use for vcs history. Feel free to call that pain however you want. I'm going to call it "lasting aftereffects of the flag day" :) >> I'm not claiming the pain is prohibitive (if I thought it was, I >> would've tried to strange this thread in its crib), I am claiming it'll >> be much more painful (read: expensive) than a parser tweak. >> > > I do use `git blame` quite a lot, but with a project as old as QEMU, > most of my trips through history do involve jumping across a few > refactor gaps as a normal part of that process. > > As Dan points out, I often have to back out and add refactorSHA^ to my > invocation, and just keep hopping backwards until I find what I am > truly after. It just feels like a fact of programmer life for me at > this point. The fact that we all need to cope with this class of issue doesn't mean we should create more instances unthinkingly. 
We should do so only when we believe the benefits are worth it, and we can't find a cheaper way to get them. We've discussed "is it really that bad" at some length. What I'm missing so far is a clear writeup of the benefits beyond "editor works out of the box" (which is quite a desirable benefit, but can also be had without a flag day). > I've not used Paolo's invocation before, but it looks like it might be useful. I'll try to remember to try it out.
Eduardo Habkost <ehabkost@redhat.com> writes: > On Wed, Aug 05, 2020 at 06:23:23PM +0200, Kevin Wolf wrote: >> We're basically weighing "git blame" against syntax highlighting >> defaults. I don't think the latter has an obviously higher weight. > > I think "syntax highlight defaults" is far from being an accurate > description of the motivations behind this discussion. The motivation and the expected benefits are far from clear to me, because all I have so far is a meandering e-mail thread. For a change proposal as massive as "throw out working code and touch basically every line in the QAPI schema", I'd like to see a concise memo stating the goals, their benefits, the possible ways to get there, and their cost. I don't think that's too much to ask.
Markus Armbruster <armbru@redhat.com> writes: > Andrea Bolognani <abologna@redhat.com> writes: > >> The various schemas included in QEMU use a JSON-based format which >> is, however, strictly speaking not valid JSON. >> >> As a consequence, when vim tries to apply syntax highlight rules >> for JSON (as guessed from the file name), the result is an unreadable >> mess which mostly consist of red markers pointing out supposed errors >> in, well, pretty much everything. >> >> Using Python syntax highlighting produces much better results, and >> in fact these files already start with specially-formatted comments >> that instruct Emacs to process them as if they were Python files. >> >> This commit adds the equivalent special comments for vim. >> >> Signed-off-by: Andrea Bolognani <abologna@redhat.com> > > Naming QAPI schema files .json even though their contents isn't was a > mistake. Correcting it would be a pain. If we correct it, then the > sooner the better. > > Renaming them to .py gives decent editor support out of the box. Their > contents isn't quite Python, though: true vs. True, false vs. False. Do > we care? Only a few dozen occurences; they could be adjusted. > > Renaming them to .qapi would perhaps be less confusing, for the price of > "out of the box". > > Thoughts? Let me try to summarize the discussion from my point of view. Basing a DSL on some existing syntax has its advantages: the learning curve is slightly more gentle, and we get some tooling for free, notably editor support. Basing on JSON was a mistake: JSON is designed for data interchange, not for programming. To make it fit for programming, we extended it, losing much of what we gained by basing on existing syntax. Use of .json file names for the resulting bastard is confusing. Among the confused are editors. We unconfuse them one by one by telling them to treat the files as Python. Works, because the syntax happens to be a strict subset of Python's. 
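The "true vs. True" gap Markus mentions is small but real: a schema expression is syntactically valid Python, yet not a Python *literal*, and its single-quoted strings make it invalid JSON. A quick check (the example expression and its keys are purely illustrative, not taken from the schema):

```python
import ast
import json

# A QAPI-style expression: single quotes (not JSON) and lowercase
# `true` (not a Python literal).
expr = "{ 'command': 'query-status', 'allow-oob': true }"

def parses_as_json(s):
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False

def parses_as_python_literal(s):
    try:
        ast.literal_eval(s)
        return True
    except (ValueError, SyntaxError):
        return False

print(parses_as_json(expr))                    # False: single-quoted strings
print(parses_as_python_literal(expr))          # False: `true` is a name, not a literal
print(parses_as_python_literal(expr.replace('true', 'True')))  # True
```

This is why renaming to .py would only take a few dozen true/false adjustments to make the files semantically valid Python, while no amount of renaming makes them JSON.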
We discussed swapping out the base syntax layer for one that is actually fit for purpose (and doesn't need extending), e.g. YAML. Radical change, benefits need to justify the cost. Possible benefits:

* Better ergonomics for developers

  Depends on what we pick for a base, and how we use it. We discussed YAML at some length. It's complex, and the way it assigns types is at odds with the QAPI schema's needs. I doubt YAML could improve ergonomics all that much.

* Not having to maintain our own code for the lower layer

  Replacing the (trivial) JSON parser by glue for an off-the-shelf parser is quite unlikely to pay off. Replacing the (non-trivial) doc comment parser could be a win, but no credible replacements have been suggested so far.

* Make the schema more easily consumable by other programs

  Parsing is the easy part of this problem. Making the easy part easier won't make the total problem appreciably easier.

Right now, I doubt the benefits are worth the cost. To make me reconsider, write a concise memo stating the goals, their benefits, the possible ways to get there, and their cost.

More modest ways to stop our confusing misuse of .json:

* Rename QAPI schema files to .qapi, keep unconfusing tools one by one by telling them to treat the files as Python.

* Rename QAPI schema files to .py. Optionally rename false, true to False, True in the schema language, so the schema expressions are semantically valid Python. Humans might still find .py confusing.

* Rename QAPI schema files to .js, change comments from # to //.

* Keep .json, change comments from # to // and strings from ' to ". This is still not JSON, but should be close enough to make common tooling work.

The last two are too much churn for too little benefit, in my opinion. I'm willing to do either of the first two. If you have a preference, let me know.
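For the .qapi option, "unconfusing tools one by one" is cheap at least for vim: a one-line filetype-detection rule covers every file at once, with no per-file modeline needed. A sketch, assuming the hypothetical .qapi extension (this is ordinary vim configuration, e.g. placed in ~/.vim/ftdetect/qapi.vim):

```vim
" Hypothetical: treat renamed *.qapi schema files as Python,
" mirroring what the per-file modelines do today.
autocmd BufRead,BufNewFile *.qapi setfiletype python
```

Emacs users could do the equivalent with an entry in auto-mode-alist, so the per-file special comments could eventually be dropped.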
diff --git a/docs/interop/firmware.json b/docs/interop/firmware.json
index 240f565397..989f10b626 100644
--- a/docs/interop/firmware.json
+++ b/docs/interop/firmware.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # Copyright (C) 2018 Red Hat, Inc.
 #
diff --git a/docs/interop/vhost-user.json b/docs/interop/vhost-user.json
index ef8ac5941f..feb5fe58ca 100644
--- a/docs/interop/vhost-user.json
+++ b/docs/interop/vhost-user.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # Copyright (C) 2018 Red Hat, Inc.
 #
diff --git a/qapi/authz.json b/qapi/authz.json
index 1c836a3abd..f3e9745426 100644
--- a/qapi/authz.json
+++ b/qapi/authz.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # QAPI authz definitions
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 943df1926a..5f72b50149 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 ##
 # == Block core (VM unrelated)
diff --git a/qapi/block.json b/qapi/block.json
index 2ddbfa8306..c54a393cf3 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 ##
 # = Block devices
diff --git a/qapi/char.json b/qapi/char.json
index daceb20f84..8aeedf96b2 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 ##
diff --git a/qapi/common.json b/qapi/common.json
index 7b9cbcd97b..716712d4b3 100644
--- a/qapi/common.json
+++ b/qapi/common.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 ##
 # = Common data types
diff --git a/qapi/control.json b/qapi/control.json
index 6b816bb61f..de51e9916c 100644
--- a/qapi/control.json
+++ b/qapi/control.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 ##
diff --git a/qapi/crypto.json b/qapi/crypto.json
index b2a4cff683..c41e869e31 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 ##
diff --git a/qapi/dump.json b/qapi/dump.json
index a1eed7b15c..f7c4267e3f 100644
--- a/qapi/dump.json
+++ b/qapi/dump.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
diff --git a/qapi/error.json b/qapi/error.json
index 3fad08f506..94a6502de9 100644
--- a/qapi/error.json
+++ b/qapi/error.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 ##
 # = QMP errors
diff --git a/qapi/introspect.json b/qapi/introspect.json
index b1aabd4cfd..944bb87a20 100644
--- a/qapi/introspect.json
+++ b/qapi/introspect.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # Copyright (C) 2015 Red Hat, Inc.
 #
diff --git a/qapi/job.json b/qapi/job.json
index 5e658281f5..e9fed7aeb5 100644
--- a/qapi/job.json
+++ b/qapi/job.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 ##
 # == Background jobs
diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index f2c82949d8..698850cc78 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
diff --git a/qapi/machine.json b/qapi/machine.json
index ff7b5032e3..95558af98d 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
diff --git a/qapi/migration.json b/qapi/migration.json
index eca2981d0a..a855f3c763 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 ##
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index dee3b45930..1e561fa97b 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 ##
diff --git a/qapi/misc.json b/qapi/misc.json
index 99b90ac80b..9c5c227dfd 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 ##
diff --git a/qapi/net.json b/qapi/net.json
index cebb1b52e3..f63c282140 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 ##
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 43b0ba0dea..f03ff91ceb 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 ##
 # = Introduction
 #
diff --git a/qapi/qdev.json b/qapi/qdev.json
index f4ed9735c4..13254529bf 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
diff --git a/qapi/qom.json b/qapi/qom.json
index 8abe998962..0b0b92944b 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
diff --git a/qapi/rdma.json b/qapi/rdma.json
index b58105b1b6..a1d2175a8b 100644
--- a/qapi/rdma.json
+++ b/qapi/rdma.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 ##
diff --git a/qapi/rocker.json b/qapi/rocker.json
index 52597db491..b48e49a89b 100644
--- a/qapi/rocker.json
+++ b/qapi/rocker.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 ##
 # = Rocker switch device
diff --git a/qapi/run-state.json b/qapi/run-state.json
index 2e22907740..7cc9f96a5b 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 ##
diff --git a/qapi/sockets.json b/qapi/sockets.json
index ea933ed4b2..2d558ebee7 100644
--- a/qapi/sockets.json
+++ b/qapi/sockets.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 ##
 # = Socket data types
diff --git a/qapi/tpm.json b/qapi/tpm.json
index dc1f081739..6a10c9ed8d 100644
--- a/qapi/tpm.json
+++ b/qapi/tpm.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 ##
diff --git a/qapi/transaction.json b/qapi/transaction.json
index b6c11158f0..15ddebdbc3 100644
--- a/qapi/transaction.json
+++ b/qapi/transaction.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 ##
diff --git a/qapi/ui.json b/qapi/ui.json
index e16e98a060..dacceeaf63 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 ##
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 4be9aad48e..359ab52ad1 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -1,4 +1,5 @@
 # *-*- Mode: Python -*-*
+# vim: filetype=python
 ##
 #
diff --git a/storage-daemon/qapi/qapi-schema.json b/storage-daemon/qapi/qapi-schema.json
index 14f4f8fe61..6100d1f0c9 100644
--- a/storage-daemon/qapi/qapi-schema.json
+++ b/storage-daemon/qapi/qapi-schema.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 # Note that modules are shared with the QEMU main schema under the assumption
 # that the storage daemon schema is a subset of the main schema. For the shared
diff --git a/tests/qapi-schema/doc-good.json b/tests/qapi-schema/doc-good.json
index ddd89d1233..9da72a1f55 100644
--- a/tests/qapi-schema/doc-good.json
+++ b/tests/qapi-schema/doc-good.json
@@ -1,4 +1,6 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
+#
 # Positive QAPI doc comment tests
 { 'pragma': { 'doc-required': true } }
diff --git a/tests/qapi-schema/include/sub-module.json b/tests/qapi-schema/include/sub-module.json
index afdb267228..b9f7b9bb56 100644
--- a/tests/qapi-schema/include/sub-module.json
+++ b/tests/qapi-schema/include/sub-module.json
@@ -1,4 +1,5 @@
 # *-*- Mode: Python -*-*
+# vim: filetype=python
 # Sub-module of ../qapi-schema-test.json
diff --git a/tests/qapi-schema/qapi-schema-test.json b/tests/qapi-schema/qapi-schema-test.json
index 6b1f05afa7..3a9f2cbb33 100644
--- a/tests/qapi-schema/qapi-schema-test.json
+++ b/tests/qapi-schema/qapi-schema-test.json
@@ -1,4 +1,5 @@
 # *-*- Mode: Python -*-*
+# vim: filetype=python
 # This file is a stress test of supported qapi constructs that must
 # parse and compile correctly.
diff --git a/tests/qapi-schema/sub-sub-module.json b/tests/qapi-schema/sub-sub-module.json
index 524ef9b83f..94f36ec0b1 100644
--- a/tests/qapi-schema/sub-sub-module.json
+++ b/tests/qapi-schema/sub-sub-module.json
@@ -1,4 +1,5 @@
 # *-*- Mode: Python -*-*
+# vim: filetype=python
 # Sub-module of sub-module include/sub-module.json of qapi-schema-test.json
The various schemas included in QEMU use a JSON-based format which is, however, strictly speaking not valid JSON.

As a consequence, when vim tries to apply syntax highlight rules for JSON (as guessed from the file name), the result is an unreadable mess which mostly consist of red markers pointing out supposed errors in, well, pretty much everything.

Using Python syntax highlighting produces much better results, and in fact these files already start with specially-formatted comments that instruct Emacs to process them as if they were Python files.

This commit adds the equivalent special comments for vim.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
---
 docs/interop/firmware.json                | 1 +
 docs/interop/vhost-user.json              | 1 +
 qapi/authz.json                           | 1 +
 qapi/block-core.json                      | 1 +
 qapi/block.json                           | 1 +
 qapi/char.json                            | 1 +
 qapi/common.json                          | 1 +
 qapi/control.json                         | 1 +
 qapi/crypto.json                          | 1 +
 qapi/dump.json                            | 1 +
 qapi/error.json                           | 1 +
 qapi/introspect.json                      | 1 +
 qapi/job.json                             | 1 +
 qapi/machine-target.json                  | 1 +
 qapi/machine.json                         | 1 +
 qapi/migration.json                       | 1 +
 qapi/misc-target.json                     | 1 +
 qapi/misc.json                            | 1 +
 qapi/net.json                             | 1 +
 qapi/qapi-schema.json                     | 1 +
 qapi/qdev.json                            | 1 +
 qapi/qom.json                             | 1 +
 qapi/rdma.json                            | 1 +
 qapi/rocker.json                          | 1 +
 qapi/run-state.json                       | 1 +
 qapi/sockets.json                         | 1 +
 qapi/tpm.json                             | 1 +
 qapi/transaction.json                     | 1 +
 qapi/ui.json                              | 1 +
 qga/qapi-schema.json                      | 1 +
 storage-daemon/qapi/qapi-schema.json      | 1 +
 tests/qapi-schema/doc-good.json           | 2 ++
 tests/qapi-schema/include/sub-module.json | 1 +
 tests/qapi-schema/qapi-schema-test.json   | 1 +
 tests/qapi-schema/sub-sub-module.json     | 1 +
 35 files changed, 36 insertions(+)