diff mbox

qemu: json: Fix parsing of integers >= 0x8000000000000000

Message ID 20110520180331.GA21837@amd.home.annexia.org
State New
Headers show

Commit Message

Richard W.M. Jones May 20, 2011, 6:03 p.m. UTC
There seem to be a few unsafe uses of strto* functions.  This patch
just fixes the one that affects me :-)

Rich.

Comments

Anthony Liguori May 20, 2011, 6:11 p.m. UTC | #1
On 05/20/2011 01:03 PM, Richard W.M. Jones wrote:
>
> There seem to be a few unsafe uses of strto* functions.  This patch
> just fixes the one that affects me :-)

Sending an integer of this size is not valid JSON.

Your patch won't accept negative numbers, correct?

JSON only supports int64_t.

Regards,

Anthony Liguori

>
> Rich.
>
Richard W.M. Jones May 20, 2011, 6:36 p.m. UTC | #2
On Fri, May 20, 2011 at 01:11:05PM -0500, Anthony Liguori wrote:
> On 05/20/2011 01:03 PM, Richard W.M. Jones wrote:
> >
> >There seem to be a few unsafe uses of strto* functions.  This patch
> >just fixes the one that affects me :-)
> 
> Sending an integer of this size is not valid JSON.
> 
> Your patch won't accept negative numbers, correct?
> 
> JSON only supports int64_t.

So we should be sending negative numbers for these large addresses?

It seems ... ugly ... to me, but OK.  We will have to change libvirt.

Rich.
Richard W.M. Jones May 20, 2011, 6:37 p.m. UTC | #3
I should add that not error checking and silently truncating numbers
in qemu is still a bug.

Rich.
Richard W.M. Jones May 20, 2011, 6:47 p.m. UTC | #4
On Fri, May 20, 2011 at 01:11:05PM -0500, Anthony Liguori wrote:
> JSON only supports int64_t.

By the way, where does this information come from?

The JSON RFC fails to define the range of numbers at all, just leaving
it completely up to the application, and if JSON is based on
Javascript then it would use double for numbers [a terrible idea from
a qemu p.o.v -- I'm just pointing out that's what JS does].

So are you saying that JSON as defined in qemu defines numbers as
int64_t, or is there something else I'm missing?

Rich.
Richard W.M. Jones May 20, 2011, 9:19 p.m. UTC | #5
On Fri, May 20, 2011 at 01:11:05PM -0500, Anthony Liguori wrote:
> Your patch won't accept negative numbers, correct?

'strtoull' works OK with negative numbers.

Rich.
Daniel P. Berrangé May 23, 2011, 1:04 p.m. UTC | #6
On Fri, May 20, 2011 at 01:11:05PM -0500, Anthony Liguori wrote:
> On 05/20/2011 01:03 PM, Richard W.M. Jones wrote:
> >
> >There seem to be a few unsafe uses of strto* functions.  This patch
> >just fixes the one that affects me :-)
> 
> Sending an integer of this size is not valid JSON.
> 
> Your patch won't accept negative numbers, correct?
> 
> JSON only supports int64_t.

That's not really true. JSON supports arbitrarily large numbers
& integers.  It is merely the QEMU parser / object model which
is artifically limiting them to int64_t. The core of the problem
is with the QInt implementation in QEMU, which uses an 'int64_t'
as its canonical form, rather than just holding a string representation
of the number. The JSON parser should only validate that the
data is a valid JSON number, and then pass the number as a string
to QInt. The conversion to int_64 or other integer sizes / formats
should be done at time of use, according to the type of data the
command actually wants, whether int64t, int32t, int16t etc. eg the
QInt API should look more like:

  QInt *qint_from_string(const char *number);
  QInt *qint_from_int64(int64_t val);
  QInt *qint_from_int32(int64_t val);
  QInt *qint_from_int16(int64_t val);
  QInt *qint_from_uint64(uint64_t val);
  QInt *qint_from_uint32(uint32_t val);
  QInt *qint_from_uint16(uint16_t val);

  int qint_get_int64(QInt *qi, int64t *val);
  int qint_get_int32(QInt *qi, int32t *val);
  int qint_get_int16(QInt *qi, int16t *val);
  int qint_get_uint64(QInt *qi, uint64t *val);
  int qint_get_uint32(QInt *qi, uint32t *val);
  int qint_get_uint16(QInt *qi, uint16t *val);


Regards,
Daniel
Anthony Liguori May 23, 2011, 1:33 p.m. UTC | #7
On 05/23/2011 08:04 AM, Daniel P. Berrange wrote:
> On Fri, May 20, 2011 at 01:11:05PM -0500, Anthony Liguori wrote:
>> On 05/20/2011 01:03 PM, Richard W.M. Jones wrote:
>>>
>>> There seem to be a few unsafe uses of strto* functions.  This patch
>>> just fixes the one that affects me :-)
>>
>> Sending an integer of this size is not valid JSON.
>>
>> Your patch won't accept negative numbers, correct?
>>
>> JSON only supports int64_t.
>
> That's not really true. JSON supports arbitrarily large numbers
> &  integers.

Try the following snippet in your browser:

<html>
<head>
<script type="text/javascript">
alert(9223372036854775807);
</script>
</head>
</html>

The actual value of the alert will surprise you :-)

Integers in Javascript are actually represented as doubles internally 
which means that integer constants are only accurate up to 52 bits.

So really, we should cap integers at 32-bit :-/

Have I mentioned recently that I really dislike JSON...

Regards,

Anthony Liguori
Anthony Liguori May 23, 2011, 1:38 p.m. UTC | #8
On 05/23/2011 08:04 AM, Daniel P. Berrange wrote:
> On Fri, May 20, 2011 at 01:11:05PM -0500, Anthony Liguori wrote:
>> On 05/20/2011 01:03 PM, Richard W.M. Jones wrote:
>>>
>>> There seem to be a few unsafe uses of strto* functions.  This patch
>>> just fixes the one that affects me :-)
>>
>> Sending an integer of this size is not valid JSON.
>>
>> Your patch won't accept negative numbers, correct?
>>
>> JSON only supports int64_t.
>
> That's not really true. JSON supports arbitrarily large numbers
> &  integers.

This really blows my mind:

alert(9223372036854775807 == 9223372036854775808);

Regards,

Anthony Liguori

   It is merely the QEMU parser / object model which
> is artifically limiting them to int64_t. The core of the problem
> is with the QInt implementation in QEMU, which uses an 'int64_t'
> as its canonical form, rather than just holding a string representation
> of the number. The JSON parser should only validate that the
> data is a valid JSON number, and then pass the number as a string
> to QInt. The conversion to int_64 or other integer sizes / formats
> should be done at time of use, according to the type of data the
> command actually wants, whether int64t, int32t, int16t etc. eg the
> QInt API should look more like:
>
>    QInt *qint_from_string(const char *number);
>    QInt *qint_from_int64(int64_t val);
>    QInt *qint_from_int32(int64_t val);
>    QInt *qint_from_int16(int64_t val);
>    QInt *qint_from_uint64(uint64_t val);
>    QInt *qint_from_uint32(uint32_t val);
>    QInt *qint_from_uint16(uint16_t val);
>
>    int qint_get_int64(QInt *qi, int64t *val);
>    int qint_get_int32(QInt *qi, int32t *val);
>    int qint_get_int16(QInt *qi, int16t *val);
>    int qint_get_uint64(QInt *qi, uint64t *val);
>    int qint_get_uint32(QInt *qi, uint32t *val);
>    int qint_get_uint16(QInt *qi, uint16t *val);
>
>
> Regards,
> Daniel
Richard W.M. Jones May 23, 2011, 1:39 p.m. UTC | #9
Just refer everyone at this point the actual standard:

http://www.ietf.org/rfc/rfc4627.txt

It leaves it up to the application how to interpret and store
integers.  It would be standard-conforming to only allow "0" and "1".

While qemu is technically correct, in practice it's being very
unhelpful here.

Rich.
Daniel P. Berrangé May 23, 2011, 1:40 p.m. UTC | #10
On Mon, May 23, 2011 at 08:33:03AM -0500, Anthony Liguori wrote:
> On 05/23/2011 08:04 AM, Daniel P. Berrange wrote:
> >On Fri, May 20, 2011 at 01:11:05PM -0500, Anthony Liguori wrote:
> >>On 05/20/2011 01:03 PM, Richard W.M. Jones wrote:
> >>>
> >>>There seem to be a few unsafe uses of strto* functions.  This patch
> >>>just fixes the one that affects me :-)
> >>
> >>Sending an integer of this size is not valid JSON.
> >>
> >>Your patch won't accept negative numbers, correct?
> >>
> >>JSON only supports int64_t.
> >
> >That's not really true. JSON supports arbitrarily large numbers
> >&  integers.
> 
> Try the following snippet in your browser:
> 
> <html>
> <head>
> <script type="text/javascript">
> alert(9223372036854775807);
> </script>
> </head>
> </html>
> 
> The actual value of the alert will surprise you :-)
> 
> Integers in Javascript are actually represented as doubles
> internally which means that integer constants are only accurate up
> to 52 bits.
> 
> So really, we should cap integers at 32-bit :-/
> 
> Have I mentioned recently that I really dislike JSON...

NB, I am distinguishing between JSON the generic specification and
JSON as implemented in web browsers. JSON the specification has *no*
limitation on integers. Any limitation, like the one you demonstrate,
is inherantly just specific to the implementation. We have no need to
limit ourselves to what web browsers currently support for integers in
JSON. Indeed, limiting ourselves to what browsers support will make the
JSON monitor mode essentially useless, requiring yet another new mode
with a format which can actually represent the data we need to use.

What I suggested is in compliance with the JSON specification and allows
us to support uint64 which we need for commands which take disk or memory
offsets.

Regards,
Daniel
Anthony Liguori May 23, 2011, 1:45 p.m. UTC | #11
On 05/23/2011 08:40 AM, Daniel P. Berrange wrote:
> On Mon, May 23, 2011 at 08:33:03AM -0500, Anthony Liguori wrote:
>> On 05/23/2011 08:04 AM, Daniel P. Berrange wrote:
>>> On Fri, May 20, 2011 at 01:11:05PM -0500, Anthony Liguori wrote:
>>>> On 05/20/2011 01:03 PM, Richard W.M. Jones wrote:
>>>>>
>>>>> There seem to be a few unsafe uses of strto* functions.  This patch
>>>>> just fixes the one that affects me :-)
>>>>
>>>> Sending an integer of this size is not valid JSON.
>>>>
>>>> Your patch won't accept negative numbers, correct?
>>>>
>>>> JSON only supports int64_t.
>>>
>>> That's not really true. JSON supports arbitrarily large numbers
>>> &   integers.
>>
>> Try the following snippet in your browser:
>>
>> <html>
>> <head>
>> <script type="text/javascript">
>> alert(9223372036854775807);
>> </script>
>> </head>
>> </html>
>>
>> The actual value of the alert will surprise you :-)
>>
>> Integers in Javascript are actually represented as doubles
>> internally which means that integer constants are only accurate up
>> to 52 bits.
>>
>> So really, we should cap integers at 32-bit :-/
>>
>> Have I mentioned recently that I really dislike JSON...
>
> NB, I am distinguishing between JSON the generic specification and
> JSON as implemented in web browsers. JSON the specification has *no*
> limitation on integers. Any limitation, like the one you demonstrate,
> is inherantly just specific to the implementation.

No, EMCA is very specific in how integers are handled in JavaScript. 
Every implementation of JavaScript is going to exhibit this behavior.

The JSON specification lack of specificity here I think has to be 
interpreted as a deferral to the EMCA specification.

But to the point, I don't see what the point of using JSON is if our 
interpretation doesn't actually work with JavaScript.

> We have no need to
> limit ourselves to what web browsers currently support for integers in
> JSON.

It's not web browsers.  This behavior is well specified in the EMCA 
specification.

> Indeed, limiting ourselves to what browsers support will make the
> JSON monitor mode essentially useless, requiring yet another new mode
> with a format which can actually represent the data we need to use.
>
> What I suggested is in compliance with the JSON specification and allows
> us to support uint64 which we need for commands which take disk or memory
> offsets.

At the end of the day, we need to worry about supporting clients.  I 
think clients are going to refer to the behavior of JavaScript for 
guidance.  So if we expect a client to not round integers, we can't send 
ones that are greater than 52-bit.

This is an extremely nasty silent failure mode.

Or, we need to just say that we're not JSON compatible.

Regards,

Anthony Liguori

>
> Regards,
> Daniel
Anthony Liguori May 23, 2011, 1:50 p.m. UTC | #12
>> The actual value of the alert will surprise you :-)
>>
>> Integers in Javascript are actually represented as doubles
>> internally which means that integer constants are only accurate up
>> to 52 bits.
>>
>> So really, we should cap integers at 32-bit :-/
>>
>> Have I mentioned recently that I really dislike JSON...
>
> NB, I am distinguishing between JSON the generic specification and
> JSON as implemented in web browsers. JSON the specification has *no*
> limitation on integers.

The spec has no notion of integers at all.  Here's the relevant text. 
Note that the BNF only has a single entry point for numbers.  It does 
not distinguish between integers and floating point numbers.  Also, the 
only discussion of valid numbers is about whether the number can be 
represented as a rational number.  I think the only way to read the spec 
here is that *all* numbers are meant to be represented as floating point 
numbers.

Regards,

Anthony Liguori

2.4.  Numbers

    The representation of numbers is similar to that used in most
    programming languages.  A number contains an integer component that
    may be prefixed with an optional minus sign, which may be followed by
    a fraction part and/or an exponent part.

    Octal and hex forms are not allowed.  Leading zeros are not allowed.

    A fraction part is a decimal point followed by one or more digits.

    An exponent part begins with the letter E in upper or lowercase,
    which may be followed by a plus or minus sign.  The E and optional
    sign are followed by one or more digits.

    Numeric values that cannot be represented as sequences of digits
    (such as Infinity and NaN) are not permitted.


          number = [ minus ] int [ frac ] [ exp ]

          decimal-point = %x2E       ; .

          digit1-9 = %x31-39         ; 1-9

          e = %x65 / %x45            ; e E

          exp = e [ minus / plus ] 1*DIGIT

          frac = decimal-point 1*DIGIT

          int = zero / ( digit1-9 *DIGIT )

          minus = %x2D               ; -

          plus = %x2B                ; +

          zero = %x30                ; 0
Luiz Capitulino May 23, 2011, 2:02 p.m. UTC | #13
On Mon, 23 May 2011 08:50:55 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> >> The actual value of the alert will surprise you :-)
> >>
> >> Integers in Javascript are actually represented as doubles
> >> internally which means that integer constants are only accurate up
> >> to 52 bits.
> >>
> >> So really, we should cap integers at 32-bit :-/
> >>
> >> Have I mentioned recently that I really dislike JSON...
> >
> > NB, I am distinguishing between JSON the generic specification and
> > JSON as implemented in web browsers. JSON the specification has *no*
> > limitation on integers.
> 
> The spec has no notion of integers at all.  Here's the relevant text. 
> Note that the BNF only has a single entry point for numbers.  It does 
> not distinguish between integers and floating point numbers.  Also, the 
> only discussion of valid numbers is about whether the number can be 
> represented as a rational number.  I think the only way to read the spec 
> here is that *all* numbers are meant to be represented as floating point 
> numbers.

Python json works just fine:

>>> import json
>>> json.dumps(9223372036854775807)
'9223372036854775807'
>>> json.loads('9223372036854775807')
9223372036854775807
Anthony Liguori May 23, 2011, 2:06 p.m. UTC | #14
On 05/23/2011 09:02 AM, Luiz Capitulino wrote:
> On Mon, 23 May 2011 08:50:55 -0500
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>>>> The actual value of the alert will surprise you :-)
>>>>
>>>> Integers in Javascript are actually represented as doubles
>>>> internally which means that integer constants are only accurate up
>>>> to 52 bits.
>>>>
>>>> So really, we should cap integers at 32-bit :-/
>>>>
>>>> Have I mentioned recently that I really dislike JSON...
>>>
>>> NB, I am distinguishing between JSON the generic specification and
>>> JSON as implemented in web browsers. JSON the specification has *no*
>>> limitation on integers.
>>
>> The spec has no notion of integers at all.  Here's the relevant text.
>> Note that the BNF only has a single entry point for numbers.  It does
>> not distinguish between integers and floating point numbers.  Also, the
>> only discussion of valid numbers is about whether the number can be
>> represented as a rational number.  I think the only way to read the spec
>> here is that *all* numbers are meant to be represented as floating point
>> numbers.
>
> Python json works just fine:
>
>>>> import json
>>>> json.dumps(9223372036854775807)
> '9223372036854775807'
>>>> json.loads('9223372036854775807')
> 9223372036854775807

Python integers use BigNumbers transparently when necessary.  That's 
what happening here.

There's a strong treatment of Python's behavior at 
http://deron.meranda.us/python/comparing_json_modules/numbers

Regards,

Anthony Liguori
Daniel P. Berrangé May 23, 2011, 2:14 p.m. UTC | #15
On Mon, May 23, 2011 at 08:45:54AM -0500, Anthony Liguori wrote:
> On 05/23/2011 08:40 AM, Daniel P. Berrange wrote:
> >On Mon, May 23, 2011 at 08:33:03AM -0500, Anthony Liguori wrote:
> >>On 05/23/2011 08:04 AM, Daniel P. Berrange wrote:
> >>>On Fri, May 20, 2011 at 01:11:05PM -0500, Anthony Liguori wrote:
> >>>>On 05/20/2011 01:03 PM, Richard W.M. Jones wrote:
> >>>>>
> >>>>>There seem to be a few unsafe uses of strto* functions.  This patch
> >>>>>just fixes the one that affects me :-)
> >>>>
> >>>>Sending an integer of this size is not valid JSON.
> >>>>
> >>>>Your patch won't accept negative numbers, correct?
> >>>>
> >>>>JSON only supports int64_t.
> >>>
> >>>That's not really true. JSON supports arbitrarily large numbers
> >>>&   integers.
> >>
> >>Try the following snippet in your browser:
> >>
> >><html>
> >><head>
> >><script type="text/javascript">
> >>alert(9223372036854775807);
> >></script>
> >></head>
> >></html>
> >>
> >>The actual value of the alert will surprise you :-)
> >>
> >>Integers in Javascript are actually represented as doubles
> >>internally which means that integer constants are only accurate up
> >>to 52 bits.
> >>
> >>So really, we should cap integers at 32-bit :-/
> >>
> >>Have I mentioned recently that I really dislike JSON...
> >
> >NB, I am distinguishing between JSON the generic specification and
> >JSON as implemented in web browsers. JSON the specification has *no*
> >limitation on integers. Any limitation, like the one you demonstrate,
> >is inherantly just specific to the implementation.
> 
> No, EMCA is very specific in how integers are handled in JavaScript.
> Every implementation of JavaScript is going to exhibit this
> behavior.
>
> The JSON specification lack of specificity here I think has to be
> interpreted as a deferral to the EMCA specification.

The EMCA spec declares that integers upto 52-bits can be stored
without loosing precision. This doesn't forbid sending of 64-bit
integers via JSON. It merely implies that when parsed into a
EMCA-Script object you'll loose precision. So this doesn't mean that
QEMU has to throw away the extra precision when parsing JSON, nor
do client apps have to throw away precision when generating JSON
for QEMU. Both client & QEMU can use a full uint64 if desired.

> But to the point, I don't see what the point of using JSON is if our
> interpretation doesn't actually work with JavaScript.

This simply means JavaScript is a useless language for talking to the
QEMU monitor, because it'll loose precision for integers > 52bits.

> >We have no need to
> >limit ourselves to what web browsers currently support for integers in
> >JSON.
> 
> It's not web browsers.  This behavior is well specified in the EMCA
> specification.
> 
> >Indeed, limiting ourselves to what browsers support will make the
> >JSON monitor mode essentially useless, requiring yet another new mode
> >with a format which can actually represent the data we need to use.
> >
> >What I suggested is in compliance with the JSON specification and allows
> >us to support uint64 which we need for commands which take disk or memory
> >offsets.
> 
> At the end of the day, we need to worry about supporting clients.  I
> think clients are going to refer to the behavior of JavaScript for
> guidance.  So if we expect a client to not round integers, we can't
> send ones that are greater than 52-bit.
> 
> This is an extremely nasty silent failure mode.
> 
> Or, we need to just say that we're not JSON compatible.

I don't see this as a JSON compatiblity problem. JSON allows arbitrary
numbers, the only restriction is wrt to the precision of the parsers
when using JavaScript. A C app can encode+decode a value of MAX_UINT64
in JSON precisely and remain JSON compatible. A JavaScript app will still
be able to decode the values without any trouble, it will simply loose
some precision at time of parsing.

Daniel
Markus Armbruster May 23, 2011, 2:20 p.m. UTC | #16
Anthony Liguori <anthony@codemonkey.ws> writes:

> On 05/23/2011 08:40 AM, Daniel P. Berrange wrote:
>> On Mon, May 23, 2011 at 08:33:03AM -0500, Anthony Liguori wrote:
>>> On 05/23/2011 08:04 AM, Daniel P. Berrange wrote:
>>>> On Fri, May 20, 2011 at 01:11:05PM -0500, Anthony Liguori wrote:
>>>>> On 05/20/2011 01:03 PM, Richard W.M. Jones wrote:
>>>>>>
>>>>>> There seem to be a few unsafe uses of strto* functions.  This patch
>>>>>> just fixes the one that affects me :-)
>>>>>
>>>>> Sending an integer of this size is not valid JSON.
>>>>>
>>>>> Your patch won't accept negative numbers, correct?
>>>>>
>>>>> JSON only supports int64_t.
>>>>
>>>> That's not really true. JSON supports arbitrarily large numbers
>>>> &   integers.
>>>
>>> Try the following snippet in your browser:
>>>
>>> <html>
>>> <head>
>>> <script type="text/javascript">
>>> alert(9223372036854775807);
>>> </script>
>>> </head>
>>> </html>
>>>
>>> The actual value of the alert will surprise you :-)
>>>
>>> Integers in Javascript are actually represented as doubles
>>> internally which means that integer constants are only accurate up
>>> to 52 bits.
>>>
>>> So really, we should cap integers at 32-bit :-/
>>>
>>> Have I mentioned recently that I really dislike JSON...
>>
>> NB, I am distinguishing between JSON the generic specification and
>> JSON as implemented in web browsers. JSON the specification has *no*
>> limitation on integers. Any limitation, like the one you demonstrate,
>> is inherantly just specific to the implementation.
>
> No, EMCA is very specific in how integers are handled in
> JavaScript. Every implementation of JavaScript is going to exhibit
> this behavior.

What about other implementations of JSON?

> The JSON specification lack of specificity here I think has to be
> interpreted as a deferral to the EMCA specification.

That's debatable.

RFC4627 says "Numeric values that cannot be represented as sequences of
digits (such as Infinity and NaN) are not permitted" (section 2.4) and
"An implementation may set limits on the range of numbers" (section 4).

The latter clearly suggests that an implementation may also do the
opposite, i.e. set no limits on the range on numbers.

For me, that carries at least as much weight as the rather vague "JSON's
design goals were for it to be [...] a subset of JavaScript" (section 1)
and "JSON is a subset of JavaScript" (section 10).

> But to the point, I don't see what the point of using JSON is if our
> interpretation doesn't actually work with JavaScript.

Why?

>> We have no need to
>> limit ourselves to what web browsers currently support for integers in
>> JSON.
>
> It's not web browsers.  This behavior is well specified in the EMCA
> specification.
>
>> Indeed, limiting ourselves to what browsers support will make the
>> JSON monitor mode essentially useless, requiring yet another new mode
>> with a format which can actually represent the data we need to use.
>>
>> What I suggested is in compliance with the JSON specification and allows
>> us to support uint64 which we need for commands which take disk or memory
>> offsets.
>
> At the end of the day, we need to worry about supporting clients.  I
> think clients are going to refer to the behavior of JavaScript for
> guidance.

Not if we clearly communicate the numerical limits our implementation of
JSON sets, and the consequences of clients ignoring them.

>            So if we expect a client to not round integers, we can't
> send ones that are greater than 52-bit.
>
> This is an extremely nasty silent failure mode.
>
> Or, we need to just say that we're not JSON compatible.

For one particular interpretation of the RFC.
Daniel P. Berrangé May 23, 2011, 2:24 p.m. UTC | #17
On Mon, May 23, 2011 at 08:50:55AM -0500, Anthony Liguori wrote:
> >>The actual value of the alert will surprise you :-)
> >>
> >>Integers in Javascript are actually represented as doubles
> >>internally which means that integer constants are only accurate up
> >>to 52 bits.
> >>
> >>So really, we should cap integers at 32-bit :-/
> >>
> >>Have I mentioned recently that I really dislike JSON...
> >
> >NB, I am distinguishing between JSON the generic specification and
> >JSON as implemented in web browsers. JSON the specification has *no*
> >limitation on integers.
> 
> The spec has no notion of integers at all.  Here's the relevant
> text. Note that the BNF only has a single entry point for numbers.
> It does not distinguish between integers and floating point numbers.
> Also, the only discussion of valid numbers is about whether the
> number can be represented as a rational number.  I think the only
> way to read the spec here is that *all* numbers are meant to be
> represented as floating point numbers.

I don't agree. It means that JSON as a format can represent arbitrary
numbers, whether rational or not. Interpretation as 32/64/128 bit
floating point, or as 16/32/64 bit integers is entirely a matter for
the parser.

The only issue is whether the implementation parser can hold the
numbers without loosing precision. eg, so if JavaScript gets a
number which doesn't fit in 52-bits, it'll loose some precision
due to floating point storage. Nothing in the JSON spec requires
other languages to throw away precision when parsing or formatting
JSON.

Daniel
Markus Armbruster May 23, 2011, 2:29 p.m. UTC | #18
Anthony Liguori <anthony@codemonkey.ws> writes:

>>> The actual value of the alert will surprise you :-)
>>>
>>> Integers in Javascript are actually represented as doubles
>>> internally which means that integer constants are only accurate up
>>> to 52 bits.
>>>
>>> So really, we should cap integers at 32-bit :-/
>>>
>>> Have I mentioned recently that I really dislike JSON...
>>
>> NB, I am distinguishing between JSON the generic specification and
>> JSON as implemented in web browsers. JSON the specification has *no*
>> limitation on integers.
>
> The spec has no notion of integers at all.  Here's the relevant

It doesn't differentiate between integers and floating-point numbers
*syntactically*.  There are just numbers.  Some of them happen to be
integers.

> text. Note that the BNF only has a single entry point for numbers.  It
> does not distinguish between integers and floating point numbers.
> Also, the only discussion of valid numbers is about whether the number
> can be represented as a rational number.  I think the only way to read
> the spec here is that *all* numbers are meant to be represented as
> floating point numbers.
>
> Regards,
>
> Anthony Liguori
>
> 2.4.  Numbers
>
>    The representation of numbers is similar to that used in most
>    programming languages.  A number contains an integer component that
>    may be prefixed with an optional minus sign, which may be followed by
>    a fraction part and/or an exponent part.
>
>    Octal and hex forms are not allowed.  Leading zeros are not allowed.
>
>    A fraction part is a decimal point followed by one or more digits.
>
>    An exponent part begins with the letter E in upper or lowercase,
>    which may be followed by a plus or minus sign.  The E and optional
>    sign are followed by one or more digits.
>
>    Numeric values that cannot be represented as sequences of digits
>    (such as Infinity and NaN) are not permitted.

Therefore, the number 1234567890123456789 is permitted.

>
>
>          number = [ minus ] int [ frac ] [ exp ]
>
>          decimal-point = %x2E       ; .
>
>          digit1-9 = %x31-39         ; 1-9
>
>          e = %x65 / %x45            ; e E
>
>          exp = e [ minus / plus ] 1*DIGIT
>
>          frac = decimal-point 1*DIGIT
>
>          int = zero / ( digit1-9 *DIGIT )
>
>          minus = %x2D               ; -
>
>          plus = %x2B                ; +
>
>          zero = %x30                ; 0

There's more:

4.  Parsers

   A JSON parser transforms a JSON text into another representation.  A
   JSON parser MUST accept all texts that conform to the JSON grammar.
   A JSON parser MAY accept non-JSON forms or extensions.

   An implementation may set limits on the size of texts that it
   accepts.  An implementation may set limits on the maximum depth of
==>nesting.  An implementation may set limits on the range of numbers.<==
   An implementation may set limits on the length and character contents
   of strings.

JavaScript's implementation of JSON sets limits on the range of numbers,
namely they need to fit into IEEE doubles.

Our implementation sets different limits.  IIRC, it's something like
"numbers with a fractional part or an exponent need to fit into IEEE
doubles, anything else into int64_t."  Not exactly the acme of elegance,
either.  But it works for us.
Daniel P. Berrangé May 23, 2011, 2:32 p.m. UTC | #19
On Mon, May 23, 2011 at 04:29:55PM +0200, Markus Armbruster wrote:
> Anthony Liguori <anthony@codemonkey.ws> writes:
> 
> >>> The actual value of the alert will surprise you :-)
> >>>
> >>> Integers in Javascript are actually represented as doubles
> >>> internally which means that integer constants are only accurate up
> >>> to 52 bits.
> >>>
> >>> So really, we should cap integers at 32-bit :-/
> >>>
> >>> Have I mentioned recently that I really dislike JSON...
> >>
> >> NB, I am distinguishing between JSON the generic specification and
> >> JSON as implemented in web browsers. JSON the specification has *no*
> >> limitation on integers.
> >
> > The spec has no notion of integers at all.  Here's the relevant
> 
> It doesn't differentiate between integers and floating-point numbers
> *syntactically*.  There are just numbers.  Some of them happen to be
> integers.
> 
> > text. Note that the BNF only has a single entry point for numbers.  It
> > does not distinguish between integers and floating point numbers.
> > Also, the only discussion of valid numbers is about whether the number
> > can be represented as a rational number.  I think the only way to read
> > the spec here is that *all* numbers are meant to be represented as
> > floating point numbers.
> >
> > Regards,
> >
> > Anthony Liguori
> >
> > 2.4.  Numbers
> >
> >    The representation of numbers is similar to that used in most
> >    programming languages.  A number contains an integer component that
> >    may be prefixed with an optional minus sign, which may be followed by
> >    a fraction part and/or an exponent part.
> >
> >    Octal and hex forms are not allowed.  Leading zeros are not allowed.
> >
> >    A fraction part is a decimal point followed by one or more digits.
> >
> >    An exponent part begins with the letter E in upper or lowercase,
> >    which may be followed by a plus or minus sign.  The E and optional
> >    sign are followed by one or more digits.
> >
> >    Numeric values that cannot be represented as sequences of digits
> >    (such as Infinity and NaN) are not permitted.
> 
> Therefore, the number 1234567890123456789 is permitted.
> 
> >
> >
> >          number = [ minus ] int [ frac ] [ exp ]
> >
> >          decimal-point = %x2E       ; .
> >
> >          digit1-9 = %x31-39         ; 1-9
> >
> >          e = %x65 / %x45            ; e E
> >
> >          exp = e [ minus / plus ] 1*DIGIT
> >
> >          frac = decimal-point 1*DIGIT
> >
> >          int = zero / ( digit1-9 *DIGIT )
> >
> >          minus = %x2D               ; -
> >
> >          plus = %x2B                ; +
> >
> >          zero = %x30                ; 0
> 
> There's more:
> 
> 4.  Parsers
> 
>    A JSON parser transforms a JSON text into another representation.  A
>    JSON parser MUST accept all texts that conform to the JSON grammar.
>    A JSON parser MAY accept non-JSON forms or extensions.

Incidentally, we should likely use the QMP capabilities handshake to
optionally enable parsing support for Infinity/-Infinity/NaN in QEMU
since that seems to be a common extension used[1].

Regards,
Daniel

[1] http://deron.meranda.us/python/comparing_json_modules/numbers
Anthony Liguori May 23, 2011, 3:03 p.m. UTC | #20
On 05/23/2011 09:14 AM, Daniel P. Berrange wrote:
> On Mon, May 23, 2011 at 08:45:54AM -0500, Anthony Liguori wrote:
>> On 05/23/2011 08:40 AM, Daniel P. Berrange wrote:
>>> On Mon, May 23, 2011 at 08:33:03AM -0500, Anthony Liguori wrote:
>>>> On 05/23/2011 08:04 AM, Daniel P. Berrange wrote:
>>>>> On Fri, May 20, 2011 at 01:11:05PM -0500, Anthony Liguori wrote:
>>>>>> On 05/20/2011 01:03 PM, Richard W.M. Jones wrote:
>>>>>>>
>>>>>>> There seem to be a few unsafe uses of strto* functions.  This patch
>>>>>>> just fixes the one that affects me :-)
>>>>>>
>>>>>> Sending an integer of this size is not valid JSON.
>>>>>>
>>>>>> Your patch won't accept negative numbers, correct?
>>>>>>
>>>>>> JSON only supports int64_t.
>>>>>
>>>>> That's not really true. JSON supports arbitrarily large numbers
>>>>> &    integers.
>>>>
>>>> Try the following snippet in your browser:
>>>>
>>>> <html>
>>>> <head>
>>>> <script type="text/javascript">
>>>> alert(9223372036854775807);
>>>> </script>
>>>> </head>
>>>> </html>
>>>>
>>>> The actual value of the alert will surprise you :-)
>>>>
>>>> Integers in Javascript are actually represented as doubles
>>>> internally which means that integer constants are only accurate up
>>>> to 52 bits.
>>>>
>>>> So really, we should cap integers at 32-bit :-/
>>>>
>>>> Have I mentioned recently that I really dislike JSON...
>>>
>>> NB, I am distinguishing between JSON the generic specification and
>>> JSON as implemented in web browsers. JSON the specification has *no*
>>> limitation on integers. Any limitation, like the one you demonstrate,
>>> is inherantly just specific to the implementation.
>>
>> No, EMCA is very specific in how integers are handled in JavaScript.
>> Every implementation of JavaScript is going to exhibit this
>> behavior.
>>
>> The JSON specification lack of specificity here I think has to be
>> interpreted as a deferral to the EMCA specification.
>
> The EMCA spec declares that integers upto 52-bits can be stored
> without loosing precision. This doesn't forbid sending of 64-bit
> integers via JSON. It merely implies that when parsed into a
> EMCA-Script object you'll loose precision. So this doesn't mean that
> QEMU has to throw away the extra precision when parsing JSON, nor
> do client apps have to throw away precision when generating JSON
> for QEMU. Both client&  QEMU can use a full uint64 if desired.

Thinking more carefully about this, I think the following rule is important:

1) Integers that would cause overflow should be treated as double 
precision floating point numbers.

2) A conforming implementation must support integer precision up to 
52-bit signed integers.

I think this is valid because the string:

9223372036854775808

Is a representation of:

9223372036854776e3

Both are equivalent representations of the same number.  So we can send 
and accept arbitrarily large integers provided that we always fallback 
to representing integers as double precision floating points if the 
integer would otherwise truncate.

I think this means we need to drop QFloat and QInt, add a QNumber, and 
then add _from_uint64/to_uint64 and _from_double/to_double.

Regards,

Anthony Liguori
Anthony Liguori May 23, 2011, 3:07 p.m. UTC | #21
On 05/23/2011 09:29 AM, Markus Armbruster wrote:
> Anthony Liguori<anthony@codemonkey.ws>  writes:
>
> JavaScript's implementation of JSON sets limits on the range of numbers,
> namely they need to fit into IEEE doubles.
>
> Our implementation sets different limits.  IIRC, it's something like
> "numbers with a fractional part or an exponent need to fit into IEEE
> doubles, anything else into int64_t."  Not exactly the acme of elegance,
> either.  But it works for us.

In order to be compatible with JavaScript (which I think is necessary to 
really satisfy the spec), we just need to make sure that our integers 
are represented by at least 53-bits (to enable signed integers) and 
critically, fall back to floating point representation to ensure that we 
round instead of truncate.

Regards,

Anthony Liguori

>
Richard W.M. Jones May 23, 2011, 3:19 p.m. UTC | #22
On Mon, May 23, 2011 at 10:07:21AM -0500, Anthony Liguori wrote:
> On 05/23/2011 09:29 AM, Markus Armbruster wrote:
> >Anthony Liguori<anthony@codemonkey.ws>  writes:
> >
> >JavaScript's implementation of JSON sets limits on the range of numbers,
> >namely they need to fit into IEEE doubles.
> >
> >Our implementation sets different limits.  IIRC, it's something like
> >"numbers with a fractional part or an exponent need to fit into IEEE
> >doubles, anything else into int64_t."  Not exactly the acme of elegance,
> >either.  But it works for us.
> 
> In order to be compatible with JavaScript (which I think is
> necessary to really satisfy the spec), we just need to make sure
> that our integers are represented by at least 53-bits (to enable
> signed integers) and critically, fall back to floating point
> representation to ensure that we round instead of truncate.

The problem is to be able to send 64 bit memory and disk offsets
faithfully.  This doesn't just fail to solve the problem, it's
actually going to make it a whole lot worse.

I don't agree with you that whatever the JSON standard (RFC) doesn't
specify must be filled in by reading Javascript/ECMA.  If this is so
important, it's very odd that it doesn't mention this fallback in the
RFC.  If you read the RFC alone then it's pretty clear (to me) that it
leaves limits up to the application.

Rich.
Anthony Liguori May 23, 2011, 3:24 p.m. UTC | #23
On 05/23/2011 10:19 AM, Richard W.M. Jones wrote:
> On Mon, May 23, 2011 at 10:07:21AM -0500, Anthony Liguori wrote:
>> On 05/23/2011 09:29 AM, Markus Armbruster wrote:
>>> Anthony Liguori<anthony@codemonkey.ws>   writes:
>>>
>>> JavaScript's implementation of JSON sets limits on the range of numbers,
>>> namely they need to fit into IEEE doubles.
>>>
>>> Our implementation sets different limits.  IIRC, it's something like
>>> "numbers with a fractional part or an exponent need to fit into IEEE
>>> doubles, anything else into int64_t."  Not exactly the acme of elegance,
>>> either.  But it works for us.
>>
>> In order to be compatible with JavaScript (which I think is
>> necessary to really satisfy the spec), we just need to make sure
>> that our integers are represented by at least 53-bits (to enable
>> signed integers) and critically, fall back to floating point
>> representation to ensure that we round instead of truncate.
>
> The problem is to be able to send 64 bit memory and disk offsets
> faithfully.  This doesn't just fail to solve the problem, it's
> actually going to make it a whole lot worse.
>
> I don't agree with you that whatever the JSON standard (RFC) doesn't
> specify must be filled in by reading Javascript/ECMA.

"  It is derived from the object
    literals of JavaScript, as defined in the ECMAScript Programming
    Language Standard, Third Edition [ECMA]."

> If this is so
> important, it's very odd that it doesn't mention this fallback in the
> RFC.  If you read the RFC alone then it's pretty clear (to me) that it
> leaves limits up to the application.

If it's left up to the application, doesn't that mean that we can't ever 
send 64-bit memory/disk faithfully?

Because a client would be allowed to represent integers as signed 32-bit 
numbers.

Fundamentally, we need to ask ourselves, do we want to support any JSON 
client or require JSON libraries explicitly written for QEMU?

What I suggested would let us work with any JSON client, but if clients 
loose precision after 53-bits, those clients would not work well with QEMU.

The alternative approach is to be conservative and only use 32-bit 
integers and represent everything in two numbers.

Regards,

Anthony Liguori

>
> Rich.
>
Richard W.M. Jones May 23, 2011, 3:29 p.m. UTC | #24
On Mon, May 23, 2011 at 10:24:07AM -0500, Anthony Liguori wrote:
> On 05/23/2011 10:19 AM, Richard W.M. Jones wrote:
> >On Mon, May 23, 2011 at 10:07:21AM -0500, Anthony Liguori wrote:
> >>On 05/23/2011 09:29 AM, Markus Armbruster wrote:
> >>>Anthony Liguori<anthony@codemonkey.ws>   writes:
> >>>
> >>>JavaScript's implementation of JSON sets limits on the range of numbers,
> >>>namely they need to fit into IEEE doubles.
> >>>
> >>>Our implementation sets different limits.  IIRC, it's something like
> >>>"numbers with a fractional part or an exponent need to fit into IEEE
> >>>doubles, anything else into int64_t."  Not exactly the acme of elegance,
> >>>either.  But it works for us.
> >>
> >>In order to be compatible with JavaScript (which I think is
> >>necessary to really satisfy the spec), we just need to make sure
> >>that our integers are represented by at least 53-bits (to enable
> >>signed integers) and critically, fall back to floating point
> >>representation to ensure that we round instead of truncate.
> >
> >The problem is to be able to send 64 bit memory and disk offsets
> >faithfully.  This doesn't just fail to solve the problem, it's
> >actually going to make it a whole lot worse.
> >
> >I don't agree with you that whatever the JSON standard (RFC) doesn't
> >specify must be filled in by reading Javascript/ECMA.
> 
> "  It is derived from the object
>    literals of JavaScript, as defined in the ECMAScript Programming
>    Language Standard, Third Edition [ECMA]."

I don't think this is anything other than advice or background
about the standard.

> >If this is so
> >important, it's very odd that it doesn't mention this fallback in the
> >RFC.  If you read the RFC alone then it's pretty clear (to me) that it
> >leaves limits up to the application.
> 
> If it's left up to the application, doesn't that mean that we can't
> ever send 64-bit memory/disk faithfully?
> 
> Because a client would be allowed to represent integers as signed
> 32-bit numbers.
> 
> Fundamentally, we need to ask ourselves, do we want to support any
> JSON client or require JSON libraries explicitly written for QEMU?
> 
> What I suggested would let us work with any JSON client, but if
> clients loose precision after 53-bits, those clients would not work
> well with QEMU.

I totally agree that the JSON "standard" is completely underspecified
and not very useful (lacking a schema, strong typing, well-specified
limits).  Nevertheless, for better or worse it's what we're using.

There is one very important JSON client we are using called libvirt.

> The alternative approach is to be conservative and only use 32-bit
> integers and represent everything in two numbers.

Or use strings ...

Rich.
Daniel P. Berrangé May 23, 2011, 3:38 p.m. UTC | #25
On Mon, May 23, 2011 at 10:24:07AM -0500, Anthony Liguori wrote:
> On 05/23/2011 10:19 AM, Richard W.M. Jones wrote:
> >On Mon, May 23, 2011 at 10:07:21AM -0500, Anthony Liguori wrote:
> >>On 05/23/2011 09:29 AM, Markus Armbruster wrote:
> >>>Anthony Liguori<anthony@codemonkey.ws>   writes:
> >>>
> >>>JavaScript's implementation of JSON sets limits on the range of numbers,
> >>>namely they need to fit into IEEE doubles.
> >>>
> >>>Our implementation sets different limits.  IIRC, it's something like
> >>>"numbers with a fractional part or an exponent need to fit into IEEE
> >>>doubles, anything else into int64_t."  Not exactly the acme of elegance,
> >>>either.  But it works for us.
> >>
> >>In order to be compatible with JavaScript (which I think is
> >>necessary to really satisfy the spec), we just need to make sure
> >>that our integers are represented by at least 53-bits (to enable
> >>signed integers) and critically, fall back to floating point
> >>representation to ensure that we round instead of truncate.
> >
> >The problem is to be able to send 64 bit memory and disk offsets
> >faithfully.  This doesn't just fail to solve the problem, it's
> >actually going to make it a whole lot worse.
> >
> >I don't agree with you that whatever the JSON standard (RFC) doesn't
> >specify must be filled in by reading Javascript/ECMA.
> 
> "  It is derived from the object
>    literals of JavaScript, as defined in the ECMAScript Programming
>    Language Standard, Third Edition [ECMA]."
> 
> >If this is so
> >important, it's very odd that it doesn't mention this fallback in the
> >RFC.  If you read the RFC alone then it's pretty clear (to me) that it
> >leaves limits up to the application.
> 
> If it's left up to the application, doesn't that mean that we can't
> ever send 64-bit memory/disk faithfully?
> 
> Because a client would be allowed to represent integers as signed
> 32-bit numbers.
> 
> Fundamentally, we need to ask ourselves, do we want to support any
> JSON client or require JSON libraries explicitly written for QEMU?

We just need to clarify what we mean by 'support' here. Any JSON
compliant client can talk to the QEMU monitor, but some JSON clients
may be better than others. So we just need state:

 - QEMU is compatible with any JSON client.
 - QEMU recommends use of JSON clients that can represent
   integers with a minimum precision of 64-bits.

> What I suggested would let us work with any JSON client, but if
> clients loose precision after 53-bits, those clients would not work
> well with QEMU.

Yep, if clients loose precision after 53-bits that's fine, just an
app problem. The key factor is that QEMU must not be the one throwing
away precision.

Daniel
Daniel P. Berrangé May 23, 2011, 3:41 p.m. UTC | #26
On Mon, May 23, 2011 at 10:03:18AM -0500, Anthony Liguori wrote:
> On 05/23/2011 09:14 AM, Daniel P. Berrange wrote:
> >On Mon, May 23, 2011 at 08:45:54AM -0500, Anthony Liguori wrote:
> >>On 05/23/2011 08:40 AM, Daniel P. Berrange wrote:
> >>>On Mon, May 23, 2011 at 08:33:03AM -0500, Anthony Liguori wrote:
> >>>>On 05/23/2011 08:04 AM, Daniel P. Berrange wrote:
> >>>>>On Fri, May 20, 2011 at 01:11:05PM -0500, Anthony Liguori wrote:
> >>>>>>On 05/20/2011 01:03 PM, Richard W.M. Jones wrote:
> >>>>>>>
> >>>>>>>There seem to be a few unsafe uses of strto* functions.  This patch
> >>>>>>>just fixes the one that affects me :-)
> >>>>>>
> >>>>>>Sending an integer of this size is not valid JSON.
> >>>>>>
> >>>>>>Your patch won't accept negative numbers, correct?
> >>>>>>
> >>>>>>JSON only supports int64_t.
> >>>>>
> >>>>>That's not really true. JSON supports arbitrarily large numbers
> >>>>>&    integers.
> >>>>
> >>>>Try the following snippet in your browser:
> >>>>
> >>>><html>
> >>>><head>
> >>>><script type="text/javascript">
> >>>>alert(9223372036854775807);
> >>>></script>
> >>>></head>
> >>>></html>
> >>>>
> >>>>The actual value of the alert will surprise you :-)
> >>>>
> >>>>Integers in Javascript are actually represented as doubles
> >>>>internally which means that integer constants are only accurate up
> >>>>to 52 bits.
> >>>>
> >>>>So really, we should cap integers at 32-bit :-/
> >>>>
> >>>>Have I mentioned recently that I really dislike JSON...
> >>>
> >>>NB, I am distinguishing between JSON the generic specification and
> >>>JSON as implemented in web browsers. JSON the specification has *no*
> >>>limitation on integers. Any limitation, like the one you demonstrate,
> >>>is inherantly just specific to the implementation.
> >>
> >>No, EMCA is very specific in how integers are handled in JavaScript.
> >>Every implementation of JavaScript is going to exhibit this
> >>behavior.
> >>
> >>The JSON specification lack of specificity here I think has to be
> >>interpreted as a deferral to the EMCA specification.
> >
> >The EMCA spec declares that integers upto 52-bits can be stored
> >without loosing precision. This doesn't forbid sending of 64-bit
> >integers via JSON. It merely implies that when parsed into a
> >EMCA-Script object you'll loose precision. So this doesn't mean that
> >QEMU has to throw away the extra precision when parsing JSON, nor
> >do client apps have to throw away precision when generating JSON
> >for QEMU. Both client&  QEMU can use a full uint64 if desired.
> 
> Thinking more carefully about this, I think the following rule is important:
> 
> 1) Integers that would cause overflow should be treated as double
> precision floating point numbers.
> 
> 2) A conforming implementation must support integer precision up to
> 52-bit signed integers.
> 
> I think this is valid because the string:
> 
> 9223372036854775808
> 
> Is a representation of:
> 
> 9223372036854776e3
> 
> Both are equivalent representations of the same number.  So we can
> send and accept arbitrarily large integers provided that we always
> fallback to representing integers as double precision floating
> points if the integer would otherwise truncate.
> 
> I think this means we need to drop QFloat and QInt, add a QNumber,
> and then add _from_uint64/to_uint64 and _from_double/to_double.

As long as QNumber is using the string as its internal representation,
and only converting to a more limited integer/float format at time of
use, this sounds workable.

Daniel
Anthony Liguori May 23, 2011, 3:59 p.m. UTC | #27
On 05/23/2011 10:29 AM, Richard W.M. Jones wrote:
> On Mon, May 23, 2011 at 10:24:07AM -0500, Anthony Liguori wrote:
>> On 05/23/2011 10:19 AM, Richard W.M. Jones wrote:
>> What I suggested would let us work with any JSON client, but if
>> clients loose precision after 53-bits, those clients would not work
>> well with QEMU.
>
> I totally agree that the JSON "standard" is completely underspecified
> and not very useful (lacking a schema, strong typing, well-specified
> limits).  Nevertheless, for better or worse it's what we're using.
>
> There is one very important JSON client we are using called libvirt.

No, libvirt is a virtualization library.  It happens to have a home 
grown JSON client but down the road, if there's a nice, common JSON 
library and it decides to switch to it, we don't want to have a bunch of 
compatibility issues that prevents that from happening.

The point of using a standard RPC is to support multiple clients. 
Otherwise, we should just stop calling it JSON and make it behave the 
way we want it to.

>> The alternative approach is to be conservative and only use 32-bit
>> integers and represent everything in two numbers.
>
> Or use strings ...

JSON types are variant.  We could send integers great than 53-bits back 
as strings..  This would break libvirt badly though.

>
> Rich.
>
Daniel P. Berrangé May 23, 2011, 4:06 p.m. UTC | #28
On Mon, May 23, 2011 at 10:59:29AM -0500, Anthony Liguori wrote:
> On 05/23/2011 10:29 AM, Richard W.M. Jones wrote:
> >On Mon, May 23, 2011 at 10:24:07AM -0500, Anthony Liguori wrote:
> >>On 05/23/2011 10:19 AM, Richard W.M. Jones wrote:
> >>What I suggested would let us work with any JSON client, but if
> >>clients loose precision after 53-bits, those clients would not work
> >>well with QEMU.
> >
> >I totally agree that the JSON "standard" is completely underspecified
> >and not very useful (lacking a schema, strong typing, well-specified
> >limits).  Nevertheless, for better or worse it's what we're using.
> >
> >There is one very important JSON client we are using called libvirt.
> 
> No, libvirt is a virtualization library.  It happens to have a home
> grown JSON client but down the road, if there's a nice, common JSON
> library and it decides to switch to it, we don't want to have a
> bunch of compatibility issues that prevents that from happening.

We use the YAJL json library for parsing & formatting JSON. When
parsing, it validates that the number is valid according to the
JSON grammer, and then passes it without semantic interpretation
onto the app using the library. So with YAJL, it is the app (libvirt)
which decides the whether to treat this as a float, double, int64,
uint64 as appropriate to the context of the JSON document.

Regards,
Daniel
Markus Armbruster May 23, 2011, 4:18 p.m. UTC | #29
Anthony Liguori <anthony@codemonkey.ws> writes:

> On 05/23/2011 10:19 AM, Richard W.M. Jones wrote:
>> On Mon, May 23, 2011 at 10:07:21AM -0500, Anthony Liguori wrote:
>>> On 05/23/2011 09:29 AM, Markus Armbruster wrote:
>>>> Anthony Liguori<anthony@codemonkey.ws>   writes:
>>>>
>>>> JavaScript's implementation of JSON sets limits on the range of numbers,
>>>> namely they need to fit into IEEE doubles.
>>>>
>>>> Our implementation sets different limits.  IIRC, it's something like
>>>> "numbers with a fractional part or an exponent need to fit into IEEE
>>>> doubles, anything else into int64_t."  Not exactly the acme of elegance,
>>>> either.  But it works for us.
>>>
>>> In order to be compatible with JavaScript (which I think is
>>> necessary to really satisfy the spec), we just need to make sure
>>> that our integers are represented by at least 53-bits (to enable
>>> signed integers) and critically, fall back to floating point
>>> representation to ensure that we round instead of truncate.
>>
>> The problem is to be able to send 64 bit memory and disk offsets
>> faithfully.  This doesn't just fail to solve the problem, it's
>> actually going to make it a whole lot worse.
>>
>> I don't agree with you that whatever the JSON standard (RFC) doesn't
>> specify must be filled in by reading Javascript/ECMA.
>
> "  It is derived from the object
>    literals of JavaScript, as defined in the ECMAScript Programming
>    Language Standard, Third Edition [ECMA]."

"derived from" != "identical with".

JSON is *syntax*.  Numerical limits are outside its scope.

>> If this is so
>> important, it's very odd that it doesn't mention this fallback in the
>> RFC.  If you read the RFC alone then it's pretty clear (to me) that it
>> leaves limits up to the application.
>
> If it's left up to the application, doesn't that mean that we can't
> ever send 64-bit memory/disk faithfully?
>
> Because a client would be allowed to represent integers as signed
> 32-bit numbers.

A client is allowed to represent numbers however it sees fit.  Even
fixed-point BCD.  The RFC doesn't specify minimal limits.

> Fundamentally, we need to ask ourselves, do we want to support any
> JSON client or require JSON libraries explicitly written for QEMU?

Simple: we require a JSON library that can represent 64 bit integers
faithfully.  Those aren't exactly in short supply.

> What I suggested would let us work with any JSON client, but if
> clients loose precision after 53-bits, those clients would not work
> well with QEMU.
>
> The alternative approach is to be conservative and only use 32-bit
> integers and represent everything in two numbers.

That's a fair guess of what any of today's JSON libraries should be able
to represent, but it's not conservative.
Anthony Liguori May 23, 2011, 4:37 p.m. UTC | #30
On 05/23/2011 11:18 AM, Markus Armbruster wrote:
> Anthony Liguori<anthony@codemonkey.ws>  writes:
>
>> If it's left up to the application, doesn't that mean that we can't
>> ever send 64-bit memory/disk faithfully?
>>
>> Because a client would be allowed to represent integers as signed
>> 32-bit numbers.
>
> A client is allowed to represent numbers however it sees fit.  Even
> fixed-point BCD.  The RFC doesn't specify minimal limits.

I think there's a point being lost in the discussion.

Consider a QMP function called identity(x) that just returns it's argument.

Here's what's fundamentally at question:

identity(1) -> 1                                           // goodness
identity(NaN) -> #Exception                                // goodness
identity(int64_max) -> int64_max                           // goodness
identity(int64_max) -> rnd_to_52_bits(int64_max)           // goodness
identity(int64_max + 1) -> rnd_to_52_bits(int64_max + 1)   // goodness
identity(int64_max + 1) -> int64_t max + 1                 // goodness
identity(int64_max + 1) -> -1                              // badness

JSON does not distinguish between integers and floating point numbers. 
It's behaviors with respect to overflow of whatever the defined 
representation is should not be integer overflow but IEEE rounding.

I would further argue that a conforming client/server guarantees *at 
least* double precision floating point accuracy.  Additional would be 
allowed providing that the additional accuracy meets the rounding rules 
of IEEE.

So for instance, you can use an 80-bit floating point number to do your 
math and send string representations of the 80-bit number (because the 
conversion should give you the same results within the expected accuracy).

Regards,

Anthony Liguori
Jamie Lokier May 23, 2011, 11:02 p.m. UTC | #31
Richard W.M. Jones wrote:
> The problem is to be able to send 64 bit memory and disk offsets
> faithfully.  This doesn't just fail to solve the problem, it's
> actually going to make it a whole lot worse.

Such offsets would be so much more readable in hexadecimal.

So why not use a string "0xffff800012340000" instead?

That is universally Javascript compatible as well as much more
convenient for humans.

Or at least, *accept* a hex string wherever a number is required by
QMP (just because hex is convenient anyway, no compatibility issue),
and *emit* a hex string where the number may be out of Javascript's
unambiguous range, or where a hex string would make more sense anyway.

-- Jamie
Anthony Liguori May 24, 2011, 2:50 a.m. UTC | #32
On 05/23/2011 06:02 PM, Jamie Lokier wrote:
> Richard W.M. Jones wrote:
>> The problem is to be able to send 64 bit memory and disk offsets
>> faithfully.  This doesn't just fail to solve the problem, it's
>> actually going to make it a whole lot worse.
>
> Such offsets would be so much more readable in hexadecimal.
>
> So why not use a string "0xffff800012340000" instead?

This doesn't change the fundamental issue here.  Javascript's internal 
representation for integers isn't 2s compliment, but IEEE794.  This 
means the expectations about how truncation/overflow is handled is 
fundamentally different.

Regards,

Anthony Liguori

>
> That is universally Javascript compatible as well as much more
> convenient for humans.
>
> Or at least, *accept* a hex string wherever a number is required by
> QMP (just because hex is convenient anyway, no compatibility issue),
> and *emit* a hex string where the number may be out of Javascript's
> unambiguous range, or where a hex string would make more sense anyway.
>
> -- Jamie
>
Jamie Lokier May 24, 2011, 5:30 a.m. UTC | #33
Anthony Liguori wrote:
> On 05/23/2011 06:02 PM, Jamie Lokier wrote:
> >Richard W.M. Jones wrote:
> >>The problem is to be able to send 64 bit memory and disk offsets
> >>faithfully.  This doesn't just fail to solve the problem, it's
> >>actually going to make it a whole lot worse.
> >
> >Such offsets would be so much more readable in hexadecimal.
> >
> >So why not use a string "0xffff800012340000" instead?
> 
> This doesn't change the fundamental issue here.  Javascript's internal 
> representation for integers isn't 2s compliment, but IEEE794.  This 
> means the expectations about how truncation/overflow is handled is 
> fundamentally different.

No, the point is it's a string so Javascript numerics doesn't come
into it, no overflow, no truncation, no arithmetic.  Every program
that wants to handle them handles them as a *string-valued attribute*
externally, and whatever representation it needs for a particular
attribute internally.  (Just as enum values are represented with
strings too).

In the unlikely event that someone wants to do arithmetic on these
values *in actual Javascript*, it'll be tricky for them, but the
representation doesn't have much to do with that.

-- Jamie
Markus Armbruster May 24, 2011, 6:26 a.m. UTC | #34
Anthony Liguori <anthony@codemonkey.ws> writes:

> On 05/23/2011 11:18 AM, Markus Armbruster wrote:
>> Anthony Liguori<anthony@codemonkey.ws>  writes:
>>
>>> If it's left up to the application, doesn't that mean that we can't
>>> ever send 64-bit memory/disk faithfully?
>>>
>>> Because a client would be allowed to represent integers as signed
>>> 32-bit numbers.
>>
>> A client is allowed to represent numbers however it sees fit.  Even
>> fixed-point BCD.  The RFC doesn't specify minimal limits.
>
> I think there's a point being lost in the discussion.
>
> Consider a QMP function called identity(x) that just returns it's argument.
>
> Here's what's fundamentally at question:
>
> identity(1) -> 1                                           // goodness
> identity(NaN) -> #Exception                                // goodness
> identity(int64_max) -> int64_max                           // goodness
> identity(int64_max) -> rnd_to_52_bits(int64_max)           // goodness
> identity(int64_max + 1) -> rnd_to_52_bits(int64_max + 1)   // goodness
> identity(int64_max + 1) -> int64_t max + 1                 // goodness
> identity(int64_max + 1) -> -1                              // badness

That's one workable numerical semantics.  There are others.

I don't follow your jump from JSON (a somewhat underspecified data
exchange format) to "QMP function" (a somewhat unspecified programming
language).

> JSON does not distinguish between integers and floating point
> numbers.

Correct.

>          It's behaviors with respect to overflow of whatever the
> defined representation is should not be integer overflow but IEEE
> rounding.

Citation needed.

> I would further argue that a conforming client/server guarantees *at
> least* double precision floating point accuracy.  Additional would be
> allowed providing that the additional accuracy meets the rounding
> rules of IEEE.

Conforming to what?  The RFC does not specify anything on the
interpretation of the syntactic entity "number".

You fill that gap by falling back to JavaScript, on the theory that
"derived from JavaScript" implies "all gaps in this RFC to be filled
from JavaScript".  Not only is that theory debatable, it also needs to
face the test of existing practice.  It fails that test.

Here's a more or less random pick from the real world:
http://deron.meranda.us/python/comparing_json_modules/numbers

    About JSON numbers

    It is important to realize that JSON only specifies the syntax for
    writing numerals in decimal notation, and does not deal with any
    semantics or meanings of actual numbers.  JSON does not distinguish
    types, such as integer or floating point.  Nor does JSON impose any
    size or precision limits on the numbers it can represent.

and

    Converting Python numbers to JSON

    Integral types

    Python integral types (int and long) should be convertable directly
    to JSON.  Since JSON imposes no size limits all Python integers can
    be represented by JSON; however, remember that many other JSON
    implementations in other languages may not be able to handle these
    large numbers.

> So for instance, you can use an 80-bit floating point number to do
> your math and send string representations of the 80-bit number
> (because the conversion should give you the same results within the
> expected accuracy).

You can also drill a hole into your knee and fill it with milk.

Look, JSON is just syntax.  It intentionally leaves assigning meaning to
the application.  This includes choice of numerical limits, as the RFC
clearly states.

The application QEMU needs to declare applicable numerical limits.
Clients ignore them at their own risk.  End of story.

Of course the application is free to twist its use of JSON into
unnatural contortions like sending numbers as strings (sometimes or
always), or splitting them apart.  This may make a few more JSON
implementations compatible with the application.  It also makes it
harder to build the interesting stuff on top of the JSON implementation.

In the case of QMP, that's a bad tradeoff.  More so, since it would
break the one known client: libvirt.  Which had no trouble whatsoever
finding a JSON implementation compatible with QMP's numerical limits.
diff mbox

Patch

diff --git a/json-parser.c b/json-parser.c
index 6c06ef9..3747ba5 100644
--- a/json-parser.c
+++ b/json-parser.c
@@ -512,6 +512,8 @@  static QObject *parse_literal(JSONParserContext *ctxt, QList **tokens)
 {
     QObject *token, *obj;
     QList *working = qlist_copy(*tokens);
+    const char *token_str;
+    unsigned long long ull;
 
     token = qlist_pop(working);
     switch (token_get_type(token)) {
@@ -519,7 +521,14 @@  static QObject *parse_literal(JSONParserContext *ctxt, QList **tokens)
         obj = QOBJECT(qstring_from_escaped_str(ctxt, token));
         break;
     case JSON_INTEGER:
-        obj = QOBJECT(qint_from_int(strtoll(token_get_value(token), NULL, 10)));
+        token_str = token_get_value(token);
+        errno = 0;
+        ull = strtoull(token_str, NULL, 10);
+        if (errno != 0) {
+            parse_error(ctxt, token, "invalid integer: %s", strerror(errno));
+            return NULL;
+        }
+        obj = QOBJECT(qint_from_int(ull));
         break;
     case JSON_FLOAT:
         /* FIXME dependent on locale */