We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
\xF5..\xFF in the lexer. That's insufficient; there's plenty of
invalid UTF-8 not containing these bytes, as demonstrated by
check-qjson:
* Malformed sequences
- Unexpected continuation bytes
- Missing continuation bytes after start bytes other than
\xC0..\xC1, \xF5..\xFD.
* Overlong sequences with start bytes other than \xC0..\xC1,
\xF5..\xFD.
* Invalid code points
Fixing this in the lexer would be bothersome. Fixing it in the parser
is straightforward, so do that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-23-armbru@redhat.com>
The JSON parser rejects some invalid sequences, but accepts others
without correcting the problem.
We should either reject all invalid sequences, or minimize overlong
sequences and replace all other invalid sequences by a suitable
replacement character. A common choice for replacement is U+FFFD.
I'm going to implement the former. Update the comments in
utf8_string() to expect this.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-22-armbru@redhat.com>
Fix the lexer to reject unescaped control characters in JSON strings,
in accordance with RFC 8259 "The JavaScript Object Notation (JSON)
Data Interchange Format".
Bonus: we now recover more nicely from unclosed strings. E.g.
{"one: 1}\n{"two": 2}
now recovers cleanly after the newline, where before the lexer
remained confused until the next unpaired double quote or lexical
error.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-19-armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-17-armbru@redhat.com>
RFC 8259 "The JavaScript Object Notation (JSON) Data Interchange
Format" requires control characters in strings to be escaped.
Demonstrate the JSON parser accepts U+0001 .. U+001F unescaped.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-16-armbru@redhat.com>
Some of utf8_string()'s test_cases[] contain multiple invalid
sequences. Testing that qobject_from_json() fails only tests we
reject at least one invalid sequence. That's incomplete.
Additionally test each non-space sequence in isolation.
This demonstrates that the JSON parser accepts invalid sequences
starting with \xC2..\xF4. Add a FIXME comment.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-15-armbru@redhat.com>
The previous commit made utf8_string()'s test_cases[].utf8_in
superfluous: we can use .json_in instead. Except for the case testing
U+0000. \x00 doesn't work in C strings, so it tests \\u0000 instead.
But testing \\uXXXX is escaped_string()'s job. It's covered there.
Test U+0001 here, and drop .utf8_in.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-14-armbru@redhat.com>
utf8_string() tests only double quoted strings. Cover single quoted
strings, too: store the strings to test without quotes, then wrap them
in either kind of quote.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-13-armbru@redhat.com>
simple_string() and single_quote_string() have become redundant with
escaped_string(), except for embedded single and double quotes.
Replace them by a test that covers just that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-12-armbru@redhat.com>
Cover escaped single quote, surrogates, invalid escapes, and
noncharacters. This demonstrates that valid surrogate pairs are
misinterpreted, and invalid surrogates and noncharacters aren't
rejected.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-11-armbru@redhat.com>
Merge a few closely related test strings, and drop a few redundant
ones.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-10-armbru@redhat.com>
escaped_string() first tests double quoted strings, then repeats a few
tests with single quotes. Repeat all of them: store the strings to
test without quotes, and wrap them in either kind of quote for
testing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-9-armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-5-armbru@redhat.com>
qobject_from_json() can return null without setting an error on
lexical errors. I call that a bug. Add test coverage to demonstrate
it.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-4-armbru@redhat.com>
qobject_from_json() & friends misbehave when the JSON text has more
than one JSON value. Add test coverage to demonstrate the bugs.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-3-armbru@redhat.com>
Commit ab45015a96 "qobject: Let qobject_from_jsonf() fail instead of
abort" fails to accomplish its stated aim: the function can still
abort due to its use of &error_abort.
Its rationale for letting it fail is that all remaining users cope
fine with failure. Well, they're just fine with aborting, too; it's
what they do on failure.
Simply reverting the broken commit would bring back the unfortunate
asymmetry between qobject_from_jsonf() and qobject_from_jsonv(): one
aborts, the other returns null. So also rename it to
qobject_from_jsonf_nofail().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180806065344.7103-7-armbru@redhat.com>
Now that we can safely call QOBJECT() on QObject * as well as its
subtypes, we can have macros qobject_ref() / qobject_unref() that work
everywhere instead of having to use QINCREF() / QDECREF() for QObject
and qobject_incref() / qobject_decref() for its subtypes.
The replacement is mechanical, except I broke a long line, and added a
cast in monitor_qmp_cleanup_req_queue_locked(). Unlike
qobject_decref(), qobject_unref() doesn't accept void *.
Note that the new macros evaluate their argument exactly once, thus no
need to shout them.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180419150145.24795-4-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Rebased, semantic conflict resolved, commit message improved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This patch was generated using the following Coccinelle script:
@@
expression Obj;
@@
(
- qobject_to_qnum(Obj)
+ qobject_to(QNum, Obj)
|
- qobject_to_qstring(Obj)
+ qobject_to(QString, Obj)
|
- qobject_to_qdict(Obj)
+ qobject_to(QDict, Obj)
|
- qobject_to_qlist(Obj)
+ qobject_to(QList, Obj)
|
- qobject_to_qbool(Obj)
+ qobject_to(QBool, Obj)
)
and a bit of manual fix-up for overly long lines and three places in
tests/check-qjson.c that Coccinelle did not find.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-Id: <20180224154033.29559-4-mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: swap order from qobject_to(o, X), rebase to master, also a fix
to latent false-positive compiler complaint about hw/i386/acpi-build.c]
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-14-armbru@redhat.com>
The macro expansions of qdict_put_TYPE() and qlist_append_TYPE() need
qbool.h, qnull.h, qnum.h and qstring.h to compile. We include qnull.h
and qnum.h in the headers, but not qbool.h and qstring.h. Works,
because we include those wherever the macros get used.
Open-coding these helpers is of dubious value. Turn them into
functions and drop the includes from the headers.
This cleanup makes the number of objects depending on qapi/qmp/qnum.h
from 4551 (out of 4743) to 46 in my "build everything" tree. For
qapi/qmp/qnull.h, the number drops from 4552 to 21.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-10-armbru@redhat.com>
qapi/qmp/types.h is a convenience header to include a number of
qapi/qmp/ headers. Since we rarely need all of the headers
qapi/qmp/types.h includes, we bypass it most of the time. Most of the
places that use it don't need all the headers, either.
Include the necessary headers directly, and drop qapi/qmp/types.h.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-9-armbru@redhat.com>
Make it more obvious about the expected return values.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170825105913.4060-7-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
compare_litqobj_to_qobj() lacks a qlit_ prefix. Moreover, "compare"
suggests -1, 0, +1 for less than, equal and greater than. The
function actually returns non-zero for equal, zero for unequal.
Rename to qlit_equal_qobject().
Its return type will be cleaned up in the next patch.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170825105913.4060-6-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Rename from LiteralQ to QLit.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170825105913.4060-4-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Fix code style issues while at it, to please checkpatch.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170825105913.4060-3-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Switch strtoll() usage to qemu_strtoi64() helper while at it.
Add a few tests for large numbers.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20170607163635.17635-11-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
We would like to use a same QObject type to represent numbers, whether
they are int, uint, or floats. Getters will allow some compatibility
between the various types if the number fits other representations.
Add a few more tests while at it.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20170607163635.17635-7-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[parse_stats_intervals() simplified a bit, comment in
test_visitor_in_int_overflow() tidied up, suppress bogus warnings]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Pass &error_abort with known-good input. Else pass &err and check
what comes back. This demonstrates that the parser fails silently for
many errors.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1488317230-26248-15-git-send-email-armbru@redhat.com>
The next few commits will put the errors to use where appropriate.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1488317230-26248-13-git-send-email-armbru@redhat.com>
qobject_to_qbool(obj) returns NULL when obj isn't a QBool. Check
that instead of qobject_type(obj) == QTYPE_QBOOL.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1487363905-9480-13-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
qobject_to_qfloat(obj) returns NULL when obj isn't a QFloat. Check
that instead of qobject_type(obj) == QTYPE_QFLOAT.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1487363905-9480-12-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
qobject_to_qint(obj) returns NULL when obj isn't a QInt. Check
that instead of qobject_type(obj) == QTYPE_QINT.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1487363905-9480-11-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
qobject_to_qstring(obj) returns NULL when obj isn't a QString. Check
that instead of qobject_type(obj) == QTYPE_QSTRING.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1487363905-9480-10-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Make compare_litqobj_to_qobj() cope with null, and drop non-null
assertions from callers.
compare_litqobj_to_qobj() already checks the QType matches; drop the
redundant assertions from callers.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1487363905-9480-5-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The qobject_from_jsonf() function implements a pseudo-printf
language for creating a QObject; however, it is hard-coded to
only parse a subset of formats understood by -Wformat, and is
not a straight synonym to bare printf(). In particular, any
use of an int64_t integer works only if the system's
definition of PRId64 matches what the parser expects; which
works on glibc (%lld or %ld depending on 32- vs. 64-bit) and
mingw (%I64d), but not on Mac OS (%qd). Rather than enhance
the parser, it is just as easy to force the use of int (where
the value is small enough) or long long instead of int64_t,
which we know always works.
This should cover all remaining testsuite uses of
qobject_from_json[fv]() that were trying to rely on PRId64,
although my proof for that was done by adding in asserts and
checking that 'make check' still passed, where such asserts
are inappropriate during hard freeze. A later series in 2.9
may remove all dynamic JSON parsing, but that's a bigger task.
Reported by: G 3 <programmingkidx@gmail.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1479922617-4400-4-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Rename value64 to value_ll]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
'qjson.h' is not a QObject subtype; include this file directly in
.c files that are using it, rather than abusing qmp/types.h for
that purpose.
Meanwhile, for files that include a list of individual QObject
subtypes, it's easier to just use qmp/types.h for that purpose.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1465490926-28625-2-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Remove glib.h includes, as it is provided by osdep.h.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
This would have prevented the regression mentioned in the previous
commit.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1448486613-17634-4-git-send-email-armbru@redhat.com>
We require a C99 compiler, so let's use 'bool' instead of 'int'
when dealing with boolean values. There are few enough clients
to fix them all in one pass.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
We document that in QMP, the client may send any json-value
for the optional "id" key, and then return that same value
on reply (both success and failures, insofar as the failure
happened after parsing the id). [Note that the output may
not be identical to the input, as whitespace may change and
since we may reorder keys within a json-object, but that this
still constitutes the same json-value]. However, we were not
handling the JSON literal null, which counts as a json-value
per RFC 7159.
Also, down the road, given the QAPI schema of {'*foo':'str'} or
{'*foo':'ComplexType'}, we could decide to allow the QMP client
to pass { "foo":null } instead of the current representation of
{ } where omitting the key is the only way to get at the default
NULL value. Such a change might be useful for argument
introspection (if a type in older qemu lacks 'foo' altogether,
then an explicit "foo":null probe will force an easily
distinguished error message for whether the optional "foo" key
is even understood in newer qemu). And if we add default values
to optional arguments, allowing an explicit null would be
required for getting a NULL value associated with an optional
string that has a non-null default. But all that can come at a
later day.
The 'check-unit' testsuite is enhanced to test that parsing
produces the same object as explicitly requesting a reference
to the special qnull object. In addition, I tested with:
$ ./x86_64-softmmu/qemu-system-x86_64 -qmp stdio -nodefaults
{"QMP": {"version": {"qemu": {"micro": 91, "minor": 2, "major": 2}, "package": ""}, "capabilities": []}}
{"execute":"qmp_capabilities","id":null}
{"return": {}, "id": null}
{"id":{"a":null,"b":[1,null]},"execute":"quit"}
{"return": {}, "id": {"a": null, "b": [1, null]}}
{"timestamp": {"seconds": 1427742379, "microseconds": 423128}, "event": "SHUTDOWN"}
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
This made the lexer wait for a closing *double* quote.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Known bugs in to_json():
* A start byte for a three-byte sequence followed by less than two
continuation bytes is split into one-byte sequences.
* Start bytes for sequences longer than three bytes get misinterpreted
as start bytes for three-byte sequences. Continuation bytes beyond
byte three become one-byte sequences.
This means all characters outside the BMP are decoded incorrectly.
* One-byte sequences with the MSB are put into the JSON string
verbatim when char is unsigned, producing invalid UTF-8. When char
is signed, they're replaced by "\\uFFFF" instead.
This includes \xFE, \xFF, and stray continuation bytes.
* Overlong sequences are happily accepted, unless screwed up by the
bugs above.
* Likewise, sequences encoding surrogate code points or noncharacters.
* Unlike other control characters, ASCII DEL is not escaped. Except
in overlong encodings.
My rewrite fixes them as follows:
* Malformed UTF-8 sequences are replaced.
Except the overlong encoding \xC0\x80 of U+0000 is still accepted.
Permits embedding NUL characters in C strings. This trick is known
as "Modified UTF-8".
* Sequences encoding code points beyond Unicode range are replaced.
* Sequences encoding code points beyond the BMP produce a surrogate
pair.
* Sequences encoding surrogate code points are replaced.
* Sequences encoding noncharacters are replaced.
* ASCII DEL is now always escaped.
The replacement character is U+FFFD.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Test cases cover the two noncharacters in the BMP. Add tests for the
other 64 noncharacters.
Three existing test cases involve noncharacters U+FFFF and U+10FFFF.
Instead of deleting them as now duplicates, adjust them to use U+FFFC
and U+10FFFFD.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Test cases are scraped from Markus Kuhn's UTF-8 decoder capability and
stress test at
http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
Unfortunately, both JSON parser and formatter misbehave right now.
This test expects current, incorrect results. They're all clearly
marked, and are to be replaced by correct ones as the bugs get fixed.
See comments in new utf8_string() for details.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
This introduces new test reporting infrastructure based on
gtester and gtester-report.
Also, all existing tests are moved to tests/, and tests/Makefile
is reorganized to factor out the commonalities in the rules.
Signed-off-by: Anthony Liguori <aliguori@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>