Commit Graph

156 Commits

Author SHA1 Message Date
Markus Armbruster
86cdf9ec8d json: Clean up headers
The JSON parser has three public headers, json-lexer.h, json-parser.h,
json-streamer.h.  They all contain stuff that is of no interest
outside qobject/json-*.c.

Collect the public interface in include/qapi/qmp/json-parser.h, and
everything else in qobject/json-parser-int.h.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-54-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
812ce33ead qobject: Drop superfluous includes of qemu-common.h
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-53-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
abe7c2067c json: Make JSONToken opaque outside json-parser.c
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-52-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
a2731e08ee json: Unbox tokens queue in JSONMessageParser
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-51-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
8d3265b3d0 json: Streamline json_message_process_token()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-50-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
da09cfbf9d json: Enforce token count and size limits more tightly
Token count and size limits exist to guard against excessive heap
usage.  We check them only after we created the token on the heap.
That's assigning a cowboy to the barn to lasso the horse after it has
bolted.  Close the barn door instead: check before we create the
token.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-49-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
dd98e84819 qjson: Have qobject_from_json() & friends reject empty and blank
The last case where qobject_from_json() & friends return null without
setting an error is empty or blank input.  Callers:

* block.c's parse_json_protocol() reports "Could not parse the JSON
  options".  It's marked as a work-around, because it also covered
  actual bugs, but they got fixed in the previous few commits.

* qobject_input_visitor_new_str() reports "JSON parse error".  Also
  marked as work-around.  The recent fixes have made this unreachable,
  because it currently gets called only for input starting with '{'.

* check-qjson.c's empty_input() and blank_input() demonstrate the
  behavior.

* The other callers are not affected since they only pass input with
  exactly one JSON value or, in the case of negative tests, one error.

Fail with "Expecting a JSON value" instead of returning null, and
simplify callers.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-48-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
5d50113cf6 json: Assert json_parser_parse() consumes all tokens on success
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-47-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
f9277915ee json: Fix streamer not to ignore trailing unterminated structures
json_message_process_token() accumulates tokens until it got the
sequence of tokens that comprise a single JSON value (it counts curly
braces and square brackets to decide).  It feeds those token sequences
to json_parser_parse().  If a non-empty sequence of tokens remains at
the end of the parse, it's silently ignored.  check-qjson.c cases
unterminated_array(), unterminated_array_comma(), unterminated_dict(),
unterminated_dict_comma() demonstrate this bug.

Fix as follows.  Introduce a JSON_END_OF_INPUT token.  When the
streamer receives it, it feeds the accumulated tokens to
json_parser_parse().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-46-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
e06d008ac8 json: Fix latent parser aborts at end of input
json-parser.c carefully reports end of input like this:

    token = parser_context_pop_token(ctxt);
    if (token == NULL) {
        parse_error(ctxt, NULL, "premature EOI");
        goto out;
    }

Except parser_context_pop_token() can't return null, it fails its
assertion instead.  Same for parser_context_peek_token().  Broken in
commit 65c0f1e955, and faithfully preserved in commit 95385fe9ac.
Only a latent bug, because the streamer throws away any input that
could trigger it.

Drop the assertions, so we can fix the streamer in the next commit.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-45-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
2a4794ba14 qjson: Fix qobject_from_json() & friends for multiple values
qobject_from_json() & friends use the consume_json() callback to
receive either a value or an error from the parser.

When they are fed a string that contains more than either one JSON
value or one JSON syntax error, consume_json() gets called multiple
times.

When the last call receives a value, qobject_from_json() returns that
value.  Any other values are leaked.

When any call receives an error, qobject_from_json() sets the first
error received.  Any other errors are thrown away.

When values follow errors, qobject_from_json() returns both a value
and sets an error.  That's bad.  Impact:

* block.c's parse_json_protocol() ignores and leaks the value.  It's
  used to to parse pseudo-filenames starting with "json:".  The
  pseudo-filenames can come from the user or from image meta-data such
  as a QCOW2 image's backing file name.

* vl.c's parse_display_qapi() ignores and leaks the error.  It's used
  to parse the argument of command line option -display.

* vl.c's main() case QEMU_OPTION_blockdev ignores the error and leaves
  it in @err.  main() will then pass a pointer to a non-null Error *
  to net_init_clients(), which is forbidden.  It can lead to assertion
  failure or other misbehavior.

* check-qjson.c's multiple_values() demonstrates the badness.

* The other callers are not affected since they only pass strings with
  exactly one JSON value or, in the case of negative tests, one
  error.

The impact on the _nofail() functions is relatively harmless.  They
abort when any call receives an error.  Else they return the last
value, and leak the others, if any.

Fix consume_json() as follows.  On the first call, save value and
error as before.  On subsequent calls, if any, don't save them.  If
the first call saved a value, the next call, if any, replaces the
value by an "Expecting at most one JSON value" error.  Take care not
to leak values or errors that aren't saved.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-44-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
4d40066142 json: Improve names of lexer states related to numbers
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-43-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
53a0d616fe json: Replace %I64d, %I64u by %PRId64, %PRIu64
Support for %I64d got added in commit 2c0d4b36e7 "json: fix PRId64 on
Win32".  We had to hard-code I64d because we used the lexer's finite
state machine to check interpolations.  No more, so clean this up.

Additional conversion specifications would be easy enough to implement
when needed.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-42-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
f7617d45d4 json: Leave rejecting invalid interpolation to parser
Both lexer and parser reject invalid interpolation specifications.
The parser's check is useless.

The lexer ends the token right after the first bad character.  This
tends to lead to suboptimal error reporting.  For instance, input

    [ %04d ]

produces the tokens

    JSON_LSQUARE  [
    JSON_ERROR    %0
    JSON_INTEGER  4
    JSON_KEYWORD  d
    JSON_RSQUARE  ]

The parser then yields an error, an object and two more errors:

    error: Invalid JSON syntax
    object: 4
    error: JSON parse error, invalid keyword
    error: JSON parse error, expecting value

Dumb down the lexer to accept [A-Za-z0-9]*.  The parser's check is now
used.  Emit a proper error there.

The lexer now produces

    JSON_LSQUARE  [
    JSON_INTERP   %04d
    JSON_RSQUARE  ]

and the parser reports just

    JSON parse error, invalid interpolation '%04d'

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-41-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
84a56f38b2 json: Pass lexical errors and limit violations to callback
The callback to consume JSON values takes QObject *json, Error *err.
If both are null, the callback is supposed to make up an error by
itself.  This sucks.

qjson.c's consume_json() neglects to do so, which makes
qobject_from_json() null instead of failing.  I consider that a bug.

The culprit is json_message_process_token(): it passes two null
pointers when it runs into a lexical error or a limit violation.  Fix
it to pass a proper Error object then.  Update the callbacks:

* monitor.c's handle_qmp_command(): the code to make up an error is
  now dead, drop it.

* qga/main.c's process_event(): lumps the "both null" case together
  with the "not a JSON object" case.  The former is now gone.  The
  error message "Invalid JSON syntax" is misleading for the latter.
  Improve it to "Input must be a JSON object".

* qobject/qjson.c's consume_json(): no update; check-qjson
  demonstrates qobject_from_json() now sets an error on lexical
  errors, but still doesn't on some other errors.

* tests/libqtest.c's qmp_response(): the Error object is now reliable,
  so use it to improve the error message.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-40-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
2cbd15aa6f json: Treat unwanted interpolation as lexical error
The JSON parser optionally supports interpolation.  The lexer
recognizes interpolation tokens unconditionally.  The parser rejects
them when interpolation is disabled, in parse_interpolation().
However, it neglects to set an error then, which can make
json_parser_parse() fail without setting an error.

Move the check for unwanted interpolation from the parser's
parse_interpolation() into the lexer's finite state machine.  When
interpolation is disabled, '%' is now handled like any other
unexpected character.

The next commit will improve how such lexical errors are handled.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-39-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
61030280ca json: Rename token JSON_ESCAPE & friends to JSON_INTERP
The JSON parser optionally supports interpolation.  The code calls it
"escape".  Awkward, because it uses the same term for escape sequences
within strings.  The latter usage is consistent with RFC 8259 "The
JavaScript Object Notation (JSON) Data Interchange Format" and ISO C.
Call the former "interpolation" instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-38-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
269e57ae28 json: Don't create JSON_ERROR tokens that won't be used
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-37-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
ff281a272f json: Don't pass null @tokens to json_parser_parse()
json_parser_parse() normally returns the QObject on success.  Except
it returns null when its @tokens argument is null.

Its only caller json_message_process_token() passes null @tokens when
emitting a lexical error.  The call is a rather opaque way to say json
= NULL then.

Simplify matters by lifting the assignment to json out of the emit
path: initialize json to null, set it to the value of
json_parser_parse() when there's no lexical error.  Drop the special
case from json_parser_parse().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-36-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
62815d85ae json: Redesign the callback to consume JSON values
The classical way to structure parser and lexer is to have the client
call the parser to get an abstract syntax tree, the parser call the
lexer to get the next token, and the lexer call some function to get
input characters.

Another way to structure them would be to have the client feed
characters to the lexer, the lexer feed tokens to the parser, and the
parser feed abstract syntax trees to some callback provided by the
client.  This way is more easily integrated into an event loop that
dispatches input characters as they arrive.

Our JSON parser is kind of between the two.  The lexer feeds tokens to
a "streamer" instead of a real parser.  The streamer accumulates
tokens until it got the sequence of tokens that comprise a single JSON
value (it counts curly braces and square brackets to decide).  It
feeds those token sequences to a callback provided by the client.  The
callback passes each token sequence to the parser, and gets back an
abstract syntax tree.

I figure it was done that way to make a straightforward recursive
descent parser possible.  "Get next token" becomes "pop the first
token off the token sequence".  Drawback: we need to store a complete
token sequence.  Each token eats 13 + input characters + malloc
overhead bytes.

Observations:

1. This is not the only way to use recursive descent.  If we replaced
   "get next token" by a coroutine yield, we could do without a
   streamer.

2. The lexer reports errors by passing a JSON_ERROR token to the
   streamer.  This communicates the offending input characters and
   their location, but no more.

3. The streamer reports errors by passing a null token sequence to the
   callback.  The (already poor) lexical error information is thrown
   away.

4. Having the callback receive a token sequence duplicates the code to
   convert token sequence to abstract syntax tree in every callback.

5. Known bug: the streamer silently drops incomplete token sequences.

This commit rectifies 4. by lifting the call of the parser from the
callbacks into the streamer.  Later commits will address 3. and 5.

The lifting removes a bug from qjson.c's parse_json(): it passed a
pointer to a non-null Error * in certain cases, as demonstrated by
check-qjson.c.

json_parser_parse() is now unused.  It's a stupid wrapper around
json_parser_parse_err().  Drop it, and rename json_parser_parse_err()
to json_parser_parse().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-35-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
037f244088 json: Have lexer call streamer directly
json_lexer_init() takes the function to process a token as an
argument.  It's always json_message_process_token().  Makes the code
harder to understand for no actual gain.  Drop the indirection.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-34-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Marc-André Lureau
e8b19d7d73 json-parser: simplify and avoid JSONParserContext allocation
parser_context_new/free() are only used from json_parser_parse(). We
can fold the code there and avoid an allocation altogether.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180719184111.5129-9-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180823164025.12553-33-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Marc-André Lureau
7c1e1d5481 json: remove useless return value from lexer/parser
The lexer always returns 0 when char feeding. Furthermore, none of the
caller care about the return value.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180326150916.9602-10-marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180823164025.12553-32-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
dc45a07c36 json: Fix \uXXXX for surrogate pairs
The JSON parser treats each half of a surrogate pair as unpaired
surrogate.  Fix it to recognize surrogate pairs.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-30-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
46a628b139 json: Reject invalid \uXXXX, fix \u0000
The JSON parser translates invalid \uXXXX to garbage instead of
rejecting it, and swallows \u0000.

Fix by using mod_utf8_encode() instead of flawed wchar_to_utf8().

Valid surrogate pairs are now differently broken: they're rejected
instead of translated to garbage.  The next commit will fix them.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-29-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
de6decfe8e json: Simplify parse_string()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-28-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
b2da4a4d75 json: Leave rejecting invalid escape sequences to parser
Both lexer and parser reject invalid escape sequences in strings.  The
parser's check is useless.

The lexer ends the token right after the first non-well-formed byte.
This tends to lead to suboptimal error reporting.  For instance, input

    {"abc\@ijk": 1}

produces the tokens

    JSON_LCURLY   {
    JSON_ERROR    "abc\@
    JSON_KEYWORD  ijk
    JSON_ERROR   ": 1}\n

The parser then reports three errors

    Invalid JSON syntax
    JSON parse error, invalid keyword 'ijk'
    Invalid JSON syntax

before it recovers at the newline.

Drop the lexer's escape sequence checking, and make it accept the same
characters after backslash it accepts elsewhere in strings.  It now
produces

    JSON_LCURLY   {
    JSON_STRING   "abc\@ijk"
    JSON_COLON    :
    JSON_INTEGER  1
    JSON_RCURLY

and the parser reports just

    JSON parse error, invalid escape sequence in string

While there, fix parse_string()'s inaccurate function comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-27-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
4b1c0cd7c7 json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8")
Since the JSON grammer doesn't accept U+0000 anywhere, this merely
exchanges one kind of parse error for another.  It's purely for
consistency with qobject_to_json(), which accepts \xC0\x80 (see commit
e2ec3f9768).

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-26-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
de930f45cb json: Leave rejecting invalid UTF-8 to parser
Both the lexer and the parser (attempt to) validate UTF-8 in JSON
strings.

The lexer rejects bytes that can't occur in valid UTF-8: \xC0..\xC1,
\xF5..\xFF.  This rejects some, but not all invalid UTF-8.  It also
rejects ASCII control characters \x00..\x1F, in accordance with RFC
8259 (see recent commit "json: Reject unescaped control characters").

When the lexer rejects, it ends the token right after the first bad
byte.  Good when the bad byte is a newline.  Not so good when it's
something like an overlong sequence in the middle of a string.  For
instance, input

    {"abc\xC0\xAFijk": 1}\n

produces the tokens

    JSON_LCURLY   {
    JSON_ERROR    "abc\xC0
    JSON_ERROR    \xAF
    JSON_KEYWORD  ijk
    JSON_ERROR   ": 1}\n

The parser then reports four errors

    Invalid JSON syntax
    Invalid JSON syntax
    JSON parse error, invalid keyword 'ijk'
    Invalid JSON syntax

before it recovers at the newline.

The commit before previous made the parser reject invalid UTF-8
sequences.  Since then, anything the lexer rejects, the parser would
reject as well.  Thus, the lexer's rejecting is unnecessary for
correctness, and harmful for error reporting.

However, we want to keep rejecting ASCII control characters in the
lexer, because that produces the behavior we want for unclosed
strings.

We also need to keep rejecting \xFF in the lexer, because we
documented that as a way to reset the JSON parser
(docs/interop/qmp-spec.txt section 2.6 QGA Synchronization), which
means we can't change how we recover from this error now.  I wish we
hadn't done that.

I think we should treat \xFE the same as \xFF.

Change the lexer to accept \xC0..\xC1 and \xF5..\xFD.  It now rejects
only \x00..\x1F and \xFE..\xFF.  Error reporting for invalid UTF-8 in
strings is much improved, except for \xFE and \xFF.  For the example
above, the lexer now produces

    JSON_LCURLY   {
    JSON_STRING   "abc\xC0\xAFijk"
    JSON_COLON    :
    JSON_INTEGER  1
    JSON_RCURLY

and the parser reports just

    JSON parse error, invalid UTF-8 sequence in string

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-25-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
574bf16ff1 json: Report first rather than last parse error
Quiz time!  When a parser reports multiple errors, but the user gets
to see just one, which one is (on average) the least useful one?

Yes, you're right, it's the last one!  You're clearly familiar with
compilers.

Which one does QEMU report?

Right again, the last one!  You're clearly familiar with QEMU.

Reproducer: feeding

    {"abc\xC2ijk": 1}\n

to QMP produces

    {"error": {"class": "GenericError", "desc": "JSON parse error, key is not a string in object"}}

Report the first error instead.  The reproducer now produces

    {"error": {"class": "GenericError", "desc": "JSON parse error, invalid UTF-8 sequence in string"}}

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-24-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
e59f39d403 json: Reject invalid UTF-8 sequences
We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
\xF5..\xFF in the lexer.  That's insufficient; there's plenty of
invalid UTF-8 not containing these bytes, as demonstrated by
check-qjson:

* Malformed sequences

  - Unexpected continuation bytes

  - Missing continuation bytes after start bytes other than
    \xC0..\xC1, \xF5..\xFD.

* Overlong sequences with start bytes other than \xC0..\xC1,
  \xF5..\xFD.

* Invalid code points

Fixing this in the lexer would be bothersome.  Fixing it in the parser
is straightforward, so do that.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-23-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
00ea57fadc json: Tighten and simplify qstring_from_escaped_str()'s loop
Simplify loop control, and assert that the string ends with the
appropriate quote (the lexer ensures it does).

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-21-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
eddc0a7f0a json: Revamp lexer documentation
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-20-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
340db1ed82 json: Reject unescaped control characters
Fix the lexer to reject unescaped control characters in JSON strings,
in accordance with RFC 8259 "The JavaScript Object Notation (JSON)
Data Interchange Format".

Bonus: we now recover more nicely from unclosed strings.  E.g.

    {"one: 1}\n{"two": 2}

now recovers cleanly after the newline, where before the lexer
remained confused until the next unpaired double quote or lexical
error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-19-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Markus Armbruster
a2ec6be72b json: Fix lexer to include the bad character in JSON_ERROR token
json_lexer[] maps (lexer state, input character) to the new lexer
state.  The input character is consumed unless the new state is
terminal and the input character doesn't belong to this token,
i.e. the state transition uses look-ahead.  When this is the case,
input character '\0' would result in the same state transition.
TERMINAL_NEEDED_LOOKAHEAD() exploits this.

Except this is wrong for transitions to IN_ERROR.  There, the
offending input character is in fact consumed: case IN_ERROR returns.
It isn't added to the JSON_ERROR token, though.

Fix that by making TERMINAL_NEEDED_LOOKAHEAD() return false for
transitions to IN_ERROR.

There's a slight complication.  json_lexer_flush() passes input
character '\0' to flush an incomplete token.  If this results in
JSON_ERROR, we'd now add the '\0' to the token.  Suppress that.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-18-armbru@redhat.com>
2018-08-24 20:26:37 +02:00
Peter Maydell
c542a9f979 Testing patches for 2018-08-16
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbdTcjAAoJEDhwtADrkYZTZEYP/ivp0ozEfMeGgc6PFItv3zmx
 QVD+NYJ8bnv/iEoWl/pnQ0/HY3YLHz4G1DTK0dSlJAvAiChpPiR7YCeJRXeTyLHL
 9KCFQV5SV9llstVi0f4ebEK21mUkYWoqtlzxxyqXh0q2N/QLtaVQ85ysE6ufwhNH
 jlunmJLGRRwPR95F4a05uVHNOym1ig9eo5CtQ1Fa8viV9BgWTbpSp1t4feB1OLnt
 Ml9cbFubb1cA7CuhdNHazNOnRZtEW5A9eOo6rX4d5JcH/zgFWdPpKCRn/X/NdvSE
 aRKqk7ll0gxYlacqVpkea23pVKVl7e1oUqkziaL8rq/BYE0SePkRv+SnmsifD8uT
 kWl+eHLyaW1g43omc0uttyAuTkFnvAa+l9TqIrdEYcPJJNaCsZVgJpDvj9+Oxril
 fk3OIHAnzSWwp/AmFLCSOYdaoVuZhppp6rqnu26B0w9Rxkbqe1790LbjDJrLUB+2
 vN7+JmDhUfJk7/2pi+MGZrBtj3zcgbb3Qc5+NG8H1401bA/n8FNnPKgWdmAlmO7i
 pTafa1FXArJGWiBhzg2PUqmZq45MQiheQ1+SWgviIodQX5oHB3kEimcRPg4Wk18c
 fTKJDe7w8NFFNjuH6ou2LI4KzgQeewW+oCjxh2A7kwCqDmq5Eq8nBw/bYO1DgcDr
 bfCnicNJinjCHcgvvCVM
 =DuZ8
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/armbru/tags/pull-tests-2018-08-16' into staging

Testing patches for 2018-08-16

# gpg: Signature made Thu 16 Aug 2018 09:34:43 BST
# gpg:                using RSA key 3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-tests-2018-08-16: (25 commits)
  libqtest: Improve error reporting for bad read from QEMU
  tests/libqtest: Improve kill_qemu()
  libqtest: Rename qtest_FOOv() to qtest_vFOO() for consistency
  libqtest: Replace qtest_startf() by qtest_initf()
  libqtest: Enable compile-time format string checking
  migration-test: Clean up string interpolation into QMP, part 3
  migration-test: Clean up string interpolation into QMP, part 2
  migration-test: Clean up string interpolation into QMP, part 1
  migration-test: Make wait_command() cope with '%'
  tests: New helper qtest_qmp_receive_success()
  migration-test: Make wait_command() return the "return" member
  tests: Clean up string interpolation around qtest_qmp_device_add()
  cpu-plug-test: Don't pass integers as strings to device_add
  tests: Clean up string interpolation into QMP input (simple cases)
  tests: Pass literal format strings directly to qmp_FOO()
  qobject: qobject_from_jsonv() is dangerous, hide it away
  test-qobject-input-visitor: Avoid format string ambiguity
  libqtest: Simplify qmp_fd_vsend() a bit
  qobject: New qobject_from_vjsonf_nofail(), qdict_from_vjsonf_nofail()
  qobject: Replace qobject_from_jsonf() by qobject_from_jsonf_nofail()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-08-16 09:50:54 +01:00
Markus Armbruster
2d36e84304 qobject: qobject_from_jsonv() is dangerous, hide it away
qobject_from_jsonv() takes ownership of %p arguments.  On failure, we
can't generally know whether we failed before or after %p, so
ownership becomes indeterminate.  To avoid leaks, callers passing %p
must terminate on error, e.g. by passing &error_abort.  Trap for the
unwary; document and give the function internal linkage.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180806065344.7103-11-armbru@redhat.com>
2018-08-16 08:42:06 +02:00
Markus Armbruster
4ff184689b qobject: New qobject_from_vjsonf_nofail(), qdict_from_vjsonf_nofail()
Every printf()-like function sooner or later needs its vprintf()-like
buddy.  The next commit will need qobject_from_jsonf_nofail()'s buddy,
and qdict_from_jsonf_nofail()'s buddy will be used later in this
series.  Add both.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180806065344.7103-8-armbru@redhat.com>
2018-08-16 08:42:06 +02:00
Markus Armbruster
6ce80fd803 qobject: Replace qobject_from_jsonf() by qobject_from_jsonf_nofail()
Commit ab45015a96 "qobject: Let qobject_from_jsonf() fail instead of
abort" fails to accomplish its stated aim: the function can still
abort due to its use of &error_abort.

Its rationale for letting it fail is that all remaining users cope
fine with failure.  Well, they're just fine with aborting, too; it's
what they do on failure.

Simply reverting the broken commit would bring back the unfortunate
asymmetry between qobject_from_jsonf() and qobject_from_jsonv(): one
aborts, the other returns null.  So also rename it to
qobject_from_jsonf_nofail().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180806065344.7103-7-armbru@redhat.com>
2018-08-16 08:42:06 +02:00
Alberto Garcia
655b4b67e3 qdict: Make qdict_extract_subqdict() accept dst = NULL
This function extracts all options from a QDict starting with a
certain prefix and puts them in a new QDict.

We'll have a couple of cases where we simply want to discard those
options instead of copying them, and that's what this patch does.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-08-15 12:50:39 +02:00
Markus Armbruster
ba891d68b4 qstring: Move qstring_from_substr()'s @end one to the right
qstring_from_substr() takes the index of the substring's first and
last character.  qstring_from_substr(s, 0, SIZE_MAX) denotes an empty
substring.  Awkward.

Shift the end index one to the right.  This simplifies both
qstring_from_substr() and its callers.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180727062204.10401-3-armbru@redhat.com>
2018-07-28 09:09:58 +02:00
Markus Armbruster
b65ab77b3a qstring: Assert size calculations don't overflow
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180727062204.10401-2-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2018-07-28 09:09:58 +02:00
liujunjie
ad63c549ec qstring: Fix qstring_from_substr() not to provoke int overflow
qstring_from_substr() parameters @start and @end are of type int.
blkdebug_parse_filename(), blkverify_parse_filename(), nbd_parse_uri(),
and qstring_from_str() pass @end values of type size_t or ptrdiff_t.
Values exceeding INT_MAX get truncated, with possibly disastrous
results.

Such huge substrings seem unlikely, but we found one in a core dump,
where "info tlb" executed via QMP's human-monitor-command apparently
produced 35 GiB of output.

Fix by changing the parameters size_t.

Signed-off-by: liujunjie <liujunjie23@huawei.com>
Message-Id: <20180724134339.17832-1-liujunjie23@huawei.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-07-28 09:09:58 +02:00
Markus Armbruster
ab45015a96 qobject: Let qobject_from_jsonf() fail instead of abort
qobject_from_jsonf() aborts on error, unlike qobject_from_jsonv(),
which returns null.  Since all remaining users of qobject_from_jsonf()
cope fine with null, change it to return null.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180703085358.13941-30-armbru@redhat.com>
2018-07-03 23:18:56 +02:00
Markus Armbruster
a193352ff9 qobject: New qdict_from_jsonf_nofail()
Many uses of qobject_from_jsonf() convert JSON objects.  Create new
convenience function qdict_from_jsonf_nofail() that includes the
conversion to QDict.  The next few commits will put it to use.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180703085358.13941-22-armbru@redhat.com>
2018-07-03 23:18:56 +02:00
Markus Armbruster
17e9aa3f22 block-qdict: Pacify Coverity after commit f1b34a248e
Commit f1b34a248e replaced less-than-obvious test in
qdict_flatten_qdict() by the obvious one.  Sadly, it made something
else non-obvious: the fact that @new_key passed to qdict_put_obj()
can't be null, because that depends on the function's precondition
(target == qdict) == !prefix.

Tweak the function some more to help Coverity and human readers alike.

Fixes: CID 1393620
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-29 14:20:56 +02:00
Max Reitz
bf6e6a37ee qdict: Make qdict_flatten() shallow-clone-friendly
In its current form, qdict_flatten() removes all entries from nested
QDicts that are moved to the root QDict.  It is completely sufficient to
remove all old entries from the root QDict, however.  If the nested
dicts have a refcount of 1, this will automatically delete them, too.
And if they have a greater refcount, we probably do not want to modify
them in the first place.

The latter observation means that it was currently (in general)
impossible to qdict_flatten() a shallowly cloned dict because that would
empty nested QDicts in the original dict as well.  This patch changes
this, so you can now use qdict_flatten(qdict_shallow_clone(dict)) to get
a flattened copy without disturbing the original.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20180611205203.2624-7-mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-06-22 16:33:46 +02:00
Markus Armbruster
2860b2b2cb block: Fix -blockdev / blockdev-add for empty objects and arrays
-blockdev and blockdev-add silently ignore empty objects and arrays in
their argument.  That's because qmp_blockdev_add() converts the
argument to a flat QDict, and qdict_flatten() eats empty QDict and
QList members.  For instance, we ignore an empty BlockdevOptions
member @cache.  No real harm, as absent means the same as empty there.

Thus, the flaw puts an artificial restriction on the QAPI schema: we
can't have potentially empty objects and arrays within
BlockdevOptions, except when they're optional and "empty" has the same
meaning as "absent".

Our QAPI schema satisfies this restriction (I checked), but it's a
trap for the unwary, and a temptation to employ awkward workarounds
for the wary.  Let's get rid of it.

Change qdict_flatten() and qdict_crumple() to treat empty dictionaries
and lists exactly like scalars.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
c78b8cfbfd block-qdict: Simplify qdict_is_list() some
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00
Markus Armbruster
3692b5d768 block-qdict: Clean up qdict_crumple() a bit
When you mix scalar and non-scalar keys, whether you get an "already
set as scalar" or an "already set as dict" error depends on qdict
iteration order.  Neither message makes much sense.  Replace by
""Cannot mix scalar and non-scalar keys".  This is similar to the
message we get for mixing list and non-list keys.

I find qdict_crumple()'s first loop hard to understand.  Rearrange it
and add a comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-15 14:49:44 +02:00