Currently when we reach an error state we effectively flush everything
fed to the lexer, which can put us in a state where we keep feeding
tokens into the parser at arbitrary offsets in the stream. This makes it
difficult for the lexer/tokenizer/parser to get back in sync when bad
input is made by the client.
With these changes we emit an error state/token up to the tokenizer as
soon as we reach an error state, and continue processing any data passed
in rather than bailing out. The reset token will be used to reset the
tokenizer and parser, such that they'll recover state as soon as the
lexer begins generating valid token sequences again.
We also map chr(192,193,245-255) to an error state here, since they are
invalid UTF-8 characters. QMP guest proxy/agent will use chr(255) to
force a flush/reset of previous input for reliable delivery of certain
events, so also we document that thoroughly here.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently we flush the lexer by passing in a NULL character. This
generally forces the lexer to go to the corresponding TERMINAL() state
for whatever token type it is currently parsing, emits the token to the
parser, then puts the lexer back into IN_START state. However, since a
NULL character causes char_consumed to be 0, we always do a second pass
after this, which puts us in the IN_ERROR state. Fix this behavior by
adding a "flush" flag that tells the lexer not to do a more than 1
iteration.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The name ERROR is too generic, it conflicts with mingw32 ERROR definition.
Replace ERROR with IN_ERROR.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Not requiring one extra character when lookahead is not necessary
ensures that clients behave properly even if they, for example,
send QMP requests without a trailing newline.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
OK we are fooled by the json lexer and parser. As we use %I64d to
print 'long long' variables in Win32, but lexer and parser only deal
with %lld but not %I64d, this patch add support for %I64d and solve
'info pci', 'powser_reset' and 'power_powerdown' assert failure in
Win32.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Our JSON parser is a three stage parser. The first stage tokenizes the stream
into a set of lexical tokens. Since the lexical grammar is regular, we can
use a finite state machine to model it. The state machine will emit tokens
as they are identified.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>