Add internal core-developer-only documentation of the JSONB format.
FossilOrigin-Name: 4d30478863b2a60512010de9ec6e3099bfaf75d4afee20acec536713fe94334d
This commit is contained in:
parent
0d7f0e49a4
commit
c2eff91bd3
290
doc/jsonb.md
Normal file
290
doc/jsonb.md
Normal file
@ -0,0 +1,290 @@
|
||||
# The JSONB Format
|
||||
|
||||
This document describes SQLite's JSONB binary encoding of
|
||||
JSON.
|
||||
|
||||
## 1.0 What Is JSONB?
|
||||
|
||||
Beginning with version 3.45.0 (circa 2024-01-01), SQLite supports an
|
||||
alternative binary encoding of JSON which we call "JSONB". JSONB is
|
||||
a binary format that stored as a BLOB.
|
||||
|
||||
The advantage of JSONB over ordinary text RFC 8259 JSON is that JSONB
|
||||
is both slightly smaller (by between 5% and 10% in most cases) and
|
||||
can be processed in less than half the number of CPU cycles. The built-in
|
||||
[JSON SQL functions] of SQLite can accept either ordinary text JSON
|
||||
or the binary JSONB encoding for any of their JSON inputs.
|
||||
|
||||
The "JSONB" name is inspired by [PostgreSQL](https://postgresql.org), but the
|
||||
on-disk format for SQLite's JSONB is not the same as PostgreSQL's.
|
||||
The two formats have the same name, but they have wildly different internal
|
||||
representations and are not in any way binary compatible.
|
||||
|
||||
The central idea behind this JSONB specification is that each element
|
||||
begins with a header that includes the size and type of that element.
|
||||
The header takes the place of punctuation such as double-quotes,
|
||||
curly-brackes, square-brackets, commas, and colons. Since the size
|
||||
and type of each element is contained in its header, the element can
|
||||
be read faster since it is no longer necessary to carefully scan forward
|
||||
looking for the closing delimiter. The payload of JSONB is the same
|
||||
as for corresponding text JSON. The same payload bytes occur in the
|
||||
same order. The only real difference between JSONB and ordinary text
|
||||
JSON is that JSONB includes a binary header on
|
||||
each element and omits delimiter and separator punctuation.
|
||||
|
||||
### 1.1 Internal Use Only
|
||||
|
||||
The details of the JSONB are not intended to be visible to application
|
||||
developers. Application developers should look at JSONB as an opaque BLOB
|
||||
used internally by SQLite. Nevertheless, we want the format to be backwards
|
||||
compatible across all future versions of SQLite. To that end, the format
|
||||
is documented by this file in the source tree. But this file should be
|
||||
used only by SQLite core developers, not by developers of applications
|
||||
that only use SQLite.
|
||||
|
||||
## 2.0 The Purpose Of This Document
|
||||
|
||||
JSONB is not intended as an external format to be used by
|
||||
applications. JSONB is designed for internal use by SQLite only.
|
||||
Programmers do not need to understand the JSONB format in order to
|
||||
use it effectively.
|
||||
Applications should access JSONB only through the [JSON SQL functions],
|
||||
not by looking at individual bytes of the BLOB.
|
||||
|
||||
However, JSONB is intended to be portable and backwards compatible
|
||||
for all future versions of SQLite. In other words, you should not have
|
||||
to export and reimport your SQLite database files when you upgrade to
|
||||
a newer SQLite version. For that reason, the JSONB format needs to
|
||||
be well-defined.
|
||||
|
||||
This document is therefore similar in purpose to the
|
||||
[SQLite database file format] document that describes the on-disk
|
||||
format of an SQLite database file. Applications are not expected
|
||||
to directly read and write the bits and bytes of SQLite database files.
|
||||
The SQLite database file format is carefully documented so that it
|
||||
can be stable and enduring. In the same way, the JSONB representation
|
||||
of JSON is documented here so that it too can be stable and enduring,
|
||||
not so that applications can read or writes individual bytes.
|
||||
|
||||
## 3.0 Encoding
|
||||
|
||||
JSONB is a direct translation of the underlying text JSON. The difference
|
||||
is that JSONB uses a binary encoding that is faster to parse compared to
|
||||
the detailed syntax of text JSON.
|
||||
|
||||
Each JSON element is encoded as a header and a payload. The header
|
||||
determines type of element (string, numeric, boolean, null, object, or
|
||||
array) and the size of the payload. The header can be between 1 and
|
||||
9 bytes in size. The payload can be any size from zero bytes up to the
|
||||
maximum allowed BLOB size.
|
||||
|
||||
### 3.1 Payload Size
|
||||
|
||||
The upper four bits of the first byte of the header determine size of the
|
||||
header and possibly also the size of the payload.
|
||||
If the upper four bits have a value between 0 and 11, then the header is
|
||||
exactly one byte in size and the payload size is determined by those
|
||||
upper four bits. If the upper four bits have a value between 12 and 15,
|
||||
that means that the total header size is 2, 3, 5, or 9 bytes and the
|
||||
payload size is unsigned big-endian integer that is contained in the
|
||||
subsequent bytes. The size integer is the one byte that following the
|
||||
initial header byte if the upper four bits
|
||||
are 12, two bytes if the upper bits are 13, four bytes if the upper bits
|
||||
are 14, and eight bytes if the upper bits are 15. The current design
|
||||
of SQLite does not support BLOB values larger than 2GiB, so the eight-byte
|
||||
variant of the payload size integer will never be used by the current code.
|
||||
The eight-byte payload size integer is included in the specification
|
||||
to allow for future expansion.
|
||||
|
||||
The header for an element does *not* need to be in its simplest
|
||||
form. For example, consider the JSON numeric value "`1`".
|
||||
That element can be encode in five different ways:
|
||||
|
||||
* `0x13 0x31`
|
||||
* `0xc3 0x01 0x31`
|
||||
* `0xd3 0x00 0x01 0x31`
|
||||
* `0xe3 0x00 0x00 0x00 0x01 0x31`
|
||||
* `0xf3 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01 0x31`
|
||||
|
||||
The shortest encoding is preferred, of course, and usually happens with
|
||||
primitive elements such as numbers. However the total size of an array
|
||||
or object might not be known exactly when the header of the element is
|
||||
first generated. It is convenient to reserve space for the largest
|
||||
possible header and then go back and fill in the correct payload size
|
||||
at the end. This technique can result in array or object headers that
|
||||
are larger than absolutely necessary.
|
||||
|
||||
### 3.2 Element Type
|
||||
|
||||
The least-significant four bits of the first byte of the header (the first
|
||||
byte masked against 0x0f) determine element type. The following codes are
|
||||
used:
|
||||
|
||||
<ol>
|
||||
<li type="0"><p><b>NULL</b> →
|
||||
The element is a JSON "null". The payload size for a true JSON NULL must
|
||||
must be zero. Future versions of SQLite might extend the JSONB format
|
||||
with elements that have a zero element type but a non-zero size. In that
|
||||
way, legacy versions of SQLite will interpret the element as a NULL
|
||||
for backwards compatibility while newer versions will interpret the
|
||||
element in some other way.
|
||||
|
||||
<li value="1"><p><b>TRUE</b> →
|
||||
The element is a JSON "true". The payload size must be zero for a actual
|
||||
"true" value. Elements with type 1 and a non-zero payload size are
|
||||
reserved for future expansion. Legacy implementations that see an element
|
||||
type of 1 with a non-zero payload size should continue to interpret that
|
||||
element as "true" for compatibility.
|
||||
|
||||
<li value="2"><p><b>FALSE</b> →
|
||||
The element is a JSON "false". The payload size must be zero for a actual
|
||||
"false" value. Elements with type 2 and a non-zero payload size are
|
||||
reserved for future expansion. Legacy implementations that see an element
|
||||
type of 2 with a non-zero payload size should continue to interpret that
|
||||
element as "false" for compatibility.
|
||||
|
||||
<li value="3"><p><b>INT</b> →
|
||||
The element is a JSON integer value in the canonical
|
||||
RFC 8259 format, without extensions. The payload is the ASCII
|
||||
text representation of that numeric value.
|
||||
|
||||
<li value="4"><p><b>INT5</b> →
|
||||
The element is a JSON integer value that is not in the
|
||||
canonical format. The payload is the ASCII
|
||||
text representation of that numeric value. Because the payload is in a
|
||||
non-standard format, it will need to be translated when the JSONB is
|
||||
converted into RFC 8259 text JSON.
|
||||
|
||||
<li value="5"><p><b>FLOAT</b> →
|
||||
The element is a JSON floating-point value in the canonical
|
||||
RFC 8259 format, without extensions. The payload is the ASCII
|
||||
text representation of that numeric value.
|
||||
|
||||
<li value="6"><p><b>FLOAT5</b> →
|
||||
The element is a JSON floating-point value that is not in the
|
||||
canonical format. The payload is the ASCII
|
||||
text representation of that numeric value. Because the payload is in a
|
||||
non-standard format, it will need to be translated when the JSONB is
|
||||
converted into RFC 8259 text JSON.
|
||||
|
||||
<li value="7"><p><b>TEXT</b> →
|
||||
The element is a JSON string value that does not contain
|
||||
any escapes nor any characters that need to be escaped for either SQL or
|
||||
JSON. The payload is the UTF8 text representation of the string value.
|
||||
The payload does <i>not</i> include string delimiters.
|
||||
|
||||
<li value="8"><p><b>TEXTJ</b> →
|
||||
The element is a JSON string value that contains
|
||||
RFC 8259 character escapes (such as "<tt>\n</tt>" or "<tt>\u0020</tt>").
|
||||
Those escapes will need to be translated into actual UTF8 if this element
|
||||
is [json_extract|extracted] into SQL.
|
||||
The payload is the UTF8 text representation of the escaped string value.
|
||||
The payload does <i>not</i> include string delimiters.
|
||||
|
||||
<li value="9"><p><b>TEXT5</b> →
|
||||
The element is a JSON string value that contains
|
||||
character escapes, including some character escapes that part of JSON5
|
||||
and which are not found in the canonical RFC 8259 spec.
|
||||
Those escapes will need to be translated into standard JSON prior to
|
||||
rendering the JSON as text, or into their actual UTF8 characters if this
|
||||
element is [json_extract|extracted] into SQL.
|
||||
The payload is the UTF8 text representation of the escaped string value.
|
||||
The payload does <i>not</i> include string delimiters.
|
||||
|
||||
<li value="10"><p><b>TEXTRAW</b> →
|
||||
The element is a JSON string value that contains
|
||||
UTF8 characters that need to be escaped if this string is rendered into
|
||||
standard JSON text.
|
||||
The payload does <i>not</i> include string delimiters.
|
||||
|
||||
<li value="11"><p><b>ARRAY</b> →
|
||||
The element is a JSON array. The payload contains
|
||||
JSONB elements that comprise values contained within the array.
|
||||
|
||||
<li value="12"><p><b>OBJECT</b> →
|
||||
The element is a JSON object. The payload contains
|
||||
pairs of JSONB elements that comprise entries for the JSON object.
|
||||
The first element in each pair must be a string (types 7 through 10).
|
||||
The second element of each pair may be any types, including nested
|
||||
arrays or objects.
|
||||
|
||||
<li value="13"><p><b>RESERVED-13</b> →
|
||||
Reserved for future expansion. Legacy implements that encounter this
|
||||
element type should raise an error.
|
||||
|
||||
<li value="14"><p><b>RESERVED-14</b> →
|
||||
Reserved for future expansion. Legacy implements that encounter this
|
||||
element type should raise an error.
|
||||
|
||||
<li value="15"><p><b>RESERVED-15</b> →
|
||||
Reserved for future expansion. Legacy implements that encounter this
|
||||
element type should raise an error.
|
||||
</ol>
|
||||
|
||||
Element types outside the range of 0 to 12 are reserved for future
|
||||
expansion. The current implement raises an error if see an element type
|
||||
other than those listed above. However, future versions of SQLite might
|
||||
use of the three remaining element types to implement indexing or similar
|
||||
optimizations, to speed up lookup against large JSON arrays and/or objects.
|
||||
|
||||
### 3.3 Design Rationale For Element Types
|
||||
|
||||
A key goal of JSONB is that it should be quick to translate
|
||||
to and from text JSON and/or be constructed from SQL values.
|
||||
When converting from text into JSONB, we do not want the
|
||||
converter subroutine to burn CPU cycles converting elements
|
||||
values into some standard format which might never be used.
|
||||
Format conversion is "lazy" - it is deferred until actually
|
||||
needed. This has implications for the JSONB format design:
|
||||
|
||||
1. Numeric values are stored as text, not a numbers. The values are
|
||||
a direct copy of the text JSON values from which they are derived.
|
||||
|
||||
2. There are multiple element types depending on the details of value
|
||||
formats. For example, INT is used for pure RFC-8259 integer
|
||||
literals and INT5 exists for JSON5 extensions such as hexadecimal
|
||||
notation. FLOAT is used for pure RFC-8259 floating point literals
|
||||
and FLOAT5 is used for JSON5 extensions. There are four different
|
||||
representations of strings, depending on where the string came from
|
||||
and how special characters within the string are escaped.
|
||||
|
||||
A second goal of JSONB is that it should be capable of serving as the
|
||||
"parse tree" for JSON when a JSON value is being processed by the
|
||||
various [JSON SQL functions] built into SQLite. Before JSONB was
|
||||
developed, operations such [json_replace()] and [json_patch()]
|
||||
and similar worked in three stages:
|
||||
|
||||
|
||||
1. Translate the text JSON into a internal format that is
|
||||
easier to scan and edit.
|
||||
2. Perform the requested operation on the JSON.
|
||||
3. Translate the internal format back into text.
|
||||
|
||||
JSONB seeks to serve as the internal format directly - bypassing
|
||||
the first and third stages of that process. Since most of the CPU
|
||||
cycles are spent on the first and third stages, that suggests that
|
||||
JSONB processing will be much faster than text JSON processing.
|
||||
|
||||
So when processing JSONB, only the second stage of the three-stage
|
||||
process is required. But when processing text JSON, it is still necessary
|
||||
to do stages one and three. If JSONB is to be used as the internal
|
||||
binary representation, this is yet another reason to store numeric
|
||||
values as text. Storing numbers as text minimizes the amount of
|
||||
conversion work needed for stages one and three. This is also why
|
||||
there are four different representations of text in JSONB. Different
|
||||
text representations are used for text coming from different sources
|
||||
(RFC-8259 JSON, JSON5, or SQL string values) and conversions only
|
||||
happen if and when they are actually needed.
|
||||
|
||||
### 3.4 Valid JSONB BLOBs
|
||||
|
||||
A valid JSONB BLOB consists of a single JSON element. The element must
|
||||
exactly fill the BLOB. This one element is often a JSON object or array
|
||||
and those usually contain additional elements as its payload, but the
|
||||
element can be a primite value such a string, number, boolean, or null.
|
||||
|
||||
When the built-in JSON functions are attempting to determine if a BLOB
|
||||
argument is a JSONB or just a random BLOB, they look at the header of
|
||||
the outer element to see that it is well-formed and that the element
|
||||
completely fills the BLOB. If these conditions are met, then the BLOB
|
||||
is accepted as a JSONB value.
|
15
manifest
15
manifest
@ -1,5 +1,5 @@
|
||||
C Fix\sSQLITE_ENABLE_SETLK_TIMEOUT\sassert()\sstatements\sin\sos_unix.c\sto\savoid\sreading\spast\sthe\send\sof\sthe\sunixShmNode.aMutex[]\sarray.
|
||||
D 2023-12-20T19:33:41.679
|
||||
C Add\sinternal\score-developer-only\sdocumentation\sof\sthe\sJSONB\sformat.
|
||||
D 2023-12-21T18:08:05.881
|
||||
F .fossil-settings/empty-dirs dbb81e8fc0401ac46a1491ab34a7f2c7c0452f2f06b54ebb845d024ca8283ef1
|
||||
F .fossil-settings/ignore-glob 35175cdfcf539b2318cb04a9901442804be81cd677d8b889fcc9149c21f239ea
|
||||
F LICENSE.md df5091916dbb40e6e9686186587125e1b2ff51f022cc334e886c19a0e9982724
|
||||
@ -39,6 +39,7 @@ F contrib/sqlitecon.tcl 210a913ad63f9f991070821e599d600bd913e0ad
|
||||
F doc/F2FS.txt c1d4a0ae9711cfe0e1d8b019d154f1c29e0d3abfe820787ba1e9ed7691160fcd
|
||||
F doc/compile-for-windows.md 50b27d77be96195c66031a3181cb8684ed822327ea834e07f9c014213e5e3bcf
|
||||
F doc/json-enhancements.md e356fc834781f1f1aa22ee300027a270b2c960122468499bf347bb123ce1ea4f
|
||||
F doc/jsonb.md 5fab4b8613aa9153fbeb6259297bd4697988af8b3d23900deba588fa7841456b
|
||||
F doc/lemon.html 44a53a1d2b42d7751f7b2f478efb23c978e258d794bfd172442307a755b9fa44
|
||||
F doc/pager-invariants.txt 27fed9a70ddad2088750c4a2b493b63853da2710
|
||||
F doc/testrunner.md 8d36ec692cf4994bb66d84a4645b9afa1ce9d47dc12cbf8d437c5a5fb6ddeedb
|
||||
@ -696,7 +697,7 @@ F src/hash.h 3340ab6e1d13e725571d7cee6d3e3135f0779a7d8e76a9ce0a85971fa3953c51
|
||||
F src/hwtime.h f9c2dfb84dce7acf95ce6d289e46f5f9d3d1afd328e53da8f8e9008e3b3caae6
|
||||
F src/in-operator.md 10cd8f4bcd225a32518407c2fb2484089112fd71
|
||||
F src/insert.c 3f0a94082d978bbdd33c38fefea15346c6c6bffb70bc645a71dc0f1f87dd3276
|
||||
F src/json.c c6fae579d681d5ab43387c8667cc6ef0c0f8769fd881fde3af6260e837cae762
|
||||
F src/json.c 498cbe3346f216e3119e6d01585c3bc09994612c16459f2fa682a1cec61b1cf7
|
||||
F src/legacy.c d7874bc885906868cd51e6c2156698f2754f02d9eee1bae2d687323c3ca8e5aa
|
||||
F src/loadext.c 7432c944ff197046d67a1207790a1b13eec4548c85a9457eb0896bb3641dfb36
|
||||
F src/main.c ce714ee501122c76eb2e69b292bebe443aba611fc3b88e6786eb910285515fe4
|
||||
@ -2155,8 +2156,8 @@ F vsixtest/vsixtest.tcl 6a9a6ab600c25a91a7acc6293828957a386a8a93
|
||||
F vsixtest/vsixtest.vcxproj.data 2ed517e100c66dc455b492e1a33350c1b20fbcdc
|
||||
F vsixtest/vsixtest.vcxproj.filters 37e51ffedcdb064aad6ff33b6148725226cd608e
|
||||
F vsixtest/vsixtest_TemporaryKey.pfx e5b1b036facdb453873e7084e1cae9102ccc67a0
|
||||
P 206d8c650d937bc700946c40a82a62ea6bc4a80e5f3fb42d0ae2968de25f0644
|
||||
R f91549b5ba9f8490543a9270abff43ad
|
||||
U dan
|
||||
Z 03359f67466d165815d902ddaee87b98
|
||||
P 029a05cd2928d43d81e4549cce5388c432e2c9e75e3fa0b2fe6e91021b2fb9ac
|
||||
R 37c4deae61c278596f4d5133c7cb9761
|
||||
U drh
|
||||
Z eab6e0f06a97a969bebac69bfb9fe946
|
||||
# Remove this line to create a well-formed Fossil manifest.
|
||||
|
@ -1 +1 @@
|
||||
029a05cd2928d43d81e4549cce5388c432e2c9e75e3fa0b2fe6e91021b2fb9ac
|
||||
4d30478863b2a60512010de9ec6e3099bfaf75d4afee20acec536713fe94334d
|
@ -113,6 +113,9 @@
|
||||
** checks are true, the BLOB is assumed to be JSONB and processing continues.
|
||||
** Errors are only raised if some other miscoding is discovered during
|
||||
** processing.
|
||||
**
|
||||
** Additional information can be found in the doc/jsonb.md file of the
|
||||
** canonical SQLite source tree.
|
||||
*/
|
||||
#ifndef SQLITE_OMIT_JSON
|
||||
#include "sqliteInt.h"
|
||||
|
Loading…
Reference in New Issue
Block a user