mirror of https://github.com/postgres/postgres
3db72ebcbe
This commit changes the query jumbling code in queryjumblefuncs.c so that it is generated automatically by gen_node_support.pl, based on the information about the nodes in the headers of src/include/nodes/. This approach offers many advantages:
- Support for query jumbling for all utility statements, based on the state of their parsed Nodes and not only on their query string. This will make it much easier to normalize the information of some DDLs, such as SET or CALL (this is left unchanged and should be part of a separate discussion). With this feature, the number of entries stored for utilities in pg_stat_statements is reduced (for example, "CHECKPOINT" and "checkpoint" now mean the same thing and share the same query ID).
- Documentation of query jumbling directly in the structure definitions of the nodes. Since this code was introduced in pg_stat_statements and then moved to core, the reasons behind the choices of what should be included in the jumble are rather sparse. Some explanation is added for the most relevant parts, as a start.
- Overall code reduction, and more consistency with the other generated parts (read, write and copy) that depend on the nodes.

The query jumbling is controlled by a couple of new node attributes, documented in nodes/nodes.h:
- custom_query_jumble, to mark a Node as having a custom implementation.
- no_query_jumble, to ignore a Node entirely.
- query_jumble_ignore, to ignore a field in a Node.
- query_jumble_location, to mark a location in a Node, for normalization. This applies only to int fields with "location" in their name (only Const as of this commit).

There should be no compatibility impact on pg_stat_statements, as the new code applies the jumbling to the same fields for each node (for one, its regression tests required no modification).
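The effect of the field attributes can be sketched in plain C. This is a hypothetical, simplified illustration, not the code that gen_node_support.pl actually emits: DemoConst, DemoJumbleState, demo_init(), demo_append() and demo_jumble_DemoConst() are invented names, and FNV-1a stands in for PostgreSQL's own hashing. Only pg_node_attr() and the attribute names come from the text above; as in nodes/nodes.h, the macro expands to nothing, so the C compiler ignores the attributes and only the generator reads them.

```c
#include <stddef.h>
#include <stdint.h>

/* pg_node_attr() expands to nothing for the compiler; a generator parses it. */
#define pg_node_attr(...)

/* Hypothetical node tag and node, loosely modeled on Const. */
typedef enum DemoNodeTag
{
    T_DemoConst = 1
} DemoNodeTag;

typedef struct DemoConst
{
    DemoNodeTag type;
    unsigned int consttype;     /* no attribute: included in the jumble */
    int         constvalue pg_node_attr(query_jumble_ignore);
    int         location pg_node_attr(query_jumble_location);
} DemoConst;

#define DEMO_MAX_LOCATIONS 8

/* Hypothetical jumble state: a running hash plus recorded locations. */
typedef struct DemoJumbleState
{
    uint64_t    hash;
    int         locations[DEMO_MAX_LOCATIONS];
    int         n_locations;
} DemoJumbleState;

static void
demo_init(DemoJumbleState *js)
{
    js->hash = 14695981039346656037ULL;    /* FNV-1a 64-bit offset basis */
    js->n_locations = 0;
}

/* Mix raw bytes into the hash (FNV-1a, not PostgreSQL's hashing). */
static void
demo_append(DemoJumbleState *js, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++)
    {
        js->hash ^= p[i];
        js->hash *= 1099511628211ULL;      /* FNV-1a 64-bit prime */
    }
}

/*
 * Hand-written stand-in for a generated per-node function: plain fields
 * are hashed, query_jumble_ignore fields are skipped, and
 * query_jumble_location fields are recorded for later normalization
 * instead of being hashed.
 */
static void
demo_jumble_DemoConst(DemoJumbleState *js, const DemoConst *node)
{
    demo_append(js, &node->type, sizeof(node->type));
    demo_append(js, &node->consttype, sizeof(node->consttype));
    /* constvalue is marked query_jumble_ignore: not hashed */
    if (js->n_locations < DEMO_MAX_LOCATIONS)
        js->locations[js->n_locations++] = node->location;
}
```

With this shape, two constants of the same type but with different values and positions (think of SELECT 1 versus SELECT 2) produce the same hash, while the recorded location lets a consumer such as pg_stat_statements replace the literal with a placeholder in the normalized query text.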
Some benchmarks of the query jumbling between HEAD and this commit for SELECT and DMLs have shown that the new code does not cause a performance regression, with close computation times for both methods. For utility queries, the new method is slower than the previous method of calculating a hash of the query string, though the extra cost is at the nanosecond level based on what I measured. This is unnoticeable even for OLTP workloads, as a query ID is calculated once per query, after parse analysis.

Author: Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: https://postgr.es/m/Y5BHOUhX3zTH/ig6@paquier.xyz
expected
sql
.gitignore
Makefile
meson.build
pg_stat_statements--1.0--1.1.sql
pg_stat_statements--1.1--1.2.sql
pg_stat_statements--1.2--1.3.sql
pg_stat_statements--1.3--1.4.sql
pg_stat_statements--1.4--1.5.sql
pg_stat_statements--1.4.sql
pg_stat_statements--1.5--1.6.sql
pg_stat_statements--1.6--1.7.sql
pg_stat_statements--1.7--1.8.sql
pg_stat_statements--1.8--1.9.sql
pg_stat_statements--1.9--1.10.sql
pg_stat_statements.c
pg_stat_statements.conf
pg_stat_statements.control