Instead of allocating memory in brin_deform_tuple and brin_copy_tuple
over and over during a scan, allow reuse of previously allocated memory.
This is said to make for a measurable performance improvement.
Author: Jinyu Zhang, Álvaro Herrera
Reviewed by: Tomas Vondra
Discussion: https://postgr.es/m/495deb78.4186.1500dacaa63.Coremail.beijing_pg@163.com
When adding atomics back in b64d92f1a, I added 64bit support as
optional; there wasn't yet a direct user in sight. That turned out to
be a bit short-sighted, it'd already have been useful a number of times.
Add a fallback implementation of 64bit atomics, just like the one we
have for 32bit atomics.
Additionally optimize reads/writes to 64bit on a number of platforms
where aligned writes of that size are atomic. This can now be tested
with PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY.
Author: Andres Freund
Reviewed-By: Amit Kapila
Discussion: https://postgr.es/m/20160330230914.GH13305@awork2.anarazel.de
The WAL-writing piece was forgetting to set the pages-per-range value.
Also, fix the declared type of struct member heapBlk, which I mistakenly
set as OffsetNumber rather than BlockNumber.
Problem was introduced by commit c655899ba9ae (April 1st). Any system
that tries to replay the new WAL record written before this fix is
likely to die on replay and require pg_resetwal.
Reported by Tom Lane.
Discussion: https://postgr.es/m/20191.1491524824@sss.pgh.pa.us
As reported by Sean Johnston in bug #14614, since 9.6 the planner can fail
due to trying to look up the referent of a Var with varno 0. This happens
because we generate such Vars in generate_append_tlist, for lack of any
better way to describe the output of a SetOp node. In typical situations
nothing really cares about that, but given nested set-operation queries
we will call estimate_num_groups on the output of the subquery, and that
wants to know what a Var actually refers to. That logic used to look at
subquery->targetList, but in commit 3fc6e2d7f I'd switched it to look at
subroot->processed_tlist, ie the actual output of the subquery plan not the
parser's idea of the result. It seemed like a good idea at the time :-(.
As a band-aid fix, change it back.
Really we ought to have an honest way of naming the outputs of SetOp steps,
which suggests that it'd be a good idea for the parser to emit an RTE
corresponding to each one. But that's a task for another day, and it
certainly wouldn't yield a back-patchable fix.
Report: https://postgr.es/m/20170407115808.25934.51866@wrigleys.postgresql.org
An important step of SASLprep normalization, is to convert the string to
Unicode normalization form NFKC. Unicode normalization requires a fairly
large table of character decompositions, which is generated from data
published by the Unicode consortium. The script to generate the table is
put in src/common/unicode, as well test code for the normalization.
A pre-generated version of the tables is included in src/include/common,
so you don't need the code in src/common/unicode to build PostgreSQL, only
if you wish to modify the normalization tables.
The SASLprep implementation depends on the UTF-8 functions from
src/backend/utils/mb/wchar.c. So to use it, you must also compile and link
that. That doesn't change anything for the current users of these
functions, the backend and libpq, as they both already link with wchar.o.
It would be good to move those functions into a separate file in
src/commmon, but I'll leave that for another day.
No documentation changes included, because there is no details on the
SCRAM mechanism in the docs anyway. An overview on that in the protocol
specification would probably be good, even though SCRAM is documented in
detail in RFC5802. I'll write that as a separate patch. An important thing
to mention there is that we apply SASLprep even on invalid UTF-8 strings,
to support other encodings.
Patch by Michael Paquier and me.
Discussion: https://www.postgresql.org/message-id/CAB7nPqSByyEmAVLtEf1KxTRh=PWNKiWKEKQR=e1yGehz=wbymQ@mail.gmail.com
With this change array fields are populated from json(b) arrays, and
composite fields are populated from json(b) objects.
Along the way, some significant code refactoring is done to remove
redundancy in the way to populate_record[_set] and to_record[_set]
functions operate, and some significant efficiency gains are made by
caching tuple descriptors.
Nikita Glukhov, edited some by me.
Reviewed by Aleksander Alekseev and Tom Lane.
All documentation is now built using XSLT. Remove all references to
Jade, DSSSL, also JadeTex and some other outdated tooling.
For chunked HTML builds, this changes nothing, but removes the
transitional "oldhtml" target. The single-page HTML build is ported
over to XSLT. For PDF builds, this removes the JadeTex builds and moves
the FOP builds in their place.
tupconvert.c's functions formerly considered that an explicit tuple
conversion was necessary if the input and output tupdescs contained
different type OIDs. The point of that was to make sure that a composite
datum resulting from the conversion would contain the destination rowtype
OID in its composite-datum header. However, commit 3838074f8 entirely
misunderstood what that check was for, thinking that it had something to do
with presence or absence of an OID column within the tuple. Removal of the
check broke the no-op conversion path in ExecEvalConvertRowtype, as
reported by Ashutosh Bapat.
It turns out that of the dozen or so call sites for tupconvert.c functions,
ExecEvalConvertRowtype is the only one that cares about the composite-datum
header fields in the output tuple. In all the rest, we'd much rather avoid
an unnecessary conversion whenever the tuples are physically compatible.
Moreover, the comments in tupconvert.c only promise physical compatibility
not a metadata match. So, let's accept the removal of the guarantee about
the output tuple's rowtype marking, recognizing that this is a API change
that could conceivably break third-party callers of tupconvert.c. (So,
let's remember to mention it in the v10 release notes.)
However, commit 3838074f8 did have a bit of a point here, in that two
tuples mustn't be considered physically compatible if one has HEAP_HASOID
set and the other doesn't. (Some of the callers of tupconvert.c might not
really care about that, but we can't assume it in general.) The previous
check accidentally covered that issue, because no RECORD types ever have
OIDs, while if two tupdescs have the same named composite type OID then,
a fortiori, they have the same tdhasoid setting. If we're removing the
type OID match check then we'd better include tdhasoid match as part of
the physical compatibility check.
Without that hack in tupconvert.c, we need ExecEvalConvertRowtype to take
responsibility for inserting the correct rowtype OID label whenever
tupconvert.c decides it need not do anything. This is easily done with
heap_copy_tuple_as_datum, which will be considerably faster than a tuple
disassembly and reassembly anyway; so from a performance standpoint this
change is a win all around compared to what happened in earlier branches.
It just means a couple more lines of code in ExecEvalConvertRowtype.
Ashutosh Bapat and Tom Lane
Discussion: https://postgr.es/m/CAFjFpRfvHABV6+oVvGcshF8rHn+1LfRUhj7Jz1CDZ4gPUwehBg@mail.gmail.com
Add a "copy" argument to make it optional to receive a copy of caller
tuple that is safe to use following a subsequent manipulating of
tuplesort's state. This is a performance optimization. Most existing
tuplesort_gettupleslot() callers are made to opt out of copying.
Existing callers that happen to rely on the validity of tuple memory
beyond subsequent manipulations of the tuplesort request their own
copy.
This brings tuplesort_gettupleslot() in line with
tuplestore_gettupleslot(). In the future, a "copy"
tuplesort_getdatum() argument may be added, that similarly allows
callers to opt out of receiving their own copy of tuple.
In passing, clarify assumptions that callers of other tuplesort fetch
routines may make about tuple memory validity, per gripe from Tom
Lane.
Author: Peter Geoghegan
Discussion: CAM3SWZQWZZ_N=DmmL7tKy_OUjGH_5mN=N=A6h7kHyyDvEhg2DA@mail.gmail.com
At -Og optimization gcc warns that variable tclass may be used
uninitialized when relkind == RELKIND_INDEX. Actually that can't
happen due to an early return, but quiet the compiler by initializing
tclass to 0.
In passing, use uint16_t consistently for the declaration of tclass.
Complaint and initial patch by Mike Palmiotto. Editorializing by me.
Probably not worth backpatching given that it is cosmetic, so apply
to development head only.
Discussion: https://postgr.es/m/flat/623bcaae-112e-ced0-8c22-a84f75ae0c53%40joeconway.com
<selinux/label.h> includes <stdbool.h>, which creates an incompatible
We don't care if <stdbool.h> redefines "true"/"false"; those are close
enough.
Complaint and initial patch by Mike Palmiotto. Final approach per
Tom Lane's suggestion, as discussed on hackers. Backpatching to
all supported branches.
Discussion: https://postgr.es/m/flat/623bcaae-112e-ced0-8c22-a84f75ae0c53%40joeconway.com
The original code was overly optimistic about the cost of scanning a
BRIN index, leading to BRIN indexes being selected when they'd be a
worse choice than some other index. This complete rewrite should be
more accurate.
Author: David Rowley, based on an earlier patch by Emre Hasegeli
Reviewed-by: Emre Hasegeli
Discussion: https://postgr.es/m/CAKJS1f9n-Wapop5Xz1dtGdpdqmzeGqQK4sV2MK-zZugfC14Xtw@mail.gmail.com
When sending a tuple attribute, the previous coding erroneously sent the
length byte before encoding conversion, which would lead to protocol
failures on the receiving side if the length did not match the following
string.
To fix that, use pq_sendcountedtext() for sending tuple attributes,
which takes care of all of that internally. To match the API of
pq_sendcountedtext(), send even text values without a trailing zero byte
and have the receiving end put it in place instead. This matches how
the standard FE/BE protocol behaves.
Reported-by: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>
Also add opr_sanity check that all preloaded immutable functions are
parallel safe. (Per discussion, this does not necessarily have to be
true for all possible such functions, but deviations would be unlikely
enough that maintaining such a test is reasonable.)
Reported-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
In struct _archiveHandle, some of the fields have the same name as a
typedef. This is kind of confusing, so rename the types so they have
names distinct from the struct fields. In C++, the previous coding
changes the meaning of the typedef in the scope of the struct, causing
warnings and possibly other problems.
Reviewed-by: Andres Freund <andres@anarazel.de>
It was not used for what the comment claimed, at all. It was actually used
as the 'base' argument to strtol(), when reading the iteration count. We
don't need a constant for base-10, so remove it.
This is the SQL standard-conforming variant of PostgreSQL's serial
columns. It fixes a few usability issues that serial columns have:
- CREATE TABLE / LIKE copies default but refers to same sequence
- cannot add/drop serialness with ALTER TABLE
- dropping default does not drop sequence
- need to grant separate privileges to sequence
- other slight weirdnesses because serial is some kind of special macro
Reviewed-by: Vitaly Burovoy <vitaly.burovoy@gmail.com>
For normal commits and aborts we already reset PgXact->xmin,
so we can simply avoid running SnapshotResetXmin() twice.
During performance tests by Alexander Korotkov, diagnosis
by Andres Freund showed PgXact array as a bottleneck. After
manual analysis by me of the code paths that touch those
memory locations, I was able to identify extraneous code
in the main transaction commit path.
Avoiding touching highly contented shmem improves concurrent
performance slightly on all workloads, confirmed by tests
run by Ashutosh Sharma and Alexander Korotkov.
Simon Riggs
Discussion: CANP8+jJdXE9b+b9F8CQT-LuxxO0PBCB-SZFfMVAdp+akqo4zfg@mail.gmail.com
HandleFunctionRequest() is no longer responsible for reading the protocol
message from the client, since commit 2b3a8b20c2. Fix the outdated
comments.
HandleFunctionRequest() now always returns 0, because the code that used
to return EOF was moved in 2b3a8b20c2. Therefore, the caller no longer
needs to check the return value.
Reported by Andres Freund. Backpatch to all supported versions, even though
this doesn't have any user-visible effect, to make backporting future
patches in this area easier.
Discussion: https://www.postgresql.org/message-id/20170405010525.rt5azbya5fkbhvrx@alap3.anarazel.de
When using integer timestamps, the interval-comparison functions tried
to compute the overall magnitude of an interval as an int64 number of
microseconds. As reported by Frazer McLean, this overflows for intervals
exceeding about 296000 years, which is bad since we nominally allow
intervals many times larger than that. That results in wrong comparison
results, and possibly in corrupted btree indexes for columns containing
such large interval values.
To fix, compute the magnitude as int128 instead. Although some compilers
have native support for int128 calculations, many don't, so create our
own support functions that can do 128-bit addition and multiplication
if the compiler support isn't there. These support functions are designed
with an eye to allowing the int128 code paths in numeric.c to be rewritten
for use on all platforms, although this patch doesn't do that, or even
provide all the int128 primitives that will be needed for it.
Back-patch as far as 9.4. Earlier releases did not guard against overflow
of interval values at all (commit 146604ec4 fixed that), so it seems not
very exciting to worry about overly-large intervals for them.
Before 9.6, we did not assume that unreferenced "static inline" functions
would not draw compiler warnings, so omit functions not directly referenced
by timestamp.c, the only present consumer of int128.h. (We could have
omitted these functions in HEAD too, but since they were written and
debugged on the way to the present patch, and they look likely to be needed
by numeric.c, let's keep them in HEAD.) I did not bother to try to prevent
such warnings in a --disable-integer-datetimes build, though.
Before 9.5, configure will never define HAVE_INT128, so the part of
int128.h that exploits a native int128 implementation is dead code in the
9.4 branch. I didn't bother to remove it, thinking that keeping the file
looking similar in different branches is more useful.
In HEAD only, add a simple test harness for int128.h in src/tools/.
In back branches, this does not change the float-timestamps code path.
That's not subject to the same kind of overflow risk, since it computes
the interval magnitude as float8. (No doubt, when this code was originally
written, overflow was disregarded for exactly that reason.) There is a
precision hazard instead :-(, but we'll avert our eyes from that question,
since no complaints have been reported and that code's deprecated anyway.
Kyotaro Horiguchi and Tom Lane
Discussion: https://postgr.es/m/1490104629.422698.918452336.26FA96B7@webmail.messagingengine.com
Follow on patch in the multi-variate statistics patch series.
CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t;
ANALYZE;
will collect dependency stats on (a, b) and then use the measured
dependency in subsequent query planning.
Commit 7b504eb282ca2f5104b5c00b4f05a3ef6bb1385b added
CREATE STATISTICS with n-distinct coefficients. These are now
specified using the mutually exclusive option WITH (ndistinct).
Author: Tomas Vondra, David Rowley
Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas
and many other comments and contributions
Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
Make every page in a hash index which isn't all-zeroes have a valid
special space, so that tools like pageinspect don't error out.
Also, make pageinspect cope with all-zeroes pages, because
_hash_alloc_buckets can leave behind large numbers of those until
they're consumed by splits.
Ashutosh Sharma and Robert Haas, reviewed by Amit Kapila.
Original trouble report from Jeff Janes.
Discussion: http://postgr.es/m/CAMkU=1y6NjKmqbJ8wLMhr=F74WzcMALYWcVFhEpm7i=mV=XsOg@mail.gmail.com
All error messages use the American English spelling of recognize,
apply to the single one not doing so to be consistent.
Author: Daniel Gustafsson <daniel@yesql.se>
This improves code coverage and lays a foundation for testing
similar issues in a distributed environment.
Author: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>