Current maintainer: Bruce Momjian (pgman@candle.pha.pa.us)
Last updated: Mon May 9 22:16:11 EDT 2005
The most recent version of this document can be viewed at
http://www.postgresql.org/docs/faqs.TODO.html.
A hyphen, "-", marks changes that will appear in the upcoming 8.1 release.
Bracketed items, "[]", have more detail.
This list contains all known PostgreSQL bugs and feature requests. If
you would like to work on an item, please read the Developer's FAQ
first.
This would allow server log information to be easily loaded into a database for analysis.
Currently, if a user is removed while he still owns objects, a new user might be given the same user id and inherit the previous user's objects.
This would allow an application inheriting a pooled connection to know the queries prepared in the current session.
The proposed syntax is:
GRANT SELECT ON ALL TABLES IN public TO phpuser;
GRANT SELECT ON NEW TABLES IN public TO phpuser;
This item is difficult because a tablespace can contain objects from multiple databases. There is a server-side function that returns the databases which use a specific tablespace, so this requires a tool that will call that function and connect to each database to find the objects in each database for that tablespace.
All objects in the default database tablespace must have default tablespace specifications. This is because new databases are created by copying directories. If you mix default tablespace tables and tablespace-specified tables in the same directory, creating a new database from such a mixed directory would create a new database with tables that had incorrect explicit tablespaces. To fix this would require modifying pg_class in the newly copied database, which we don't currently do.
It could start with a random tablespace from a supplied list and cycle through the list.
This would require a new global table that is dumped to flat file for use by the postmaster. We do a similar thing for pg_shadow currently.
Right now, SIGTERM will terminate a session, but it is treated as though the postmaster has panicked and shared memory might not be cleaned up properly. A new signal is needed for safe termination.
By not showing commented-out variables, we discourage people from thinking that re-commenting a variable returns it to its default. This has to address environment variables that are then overridden by config file values. Another option is to allow commented values to return to their default values.
Currently only full WAL files are archived, so the most recent transactions aren't available for recovery after a disk failure. Archiving the current, partially filled WAL file could be triggered by a user command or a timer.
Doing this will allow administrators to know more easily when the archive contains all the files needed for point-in-time recovery.
Currently all schemas are owned by the super-user because they are copied from the template1 database.
This is useful for checking PITR recovery.
Any of the master/slave replication solutions can be used to maintain a standby server for data warehousing. To allow read/write queries on multiple servers, you need multi-master replication like pgcluster.
Currently CURRENT_TIMESTAMP returns the start time of the current transaction, while gettimeofday() returns the wall-clock time. This will make time reporting more consistent and will allow reporting of the statement start time.
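For illustration, the existing timeofday() function already exposes wall-clock time at the SQL level:

    BEGIN;
    SELECT CURRENT_TIMESTAMP, timeofday();
    -- CURRENT_TIMESTAMP is frozen at transaction start;
    -- timeofday() advances on every call.
    COMMIT;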
If the TIMESTAMP value is stored with a time zone name, interval computations should adjust based on the time zone rules, e.g. adding 24 hours to a timestamp would yield a different result from adding one day.
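A sketch of the desired semantics, assuming a session time zone of America/New_York and the 2005 DST transition on April 3:

    SET TIME ZONE 'America/New_York';
    -- desired: adding 24 hours adds elapsed time, so the wall clock jumps
    SELECT timestamptz '2005-04-02 12:00' + interval '24 hours';
    --   => 2005-04-03 13:00 EDT
    -- desired: adding 1 day keeps the same wall-clock time the next day
    SELECT timestamptz '2005-04-02 12:00' + interval '1 day';
    --   => 2005-04-03 12:00 EDT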
For example, to_char('1 month', 'mon') is meaningless. Basically, most date-related parameters to to_char() are meaningless for intervals because an interval is not anchored to a date.
Some special format flag would be required to request such accumulation; such functionality could also be added to EXTRACT. Accumulation that crosses the month/day boundary should be prevented because of the uneven number of days in a month.
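In the meantime, accumulated units can be computed manually via EXTRACT:

    SELECT EXTRACT(epoch FROM interval '1 day 5 hours') / 3600;
    -- => 29, the accumulated hour count a flagged to_char() could print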
Currently large object entries do not have owners. Permissions can only be set at the pg_largeobject table level.
This requires the TOAST column to be stored EXTERNAL.
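The storage strategy can already be set per column; the table and column names here are hypothetical:

    -- EXTERNAL stores the value out-of-line but uncompressed, so byte
    -- ranges can be fetched without detoasting the entire value
    ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL;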
Currently locale can only be set during initdb. No global tables have locale-aware columns. However, the database template used during database creation might have locale-aware indexes. The indexes would need to be reindexed to match the new locale.
Right now only one encoding is allowed per database.
We can only auto-create rules for simple views. For more complex cases users will still have to write rules.
The main difficulty with this item is the problem of creating an index that can span more than one table.
MIN/MAX queries can already be rewritten as SELECT col FROM tab ORDER BY col {DESC} LIMIT 1. Completing this item involves doing this transformation automatically.
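For example, given an index on tab(col) (names hypothetical):

    SELECT MIN(col) FROM tab;                  -- currently a sequential scan
    SELECT col FROM tab ORDER BY col LIMIT 1;  -- can use the index today
    -- MAX uses ORDER BY col DESC LIMIT 1, with care that NULLs sort first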
For an index on col1,col2,col3, and a WHERE clause of col1 = 5 AND col3 = 9, spin through the index checking for col1 and col3 matches, rather than just col1; this is also called skip-scanning.
Uniqueness (index) checks are done when updating a column even if the column is not modified by the UPDATE.
Rather than randomly accessing heap pages based on index entries, mark heap pages needing access in a bitmap and do the lookups in sequential order. Another method would be to sort heap ctids matching the index before accessing the heap rows.
This feature allows separate indexes to be ANDed or ORed together. This is particularly useful for data warehousing applications that need to query the database in many permutations. This feature scans an index and creates an in-memory bitmap, and allows that bitmap to be combined with other bitmaps created in a similar way. The bitmap can either index all TIDs, or be lossy, meaning it records just page numbers and each page's tuples have to be checked for validity in a separate pass.
Such indexes could be more compact if there are only a few distinct values. Such indexes can also be compressed. Keeping such indexes updated can be costly.
One solution is to create a partial index on an IS NULL expression.
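For example, with hypothetical table and column names:

    CREATE INDEX tab_col_null ON tab (col) WHERE col IS NULL;
    -- this query can then scan the small partial index:
    SELECT * FROM tab WHERE col IS NULL;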
Currently only one hash bucket can be stored on a page. Ideally several hash buckets could be stored on a single page and greater granularity used for the hash algorithm.
The use of C-style backslashes (e.g. \n, \r) in quoted strings is not SQL-spec compliant, so allow such handling to be disabled. However, disabling backslashes could break many third-party applications and tools.
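For example:

    SELECT 'line1\nline2';
    -- currently the \n becomes a newline; per the SQL spec it is a
    -- literal backslash followed by 'n'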
This is not SQL-spec but many DBMSs allow it.
UPDATE already allows this (UPDATE...FROM) but we need similar functionality in DELETE. It's been agreed that the keyword should be USING, to avoid anything as confusing as DELETE FROM a FROM b.
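The proposed form would parallel the existing UPDATE syntax; the table names are hypothetical:

    UPDATE a SET val = b.val FROM b WHERE a.id = b.id;  -- already works
    DELETE FROM a USING b WHERE a.id = b.id;            -- proposed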
Currently, queries prepared via the libpq API are planned on first execute using the supplied parameters --- allow SQL PREPARE to do the same. Also, allow control over replanning prepared queries either manually or automatically when statistics for execute parameters differ dramatically from those used during planning.
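For example, SQL-level PREPARE currently plans with no knowledge of the parameter values; names are hypothetical:

    PREPARE getuser(int) AS SELECT * FROM users WHERE id = $1;
    -- planned above, before any parameter is known; the proposal would
    -- delay planning until the first EXECUTE supplies one
    EXECUTE getuser(42);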
Currently LISTEN/NOTIFY information is stored in pg_listener. Storing such information in memory would improve performance.
This would allow an informational message to be added to the notify message, perhaps indicating the row modified or other custom information.
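A hypothetical sketch of such syntax:

    LISTEN orders_changed;
    NOTIFY orders_changed, 'order 42 updated';  -- hypothetical payload form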
Currently the system uses the operating system COPY command to create a new database.
When enabled, this would allow errors in multi-statement transactions to be automatically ignored.
This would include resetting of all variables (RESET ALL), dropping of all temporary tables, removal of any NOTIFYs, cursors, prepared queries(?), currval()s, etc. This could be used for connection pooling. We could also change RESET ALL to have this functionality.
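A single command could bundle what now takes several statements; the object names are hypothetical:

    RESET ALL;               -- variables
    CLOSE mycursor;          -- each open cursor
    DEALLOCATE myquery;      -- each prepared query
    DROP TABLE mytemptable;  -- each temporary table
    -- ...plus discarding pending NOTIFYs and currval() state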
One complexity is whether moving a schema should move all existing schema objects or just define the location for future object creation.
Currently non-global system tables must be in the default database schema. Global system tables can never be moved.
This might require some background daemon to maintain clustering during periods of low usage. It might also require tables to be only partially filled for easier reorganization. Another idea would be to create a merged heap/index data file so an index lookup would automatically access the heap data too. A third idea would be to store heap rows in hashed groups, perhaps using a user-supplied hash function.
To do this, determine the ideal cluster index for each system table and set the cluster setting during initdb.
This requires the use of a savepoint before each COPY line is processed, with ROLLBACK on COPY failure.
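A sketch of the per-line mechanism in SQL terms:

    BEGIN;
    -- for each input line:
    SAVEPOINT copy_line;
    -- ...insert the line's data; if that fails:
    ROLLBACK TO SAVEPOINT copy_line;  -- discards only the bad line
    -- ...otherwise:
    RELEASE SAVEPOINT copy_line;
    COMMIT;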
This requires using the row ctid to map cursor rows back to the original heap row. This becomes more complicated if WITH HOLD cursors are to be supported, because WITH HOLD cursors have a copy of the row and no FOR UPDATE lock.
Because WITH HOLD cursors exist outside transactions, this allows them to be listed so they can be closed.
This is useful for returning the auto-generated key for an INSERT. One complication is how to handle rules that run as part of the insert.
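Today the key must be fetched in a second step; names are hypothetical:

    INSERT INTO orders (customer) VALUES ('smith');
    SELECT currval('orders_id_seq');  -- the proposal would return this directly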
This is basically the same as SET search_path.
Currently only constants are supported.
This requires the cached PL/pgSQL byte code to be invalidated when an object referenced in the function is changed.
We already allow XML to be stored in the database, and XPath queries can be used on that data using /contrib/xml2. It also supports XSLT transformations.
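For example, using /contrib/xml2 (table and column names hypothetical):

    SELECT xpath_string(doc, '/book/title') FROM documents;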
This would be used for checking if the server is up.
This would allow non-psql clients to pull the same information out of the database as psql.
This should be done by allowing a '-t schema.table' syntax.
This is probably best done by combining pg_dump and pg_dumpall into a single binary.
Document differences between ecpg and the SQL standard and information about the Informix-compatibility module.
Right now all deferred trigger information is stored in backend memory. This could exhaust memory for very large trigger queues. This item involves dumping large queues into files.
Currently the only way to disable triggers is to modify the system tables.
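The current workaround, with a hypothetical table name:

    UPDATE pg_class SET reltriggers = 0
     WHERE relname = 'mytable';  -- disables all triggers on the table
    -- re-enable by restoring the true trigger count:
    UPDATE pg_class SET reltriggers =
        (SELECT COUNT(*) FROM pg_trigger
          WHERE pg_class.oid = pg_trigger.tgrelid)
     WHERE relname = 'mytable';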
If the dump is known to be valid, allow foreign keys to be added without revalidating the data.
This was used in older releases to dump referential integrity constraints.
System tables are modified in many places in the backend without going through the executor and therefore not causing triggers to fire. To complete this item, the functions that modify system tables will have to fire triggers.
This could allow SQL written for other databases to run without modification.
Right now contrib/dblink can be used to issue such queries except it does not have locking or transaction semantics. Two-phase commit is needed to enable transaction semantics.
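For example (connection string, query, and column list hypothetical):

    SELECT *
      FROM dblink('dbname=otherdb', 'SELECT id, name FROM customers')
           AS t(id int, name text);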
This will involve adding a way to respond to commit failure, either by taking the server into offline/read-only mode or by notifying the administrator.
Posix_fadvise() can control both sequential/random file caching and free-behind behavior, but it is unclear how the setting affects other backends that also have the file open, and the feature is not supported on all operating systems.
We could use a fixed row count and a +/- count to follow MVCC visibility rules, or a single cached value could be used and invalidated if anyone modifies the table. Another idea is to get a count directly from a unique index, but for this to be faster than a sequential scan it must avoid access to the heap to obtain tuple visibility information.
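A minimal sketch of the count-plus-delta idea, using entirely hypothetical tables:

    -- hypothetical: row_counts(relid, base_count) holds a settled count,
    -- row_count_deltas(relid, delta) holds per-transaction +/- rows
    SELECT base_count + COALESCE((SELECT SUM(delta)
                                    FROM row_count_deltas d
                                   WHERE d.relid = c.relid), 0)
      FROM row_counts c
     WHERE c.relid = 'tab'::regclass;  -- 'tab' is hypothetical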
Currently indexes do not have enough tuple visibility information to allow data to be pulled from the index without also accessing the heap. One way to allow this is to set a bit on index tuples to indicate if a tuple is currently visible to all transactions when the first valid heap lookup happens. This bit would have to be cleared when a heap tuple is expired.
Larger local buffer cache sizes require more efficient handling of local cache lookups.
Allow the background writer to more efficiently write dirty buffers from the end of the LRU cache and use a clock sweep algorithm to write other dirty buffers, to reduce checkpoint I/O.
One possible implementation is to start sequential scans from the lowest numbered buffer in the shared cache, and when reaching the end wrap around to the beginning, rather than always starting sequential scans at the start of the table.
For large table adjustments during vacuum, it is faster to reindex than to update the index.
Moved tuples are invisible to other backends so they don't require a write lock. However, the read lock promotion to write lock could lead to deadlock situations.
This allows vacuum to target specific pages for possible free space without requiring a sequential scan.
This requires that more locks be acquired but this would reduce lock contention, improving concurrency.
On SMP machines, it is possible that locks might be released shortly, while on non-SMP machines, the backend should sleep so the process holding the lock can complete and release it.
i386-based SMP machines can generate excessive context switching caused by lock failure in high concurrency situations. This may be caused by CPU cache line invalidation inefficiencies.
This would prevent the overhead associated with process creation. Most operating systems have trivial process creation time compared to database startup overhead, but a few operating systems (Win32, Solaris) might benefit from threading. Also explore the idea of a single session using multiple threads to execute a query faster.
It is unclear if this should be done inside the backend code or done by something external like pgpool. The passing of file descriptors to existing backends is one of the difficulties with a backend approach.
Currently, to protect against partial disk page writes, we write the full page images to WAL before they are modified so we can correct any partial page writes during recovery. These pages can also be eliminated from point-in-time archive files.
If fsync is off, there is no purpose in writing full pages to WAL.
Currently fsync of WAL requires the disk platter to perform a full rotation before it can fsync again. One idea is to write the WAL to different offsets, which might reduce the rotational delay.
Instead of guaranteeing recovery of all committed transactions, this would provide improved performance by delaying WAL writes and fsync so an abrupt operating system restart might lose a few seconds of committed transactions but still be consistent. We could perhaps remove the 'fsync' parameter (which results in an inconsistent database) in favor of this capability.
Right now, if no index exists, ORDER BY ... LIMIT # requires we sort all values to return the high/low value. Instead, the idea is to do a sequential scan that tracks the high/low values, thus avoiding the sort. MIN/MAX already does this, but not for LIMIT > 1.
CHECK constraints contain information about the distribution of values within the table. This is also useful for implementing subtables where a table's content is distributed across several subtables.
This would be beneficial when there are few distinct values.
Async I/O allows multiple I/O requests to be sent to the disk with results coming back asynchronously.
This would remove the requirement for SYSV SHM but would introduce portability issues. Anonymous mmap (or mmap to /dev/zero) is required to prevent I/O overhead.
Doing I/O to large tables would consume a lot of address space or require frequent mapping/unmapping. Extending the file also causes mapping problems that might require mapping only individual pages, leading to thousands of mappings. Another problem is that there is no way to _prevent_ I/O to disk from the dirty shared buffers so changes could hit disk before WAL is written.
This is probably not possible because 'gmake' and other compiler tools do not fully support quoting of paths with spaces.
This is possible if proper quoting is added to the makefiles for the install targets. Because PostgreSQL supports relocatable installs, it is already possible to install into a directory that doesn't contain spaces and then copy the install to a directory with spaces.
To fix this, the data needs to be converted to/from UTF16/UTF8 so the Win32 wcscoll() can be used, and perhaps other functions like towupper(). However, UTF8 already works with normal locales but provides no ordering or character set classes.