TODO list for PostgreSQL ======================== Last updated: Fri Jun 18 12:03:13 EDT 2004 Current maintainer: Bruce Momjian (pgman@candle.pha.pa.us) The most recent version of this document can be viewed at the PostgreSQL web site, http://www.PostgreSQL.org. A dash (-) marks changes that will appear in the upcoming 7.5 release. Bracketed items "[]" have more detail. Urgent ====== * Add replication of distributed databases o Automatic failover o Load balancing o Master/slave replication o Multi-master replication o Partition data across servers o Queries across databases or servers (two-phase commit) o Allow replication over unreliable or non-persistent links * Point-in-time data recovery using backup and write-ahead log, http://momjian.postgresql.org/main/writings/pgsql/project/pitr.html * Create native Win32 port, http://momjian.postgresql.org/main/writings/pgsql/project/win32.html Administration ============== * Incremental backups * Remove behavior of postmaster -o after making postmaster/postgres flags unique * Allow configuration files to be specified in a different directory * Allow limits on per-db/user connections * Add group object ownership, so groups can rename/drop/grant on objects, so we can implement roles * -Add the concept of dataspaces/tablespaces (Gavin) * -Allow logging of only data definition(DDL), or DDL and modification statements * -Allow log lines to include session-level information, like database and user * Allow server log information to be output as INSERT statements * Prevent default re-use of sysids for dropped users and groups * Prevent dropping user that still owns objects, or auto-drop the objects * Allow pooled connections to query prepared queries * Allow pooled connections to close all open WITH HOLD cursors * Allow major upgrades without dump/reload, perhaps using pg_upgrade * Have SHOW ALL and pg_settings show descriptions for server-side variables(Joe) * Allow external interfaces to extend the GUC variable set * Allow GRANT/REVOKE permissions to be given to all schema objects with one command * Remove unreferenced table files created by transactions that were in-progress when the server terminated abruptly Data Types ========== * Remove Money type, add money formatting for decimal type * -Change factorial to return a numeric (Gavin) * Change NUMERIC to enforce the maximum precision, and increase it * Add function to return compressed length of TOAST data values (Tom) * Allow INET subnet tests using non-constants to be indexed * Add transaction_timestamp(), statement_timestamp(), clock_timestamp() functionality * Have sequence dependency track use of DEFAULT sequences, seqname.nextval * Disallow changing default expression of a SERIAL column * Allow infinite dates just like infinite timestamps * -Allow pg_dump to dump sequences using NO_MAXVALUE and NO_MINVALUE * Allow backend to output result sets in XML * -Prevent whole-row references from leaking memory, e.g. SELECT COUNT(tab.*) * Have initdb set DateStyle based on locale? * Add pg_get_acldef(), pg_get_typedefault(), and pg_get_attrdef() * Add ALTER DOMAIN, AGGREGATE, CONVERSION, SEQUENCE ... OWNER TO * Allow to_char to print localized month names (Karel) * Allow functions to have a search path specified at creation time * -Make LENGTH() of CHAR() not count trailing spaces * Allow substring/replace() to get/set bit values * Add GUC variable to allow output of interval values in ISO8601 format * Support composite types as table columns * ARRAYS o Allow nulls in arrays o Allow MIN()/MAX() on arrays o Delay resolution of array expression type so assignment coercion can be performed on empty array expressions (Joe) o Modify array literal representation to handle array index lower bound of other than one * BINARY DATA o Improve vacuum of large objects, like /contrib/vacuumlo o Add security checking for large objects o Make file in/out interface for TOAST columns, similar to large object interface (force out-of-line storage and no compression) o Auto-delete large objects when referencing row is deleted Multi-Language Support ====================== * Add NCHAR (as distinguished from ordinary varchar), * Allow locale to be set at database creation * Allow locale on a per-column basis, default to ASCII * Optimize locale to have minimal performance impact when not used (Peter E) * Support multiple simultaneous character sets, per SQL92 * Improve Unicode combined character handling * Add octet_length_server() and octet_length_client() (Thomas, Tatsuo) * Make octet_length_client the same as octet_length() (?) * Prevent mismatch of frontend/backend encodings from converting bytea data from being interpreted as encoded strings * Fix upper()/lower() to work for multibyte encodings Views / Rules ============= * Automatically create rules on views so they are updateable, per SQL92 [view] * Add the functionality for WITH CHECK OPTION clause of CREATE VIEW * Allow NOTIFY in rules involving conditionals * Have views on temporary tables exist in the temporary namespace * Allow temporary views on non-temporary tables * Allow RULE recompilation Indexes ======= * -Order duplicate index entries on creation by tid for faster heap lookups * Allow inherited tables to inherit index, UNIQUE constraint, and primary key, foreign key [inheritance] * UNIQUE INDEX on base column not honored on inserts from inherited table INSERT INTO inherit_table (unique_index_col) VALUES (dup) should fail [inheritance] * Add UNIQUE capability to non-btree indexes * Add rtree index support for line, lseg, path, point * Use indexes for min() and max() or convert to SELECT col FROM tab ORDER BY col DESC LIMIT 1 if appropriate index exists and WHERE clause acceptible * Use index to restrict rows returned by multi-key index when used with non-consecutive keys or OR clauses, so fewer heap accesses * Be smarter about insertion of already-ordered data into btree index * Prevent index uniqueness checks when UPDATE does not modify the column * Use bitmaps to fetch heap pages in sequential order [performance] * Use bitmaps to combine existing indexes [performance] * Allow use of indexes to search for NULLs * -Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8, float4, numeric/decimal too * Add FILLFACTOR to btree index creation * Add concurrency to GIST * Allow a single index to index multiple tables (for inheritance and subtables) * Pack hash index buckets onto disk pages more efficiently Commands ======== * Add BETWEEN ASYMMETRIC/SYMMETRIC (Christopher) * Change LIMIT/OFFSET to use int8 * CREATE TABLE AS can not determine column lengths from expressions [atttypmod] * Allow UPDATE to handle complex aggregates [update] * Allow command blocks to ignore certain types of errors * Allow backslash handling in quoted strings to be disabled for portability * Allow UPDATE, DELETE to handle table aliases for self-joins [delete] * Add CORRESPONDING BY to UNION/INTERSECT/EXCEPT * Allow REINDEX to rebuild all indexes, remove /contrib/reindex * Add ROLLUP, CUBE, GROUPING SETS options to GROUP BY * Add schema option to createlang * Allow savepoints / nested transactions [transactions] (Alvaro) * Use nested transactions to prevent syntax errors from aborting a transaction * Allow UPDATE tab SET ROW (col, ...) = (...) for updating multiple columns * Allow SET CONSTRAINTS to be qualified by schema/table * Prevent COMMENT ON DATABASE from using a database name * -Add NO WAIT LOCKs * Allow TRUNCATE ... CASCADE/RESTRICT * Allow PREPARE of cursors * Allow LISTEN/NOTIFY to store info in memory rather than tables * -COMMENT ON [ CAST | CONVERSION | OPERATOR CLASS | LARGE OBJECT | LANGUAGE ] (Christopher) * Dump large object comments in custom dump format * Add optional textual message to NOTIFY * -Allow more ISOLATION LEVELS to be accepted * Allow CREATE TABLE foo (f1 INT CHECK (f1 > 0) CHECK (f1 < 10)) to work by searching for non-conflicting constraint names, and prefix with table name * Use more reliable method for CREATE DATABASE to get a consistent copy of db * -Have psql \dn show only visible temp schemas using current_schemas() * -Have psql '\i ~/' actually load files it displays from home dir * Ignore temporary tables from other session when processing inheritance * -Add GUC setting to make created tables default to WITHOUT OIDS * Have pg_ctl look at PGHOST in case it is a socket directory * Allow column-level privileges * Add a session mode to warn about non-standard SQL usage * Add MERGE command that does UPDATE/DELETE, or on failure, INSERT (rules, triggers?) * Add ON COMMIT capability to CREATE TABLE AS SELECT * ALTER o -ALTER TABLE ADD COLUMN does not honor DEFAULT and non-CHECK CONSTRAINT o -ALTER TABLE ADD COLUMN column DEFAULT should fill existing rows with DEFAULT value o -ALTER TABLE ADD COLUMN column SERIAL doesn't create sequence because of the item above o Have ALTER TABLE rename SERIAL sequences o -Allow ALTER TABLE to modify column lengths and change to binary compatible types o Add ALTER DATABASE ... OWNER TO newowner o Add ALTER DOMAIN TYPE o Allow ALTER TABLE ... ALTER CONSTRAINT ... RENAME o Allow ALTER TABLE to change constraint deferrability and actions o Disallow dropping of an inherited constraint o Allow the schema of objects to be changed * CLUSTER o Automatically maintain clustering on a table o Add ALTER TABLE table SET WITHOUT CLUSTER (Christopher) o Add default clustering to system tables * COPY o -Allow dump/load of CSV format o Allow COPY to report error lines and continue; optionally allow error codes to be specified; requires savepoints or can not be run in a multi-statement transaction o Allow COPY to understand \x as hex o Have COPY return number of rows loaded/unloaded * CURSOR o Allow UPDATE/DELETE WHERE CURRENT OF cursor using per-cursor tid stored in the backend (Gavin) o Prevent DROP of table being referenced by our own open cursor * INSERT o Allow INSERT/UPDATE of system-generated oid value for a row o Allow INSERT INTO tab (col1, ..) VALUES (val1, ..), (val2, ..) o Allow INSERT/UPDATE ... RETURNING new.col or old.col; handle RULE cases (Philip) * SHOW/SET o Add SET PERFORMANCE_TIPS option to suggest INDEX, VACUUM, VACUUM ANALYZE, and CLUSTER o Add SET PATH for schemas * SERVER-SIDE LANGUAGES o Allow PL/PgSQL's RAISE function to take expressions o Change PL/PgSQL to use palloc() instead of malloc() o -Allow Java server-side programming o Fix problems with complex temporary table creation/destruction without using PL/PgSQL EXECUTE, needs cache prevention/invalidation o Fix PL/pgSQL RENAME to work on variables other than OLD/NEW o Improve PL/PgSQL exception handling o Allow parameters to be specified by name and type during definition o Allow function parameters to be passed by name, get_employee_salary(emp_id => 12345, tax_year => 2001) o Add PL/PgSQL packages o Add table function support to pltcl, plperl, plpython o Allow PL/pgSQL to name columns by ordinal position, e.g. rec.(3) o Allow PL/pgSQL EXECUTE query_var INTO record_var; o Add capability to create and call PROCEDURES o Allow PL/pgSQL to handle %TYPE arrays, e.g. tab.col%TYPE[] Clients ======= * Add XML capability to pg_dump and COPY, when backend XML capability * -Allow psql \du to show users, and add \dg for groups * Allow clients to query a list of WITH HOLD cursors and prepared statements * Add a libpq function to support Parse/DescribeStatement capability * Prevent libpq's PQfnumber() from lowercasing the column name * -Allow pg_dump to dump CREATE CONVERSION (Christopher) * Allow libpq to return information about prepared queries * -Make pg_restore continue after errors, so it acts more like pg_dump scripts * Have psql show more information about sequences * Allow pg_dumpall to use non-text output formats * Have pg_dump use multi-statement transactions for INSERT dumps * Move psql backslash database information into the backend, use mnemonic commands? [psql] * Allow pg_dump to use multiple -t and -n switches * ECPG o Docs o Implement set descriptor, using descriptor o Solve cardinality > 1 for input descriptors / variables o Improve error handling o Add a semantic check level, e.g. check if a table really exists o fix handling of DB attributes that are arrays o Use backend PREPARE/EXECUTE facility for ecpg where possible o Implement SQLDA o Fix nested C comments o sqlwarn[6] should be 'W' if the PRECISION or SCALE value specified o Make SET CONNECTION thread-aware, non-standard? o Allow multidimensional arrays Referential Integrity ===================== * Add MATCH PARTIAL referential integrity * Add deferred trigger queue file (Jan) * Implement dirty reads or shared row locks and use them in RI triggers * Enforce referential integrity for system tables * Change foreign key constraint for array -> element to mean element in array * Allow DEFERRABLE UNIQUE constraints * Allow triggers to be disabled [trigger] * With disabled triggers, allow pg_dump to use ALTER TABLE ADD FOREIGN KEY * Allow statement-level triggers to access modified rows * Support triggers on columns (Neil) * Have AFTER triggers execute after the appropriate SQL statement in a function, not at the end of the function * -Print table names with constraint names in error messages, or make constraint names unique within a schema * -Issue NOTICE if foreign key data requires costly test to match primary key * Remove CREATE CONSTRAINT TRIGGER * Allow AFTER triggers on system tables Dependency Checking =================== * Flush cached query plans when their underlying catalog data changes * -Use dependency information to dump data in proper order * -Have pg_dump -c clear the database using dependency information Exotic Features =============== * Add SQL99 WITH clause to SELECT (Tom, Fernando) * Add SQL99 WITH RECURSIVE to SELECT (Tom, Fernando) * Add pre-parsing phase that converts non-ANSI features to supported features * Allow plug-in modules to emulate features from other databases * SQL*Net listener that makes PostgreSQL appear as an Oracle database to clients * Add two-phase commit to all distributed transactions with offline/readonly server status or administrator notification for failure * Allow cross-db queries with transaction semantics PERFORMANCE =========== Fsync ===== * Delay fsync() when other backends are about to commit too o Determine optimal commit_delay value * Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options o Allow multiple blocks to be written to WAL with one write() Cache ===== * Shared catalog cache, reduce lseek()'s by caching table size in shared area * Add free-behind capability for large sequential scans [fadvise] * Consider use of open/fcntl(O_DIRECT) to minimize OS caching * Cache last known per-tuple offsets to speed long tuple access, adjusting for NULLs and TOAST values * Use a fixed row count and a +/- count with MVCC visibility rules to allow fast COUNT(*) queries with no WHERE clause(?) [count] Vacuum ====== * Improve speed with indexes (perhaps recreate index instead) * Reduce lock time by moving tuples with read lock, then write lock and truncate table * Provide automatic running of vacuum in the background in backend rather than in /contrib * Allow free space map to be auto-sized or warn when it is too small * Maintain a map of recently-expired of pages so vacuum can reclaim free space without a sequential scan * Have VACUUM FULL use REINDEX rather than index vacuum Locking ======= * Make locking of shared data structures more fine-grained * Add code to detect an SMP machine and handle spinlocks accordingly from distributted.net, http://www1.distributed.net/source, in client/common/cpucheck.cpp * Research use of sched_yield() for spinlock acquisition failure Startup Time ============ * Experiment with multi-threaded backend [thread] * Add connection pooling [pool] * Allow persistent backends [pool] * Create a transaction processor to aid in persistent connections and connection pooling [pool] * Do listen() in postmaster and accept() in pre-forked backend * Have pre-forked backend pre-connect to last requested database or pass file descriptor to backend pre-forked for matching database Write-Ahead Log =============== * Have after-change WAL write()'s write only modified data to kernel * Reduce number of after-change WAL writes; they exist only to gaurd against partial page writes [wal] * Turn off after-change writes if fsync is disabled (?) * Add WAL index reliability improvement to non-btree indexes * Find proper defaults for postgresql.conf WAL entries * Allow xlog directory location to be specified during initdb, perhaps using symlinks * Allow WAL information to recover corrupted pg_controldata * Find a way to reduce rotational delay when repeatedly writing last WAL page Optimizer / Executor ==================== * Missing optimizer selectivities for date, r-tree, etc * Allow ORDER BY ... LIMIT to select top values without sort or index using a sequential scan for highest/lowest values (Oleg) * Precompile SQL functions to avoid overhead (Neil) * Add utility to compute accurate random_page_cost value * Improve ability to display optimizer analysis using OPTIMIZER_DEBUG * Use CHECK constraints to improve optimizer decisions * Check GUC geqo_threshold to see if it is still accurate * Allow sorting, temp files, temp tables to use multiple work directories * Improve the planner to use CHECK constraints to prune the plan (for subtables) * Have EXPLAIN ANALYZE highlight poor optimizer estimates Miscellaneous ============= * Do async I/O for faster random read-ahead of data * Use mmap() rather than SYSV shared memory or to write WAL files (?) [mmap] * Improve caching of attribute offsets when NULLs exist in the row * Add a script to ask system configuration questions and tune postgresql.conf * Allow partitioning of table into multiple subtables * -Use background process to write dirty shared buffers to disk * Investigate SMP context switching issues Source Code =========== * Add use of 'const' for variables in source tree * Rename some /contrib modules from pg* to pg_* * Move some things from /contrib into main tree * Remove warnings created by -Wcast-align * Move platform-specific ps status display info from ps_status.c to ports * Improve access-permissions check on data directory in Cygwin (Tom) * Add documentation for perl, including mention of DBI/DBD perl location * Create improved PostgreSQL introductory documentation for the PHP manuals * Add optional CRC checksum to heap and index pages * -Change representation of whole-tuple parameters to functions * Clarify use of 'application' and 'command' tags in SGML docs * Better document ability to build only certain interfaces (Marc) * Remove or relicense modules that are not under the BSD license, if possible * Remove memory/file descriptor freeing before ereport(ERROR) (Bruce) * Acquire lock on a relation before building a relcache entry for it * Research interaction of setitimer() and sleep() used by statement_timeout * -Add checks for fclose() failure (Tom) * -Change CVS ID to PostgreSQL * -Exit postmaster if postgresql.conf can not be opened * Rename /scripts directory because they are all C programs now * Allow creation of a libpq-only tarball * Promote debug_query_string into a server-side function current_query() * Allow the identifier length to be increased via a configure option * Improve CREATE SCHEMA regression test * Allow binaries to be statically linked so they are more easily relocated * Wire Protocol Changes o Dynamic character set handling o Add decoded type, length, precision o Compression? o Update clients to use data types, typmod, schema.table.column names of result sets using new query protocol --------------------------------------------------------------------------- Developers who have claimed items are: -------------------------------------- * Alvaro is Alvaro Herrera * Barry is Barry Lind * Billy is Billy G. Allie * Bruce is Bruce Momjian of Software Research Assoc. * Christopher is Christopher Kings-Lynne of Family Health Network * D'Arcy is D'Arcy J.M. Cain of The Cain Gang Ltd. * Dave is Dave Cramer * Edmund is Edmund Mergl * Fernando is Fernando Nasser of Red Hat * Gavin is Gavin Sherry of Alcove Systems Engineering * Greg is Greg Sabino Mullane * Hiroshi is Hiroshi Inoue * Karel is Karel Zak * Jan is Jan Wieck of Afilias, Inc. * Joe is Joe Conway * Liam is Liam Stewart of Red Hat * Marc is Marc Fournier of PostgreSQL, Inc. * Mark is Mark Hollomon * Michael is Michael Meskes of Credativ * Neil is Neil Conway * Oleg is Oleg Bartunov * Peter M is Peter T Mount of Retep Software * Peter E is Peter Eisentraut * Philip is Philip Warner of Albatross Consulting Pty. Ltd. * Rod is Rod Taylor * Ross is Ross J. Reedstrom * Stephan is Stephan Szabo * Tatsuo is Tatsuo Ishii of Software Research Assoc. * Thomas is Thomas Lockhart of Jet Propulsion Labratory * Tom is Tom Lane of Red Hat