postgres/doc/TODO

TODO list for PostgreSQL
========================
Last updated:		Sun May 23 23:22:58 EDT 2004

Current maintainer:	Bruce Momjian (pgman@candle.pha.pa.us)

The most recent version of this document can be viewed at
the PostgreSQL web site, http://www.PostgreSQL.org.

A dash (-) marks changes that will appear in the upcoming 7.5 release.

Bracketed items "[]" have more detail.


Urgent
======

* Add replication of distributed databases
	o Automatic failover
	o Load balancing
	o Master/slave replication
	o Multi-master replication
	o Partition data across servers
	o Queries across databases or servers (two-phase commit)
	o Allow replication over unreliable or non-persistent links
* Point-in-time data recovery using backup and write-ahead log,
  http://momjian.postgresql.org/main/writings/pgsql/project/pitr.html
* Create native Win32 port, http://momjian.postgresql.org/main/writings/pgsql/project/win32.html


Administration
==============

* Incremental backups
* Remove behavior of postmaster -o after making postmaster/postgres
  flags unique
* Allow configuration files to be specified in a different directory
* Allow limits on per-db/user connections
* Add group object ownership, so groups can rename/drop/grant on objects,
  so we can implement roles
* Add the concept of dataspaces/tablespaces [tablespaces] (Gavin)
* -Allow logging of only data definition (DDL), or DDL and modification statements
* -Allow log lines to include session-level information, like database and user
* Allow server log information to be output as INSERT statements
* Prevent default re-use of sysids for dropped users and groups
* Prevent dropping user that still owns objects, or auto-drop the objects
* Allow pooled connections to query prepared queries
* Allow pooled connections to close all open WITH HOLD cursors
* Allow major upgrades without dump/reload, perhaps using pg_upgrade
* Have SHOW ALL and pg_settings show descriptions for server-side variables (Joe)
* Allow external interfaces to extend the GUC variable set
* Allow GRANT/REVOKE permissions to be given to all schema objects with one command


Data Types
==========

* Remove Money type, add money formatting for decimal type
* -Change factorial to return a numeric (Gavin)
* Change NUMERIC to enforce the maximum precision, and increase it
* Add function to return compressed length of TOAST data values (Tom)
* Allow INET subnet tests using non-constants to be indexed
* Add transaction_timestamp(), statement_timestamp(), clock_timestamp() functionality
* Have sequence dependency track use of DEFAULT sequences, seqname.nextval
* Disallow changing default expression of a SERIAL column
* Allow infinite dates just like infinite timestamps
* -Allow pg_dump to dump sequences using NO_MAXVALUE and NO_MINVALUE
* Allow backend to output result sets in XML
* -Prevent whole-row references from leaking memory, e.g. SELECT COUNT(tab.*)
* Have initdb set DateStyle based on locale?
* Add pg_get_acldef(), pg_get_typedefault(), and pg_get_attrdef()
* Add ALTER DOMAIN, AGGREGATE, CONVERSION, SEQUENCE ... OWNER TO
* Allow to_char to print localized month names (Karel)
* Allow functions to have a search path specified at creation time
* -Make LENGTH() of CHAR() not count trailing spaces
* Allow substring/replace() to get/set bit values
* Add GUC variable to allow output of interval values in ISO8601 format
* Support composite types as table columns


* ARRAYS
	o Allow nulls in arrays
	o Allow MIN()/MAX() on arrays
	o Delay resolution of array expression type so assignment coercion
	  can be performed on empty array expressions (Joe)
	o Modify array literal representation to handle array index lower bound
	  of other than one


* BINARY DATA
	o Improve vacuum of large objects, like /contrib/vacuumlo
	o Add security checking for large objects
	o Make file in/out interface for TOAST columns, similar to large object
	  interface (force out-of-line storage and no compression)
	o Auto-delete large objects when referencing row is deleted


Multi-Language Support
======================

* Add NCHAR (as distinguished from ordinary varchar)
* Allow locale to be set at database creation
* Allow locale on a per-column basis, default to ASCII
* Optimize locale to have minimal performance impact when not used (Peter E)
* Support multiple simultaneous character sets, per SQL92
* Improve Unicode combined character handling
* Add octet_length_server() and octet_length_client() (Thomas, Tatsuo)
* Make octet_length_client the same as octet_length() (?)
* Prevent a mismatch of frontend/backend encodings from causing bytea
  data to be interpreted as encoded strings
* Fix upper()/lower() to work for multibyte encodings


Views / Rules
=============

* Automatically create rules on views so they are updateable, per SQL92 [view]
* Add the functionality for WITH CHECK OPTION clause of CREATE VIEW
* Allow NOTIFY in rules involving conditionals
* Have views on temporary tables exist in the temporary namespace
* Allow temporary views on non-temporary tables
* Allow RULE recompilation


Indexes
=======

* -Order duplicate index entries on creation by tid for faster heap lookups
* Allow inherited tables to inherit index, UNIQUE constraint, and primary
  key, foreign key [inheritance]
* UNIQUE INDEX on base column not honored on inserts from inherited table
  INSERT INTO inherit_table (unique_index_col) VALUES (dup) should fail
  [inheritance]
* Add UNIQUE capability to non-btree indexes
* Add rtree index support for line, lseg, path, point
* Use indexes for min() and max() or convert to SELECT col FROM tab ORDER
  BY col DESC LIMIT 1 if appropriate index exists and WHERE clause acceptable
* Use index to restrict rows returned by multi-key index when used with
  non-consecutive keys or OR clauses, so fewer heap accesses
* Be smarter about insertion of already-ordered data into btree index
* Prevent index uniqueness checks when UPDATE does not modify the column
* Use bitmaps to fetch heap pages in sequential order [performance]
* Use bitmaps to combine existing indexes [performance]
* Allow use of indexes to search for NULLs
* -Allow SELECT * FROM tab WHERE int2col = 4 to use int2col index, int8,
  float4, numeric/decimal too
* Add FILLFACTOR to btree index creation
* Add concurrency to GIST
* Allow a single index to index multiple tables (for inheritance and subtables)
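
The two bitmap items above can be pictured with a small sketch. This is
only an illustration in Python, not PostgreSQL code: the function names
and the representation of a tuple id as a (page, offset) pair are
assumptions made for the example.

```python
# Hypothetical sketch: each index scan yields a set of tuple ids, written
# here as (heap page, offset within page). None of these names exist in
# the PostgreSQL source tree.

def and_bitmaps(tids_a, tids_b):
    """Tuple ids that satisfy both index conditions (AND)."""
    return set(tids_a) & set(tids_b)


def or_bitmaps(tids_a, tids_b):
    """Tuple ids that satisfy either index condition (OR)."""
    return set(tids_a) | set(tids_b)


def heap_fetch_order(tids):
    """Group tuple ids by heap page, with pages in ascending order.

    Sorting first means every heap page is visited once, in sequential
    order, no matter what order the index returned its entries in.
    """
    pages = {}
    for page, offset in sorted(tids):
        pages.setdefault(page, []).append(offset)
    return list(pages.items())
```

For example, intersecting the results of two single-column index scans
and passing the result to heap_fetch_order() gives one entry per heap
page, each with the offsets to read from that page.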


Commands
========

* Add BETWEEN ASYMMETRIC/SYMMETRIC (Christopher)
* Change LIMIT/OFFSET to use int8
* CREATE TABLE AS can not determine column lengths from expressions [atttypmod]
* Allow UPDATE to handle complex aggregates [update]
* Allow command blocks to ignore certain types of errors
* Allow backslash handling in quoted strings to be disabled for portability
* Allow UPDATE, DELETE to handle table aliases for self-joins [delete]
* Add CORRESPONDING BY to UNION/INTERSECT/EXCEPT
* Allow REINDEX to rebuild all indexes, remove /contrib/reindex
* Add ROLLUP, CUBE, GROUPING SETS options to GROUP BY
* Add schema option to createlang
* Allow savepoints / nested transactions [transactions] (Alvaro)
* Use nested transactions to prevent syntax errors from aborting a transaction
* Allow UPDATE tab SET ROW (col, ...) = (...) for updating multiple columns
* Allow SET CONSTRAINTS to be qualified by schema/table
* Prevent COMMENT ON DATABASE from using a database name
* -Add NO WAIT LOCKs
* Allow TRUNCATE ... CASCADE/RESTRICT
* Allow PREPARE of cursors
* Allow LISTEN/NOTIFY to store info in memory rather than tables
* -COMMENT ON [ CAST | CONVERSION | OPERATOR CLASS | LARGE OBJECT | LANGUAGE ]
  (Christopher)
* Dump large object comments in custom dump format
* Add optional textual message to NOTIFY
* -Allow more ISOLATION LEVELS to be accepted
* Allow CREATE TABLE foo (f1 INT CHECK (f1 > 0) CHECK (f1 < 10)) to work
  by searching for non-conflicting constraint names, and prefix with table name
* Use more reliable method for CREATE DATABASE to get a consistent copy of db
* -Have psql \dn show only visible temp schemas using current_schemas()
* -Have psql '\i ~/<tab><tab>' actually load files it displays from home dir
* Ignore temporary tables from other session when processing inheritance
* -Add GUC setting to make created tables default to WITHOUT OIDS
* Have pg_ctl look at PGHOST in case it is a socket directory
* Allow column-level privileges
* Add a session mode to warn about non-standard SQL usage
* Add MERGE command that does UPDATE/DELETE, or on failure, INSERT (rules, triggers?)

* ALTER
	o -ALTER TABLE ADD COLUMN does not honor DEFAULT and non-CHECK CONSTRAINT
	o -ALTER TABLE ADD COLUMN column DEFAULT should fill existing
	  rows with DEFAULT value
	o -ALTER TABLE ADD COLUMN column SERIAL doesn't create sequence because
	  of the item above
	o Have ALTER TABLE rename SERIAL sequences
	o -Allow ALTER TABLE to modify column lengths and change to binary
	  compatible types
	o Add ALTER DATABASE ... OWNER TO newowner
	o Add ALTER DOMAIN TYPE
	o Allow ALTER TABLE ... ALTER CONSTRAINT ... RENAME
	o Allow ALTER TABLE to change constraint deferrability and actions
	o Disallow dropping of an inherited constraint
	o Allow the schema of objects to be changed

* CLUSTER
	o Automatically maintain clustering on a table
	o Add ALTER TABLE table SET WITHOUT CLUSTER (Christopher)
	o Add default clustering to system tables

* COPY
	o -Allow dump/load of CSV format
	o Allow COPY to report error lines and continue;  optionally
	  allow error codes to be specified; requires savepoints or can
	  not be run in a multi-statement transaction
	o Allow COPY to understand \x as hex
	o Have COPY return number of rows loaded/unloaded

* CURSOR
	o Allow UPDATE/DELETE WHERE CURRENT OF cursor using per-cursor tid
	  stored in the backend (Gavin)
	o Prevent DROP of table being referenced by our own open cursor

* INSERT
	o Allow INSERT/UPDATE of system-generated oid value for a row
	o Allow INSERT INTO tab (col1, ..) VALUES (val1, ..), (val2, ..)
	o Allow INSERT/UPDATE ... RETURNING new.col or old.col; handle
	  RULE cases (Philip)

* SHOW/SET
	o Add SET PERFORMANCE_TIPS option to suggest INDEX, VACUUM, VACUUM
	  ANALYZE, and CLUSTER
	o Add SET PATH for schemas

* SERVER-SIDE LANGUAGES
	o Allow PL/PgSQL's RAISE function to take expressions
	o Change PL/PgSQL to use palloc() instead of malloc()
	o Allow Java server-side programming [java] (Dave)
	o Fix problems with complex temporary table creation/destruction
	  without using PL/PgSQL EXECUTE, needs cache prevention/invalidation
	o Fix PL/pgSQL RENAME to work on variables other than OLD/NEW
	o Improve PL/PgSQL exception handling
	o Allow parameters to be specified by name and type during definition
	o Allow function parameters to be passed by name,
	  get_employee_salary(emp_id => 12345, tax_year => 2001)
	o Add PL/PgSQL packages
	o Add table function support to pltcl, plperl, plpython
	o Allow PL/pgSQL to name columns by ordinal position, e.g. rec.(3)
	o Allow PL/pgSQL EXECUTE query_var INTO record_var;
	o Add capability to create and call PROCEDURES
	o Allow PL/pgSQL to handle %TYPE arrays, e.g. tab.col%TYPE[]


Clients
=======

* Add XML capability to pg_dump and COPY, when backend XML capability
  is implemented
* -Allow psql \du to show users, and add \dg for groups
* Allow clients to query a list of WITH HOLD cursors and prepared statements
* Add a libpq function to support Parse/DescribeStatement capability
* Prevent libpq's PQfnumber() from lowercasing the column name
* -Allow pg_dump to dump CREATE CONVERSION (Christopher)
* Allow libpq to return information about prepared queries
* -Make pg_restore continue after errors, so it acts more like pg_dump scripts
* Have psql show more information about sequences
* Allow pg_dumpall to use non-text output formats
* Have pg_dump use multi-statement transactions for INSERT dumps
* Move psql backslash database information into the backend, use mnemonic
  commands? [psql]


* ECPG
	o Docs
	o Implement set descriptor, using descriptor
	o Solve cardinality > 1 for input descriptors / variables
	o Improve error handling
	o Add a semantic check level, e.g. check if a table really exists
	o Fix handling of DB attributes that are arrays
	o Use backend PREPARE/EXECUTE facility for ecpg where possible
	o Implement SQLDA
	o Fix nested C comments
	o sqlwarn[6] should be 'W' if the PRECISION or SCALE value specified
	o Make SET CONNECTION thread-aware, non-standard?
	o Allow multidimensional arrays


Referential Integrity
=====================

* Add MATCH PARTIAL referential integrity
* Add deferred trigger queue file (Jan)
* Implement dirty reads or shared row locks and use them in RI triggers
* Enforce referential integrity for system tables
* Change foreign key constraint for array -> element to mean element
  in array
* Allow DEFERRABLE UNIQUE constraints
* Allow triggers to be disabled [trigger]
* With disabled triggers, allow pg_dump to use ALTER TABLE ADD FOREIGN KEY
* Allow statement-level triggers to access modified rows
* Support triggers on columns (Neil)
* Have AFTER triggers execute after the appropriate SQL statement in a
  function, not at the end of the function
* Print table names with constraint names in error messages, or make constraint
  names unique within a schema
* -Issue NOTICE if foreign key data requires costly test to match primary key
* Remove CREATE CONSTRAINT TRIGGER
* Allow AFTER triggers on system tables


Dependency Checking
===================

* Flush cached query plans when their underlying catalog data changes
* -Use dependency information to dump data in proper order
* -Have pg_dump -c clear the database using dependency information


Exotic Features
===============

* Add SQL99 WITH clause to SELECT (Tom, Fernando)
* Add SQL99 WITH RECURSIVE to SELECT (Tom, Fernando)
* Add pre-parsing phase that converts non-ANSI features to supported features
* Allow plug-in modules to emulate features from other databases
* SQL*Net listener that makes PostgreSQL appear as an Oracle database
  to clients
* Add two-phase commit to all distributed transactions with
  offline/readonly server status or administrator notification for failure
* Allow cross-db queries with transaction semantics


PERFORMANCE
===========


Fsync
=====

* Delay fsync() when other backends are about to commit too
	o Determine optimal commit_delay value
* Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options
	o Allow multiple blocks to be written to WAL with one write()


Cache
=====

* Shared catalog cache, reduce lseek()'s by caching table size in shared area
* Add free-behind capability for large sequential scans [fadvise]
* Consider use of open/fcntl(O_DIRECT) to minimize OS caching
* Cache last known per-tuple offsets to speed long tuple access, adjusting
  for NULLs and TOAST values
* Use a fixed row count and a +/- count with MVCC visibility rules
  to allow fast COUNT(*) queries with no WHERE clause(?) [count]


Vacuum
======

* Improve speed with indexes (perhaps recreate index instead)
* Reduce lock time by moving tuples with read lock, then write
  lock and truncate table
* Provide automatic running of vacuum in the background in backend
  rather than in /contrib
* Allow free space map to be auto-sized or warn when it is too small
* Maintain a map of recently-expired pages so vacuum can reclaim
  free space without a sequential scan
* Have VACUUM FULL use REINDEX rather than index vacuum


Locking
=======

* Make locking of shared data structures more fine-grained
* Add code to detect an SMP machine and handle spinlocks accordingly
  from distributed.net, http://www1.distributed.net/source,
  in client/common/cpucheck.cpp
* Research use of sched_yield() for spinlock acquisition failure


Startup Time
============

* Experiment with multi-threaded backend [thread]
* Add connection pooling [pool]
* Allow persistent backends [pool]
* Create a transaction processor to aid in persistent connections and
  connection pooling [pool]
* Do listen() in postmaster and accept() in pre-forked backend
* Have pre-forked backend pre-connect to last requested database or pass
  file descriptor to backend pre-forked for matching database


Write-Ahead Log
===============

* Have after-change WAL write()'s write only modified data to kernel
* Reduce number of after-change WAL writes; they exist only to guard against
  partial page writes [wal]
* Turn off after-change writes if fsync is disabled (?)
* Add WAL index reliability improvement to non-btree indexes
* Find proper defaults for postgresql.conf WAL entries
* Allow xlog directory location to be specified during initdb, perhaps
  using symlinks
* Allow WAL information to recover corrupted pg_controldata
* Find a way to reduce rotational delay when repeatedly writing
  last WAL page


Optimizer / Executor
====================

* Add missing optimizer selectivities for date, r-tree, etc.
* Allow ORDER BY ... LIMIT to select top values without sort or index
  using a sequential scan for highest/lowest values (Oleg)
* Precompile SQL functions to avoid overhead (Neil)
* Add utility to compute accurate random_page_cost value
* Improve ability to display optimizer analysis using OPTIMIZER_DEBUG
* Use CHECK constraints to improve optimizer decisions
* Check GUC geqo_threshold to see if it is still accurate
* Allow sorting, temp files, temp tables to use multiple work directories
* Improve the planner to use CHECK constraints to prune the plan (for subtables)
* Have EXPLAIN ANALYZE highlight poor optimizer estimates
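
The ORDER BY ... LIMIT item above can be sketched as a single pass that
keeps only the best N rows seen so far. This is an illustration in
Python under assumed names (top_n and its parameters are hypothetical),
not a description of how the executor does or will do it.

```python
import heapq


def top_n(rows, n, key=lambda row: row):
    """Return the n rows with the largest keys, in descending key order.

    Makes one sequential pass over rows and holds at most n of them in a
    min-heap, so no full sort and no index is needed.
    """
    if n <= 0:
        return []
    heap = []  # entries are (key, sequence number, row); smallest key first
    for seq, row in enumerate(rows):
        entry = (key(row), seq, row)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # New row beats the weakest of the current top n: swap it in.
            heapq.heapreplace(heap, entry)
    return [row for _, _, row in sorted(heap, reverse=True)]
```

The sequence number only breaks ties between equal keys so that the rows
themselves are never compared. Selecting the lowest values works the
same way with a negated key.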


Miscellaneous
=============

* Do async I/O for faster random read-ahead of data
* Use mmap() rather than SYSV shared memory or to write WAL files (?) [mmap]
* Improve caching of attribute offsets when NULLs exist in the row
* Add a script to ask system configuration questions and tune postgresql.conf
* Allow partitioning of table into multiple subtables
* -Use background process to write dirty shared buffers to disk
* Investigate SMP context switching issues


Source Code
===========

* Add use of 'const' for variables in source tree
* Rename some /contrib modules from pg* to pg_*
* Move some things from /contrib into main tree
* Remove warnings created by -Wcast-align
* Move platform-specific ps status display info from ps_status.c to ports
* Improve access-permissions check on data directory in Cygwin (Tom)
* Add documentation for perl, including mention of DBI/DBD perl location
* Create improved PostgreSQL introductory documentation for the PHP
  manuals
* Add optional CRC checksum to heap and index pages
* -Change representation of whole-tuple parameters to functions
* Clarify use of 'application' and 'command' tags in SGML docs
* Better document ability to build only certain interfaces (Marc)
* Remove or relicense modules that are not under the BSD license, if possible
* Remove memory/file descriptor freeing before ereport(ERROR)  (Bruce)
* Acquire lock on a relation before building a relcache entry for it
* Research interaction of setitimer() and sleep() used by statement_timeout
* -Add checks for fclose() failure (Tom)
* -Change CVS ID to PostgreSQL
* -Exit postmaster if postgresql.conf can not be opened
* Rename /scripts directory because they are all C programs now
* Allow creation of a libpq-only tarball
* Promote debug_query_string into a server-side function current_query()
* Allow the identifier length to be increased via a configure option
* Improve CREATE SCHEMA regression test
* Allow binaries to be statically linked so they are more easily relocated
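
The optional CRC checksum item above could look roughly like this. The
sketch is in Python and every detail of it is an assumption: the 8192
byte page size, the choice to store the checksum in the first four bytes,
and the function names are made up for the example and do not describe
the real heap or index page layout.

```python
import zlib

PAGE_SIZE = 8192  # assumed page size for this example
CRC_BYTES = 4     # hypothetical: checksum kept in the first 4 bytes


def set_page_crc(page: bytes) -> bytes:
    """Return a copy of page with the CRC-32 of its body stored up front."""
    body = page[CRC_BYTES:]
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return crc.to_bytes(CRC_BYTES, "big") + body


def page_crc_ok(page: bytes) -> bool:
    """Check the stored CRC against one recomputed from the page body."""
    stored = int.from_bytes(page[:CRC_BYTES], "big")
    return stored == (zlib.crc32(page[CRC_BYTES:]) & 0xFFFFFFFF)
```

A writer would call set_page_crc() just before the page goes to disk and
a reader would call page_crc_ok() after reading it back; making the
check optional keeps the cost away from installations that do not want
it.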


* Wire Protocol Changes
	o Dynamic character set handling
	o Add decoded type, length, precision
	o Compression?
	o Update clients to use data types, typmod, schema.table.column names of
	  result sets using new query protocol

---------------------------------------------------------------------------


Developers who have claimed items are:
--------------------------------------
* Alvaro is Alvaro Herrera <alvherre@dcc.uchile.cl>
* Barry is Barry Lind <barry@xythos.com>
* Billy is Billy G. Allie <Bill.Allie@mug.org>
* Bruce is Bruce Momjian <pgman@candle.pha.pa.us> of Software Research Assoc.
* Christopher is Christopher Kings-Lynne <chriskl@familyhealth.com.au> of
    Family Health Network
* D'Arcy is D'Arcy J.M. Cain <darcy@druid.net> of The Cain Gang Ltd.
* Dave is Dave Cramer <dave@fastcrypt.com>
* Edmund is Edmund Mergl <E.Mergl@bawue.de>
* Fernando is Fernando Nasser <fnasser@redhat.com> of Red Hat
* Gavin is Gavin Sherry <swm@linuxworld.com.au> of Alcove Systems Engineering
* Greg is Greg Sabino Mullane <greg@turnstep.com>
* Hiroshi is Hiroshi Inoue <Inoue@tpf.co.jp>
* Karel is Karel Zak <zakkr@zf.jcu.cz>
* Jan is Jan Wieck <JanWieck@Yahoo.com> of Afilias, Inc.
* Joe is Joe Conway <mail@joeconway.com>
* Liam is Liam Stewart <liams@redhat.com> of Red Hat
* Marc is Marc Fournier <scrappy@hub.org> of PostgreSQL, Inc.
* Mark is Mark Hollomon <mhh@mindspring.com>
* Michael is Michael Meskes <meskes@postgresql.org> of Credativ
* Neil is Neil Conway <neilc@samurai.com>
* Oleg is Oleg Bartunov <oleg@sai.msu.su>
* Peter M is Peter T Mount <peter@retep.org.uk> of Retep Software
* Peter E is Peter Eisentraut <peter_e@gmx.net>
* Philip is Philip Warner <pjw@rhyme.com.au> of Albatross Consulting Pty. Ltd.
* Rod is Rod Taylor <pg@rbt.ca>
* Ross is Ross J. Reedstrom <reedstrm@wallace.ece.rice.edu>
* Stephan is Stephan Szabo <sszabo@megazone23.bigpanda.com>
* Tatsuo is Tatsuo Ishii <t-ishii@sra.co.jp> of Software Research Assoc.
* Thomas is Thomas Lockhart <lockhart@fourpalms.org> of Jet Propulsion Laboratory
* Tom is Tom Lane <tgl@sss.pgh.pa.us> of Red Hat