Update FAQ_DEV.
parent 198152730b
commit e0de8d9821
639 doc/FAQ_DEV
@@ -9,31 +9,146 @@
|
|||||||
PostgreSQL Web site, http://www.PostgreSQL.org.
|
PostgreSQL Web site, http://www.PostgreSQL.org.
|
||||||
_________________________________________________________________
|
_________________________________________________________________
|
||||||
|
|
||||||
Questions
|
General Questions
|
||||||
|
|
||||||
1) What tools are available for developers?
|
1.1) How do I get involved in PostgreSQL development?
|
||||||
2) What books are good for developers?
|
1.2) How do I add a feature or fix a bug?
|
||||||
3) Why do we use palloc() and pfree() to allocate memory?
|
1.3) How do I download/update the current source tree?
|
||||||
4) Why do we use Node and List to make data structures?
|
1.4) How do I test my changes?
|
||||||
5) How do I add a feature or fix a bug?
|
1.5) What tools are available for developers?
|
||||||
6) How do I download/update the current source tree?
|
1.6) What books are good for developers?
|
||||||
7) How do I test my changes?
|
1.7) What is configure all about?
|
||||||
7) I just added a field to a structure. What else should I do?
|
1.8) How do I add a new port?
|
||||||
8) Why are table, column, type, function, view names sometimes
|
1.9) Why don't we use threads in the backend?
|
||||||
|
1.10) How are RPM's packaged?
|
||||||
|
1.11) How are CVS branches handled?
|
||||||
|
|
||||||
|
Technical Questions
|
||||||
|
|
||||||
|
2.1) How do I efficiently access information in tables from the
|
||||||
|
backend code?
|
||||||
|
2.2) Why are table, column, type, function, view names sometimes
|
||||||
referenced as Name or NameData, and sometimes as char *?
|
referenced as Name or NameData, and sometimes as char *?
|
||||||
9) How do I efficiently access information in tables from the backend
|
2.3) Why do we use Node and List to make data structures?
|
||||||
code?
|
2.4) I just added a field to a structure. What else should I do?
|
||||||
10) What is elog()?
|
2.5) Why do we use palloc() and pfree() to allocate memory?
|
||||||
11) What is configure all about?
|
2.6) What is elog()?
|
||||||
12) How do I add a new port?
|
2.7) What is CommandCounterIncrement()?
|
||||||
13) What is CommandCounterIncrement()?
|
|
||||||
14) Why don't we use threads in the backend?
|
|
||||||
15) How are RPM's packaged?
|
|
||||||
16) How are CVS branches handled?
|
|
||||||
17) How do I get involved in PostgreSQL development?
|
|
||||||
_________________________________________________________________
|
_________________________________________________________________
|
||||||
|
|
||||||
1) What tools are available for developers?
|
General Questions
|
||||||
|
|
||||||
|
1.1) How do I get involved in PostgreSQL development?
|
||||||
|
|
||||||
|
This was written by Lamar Owen:
|
||||||
|
|
||||||
|
2001-06-22
|
||||||
|
What open source development process is used by the PostgreSQL team?
|
||||||
|
|
||||||
|
Read HACKERS for six months (or a full release cycle, whichever is
|
||||||
|
longer). Really. HACKERS _is_ the process. The process is not well
|
||||||
|
documented (AFAIK -- it may be somewhere that I am not aware of) --
|
||||||
|
and it changes continually.
|
||||||
|
What development environment (OS, system, compilers, etc) is required
|
||||||
|
to develop code?
|
||||||
|
|
||||||
|
Developers Corner on the website has links to this information. The
|
||||||
|
distribution tarball itself includes all the extra tools and documents
|
||||||
|
that go beyond a good Unix-like development environment. In general, a
|
||||||
|
modern unix with a modern gcc, GNU make or equivalent, autoconf (of a
|
||||||
|
particular version), and good working knowledge of those tools are
|
||||||
|
required.
|
||||||
|
What areas need support?
|
||||||
|
|
||||||
|
The TODO list.
|
||||||
|
|
||||||
|
You've made the first step, by finding and subscribing to HACKERS.
|
||||||
|
Once you find an area to look at in the TODO, and have read the
|
||||||
|
documentation on the internals, etc, then you check out a current
|
||||||
|
CVS, write what you are going to write (keeping your CVS checkout up to
|
||||||
|
date in the process), and make up a patch (as a context diff only) and
|
||||||
|
send to the PATCHES list, preferably.
|
||||||
|
|
||||||
|
Discussion on the patch typically happens here. If the patch adds a
|
||||||
|
major feature, it would be a good idea to talk about it first on the
|
||||||
|
HACKERS list, in order to increase the chances of it being accepted,
|
||||||
|
as well as to avoid duplication of effort. Note that experienced
|
||||||
|
developers with a proven track record usually get the big jobs -- for
|
||||||
|
more than one reason. Also note that PostgreSQL is highly portable --
|
||||||
|
nonportable code will likely be dismissed out of hand.
|
||||||
|
|
||||||
|
Once your contributions get accepted, things move from there.
|
||||||
|
Typically, you would be added as a developer on the list on the
|
||||||
|
website when one of the other developers recommends it. Membership on
|
||||||
|
the steering committee is by invitation only, by the other steering
|
||||||
|
committee members, from what I have gathered watching from a distance.
|
||||||
|
|
||||||
|
I make these statements from having watched the process for over two
|
||||||
|
years.
|
||||||
|
|
||||||
|
To see a good example of how one goes about this, search the archives
|
||||||
|
for the name 'Tom Lane' and see what his first post consisted of, and
|
||||||
|
where he took things. In particular, note that this hasn't been _that_
|
||||||
|
long ago -- and his bugfixing and general deep knowledge with this
|
||||||
|
codebase is legendary. Take a few days to read after him. And pay
|
||||||
|
special attention to both the sheer quantity as well as the
|
||||||
|
painstaking quality of his work. Both are in high demand.
|
||||||
|
|
||||||
|
1.2) How do I add a feature or fix a bug?
|
||||||
|
|
||||||
|
The source code is over 250,000 lines. Many problems/features are
|
||||||
|
isolated to one specific area of the code. Others require knowledge of
|
||||||
|
much of the source. If you are confused about where to start, ask the
|
||||||
|
hackers list, and they will be glad to assess the complexity and give
|
||||||
|
pointers on where to start.
|
||||||
|
|
||||||
|
Another thing to keep in mind is that many fixes and features can be
|
||||||
|
added with surprisingly little code. I often start by adding code,
|
||||||
|
then looking at other areas in the code where similar things are done,
|
||||||
|
and by the time I am finished, the patch is quite small and compact.
|
||||||
|
|
||||||
|
When adding code, keep in mind that it should use the existing
|
||||||
|
facilities in the source, for performance reasons and for simplicity.
|
||||||
|
Often a review of existing code doing similar things is helpful.
|
||||||
|
|
||||||
|
1.3) How do I download/update the current source tree?
|
||||||
|
|
||||||
|
There are several ways to obtain the source tree. Occasional
|
||||||
|
developers can just get the most recent source tree snapshot from
|
||||||
|
ftp.postgresql.org. For regular developers, you can use CVS. CVS
|
||||||
|
allows you to download the source tree, then occasionally update your
|
||||||
|
copy of the source tree with any new changes. Using CVS, you don't
|
||||||
|
have to download the entire source each time, only the changed files.
|
||||||
|
Anonymous CVS does not allow developers to update the remote source
|
||||||
|
tree, though privileged developers can do this. There is a CVS FAQ on
|
||||||
|
our web site that describes how to use remote CVS. You can also use
|
||||||
|
CVSup, which has similar functionality, and is available from
|
||||||
|
ftp.postgresql.org.
|
||||||
|
|
||||||
|
To update the source tree, there are two ways. You can generate a
|
||||||
|
patch against your current source tree, perhaps using the make_diff
|
||||||
|
tools mentioned above, and send them to the patches list. They will be
|
||||||
|
reviewed, and applied in a timely manner. If the patch is major, and
|
||||||
|
we are in beta testing, the developers may wait for the final release
|
||||||
|
before applying your patches.
|
||||||
|
|
||||||
|
For hard-core developers, Marc(scrappy@postgresql.org) will give you a
|
||||||
|
Unix shell account on postgresql.org, so you can use CVS to update the
|
||||||
|
main source tree, or you can ftp your files into your account, patch,
|
||||||
|
and cvs install the changes directly into the source tree.
|
||||||
|
|
||||||
|
1.4) How do I test my changes?
|
||||||
|
|
||||||
|
First, use psql to make sure it is working as you expect. Then run
|
||||||
|
src/test/regress and get the output of src/test/regress/checkresults
|
||||||
|
with and without your changes, to see that your patch does not change
|
||||||
|
the regression test in unexpected ways. This practice has saved me
|
||||||
|
many times. The regression tests test the code in ways I would never
|
||||||
|
do, and have caught many bugs in my patches. By finding the problems
|
||||||
|
now, you save yourself a lot of debugging later when things are
|
||||||
|
broken, and you can't figure out when it happened.
|
||||||
|
|
||||||
|
1.5) What tools are available for developers?
|
||||||
|
|
||||||
Aside from the User documentation mentioned in the regular FAQ, there
|
Aside from the User documentation mentioned in the regular FAQ, there
|
||||||
are several development tools available. First, all the files in the
|
are several development tools available. First, all the files in the
|
||||||
@@ -141,7 +256,7 @@
|
|||||||
is also a script called unused_oids in pgsql/src/include/catalog that
|
is also a script called unused_oids in pgsql/src/include/catalog that
|
||||||
shows the unused oids.
|
shows the unused oids.
|
||||||
|
|
||||||
2) What books are good for developers?
|
1.6) What books are good for developers?
|
||||||
|
|
||||||
I have four good books, An Introduction to Database Systems, by C.J.
|
I have four good books, An Introduction to Database Systems, by C.J.
|
||||||
Date, Addison-Wesley, A Guide to the SQL Standard, by C.J. Date, et.
|
Date, Addison-Wesley, A Guide to the SQL Standard, by C.J. Date, et.
|
||||||
@@ -151,239 +266,7 @@
|
|||||||
There is also a database performance site, with a handbook on-line
|
There is also a database performance site, with a handbook on-line
|
||||||
written by Jim Gray at http://www.benchmarkresources.com.
|
written by Jim Gray at http://www.benchmarkresources.com.
|
||||||
|
|
||||||
3) Why do we use palloc() and pfree() to allocate memory?
|
1.7) What is configure all about?
|
||||||
|
|
||||||
palloc() and pfree() are used in place of malloc() and free() because
|
|
||||||
we automatically free all memory allocated when a transaction
|
|
||||||
completes. This makes it easier to make sure we free memory that gets
|
|
||||||
allocated in one place, but only freed much later. There are several
|
|
||||||
contexts that memory can be allocated in, and this controls when the
|
|
||||||
allocated memory is automatically freed by the backend.
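The idea of freeing everything allocated in a context at once can be sketched in a few lines of standalone C. This is only a toy model under stated assumptions: ToyContext, toy_alloc() and toy_reset() are hypothetical names invented for illustration, not the backend's palloc() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: every name here is hypothetical, not the backend's API. */
typedef union ToyChunk
{
    union ToyChunk *next;       /* next chunk owned by the same context */
    max_align_t     align;      /* keeps the payload suitably aligned */
} ToyChunk;

typedef struct ToyContext
{
    ToyChunk   *chunks;         /* every live allocation in this context */
    size_t      nchunks;        /* number of live allocations */
} ToyContext;

/* Allocate size bytes and remember the chunk in ctx. */
static void *
toy_alloc(ToyContext *ctx, size_t size)
{
    ToyChunk   *chunk;

    assert(ctx != NULL);
    chunk = malloc(sizeof(ToyChunk) + size);
    if (chunk == NULL)
        abort();                /* a real allocator would report an error */
    chunk->next = ctx->chunks;
    ctx->chunks = chunk;
    ctx->nchunks++;
    return chunk + 1;           /* payload starts right after the header */
}

/* Free everything allocated in ctx, e.g. when a transaction completes. */
static void
toy_reset(ToyContext *ctx)
{
    assert(ctx != NULL);
    while (ctx->chunks != NULL)
    {
        ToyChunk   *next = ctx->chunks->next;

        free(ctx->chunks);
        ctx->chunks = next;
    }
    ctx->nchunks = 0;
}
```

A caller would allocate freely with toy_alloc() and call toy_reset() once at the end, rather than pairing every allocation with its own free().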
|
|
||||||
|
|
||||||
4) Why do we use Node and List to make data structures?
|
|
||||||
|
|
||||||
We do this because this allows a consistent way to pass data inside
|
|
||||||
the backend in a flexible way. Every node has a NodeTag which
|
|
||||||
specifies what type of data is inside the Node. Lists are groups of
|
|
||||||
Nodes chained together as a forward-linked list.
|
|
||||||
|
|
||||||
Here are some of the List manipulation commands:
|
|
||||||
|
|
||||||
lfirst(i)
|
|
||||||
return the data at list element i.
|
|
||||||
|
|
||||||
lnext(i)
|
|
||||||
return the next list element after i.
|
|
||||||
|
|
||||||
foreach(i, list)
|
|
||||||
loop through list, assigning each list element to i. It is
|
|
||||||
important to note that i is a List *, not the data in the List
|
|
||||||
element. You need to use lfirst(i) to get at the data. Here is
|
|
||||||
a typical code snippet that loops through a List containing Var
|
|
||||||
*'s and processes each one:
|
|
||||||
|
|
||||||
List *i, *list;
|
|
||||||
|
|
||||||
foreach(i, list)
|
|
||||||
{
|
|
||||||
Var *var = lfirst(i);
|
|
||||||
|
|
||||||
/* process var here */
|
|
||||||
}
|
|
||||||
|
|
||||||
lcons(node, list)
|
|
||||||
add node to the front of list, or create a new list with node
|
|
||||||
if list is NIL.
|
|
||||||
|
|
||||||
lappend(list, node)
|
|
||||||
add node to the end of list. This is more expensive than lcons.
|
|
||||||
|
|
||||||
nconc(list1, list2)
|
|
||||||
Concat list2 on to the end of list1.
|
|
||||||
|
|
||||||
length(list)
|
|
||||||
return the length of the list.
|
|
||||||
|
|
||||||
nth(i, list)
|
|
||||||
return the i'th element in list.
|
|
||||||
|
|
||||||
lconsi, ...
|
|
||||||
There are integer versions of these: lconsi, lappendi, nthi.
|
|
||||||
Lists containing integers instead of Node pointers are used to
|
|
||||||
hold list of relation object id's and other integer quantities.
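To make the list operations above concrete, here is a minimal standalone sketch of a forward-linked list with the same shape. It is an assumption-laden toy: the toy_ prefixed names and the ToyList layout are hypothetical and only mirror the behaviour described in the text, not the backend's actual List code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy cons-cell list; names are hypothetical, modelled on the text above. */
typedef struct ToyList
{
    void           *elem;       /* data stored in this cell */
    struct ToyList *next;       /* next cell, or NULL at the end */
} ToyList;

#define TOY_NIL ((ToyList *) NULL)
#define toy_lfirst(cell) ((cell)->elem)
#define toy_lnext(cell)  ((cell)->next)
#define toy_foreach(cell, list) \
    for ((cell) = (list); (cell) != TOY_NIL; (cell) = toy_lnext(cell))

/* Add elem to the front of list; constant time. */
static ToyList *
toy_lcons(void *elem, ToyList *list)
{
    ToyList    *cell = malloc(sizeof(ToyList));

    if (cell == NULL)
        abort();
    cell->elem = elem;
    cell->next = list;
    return cell;
}

/* Add elem to the end; walks the whole list, so it costs more than toy_lcons. */
static ToyList *
toy_lappend(ToyList *list, void *elem)
{
    ToyList    *cell = toy_lcons(elem, TOY_NIL);
    ToyList    *tail;

    if (list == TOY_NIL)
        return cell;
    for (tail = list; tail->next != TOY_NIL; tail = tail->next)
        ;
    tail->next = cell;
    return list;
}

/* Count the cells in list. */
static int
toy_length(ToyList *list)
{
    ToyList    *cell;
    int         n = 0;

    toy_foreach(cell, list)
        n++;
    return n;
}

/* Return the data in the n'th cell, counting from zero. */
static void *
toy_nth(int n, ToyList *list)
{
    ToyList    *cell = list;

    while (n-- > 0)
    {
        assert(cell != TOY_NIL);
        cell = toy_lnext(cell);
    }
    assert(cell != TOY_NIL);
    return toy_lfirst(cell);
}
```

Note that the loop variable of toy_foreach is a cell pointer, not the stored data, which is the same distinction the text draws for foreach and lfirst.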
|
|
||||||
|
|
||||||
You can print nodes easily inside gdb. First, to disable output
|
|
||||||
truncation when you use the gdb print command:
|
|
||||||
(gdb) set print elements 0
|
|
||||||
|
|
||||||
Instead of printing values in gdb format, you can use the next two
|
|
||||||
commands to print out List, Node, and structure contents in a verbose
|
|
||||||
format that is easier to understand. Lists are unrolled into nodes,
|
|
||||||
and nodes are printed in detail. The first prints in a short format,
|
|
||||||
and the second in a long format:
|
|
||||||
(gdb) call print(any_pointer)
|
|
||||||
(gdb) call pprint(any_pointer)
|
|
||||||
|
|
||||||
The output appears in the postmaster log file, or on your screen if
|
|
||||||
you are running a backend directly without a postmaster.
|
|
||||||
|
|
||||||
5) How do I add a feature or fix a bug?
|
|
||||||
|
|
||||||
The source code is over 250,000 lines. Many problems/features are
|
|
||||||
isolated to one specific area of the code. Others require knowledge of
|
|
||||||
much of the source. If you are confused about where to start, ask the
|
|
||||||
hackers list, and they will be glad to assess the complexity and give
|
|
||||||
pointers on where to start.
|
|
||||||
|
|
||||||
Another thing to keep in mind is that many fixes and features can be
|
|
||||||
added with surprisingly little code. I often start by adding code,
|
|
||||||
then looking at other areas in the code where similar things are done,
|
|
||||||
and by the time I am finished, the patch is quite small and compact.
|
|
||||||
|
|
||||||
When adding code, keep in mind that it should use the existing
|
|
||||||
facilities in the source, for performance reasons and for simplicity.
|
|
||||||
Often a review of existing code doing similar things is helpful.
|
|
||||||
|
|
||||||
6) How do I download/update the current source tree?
|
|
||||||
|
|
||||||
There are several ways to obtain the source tree. Occasional
|
|
||||||
developers can just get the most recent source tree snapshot from
|
|
||||||
ftp.postgresql.org. For regular developers, you can use CVS. CVS
|
|
||||||
allows you to download the source tree, then occasionally update your
|
|
||||||
copy of the source tree with any new changes. Using CVS, you don't
|
|
||||||
have to download the entire source each time, only the changed files.
|
|
||||||
Anonymous CVS does not allow developers to update the remote source
|
|
||||||
tree, though privileged developers can do this. There is a CVS FAQ on
|
|
||||||
our web site that describes how to use remote CVS. You can also use
|
|
||||||
CVSup, which has similar functionality, and is available from
|
|
||||||
ftp.postgresql.org.
|
|
||||||
|
|
||||||
To update the source tree, there are two ways. You can generate a
|
|
||||||
patch against your current source tree, perhaps using the make_diff
|
|
||||||
tools mentioned above, and send them to the patches list. They will be
|
|
||||||
reviewed, and applied in a timely manner. If the patch is major, and
|
|
||||||
we are in beta testing, the developers may wait for the final release
|
|
||||||
before applying your patches.
|
|
||||||
|
|
||||||
For hard-core developers, Marc(scrappy@postgresql.org) will give you a
|
|
||||||
Unix shell account on postgresql.org, so you can use CVS to update the
|
|
||||||
main source tree, or you can ftp your files into your account, patch,
|
|
||||||
and cvs install the changes directly into the source tree.
|
|
||||||
|
|
||||||
6) How do I test my changes?
|
|
||||||
|
|
||||||
First, use psql to make sure it is working as you expect. Then run
|
|
||||||
src/test/regress and get the output of src/test/regress/checkresults
|
|
||||||
with and without your changes, to see that your patch does not change
|
|
||||||
the regression test in unexpected ways. This practice has saved me
|
|
||||||
many times. The regression tests test the code in ways I would never
|
|
||||||
do, and have caught many bugs in my patches. By finding the problems
|
|
||||||
now, you save yourself a lot of debugging later when things are
|
|
||||||
broken, and you can't figure out when it happened.
|
|
||||||
|
|
||||||
7) I just added a field to a structure. What else should I do?
|
|
||||||
|
|
||||||
The structures passing around from the parser, rewrite, optimizer, and
|
|
||||||
executor require quite a bit of support. Most structures have support
|
|
||||||
routines in src/backend/nodes used to create, copy, read, and output
|
|
||||||
those structures. Make sure you add support for your new field to
|
|
||||||
these files. Find any other places the structure may need code for
|
|
||||||
your new field. mkid is helpful with this (see above).
|
|
||||||
|
|
||||||
8) Why are table, column, type, function, view names sometimes referenced as
|
|
||||||
Name or NameData, and sometimes as char *?
|
|
||||||
|
|
||||||
Table, column, type, function, and view names are stored in system
|
|
||||||
tables in columns of type Name. Name is a fixed-length,
|
|
||||||
null-terminated type of NAMEDATALEN bytes. (The default value for
|
|
||||||
NAMEDATALEN is 32 bytes.)
|
|
||||||
typedef struct nameData
|
|
||||||
{
|
|
||||||
char data[NAMEDATALEN];
|
|
||||||
} NameData;
|
|
||||||
typedef NameData *Name;
|
|
||||||
|
|
||||||
Table, column, type, function, and view names that come into the
|
|
||||||
backend via user queries are stored as variable-length,
|
|
||||||
null-terminated character strings.
|
|
||||||
|
|
||||||
Many functions are called with both types of names, e.g. heap_open().
|
|
||||||
Because the Name type is null-terminated, it is safe to pass it to a
|
|
||||||
function expecting a char *. Because there are many cases where
|
|
||||||
on-disk names(Name) are compared to user-supplied names(char *), there
|
|
||||||
are many cases where Name and char * are used interchangeably.
|
|
||||||
|
|
||||||
9) How do I efficiently access information in tables from the backend code?
|
|
||||||
|
|
||||||
You first need to find the tuples(rows) you are interested in. There
|
|
||||||
are two ways. First, SearchSysCache() and related functions allow you
|
|
||||||
to query the system catalogs. This is the preferred way to access
|
|
||||||
system tables, because the first call to the cache loads the needed
|
|
||||||
rows, and future requests can return the results without accessing the
|
|
||||||
base table. The caches use system table indexes to look up tuples. A
|
|
||||||
list of available caches is located in
|
|
||||||
src/backend/utils/cache/syscache.c.
|
|
||||||
src/backend/utils/cache/lsyscache.c contains many column-specific
|
|
||||||
cache lookup functions.
|
|
||||||
|
|
||||||
The rows returned are cache-owned versions of the heap rows.
|
|
||||||
Therefore, you must not modify or delete the tuple returned by
|
|
||||||
SearchSysCache(). What you should do is release it with
|
|
||||||
ReleaseSysCache() when you are done using it; this informs the cache
|
|
||||||
that it can discard that tuple if necessary. If you neglect to call
|
|
||||||
ReleaseSysCache(), then the cache entry will remain locked in the
|
|
||||||
cache until end of transaction, which is tolerable but not very
|
|
||||||
desirable.
|
|
||||||
|
|
||||||
If you can't use the system cache, you will need to retrieve the data
|
|
||||||
directly from the heap table, using the buffer cache that is shared by
|
|
||||||
all backends. The backend automatically takes care of loading the rows
|
|
||||||
into the buffer cache.
|
|
||||||
|
|
||||||
Open the table with heap_open(). You can then start a table scan with
|
|
||||||
heap_beginscan(), then use heap_getnext() and continue as long as
|
|
||||||
HeapTupleIsValid() returns true. Then do a heap_endscan(). Keys can be
|
|
||||||
assigned to the scan. No indexes are used, so all rows are going to be
|
|
||||||
compared to the keys, and only the valid rows returned.
|
|
||||||
|
|
||||||
You can also use heap_fetch() to fetch rows by block number/offset.
|
|
||||||
While scans automatically lock/unlock rows from the buffer cache, with
|
|
||||||
heap_fetch(), you must pass a Buffer pointer, and ReleaseBuffer() it
|
|
||||||
when completed.
|
|
||||||
|
|
||||||
Once you have the row, you can get data that is common to all tuples,
|
|
||||||
like t_self and t_oid, by merely accessing the HeapTuple structure
|
|
||||||
entries. If you need a table-specific column, you should take the
|
|
||||||
HeapTuple pointer, and use the GETSTRUCT() macro to access the
|
|
||||||
table-specific start of the tuple. You then cast the pointer as a
|
|
||||||
Form_pg_proc pointer if you are accessing the pg_proc table, or
|
|
||||||
Form_pg_type if you are accessing pg_type. You can then access the
|
|
||||||
columns by using a structure pointer:
|
|
||||||
((Form_pg_class) GETSTRUCT(tuple))->relnatts
|
|
||||||
|
|
||||||
You must not directly change live tuples in this way. The best way is
|
|
||||||
to use heap_modifytuple() and pass it your original tuple, and the
|
|
||||||
values you want changed. It returns a palloc'ed tuple, which you pass
|
|
||||||
to heap_replace(). You can delete tuples by passing the tuple's t_self
|
|
||||||
to heap_destroy(). You use t_self for heap_update() too. Remember,
|
|
||||||
tuples can be either system cache copies, which may go away after you
|
|
||||||
call ReleaseSysCache(), or read directly from disk buffers, which go
|
|
||||||
away when you heap_getnext(), heap_endscan, or ReleaseBuffer(), in the
|
|
||||||
heap_fetch() case. Or it may be a palloc'ed tuple, that you must
|
|
||||||
pfree() when finished.
|
|
||||||
|
|
||||||
10) What is elog()?
|
|
||||||
|
|
||||||
elog() is used to send messages to the front-end, and optionally
|
|
||||||
terminate the current query being processed. The first parameter is an
|
|
||||||
elog level of NOTICE, DEBUG, ERROR, or FATAL. NOTICE prints on the
|
|
||||||
user's terminal and the postmaster logs. DEBUG prints only in the
|
|
||||||
postmaster logs. ERROR prints in both places, and terminates the
|
|
||||||
current query, never returning from the call. FATAL terminates the
|
|
||||||
backend process. The remaining parameters of elog are a printf-style
|
|
||||||
set of parameters to print.
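The calling convention described above (a level followed by printf-style arguments) can be illustrated with a small standalone sketch. Everything here is hypothetical: toy_elog() and the TOY_ levels are invented names, and unlike the real elog(), which never returns from an ERROR, this toy simply returns a flag telling the caller to abandon the current query.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical levels, ordered so that TOY_ERROR and above stop the query. */
typedef enum ToyLevel
{
    TOY_DEBUG,
    TOY_NOTICE,
    TOY_ERROR,
    TOY_FATAL
} ToyLevel;

static const char *const toy_level_names[] = {
    "DEBUG", "NOTICE", "ERROR", "FATAL"
};

/* Last formatted message, kept so that callers can inspect it. */
static char toy_last_message[256];

/*
 * Format a message printf-style and write it to stderr.
 * Returns 1 if the caller should abandon the current query, else 0.
 */
static int
toy_elog(ToyLevel level, const char *fmt, ...)
{
    va_list     args;

    assert(fmt != NULL);
    va_start(args, fmt);
    vsnprintf(toy_last_message, sizeof(toy_last_message), fmt, args);
    va_end(args);
    fprintf(stderr, "%s: %s\n", toy_level_names[level], toy_last_message);
    return level >= TOY_ERROR;
}
```

Returning a flag keeps the sketch self-contained; a closer model of "never returning from the call" would need a non-local jump, which is left out here.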
|
|
||||||
|
|
||||||
11) What is configure all about?
|
|
||||||
|
|
||||||
The files configure and configure.in are part of the GNU autoconf
|
The files configure and configure.in are part of the GNU autoconf
|
||||||
package. Configure allows us to test for various capabilities of the
|
package. Configure allows us to test for various capabilities of the
|
||||||
@@ -405,7 +288,7 @@ typedef struct nameData
|
|||||||
removed, so you see only the file contained in the source
|
removed, so you see only the file contained in the source
|
||||||
distribution.
|
distribution.
|
||||||
|
|
||||||
12) How do I add a new port?
|
1.8) How do I add a new port?
|
||||||
|
|
||||||
There are a variety of places that need to be modified to add a new
|
There are a variety of places that need to be modified to add a new
|
||||||
port. First, start in the src/template directory. Add an appropriate
|
port. First, start in the src/template directory. Add an appropriate
|
||||||
@@ -422,19 +305,7 @@ typedef struct nameData
|
|||||||
src/makefiles directory for port-specific Makefile handling. There is
|
src/makefiles directory for port-specific Makefile handling. There is
|
||||||
a backend/port directory if you need special files for your OS.
|
a backend/port directory if you need special files for your OS.
|
||||||
|
|
||||||
13) What is CommandCounterIncrement()?
|
1.9) Why don't we use threads in the backend?
|
||||||
|
|
||||||
Normally, transactions can not see the rows they modify. This allows
|
|
||||||
UPDATE foo SET x = x + 1 to work correctly.
|
|
||||||
|
|
||||||
However, there are cases where a transaction needs to see rows
|
|
||||||
affected in previous parts of the transaction. This is accomplished
|
|
||||||
using a Command Counter. Incrementing the counter allows transactions
|
|
||||||
to be broken into pieces so each piece can see rows modified by
|
|
||||||
previous pieces. CommandCounterIncrement() increments the Command
|
|
||||||
Counter, creating a new part of the transaction.
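A toy model may help show why incrementing a counter makes earlier changes visible. The sketch below assumes each row is stamped with the command number that wrote it and is visible only to later commands; ToyRow, toy_stamp(), toy_visible() and toy_command_counter_increment() are hypothetical names, not backend functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy row: remembers which command of the transaction wrote it. */
typedef struct ToyRow
{
    int          value;
    unsigned int cmin;          /* command number that stored this row */
} ToyRow;

/* Command number of the part of the transaction now running. */
static unsigned int toy_current_cid = 0;

/* Store a value, stamping the row with the current command number. */
static void
toy_stamp(ToyRow *row, int value)
{
    assert(row != NULL);
    row->value = value;
    row->cmin = toy_current_cid;
}

/* A row is visible only if an earlier command wrote it. */
static bool
toy_visible(const ToyRow *row)
{
    assert(row != NULL);
    return row->cmin < toy_current_cid;
}

/* Start a new part of the transaction. */
static void
toy_command_counter_increment(void)
{
    toy_current_cid++;
}
```

In this model a row written by the current command stays invisible to that same command, which is what lets a statement like UPDATE foo SET x = x + 1 avoid revisiting its own output, and one increment is enough to expose it to the next piece.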
|
|
||||||
|
|
||||||
14) Why don't we use threads in the backend?
|
|
||||||
|
|
||||||
There are several reasons threads are not used:
|
There are several reasons threads are not used:
|
||||||
* Historically, threads were unsupported and buggy.
|
* Historically, threads were unsupported and buggy.
|
||||||
@@ -443,7 +314,7 @@ typedef struct nameData
|
|||||||
remaining backend startup time.
|
remaining backend startup time.
|
||||||
* The backend code would be more complex.
|
* The backend code would be more complex.
|
||||||
|
|
||||||
15) How are RPM's packaged?
|
1.10) How are RPM's packaged?
|
||||||
|
|
||||||
This was written by Lamar Owen:
|
This was written by Lamar Owen:
|
||||||
|
|
||||||
@@ -538,7 +409,7 @@ typedef struct nameData
|
|||||||
Of course, there are many projects that DO include all the files
|
Of course, there are many projects that DO include all the files
|
||||||
necessary to build RPMs from their Official Tarball (TM).
|
necessary to build RPMs from their Official Tarball (TM).
|
||||||
|
|
||||||
16) How are CVS branches managed?
|
1.11) How are CVS branches managed?
|
||||||
|
|
||||||
This was written by Tom Lane:
|
This was written by Tom Lane:
|
||||||
|
|
||||||
@@ -597,58 +468,194 @@ typedef struct nameData
|
|||||||
tree right away after a major release --- we wait for a dot-release or
|
tree right away after a major release --- we wait for a dot-release or
|
||||||
two, so that we won't have to double-patch the first wave of fixes.
|
two, so that we won't have to double-patch the first wave of fixes.
|
||||||
|
|
||||||
17) How do I get involved in PostgreSQL development?
|
Technical Questions
|
||||||
|
|
||||||
|
2.1) How do I efficiently access information in tables from the backend code?
|
||||||
|
|
||||||
This was written by Lamar Owen:
|
You first need to find the tuples(rows) you are interested in. There
|
||||||
|
are two ways. First, SearchSysCache() and related functions allow you
|
||||||
|
to query the system catalogs. This is the preferred way to access
|
||||||
|
system tables, because the first call to the cache loads the needed
|
||||||
|
rows, and future requests can return the results without accessing the
|
||||||
|
base table. The caches use system table indexes to look up tuples. A
|
||||||
|
list of available caches is located in
|
||||||
|
src/backend/utils/cache/syscache.c.
|
||||||
|
src/backend/utils/cache/lsyscache.c contains many column-specific
|
||||||
|
cache lookup functions.
|
||||||
|
|
||||||
2001-06-22
|
The rows returned are cache-owned versions of the heap rows.
|
||||||
What open source development process is used by the PostgreSQL team?
|
Therefore, you must not modify or delete the tuple returned by
|
||||||
|
SearchSysCache(). What you should do is release it with
|
||||||
|
ReleaseSysCache() when you are done using it; this informs the cache
|
||||||
|
that it can discard that tuple if necessary. If you neglect to call
|
||||||
|
ReleaseSysCache(), then the cache entry will remain locked in the
|
||||||
|
cache until end of transaction, which is tolerable but not very
|
||||||
|
desirable.
|
||||||
|
|
||||||
Read HACKERS for six months (or a full release cycle, whichever is
|
If you can't use the system cache, you will need to retrieve the data
|
||||||
longer). Really. HACKERS _is_ the process. The process is not well
|
directly from the heap table, using the buffer cache that is shared by
|
||||||
documented (AFAIK -- it may be somewhere that I am not aware of) --
|
all backends. The backend automatically takes care of loading the rows
|
||||||
and it changes continually.
|
into the buffer cache.
|
||||||
What development environment (OS, system, compilers, etc) is required
|
|
||||||
to develop code?
|
|
||||||
|
|
||||||
Developers Corner on the website has links to this information. The
|
Open the table with heap_open(). You can then start a table scan with
|
||||||
distribution tarball itself includes all the extra tools and documents
|
heap_beginscan(), then use heap_getnext() and continue as long as
|
||||||
that go beyond a good Unix-like development environment. In general, a
|
HeapTupleIsValid() returns true. Then do a heap_endscan(). Keys can be
|
||||||
modern unix with a modern gcc, GNU make or equivalent, autoconf (of a
|
assigned to the scan. No indexes are used, so all rows are going to be
|
||||||
particular version), and good working knowledge of those tools are
|
compared to the keys, and only the valid rows returned.
|
||||||
required.
|
|
||||||
What areas need support?
|
|
||||||
|
|
||||||
The TODO list.
|
You can also use heap_fetch() to fetch rows by block number/offset.
|
||||||
|
While scans automatically lock/unlock rows from the buffer cache, with
|
||||||
|
heap_fetch(), you must pass a Buffer pointer, and ReleaseBuffer() it
|
||||||
|
when completed.
|
||||||
|
|
||||||
You've made the first step, by finding and subscribing to HACKERS.
|
Once you have the row, you can get data that is common to all tuples,
|
||||||
Once you find an area to look at in the TODO, and have read the
|
like t_self and t_oid, by merely accessing the HeapTuple structure
|
||||||
documentation on the internals, etc, then you check out a current
|
entries. If you need a table-specific column, you should take the
|
||||||
CVS, write what you are going to write (keeping your CVS checkout up to
|
HeapTuple pointer, and use the GETSTRUCT() macro to access the
|
||||||
date in the process), and make up a patch (as a context diff only) and
|
table-specific start of the tuple. You then cast the pointer as a
|
||||||
send to the PATCHES list, preferably.
|
Form_pg_proc pointer if you are accessing the pg_proc table, or
|
||||||
|
Form_pg_type if you are accessing pg_type. You can then access the
|
||||||
|
columns by using a structure pointer:
|
||||||
|
((Form_pg_class) GETSTRUCT(tuple))->relnatts
|
||||||
|
|
||||||
|
You must not directly change live tuples in this way. The best way is
|
||||||
|
to use heap_modifytuple() and pass it your original tuple, and the
|
||||||
|
values you want changed. It returns a palloc'ed tuple, which you pass
|
||||||
|
to heap_replace(). You can delete tuples by passing the tuple's t_self
|
||||||
|
to heap_destroy(). You use t_self for heap_update() too. Remember,
|
||||||
|
tuples can be either system cache copies, which may go away after you
|
||||||
|
call ReleaseSysCache(), or read directly from disk buffers, which go
|
||||||
|
away when you heap_getnext(), heap_endscan, or ReleaseBuffer(), in the
|
||||||
|
heap_fetch() case. Or it may be a palloc'ed tuple, that you must
|
||||||
|
pfree() when finished.
|
||||||
|
|
||||||
Discussion on the patch typically happens here. If the patch adds a
|
2.2) Why are table, column, type, function, view names sometimes referenced
|
||||||
major feature, it would be a good idea to talk about it first on the
|
as Name or NameData, and sometimes as char *?
|
||||||
HACKERS list, in order to increase the chances of it being accepted,
|
|
||||||
as well as to avoid duplication of effort. Note that experienced
|
Table, column, type, function, and view names are stored in system
|
||||||
developers with a proven track record usually get the big jobs -- for
|
tables in columns of type Name. Name is a fixed-length,
|
||||||
more than one reason. Also note that PostgreSQL is highly portable --
|
null-terminated type of NAMEDATALEN bytes. (The default value for
|
||||||
nonportable code will likely be dismissed out of hand.
|
NAMEDATALEN is 32 bytes.)
|
||||||
|
typedef struct nameData
|
||||||
|
{
|
||||||
|
char data[NAMEDATALEN];
|
||||||
|
} NameData;
|
||||||
|
typedef NameData *Name;
|
||||||
|
|
||||||
|
Table, column, type, function, and view names that come into the
|
||||||
|
backend via user queries are stored as variable-length,
|
||||||
|
null-terminated character strings.
|
||||||
|
|
||||||
Once your contributions get accepted, things move from there.
|
Many functions are called with both types of names, e.g. heap_open().
|
||||||
Typically, you would be added as a developer on the list on the
|
Because the Name type is null-terminated, it is safe to pass it to a
|
||||||
website when one of the other developers recommends it. Membership on
|
function expecting a char *. Because there are many cases where
|
||||||
the steering committee is by invitation only, by the other steering
|
on-disk names (Name) are compared to user-supplied names (char *), there
|
||||||
committee members, from what I have gathered watching from a distance.
|
are many cases where Name and char * are used interchangeably.
|
||||||
|
|
||||||
I make these statements from having watched the process for over two
|
2.3) Why do we use Node and List to make data structures?
|
||||||
years.
|
|
||||||
|
We do this because this allows a consistent way to pass data inside
|
||||||
|
the backend in a flexible way. Every node has a NodeTag which
|
||||||
|
specifies what type of data is inside the Node. Lists are groups of
|
||||||
|
Nodes chained together as a forward-linked list.
|
||||||
|
|
||||||
To see a good example of how one goes about this, search the archives
|
Here are some of the List manipulation commands:
|
||||||
for the name 'Tom Lane' and see what his first post consisted of, and
|
|
||||||
where he took things. In particular, note that this hasn't been _that_
|
lfirst(i)
|
||||||
long ago -- and his bugfixing and general deep knowledge with this
|
return the data at list element i.
|
||||||
codebase is legendary. Take a few days to read after him. And pay
|
|
||||||
special attention to both the sheer quantity as well as the
|
lnext(i)
|
||||||
painstaking quality of his work. Both are in high demand.
|
return the next list element after i.
|
||||||
|
|
||||||
|
foreach(i, list)
|
||||||
|
loop through list, assigning each list element to i. It is
|
||||||
|
important to note that i is a List *, not the data in the List
|
||||||
|
element. You need to use lfirst(i) to get at the data. Here is
|
||||||
|
a typical code snippet that loops through a List containing Var
|
||||||
|
*'s and processes each one:
|
||||||
|
|
||||||
|
List *i, *list;
|
||||||
|
|
||||||
|
foreach(i, list)
|
||||||
|
{
|
||||||
|
Var *var = lfirst(i);
|
||||||
|
|
||||||
|
/* process var here */
|
||||||
|
}
|
||||||
|
|
||||||
|
lcons(node, list)
|
||||||
|
add node to the front of list, or create a new list with node
|
||||||
|
if list is NIL.
|
||||||
|
|
||||||
|
lappend(list, node)
|
||||||
|
add node to the end of list. This is more expensive than lcons.
|
||||||
|
|
||||||
|
nconc(list1, list2)
|
||||||
|
Concat list2 on to the end of list1.
|
||||||
|
|
||||||
|
length(list)
|
||||||
|
return the length of the list.
|
||||||
|
|
||||||
|
nth(i, list)
|
||||||
|
return the i'th element in list.
|
||||||
|
|
||||||
|
lconsi, ...
|
||||||
|
There are integer versions of these: lconsi, lappendi, nthi.
|
||||||
|
Lists containing integers instead of Node pointers are used to
|
||||||
|
hold lists of relation object IDs and other integer quantities.
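To make the cost difference between lcons and lappend concrete, here is a minimal sketch of a forward-linked list in plain C. It is not the backend's implementation: ToyList and every toy_* name are hypothetical, malloc() stands in for palloc(), and the zero-based toy_nth() is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a backend List cell: one cell per element. */
typedef struct ToyList
{
    void           *elem;   /* plays the role of lfirst(cell) */
    struct ToyList *next;   /* plays the role of lnext(cell) */
} ToyList;

#define TOY_NIL ((ToyList *) NULL)

/* Like lcons(): push onto the front, constant time. */
ToyList *
toy_lcons(void *elem, ToyList *list)
{
    ToyList *cell = malloc(sizeof(ToyList));

    assert(cell != NULL);
    cell->elem = elem;
    cell->next = list;
    return cell;
}

/* Like lappend(): must walk to the tail first, so cost grows with length. */
ToyList *
toy_lappend(ToyList *list, void *elem)
{
    ToyList *cell = toy_lcons(elem, TOY_NIL);
    ToyList *tail;

    if (list == TOY_NIL)
        return cell;
    for (tail = list; tail->next != TOY_NIL; tail = tail->next)
        ;
    tail->next = cell;
    return list;
}

/* Like length(): count the cells. */
int
toy_length(ToyList *list)
{
    int n = 0;

    for (; list != TOY_NIL; list = list->next)
        n++;
    return n;
}

/* Like nth(): zero-based in this sketch; NULL if i is out of range. */
void *
toy_nth(int i, ToyList *list)
{
    while (i-- > 0 && list != TOY_NIL)
        list = list->next;
    return list ? list->elem : NULL;
}
```

Building a list with repeated toy_lcons() calls costs one step per element, while repeated toy_lappend() calls re-walk the list each time, which is the reason the entry above calls lappend the more expensive of the two.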
|
||||||
|
|
||||||
|
You can print nodes easily inside gdb. First, to disable output
|
||||||
|
truncation when you use the gdb print command:
|
||||||
|
(gdb) set print elements 0
|
||||||
|
|
||||||
|
Instead of printing values in gdb format, you can use the next two
|
||||||
|
commands to print out List, Node, and structure contents in a verbose
|
||||||
|
format that is easier to understand. Lists are unrolled into nodes,
|
||||||
|
and nodes are printed in detail. The first prints in a short format,
|
||||||
|
and the second in a long format:
|
||||||
|
(gdb) call print(any_pointer)
|
||||||
|
(gdb) call pprint(any_pointer)
|
||||||
|
|
||||||
|
The output appears in the postmaster log file, or on your screen if
|
||||||
|
you are running a backend directly without a postmaster.
|
||||||
|
|
||||||
|
2.4) I just added a field to a structure. What else should I do?
|
||||||
|
|
||||||
|
The structures passing around from the parser, rewrite, optimizer, and
|
||||||
|
executor require quite a bit of support. Most structures have support
|
||||||
|
routines in src/backend/nodes used to create, copy, read, and output
|
||||||
|
those structures. Make sure you add support for your new field to
|
||||||
|
these files. Find any other places the structure may need code for
|
||||||
|
your new field. mkid is helpful with this (see above).
|
||||||
|
|
||||||
|
2.5) Why do we use palloc() and pfree() to allocate memory?
|
||||||
|
|
||||||
|
palloc() and pfree() are used in place of malloc() and free() because
|
||||||
|
we automatically free all memory allocated when a transaction
|
||||||
|
completes. This makes it easier to make sure we free memory that gets
|
||||||
|
allocated in one place, but only freed much later. There are several
|
||||||
|
contexts that memory can be allocated in, and this controls when the
|
||||||
|
allocated memory is automatically freed by the backend.
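The idea of freeing everything in a context at once can be sketched in a few lines of standalone C. This is only an illustration of the technique, not the backend's allocator: ToyContext, toy_palloc() and toy_context_reset() are hypothetical names, and the sketch assumes a C11 compiler for max_align_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation so the context can find it. */
typedef union ToyChunk
{
    union ToyChunk *next;   /* next chunk owned by the same context */
    max_align_t     align;  /* pads the header so the payload stays aligned */
} ToyChunk;

/* A context is just the list of chunks allocated in it. */
typedef struct ToyContext
{
    ToyChunk *chunks;
    int       nchunks;
} ToyContext;

/* Allocate size bytes and remember the chunk in the context. */
void *
toy_palloc(ToyContext *cxt, size_t size)
{
    ToyChunk *chunk = malloc(sizeof(ToyChunk) + size);

    assert(chunk != NULL);
    chunk->next = cxt->chunks;
    cxt->chunks = chunk;
    cxt->nchunks++;
    return chunk + 1;       /* payload starts right after the header */
}

/* Free every chunk in the context, e.g. at end of transaction. */
void
toy_context_reset(ToyContext *cxt)
{
    while (cxt->chunks != NULL)
    {
        ToyChunk *next = cxt->chunks->next;

        free(cxt->chunks);
        cxt->chunks = next;
    }
    cxt->nchunks = 0;
}
```

Code that allocates in such a context never needs a matching free for each allocation; a single reset at a well-defined point releases everything, which is the property the answer above describes.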
|
||||||
|
|
||||||
|
2.6) What is elog()?
|
||||||
|
|
||||||
|
elog() is used to send messages to the front-end, and optionally
|
||||||
|
terminate the current query being processed. The first parameter is an
|
||||||
|
elog level of NOTICE, DEBUG, ERROR, or FATAL. NOTICE prints on the
|
||||||
|
user's terminal and the postmaster logs. DEBUG prints only in the
|
||||||
|
postmaster logs. ERROR prints in both places, and terminates the
|
||||||
|
current query, never returning from the call. FATAL terminates the
|
||||||
|
backend process. The remaining parameters of elog are a printf-style
|
||||||
|
set of parameters to print.
|
||||||
|
|
||||||
|
2.7) What is CommandCounterIncrement()?
|
||||||
|
|
||||||
|
Normally, transactions cannot see the rows they modify. This allows
|
||||||
|
UPDATE foo SET x = x + 1 to work correctly.
|
||||||
|
|
||||||
|
However, there are cases where a transaction needs to see rows
|
||||||
|
affected in previous parts of the transaction. This is accomplished
|
||||||
|
using a Command Counter. Incrementing the counter allows transactions
|
||||||
|
to be broken into pieces so each piece can see rows modified by
|
||||||
|
previous pieces. CommandCounterIncrement() increments the Command
|
||||||
|
Counter, creating a new part of the transaction.
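The effect of the Command Counter can be modelled with a small standalone C sketch. It assumes a simplified rule, namely that a row written by the current transaction becomes visible only to commands numbered later than the one that wrote it. ToyRow and all toy_* names are hypothetical and are not backend code.

```c
#include <assert.h>
#include <stdbool.h>

/* A row remembers which command of the transaction created it. */
typedef struct ToyRow
{
    int          value;
    unsigned int cmin;      /* command number that inserted the row */
} ToyRow;

/* Command number of the piece of the transaction now running. */
unsigned int toy_current_command = 0;

/* Models CommandCounterIncrement(): start a new piece of the transaction. */
void
toy_command_counter_increment(void)
{
    toy_current_command++;
}

/* Insert a row, stamping it with the current command number. */
ToyRow
toy_insert(int value)
{
    ToyRow row = {value, toy_current_command};

    return row;
}

/* A row is visible only to commands later than the one that wrote it. */
bool
toy_row_is_visible(const ToyRow *row)
{
    return row->cmin < toy_current_command;
}
```

In this model a row inserted by the current command is invisible until toy_command_counter_increment() runs, which mirrors why UPDATE foo SET x = x + 1 does not see, and re-update, its own output.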
|
||||||
|
@ -27,39 +27,169 @@
|
|||||||
|
|
||||||
|
|
||||||
<CENTER>
|
<CENTER>
|
||||||
<H2>Questions</H2>
|
<H2>General Questions</H2>
|
||||||
</CENTER>
|
</CENTER>
|
||||||
<A href="#1">1</A>) What tools are available for developers?<BR>
|
<A href="#1.1">1.1</A>) How do I get involved in PostgreSQL
|
||||||
<A href="#2">2</A>) What books are good for developers?<BR>
|
development?<BR>
|
||||||
<A href="#3">3</A>) Why do we use <I>palloc</I>() and
|
<A href="#1.2">1.2</A>) How do I add a feature or fix a bug?<BR>
|
||||||
<I>pfree</I>() to allocate memory?<BR>
|
<A href="#1.3">1.3</A>) How do I download/update the current source
|
||||||
<A href="#4">4</A>) Why do we use <I>Node</I> and <I>List</I> to
|
|
||||||
make data structures?<BR>
|
|
||||||
<A href="#5">5</A>) How do I add a feature or fix a bug?<BR>
|
|
||||||
<A href="#6">6</A>) How do I download/update the current source
|
|
||||||
tree?<BR>
|
tree?<BR>
|
||||||
<A href="#7">7</A>) How do I test my changes?<BR>
|
<A href="#1.4">1.4</A>) How do I test my changes?<BR>
|
||||||
<A href="#7">7</A>) I just added a field to a structure. What else
|
<A href="#1.5">1.5</A>) What tools are available for developers?<BR>
|
||||||
should I do?<BR>
|
<A href="#1.6">1.6</A>) What books are good for developers?<BR>
|
||||||
<A href="#8">8</A>) Why are table, column, type, function, view
|
<A href="#1.7">1.7</A>) What is configure all about?<BR>
|
||||||
|
<A href="#1.8">1.8</A>) How do I add a new port?<BR>
|
||||||
|
<A href="#1.9">1.9</A>) Why don't we use threads in the backend?<BR>
|
||||||
|
<A href="#1.10">1.10</A>) How are RPM's packaged?<BR>
|
||||||
|
<A href="#1.11">1.11</A>) How are CVS branches handled?<BR>
|
||||||
|
|
||||||
|
<H2>Technical Questions</H2>
|
||||||
|
<A href="#2.1">2.1</A>) How do I efficiently access information in
|
||||||
|
tables from the backend code?<BR>
|
||||||
|
<A href="#2.2">2.2</A>) Why are table, column, type, function, view
|
||||||
names sometimes referenced as <I>Name</I> or <I>NameData,</I> and
|
names sometimes referenced as <I>Name</I> or <I>NameData,</I> and
|
||||||
sometimes as <I>char *?</I><BR>
|
sometimes as <I>char *?</I><BR>
|
||||||
<A href="#9">9</A>) How do I efficiently access information in
|
<A href="#2.3">2.3</A>) Why do we use <I>Node</I> and <I>List</I> to
|
||||||
tables from the backend code?<BR>
|
make data structures?<BR>
|
||||||
<A href="#10">10</A>) What is elog()?<BR>
|
<A href="#2.4">2.4</A>) I just added a field to a structure. What else
|
||||||
<A href="#11">11</A>) What is configure all about?<BR>
|
should I do?<BR>
|
||||||
<A href="#12">12</A>) How do I add a new port?<BR>
|
<A href="#2.5">2.5</A>) Why do we use <I>palloc</I>() and
|
||||||
<A href="#13">13</A>) What is CommandCounterIncrement()?<BR>
|
<I>pfree</I>() to allocate memory?<BR>
|
||||||
<A href="#14">14</A>) Why don't we use threads in the backend?<BR>
|
<A href="#2.6">2.6</A>) What is elog()?<BR>
|
||||||
<A href="#15">15</A>) How are RPM's packaged?<BR>
|
<A href="#2.7">2.7</A>) What is CommandCounterIncrement()?<BR>
|
||||||
<A href="#16">16</A>) How are CVS branches handled?<BR>
|
|
||||||
<A href="#17">17</A>) How do I get involved in PostgreSQL
|
|
||||||
development?<BR>
|
|
||||||
<BR>
|
<BR>
|
||||||
|
|
||||||
<HR>
|
<HR>
|
||||||
|
|
||||||
<H3><A name="1">1</A>) What tools are available for
|
<CENTER>
|
||||||
|
<H2>General Questions</H2>
|
||||||
|
</CENTER>
|
||||||
|
|
||||||
|
<H3><A name="1.1">1.1</A>) How do I get involved in PostgreSQL
|
||||||
|
development?</H3>
|
||||||
|
|
||||||
|
<P>This was written by Lamar Owen:</P>
|
||||||
|
|
||||||
|
<P>2001-06-22</P>
|
||||||
|
|
||||||
|
<B>What open source development process is used by the PostgreSQL
|
||||||
|
team?</B>
|
||||||
|
|
||||||
|
<P>Read HACKERS for six months (or a full release cycle, whichever
|
||||||
|
is longer). Really. HACKERS _is_ the process. The process is not
|
||||||
|
well documented (AFAIK -- it may be somewhere that I am not aware
|
||||||
|
of) -- and it changes continually.</P>
|
||||||
|
|
||||||
|
<B>What development environment (OS, system, compilers, etc) is
|
||||||
|
required to develop code?</B>
|
||||||
|
|
||||||
|
<P><A href="developers.postgresql.org">Developers Corner</A> on the
|
||||||
|
website has links to this information. The distribution tarball
|
||||||
|
itself includes all the extra tools and documents that go beyond a
|
||||||
|
good Unix-like development environment. In general, a modern unix
|
||||||
|
with a modern gcc, GNU make or equivalent, autoconf (of a
|
||||||
|
particular version), and good working knowledge of those tools are
|
||||||
|
required.</P>
|
||||||
|
|
||||||
|
<B>What areas need support?</B>
|
||||||
|
|
||||||
|
<P>The TODO list.</P>
|
||||||
|
|
||||||
|
<P>You've made the first step, by finding and subscribing to
|
||||||
|
HACKERS. Once you find an area to look at in the TODO, and have
|
||||||
|
read the documentation on the internals, etc, then you check out a
|
||||||
|
current CVS, write what you are going to write (keeping your CVS
|
||||||
|
checkout up to date in the process), and make up a patch (as a
|
||||||
|
context diff only) and send to the PATCHES list, preferably.</P>
|
||||||
|
|
||||||
|
<P>Discussion on the patch typically happens here. If the patch
|
||||||
|
adds a major feature, it would be a good idea to talk about it
|
||||||
|
first on the HACKERS list, in order to increase the chances of it
|
||||||
|
being accepted, as well as to avoid duplication of effort. Note that
|
||||||
|
experienced developers with a proven track record usually get the
|
||||||
|
big jobs -- for more than one reason. Also note that PostgreSQL is
|
||||||
|
highly portable -- nonportable code will likely be dismissed out of
|
||||||
|
hand.</P>
|
||||||
|
|
||||||
|
<P>Once your contributions get accepted, things move from there.
|
||||||
|
Typically, you would be added as a developer on the list on the
|
||||||
|
website when one of the other developers recommends it. Membership
|
||||||
|
on the steering committee is by invitation only, by the other
|
||||||
|
steering committee members, from what I have gathered watching
|
||||||
|
from a distance.</P>
|
||||||
|
|
||||||
|
<P>I make these statements from having watched the process for over
|
||||||
|
two years.</P>
|
||||||
|
|
||||||
|
<P>To see a good example of how one goes about this, search the
|
||||||
|
archives for the name 'Tom Lane' and see what his first post
|
||||||
|
consisted of, and where he took things. In particular, note that
|
||||||
|
this hasn't been _that_ long ago -- and his bugfixing and general
|
||||||
|
deep knowledge with this codebase is legendary. Take a few days to
|
||||||
|
read after him. And pay special attention to both the sheer
|
||||||
|
quantity as well as the painstaking quality of his work. Both are
|
||||||
|
in high demand.</P>
|
||||||
|
|
||||||
|
<H3><A name="1.2">1.2</A>) How do I add a feature or fix a bug?</H3>
|
||||||
|
|
||||||
|
<P>The source code is over 250,000 lines. Many problems/features
|
||||||
|
are isolated to one specific area of the code. Others require
|
||||||
|
knowledge of much of the source. If you are confused about where to
|
||||||
|
start, ask the hackers list, and they will be glad to assess the
|
||||||
|
complexity and give pointers on where to start.</P>
|
||||||
|
|
||||||
|
<P>Another thing to keep in mind is that many fixes and features
|
||||||
|
can be added with surprisingly little code. I often start by adding
|
||||||
|
code, then looking at other areas in the code where similar things
|
||||||
|
are done, and by the time I am finished, the patch is quite small
|
||||||
|
and compact.</P>
|
||||||
|
|
||||||
|
<P>When adding code, keep in mind that it should use the existing
|
||||||
|
facilities in the source, for performance reasons and for
|
||||||
|
simplicity. Often a review of existing code doing similar things is
|
||||||
|
helpful.</P>
|
||||||
|
|
||||||
|
<H3><A name="1.3">1.3</A>) How do I download/update the current source
|
||||||
|
tree?</H3>
|
||||||
|
|
||||||
|
<P>There are several ways to obtain the source tree. Occasional
|
||||||
|
developers can just get the most recent source tree snapshot from
|
||||||
|
ftp.postgresql.org. For regular developers, you can use CVS. CVS
|
||||||
|
allows you to download the source tree, then occasionally update
|
||||||
|
your copy of the source tree with any new changes. Using CVS, you
|
||||||
|
don't have to download the entire source each time, only the
|
||||||
|
changed files. Anonymous CVS does not allow developers to update
|
||||||
|
the remote source tree, though privileged developers can do this.
|
||||||
|
There is a CVS FAQ on our web site that describes how to use remote
|
||||||
|
CVS. You can also use CVSup, which has similar functionality, and
|
||||||
|
is available from ftp.postgresql.org.</P>
|
||||||
|
|
||||||
|
<P>To update the source tree, there are two ways. You can generate
|
||||||
|
a patch against your current source tree, perhaps using the
|
||||||
|
make_diff tools mentioned above, and send them to the patches list.
|
||||||
|
They will be reviewed, and applied in a timely manner. If the patch
|
||||||
|
is major, and we are in beta testing, the developers may wait for
|
||||||
|
the final release before applying your patches.</P>
|
||||||
|
|
||||||
|
<P>For hard-core developers, Marc(scrappy@postgresql.org) will give
|
||||||
|
you a Unix shell account on postgresql.org, so you can use CVS to
|
||||||
|
update the main source tree, or you can ftp your files into your
|
||||||
|
account, patch, and cvs install the changes directly into the
|
||||||
|
source tree.</P>
|
||||||
|
|
||||||
|
<H3><A name="1.4">1.4</A>) How do I test my changes?</H3>
|
||||||
|
|
||||||
|
<P>First, use <I>psql</I> to make sure it is working as you expect.
|
||||||
|
Then run <I>src/test/regress</I> and get the output of
|
||||||
|
<I>src/test/regress/checkresults</I> with and without your changes,
|
||||||
|
to see that your patch does not change the regression test in
|
||||||
|
unexpected ways. This practice has saved me many times. The
|
||||||
|
regression tests test the code in ways I would never do, and have
|
||||||
|
caught many bugs in my patches. By finding the problems now, you
|
||||||
|
save yourself a lot of debugging later when things are broken, and
|
||||||
|
you can't figure out when it happened.</P>
|
||||||
|
|
||||||
|
<H3><A name="1.5">1.5</A>) What tools are available for
|
||||||
developers?</H3>
|
developers?</H3>
|
||||||
|
|
||||||
<P>Aside from the User documentation mentioned in the regular FAQ,
|
<P>Aside from the User documentation mentioned in the regular FAQ,
|
||||||
@ -179,7 +309,7 @@
|
|||||||
There is also a script called <I>unused_oids</I> in
|
There is also a script called <I>unused_oids</I> in
|
||||||
<I>pgsql/src/include/catalog</I> that shows the unused oids.</P>
|
<I>pgsql/src/include/catalog</I> that shows the unused oids.</P>
|
||||||
|
|
||||||
<H3><A name="2">2</A>) What books are good for developers?</H3>
|
<H3><A name="1.6">1.6</A>) What books are good for developers?</H3>
|
||||||
|
|
||||||
<P>I have four good books, <I>An Introduction to Database
|
<P>I have four good books, <I>An Introduction to Database
|
||||||
Systems,</I> by C.J. Date, Addison, Wesley, <I>A Guide to the SQL
|
Systems,</I> by C.J. Date, Addison, Wesley, <I>A Guide to the SQL
|
||||||
@ -192,288 +322,7 @@
|
|||||||
on-line written by Jim Gray at <A href=
|
on-line written by Jim Gray at <A href=
|
||||||
"http://www.benchmarkresources.com">http://www.benchmarkresources.com.</A></P>
|
"http://www.benchmarkresources.com">http://www.benchmarkresources.com.</A></P>
|
||||||
|
|
||||||
<H3><A name="3">3</A>) Why do we use <I>palloc</I>() and
|
<H3><A name="1.7">1.7</A>) What is configure all about?</H3>
|
||||||
<I>pfree</I>() to allocate memory?</H3>
|
|
||||||
|
|
||||||
<P><I>palloc()</I> and <I>pfree()</I> are used in place of malloc()
|
|
||||||
and free() because we automatically free all memory allocated when
|
|
||||||
a transaction completes. This makes it easier to make sure we free
|
|
||||||
memory that gets allocated in one place, but only freed much later.
|
|
||||||
There are several contexts that memory can be allocated in, and
|
|
||||||
this controls when the allocated memory is automatically freed by
|
|
||||||
the backend.</P>
|
|
||||||
|
|
||||||
<H3><A name="4">4</A>) Why do we use <I>Node</I> and <I>List</I> to
|
|
||||||
make data structures?</H3>
|
|
||||||
|
|
||||||
<P>We do this because this allows a consistent way to pass data
|
|
||||||
inside the backend in a flexible way. Every node has a
|
|
||||||
<I>NodeTag</I> which specifies what type of data is inside the
|
|
||||||
Node. <I>Lists</I> are groups of <I>Nodes chained together as a
|
|
||||||
forward-linked list.</I></P>
|
|
||||||
|
|
||||||
<P>Here are some of the <I>List</I> manipulation commands:</P>
|
|
||||||
|
|
||||||
<BLOCKQUOTE>
|
|
||||||
<DL>
|
|
||||||
<DT>lfirst(i)</DT>
|
|
||||||
|
|
||||||
<DD>return the data at list element <I>i.</I></DD>
|
|
||||||
|
|
||||||
<DT>lnext(i)</DT>
|
|
||||||
|
|
||||||
<DD>return the next list element after <I>i.</I></DD>
|
|
||||||
|
|
||||||
<DT>foreach(i, list)</DT>
|
|
||||||
|
|
||||||
<DD>
|
|
||||||
loop through <I>list,</I> assigning each list element to
|
|
||||||
<I>i.</I> It is important to note that <I>i</I> is a List *,
|
|
||||||
not the data in the <I>List</I> element. You need to use
|
|
||||||
<I>lfirst(i)</I> to get at the data. Here is a typical code
|
|
||||||
snippet that loops through a List containing <I>Var *'s</I>
|
|
||||||
and processes each one:
|
|
||||||
<PRE>
|
|
||||||
<CODE>List *i, *list;
|
|
||||||
|
|
||||||
foreach(i, list)
|
|
||||||
{
|
|
||||||
Var *var = lfirst(i);
|
|
||||||
|
|
||||||
/* process var here */
|
|
||||||
}
|
|
||||||
</CODE>
|
|
||||||
</PRE>
|
|
||||||
</DD>
|
|
||||||
|
|
||||||
<DT>lcons(node, list)</DT>
|
|
||||||
|
|
||||||
<DD>add <I>node</I> to the front of <I>list,</I> or create a
|
|
||||||
new list with <I>node</I> if <I>list</I> is <I>NIL.</I></DD>
|
|
||||||
|
|
||||||
<DT>lappend(list, node)</DT>
|
|
||||||
|
|
||||||
<DD>add <I>node</I> to the end of <I>list.</I> This is more
|
|
||||||
expensive than lcons.</DD>
|
|
||||||
|
|
||||||
<DT>nconc(list1, list2)</DT>
|
|
||||||
|
|
||||||
<DD>Concat <I>list2</I> on to the end of <I>list1.</I></DD>
|
|
||||||
|
|
||||||
<DT>length(list)</DT>
|
|
||||||
|
|
||||||
<DD>return the length of the <I>list.</I></DD>
|
|
||||||
|
|
||||||
<DT>nth(i, list)</DT>
|
|
||||||
|
|
||||||
<DD>return the <I>i</I>'th element in <I>list.</I></DD>
|
|
||||||
|
|
||||||
<DT>lconsi, ...</DT>
|
|
||||||
|
|
||||||
<DD>There are integer versions of these: <I>lconsi, lappendi,
|
|
||||||
nthi.</I> <I>Lists</I> containing integers instead of Node
|
|
||||||
pointers are used to hold lists of relation object IDs and
|
|
||||||
other integer quantities.</DD>
|
|
||||||
</DL>
|
|
||||||
</BLOCKQUOTE>
|
|
||||||
You can print nodes easily inside <I>gdb.</I> First, to disable
|
|
||||||
output truncation when you use the gdb <I>print</I> command:
|
|
||||||
<PRE>
|
|
||||||
<CODE>(gdb) set print elements 0
|
|
||||||
</CODE>
|
|
||||||
</PRE>
|
|
||||||
Instead of printing values in gdb format, you can use the next two
|
|
||||||
commands to print out List, Node, and structure contents in a
|
|
||||||
verbose format that is easier to understand. Lists are unrolled
|
|
||||||
into nodes, and nodes are printed in detail. The first prints in a
|
|
||||||
short format, and the second in a long format:
|
|
||||||
<PRE>
|
|
||||||
<CODE>(gdb) call print(any_pointer)
|
|
||||||
(gdb) call pprint(any_pointer)
|
|
||||||
</CODE>
|
|
||||||
</PRE>
|
|
||||||
The output appears in the postmaster log file, or on your screen if
|
|
||||||
you are running a backend directly without a postmaster.
|
|
||||||
|
|
||||||
<H3><A name="5">5</A>) How do I add a feature or fix a bug?</H3>
|
|
||||||
|
|
||||||
<P>The source code is over 250,000 lines. Many problems/features
|
|
||||||
are isolated to one specific area of the code. Others require
|
|
||||||
knowledge of much of the source. If you are confused about where to
|
|
||||||
start, ask the hackers list, and they will be glad to assess the
|
|
||||||
complexity and give pointers on where to start.</P>
|
|
||||||
|
|
||||||
<P>Another thing to keep in mind is that many fixes and features
|
|
||||||
can be added with surprisingly little code. I often start by adding
|
|
||||||
code, then looking at other areas in the code where similar things
|
|
||||||
are done, and by the time I am finished, the patch is quite small
|
|
||||||
and compact.</P>
|
|
||||||
|
|
||||||
<P>When adding code, keep in mind that it should use the existing
|
|
||||||
facilities in the source, for performance reasons and for
|
|
||||||
simplicity. Often a review of existing code doing similar things is
|
|
||||||
helpful.</P>
|
|
||||||
|
|
||||||
<H3><A name="6">6</A>) How do I download/update the current source
|
|
||||||
tree?</H3>
|
|
||||||
|
|
||||||
<P>There are several ways to obtain the source tree. Occasional
|
|
||||||
developers can just get the most recent source tree snapshot from
|
|
||||||
ftp.postgresql.org. For regular developers, you can use CVS. CVS
|
|
||||||
allows you to download the source tree, then occasionally update
|
|
||||||
your copy of the source tree with any new changes. Using CVS, you
|
|
||||||
don't have to download the entire source each time, only the
|
|
||||||
changed files. Anonymous CVS does not allow developers to update
|
|
||||||
the remote source tree, though privileged developers can do this.
|
|
||||||
There is a CVS FAQ on our web site that describes how to use remote
|
|
||||||
CVS. You can also use CVSup, which has similar functionality, and
|
|
||||||
is available from ftp.postgresql.org.</P>
|
|
||||||
|
|
||||||
<P>To update the source tree, there are two ways. You can generate
|
|
||||||
a patch against your current source tree, perhaps using the
|
|
||||||
make_diff tools mentioned above, and send them to the patches list.
|
|
||||||
They will be reviewed, and applied in a timely manner. If the patch
|
|
||||||
is major, and we are in beta testing, the developers may wait for
|
|
||||||
the final release before applying your patches.</P>
|
|
||||||
|
|
||||||
<P>For hard-core developers, Marc(scrappy@postgresql.org) will give
|
|
||||||
you a Unix shell account on postgresql.org, so you can use CVS to
|
|
||||||
update the main source tree, or you can ftp your files into your
|
|
||||||
account, patch, and cvs install the changes directly into the
|
|
||||||
source tree.</P>
|
|
||||||
|
|
||||||
<H3><A name="6">6</A>) How do I test my changes?</H3>
|
|
||||||
|
|
||||||
<P>First, use <I>psql</I> to make sure it is working as you expect.
|
|
||||||
Then run <I>src/test/regress</I> and get the output of
|
|
||||||
<I>src/test/regress/checkresults</I> with and without your changes,
|
|
||||||
to see that your patch does not change the regression test in
|
|
||||||
unexpected ways. This practice has saved me many times. The
|
|
||||||
regression tests test the code in ways I would never do, and have
|
|
||||||
caught many bugs in my patches. By finding the problems now, you
|
|
||||||
save yourself a lot of debugging later when things are broken, and
|
|
||||||
you can't figure out when it happened.</P>
|
|
||||||
|
|
||||||
<H3><A name="7">7</A>) I just added a field to a structure. What
|
|
||||||
else should I do?</H3>
|
|
||||||
|
|
||||||
<P>The structures passing around from the parser, rewrite,
|
|
||||||
optimizer, and executor require quite a bit of support. Most
|
|
||||||
structures have support routines in <I>src/backend/nodes</I> used
|
|
||||||
to create, copy, read, and output those structures. Make sure you
|
|
||||||
add support for your new field to these files. Find any other
|
|
||||||
places the structure may need code for your new field. <I>mkid</I>
|
|
||||||
is helpful with this (see above).</P>
|
|
||||||
|
|
||||||
<H3><A name="8">8</A>) Why are table, column, type, function, view
|
|
||||||
names sometimes referenced as <I>Name</I> or <I>NameData,</I> and
|
|
||||||
sometimes as <I>char *?</I></H3>
|
|
||||||
|
|
||||||
<P>Table, column, type, function, and view names are stored in
|
|
||||||
system tables in columns of type <I>Name.</I> Name is a
|
|
||||||
fixed-length, null-terminated type of <I>NAMEDATALEN</I> bytes.
|
|
||||||
(The default value for NAMEDATALEN is 32 bytes.)</P>
|
|
||||||
<PRE>
|
|
||||||
<CODE>typedef struct nameData
|
|
||||||
{
|
|
||||||
char data[NAMEDATALEN];
|
|
||||||
} NameData;
|
|
||||||
typedef NameData *Name;
|
|
||||||
</CODE>
|
|
||||||
</PRE>
|
|
||||||
Table, column, type, function, and view names that come into the
|
|
||||||
backend via user queries are stored as variable-length,
|
|
||||||
null-terminated character strings.
|
|
||||||
|
|
||||||
<P>Many functions are called with both types of names, e.g.
|
|
||||||
<I>heap_open().</I> Because the Name type is null-terminated, it is
|
|
||||||
safe to pass it to a function expecting a char *. Because there are
|
|
||||||
many cases where on-disk names (Name) are compared to user-supplied
|
|
||||||
names (char *), there are many cases where Name and char * are used
|
|
||||||
interchangeably.</P>
|
|
||||||
|
|
||||||
<H3><A name="9">9</A>) How do I efficiently access information in
|
|
||||||
tables from the backend code?</H3>
|
|
||||||
|
|
||||||
<P>You first need to find the tuples(rows) you are interested in.
|
|
||||||
There are two ways. First, <I>SearchSysCache()</I> and related
|
|
||||||
functions allow you to query the system catalogs. This is the
|
|
||||||
preferred way to access system tables, because the first call to
|
|
||||||
the cache loads the needed rows, and future requests can return the
|
|
||||||
results without accessing the base table. The caches use system
|
|
||||||
table indexes to look up tuples. A list of available caches is
|
|
||||||
located in <I>src/backend/utils/cache/syscache.c.</I>
|
|
||||||
<I>src/backend/utils/cache/lsyscache.c</I> contains many
|
|
||||||
column-specific cache lookup functions.</P>
|
|
||||||
|
|
||||||
<P>The rows returned are cache-owned versions of the heap rows.
|
|
||||||
Therefore, you must not modify or delete the tuple returned by
|
|
||||||
<I>SearchSysCache()</I>. What you <I>should</I> do is release it
|
|
||||||
with <I>ReleaseSysCache()</I> when you are done using it; this
|
|
||||||
informs the cache that it can discard that tuple if necessary. If
|
|
||||||
you neglect to call <I>ReleaseSysCache()</I>, then the cache entry
|
|
||||||
will remain locked in the cache until end of transaction, which is
|
|
||||||
tolerable but not very desirable.</P>
|
|
||||||
|
|
||||||
<P>If you can't use the system cache, you will need to retrieve the
|
|
||||||
data directly from the heap table, using the buffer cache that is
|
|
||||||
shared by all backends. The backend automatically takes care of
|
|
||||||
loading the rows into the buffer cache.</P>
|
|
||||||
|
|
||||||
<P>Open the table with <I>heap_open().</I> You can then start a
|
|
||||||
table scan with <I>heap_beginscan(),</I> then use
|
|
||||||
<I>heap_getnext()</I> and continue as long as
|
|
||||||
<I>HeapTupleIsValid()</I> returns true. Then do a
|
|
||||||
<I>heap_endscan().</I> <I>Keys</I> can be assigned to the
|
|
||||||
<I>scan.</I> No indexes are used, so all rows are going to be
|
|
||||||
compared to the keys, and only the valid rows returned.</P>
|
|
||||||
|
|
||||||
<P>You can also use <I>heap_fetch()</I> to fetch rows by block
|
|
||||||
number/offset. While scans automatically lock/unlock rows from the
|
|
||||||
buffer cache, with <I>heap_fetch(),</I> you must pass a
|
|
||||||
<I>Buffer</I> pointer, and <I>ReleaseBuffer()</I> it when
|
|
||||||
completed.</P>
|
|
||||||
|
|
||||||
<P>Once you have the row, you can get data that is common to all
|
|
||||||
tuples, like <I>t_self</I> and <I>t_oid,</I> by merely accessing
|
|
||||||
the <I>HeapTuple</I> structure entries. If you need a
|
|
||||||
table-specific column, you should take the HeapTuple pointer, and
|
|
||||||
use the <I>GETSTRUCT()</I> macro to access the table-specific start
|
|
||||||
of the tuple. You then cast the pointer as a <I>Form_pg_proc</I>
|
|
||||||
pointer if you are accessing the pg_proc table, or
|
|
||||||
<I>Form_pg_type</I> if you are accessing pg_type. You can then
|
|
||||||
access the columns by using a structure pointer:</P>
|
|
||||||
<PRE>
|
|
||||||
<CODE>((Form_pg_class) GETSTRUCT(tuple))->relnatts
|
|
||||||
</CODE>
|
|
||||||
</PRE>
|
|
||||||
You must not directly change <I>live</I> tuples in this way. The
|
|
||||||
best way is to use <I>heap_modifytuple()</I> and pass it your
|
|
||||||
original tuple, and the values you want changed. It returns a
|
|
||||||
palloc'ed tuple, which you pass to <I>heap_replace().</I> You can
|
|
||||||
delete tuples by passing the tuple's <I>t_self</I> to
|
|
||||||
<I>heap_destroy().</I> You use <I>t_self</I> for
|
|
||||||
<I>heap_update()</I> too. Remember, tuples can be either system
|
|
||||||
cache copies, which may go away after you call
|
|
||||||
<I>ReleaseSysCache()</I>, or read directly from disk buffers, which
|
|
||||||
go away when you <I>heap_getnext()</I>, <I>heap_endscan()</I>, or
|
|
||||||
<I>ReleaseBuffer()</I>, in the <I>heap_fetch()</I> case. Or it may
|
|
||||||
be a palloc'ed tuple, that you must <I>pfree()</I> when finished.
|
|
||||||
|
|
||||||
<H3><A name="10">10</A>) What is elog()?</H3>
|
|
||||||
|
|
||||||
<P><I>elog()</I> is used to send messages to the front-end, and
|
|
||||||
optionally terminate the current query being processed. The first
|
|
||||||
parameter is an elog level of <I>NOTICE,</I> <I>DEBUG,</I>
|
|
||||||
<I>ERROR,</I> or <I>FATAL.</I> <I>NOTICE</I> prints on the user's
|
|
||||||
terminal and the postmaster logs. <I>DEBUG</I> prints only in the
|
|
||||||
postmaster logs. <I>ERROR</I> prints in both places, and terminates
|
|
||||||
the current query, never returning from the call. <I>FATAL</I>
|
|
||||||
terminates the backend process. The remaining parameters of
|
|
||||||
<I>elog</I> are a <I>printf</I>-style set of parameters to
|
|
||||||
print.</P>
|
|
||||||
|
|
||||||
<H3><A name="11">11</A>) What is configure all about?</H3>
|
|
||||||
|
|
||||||
<P>The files <I>configure</I> and <I>configure.in</I> are part of
|
<P>The files <I>configure</I> and <I>configure.in</I> are part of
|
||||||
the GNU <I>autoconf</I> package. Configure allows us to test for
|
the GNU <I>autoconf</I> package. Configure allows us to test for
|
||||||
@ -497,7 +346,7 @@
|
|||||||
all files derived by configure are removed, so you see only the
|
all files derived by configure are removed, so you see only the
|
||||||
file contained in the source distribution.</P>
|
file contained in the source distribution.</P>
|
||||||
|
|
||||||
<H3><A name="12">12</A>) How do I add a new port?</H3>
|
<H3><A name="1.8">1.8</A>) How do I add a new port?</H3>
|
||||||
|
|
||||||
<P>There are a variety of places that need to be modified to add a
|
<P>There are a variety of places that need to be modified to add a
|
||||||
new port. First, start in the <I>src/template</I> directory. Add an
|
new port. First, start in the <I>src/template</I> directory. Add an
|
||||||
@ -516,20 +365,7 @@
|
|||||||
handling. There is a <I>backend/port</I> directory if you need
|
handling. There is a <I>backend/port</I> directory if you need
|
||||||
special files for your OS.</P>
|
special files for your OS.</P>
|
||||||
|
|
||||||
<H3><A name="13">13</A>) What is CommandCounterIncrement()?</H3>
|
<H3><A name="1.9">1.9</A>) Why don't we use threads in the
|
||||||
|
|
||||||
<P>Normally, transactions can not see the rows they modify. This
|
|
||||||
allows <CODE>UPDATE foo SET x = x + 1</CODE> to work correctly.</P>
|
|
||||||
|
|
||||||
<P>However, there are cases where a transaction needs to see rows
|
|
||||||
affected in previous parts of the transaction. This is accomplished
|
|
||||||
using a Command Counter. Incrementing the counter allows
|
|
||||||
transactions to be broken into pieces so each piece can see rows
|
|
||||||
modified by previous pieces. <I>CommandCounterIncrement()</I>
|
|
||||||
increments the Command Counter, creating a new part of the
|
|
||||||
transaction.</P>
|
|
||||||
|
|
||||||
<H3><A name="14">14</A>) Why don't we use threads in the
|
|
||||||
backend?</H3>
|
backend?</H3>
|
||||||
|
|
||||||
<P>There are several reasons threads are not used:</P>
|
<P>There are several reasons threads are not used:</P>
|
||||||
@ -545,7 +381,7 @@
|
|||||||
<LI>The backend code would be more complex.</LI>
|
<LI>The backend code would be more complex.</LI>
|
||||||
</UL>
|
</UL>
|
||||||
|
|
||||||
<H3><A name="15">15</A>) How are RPM's packaged?</H3>
|
<H3><A name="1.10">1.10</A>) How are RPM's packaged?</H3>
|
||||||
|
|
||||||
<P>This was written by Lamar Owen:</P>
|
<P>This was written by Lamar Owen:</P>
|
||||||
|
|
||||||
@ -650,7 +486,7 @@
|
|||||||
<P>Of course, there are many projects that DO include all the files
|
<P>Of course, there are many projects that DO include all the files
|
||||||
necessary to build RPMs from their Official Tarball (TM).</P>
|
necessary to build RPMs from their Official Tarball (TM).</P>
|
||||||
|
|
||||||
<H3><A name="16">16</A>) How are CVS branches managed?</H3>
|
<H3><A name="1.11">1.11</A>) How are CVS branches managed?</H3>
|
||||||
|
|
||||||
<P>This was written by Tom Lane:</P>
|
<P>This was written by Tom Lane:</P>
|
||||||
|
|
||||||
@ -720,70 +556,244 @@
|
|||||||
dot-release or two, so that we won't have to double-patch the first
|
dot-release or two, so that we won't have to double-patch the first
|
||||||
wave of fixes.</P>
|
wave of fixes.</P>
|
||||||
|
|
||||||
<H3><A name="17">17</A>) How do I get involved in PostgreSQL
|
<CENTER>
|
||||||
development?</H3>
|
<H2>Technical Questions</H2>
|
||||||
|
</CENTER>
|
||||||
|
|
||||||
<P>This was written by Lamar Owen:</P>
|
<H3><A name="2.1">2.1</A>) How do I efficiently access information in
|
||||||
|
tables from the backend code?</H3>
|
||||||
|
|
||||||
<P>2001-06-22</P>
|
<P>You first need to find the tuples (rows) you are interested in.
|
||||||
|
There are two ways. First, <I>SearchSysCache()</I> and related
|
||||||
|
functions allow you to query the system catalogs. This is the
|
||||||
|
preferred way to access system tables, because the first call to
|
||||||
|
the cache loads the needed rows, and future requests can return the
|
||||||
|
results without accessing the base table. The caches use system
|
||||||
|
table indexes to look up tuples. A list of available caches is
|
||||||
|
located in <I>src/backend/utils/cache/syscache.c.</I>
|
||||||
|
<I>src/backend/utils/cache/lsyscache.c</I> contains many
|
||||||
|
column-specific cache lookup functions.</P>
|
||||||
|
|
||||||
<B>What open source development process is used by the PostgreSQL
|
<P>The rows returned are cache-owned versions of the heap rows.
|
||||||
team?</B>
|
Therefore, you must not modify or delete the tuple returned by
|
||||||
|
<I>SearchSysCache()</I>. What you <I>should</I> do is release it
|
||||||
|
with <I>ReleaseSysCache()</I> when you are done using it; this
|
||||||
|
informs the cache that it can discard that tuple if necessary. If
|
||||||
|
you neglect to call <I>ReleaseSysCache()</I>, then the cache entry
|
||||||
|
will remain locked in the cache until end of transaction, which is
|
||||||
|
tolerable but not very desirable.</P>
|
||||||
|
|
||||||
<P>Read HACKERS for six months (or a full release cycle, whichever
|
<P>If you can't use the system cache, you will need to retrieve the
|
||||||
is longer). Really. HACKERS _is_ the process. The process is not
|
data directly from the heap table, using the buffer cache that is
|
||||||
well documented (AFAIK -- it may be somewhere that I am not aware
|
shared by all backends. The backend automatically takes care of
|
||||||
of) -- and it changes continually.</P>
|
loading the rows into the buffer cache.</P>
|
||||||
|
|
||||||
<B>What development environment (OS, system, compilers, etc) is
|
<P>Open the table with <I>heap_open().</I> You can then start a
|
||||||
required to develop code?</B>
|
table scan with <I>heap_beginscan(),</I> then use
|
||||||
|
<I>heap_getnext()</I> and continue as long as
|
||||||
|
<I>HeapTupleIsValid()</I> returns true. Then do a
|
||||||
|
<I>heap_endscan().</I> <I>Keys</I> can be assigned to the
|
||||||
|
<I>scan.</I> No indexes are used, so all rows are going to be
|
||||||
|
compared to the keys, and only the valid rows returned.</P>
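<P>The scan loop described above can be pictured with a stand-alone
model. This is only a sketch under stated assumptions: <I>ToyRow</I>,
<I>ToyScan</I>, <I>toy_beginscan()</I>, <I>toy_getnext()</I> and
<I>count_matches()</I> are hypothetical names invented for illustration
and are not the backend's heap access functions. It shows the same shape
of loop: every row is visited in order, compared to the key, and only
matching rows are handed back until the scan is exhausted.</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row type; real heap tuples are far richer than this. */
typedef struct ToyRow
{
    int  relnatts;
    char relkind;
} ToyRow;

/* Hypothetical scan state: a cursor over an in-memory array plus one key. */
typedef struct ToyScan
{
    const ToyRow *rows;
    int           nrows;
    int           pos;   /* index of the next row to examine */
    char          key;   /* only rows whose relkind equals key are returned */
} ToyScan;

static void
toy_beginscan(ToyScan *scan, const ToyRow *rows, int nrows, char key)
{
    assert(scan != NULL && nrows >= 0);
    scan->rows = rows;
    scan->nrows = nrows;
    scan->pos = 0;
    scan->key = key;
}

/* Return the next row matching the key, or NULL when the scan is done. */
static const ToyRow *
toy_getnext(ToyScan *scan)
{
    while (scan->pos < scan->nrows)
    {
        const ToyRow *row = &scan->rows[scan->pos++];

        if (row->relkind == scan->key)  /* no index: every row is tested */
            return row;
    }
    return NULL;
}

/* Typical caller: loop for as long as a valid row comes back. */
static int
count_matches(const ToyRow *rows, int nrows, char key)
{
    ToyScan       scan;
    const ToyRow *row;
    int           n = 0;

    toy_beginscan(&scan, rows, nrows, key);
    while ((row = toy_getnext(&scan)) != NULL)
        n++;
    return n;
}
```

<P>In the backend the loop condition would be the
<I>HeapTupleIsValid()</I> test mentioned above rather than a comparison
with NULL, and the scan would be closed explicitly when finished.</P>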
|
||||||
|
|
||||||
<P><A href="developers.postgresql.org">Developers Corner</A> on the
|
<P>You can also use <I>heap_fetch()</I> to fetch rows by block
|
||||||
website has links to this information. The distribution tarball
|
number/offset. While scans automatically lock/unlock rows from the
|
||||||
itself includes all the extra tools and documents that go beyond a
|
buffer cache, with <I>heap_fetch(),</I> you must pass a
|
||||||
good Unix-like development environment. In general, a modern unix
|
<I>Buffer</I> pointer, and <I>ReleaseBuffer()</I> it when
|
||||||
with a modern gcc, GNU make or equivalent, autoconf (of a
|
completed.</P>
|
||||||
particular version), and good working knowledge of those tools are
|
|
||||||
required.</P>
|
|
||||||
|
|
||||||
<B>What areas need support?</B>
|
<P>Once you have the row, you can get data that is common to all
|
||||||
|
tuples, like <I>t_self</I> and <I>t_oid,</I> by merely accessing
|
||||||
|
the <I>HeapTuple</I> structure entries. If you need a
|
||||||
|
table-specific column, you should take the HeapTuple pointer, and
|
||||||
|
use the <I>GETSTRUCT()</I> macro to access the table-specific start
|
||||||
|
of the tuple. You then cast the pointer as a <I>Form_pg_proc</I>
|
||||||
|
pointer if you are accessing the pg_proc table, or
|
||||||
|
<I>Form_pg_type</I> if you are accessing pg_type. You can then
|
||||||
|
access the columns by using a structure pointer:</P>
|
||||||
|
<PRE>
|
||||||
|
<CODE>((Form_pg_class) GETSTRUCT(tuple))->relnatts
|
||||||
|
</CODE>
|
||||||
|
</PRE>
|
||||||
|
You must not directly change <I>live</I> tuples in this way. The
|
||||||
|
best way is to use <I>heap_modifytuple()</I> and pass it your
|
||||||
|
original tuple, and the values you want changed. It returns a
|
||||||
|
palloc'ed tuple, which you pass to <I>heap_replace().</I> You can
|
||||||
|
delete tuples by passing the tuple's <I>t_self</I> to
|
||||||
|
<I>heap_destroy().</I> You use <I>t_self</I> for
|
||||||
|
<I>heap_update()</I> too. Remember, tuples can be either system
|
||||||
|
cache copies, which may go away after you call
|
||||||
|
<I>ReleaseSysCache()</I>, or read directly from disk buffers, which
|
||||||
|
go away when you <I>heap_getnext()</I>, <I>heap_endscan()</I>, or
|
||||||
|
<I>ReleaseBuffer()</I>, in the <I>heap_fetch()</I> case. Or it may
|
||||||
|
be a palloc'ed tuple, that you must <I>pfree()</I> when finished.
|
||||||
|
|
||||||
<P>The TODO list.</P>
|
<H3><A name="2.2">2.2</A>) Why are table, column, type, function, view
|
||||||
|
names sometimes referenced as <I>Name</I> or <I>NameData,</I> and
|
||||||
|
sometimes as <I>char *?</I></H3>
|
||||||
|
|
||||||
<P>You've made the first step, by finding and subscribing to
|
<P>Table, column, type, function, and view names are stored in
|
||||||
HACKERS. Once you find an area to look at in the TODO, and have
|
system tables in columns of type <I>Name.</I> Name is a
|
||||||
read the documentation on the internals, etc, then you check out a
|
fixed-length, null-terminated type of <I>NAMEDATALEN</I> bytes.
|
||||||
current CVS, write what you are going to write (keeping your CVS
|
(The default value for NAMEDATALEN is 32 bytes.)</P>
|
||||||
checkout up to date in the process), and make up a patch (as a
|
<PRE>
|
||||||
context diff only) and send to the PATCHES list, preferably.</P>
|
<CODE>typedef struct nameData
|
||||||
|
{
|
||||||
|
char data[NAMEDATALEN];
|
||||||
|
} NameData;
|
||||||
|
typedef NameData *Name;
|
||||||
|
</CODE>
|
||||||
|
</PRE>
|
||||||
|
Table, column, type, function, and view names that come into the
|
||||||
|
backend via user queries are stored as variable-length,
|
||||||
|
null-terminated character strings.
|
||||||
|
|
||||||
<P>Discussion on the patch typically happens here. If the patch
|
<P>Many functions are called with both types of names, e.g.
|
||||||
adds a major feature, it would be a good idea to talk about it
|
<I>heap_open().</I> Because the Name type is null-terminated, it is
|
||||||
first on the HACKERS list, in order to increase the chances of it
|
safe to pass it to a function expecting a char *. Because there are
|
||||||
being accepted, as well as to avoid duplication of effort. Note that
|
many cases where on-disk names (Name) are compared to user-supplied
|
||||||
experienced developers with a proven track record usually get the
|
names (char *), there are many cases where Name and char * are used
|
||||||
big jobs -- for more than one reason. Also note that PostgreSQL is
|
interchangeably.</P>
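<P>To make the interchangeability concrete, here is a small stand-alone
sketch. It repeats the <I>NameData</I> definition shown above with the
default length of 32; <I>name_set()</I> and <I>name_matches()</I> are
hypothetical helpers written for this example, not backend functions.
Because the structure holds only the character array, and that array is
always null-terminated, a pointer to it can be read as an ordinary C
string.</P>

```c
#include <assert.h>
#include <string.h>

#define NAMEDATALEN 32          /* default value quoted in the text */

typedef struct nameData
{
    char data[NAMEDATALEN];
} NameData;
typedef NameData *Name;

/*
 * Hypothetical helper: store a C string in a Name, truncating it to
 * NAMEDATALEN - 1 characters and always leaving a terminating null.
 */
static void
name_set(Name name, const char *str)
{
    assert(name != NULL && str != NULL);
    strncpy(name->data, str, NAMEDATALEN - 1);  /* zero-pads short input */
    name->data[NAMEDATALEN - 1] = '\0';
}

/*
 * Hypothetical helper: compare a fixed-length Name with a
 * variable-length, user-supplied string. Returns nonzero on a match.
 */
static int
name_matches(const NameData *name, const char *str)
{
    return strncmp(name->data, str, NAMEDATALEN) == 0;
}
```

<P>A caller holding a <I>Name</I> can therefore hand it to code that
expects a <I>char *</I>; going the other way requires copying, since an
arbitrary string may be longer than the fixed-size buffer.</P>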
|
||||||
highly portable -- nonportable code will likely be dismissed out of
|
|
||||||
hand.</P>
|
|
||||||
|
|
||||||
<P>Once your contributions get accepted, things move from there.
|
<H3><A name="2.3">2.3</A>) Why do we use <I>Node</I> and <I>List</I> to
|
||||||
Typically, you would be added as a developer on the list on the
|
make data structures?</H3>
|
||||||
website when one of the other developers recommends it. Membership
|
|
||||||
on the steering committee is by invitation only, by the other
|
|
||||||
steering committee members, from what I have gathered watching
|
|
||||||
from a distance.</P>
|
|
||||||
|
|
||||||
<P>I make these statements from having watched the process for over
|
<P>We do this because this allows a consistent way to pass data
|
||||||
two years.</P>
|
inside the backend in a flexible way. Every node has a
|
||||||
|
<I>NodeTag</I> which specifies what type of data is inside the
|
||||||
|
Node. <I>Lists</I> are groups of <I>Nodes</I> chained together as a
|
||||||
|
forward-linked list.</P>
|
||||||
|
|
||||||
|
<P>Here are some of the <I>List</I> manipulation commands:</P>
|
||||||
|
|
||||||
|
<BLOCKQUOTE>
|
||||||
|
<DL>
|
||||||
|
<DT>lfirst(i)</DT>
|
||||||
|
|
||||||
|
<DD>return the data at list element <I>i.</I></DD>
|
||||||
|
|
||||||
|
<DT>lnext(i)</DT>
|
||||||
|
|
||||||
|
<DD>return the next list element after <I>i.</I></DD>
|
||||||
|
|
||||||
|
<DT>foreach(i, list)</DT>
|
||||||
|
|
||||||
|
<DD>
|
||||||
|
loop through <I>list,</I> assigning each list element to
|
||||||
|
<I>i.</I> It is important to note that <I>i</I> is a List *,
|
||||||
|
not the data in the <I>List</I> element. You need to use
|
||||||
|
<I>lfirst(i)</I> to get at the data. Here is a typical code
|
||||||
|
snippet that loops through a List containing <I>Var *'s</I>
|
||||||
|
and processes each one:
|
||||||
|
<PRE>
|
||||||
|
<CODE>List *i, *list;
|
||||||
|
|
||||||
|
foreach(i, list)
|
||||||
|
{
|
||||||
|
Var *var = lfirst(i);
|
||||||
|
|
||||||
|
/* process var here */
|
||||||
|
}
|
||||||
|
</CODE>
|
||||||
|
</PRE>
|
||||||
|
</DD>
|
||||||
|
|
||||||
|
<DT>lcons(node, list)</DT>
|
||||||
|
|
||||||
|
<DD>add <I>node</I> to the front of <I>list,</I> or create a
|
||||||
|
new list with <I>node</I> if <I>list</I> is <I>NIL.</I></DD>
|
||||||
|
|
||||||
|
<DT>lappend(list, node)</DT>
|
||||||
|
|
||||||
|
<DD>add <I>node</I> to the end of <I>list.</I> This is more
|
||||||
|
expensive than lcons.</DD>
|
||||||
|
|
||||||
|
<DT>nconc(list1, list2)</DT>
|
||||||
|
|
||||||
|
<DD>Concat <I>list2</I> on to the end of <I>list1.</I></DD>
|
||||||
|
|
||||||
|
<DT>length(list)</DT>
|
||||||
|
|
||||||
|
<DD>return the length of the <I>list.</I></DD>
|
||||||
|
|
||||||
|
<DT>nth(i, list)</DT>
|
||||||
|
|
||||||
|
<DD>return the <I>i</I>'th element in <I>list.</I></DD>
|
||||||
|
|
||||||
|
<DT>lconsi, ...</DT>
|
||||||
|
|
||||||
|
<DD>There are integer versions of these: <I>lconsi, lappendi,
|
||||||
|
nthi.</I> <I>Lists</I> containing integers instead of Node
|
||||||
|
pointers are used to hold lists of relation object IDs and
|
||||||
|
other integer quantities.</DD>
|
||||||
|
</DL>
|
||||||
|
</BLOCKQUOTE>
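<P>The list operations above can be tried outside the backend with a
small stand-alone model. The sketch below is a simplification and not
the backend's own definitions: it assumes a plain cons cell, uses
<I>malloc()</I> where the backend would use <I>palloc()</I>, stores
<I>void *</I> data instead of Nodes, and <I>sum_ints()</I> is a
hypothetical helper added to show a <I>foreach</I> loop.</P>

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a cons-cell list; each List is one cell. */
typedef struct List
{
    void        *elem;   /* data pointer stored in this cell */
    struct List *next;   /* next cell, or NIL at the end */
} List;

#define NIL            ((List *) NULL)
#define lfirst(l)      ((l)->elem)
#define lnext(l)       ((l)->next)
#define foreach(i, l)  for ((i) = (l); (i) != NIL; (i) = lnext(i))

/* Add datum to the front of list; constant time. */
static List *
lcons(void *datum, List *list)
{
    List *cell = malloc(sizeof(List));

    assert(cell != NULL);
    cell->elem = datum;
    cell->next = list;
    return cell;
}

/* Add datum to the end; walks the whole list, so it costs more than lcons. */
static List *
lappend(List *list, void *datum)
{
    List *cell = lcons(datum, NIL);
    List *i;

    if (list == NIL)
        return cell;
    for (i = list; lnext(i) != NIL; i = lnext(i))
        ;
    lnext(i) = cell;
    return list;
}

/* Count the cells in list. */
static int
length(List *list)
{
    List *i;
    int   n = 0;

    foreach(i, list)
        n++;
    return n;
}

/* Hypothetical helper: i is a cell, lfirst(i) is the data it carries. */
static int
sum_ints(List *list)
{
    List *i;
    int   total = 0;

    foreach(i, list)
        total += *(int *) lfirst(i);
    return total;
}
```

<P>The model makes the cost difference visible: <I>lcons()</I> touches
one cell, while <I>lappend()</I> has to find the last cell first.</P>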
|
||||||
|
You can print nodes easily inside <I>gdb.</I> First, to disable
|
||||||
|
output truncation when you use the gdb <I>print</I> command:
|
||||||
|
<PRE>
|
||||||
|
<CODE>(gdb) set print elements 0
|
||||||
|
</CODE>
|
||||||
|
</PRE>
|
||||||
|
Instead of printing values in gdb format, you can use the next two
|
||||||
|
commands to print out List, Node, and structure contents in a
|
||||||
|
verbose format that is easier to understand. Lists are unrolled
|
||||||
|
into nodes, and nodes are printed in detail. The first prints in a
|
||||||
|
short format, and the second in a long format:
|
||||||
|
<PRE>
|
||||||
|
<CODE>(gdb) call print(any_pointer)
|
||||||
|
(gdb) call pprint(any_pointer)
|
||||||
|
</CODE>
|
||||||
|
</PRE>
|
||||||
|
The output appears in the postmaster log file, or on your screen if
|
||||||
|
you are running a backend directly without a postmaster.
|
||||||
|
|
||||||
|
<H3><A name="2.4">2.4</A>) I just added a field to a structure. What
|
||||||
|
else should I do?</H3>
|
||||||
|
|
||||||
|
<P>The structures passed around among the parser, rewriter,
|
||||||
|
optimizer, and executor require quite a bit of support. Most
|
||||||
|
structures have support routines in <I>src/backend/nodes</I> used
|
||||||
|
to create, copy, read, and output those structures. Make sure you
|
||||||
|
add support for your new field to these files. Find any other
|
||||||
|
places the structure may need code for your new field. <I>mkid</I>
|
||||||
|
is helpful with this (see above).</P>
|
||||||
|
|
||||||
|
<H3><A name="2.5">2.5</A>) Why do we use <I>palloc</I>() and
|
||||||
|
<I>pfree</I>() to allocate memory?</H3>
|
||||||
|
|
||||||
|
<P><I>palloc()</I> and <I>pfree()</I> are used in place of malloc()
|
||||||
|
and free() because we automatically free all memory allocated when
|
||||||
|
a transaction completes. This makes it easier to make sure we free
|
||||||
|
memory that gets allocated in one place, but only freed much later.
|
||||||
|
There are several contexts that memory can be allocated in, and
|
||||||
|
this controls when the allocated memory is automatically freed by
|
||||||
|
the backend.</P>
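<P>The idea of freeing everything at once can be illustrated with a toy
allocator. This is a sketch under stated assumptions, not the backend's
memory manager: <I>MemContext</I>, <I>ctx_alloc()</I> and
<I>ctx_reset()</I> are hypothetical names, and the model omits freeing
individual chunks early. Each context remembers every chunk it hands
out, so a single reset releases them all, which is the role the end of
a transaction plays in the description above.</P>

```c
#include <assert.h>
#include <stdlib.h>

/* Header placed in front of every allocation. */
typedef union Chunk
{
    union Chunk *next;     /* link to the previously allocated chunk */
    long double  align_;   /* forces generous alignment for the payload */
} Chunk;

/* Hypothetical context: just the list of chunks that belong to it. */
typedef struct MemContext
{
    Chunk *chunks;
    int    nchunks;
} MemContext;

/* Allocate size bytes owned by ctx; the caller never frees them itself. */
static void *
ctx_alloc(MemContext *ctx, size_t size)
{
    Chunk *c = malloc(sizeof(Chunk) + size);

    assert(c != NULL);
    c->next = ctx->chunks;
    ctx->chunks = c;
    ctx->nchunks++;
    return c + 1;          /* payload starts right after the header */
}

/* Release every chunk owned by ctx; returns how many were freed. */
static int
ctx_reset(MemContext *ctx)
{
    int freed = 0;

    while (ctx->chunks != NULL)
    {
        Chunk *next = ctx->chunks->next;

        free(ctx->chunks);
        ctx->chunks = next;
        freed++;
    }
    ctx->nchunks = 0;
    return freed;
}
```

<P>Having several such contexts with different lifetimes is what lets
the choice of context decide when memory is reclaimed.</P>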
|
||||||
|
|
||||||
|
<H3><A name="2.6">2.6</A>) What is elog()?</H3>
|
||||||
|
|
||||||
|
<P><I>elog()</I> is used to send messages to the front-end, and
|
||||||
|
optionally terminate the current query being processed. The first
|
||||||
|
parameter is an elog level of <I>NOTICE,</I> <I>DEBUG,</I>
|
||||||
|
<I>ERROR,</I> or <I>FATAL.</I> <I>NOTICE</I> prints on the user's
|
||||||
|
terminal and the postmaster logs. <I>DEBUG</I> prints only in the
|
||||||
|
postmaster logs. <I>ERROR</I> prints in both places, and terminates
|
||||||
|
the current query, never returning from the call. <I>FATAL</I>
|
||||||
|
terminates the backend process. The remaining parameters of
|
||||||
|
<I>elog</I> are a <I>printf</I>-style set of parameters to
|
||||||
|
print.</P>
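<P>The behaviour of the levels can be modelled in a few lines of
standard C. The sketch below is an illustration only:
<I>toy_elog()</I>, the <I>TOY_*</I> levels, <I>last_message</I> and
<I>run_query()</I> are hypothetical, and <I>setjmp()/longjmp()</I> is
used here simply as one way to show a call that never returns to its
caller. It writes every message to stderr, so it does not model the
difference between the user's terminal and the postmaster log.</P>

```c
#include <assert.h>
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

enum { TOY_DEBUG, TOY_NOTICE, TOY_ERROR, TOY_FATAL };

static jmp_buf query_restart;       /* where control resumes after TOY_ERROR */
static char    last_message[256];   /* copy of the most recent message */

/* Format a printf-style message, then act according to the level. */
static void
toy_elog(int level, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(last_message, sizeof(last_message), fmt, ap);
    va_end(ap);
    fprintf(stderr, "%s\n", last_message);

    if (level == TOY_ERROR)
        longjmp(query_restart, 1);  /* abandon the current "query" */
    if (level == TOY_FATAL)
        exit(1);                    /* end the whole process */
}

/* Returns 0 if the "query" completed, 1 if an error aborted it. */
static int
run_query(int divisor)
{
    if (setjmp(query_restart) != 0)
        return 1;                   /* reached only via longjmp */

    toy_elog(TOY_NOTICE, "dividing by %d", divisor);
    if (divisor == 0)
        toy_elog(TOY_ERROR, "division by zero");
    return 0;                       /* skipped when the error fires */
}
```

<P>The point to take away is that code following an error-level call is
never executed, so no cleanup can be placed after it.</P>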
|
||||||
|
|
||||||
|
<H3><A name="2.7">2.7</A>) What is CommandCounterIncrement()?</H3>
|
||||||
|
|
||||||
|
<P>Normally, transactions can not see the rows they modify. This
|
||||||
|
allows <CODE>UPDATE foo SET x = x + 1</CODE> to work correctly.</P>
|
||||||
|
|
||||||
|
<P>However, there are cases where a transaction needs to see rows
|
||||||
|
affected in previous parts of the transaction. This is accomplished
|
||||||
|
using a Command Counter. Incrementing the counter allows
|
||||||
|
transactions to be broken into pieces so each piece can see rows
|
||||||
|
modified by previous pieces. <I>CommandCounterIncrement()</I>
|
||||||
|
increments the Command Counter, creating a new part of the
|
||||||
|
transaction.</P>
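<P>A toy model may help to picture this. In the sketch below, which is
an assumption-laden illustration and not the backend's visibility code,
each row is stamped with the value of a counter at the time it was
written, and a row is visible only if it was written by an earlier part
of the transaction. <I>Row</I>, <I>insert_row()</I>,
<I>count_visible()</I> and <I>command_counter_increment()</I> are
hypothetical names.</P>

```c
#include <assert.h>

#define MAX_ROWS 8

typedef struct Row
{
    int value;
    int cmin;   /* counter value when this row was written */
} Row;

static Row rows[MAX_ROWS];
static int nrows = 0;
static int current_command = 0;

/* Write a row, stamping it with the current counter value. */
static void
insert_row(int value)
{
    assert(nrows < MAX_ROWS);
    rows[nrows].value = value;
    rows[nrows].cmin = current_command;
    nrows++;
}

/* Count rows written by earlier parts of the transaction only. */
static int
count_visible(void)
{
    int i;
    int n = 0;

    for (i = 0; i < nrows; i++)
        if (rows[i].cmin < current_command)
            n++;
    return n;
}

/* Start a new part of the transaction; earlier writes become visible. */
static void
command_counter_increment(void)
{
    current_command++;
}
```

<P>Rows written in the current part stay invisible until the counter is
incremented, which is why a statement does not trip over its own
changes.</P>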
|
||||||
|
|
||||||
<P>To see a good example of how one goes about this, search the
|
|
||||||
archives for the name 'Tom Lane' and see what his first post
|
|
||||||
consisted of, and where he took things. In particular, note that
|
|
||||||
this hasn't been _that_ long ago -- and his bugfixing and general
|
|
||||||
deep knowledge with this codebase is legendary. Take a few days to
|
|
||||||
read after him. And pay special attention to both the sheer
|
|
||||||
quantity as well as the painstaking quality of his work. Both are
|
|
||||||
in high demand.</P>
|
|
||||||
</BODY>
|
</BODY>
|
||||||
</HTML>
|
</HTML>
|
||||||
|
|
||||||
|