/** \mainpage NetSurf Documentation for Developers

This document contains an overview of the code for NetSurf, and any other
information useful to developers.

\section overview Source Code Overview

The source is split at top level as follows:
- \ref content
- \ref css
- \ref render
- Non-platform specific front-end (desktop/)
- RISC OS specific code (riscos/)
- Unix debug build specific code (debug/)
- Misc. useful functions (utils/)

\section content Fetching, caching, and converting content (content/)

The data for each URL is stored in a struct ::content. This structure contains
a union with fields for each type of data (HTML, CSS, images).

The content_* functions provide a general interface for handling these
structures. A content of a specified type is created using content_create(),
data is fed to it using content_process_data(), and the stream is terminated
by a call to content_convert(), which converts the content into a structure
which can be displayed easily.

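The create, feed, convert sequence can be sketched with a toy model. Everything
in it is hypothetical: struct toy_content and the toy_* functions are invented
for illustration and do not reproduce the real struct ::content or the
signatures of the content_* functions. The "conversion" here only counts
markers in the source.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-ins for illustration only; not NetSurf's real types.
typedef enum { TOY_HTML, TOY_CSS } toy_type;
typedef enum { TOY_LOADING, TOY_DONE } toy_status;

struct toy_content {
	toy_type type;
	toy_status status;
	char *source;           // raw bytes received so far
	size_t length;
	union {                 // one member per content type
		struct { size_t tag_count; } html;
		struct { size_t rule_count; } css;
	} data;
};

// Create an empty content of the given type; returns 0 on memory exhaustion.
struct toy_content *toy_content_create(toy_type type)
{
	struct toy_content *c = calloc(1, sizeof *c);
	if (!c)
		return 0;
	c->type = type;
	c->status = TOY_LOADING;
	return c;
}

// Append one chunk of fetched data; returns false on memory exhaustion.
bool toy_content_process_data(struct toy_content *c, const char *chunk,
		size_t size)
{
	char *grown = realloc(c->source, c->length + size + 1);
	if (!grown)
		return false;
	memcpy(grown + c->length, chunk, size);
	c->source = grown;
	c->length += size;
	c->source[c->length] = '\0';
	return true;
}

// Turn the accumulated source into a "displayable" form (here, just a count).
void toy_content_convert(struct toy_content *c)
{
	size_t i, n = 0;
	char marker = (c->type == TOY_HTML) ? '<' : '{';

	for (i = 0; i < c->length; i++)
		if (c->source[i] == marker)
			n++;
	if (c->type == TOY_HTML)
		c->data.html.tag_count = n;
	else
		c->data.css.rule_count = n;
	c->status = TOY_DONE;
}

void toy_content_destroy(struct toy_content *c)
{
	free(c->source);
	free(c);
}
```

A caller would create a content, call toy_content_process_data() once per
chunk as data arrives, and call toy_content_convert() once after the last
chunk. Only then are the members of the union meaningful.
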
The cache stores this converted content. When content is retrieved from the
cache, content_revive() should result in content which can be displayed (e.g. by
loading any images and styles required and updating pointers to them).

Code should not usually use the fetch_* and cache_* functions directly.
Instead use fetchcache(), which checks the cache for a URL and
fetches, converts, and caches it if not present.

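The check-then-fetch behaviour can be sketched as below. This is a toy under
stated assumptions, not the real fetchcache(): the names toy_fetchcache(),
toy_fetch_and_convert() and the fixed-size table are all hypothetical, the
"fetch" fabricates a string locally, and the call is synchronous, which says
nothing about how the real function delivers its result.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_CACHE_SIZE 8

// Hypothetical cache entry: a URL and its already converted content.
struct toy_entry {
	char *url;
	char *converted;
};

static struct toy_entry toy_cache[TOY_CACHE_SIZE];
static size_t toy_cache_used = 0;
static size_t toy_next_victim = 0;
unsigned toy_fetch_count = 0;      // number of simulated fetches so far

// Copy a string; returns 0 on memory exhaustion.
static char *toy_copy(const char *s)
{
	size_t size = strlen(s) + 1;
	char *copy = malloc(size);
	if (copy)
		memcpy(copy, s, size);
	return copy;
}

// Stand-in for fetching and converting: fabricates a string locally.
static char *toy_fetch_and_convert(const char *url)
{
	static const char prefix[] = "converted:";
	char *out = malloc(sizeof prefix + strlen(url));
	if (!out)
		return 0;
	strcpy(out, prefix);
	strcat(out, url);
	toy_fetch_count++;
	return out;
}

// Return converted content for url, fetching only on a cache miss.
// Returns 0 on memory exhaustion. The cache owns the result, which stays
// valid until the entry is evicted.
const char *toy_fetchcache(const char *url)
{
	size_t i;
	char *key, *converted;

	for (i = 0; i < toy_cache_used; i++)
		if (strcmp(toy_cache[i].url, url) == 0)
			return toy_cache[i].converted;   // cache hit

	key = toy_copy(url);
	converted = key ? toy_fetch_and_convert(url) : 0;
	if (!converted) {
		free(key);
		return 0;
	}

	if (toy_cache_used < TOY_CACHE_SIZE) {
		i = toy_cache_used++;
	} else {
		// Table full: evict entries in round-robin order.
		i = toy_next_victim;
		toy_next_victim = (toy_next_victim + 1) % TOY_CACHE_SIZE;
		free(toy_cache[i].url);
		free(toy_cache[i].converted);
	}
	toy_cache[i].url = key;
	toy_cache[i].converted = converted;
	return converted;
}
```

Asking twice for the same URL performs one simulated fetch and returns the same
pointer both times, which is the property callers rely on when they go through
a single entry point rather than the lower-level functions.
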
\section css CSS parser and interfaces (css/)

CSS is tokenised by a re2c-generated scanner (scanner.l), and then parsed into a
memory representation by a lemon-generated parser (parser.y, ruleset.c).

Styles are retrieved using css_get_style(). They can be cascaded by
css_cascade().

- http://re2c.sourceforge.net/
- http://www.hwaci.com/sw/lemon/

\section render HTML processing and layout (render/)

This is the process to render an HTML document:

First the HTML is parsed to a tree of xmlNodes using the HTML parser in libxml.
This happens simultaneously with the fetch [html_process_data()].

Any stylesheets which the document depends on are fetched and parsed.

The tree is converted to a 'box tree' by xml_to_box(). The box tree contains a
node for each block, inline element, table, etc. The aim of this stage is to
determine the 'display' or 'float' CSS property of each element, and create the
corresponding node in the box tree. At this stage the style for each element is
also calculated (from CSS rules and element attributes). The tree is normalised
so that each node only has children of permitted types (e.g. TABLE_CELLs must be
within TABLE_ROWs) by adding missing boxes.

The box tree is passed to the layout engine [layout_document()], which finds the
space required by each element and assigns coordinates to the boxes, based on
the style of each element and the available width. This includes formatting
inline elements into lines, laying out tables, and positioning floats. The
layout engine can be invoked again on an already laid out box tree to reformat
it to a new width. Coordinates in the box tree are relative to the position of
the parent node.

The box tree can then be rendered using each node's coordinates.

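Because coordinates are stored relative to the parent, a renderer has to
accumulate offsets while it walks the tree. The sketch below shows that walk.
It assumes a minimal, hypothetical struct toy_box with first-child and
next-sibling pointers; the real box structure has many more fields, and
toy_box_walk() and toy_measure() are invented names.

```c
#include <stddef.h>

// Hypothetical, minimal box for illustration only.
struct toy_box {
	int x, y;                   // position relative to the parent box
	int width, height;
	struct toy_box *children;   // first child, or 0
	struct toy_box *next;       // next sibling, or 0
};

// Callback invoked once per box with its absolute position.
typedef void (*toy_plot_fn)(const struct toy_box *box, int abs_x, int abs_y,
		void *ctx);

// Walk the tree depth first, adding each parent's absolute position to the
// child's relative position.
void toy_box_walk(const struct toy_box *box, int parent_x, int parent_y,
		toy_plot_fn plot, void *ctx)
{
	const struct toy_box *child;
	int abs_x = parent_x + box->x;
	int abs_y = parent_y + box->y;

	plot(box, abs_x, abs_y, ctx);
	for (child = box->children; child; child = child->next)
		toy_box_walk(child, abs_x, abs_y, plot, ctx);
}

// Example callback: tracks the furthest right and bottom edges seen.
struct toy_extent {
	int max_x, max_y;
};

void toy_measure(const struct toy_box *box, int abs_x, int abs_y, void *ctx)
{
	struct toy_extent *extent = ctx;

	if (abs_x + box->width > extent->max_x)
		extent->max_x = abs_x + box->width;
	if (abs_y + box->height > extent->max_y)
		extent->max_y = abs_y + box->height;
}
```

A redraw routine would pass a callback that plots the box instead of measuring
it. Storing relative coordinates means that moving a box moves its whole
subtree without touching any descendant.
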
\section links Other Documentation

RISC OS specific protocols:
- Plugin http://www.ecs.soton.ac.uk/~jmb202/riscos/acorn/funcspec.html
  http://www.ecs.soton.ac.uk/~jmb202/riscos/acorn/browse-plugins.html
- URI http://www.ecs.soton.ac.uk/~jmb202/riscos/acorn/uri.html
- URL http://www.vigay.com/inet/inet_url.html
- Nested WIMP http://www.ecs.soton.ac.uk/~jmb202/riscos/acorn/nested.html

Specifications:
- HTML 4.01 http://www.w3.org/TR/html401/
  (see also http://www.w3.org/MarkUp/)
- XHTML 1.0 http://www.w3.org/TR/xhtml1/
- CSS 2.1 http://www.w3.org/TR/CSS21/
- HTTP/1.1 http://www.w3.org/Protocols/rfc2616/rfc2616.html
  and errata http://purl.org/NET/http-errata
  (see also http://www.w3.org/Protocols/)
- HTTP Authentication http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2617.html
- PNG http://www.w3.org/Graphics/PNG/
- URI http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2396.html
  (see also http://www.w3.org/Addressing/ and RFC 2616)
- Cookies http://wp.netscape.com/newsref/std/cookie_spec.html and
  http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2109.html

\section libs Libraries

Versions of these compiled for RISC OS, with headers, are available from
http://netsurf.strcprstskrzkrk.co.uk/developer/
- libxml (XML and HTML parser) http://www.xmlsoft.org/
- libcurl (HTTP, FTP, etc) http://curl.haxx.se/libcurl/
- OSLib (C interface to RISC OS SWIs) http://ro-oslib.sourceforge.net/
- libpng (PNG support) http://www.libpng.org/pub/png/libpng.html
- libjpeg (JPEG support) http://www.ijg.org/
- zlib http://www.gzip.org/zlib/
- OpenSSL (HTTPS support) http://www.openssl.org/

\section addcss Implementing a new CSS property

In this section I go through adding a CSS property to NetSurf, using the
'white-space' property as an example. -- James Bursa

1. Read and understand the description of the property in the CSS specification
   (I have worked from CSS 2, but now 2.1 is probably better).

These changes are required in the css directory:

2. Add the property to css_enums. This file is used to generate css_enum.h
   and css_enum.c:
   \code css_white_space inherit normal nowrap pre \endcode
   (I'm not doing pre-wrap and pre-line for now.)

3. Add fields to struct ::css_style to represent the property:
   \code css_white_space white_space; \endcode

4. Add a parser function for the property to ruleset.c. Declare a new
   function:
   \code static void parse_white_space(struct css_style * const s, const struct css_node * const v); \endcode
   and add it to ::property_table:
   \code { "white-space", parse_white_space }, \endcode
   This will cause the function to be called when the parser comes to a rule
   giving a value for white-space. The function is passed a linked list of
   struct ::css_node, each of which corresponds to a token in the CSS source,
   and must update s to correspond to that rule. For white-space, the
   implementation is simply:
   \code
void parse_white_space(struct css_style * const s, const struct css_node * const v)
{
	css_white_space z;
	// the value must be exactly one identifier
	if (v->type != CSS_NODE_IDENT || v->next != 0)
		return;
	z = css_white_space_parse(v->data);
	// unrecognised keywords leave the style unchanged
	if (z != CSS_WHITE_SPACE_UNKNOWN)
		s->white_space = z;
}
   \endcode
   First we check that the value consists of exactly one identifier, as
   described in the specification. If it does not, we ignore it, since it may
   be some future CSS. The css_white_space_parse() function is generated in
   css_enum.c, and converts a string giving a value to a constant. If the
   conversion succeeds, the style s is updated.

5. Add defaults for the style to ::css_base_style, ::css_empty_style, and
   ::css_blank_style in css.c. The value in css_base_style should be the one
   given as 'Initial' in the spec, and the value in css_empty_style should be
   inherit. If 'Inherited' is yes in the spec, the value in css_blank_style
   should be inherit, otherwise it should be the one given as 'Initial'. Thus
   for white-space, which has "Initial: normal, Inherited: yes" in the spec, we
   use CSS_WHITE_SPACE_NORMAL in css_base_style and CSS_WHITE_SPACE_INHERIT in
   the other two.

6. Edit css_cascade() and css_merge() in css.c to handle the property. In
   both cases for white-space this looks like:
   \code
if (apply->white_space != CSS_WHITE_SPACE_INHERIT)
	style->white_space = apply->white_space;
   \endcode
   Add the property to css_dump_style() (not essential).

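The css directory changes in steps 2 to 6 can be tried in isolation with a
miniature, self-contained version. All toy_* names below are hypothetical:
they imitate the shape of the code described above but are not the output of
css_enums, and the parser here takes a plain string rather than a list of
struct ::css_node. The keyword comparison is case-sensitive, which is a
simplification of this sketch. It also assumes that the style passed to the
cascade starts out holding the value carried down from the parent.

```c
#include <string.h>

// Hypothetical miniature of an enum and parse function for white-space.
typedef enum {
	TOY_WHITE_SPACE_INHERIT,
	TOY_WHITE_SPACE_NORMAL,
	TOY_WHITE_SPACE_NOWRAP,
	TOY_WHITE_SPACE_PRE,
	TOY_WHITE_SPACE_UNKNOWN
} toy_white_space;

static const char * const toy_white_space_name[] = {
	"inherit", "normal", "nowrap", "pre"
};

// Convert a keyword to a constant, or UNKNOWN if it is not recognised.
toy_white_space toy_white_space_parse(const char *s)
{
	int i;

	for (i = 0; i < TOY_WHITE_SPACE_UNKNOWN; i++)
		if (strcmp(s, toy_white_space_name[i]) == 0)
			return (toy_white_space) i;
	return TOY_WHITE_SPACE_UNKNOWN;
}

struct toy_style {
	toy_white_space white_space;
};

// Defaults as in step 5, for "Initial: normal, Inherited: yes".
const struct toy_style toy_base_style = { TOY_WHITE_SPACE_NORMAL };
const struct toy_style toy_blank_style = { TOY_WHITE_SPACE_INHERIT };

// Same shape as step 4: an unrecognised keyword leaves the style alone.
void toy_parse_white_space(struct toy_style *s, const char *ident)
{
	toy_white_space z = toy_white_space_parse(ident);

	if (z != TOY_WHITE_SPACE_UNKNOWN)
		s->white_space = z;
}

// Same shape as step 6: 'inherit' keeps the value already in style.
void toy_cascade(struct toy_style *style, const struct toy_style *apply)
{
	if (apply->white_space != TOY_WHITE_SPACE_INHERIT)
		style->white_space = apply->white_space;
}
```

With a parent set to pre and a child left at the blank default, the cascade
leaves pre in place. A child that sets nowrap overrides it. An unsupported
keyword such as pre-wrap is ignored by the parser, so the child keeps its
previous value.
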
Now the box, layout and/or redraw code needs to be changed to use the new
style property. What has to change here varies much more from property to
property.

For white-space, convert_xml_to_box() was changed to split text at newlines if
white-space was pre, and to replace spaces with hard spaces for nowrap.
Additionally, calculate_inline_container_widths() was changed to give the
appropriate minimum width for pre and nowrap.

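The two text transformations mentioned for white-space can be illustrated as
follows. This is not the code in convert_xml_to_box(): toy_nowrap() and
toy_pre_split() are hypothetical helpers, and the sketch assumes that a "hard
space" is the no-break space stored as the single byte 0xA0, as in a Latin-1
style encoding.

```c
#include <stddef.h>

// Assumption: no-break space as one byte (Latin-1 style encoding).
#define TOY_HARD_SPACE '\xa0'

// nowrap: replace each ordinary space so that no break opportunity remains.
void toy_nowrap(char *text)
{
	for (; *text; text++)
		if (*text == ' ')
			*text = TOY_HARD_SPACE;
}

// pre: cut text into lines in place by overwriting each newline with a
// terminator. Stores up to max line starts in lines[] and returns the total
// number of lines found, which may exceed max.
size_t toy_pre_split(char *text, char *lines[], size_t max)
{
	size_t count = 0;
	char *start = text;

	for (;; text++) {
		if (*text == '\n' || *text == '\0') {
			int last = (*text == '\0');

			if (count < max)
				lines[count] = start;
			count++;
			if (last)
				break;
			*text = '\0';
			start = text + 1;
		}
	}
	return count;
}
```

In a box tree, each line produced by the split would become its own piece of
inline text followed by a forced line break, while the nowrap text stays in one
piece that the line breaker cannot divide.
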
\section errorhandling Error handling

This section gives some suggestions for error handling in the code.

The most common serious error is memory exhaustion. Previously we used xcalloc()
etc. instead of malloc(), so that no recovery code was required, and NetSurf
would just exit. These functions should no longer be used. If malloc(),
strdup(), etc. fails, clean up and free any partially complete structures,
leaving data in a consistent state, and return a value which indicates failure,
e.g. 0 for functions which return a pointer (document the value in the function
documentation). The caller should then propagate the failure up in the same way.
At some point, the error should stop being passed up and be reported to the user
using
\code
warn_user("NoMemory", 0);
\endcode

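The memory exhaustion advice can be sketched as below. The names are
hypothetical (struct toy_link, toy_link_create(), toy_add_link()), and
toy_allocs_left is a test hook invented for this example so that a failure
can be simulated. It is not part of NetSurf.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical test hook: when >= 0, only this many more allocations succeed.
int toy_allocs_left = -1;

static void *toy_malloc(size_t size)
{
	if (toy_allocs_left == 0)
		return 0;
	if (toy_allocs_left > 0)
		toy_allocs_left--;
	return malloc(size);
}

// Copy a string; returns 0 on memory exhaustion.
static char *toy_strdup(const char *s)
{
	size_t size = strlen(s) + 1;
	char *copy = toy_malloc(size);

	if (copy)
		memcpy(copy, s, size);
	return copy;
}

// Hypothetical structure with two separately allocated members.
struct toy_link {
	char *url;
	char *title;
};

void toy_link_destroy(struct toy_link *link)
{
	if (!link)
		return;
	free(link->url);
	free(link->title);
	free(link);
}

// Returns a new link, or 0 on memory exhaustion. On failure everything
// allocated so far is freed, so no partial structure is left behind.
struct toy_link *toy_link_create(const char *url, const char *title)
{
	struct toy_link *link = toy_malloc(sizeof *link);

	if (!link)
		return 0;
	link->url = toy_strdup(url);
	link->title = toy_strdup(title);
	if (!link->url || !link->title) {
		toy_link_destroy(link);   // free(0) is a no-op
		return 0;
	}
	return link;
}

// A caller propagates the failure in the same way, and only stores the
// result once it is complete. Returns false on memory exhaustion.
bool toy_add_link(struct toy_link **slot, const char *url, const char *title)
{
	struct toy_link *link = toy_link_create(url, title);

	if (!link)
		return false;
	*slot = link;
	return true;
}
```

The function at the top of such a call chain is the one that stops returning
false and reports the problem to the user.
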
The other common error is one returned by a RISC OS SWI. Always use "X" SWIs,
which return a pointer to the error instead of raising it, something like this:
\code
os_error *error;
error = xwimp_get_pointer_info(&pointer);
if (error) {
	LOG(("xwimp_get_pointer_info: 0x%x: %s\n",
			error->errnum, error->errmess));
	warn_user("WimpError", error->errmess);
	return false;
}
\endcode

If an error occurs during initialisation, in most cases exit immediately using
die(), since this indicates that there is already insufficient memory, or a
resource file is corrupted, etc.

*/