/** \mainpage NetSurf Documentation for Developers

This document contains an overview of the code for NetSurf, and any other
information useful to developers.

\section overview Source Code Overview

The source is split at top level as follows:
- \ref content
- \ref css
- \ref render
- Non-platform specific front-end (desktop/)
- RISC OS specific code (riscos/)
- Unix debug build specific code (debug/)
- Misc. useful functions (utils/)

\section content Fetching, caching, and converting content (content/)

Each URL is stored in a struct ::content. This structure contains a union with
fields for each type of data (HTML, CSS, images).

The content_* functions provide a general interface for handling these
structures. A content of a specified type is created using content_create().
Data is fed to it using content_process_data(), and the sequence is terminated
by a call to content_convert(), which converts the content into a structure
that can be displayed easily.
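
The create / process data / convert sequence described above can be sketched
with a toy model. Everything named demo_* below is a hypothetical stand-in
written for this illustration; the real struct content and the content_*
functions in content/ have their own fields and signatures, which are not
reproduced here. The "conversion" simply counts markers in the data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-ins, not the real NetSurf definitions.
enum demo_type { DEMO_HTML, DEMO_CSS };
enum demo_status { DEMO_LOADING, DEMO_DONE };

struct demo_content {
	enum demo_type type;
	enum demo_status status;
	char *source;    // raw bytes received so far, NUL-terminated
	size_t length;   // number of bytes in source
	union {          // one member per type of data
		struct { size_t tag_count; } html;
		struct { size_t rule_count; } css;
	} data;
};

// Create an empty content of the given type (NULL if out of memory).
struct demo_content *demo_content_create(enum demo_type type)
{
	struct demo_content *c = calloc(1, sizeof *c);
	if (c == NULL)
		return NULL;
	c->type = type;
	c->status = DEMO_LOADING;
	return c;
}

// Append one chunk of fetched data; returns 0 on success, -1 on failure.
int demo_content_process_data(struct demo_content *c, const char *buf, size_t n)
{
	assert(c != NULL && c->status == DEMO_LOADING);
	char *grown = realloc(c->source, c->length + n + 1);
	if (grown == NULL)
		return -1;
	memcpy(grown + c->length, buf, n);
	c->length += n;
	grown[c->length] = '\0';
	c->source = grown;
	return 0;
}

// Turn the accumulated data into the type-specific member of the union.
void demo_content_convert(struct demo_content *c)
{
	assert(c != NULL && c->status == DEMO_LOADING);
	char marker = (c->type == DEMO_HTML) ? '<' : '{';
	size_t count = 0;
	for (size_t i = 0; i < c->length; i++)
		if (c->source[i] == marker)
			count++;
	if (c->type == DEMO_HTML)
		c->data.html.tag_count = count;
	else
		c->data.css.rule_count = count;
	c->status = DEMO_DONE;
}

// Release a content and its buffer.
void demo_content_destroy(struct demo_content *c)
{
	if (c != NULL) {
		free(c->source);
		free(c);
	}
}
```

A caller would create a content, call demo_content_process_data() once per
received chunk, and call demo_content_convert() exactly once when the fetch
has finished.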

The cache stores this converted content. When content is retrieved from the
cache, content_revive() should result in content which can be displayed (e.g. by
loading any images and styles required and updating pointers to them).

Code should not usually use the fetch_* and cache_* functions directly.
Instead use fetchcache(), which checks the cache for a URL and
fetches, converts, and caches it if not present.

\section css CSS parser and interfaces (css/)

CSS is tokenised by a flex-generated scanner (scanner.l), and then parsed into a
memory representation by a lemon-generated parser (parser.y, ruleset.c).

Styles are retrieved using css_get_style(). They can be cascaded by
css_cascade().

- flex	http://lex.sourceforge.net/
- lemon	http://www.hwaci.com/sw/lemon/

\section render HTML processing and layout (render/)

This is the process to render an HTML document:

First the HTML is parsed to a tree of xmlNodes using the HTML parser in libxml.
This happens simultaneously with the fetch [html_process_data()].

Any stylesheets which the document depends on are fetched and parsed.

The tree is converted to a 'box tree' by xml_to_box(). The box tree contains a
node for each block, inline element, table, etc. The aim of this stage is to
determine the 'display' or 'float' CSS property of each element, and create the
corresponding node in the box tree. At this stage the style for each element is
also calculated (from CSS rules and element attributes). The tree is then
normalised by adding missing boxes, so that each node only has children of
permitted types (e.g. TABLE_CELLs must be within TABLE_ROWs).
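
As an illustration of the normalisation step, the sketch below wraps stray
table cells in an implied row. The demo_* types and functions are hypothetical
and written only for this example; the box structure and normalisation code in
render/ are more general and are not shown here.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical, simplified box types for illustration only.
enum demo_box_type { DEMO_TABLE, DEMO_TABLE_ROW, DEMO_TABLE_CELL };

struct demo_box {
	enum demo_box_type type;
	struct demo_box *children;  // first child, or NULL
	struct demo_box *last;      // last child, or NULL
	struct demo_box *next;      // next sibling, or NULL
};

struct demo_box *demo_box_create(enum demo_box_type type)
{
	struct demo_box *b = calloc(1, sizeof *b);
	if (b != NULL)
		b->type = type;
	return b;
}

// Append child to the end of parent's child list.
void demo_box_add_child(struct demo_box *parent, struct demo_box *child)
{
	assert(parent != NULL && child != NULL);
	child->next = NULL;
	if (parent->last != NULL)
		parent->last->next = child;
	else
		parent->children = child;
	parent->last = child;
}

// Rebuild a table's child list so that every TABLE_CELL sits inside a
// TABLE_ROW. Consecutive stray cells share one implied row.
// Returns 0 on success, -1 if an implied row could not be allocated
// (this sketch does not try to recover the tree in that case).
int demo_normalise_table(struct demo_box *table)
{
	assert(table != NULL && table->type == DEMO_TABLE);
	struct demo_box *child = table->children;
	struct demo_box *implied_row = NULL;

	table->children = table->last = NULL;
	while (child != NULL) {
		struct demo_box *next = child->next;
		if (child->type == DEMO_TABLE_CELL) {
			if (implied_row == NULL) {
				implied_row = demo_box_create(DEMO_TABLE_ROW);
				if (implied_row == NULL)
					return -1;
				demo_box_add_child(table, implied_row);
			}
			demo_box_add_child(implied_row, child);
		} else {
			implied_row = NULL;  // an explicit child ends the run
			demo_box_add_child(table, child);
		}
		child = next;
	}
	return 0;
}
```

Because boxes are only ever added, never removed, the original elements keep
their order and their styles.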

The box tree is passed to the layout engine [layout_document()], which finds the
space required by each element and assigns coordinates to the boxes, based on
the style of each element and the available width. This includes formatting
inline elements into lines, laying out tables, and positioning floats. The
layout engine can be invoked again on an already laid out box tree to reformat it
to a new width. Coordinates in the box tree are relative to the position of the
parent node.
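
Since coordinates are stored relative to the parent, an absolute position is
found by summing offsets up to the root. A minimal sketch, assuming a
hypothetical struct demo_layout_box with x, y, and parent fields (the field
names of the real box structure may differ):

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical box with only the fields needed for this example.
struct demo_layout_box {
	int x, y;                        // offset from the parent's origin
	struct demo_layout_box *parent;  // NULL for the root box
};

// Sum the offsets of a box and all its ancestors.
void demo_box_absolute(const struct demo_layout_box *box, int *abs_x, int *abs_y)
{
	assert(box != NULL && abs_x != NULL && abs_y != NULL);
	*abs_x = 0;
	*abs_y = 0;
	for (; box != NULL; box = box->parent) {
		*abs_x += box->x;
		*abs_y += box->y;
	}
}
```

One consequence of this design is that moving a box only requires changing
its own offset; its descendants need no update.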

The box tree can then be rendered using each node's coordinates.

\section links Other Documentation

RISC OS specific protocols:
- Plugin	http://www.ecs.soton.ac.uk/~jmb202/riscos/acorn/funcspec.html
		http://www.ecs.soton.ac.uk/~jmb202/riscos/acorn/browse-plugins.html
- URI		http://www.ecs.soton.ac.uk/~jmb202/riscos/acorn/uri.html
- URL		http://www.vigay.com/inet/inet_url.html
- Nested WIMP	http://www.ecs.soton.ac.uk/~jmb202/riscos/acorn/nested.html

Specifications:
- HTML 4.01	http://www.w3.org/TR/html401/
		(see also http://www.w3.org/MarkUp/)
- XHTML 1.0	http://www.w3.org/TR/xhtml1/
- CSS 2.1	http://www.w3.org/TR/CSS21/
- HTTP/1.1	http://www.w3.org/Protocols/rfc2616/rfc2616.html
		(see also http://www.w3.org/Protocols/)
- HTTP Authentication	http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2617.html
- PNG		http://www.w3.org/Graphics/PNG/
- URI		http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2396.html
		(see also http://www.w3.org/Addressing/ and RFC 2616)
- Cookies	http://wp.netscape.com/newsref/std/cookie_spec.html and
		http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2109.html

\section libs Libraries

Versions of these compiled for RISC OS, with headers, are available from
http://netsurf.strcprstskrzkrk.co.uk/developer/
- libxml (XML and HTML parser)		http://www.xmlsoft.org/
- libcurl (HTTP, FTP, etc)		http://curl.haxx.se/libcurl/
- OSLib (C interface to RISC OS SWIs)	http://ro-oslib.sourceforge.net/
- libpng (PNG support)			http://www.libpng.org/pub/png/libpng.html
- zlib					http://www.gzip.org/zlib/
- uri					http://www.nongnu.org/uri/

\section addcss Implementing a new CSS property

In this section I go through adding a CSS property to NetSurf, using the
'white-space' property as an example. -- James Bursa

1. Read and understand the description of the property in the CSS specification
   (I have worked from CSS 2, but now 2.1 is probably better).

These changes are required in the css directory:

2. Add the property to css_enums. This file is used to generate css_enum.h
   and css_enum.c:
   \code css_white_space inherit normal nowrap pre \endcode
   (I'm not doing pre-wrap and pre-line for now.)

3. Add fields to struct ::css_style to represent the property:
   \code css_white_space white_space; \endcode

4. Add a parser function for the property to ruleset.c. Declare a new
   function:
   \code static void parse_white_space(struct css_style * const s, const struct css_node * const v); \endcode
   and add it to ::property_table:
   \code { "white-space",      parse_white_space }, \endcode
   This will cause the function to be called when the parser comes to a rule
   giving a value for white-space. The function is passed a linked list of
   struct ::css_node, each of which corresponds to a token in the CSS source,
   and must update s to correspond to that rule. For white-space, the
   implementation is simply:
\code
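static void parse_white_space(struct css_style * const s, const struct css_node * const v)
{
	css_white_space z;
	// accept exactly one identifier; ignore anything else
	if (v->type != CSS_NODE_IDENT || v->next != 0)
		return;
	// convert the identifier to a constant (generated in css_enum.c)
	z = css_white_space_parse(v->data);
	if (z != CSS_WHITE_SPACE_UNKNOWN)
		s->white_space = z;
}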
\endcode
   First we check that the value consists of exactly one identifier, as
   described in the specification. If it is not, we ignore it, since it may be
   some future CSS. The css_white_space_parse() function is generated in
   css_enum.c, and converts a string giving a value to a constant. If the
   conversion succeeds, the style s is updated.

5. Add defaults for the style to ::css_base_style, ::css_empty_style, and
   ::css_blank_style in css.c. The value in css_base_style should be the one
   given as 'Initial' in the spec, and the value in css_empty_style should be
   inherit. If 'Inherited' is yes in the spec, the value in css_blank_style
   should be inherit, otherwise it should be the one given as 'Initial'. Thus
   for white-space, which has "Initial: normal, Inherited: yes" in the spec, we
   use CSS_WHITE_SPACE_NORMAL in css_base_style and CSS_WHITE_SPACE_INHERIT in
   the other two.

6. Edit css_cascade() and css_merge() in css.c to handle the property. In
   both cases for white-space this looks like:
   \code
	if (apply->white_space != CSS_WHITE_SPACE_INHERIT)
		style->white_space = apply->white_space;
   \endcode
   Add the property to css_dump_style() (not essential).
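
The parsing and cascading steps can be tried in isolation with the following
self-contained sketch. It does not use the generated css_enum.h: the
demo_white_space type, the DEMO_* constants, and the demo_* functions are
hand-written stand-ins, and the generated code in NetSurf may differ (for
example in how it compares identifiers; this sketch is case-sensitive for
brevity).

```c
#include <assert.h>
#include <string.h>

// Stand-in for the type that would be generated from css_enums.
typedef enum {
	DEMO_WHITE_SPACE_INHERIT,
	DEMO_WHITE_SPACE_NORMAL,
	DEMO_WHITE_SPACE_NOWRAP,
	DEMO_WHITE_SPACE_PRE,
	DEMO_WHITE_SPACE_UNKNOWN
} demo_white_space;

// Stand-in for struct css_style, holding only the new property.
struct demo_style {
	demo_white_space white_space;
};

// Map an identifier to its constant, or UNKNOWN if not recognised.
// The order of names must match the order of the enum above.
demo_white_space demo_white_space_parse(const char *s)
{
	static const char *const names[] = { "inherit", "normal", "nowrap", "pre" };
	assert(s != NULL);
	for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
		if (strcmp(s, names[i]) == 0)
			return (demo_white_space) i;
	return DEMO_WHITE_SPACE_UNKNOWN;
}

// Any value other than inherit in 'apply' overrides the value in 'style'.
void demo_cascade(struct demo_style *style, const struct demo_style *apply)
{
	assert(style != NULL && apply != NULL);
	if (apply->white_space != DEMO_WHITE_SPACE_INHERIT)
		style->white_space = apply->white_space;
}
```

Cascading a style whose white_space is inherit onto a parent style leaves the
parent's value in place, which is why the defaults in step 5 use inherit for
inherited properties.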

Now the box, layout, and/or redraw code needs to be changed to use the new
style property. This varies much more depending on the property.

For white-space, convert_xml_to_box() was changed to split text at newlines if
white-space was pre, and to replace spaces with hard spaces for nowrap.
Additionally, calculate_inline_container_widths() was changed to give the
appropriate minimum width for pre and nowrap.
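
The two text transformations can be illustrated as follows. This is a sketch
only, not the code in render/: the demo_* names are hypothetical, and it
assumes a single-byte ISO 8859-1 encoding in which the hard (no-break) space
is the byte 0xA0.

```c
#include <assert.h>
#include <stddef.h>

// No-break space, assuming ISO 8859-1 text (an assumption of this sketch).
#define DEMO_HARD_SPACE '\xa0'

// nowrap: make every space non-breaking so the text cannot be split.
void demo_nowrap_spaces(char *text)
{
	assert(text != NULL);
	for (; *text != '\0'; text++)
		if (*text == ' ')
			*text = DEMO_HARD_SPACE;
}

// pre: count the pieces the text would be split into at newlines.
size_t demo_count_pre_lines(const char *text)
{
	assert(text != NULL);
	size_t lines = 1;
	for (; *text != '\0'; text++)
		if (*text == '\n')
			lines++;
	return lines;
}
```

With hard spaces in place, the existing line-breaking code needs no special
case for nowrap, because it never sees a position where a break is allowed.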

*/