diff --git a/Docs/00-overview b/Docs/00-overview
new file mode 100644
index 000000000..c7984d06b
--- /dev/null
+++ b/Docs/00-overview
@@ -0,0 +1,58 @@
+NetSurf Documentation for Developers
+====================================
+
+The documents in this directory describe how the NetSurf code works, and any
+other information useful to developers.
+
+Directory Structure
+-------------------
+The source is split at top level as follows:
+
+content:: Fetching, managing, and converting content
+render:: HTML processing and layout
+css:: CSS parser
+image:: Image conversion
+desktop:: Non-platform specific front-end
+riscos:: RISC OS specific code
+debug:: Unix debug build specific code
+gtk:: GTK specific code
+utils:: Misc. useful functions
+
+Other Documentation
+-------------------
+RISC OS specific protocols:
+
+- Plugin http://www.ecs.soton.ac.uk/~jmb202/riscos/acorn/funcspec.html[]
+  http://www.ecs.soton.ac.uk/~jmb202/riscos/acorn/browse-plugins.html[]
+- URI http://www.ecs.soton.ac.uk/~jmb202/riscos/acorn/uri.html[]
+- URL http://www.vigay.com/inet/inet_url.html[]
+- Nested WIMP http://www.ecs.soton.ac.uk/~jmb202/riscos/acorn/nested.html[]
+
+Specifications:
+
+- HTML 4.01 http://www.w3.org/TR/html401/[]
+  (see also http://www.w3.org/MarkUp/[])
+- XHTML 1.0 http://www.w3.org/TR/xhtml1/[]
+- CSS 2.1 http://www.w3.org/TR/CSS21/[]
+- HTTP/1.1 http://www.w3.org/Protocols/rfc2616/rfc2616.html[]
+  and errata http://purl.org/NET/http-errata[]
+  (see also http://www.w3.org/Protocols/[])
+- HTTP Authentication http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2617.html[]
+- PNG http://www.w3.org/Graphics/PNG/[]
+- URI http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2396.html[]
+  (see also http://www.w3.org/Addressing/[] and RFC 2616)
+- Cookies http://wp.netscape.com/newsref/std/cookie_spec.html[] and
+  http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2109.html[]
+
+Libraries
+---------
+Get these compiled for RISC OS with headers from
+http://netsurf.strcprstskrzkrk.co.uk/developer/[]
+
+- libxml (XML and HTML parser) http://www.xmlsoft.org/[]
+- libcurl (HTTP, FTP, etc) http://curl.haxx.se/libcurl/[]
+- OSLib (C interface to RISC OS SWIs) http://ro-oslib.sourceforge.net/[]
+- libmng (PNG, JNG, MNG support) http://www.libmng.com/[]
+- libjpeg (JPEG support) http://www.ijg.org/[]
+- zlib http://www.gzip.org/zlib/[]
+- OpenSSL (HTTPS support) http://www.openssl.org/[]
diff --git a/Docs/01-content b/Docs/01-content
new file mode 100644
index 000000000..4db4bade8
--- /dev/null
+++ b/Docs/01-content
@@ -0,0 +1,24 @@
+Fetching, managing, and converting content
+==========================================
+
+The modules in the content directory provide the infrastructure for fetching
+data, managing it in memory, and converting it for display.
+
+Struct Content
+--------------
+Each URL is stored in a struct ::content. This structure contains the
+content_type and a union with fields for each type of data (HTML, CSS,
+images). The content_* functions provide a general interface for handling these
+structures. For example, content_redraw() calls html_redraw() or
+nsjpeg_redraw(), etc., depending on the type of content. See content.h and
+content.c.
+
+Fetching
+--------
+A high-level interface for starting to fetch and convert a URL is provided by
+the fetchcache functions, which check the memory cache for the URL and, if it
+is not present, fetch, convert, and cache it. See fetchcache.h and
+fetchcache.c.
+
+The fetch module provides a low-level URL fetching interface. See fetch.h and
+fetch.c.
diff --git a/Docs/02-layout b/Docs/02-layout
new file mode 100644
index 000000000..ddc7cfd06
--- /dev/null
+++ b/Docs/02-layout
@@ -0,0 +1,31 @@
+HTML processing and layout
+==========================
+
+The modules in the render directory process and lay out HTML pages.
+
+Overview
+--------
+This is the process to render an HTML document:
+
+First the HTML is parsed to a tree of xmlNodes using the HTML parser in libxml.
+This happens simultaneously with the fetch [html_process_data()].
+
+Any stylesheets which the document depends on are fetched and parsed.
+
+The tree is converted to a 'box tree' by xml_to_box(). The box tree contains a
+node for each block, inline element, table, etc. The aim of this stage is to
+determine the 'display' or 'float' CSS property of each element, and create the
+corresponding node in the box tree. At this stage the style for each element is
+also calculated (from CSS rules and element attributes). The tree is normalised
+so that each node only has children of permitted types (eg. TABLE_CELLs must be
+within TABLE_ROWs) by adding missing boxes.
+
+The box tree is passed to the layout engine [layout_document()], which finds the
+space required by each element and assigns coordinates to the boxes, based on
+the style of each element and the available width. This includes formatting
+inline elements into lines, laying out tables, and positioning floats. The
+layout engine can be invoked again on an already laid out box tree to reformat it
+to a new width. Coordinates in the box tree are relative to the position of the
+parent node.
+
+The box tree can then be rendered using each node's coordinates.
diff --git a/Docs/03-css b/Docs/03-css
new file mode 100644
index 000000000..5744e27b7
--- /dev/null
+++ b/Docs/03-css
@@ -0,0 +1,81 @@
+CSS parser
+==========
+
+CSS is tokenised by a re2c-generated scanner (scanner.l), and then parsed into a
+memory representation by a lemon-generated parser (parser.y, ruleset.c).
+
+Styles are retrieved using css_get_style(). They can be cascaded by
+css_cascade().
+
+Implementing a new CSS property
+-------------------------------
+In this section I go through adding a CSS property to NetSurf, using the
+'white-space' property as an example.
+-- James Bursa
+
+First read and understand the description of the property in the CSS
+specification (I have worked from CSS 2, but now 2.1 is probably better).
+
+Add the property to css_enums. This file is used to generate css_enum.h and
+css_enum.c:
+
+    css_white_space inherit normal nowrap pre
+
+(I'm not doing pre-wrap and pre-line for now.)
+
+Add fields to struct css_style to represent the property:
+
+    css_white_space white_space;
+
+Add a parser function for the property to ruleset.c. Declare a new function:
+
+    static void parse_white_space(struct css_style * const s, const struct css_node * const v);
+
+and add it to property_table:
+
+    { "white-space", parse_white_space },
+
+The function is then called whenever the parser reaches a rule giving a value
+for white-space. It is passed a linked list of struct ::css_node, each of which
+corresponds to a token in the CSS source, and must update s to match that rule.
+For white-space, the implementation is simply:
+
+    void parse_white_space(struct css_style * const s, const struct css_node * const v)
+    {
+        css_white_space z;
+        if (v->type != CSS_NODE_IDENT || v->next != 0)
+            return;
+        z = css_white_space_parse(v->data, v->data_length);
+        if (z != CSS_WHITE_SPACE_UNKNOWN)
+            s->white_space = z;
+    }
+
+First we check that the value consists of exactly one identifier, as described
+in the specification. If it does not, we ignore it, since it may be some future
+CSS. The css_white_space_parse() function is generated in css_enum.c, and
+converts a string giving a value to a constant. If the conversion succeeds, the
+style s is updated.
+
+Add defaults for the style to css_base_style, css_empty_style, and
+css_blank_style in css.c. The value in css_base_style should be the one given as
+'Initial' in the spec, and the value in css_empty_style should be inherit.
+If 'Inherited' is yes in the spec, the css_blank_style value should be inherit,
+otherwise it should be the one given as 'Initial'. Thus for white-space, which
+has "Initial: normal, Inherited: yes" in the spec, we use CSS_WHITE_SPACE_NORMAL
+in css_base_style and CSS_WHITE_SPACE_INHERIT in the other two.
+
+Edit css_cascade() and css_merge() in css.c to handle the property. In both
+cases for white-space this looks like:
+
+    if (apply->white_space != CSS_WHITE_SPACE_INHERIT)
+        style->white_space = apply->white_space;
+
+Add the property to css_dump_style() (not essential).
+
+Now the box, layout and/or redraw code needs to be changed to use the new
+style property. This varies much more depending on the property.
+
+For white-space, convert_xml_to_box() was changed to split text at newlines if
+white-space was pre, and to replace spaces with hard spaces for nowrap.
+Additionally, calculate_inline_container_widths() was changed to give the
+appropriate minimum width for pre and nowrap.
diff --git a/Docs/04-errors b/Docs/04-errors
new file mode 100644
index 000000000..786c46374
--- /dev/null
+++ b/Docs/04-errors
@@ -0,0 +1,30 @@
+Error handling
+==============
+
+This section describes error handling in the code.
+
+The most common serious error is memory exhaustion. If malloc(), strdup(), etc.
+fail, clean up and free any partially complete structures, leaving data in a
+consistent state, and return a value which indicates failure, eg. 0 for
+functions which return a pointer (document the value in the function
+documentation). The caller should then propagate the failure up in the same way.
+At some point, the error should stop being passed up and be reported to the user
+using
+
+    warn_user("NoMemory", 0);
+
+The other common error is one returned by a RISC OS SWI.
+Always use "X" SWIs, something like this:
+
+    os_error *error;
+    error = xwimp_get_pointer_info(&pointer);
+    if (error) {
+        LOG(("xwimp_get_pointer_info: 0x%x: %s\n",
+                error->errnum, error->errmess));
+        warn_user("WimpError", error->errmess);
+        return false;
+    }
+
+If an error occurs during initialisation, in most cases exit immediately using
+die(), since this indicates that there is already insufficient memory, or a
+resource file is corrupted, etc.
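
The memory-exhaustion rule in Docs/04-errors can be illustrated with a small, self-contained sketch. Everything here apart from the call `warn_user("NoMemory", 0)` is an assumption made for illustration: `struct bookmark`, `copy_string()`, `bookmark_create()`, `bookmark_destroy()` and `bookmark_add()` are hypothetical names, and `warn_user()` is replaced by a stub that only prints, since the real function belongs to NetSurf's front end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stub standing in for NetSurf's warn_user() (assumption: the real one
 * reports to the user through the front end; this one only prints). */
void warn_user(const char *warning, const char *detail)
{
    fprintf(stderr, "warning: %s %s\n", warning, detail ? detail : "");
}

/* Hypothetical structure owning two heap-allocated strings. */
struct bookmark {
    char *url;
    char *title;
};

/* Hypothetical helper: copy a string with malloc(), returning 0 on failure. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy)
        memcpy(copy, s, n);
    return copy;
}

/* Create a bookmark. Returns the new bookmark, or 0 on memory exhaustion. */
struct bookmark *bookmark_create(const char *url, const char *title)
{
    struct bookmark *b;

    assert(url && title);
    b = malloc(sizeof *b);
    if (!b)
        return 0;
    b->url = copy_string(url);
    b->title = copy_string(title);
    if (!b->url || !b->title) {
        /* free the partially complete structure, then signal failure */
        free(b->url);
        free(b->title);
        free(b);
        return 0;
    }
    return b;
}

/* Free a bookmark created by bookmark_create(); 0 is accepted. */
void bookmark_destroy(struct bookmark *b)
{
    if (!b)
        return;
    free(b->url);
    free(b->title);
    free(b);
}

/* Top-level caller: the failure stops propagating here and is reported. */
bool bookmark_add(const char *url, const char *title)
{
    struct bookmark *b = bookmark_create(url, title);
    if (!b) {
        warn_user("NoMemory", 0);
        return false;
    }
    /* a real caller would store b; this sketch just releases it */
    bookmark_destroy(b);
    return true;
}
```

The lower-level function never reports anything itself; it only cleans up and returns 0, so the decision about when to tell the user stays with the outermost caller.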