1
0
mirror of https://github.com/rkd77/elinks.git synced 2024-11-04 08:17:17 -05:00
elinks/doc/dev-intro.txt
Petr Baudis 0f6d4310ad Initial commit of the HEAD branch of the ELinks CVS repository, as of
Thu Sep 15 15:57:07 CEST 2005. The previous history can be added to this
by grafting.
2005-09-15 15:58:31 +02:00

350 lines
13 KiB
Plaintext

Introduction to ELinks Developing
---------------------------------
This is an introduction to how some of the internals of ELinks work. It
focuses on covering the basic low level subsystems and data structures and
give an overview of how things are glued together. Some additional information
about ELinks internals are available in ELinks Hacking. Consult the README
file\cite{6} in the source directory to get an overview of how the different
parts of ELinks depend on each other.
Memory Management
~~~~~~~~~~~~~~~~~
ELinks wraps access to `alloc()`, `calloc()` and `free()` in a memory
management layer which provides, among other things, leak debugging, and
checking for out of boundary access. It is defined in `src/util/memory.h`. The
wrappers are `mem_alloc()`, `mem_calloc()` and `mem_free()`. Additionally,
it also provides wrappers for `mmap()` and `munmap()` as `mem_mmap_alloc()`
and `mem_mmap_free()`.
Configuration Options
~~~~~~~~~~~~~~~~~~~~~
All options known by ELinks are accessible using a common interface.
Options can have different types, such as string, boolean, int and option
tree. The latter makes it possible to nest options in a hierarchy. Options
are statically declared using the `INIT_OPTION_<type>()` macros.
An example of an option declaration is:
INIT_OPT_INT("protocol.bittorrent.ports", N_("Minimum port"),
"min", 0, LOWEST_PORT, HIGHEST_PORT, 6881,
N_("The minimum port to try and listen on.")),
Once an option has been declared it can be accessed using the
`get_opt_<type>()` functions. To get the value of the option
declared above use:
int min_port = get_opt_int("protocol.bittorrent.ports.min");
Data Structures and Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The C language does not natively support essential data structures and
utilities such as lists and strings. ELinks provides interfaces for both of
these as described in the following section.
Lists
^^^^^
In `src/util/lists.h` ELinks defines a fairly complete interface for
handling double-linked lists. A list item is declared by using the
`LIST_HEAD()` macro to put `next` and `prev` members
in a `struct` like this:
------------------------------------------------------------------------------
struct bittorrent_data_structure {
LIST_HEAD(struct bittorrent_data_structure);
/* ... */
};
------------------------------------------------------------------------------
A list of type `struct list_head` can then be declared by using
the `INIT_LIST_HEAD()` macro if it is statically allocated:
INIT_LIST_HEAD(bittorrent_list);
or initialised in an allocated data structure using the following:
init_list(bittorrent_data_structure->list);
When iterating a list, there are two looping macros:
foreach (list-item, list-head})::
Iterates through all items in the list.
foreachsafe (list-item, next-list-item, list-head)::
Which is similar to the `foreach` iterator, but makes it possible
to delete items from the list without causing trouble.
For both of the above iterators there exists inverse iterators called
`foreachback()` and `foreachbacksafe()`, respectively.
Strings
^^^^^^^
The strings library add functionality for appending character to strings
from various sources. Strings are maintained using the `string`
`struct`, and defined in `src/util/string.h` and `util/conv.h`.
Some of the most important functions are shown below:
init_string(string)::
Creates string.
done_string(string)::
Deallocates string.
add_to_string(string, char-array)::
Adds a `NULL`-terminated array of type `unsigned char`
to string.
add_bytes_to_string(string, char-array, char-array-length)::
Adds char-array-length bytes from an array of type
`unsigned char` to string.
Bitfields
^^^^^^^^^
To handle bitfields we have made a small bitfield library available in
`src/util/bitfield.h`. The most important functions and macros are
explained below:
init_bitfield(size)::
Allocates a bitfield with the given number of bits. The bitfield
is deallocated by using `mem_free()`.
set_bitfield_bit(bitfield, bit)::
Sets bit in bitfield. There is also a complementary function
`clear_bitfield_bit()`.
test_bitfield_bit(bitfield, bit)::
Tests whether the given bit is set in the bitfield.
Subsystems
~~~~~~~~~~
In the following sections various important subsystems are presented. An
overview of the subsystems is given in the figure below.
.Overview of the hierarchy of the various subsystems. At the bottom are \
subsystems that provide functionality used by the upper layers.
------------------------------------------------------------------------------
+-----------------------------+ +-----------------------------+
| Document Type Handling | | URI Downloading |
+-----------------------------+ +-----------------------------+
+-------------------------------------------------------------+
| Protocol Implementation |
+-------------------------------------------------------------+
+-------------------------------------------------------------+
| Socket API |
+-------------------------------------------------------------+
+-------------+ +--------+ +---------------+ +----------------+
| Select loop | | Timers | | Bottom halves | | Simple threads |
+-------------+ +--------+ +---------------+ +----------------+
------------------------------------------------------------------------------
The `select()` loop
^^^^^^^^^^^^^^^^^^^
At the heart of ELinks is the `select()` loop which is entered after all
module have been initialised. It runs until ELinks is shutdown. The
`select()` loop handles detecting status changes to registered file
descriptors, and calls the registered handlers. It is part of the lowest layer
in shown in the figure, and is defined in `src/main/select.h`. It provides
two functions:
set_handlers(file-descriptor, read-handler, write-handler, error-handler, data)::
This function registers the read, write and error handlers for the
given file descriptor. The handlers will be called with data as the
only argument if the status of the file descriptor changes. Any of the
passed handlers maybe be `NULL` if no handling is required for the
specific status change.
clear_handlers(file-descriptor)::
Unregister all handlers for the given file descriptor.
Timers
^^^^^^
The `select()` loop regularly causes timer expiration to be checked. The
interface is defined in `src/main/timer.h` and can be used to schedule work to
be performed some time in the future.
install_timer(timer-id, time-in-milliseconds, expiration-callback}, data)::
Starts a one shot timer bound to the given timer ID that will expire
after the given time in milliseconds. When the timer expires the
expiration callback will be called with data as the only argument.
kill_timer(timer-id)::
Stops the timer associated with the given timer ID.
Bottom Halves
^^^^^^^^^^^^^
Work can be prosponed and scheduled for completion in the nearest future by
using bottom halves. This is especially useful when nested deeply in some call
tree and one needs to cleanup some data structure but cannot do so immidately.
The interface is defined in `src/main/select.h`
register_bottom_half(work-function, data})::
Schedules postponed work to be run and completed in the nearest
future. Note that although this helps to improve the response time it
requires more focus on concurrency, such as ensuring that the data
passed to the bottom half won't disappear while being handled.
Simple Threads
^^^^^^^^^^^^^^
ELinks also supports a simple threading architecture. It is possible to start
a thread\footnote{(on UNIX systems this will end up using the `fork()`
system call} so that communication between it and the ELinks program is
handled via a pipe. This pipe can then be used to pass to the
`set_handlers()` function so that communcation is done asynchronously.
start_thread(thread-function, thread-data, thread-data-length)::
Starts a new thread which will run the given thread function. The
thread function is passed the given thread data as its first argument
and a pipe descriptor it can use for communication with the ELinks
program as its second argument. This function returns the pipe
descriptor for the other end of the pipe to the calling ELinks
program.
Socket API
^^^^^^^^^^
This layer provides a high-level interface for socket communication
appropriate for protocol implementations. When initialising a socket, a
`struct` containing a set of callbacks is passed along with some private
connection data which will be associated with the socket and passed to the
callbacks. Among the callbacks are `set_state()` which is called when the
socket state changes, for instance after a successful DNS look-up. Another
function is `set_timeout()` that will be called when progress is done, such
as during writing or reading . Finally, there is `retry()` and `done()` both
of which are called when an error occurs: the former is used for errors that
might be possible to resolve; the latter is used for fatal errors.
The interface is defined in `src/network/socket.h` and the most notable
functions are:
make_connection(socket, uri, connect-callback, no-dns-cache)::
This function handles connecting to the host and port given in the uri
argument. If the internal DNS cache is not to be used the
no-dns-cache} argument should be non-zero. Upon successful connection
establishment the connect callback is called.
read_from_socket(socket, read-buffer, connection-state, read-callback)::
Reads data from socket into the given read buffer. Each time new data
is received the read callback is called. Changes the connection state
to connection-state} immediately after it has been called.
write_to_socket(socket, data, datalen, connection-state, write-callback)::
Writes datalen bytes from the data to the given socket. Upon
completion the write callback is called. Changes the connection state
to connection state immediately after being called.
The buffer interface used for reading is as follows:
alloc_read_buffer(socket)::
Allocates a new read buffer which can be passed to
`read_from_socket()`.
kill_buffer_data(read-buffer, number-of-bytes)::
Removes the requested number of bytes from the start of the given read
buffer.
Protocol Layer and URI Loading
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The protocol layer multiplexes over different protocol back-ends. The protocol
multiplexing decides what protocol back-end to use by looking at the protocol
string in the connection URI, i.e. `http` part of `http://127.0.0.1/` will
cause the HTTP protocol back-end to be selected. Each protocol back-end
exports a protocol handler taking a `connection` `struct` as its only
argument. The `connection` struct contains information such as the URI being
loaded and status about how much has been downloaded and the current
connection state. How the protocol back-end chooses to handle the connection
and load the URI it points to is entirely up to the back-end. The interface
is defined in `src/protocol/protocol.h` and `src/network/connection.h`.
A connection can have multiple downloads attached to it. Each download struct}
contains information about the connection state and the connection with which
it is attached. Additionally, the `download struct` specifies callback data,
and a callback which is called each time the download progresses. The
interface for downloads is defined in `src/session/download.h` and the most
important functions it provides are:
load_uri(uri, referrer, download, priority, cache-mode, offset)::
Attaches the given download to a connection which will try to load the
given URI. If a connection is already in progress the download is
simply attached to that one.
change_connection(old-download, new-download, new-priority, interrupt)::
Removes the old download from the connection it is attached to and add
the new download instead. If the new download argument is `NULL`, the
old download is simply stopped.
All data loaded by a connection is put into ELinks' cache. Cache entries can
then used for passing data to download handler, etc. The connection will
create a cache entry once it starts receiving data. Each time it adds data a
new fragment is added to the cache entry, this means that the cache entry
needs to be defragmented before data in it can be processed. From time to
time the cache will be shrunk and unlocked cache entries will be removed.
Document Type Handling
^^^^^^^^^^^^^^^^^^^^^^
When a user asks ELinks to load a URI, ELinks will eventually have to try and
figure out the document type of the URI in order to handle it properly, either
by using an internal renderer or by passing the document to an external
handler. This is done in the following function:
setup_download_handler(session, download, cache-entry, frame)::
Tries to resolve the document type of the given cache entry by first
looking at the protocol header it contains. If no content type is
found it will try to resolve the document type by looking at the file
extension in the loading URI. Based on the content type, either an
internal or an external handler is selected. The user will be queried
before using an external handler.