* Since DNS is normally restricted to ASCII, the use of UTF-8 in domain
names is implemented using the "punycode" encoding.
* The request to the DNS server must be sent with the ASCII
representation of the domain name; the Unicode form, however, should be
used for user-visible parts.
* ICU provides an implementation of the conversion, which we use here.
* Conversion is currently done in-place and modifies the BUrl object
(this is similar to UrlEncode/UrlDecode).
* Adjust the existing IDN test to make use of these methods. It passes
now.
The allocation of fImpl can fail, and some methods used it without
checking. Return an error code (or NULL or 0) instead of crashing in
these cases.
Also InitCheck the fInputBuffer in BHttpRequest before trying to use it.
Fixes #11350.
* It seems openbmap is using a variation of the API that's not
compatible with what other providers use.
* Fix a ";" that should have been a "," in the JSON request. We should
get a BJsonBuilder to avoid such silly errors.
* Improve indenting of the request to ease readability.
* Parse the latitude and longitude as doubles, not strings.
This was tested against Mozilla Location Services and I get accurate
results (within a few hundred meters). However, I'm not sure how to
share the MLS API key safely so it is used only in Haiku; I will discuss
this with the MLS team.
A BGeolocation object can query an online service to get geolocation
and geotagging data:
* LocateSelf() tries to locate the machine it is running on, by using an
online database of wifi access points (see the sketch after this list)
* Locate() (not yet implemented) resolves a BString to lat/lon
coordinates (geocoding)
* Name() (not yet implemented) finds a suitable name for the given
coordinates (address, building name, or anything fitting).
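A minimal usage sketch for the implemented part (the header name and the
exact LocateSelf() signature are assumptions based on the description
above):

    #include <Geolocation.h>
    #include <stdio.h>

    int main()
    {
        BGeolocation locator;
        float latitude, longitude;
        // Queries the online wifi access point database.
        if (locator.LocateSelf(latitude, longitude) == B_OK)
            printf("Located at %f, %f\n", latitude, longitude);
        else
            printf("Could not locate this machine.\n");
        return 0;
    }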
The default service used is openbmap.org, which is freely available but
not very accurate. A request has been sent to Mozilla to use MLS
(Mozilla Location Services), which is a bit more accurate but needs an
API key. MLS is used for geolocation on FirefoxOS, for mobile phones
which don't have a GPS, and the data can be contributed by Firefox for
Android or the dedicated MozStumbler app.
Alternatively, Google Maps also provides the service, but wants
people to pay for it. Google Maps data is more accurate, as all Android
devices contribute data to it.
All 3 services use the same JSON-based API: we send a list of reachable
wifi APs (MAC address and signal strength), and we get latitude and
longitude information back, plus possibly extra data which is currently
unused.
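For illustration, the request body we send looks roughly like this
(field names follow the public MLS/Google geolocate API and are an
assumption for the other providers):

    #include <String.h>

    // Two reachable access points, MAC address plus signal strength (dBm).
    BString requestBody(
        "{ \"wifiAccessPoints\": ["
            "{ \"macAddress\": \"01:23:45:67:89:ab\", \"signalStrength\": -65 },"
            "{ \"macAddress\": \"01:23:45:67:89:cd\", \"signalStrength\": -71 }"
        "] }");

    // The answer is JSON as well, roughly:
    //   { "location": { "lat": 48.8580, "lng": 2.2951 }, "accuracy": 250.0 }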
This can be used to implement HTML5 geolocation with reasonably accurate
results, but it can also be used in other places. For example
FirstBootPrompt could try to guess a list of most likely languages and
keyboard layouts from it (if wifi is working at install time, that is).
BUrl is passed by value in many places, and we should make sure this is
as efficient as possible. There is little point in initializing all the
strings then overwriting them by using the copy constructor, when we can
set them directly.
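A sketch of the idea (member names are illustrative): initialize the
members directly from the source object instead of default-constructing
every BString and assigning afterwards.

    BUrl::BUrl(const BUrl& other)
        :
        fUrlString(other.fUrlString),
        fProtocol(other.fProtocol),
        fUser(other.fUser),
        fPassword(other.fPassword),
        fHost(other.fHost),
        fPort(other.fPort),
        fPath(other.fPath),
        fRequest(other.fRequest),
        fFragment(other.fFragment)
    {
        // Nothing left to do: every field was copy-constructed above.
    }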
* Move default context management to BUrlRequest since some code
(including the testsuite) bypasses the BUrlProtocolRoster.
* Introduce proxy host and port in BUrlContext
* Have BHttpRequest use the proxy when making requests
This should make OpenSSL more thread-safe and help with the random
network-related crashes in Web+ (and anything else using SecureSocket
with more than one thread).
* From now on, the gcc-specific system libraries (libgcc, libsupc++ and
libstdc++) are provided by separate packages built along with gcc:
- gcc_syslibs contains the shared libraries (libgcc_s.so, libsupc++.so and
libstdc++.so)
- gcc_syslibs_devel contains the static libraries and both c++ and gcc
headers
The shared libraries now make proper use of symbol versioning and there
are version-specific symlinks.
* The buildsystem has been adjusted to no longer use the libraries and
headers from the cross-compiler, but use the ones provided by the
above-mentioned packages. The only exception is that the 32-bit libraries
required for the bootloader of the x86_64 architecture are still taken
from the cross-compiler.
* Instead of faking libstdc++.so from libstdc++.a, use libstdc++.so
from the gcc_syslibs build feature for everything except x86_gcc2.
* Use libgcc_s.so from the gcc_syslibs build feature for everything but
x86_gcc2 (which still carries libgcc as part of libroot.so).
* Drop filtering of libgcc objects for libroot, as that is no longer
necessary since we're only using libgcc-as-single-object for libroot
with x86_gcc2, where the filtered object file doesn't exist. Should
the objects that used to be filtered cause any problems as part of
libgcc_s.so, we can always filter them as part of the gcc build.
* Use libsupc++.so from the gcc_syslibs build feature for everything but
x86_gcc2.
* Adjust all Jamfiles accordingly.
* Deactivate building of the faked libstdc++.so for non-x86-gcc2. For
x86_gcc2, we still build libstdc++.so from the sources in the Haiku
source tree as part of the Haiku build.
* Put gcc_syslibs package onto the image, when needed.
Include strings.h instead of, or in addition to, string.h, in
preparation for the functions move.
* Move the str[n]casecmp() functions and others to strings.h.
* strings.h doesn't include string.h anymore.
* This solves #10949.
* The thread could set fRunning to false before the caller had set it
to true, leading to a stalled request (see the sketch after this list).
* Happened easily when testing "data" requests, which are fast since no
I/O is involved.
* Possibly also helps with stalled redirects on Google search I've been
seeing for some time.
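A sketch of the fix (method and member names are illustrative): fRunning
must be set before the thread is resumed, so that a request finishing
instantly cannot have its "false" overwritten.

    status_t
    BUrlRequest::Start()
    {
        fThreadId = spawn_thread(_ThreadEntry, "url request",
            B_NORMAL_PRIORITY, this);
        if (fThreadId < B_OK)
            return fThreadId;

        // Set fRunning *before* resuming the thread: a fast request (a
        // "data" URI does no I/O) could otherwise finish and set fRunning
        // to false before we set it to true, stalling the request forever.
        fRunning = true;

        status_t status = resume_thread(fThreadId);
        if (status != B_OK)
            fRunning = false;
        return status;
    }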
There is some disagreement on what "deflate" is, and we can't
reliably decode it in all cases. So, don't advertise support for it and
let servers use gzip (or no compression) instead.
Fixes #11093.
The RFC for Data URLs specifies a nonstandard format, and because of
this it doesn't support queries and fragments. This allows the use of
the # and ? characters in the URL data. We didn't handle this properly,
which would lead to truncated data.
Some URLs may use the % character for other purposes than URL-encoding
(this is seen in some data URLs). Make sure we parse that properly, and
avoid a possible out of bounds access if the percent char is near the
end of the string.
* Remove unneeded field fOutputHeaders and convert it to a local for the
only method that uses it,
* Don't return EOVERFLOW when flushing data from ZLib (the ZLib
decompressor returns this, but the zlib docs state that this is NOT an
error condition).
* Replace unneeded temporary BNetBuffer of fixed size with BStackOrHeapArray.
BUrlContext now inherits BReferenceable to make it easier to handle the
context lifespan. Make the default context an always-retained reference
to match this.
No functional change in normal conditions, however this avoids an assert
when BReferenceable is built in debug mode.
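A short sketch of the intended use (SetContext() is assumed to acquire
its own reference internally):

    #include <HttpRequest.h>
    #include <Referenceable.h>
    #include <UrlContext.h>

    void MakeRequest(const BUrl& url)
    {
        // "true": adopt the initial reference returned by new, so the
        // context is released automatically once the last user is gone.
        BReference<BUrlContext> context(new BUrlContext(), true);

        BHttpRequest request(url);
        request.SetContext(context.Get());
        request.Run();
    }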
We don't have support for TCP_CORK, which would let the kernel handle
this, so this resulted in lots of very small packets being sent over the
network. Besides the performance issues, this confused aliceadsl.fr HTTP
server and prevented logging in to their website.
Fixes #10556.
* receiveEnd is set in a different place in case of chunked transfers,
which would cause the decompressor to never be flushed.
* In the case of chunked transfers, we call Flush() without any input
data (to flush only whatever is remaining in the decompression buffer).
This causes ZLib to return Z_BUF_ERROR which is translated to
B_BUFFER_OVERFLOW. This is a non-fatal error and is expected behavior in
that case. Don't handle this as an error, and do use the extracted data.
Fixes various cases of missing the last chunk of a page (pastie.org,
Google search results, and more).
Instead of relying on the global protocol loop to call _ParseHeaders
once for each header, extract as much as possible from the current
buffer.
This saves memory, avoids useless operations on the socket and various
processing steps, and fixes #10245.
Also improve the handling of 0-size requests to make sure they terminate
properly.
* Each BHttpAuthentication object is locked on all field accesses,
* They are owned by the BUrlContext and never deleted, so there is no
need for reference-counting them,
* The BUrlContext itself is now reference counted, and all BUrlRequests
hold a reference to it.
This makes sure using the BHttpAuthentication objects from requests is
thread-safe.
* Change the semantics of the iterators copy constructor and assignment
operator: they now return a new iterator for the same cookie jar (and
same url for the UrlIterator). They don't try to point to the same
position as the copied iterator. The only purpose of these is to write
code such as:
Iterator it = jar.GetIterator();
so having a full copy isn't that useful.
* The per-domain cookie lists are now protected with a read-write lock.
The iterators retain a read lock while they are handling cookies from
that list. They get a write lock when doing Remove. Adding a cookie to
the jar also gets the write lock for the matching list.
* Fix a memory leak when adding a new domain-list to the jar failed
* Simplify the declaration of the PrivateHashMap type (it would be
even simpler if HashMap was a public API)
* The domain hashmap is now a SynchronizedHashMap. It is locked as long
as an Iterator or UrlIterator exists, which may be a problem as these
are public APIs. Writing safe iterators for a hashmap with concurrent
accesses is not easy, so the API could be modified to return a list of
domains and a list of cookies for a given domain or URL instead. This
would suit the intended uses just as well.
* The jar now stores const cookies, so there is no need to lock them for
access/modification. Updating a cookie is done by replacing it with
another one in the jar (with the same domain and value). There is still
the problem of deleting a cookie while other threads may still access
it; this will be fixed by making cookies BReferenceable.
* Cookies sometimes come with the UTC timezone, or no tz info at all.
* No other timezone seems to be used
This allows better matching of cookies which would otherwise be kept
only as session cookies.
* An empty "expires" field results in a session cookie, rather than
rejecting the cookie altogether
* A page can set a cookie it is not allowed to access (for example in a
subdirectory of where the page is located). Separate IsValidForUrl and
_CanBeSetFromUrl to perform the appropriate checks in each case.
* Limit the cookie path to 4096 characters. As a result of the previous
change, a page would be allowed to set a cookie with an arbitrarily long
subpath, wasting disk space and RAM by growing the cookie jar.
* Don't allow paths with . or .. elements. These are a source of
confusion and are not needed.
* Reset the cookie fields when parsing failed. This does not matter when
using the cookie jar, but is useful when working directly with
BNetworkCookie.
strptime can return non-NULL values even if it only parsed part of the
string. This was sometimes making us use the wrong format. Now we try
all formats and check how much of the string strptime managed to parse.
We stop when it has parsed a big enough part of it.
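A sketch of that logic (the format list and the "big enough" threshold
are illustrative):

    #include <stddef.h>
    #include <string.h>
    #include <time.h>

    static const char* kDateFormats[] = {
        "%a, %d %b %Y %H:%M:%S",  // RFC 1123
        "%A, %d-%b-%y %H:%M:%S",  // RFC 850
        "%a %b %d %H:%M:%S %Y"    // asctime()
    };

    bool ParseDate(const char* string, struct tm& result)
    {
        size_t length = strlen(string);
        for (size_t i = 0; i < sizeof(kDateFormats) / sizeof(kDateFormats[0]);
                i++) {
            struct tm parsed = {};
            char* end = strptime(string, kDateFormats[i], &parsed);
            // Only accept the result if strptime consumed most of the
            // input, not just a prefix that happened to match.
            if (end != NULL && (size_t)(end - string) + 4 >= length) {
                result = parsed;
                return true;
            }
        }
        return false;
    }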
* BString::CopyInto() takes length as second
parameter, not the end-offset. This bug didn't
have any effect, since BString clamps the length.
* ',' is a comma, ':' is a colon.
* When parsing "additionalData" in the loop for
name=value pairs, an empty name is not useful,
continue the loop early.
* "value" may have length 0; accessing it
with value[0] would not lead to an access violation,
since it just reads the terminating zero (I think),
but it's nicer if the code makes it clear that
value being empty is considered.
* _H() was using a static buffer, in a heavily
multi-threaded situation. I don't think this was
healthy.
* Implement the proposed optimization of using
BString::LockBuffer(), as sketched below. Appending
one char at a time is really bad for performance.
The Base64 encoding/decoding should really be
rewritten as well for similar reasons.
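A sketch of the pattern (Encode() stands in for the actual
transformation):

    #include <String.h>

    char Encode(char c);  // placeholder for the real transformation

    BString EncodeAll(const char* input, int32 length)
    {
        BString result;
        // Reserve the final size once, write directly into the buffer...
        char* buffer = result.LockBuffer(length);
        for (int32 i = 0; i < length; i++)
            buffer[i] = Encode(input[i]);
        // ...and commit the length, instead of appending one character at
        // a time, which reallocates over and over.
        result.UnlockBuffer(length);
        return result;
    }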
These were getting out of sync and causing trouble, and they are easy to
compute from existing information.
Fixes some problems detected by the testsuite where the user/password or
the host would sometimes disappear from the URL.
* The DataReceived hook gets a position argument, making it possible for
listeners to handle out-of-order data (from two range requests at
different positions, for example)
* Adjust HaikuDepot (only user of the API in our sources)
* Add a copy constructor to HTTPRequest that copies the relevant
parameters from an existing request. Makes it easy to repeat a request
with a different range. Could be useful for restarting downloads, or
parallelizing them.
* Add SetRangeStart, SetRangeEnd calls to HTTPRequest, no implementation
yet. I'm putting all the API changes in this commit as it needs to be
synced with a matching haikuwebkit release.
* All archs must update to HaikuWebkit 1.3.0. Previous versions are
broken by this.
* WebKit testsuite relies on the MIME types being correct, so when the
file doesn't have one, try to identify it.
* May be useful for other apps using FileRequest, anyway.
* Libbe is not available when cross-building the *_bootstrap packages,
so libnetwork could not be used either, which made building
anything network-related impossible.
* The only code in libnetwork that requires libbe is the notification,
so I moved that over to libbnetapi. Non-C++ applications can't use
the notification calls anyway, as their interface is C++-only.
* BSecureSocket::CertificateVerificationFailed() took a BCertificate
instance by value as parameter.
BCertificate deletes internal data in its destructor. Passing an
object by value creates a copy, so the copy attempted to delete
the internal data again during its destruction.
This caused mail_daemon to crash here when it came across a failed
certificate.
* Fix: pass BCertificate object as reference.
This is useless: chunked support is mandatory in HTTP/1.1, and it's not
a content-encoding but a transfer-encoding, so accept-encoding wouldn't
help anyway.
* Use the ZlibDecompressor to decompress the data
* Advertise support in accept-encoding
This should make web browsing feel even faster on websites that support
these compression schemes. It also fixes some websites (www.ru,
rainloop.net, ...) that serve gzipped resources even to browsers not
supporting it.
We don't want this method to be exposed in the global namespace!
Also add a note about removing this signal-based solution when our
close() method can be interrupted in other ways.
Thanks axel for watching!
Some websites set cookies expiring in the (not so) far future, after year 2038.
So, using time_t to store the cookie expiration date won't do. Use the
BDateTime class instead.
This makes goodsearch.com login work again (#10460).
Sometimes an HTTP request would get stuck in a call to connect(). While
our read() and write() implementations are interrupted if you close()
the socket, our connect() isn't. It would nonetheless abort the
connection process, and connect would stay blocked for quite a long
time.
When navigating away from a page, WebKit closes all pending connections,
and in our network backend this also means freeing the request object.
But, the destructor can't be called until the thread has stopped
running, to ensure proper cleanup.
Avoid the lockup in connect by sending a signal (SIGUSR1) to the thread
we want to unlock. This interrupts the syscall, and the thread exits
immediately.
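Roughly, the mechanism looks like this (names are illustrative;
send_signal() is the kernel call that delivers the signal to the
thread):

    #include <OS.h>
    #include <signal.h>

    // A no-op handler: its only purpose is to make SIGUSR1 interrupt the
    // blocking connect() with EINTR instead of killing the thread.
    static void DoNothing(int)
    {
    }

    void InstallHandler()
    {
        struct sigaction action = {};
        action.sa_handler = DoNothing;
        sigaction(SIGUSR1, &action, NULL);
    }

    void UnlockThread(thread_id thread)
    {
        // Interrupts whatever syscall the request thread is blocked in.
        send_signal(thread, SIGUSR1);
    }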
Cookie dates are always in GMT. Using mktime wrongly converts them
as if they were local time instead. This would lead to cookies expiring
too early or too late.
* Add some tracing, std::nothrow and null checks
* The HashString class doesn't like SetTo being called with a substring
of the current key, so use a copy of it instead.
Fixes #6667.
* Use pthread_once to initialize the SSL context once, in a thread-safe
way (see the sketch below).
* Do not delete the BIO immediately when closing a connection; instead,
delay this to the destructor. This makes sure the protocol loop is done
running when we do that.
* Instead of creating a new BIO when we reconnect an already used
connection, create the BIO upfront, and reuse it with the new file
descriptor.
* Fix a memory leak: the SSL struct from OpenSSL was never freed, only
the BIO was.
Fixes #10414.
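A sketch of the one-time initialization (names illustrative; the OpenSSL
calls are the pre-1.1 API):

    #include <openssl/ssl.h>
    #include <pthread.h>

    static pthread_once_t sInitOnce = PTHREAD_ONCE_INIT;
    static SSL_CTX* sContext = NULL;

    static void CreateContext()
    {
        // Runs exactly once, even if several threads create their first
        // SecureSocket at the same time.
        SSL_library_init();
        SSL_load_error_strings();
        sContext = SSL_CTX_new(SSLv23_method());
    }

    // In each SecureSocket constructor:
    //     pthread_once(&sInitOnce, CreateContext);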
* Don't crash when opening a symlink, traverse it instead.
* Add a ".." entry to navigate to the parent folder
* Set the encoding to utf-8 in the MIME header, but this doesn't seem to
work.
* Instead of creating an OpenSSL context for each socket, use a global
one and initialize it lazily when the first SecureSocket is created
* Load the certificates from our certificate list so SSL certificates
sent by servers can be validated.
* Add a callback for signalling that certificate validation failed; the
default implementation proceeds with the connection anyway (to keep the
old behavior).
* Introduce the BCertificate class, which provides some information
about a certificate. Currently it's only used by the callback mentioned
above, but it will be possible to get the leaf certificate for the
connection after it's established.
Review of the API and implementation is welcome, before I start making
use of this in HttpRequest and WebKit to allow the user to accept new
certificates.
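A sketch of how a client could use the callback (the exact signature is
an assumption based on the description above):

    #include <SecureSocket.h>

    class MySocket : public BSecureSocket {
    public:
        virtual bool CertificateVerificationFailed(BCertificate& certificate,
            const char* message)
        {
            // Returning true proceeds with the connection anyway, which is
            // what the default implementation does to keep the old
            // behavior; a real client would ask the user instead.
            return true;
        }
    };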
Use standard error codes instead.
This allows using the error codes returned by the underlying functions
directly, and makes it possible to use strerror for debugging. So, we
can also remove StatusString() from the various *Request classes.
Use the EPLF (Easily Parsed List Format). This is one of the
formats that WebKit allows for directory listings, and it's easily
parsed and generated; an example follows.
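For reference, each EPLF line is a "+", a comma-separated list of facts,
then a tab and the name ("i" is a unique identifier, "m" the
modification time, "/" marks a directory, "r" a retrievable file, "s"
the size in bytes; the values below are made up):

    +i8388621.29609,m824255902,/,	dev
    +i8388621.44468,m839956783,r,s10376,	notes.txt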
Deleting the BIO while it's still waiting on a read() in another thread
will lead to a crash when the socket is eventually closed. Close the
socket first, so the read() is unlocked, then safely delete the BIO.
When calling Stop(), we expect the request thread to exit as soon as
possible. Closing the connection unlocks it from any blocking read() or
write(), avoiding some lockup situations.
* BUrlResult is back, with ContentType and Length methods.
* BHttpResult subclasses it and uses HTTP header fields to implement
those
* Introduce BDataRequest for "data" URIs. These embed the data inside
the URI, either as plaintext or base64 encoded.
* These are shared with HTTP cookies set for localhost. We probably want
to split them apart later on, so cookies should store and check the
protocol, in addition to the path and domain.
* Fixes #10195.
* Do not start with a ridiculously small buffer for socket reads.
Sockets return data they have available, instead of trying to fill as
much of the buffer as possible. In some cases a single Ethernet frame
can hold a complete request.
* Remove some looping and try parsing the whole request in sequence each
time we receive some bytes.
* Avoid reallocating a temporary buffer each time we read some data from
the socket. Instead, allocate it once, and grow it as needed. Since
servers usually send chunks of equal size, we should get away with one
reallocation on the first chunk.
We can send the data directly to the output socket instead of copying it
into a BString first, at the cost of very slightly less information in
debug output.
* Network stack socket module: socket_control(): The FIONREAD argument
is int, not ssize_t.
* Net kit: getsockopt(): R5_SO_FIONREAD: Fix ioctl() argument. Was
taking a pointer of what already was a pointer to the buffer.
* libedit: el_gets(): The FIONREAD argument is int, not long.
Use a default context instead. This allows apps without context
management to still have cookies and HTTP digest authentication (without
persistence to disk).
First part of the fix for #10239 (it also needs changes in WebKit).
When using the copy constructor of BNetEndpoint the socket of the
original endpoint gets dup'ed. The Accept() method later directly reset
the fSocket member of the newly created BNetEndpoint to the socket
returned by accept(). The socket dup'ed by the copy constructor was
therefore leaked.
Of course dup'ing the socket and copying the local and remote addresses
is superfluous in the accept case, as these members all get set to new
values. To reduce that overhead there is now a new private constructor
that directly gets the final socket and remote and local address.
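A sketch of that constructor (member names are illustrative):

    // Adopts the already-accepted socket and the addresses directly,
    // instead of dup()ing the listening endpoint's socket and then
    // overwriting it.
    BNetEndpoint::BNetEndpoint(int socket, const BNetAddress& local,
        const BNetAddress& remote)
        :
        fSocket(socket),
        fAddr(local),
        fPeer(remote)
    {
    }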
Trying to Read from a directory results in an error code, but we also
missed that because an unsigned variable was used to store the result.
Fixes #10210.
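A minimal sketch of the corrected pattern:

    #include <File.h>

    status_t ReadSome(BFile& file, char* buffer, size_t size)
    {
        // The count must be stored signed: in an unsigned variable a
        // negative status_t such as B_IS_A_DIRECTORY becomes a huge byte
        // count and the error goes unnoticed.
        ssize_t bytesRead = file.Read(buffer, size);
        if (bytesRead < 0)
            return (status_t)bytesRead;
        return B_OK;
    }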
Implement BNetworkRoster::GetRoutes() and BNetworkInterface::GetRoutes().
Also implement BNetworkInterface::GetDefaultGateway().
There is code duplication at the moment, and the API only supports IPv4.
* Remove useless dummy protocol loop in UrlRequest
* Stop HTTP requests before deleting the socket and other things the
loop may still be using
* Deletion of items from the authentication map wasn't working
* Remove some debug traces
The authentication state is stored (in a hash map, using the domain+path
as a key) in the UrlContext class. It can then be reused for multiple
requests to the same place. We also lookup stored authentications for
parent directories and stop at the first we find.
Authentication state is not stored on disk (unlike cookies), and there
can only be one for each domain+path.
This change is needed for implementing cookie persistence in Web+ using
the network kit backend.
The current implementation requires the user to unarchive the cookie
jar, then hand it over to the BUrlContext which will copy it to its own
field. This makes the code simpler, but maybe doing a complete copy
(with all the cookies) is a heavy operation and could be avoided.
* Remove the fRawData field, as handling it is too complicated (it's
not easy to have proper copy semantics on a BDataIO) and it's not used
anyway, as the listener DataReceived call is enough to get the data and
handle it.
* All the remaining fields are HTTP-only, so rename the class to
HttpResult and attach it to HttpRequest instead of UrlRequest.
CID 1108353, 1108335: memory leak.
CID 610473: unused variable.
CID 1108446, 1108433, 1108432, 1108419, 1108400, 991710, 991713, 991712,
610098, 610097, 610096, 610095: uninitialized field
CID 1108421: unused field
Change the ownership of the result for Url/HttpRequests. The request now
owns its result and you either access it by reference while the request
is live, or copy it to keep it after the request destruction. To help
with that, get BUrlResult copy constructor and assignment operator to
work.
Performance issue: copying the BUrlResult also copies the underlying
BMallocIO data. This should be shared between the BUrlResult objects to
make the copy lighter. The case of BUrlSynchronousRequest is now
particularly inefficient, with at least 2 copies needed to get at the
result.
* Now takes ownership of headers, form data and input data
* Split Set* and Adopt* methods to help with proper use of this (Set
does a copy)
* Write documentation.
* The RFC provides a regular expression for URI parsing, so just use it
(quoted below).
* Allows parsing URIs with missing components (no scheme or authority)
* This allows parsing relative URLs as expected
* Can also handle things such as data: or mailto:
* Also more fixes to the handling of incomplete URIs; some flags weren't
always set to the right values.
This gets Windows Live Mail (or is it called Outlook?) working, with
some other fixes on WebKit side.
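For reference, the expression from RFC 3986 (appendix B); groups 2, 4,
5, 7 and 9 capture the scheme, authority, path, query and fragment:

    ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?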
* It would not work for cookies set to expire tomorrow or later, since
setting the time in a BDateTime does not overflow to the date.
* The BDateTime API could be improved to make this look nicer.
* @ is a separator (between user:password and host) only if there are
no slashes before it
* All slashes in user and password should be urlencoded (as well as any
@ and :)
* On the other hand, it's possible to have @ as part of a URL path or
query. An example is Google Maps.
Gets Google Maps working.
* With such long class names, there's no way I'm going to follow the 64
char limit on commit headlines.
* The classes share the same API, so having them separate is not very
useful.
* This makes it possible to use the same listener in either synchronous
or asynchronous mode (or both, for different requests)
* Add some error handling in NetworkCookie and don't add broken cookies
(or should I say crumbs?) to the cookie jar
* More control on the path and domain, as well as the expiration time
We now pass Opera cookie testsuite functionality tests, as well as some
of the negative tests (we even do better than curl). Not going further
right now as this works well enough for positive cases and most
security/privacy issues are fixed (cross domain and cross path cookie
setting or spying).
* The HTTP spec says headers can be split when they are comma separated
* However, cookies are semicolon separated, so it is not acceptable to
split them.
* We will want to implement some way to limit the cookie header entry
size, as servers have a limit on what they can accept (usually around 4K
characters). The RFC also says we don't need to remember more than 20
cookies per domain.
* This takes a relative path as a parameter, and modifies the object to
point to the given location.
* '..' is not handled yet, and will be sent as-is to the server.
* Makes it possible to follow more types of 302 redirects
In particular, I can now run the tests from Opera's testsuite
(testsuite.opera.com), which shows I have more work to do on cookie
handling.
* The dot was mandatory in older RFCs, but the new RFC 6265 disallows it.
* Both schemes are used around the web, so we allow them both.
It's possible to login to mail.google.com again.
* Some fields weren't initialized, leading to random crashes later on
* Remove the enum that was used for protocol options
* Use a single field to track the request state, instead of separate
booleans.
* The cookie jar iterator now uses a BObjectList instead of a BList
* Add a convenience method to the cookie jar to add a cookie by BUrl
and raw cookie string.
* Remove some methods in BNetworkCookie that could lead to invalid
cookies (cross-domain or with no domain at all).
* Make the cookie parsing able to report errors
* Fix off-by-one error in domain cookies validation.
* Remove the BUrlRequest class, which was only delegating work to
BUrlProtocol and subclasses
* Rename BUrlProtocol to BUrlRequest, and BUrlRequestHttp to BHttpRequest
* Creating a request is now done through the BUrlProtocolRoster. For
now there is just a static MakeRequest method, this will be completed
when we get to actually allowing add-ons to provide different request
handlers.
This allows cleanup of the API for requests:
* Remove the universal SetOption method with constants, and have
dedicated setters for each protocol option.
* Setters can now have multiple parameters, for example you can give
BHttpRequest a BDataIO and a known size (see the sketch below)
* In this case, the BHttpRequest will not use HTTP chunked transfers,
which were always used before and made most servers unhappy (tested and
failed with lighttpd, Google accounts and GitHub).
* This does intentionally break source compatibility, so that a review
of concerned code is forced.
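A sketch of the new usage (treat the exact names as assumptions based on
the description above):

    #include <File.h>
    #include <HttpRequest.h>

    void Upload(const BUrl& url, const char* path, off_t size)
    {
        BHttpRequest request(url);
        // Adopt* takes ownership of the BDataIO; Set* would copy. Passing
        // the size up front lets the request send a Content-Length header
        // instead of a chunked transfer.
        request.AdoptInputData(new BFile(path, B_READ_ONLY), size);
        request.Run();
    }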
* Binary compatibility should be maintained in most cases. The values
of the constants for the writable directories are now used for the
writable system directories. The values for the non-writable
directories are mapped to "/boot/system/data/empty/...", an empty or
non-existent directory, so that they will simply be skipped in search
paths. Only code that explicitly expects to find something in a
B_COMMON_* directory will fail.
* Remove support for the "common" installation location from packagefs,
package kit, package daemon, package managers.
* Rename the B_COMMON_*_DIRECTORY constants referring to writable
directories to B_SYSTEM_*_DIRECTORY.
* Remove/adjust the use of various B_COMMON_*_DIRECTORY constants.
I'm sure some occurrences still remain. They can be adjusted when the
remaining B_COMMON_*_DIRECTORY constants are removed.
* The default is to use IPv6 addresses, but these don't quite work yet
in Haiku.
* Also, some debug message improvements, and fix a crash when the
payload has % inside it (parsing it as a printf string isn't such a
good idea)
* We didn't do anything with cookies received from the server; they are
now automatically added to the cookie jar.
* Also make sure the UrlContext (which holds the cookie jar) is
forwarded from UrlRequest to UrlProtocol when it gets set.
This gets cookies working in the Services Kit-based WebKit.
* The use of a static variable for storing the chunk size made it
shared between all instances of BUrlProtocolHttp.
* Inline the function at the single place where it is used, and allocate
the variable on the stack instead.
The whole receiving loop should be split into chunked and non-chunked
variants to improve code readability.