On (very) slow machines this might improve the experience.
This only has an effect on Second Life, since that's currently the only
grid that uses https for capabilities (all of them, except GetMesh,
GetMesh2 and GetTexture).
When linking the viewer with libmemleak, which significantly slows
down calls to malloc (and calloc, realloc, new, memalign and so on)
and free (and delete), I could no longer download my inventory.
This patch solves that. The reason was that completing
the SSL/TLS handshake took more than 10 seconds (15 to 20).
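For reference, a minimal sketch of the kind of change this amounts to, assuming the timeout is applied per easy handle via CURLOPT_CONNECTTIMEOUT (the actual option and value used by the viewer may differ):

    #include <curl/curl.h>

    // CURLOPT_CONNECTTIMEOUT covers the whole connect phase, including the
    // SSL/TLS handshake; 10 seconds is too short when malloc/free are slowed
    // down by libmemleak, so allow a larger value (e.g. 30 seconds).
    void set_connect_timeout_sketch(CURL* handle, long connect_timeout_secs)
    {
      curl_easy_setopt(handle, CURLOPT_CONNECTTIMEOUT, connect_timeout_secs);
    }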
This fixes at least one case (crash report 8407), which comes
down to not cleanly informing a responder of failure when the
request url is empty (or so badly formed that it isn't a valid
url). As a result, the statemachine would abort() without
informing the responder - which is bad, sort of.
The previous cases where the responder needed to be informed
of a failure, namely "statemachine timed_out()" and "bad_socket()"
when a socket suddenly becomes bad for an unknown reason, have been
replaced with the more general 'aborted()' function, which must
be called before the statemachine calls abort(). This has now
been done for all cases of abort(), so that if the
llerrs fires again in the future then that would have to be
after the statemachine calls finish(), which is still as "impossible"
as it was - hence the llerrs stays in place to make sure.
The reason that this seldom happened on SL, more often on
opensim, and even more often on home-brew test grids, seems plausible:
malformed urls are more common in those cases.
I also took the opportunity to improve the robustness of the places
where the curl error code is checked: it makes no sense to look at
the error code that curl returns when an internal error occurred.
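In outline the intended pattern looks like the sketch below (illustrative only; 'Responder' is a stand-in, not the viewer's real responder interface):

    #include <curl/curl.h>
    #include <iostream>

    // Hypothetical minimal responder interface, for illustration only.
    struct Responder {
      virtual ~Responder() {}
      virtual void aborted() = 0;   // The new, generalized failure notification.
    };

    // Failure path sketch: inform the responder *before* the statemachine
    // aborts, and only interpret curl's error code when curl actually ran.
    void fail_request_sketch(Responder* responder, bool internal_error, CURLcode curl_code)
    {
      if (responder)
        responder->aborted();       // Must happen on every abort() path.

      if (!internal_error)          // Curl's code is meaningless for internal errors.
        std::cerr << curl_easy_strerror(curl_code) << '\n';

      // ... followed by the statemachine's abort() in the real code.
    }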
This is necessary for opensim mega regions that can have up to 16 long
poll connections for a single service, while upping the maximum number
of connections per service just for that is clearly nonsense.
I also changed long poll connections so that they no longer "use" the
"Other" capability type (visible in the HTTP debug console: the debug
info for the Other capability type turns grey when it only has event
poll connections). As a result, additional connections become available
for textures, mesh and inventory (the other capability types) on
opensim, which has all types on the same service, because Other
no longer constantly reserves a full share of the available
connections. This makes the actual number of connections used for
textures and mesh much more like it is on Second Life.
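Roughly, the bookkeeping now looks like this sketch (names and members are illustrative, not the actual AIPerService code):

    // Per-service counters, one per capability type, with event polls
    // (long poll connections) counted separately so they no longer
    // consume the "Other" share.
    enum CapabilityType { cap_texture, cap_inventory, cap_mesh, cap_other };

    struct PerServiceSketch {
      int added[4];        // Requests added per capability type.
      int event_polls;     // Long poll connections, tracked on their own now.

      PerServiceSketch() : added(), event_polls(0) { }

      void add(CapabilityType type, bool is_event_poll)
      {
        if (is_event_poll)
          ++event_polls;   // Does not count against any capability type.
        else
          ++added[type];
      }
    };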
It turns out that the only responders that want to see the redirect
status codes themselves are the ones that already had a
redirect_status_ok() exception.
I need to parse the debug output of libcurl for this, and chose
not to do that for a Release build - at which point it makes little
sense to even do it for a Debug build, since the Alpha releases
are 'Release' builds too. Even though it would probably have no impact
on FPS, there would only be about two people looking at that number and
understanding it.
XMLRPCResponder is used to log in, and the login server is an HTTP 1.0
server that doesn't support reuse of connections.
It's cleaner to close such connections from our side too.
This changes,
0x7fffdcf93700 CURLIO 0x7fffcd86db40 * Connection #0 to host login.agni.lindenlab.com left intact
into
0x7fffdcf93700 CURLIO 0x7fffcd8b0bd0 * Closing connection #0
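Assuming the close is done through libcurl's standard option for this (the viewer may do it differently), the change boils down to something like:

    #include <curl/curl.h>

    // The login host speaks HTTP 1.0 and won't reuse the connection anyway,
    // so tell curl to close it cleanly instead of leaving it "intact".
    void forbid_connection_reuse_sketch(CURL* handle)
    {
      curl_easy_setopt(handle, CURLOPT_FORBID_REUSE, 1L);
    }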
This splits the AIPerService queue up into four queues: one for each
"capability type". On Second Life that doesn't make a difference in
itself for textures, because the texture service only serves one
capability type: textures. Other services, however, can serve two types,
while on Avination - which currently only has one service for everything
- this really makes a difference, because that single service now has
four queues.
More important, however, is that the administration of how many requests
are in the "pipeline" (from approving that a new HTTP request may be
added for a given service, until curl finished it) is now per capability
type (or rather per service/capability type pair). This means that
downloads of a certain capability type (textures, inventory, mesh,
other) will no longer stall because unapproved requests cluttered the
queue of a given service.
Moreover, before, when a request finished, the code would only look for
a new request in the queue of the service that just finished. This simple
algorithm worked when there were no 'PerService' objects and only one
'Curl' queue: if anything was queued, that was because there
were running requests, and when one of those running requests finished
it made sense to see if one of the queued requests could be added now.
However, after adding multiple queues, one for each service, it could
happen that service A had queued requests while only requests of
service B were actually running: only requests of B would ever finish
and the requests of A would stay queued forever.
With this patch the algorithm is to look alternately first in the
texture request queue and then in the inventory request queue - or vice
versa - and if neither has anything queued, to look for a request of a
different type. If that also cannot be found, look for a request of
another service. This is still not optimal and subject to change.
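The per-service part of that policy looks roughly like this (a sketch under guessed names, not the actual AIPerService code):

    #include <deque>
    #include <cstddef>

    struct Request { };
    typedef std::deque<Request*> Queue;

    enum { QT_TEXTURE, QT_INVENTORY, QT_MESH, QT_OTHER, QT_COUNT };

    // Pick the next queued request of one service; the caller flips
    // 'texture_first' so textures and inventory alternate.
    Request* pick_next_sketch(Queue (&queues)[QT_COUNT], bool& texture_first)
    {
      int const first  = texture_first ? QT_TEXTURE : QT_INVENTORY;
      int const second = texture_first ? QT_INVENTORY : QT_TEXTURE;
      texture_first = !texture_first;

      int const order[QT_COUNT] = { first, second, QT_MESH, QT_OTHER };
      for (int i = 0; i < QT_COUNT; ++i)
      {
        Queue& q = queues[order[i]];
        if (!q.empty())
        {
          Request* request = q.front();
          q.pop_front();
          return request;
        }
      }
      return NULL;   // Nothing here; the caller then tries another service.
    }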
The inventory bulk fetch is not thread-safe, so it doesn't start
right away, causing the approvement not to be honored upon return from
post_approved (formerly post_nb).
This patch renames wantsMoreHTTPRequestsFor to approveHTTPRequestFor,
and has it return either NULL or an AIPerService::Approvement object.
The latter is now passed to the CurlEasyHandle object instead of just a
boolean mQueueIfTooMuchBandwidthUsage, and the Approvement is then
honored by the state machine right after the request is actually added
to the command queue.
This should avoid a flood of inventory requests in case
approveHTTPRequestFor is called multiple times before the main thread
adds the requests to the command queue. I don't think that ever actually
happens, but I added debug code (to find some problem) that checks
everything so damn strictly that I need to be this precise in order to
do that testing.
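A minimal sketch of the idea (the method names and the destructor check are guesses, not the real class):

    #include <cassert>

    // Handed out by approveHTTPRequestFor() instead of a bare bool; the
    // bookkeeping only counts once the request really reached the curl
    // thread's command queue.
    class Approvement_sketch {
      bool mHonored;
    public:
      Approvement_sketch() : mHonored(false) { }
      ~Approvement_sketch() { assert(mHonored); }   // Matches the strict debug
                                                    // checking mentioned above.
      void honored() { mHonored = true; }           // Called by the statemachine
                                                    // right after the request was
                                                    // added to the command queue.
    };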
* Removed the 'RequestQueue' from other PerServiceRequestQueue occurrences
  in the code.
* Made wantsMoreHTTPRequestsFor and checkBandwidthUsage threadsafe (by
  grouping the static variables of AIPerService into ThreadSafe groups;
  a rough sketch of the idea follows below).
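Generic illustration of that grouping (the viewer uses its own ThreadSafe wrapper templates rather than raw pthread calls, and the members shown are made up):

    #include <pthread.h>

    // All static state shared by wantsMoreHTTPRequestsFor and
    // checkBandwidthUsage lives in one struct behind one lock, instead of
    // being loose globals that each caller reads and writes unprotected.
    struct SharedStatics_sketch {
      int total_approved;
      int services_with_queued_requests;
    };

    static SharedStatics_sketch shared_statics;
    static pthread_mutex_t shared_statics_mutex = PTHREAD_MUTEX_INITIALIZER;

    void note_approval_sketch()
    {
      pthread_mutex_lock(&shared_statics_mutex);
      ++shared_statics.total_approved;
      pthread_mutex_unlock(&shared_statics_mutex);
    }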
Most notably getMesh (the only one possibly using any significant
bandwidth), but in general every type of request that just has to
happen anyway, and in the order requested: such requests are still
simply passed to the curl thread, but now the curl thread will queue
them and hold back if the (general) service they use is loaded too
heavily.
Adds throttling based on average bandwidth usage per HTTP service.
Since only HTTP textures are using this, they are still starved by other
services like inventory and mesh downloads. Also, it will be necessary
to move the maximum number of connections per service to the PerService
class and tune it dynamically: reducing the number of connections is
the first thing to do when using too much bandwidth.
I also added a graph for HTTP texture bandwidth to the stats floater.
For some reason the average bandwidth (over 1 second) looks almost like
scattered noise... weird for something that is averaged...
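The throttling decision itself amounts to something like the following sketch (illustrative only; the real averaging and bookkeeping are more involved):

    #include <cstddef>

    // Per-service bandwidth tracking: a new request is only approved while
    // the average over the last window stays below the configured cap.
    class PerServiceBandwidth_sketch {
      size_t mBytesReceived;    // Accumulated by the curl thread.
      double mWindowSeconds;    // Averaging window, e.g. 1 second.
    public:
      explicit PerServiceBandwidth_sketch(double window)
        : mBytesReceived(0), mWindowSeconds(window) { }

      void received(size_t bytes) { mBytesReceived += bytes; }

      bool approve(double max_bytes_per_second) const
      {
        return mBytesReceived / mWindowSeconds < max_bytes_per_second;
      }

      void next_window() { mBytesReceived = 0; }   // Roll the window over.
    };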
Rationale: LL is doing all throttling per service (host:port), not per
hostname. Also, textures and capabilities use the same host: the
sim you are connected to. Splitting the queues up on a per-service basis
will stop the textures from blocking a capability request.
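Deriving the per-service key from a url is then roughly (sketch; the viewer's actual parsing handles more corner cases, such as a missing port):

    #include <string>

    // "Service" here means host[:port], matching how LL throttles.
    std::string service_key_sketch(std::string const& url)
    {
      std::string::size_type pos = url.find("://");
      std::string::size_type const start = (pos == std::string::npos) ? 0 : pos + 3;
      // The service runs until the first '/' of the path.
      std::string::size_type const end = url.find('/', start);
      return url.substr(start, (end == std::string::npos) ? std::string::npos : end - start);
    }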
Renamed PerHostRequestQueue and PerHostRequestQueuePtr to
AIPerHostRequestQueue and AIPerHostRequestQueuePtr respectively and
moved them to the global namespace. This is in preparation for using
them in the texture fetcher (as a function of the hostname of the url
that is constructed there).
Replaces LLTextureFetch::getNumHTTPRequests.
Returns AICurlInterface::Stats::running_handles.
This is work in progress that temporarily doesn't compile because
LLTextureFetch::getNumHTTPRequests is still being used somewhere.
This is now necessary since the curl thread no longer syncs with the
main thread: it is possible that a request finishes after a texture
fetch thread was shut down but before curl was stopped, with curl
then calling BufferedCurlEasyRequest::processOutput while objects that
the responder uses (most notably LLTextureFetch itself) have already
been destructed.
When uploading finishes but that is not detected, the timeout should be
the "reply delay" - the time the server takes before it replies - and
not CurlTimeoutLowSpeedTime. This patch adds code that takes this failure
into account (it happened only ONCE for me, on Metropolis, while flying
around and using trickle (not sure if that is relevant), so it's not
that likely to improve anything in practice. Note that it is
detected by an assertion when it happens, so that we can safely assume
it normally never happened on SL).
* Generalized PUT / POST configuration by adding
  CurlEasyRequest::setPut, which now also supports keep-alive (which
  still isn't used).
* Upload content length is now stored in CurlEasyRequest::mContentLength.
* CurlEasyRequest::has_stalled() now returns false if it is possible
  that the 'upload finished' detection failed AND calls upload_finished()
  itself in that case, so it is no longer 'const' (see the sketch after
  this list).
* If low speed is detected exactly when the last bytes are being sent
  (unlikely scenario), then the upload gets 4 more seconds,
  after which it switches to CurlTimeoutReplyDelay.
* Added EDoesAuthentication and EAllowCompressedReply to replace
  booleans, for readability and type-safety, as was done with EKeepAlive.
  Note that this change inverts the meaning of the compression related
  parameter.
* Unrelated: removed an unnecessary #include "llurlrequest.h" from
  llxmlrpcresponder.h.
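Sketch of the has_stalled() / timeout idea (member names and conditions are illustrative, not the actual BufferedCurlEasyRequest code):

    #include <cstddef>

    struct UploadTimeoutSketch {
      size_t mContentLength;    // Total bytes we intend to upload.
      size_t mBytesSent;        // Bytes handed to curl so far.
      bool   mUploadFinished;   // Set by the normal detection path.

      // Returns true only when the transfer is genuinely stuck.
      bool has_stalled()
      {
        if (!mUploadFinished && mBytesSent >= mContentLength)
        {
          // The 'upload finished' detection may have failed: treat the
          // upload as finished and wait CurlTimeoutReplyDelay for the
          // server's reply instead of declaring a stall.
          upload_finished();
          return false;
        }
        return true;
      }

      void upload_finished()
      {
        mUploadFinished = true;
        // ... restart the timer with CurlTimeoutReplyDelay in the real code.
      }
    };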
It turns out that it's possible to receive data before an upload
finishes, when the server sends HTTP_BAD_REQUEST in the middle of the
upload (this happens to me when I try to upload a 6000x6000 image to my
profile feed, which is a 44MB file). So, in that case the finished upload
detection did not fail and we shouldn't assert.
Although it should never happen that a file descriptor is suddenly
closed, it appears that this does happen on 64-bit linux when using
FMODex... Not really sure how useful this is, but at least now the
viewer just continues to work, as if - say - the socket was closed
remotely. Before, the curl thread would go into a tight loop that it
wouldn't recover from until the watchdog thread terminated the viewer.
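Illustrative sketch of the idea only (the viewer's curl thread has its own event loop); it just shows the general shape of treating an invalid descriptor like a remote close:

    #include <poll.h>

    // If poll() reports that a descriptor we still watch is invalid,
    // stop watching it and let the request fail as if the peer closed
    // the connection, instead of spinning on it forever.
    bool handle_bad_fd_sketch(struct pollfd& pfd)
    {
      if (pfd.revents & POLLNVAL)
      {
        pfd.fd = -1;     // Drop it from the poll set.
        return true;     // Caller finishes the request with an error.
      }
      return false;
    }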
When linking against the prebuilt libcrypto.so.1.0.0 and
libssl.so.1.0.0 there is a problem running the viewer in
gdb (on debian), because gdb links with libpython2.7.so
which needs a newer openssl on my system (OPENSSL_1.0.1).
The patch allows working around this problem by defining
-DUSE_SYSTEM_OPENSSL:BOOL=ON when configuring, which then
causes the viewer to be compiled against the system libs.
(If you already installed the prebuilts, you have to manually
remove them with script/install.py --uninstall openSSL.)
The prebuilt libcurl, however, expects a function SSLv2_client_method,
which is not present in my system openssl libs.
A stub was added, which is possible because the function
in question isn't used anyway.
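The stub is essentially the following (shown as a sketch; the real one may differ slightly). It only has to satisfy the symbol that the prebuilt libcurl references, since SSLv2 is never actually negotiated:

    #include <cstddef>
    #include <openssl/ssl.h>

    extern "C" const SSL_METHOD* SSLv2_client_method(void)
    {
      return NULL;   // Never called in practice.
    }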
Add CURLTR debug channel for libcurl API calls,
and use CURLIO only for libcurl debug output.
Note: gDebugCurlTerse needs to be set to true for
the filtering to take effect; then pass 'debug_on'
to the LLHttpClient methods that require debugging.
Adds a std::map from hostname (or url) to PerHostRequestQueue
objects. The latter keeps track of the number of added curl easy
requests and decides whether a new request should be throttled or
not, and provides the queue that throttled requests are put on.
At the moment CurlConcurrentConnectionsPerHost is set to 16,
because things really don't work if we limit it to 2 without LL
supporting connection reuse. CurlConcurrentConnectionsPerHost is
also set to non-persistent so that we can easily change it in the future
(once we decide on its final value it can be made persistent).
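Rough sketch of that bookkeeping (names and types are illustrative, not the actual AICurl code):

    #include <deque>
    #include <map>
    #include <string>

    struct PerHostRequestQueue_sketch {
      int added;                        // Easy requests currently added for this host.
      std::deque<void*> queued;         // Throttled requests, FIFO.
      PerHostRequestQueue_sketch() : added(0) { }

      bool throttled(int max_per_host) const { return added >= max_per_host; }
    };

    typedef std::map<std::string, PerHostRequestQueue_sketch> host_map_type;
    static host_map_type host_map;
    static int const max_connections_per_host = 16;   // CurlConcurrentConnectionsPerHost

    // Either adds the request now, or queues it for later.
    bool try_add_sketch(std::string const& host, void* request)
    {
      PerHostRequestQueue_sketch& entry = host_map[host];
      if (entry.throttled(max_connections_per_host))
      {
        entry.queued.push_back(request);
        return false;
      }
      ++entry.added;
      return true;
    }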
Moved AICurlPrivate::Stats to AICurlInterface::Stats and added several
counters to keep track of the number of existing instances of
respectively AICurlEasyRequest, AICurlEasyRequestStateMachine,
BufferedCurlEasyRequest, ResponderBase and
ThreadSafeBufferedCurlEasyRequest.
Rename check_run_count to check_msg_queue, because the whole 'run count'
approach is flawed anyway (the author of libcurl told me that THE way
to check for finished curl handles is to just call curl_multi_info_read
every time: it's extremely fast; any test that attempts to avoid that
call is nonsense).
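So the check becomes, in essence (sketch; the handling of a finished handle is of course viewer-specific):

    #include <curl/curl.h>

    // Drain curl_multi_info_read after every call into the multi handle;
    // it is cheap, so there is no point trying to avoid it.
    void check_msg_queue_sketch(CURLM* multi_handle)
    {
      CURLMsg* msg;
      int msgs_in_queue;
      while ((msg = curl_multi_info_read(multi_handle, &msgs_in_queue)))
      {
        if (msg->msg == CURLMSG_DONE)
        {
          CURL* easy = msg->easy_handle;
          CURLcode result = msg->data.result;
          // ... hand 'easy' and 'result' to the code that finishes the request.
          (void)easy; (void)result;
        }
      }
    }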
The assertion might have failed because we're comparing the current
number of easy handles with the number of running handles of
'a while ago': it is possible that an easy handle was removed in the
meantime.
In order to check whether that hypothesis is right, I moved the assertion
to directly below the call to curl_multi_socket_action, where it
should hold. If this new assertion doesn't trigger, then the hypothesis
was right and this is fixed.
Every curl transaction is an AICurlEasyRequestStateMachine, which has an
AICurlEasyRequest as member, which is a reference-counting pointer to
a ThreadSafeBufferedCurlEasyRequest. And now BufferedCurlEasyRequest is
derived from CurlEasyRequest, which is derived from CurlEasyHandle, but
neither is used separately.
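In outline (a heavily simplified skeleton, not the actual declarations):

    class CurlEasyHandle { };                                     // Not used separately.
    class CurlEasyRequest : public CurlEasyHandle { };            // Not used separately.
    class BufferedCurlEasyRequest : public CurlEasyRequest { };
    class ThreadSafeBufferedCurlEasyRequest {
      // Presumably a thread-safe wrapper around a BufferedCurlEasyRequest.
    };
    class AICurlEasyRequest {
      // Reference-counting pointer to a ThreadSafeBufferedCurlEasyRequest.
    };
    class AICurlEasyRequestStateMachine {
      AICurlEasyRequest mCurlEasyRequest;   // One per curl transaction.
    };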