Fixed LLFastTimers et al.

No longer crash when the std::vector moves in memory as it grows.
Fixed display of timer values (in ms).
Aleric Inglewood
2012-02-04 18:28:24 +01:00
parent b755bcd495
commit 5a455eac91
5 changed files with 201 additions and 68 deletions


@@ -23,6 +23,125 @@
* Linden Research, Inc., 945 Battery Street, San Francisco, CA 94111 USA
* $/LicenseInfo$
*/
//
// LLFastTimer documentation, written by Aleric (Feb 2012).
//
// Disclaimer: this is horrible code and I distance myself from its design.
// It's neither robust nor object-oriented. I just document what I find, in
// order to be able to fix the bugs (that logically result from such a design).
//
// Note that the chosen variable names are non-intuitive and make understanding
// the code harder. However, I didn't change them, in order to make merging less
// of a nightmare in the future -- Aleric.
//
//
// First of all, absolutely nothing in this code is even remotely thread-safe:
// FastTimers should only be used from the main thread and never from another
// thread.
//
// NamedTimerFactory is a singleton, accessed through NamedTimerFactory::instance().
//
// It has four pointer members, which are initialized once to point to
// four objects with a lifetime equal to that of the application/singleton:
//
// mTimerRoot --> NamedTimer("root")
// mActiveTimerRoot --> NamedTimer("Frame")
// mRootFrameState --> FrameState(mActiveTimerRoot)
// mAppTimer --> LLFastTimer(mRootFrameState)
//
// A NamedTimer has a name and a lifetime of approximately that of the application.
// There is exactly one instance per unique name.
// NamedTimers are ordered in a hierarchy in which each has one parent and zero or
// more children (the parent of "root" is NULL).
// The parent of mActiveTimerRoot is mTimerRoot, which has one child: mActiveTimerRoot.
// NamedTimer::getDepth() returns the number of ancestors; mTimerRoot has a depth of 0,
// mActiveTimerRoot has a depth of 1 and so on. NamedTimer::getRootNamedTimer() just
// returns mActiveTimerRoot.
//
// Each NamedTimer is linked to exactly one FrameState object, namely
// LLFastTimer::getFrameStateList()[named_timer.getFrameStateIndex()], where
// getFrameStateList() is a static function returning a global std::vector<FrameState>.
// This vector is ordered "Depth First" (the FrameState objects belonging to NamedTimer
// objects with the smallest depth come first). Because timers are added when they are
// first used, and not in "Depth First" order, the vector is resorted a few times in the
// beginning (and the NamedTimer::mFrameStateIndex values are updated), but it stabilizes
// after a while. This implies that FrameState pointers can't really be used: FrameState
// objects move around in memory whenever something is inserted into or removed from the
// std::vector and/or when the vector is resorted. However, FrameState pointers ARE
// being used, and code exists that tries to update those pointers in the above-mentioned
// cases (this part had bugs, which I have now fixed).
//
// FrameState objects point back to their corresponding NamedTimer through mTimer.
// They also have parents (FrameState::mParent): the FrameState object corresponding
// to the parent of mTimer.
//
// Thus, so far we have (assuming "namedtimerX" was created first):
//
// NamedTimers:                                     FrameStates:
//
//          NULL
//            ^
//            |
// depth=0:  "root" (mTimerRoot)         <------->  getFrameStateList()[0]
//            ^                                      ^
//            | (parent)                             | (parent)
//            |                                      |
// depth=1:  "Frame" (mActiveTimerRoot)  <------->  mRootFrameState
//            ^              ^                       ^                       ^
//            |              |                       |                       |
//            | (parent)     | (parent)              | (parent)              | (parent)
//            |              |                       |                       |
// depth=2:  "namedtimerX"   |           <------->  getFrameStateList()[2]   |
//                     "namedtimerY"     <------->                getFrameStateList()[3]
//
// where the NamedTimers point to the corresponding FrameStates by means of
// NamedTimer::mFrameStateIndex, and the FrameStates point back through FrameState::mTimer.
//
// Note the missing getFrameStateList()[1]: it is ignored and replaced by an explicit
// 'new FrameState' in initSingleton(). The reason is probably that otherwise
// mRootFrameState would have to be updated every time the frame state list vector
// moves in memory. This special case adds some complexity to, for instance,
// getFrameState(), which now needs to test whether the caller is mActiveTimerRoot.
//
// DeclareTimer objects are NamedTimer/FrameState pointer pairs, again with a lifetime
// of approximately that of the application. They are usually static, even global,
// and are passed a name as a string; the name is looked up and added if it doesn't
// exist yet, otherwise the previously created pair is returned. Obviously, "root" and
// "Frame" are the only timers that don't have a corresponding DeclareTimer object.
//
// LLFastTimer objects are short-lived objects, created in a scope and destroyed
// at its end in order to measure the time that the application spent in that
// scope. They are passed a DeclareTimer object to know which timer to add the time to.
// LLFastTimer::mFrameState is a pointer to the FrameState of the corresponding timer.
// The static LLFastTimer::sCurTimerData is a CurTimerData struct that holds, for the
// last LLFastTimer object that was created (and not destroyed yet), a copy of that
// pointer as well as a pointer to the corresponding NamedTimer; in other words, it
// describes the running timer with the largest depth.
// When a new LLFastTimer object is created while one is already running, the current
// sCurTimerData is saved in the new (child) timer (as LLFastTimer::mLastTimerData)
// and restored when that child timer is destroyed.
//
// The following FrameState pointers are being used:
//
// FrameState::mParent
// DeclareTimer::mFrameState
// CurTimerData::mFrameState
// LLFastTimer::mFrameState
//
// All of those can be invalidated whenever something is added to the std::vector<FrameState>,
// and when that vector is sorted.
//
// New FrameState objects are added in NamedTimer(std::string const& name), which is
// called from createNamedTimer(), which in turn is called whenever a DeclareTimer is
// constructed. At the end of the DeclareTimer constructor,
// update_cached_pointers_if_changed() is called, which calls updateCachedPointers()
// if the std::vector has moved in memory since the last time it was called.
//
// Sorting is done in NamedTimer::resetFrame(), which in theory can be called from
// anywhere. There, too, updateCachedPointers() is called directly after sorting the vector.
//
// I fixed updateCachedPointers() to correct all of the above pointers and removed
// another FrameState pointer that was unnecessary.
#include "linden_common.h"
#include "llfasttimer.h"
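The invalidation problem described in the documentation comment above can be reproduced with a plain std::vector. This is a minimal sketch, not viewer code: `State`, `growth_moves_storage()` and `state_at()` are hypothetical names, with `State` standing in for FrameState. The sketch forces one reallocation by pushing past capacity() and shows that an index (the role NamedTimer::mFrameStateIndex plays) still finds the right element afterwards, whereas a raw pointer taken earlier would have to be recomputed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for LLFastTimer::FrameState.
struct State {
    std::string name;
};

// Appends elements until the vector is forced to reallocate and reports
// whether the buffer address changed. The old address is kept as an integer
// so that it is never dereferenced after the buffer is freed.
bool growth_moves_storage(std::vector<State>& v)
{
    std::uintptr_t const old_start = reinterpret_cast<std::uintptr_t>(v.data());
    std::size_t const old_capacity = v.capacity();
    // Filling up to capacity() does not reallocate.
    while (v.size() < old_capacity)
        v.push_back(State{"filler"});
    // One more element exceeds capacity(), which forces a reallocation.
    v.push_back(State{"overflow"});
    return reinterpret_cast<std::uintptr_t>(v.data()) != old_start;
}

// Re-derives a pointer from a stable index instead of caching it.
State* state_at(std::vector<State>& v, std::size_t index)
{
    return &v[index];
}
```

Dereferencing a pointer saved before the reallocation would be undefined behavior, which is the kind of crash the commit message refers to.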
@@ -64,14 +183,6 @@ BOOL LLFastTimer::sMetricLog = FALSE;
LLMutex* LLFastTimer::sLogLock = NULL;
std::queue<LLSD> LLFastTimer::sLogQueue;
#define USE_RDTSC 0
#if LL_LINUX || LL_SOLARIS
U64 LLFastTimer::sClockResolution = 1000000000; // Nanosecond resolution
#else
U64 LLFastTimer::sClockResolution = 1000000; // Microsecond resolution
#endif
std::vector<LLFastTimer::FrameState>* LLFastTimer::sTimerInfos = NULL;
U64 LLFastTimer::sTimerCycles = 0;
U32 LLFastTimer::sTimerCalls = 0;
@@ -134,10 +245,11 @@ public:
// so we have to work around that by using a specialized implementation
// for the special case where mTimerRoot != mActiveTimerRoot -- Aleric
mRootFrameState->mParent = &LLFastTimer::getFrameStateList()[0]; // &mTimerRoot->getFrameState()
mRootFrameState->mParent->mActiveCount = 1;
// And the following four lines are mActiveTimerRoot->setParent(mTimerRoot);
llassert(!mActiveTimerRoot->mParent);
mActiveTimerRoot->mParent = mTimerRoot; // mParent = parent;
mRootFrameState->mParent = mRootFrameState->mParent; // getFrameState().mParent = &parent->getFrameState();
//mRootFrameState->mParent = mRootFrameState->mParent; // getFrameState().mParent = &parent->getFrameState();
mTimerRoot->getChildren().push_back(mActiveTimerRoot); // parent->getChildren().push_back(this);
mTimerRoot->mNeedsSorting = true; // parent->mNeedsSorting = true;
@@ -195,7 +307,7 @@ private:
LLFastTimer::NamedTimer* mActiveTimerRoot;
LLFastTimer::NamedTimer* mTimerRoot;
LLFastTimer* mAppTimer;
LLFastTimer::FrameState* mRootFrameState;
LLFastTimer::FrameState* mRootFrameState; // Points to memory allocated with new, so this pointer is not invalidated.
};
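The comment on mRootFrameState explains why the "Frame" state is kept outside the vector. A sketch of that design, assuming the vector only ever grows: the root state gets its own heap allocation (std::unique_ptr here, instead of the raw `new` used in initSingleton()), slot 1 of the vector is left unused, and lookups special-case the root index the way getFrameState() has to test for mActiveTimerRoot. `StateTable` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for FrameState.
struct State {
    int calls = 0;
};

class StateTable {
public:
    // Mirrors the unused getFrameStateList()[1].
    static constexpr std::size_t kRootIndex = 1;

    // Slots 0 and 1 exist from the start; slot 1 is never used.
    StateTable() : root_(std::make_unique<State>()), list_(2) {}

    // Special case analogous to getFrameState() testing for mActiveTimerRoot.
    State& get(std::size_t index)
    {
        return index == kRootIndex ? *root_ : list_[index];
    }

    // Appends a new state; this may reallocate list_, but never moves *root_.
    std::size_t add()
    {
        list_.emplace_back();
        return list_.size() - 1;
    }

    State* root() { return root_.get(); }

private:
    std::unique_ptr<State> root_;  // separate allocation: address is stable
    std::vector<State> list_;      // may move whenever it grows
};
```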
void update_cached_pointers_if_changed()
@@ -204,9 +316,9 @@ void update_cached_pointers_if_changed()
static LLFastTimer::FrameState* sFirstTimerAddress = NULL;
if (&*(LLFastTimer::getFrameStateList().begin()) != sFirstTimerAddress)
{
LLFastTimer::DeclareTimer::updateCachedPointers();
LLFastTimer::updateCachedPointers();
sFirstTimerAddress = &*(LLFastTimer::getFrameStateList().begin());
}
sFirstTimerAddress = &*(LLFastTimer::getFrameStateList().begin());
}
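update_cached_pointers_if_changed() detects a moved vector by remembering the address of its first element. The sketch below combines that check with the create-or-return lookup by name that DeclareTimer performs. All names (`Registry`, `Handle`, `Entry`, `declare()`) are hypothetical. Handles are kept in a std::map because its nodes never move, so only the cached element pointers need refreshing, and the last buffer address is stored as an integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Entry plays the role of FrameState; Handle plays the role of DeclareTimer
// (a stable index plus a cached pointer derived from it).
struct Entry { std::string name; };
struct Handle { std::size_t index; Entry* cached; };

class Registry {
public:
    // Look the name up; create an entry only if it doesn't exist yet.
    Handle* declare(std::string const& name)
    {
        auto it = by_name_.find(name);
        if (it == by_name_.end()) {
            entries_.push_back(Entry{name});
            it = by_name_.emplace(name, Handle{entries_.size() - 1, &entries_.back()}).first;
        }
        refresh_if_moved();  // analogous to update_cached_pointers_if_changed()
        return &it->second;
    }

    int refresh_count() const { return refreshes_; }

private:
    // Rebuild every cached pointer, but only if the vector's buffer moved.
    void refresh_if_moved()
    {
        auto const start = reinterpret_cast<std::uintptr_t>(entries_.data());
        if (start == last_start_)
            return;
        for (auto& kv : by_name_)
            kv.second.cached = &entries_[kv.second.index];
        last_start_ = start;
        ++refreshes_;
    }

    std::vector<Entry> entries_;
    std::map<std::string, Handle> by_name_;  // map nodes never move
    std::uintptr_t last_start_ = 0;
    int refreshes_ = 0;
};
```

Because at most one push_back happens between two checks, each reallocation is seen; if several insertions could happen between checks, the old and new addresses could in principle coincide and a move would go unnoticed.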
LLFastTimer::DeclareTimer::DeclareTimer(const std::string& name, bool open )
@@ -225,53 +337,69 @@ LLFastTimer::DeclareTimer::DeclareTimer(const std::string& name)
}
// static
void LLFastTimer::DeclareTimer::updateCachedPointers()
void LLFastTimer::updateCachedPointers()
{
// propagate frame state pointers to timer declarations
for (instance_iter it = beginInstances(); it != endInstances(); ++it)
// Update DeclareTimer::mFrameState pointers.
for (DeclareTimer::instance_iter it = DeclareTimer::beginInstances(); it != DeclareTimer::endInstances(); ++it)
{
// update cached pointer
it->mFrameState = &it->mTimer.getFrameState();
}
// also update frame states of timers on stack
LLFastTimer* cur_timerp = LLFastTimer::sCurTimerData.mCurTimer;
while(cur_timerp->mLastTimerData.mCurTimer != cur_timerp)
// Update CurTimerData::mFrameState and LLFastTimer::mFrameState of timers on the stack.
FrameState& root_frame_state(NamedTimerFactory::instance().getRootFrameState()); // This one is not invalidated.
CurTimerData* cur_timer_data = &LLFastTimer::sCurTimerData;
// If the following condition fails then cur_timer_data->mCurTimer == mAppTimer and
// we can stop, since mAppTimer->mFrameState is allocated with new and is never invalidated.
while(cur_timer_data->mFrameState != &root_frame_state)
{
cur_timerp->mFrameState = &cur_timerp->mFrameState->mTimer->getFrameState();
cur_timerp = cur_timerp->mLastTimerData.mCurTimer;
cur_timer_data->mFrameState = cur_timer_data->mCurTimer->mFrameState = &cur_timer_data->mNamedTimer->getFrameState();
cur_timer_data = &cur_timer_data->mCurTimer->mLastTimerData;
}
// Update FrameState::mParent
info_list_t& frame_state_list(getFrameStateList());
FrameState* const vector_start = &*frame_state_list.begin();
int const vector_size = frame_state_list.size();
FrameState const* const old_vector_start = root_frame_state.mParent;
if (vector_start != old_vector_start)
{
// Vector was moved; if it was sorted then FrameState::mParent will get fixed after returning from this function (see LLFastTimer::NamedTimer::resetFrame).
root_frame_state.mParent = vector_start;
ptrdiff_t offset = vector_start - old_vector_start;
llassert(frame_state_list[vector_size - 1].mParent == vector_start); // The one that was added at the end is already OK.
for (int i = 2; i < vector_size - 1; ++i)
{
FrameState*& parent(frame_state_list[i].mParent);
if (parent != &root_frame_state)
{
parent += offset;
}
}
}
}
//static
#if (LL_DARWIN || LL_LINUX || LL_SOLARIS) && !(defined(__i386__) || defined(__amd64__))
U64 LLFastTimer::countsPerSecond() // counts per second for the *32-bit* timer
{
return sClockResolution >> 8;
}
#else // windows or x86-mac or x86-linux or x86-solaris
U64 LLFastTimer::countsPerSecond() // counts per second for the *32-bit* timer
{
#if USE_RDTSC || !LL_WINDOWS
//getCPUFrequency returns MHz and sCPUClockFrequency wants to be in Hz
static U64 sCPUClockFrequency = U64(LLProcessorInfo().getCPUFrequency()*1000000.0);
// we drop the low-order byte in our timers, so report a lower frequency
// See lltimer.cpp.
#if LL_LINUX || LL_DARWIN || LL_SOLARIS
std::string LLFastTimer::sClockType = "gettimeofday";
#elif LL_WINDOWS
std::string LLFastTimer::sClockType = "QueryPerformanceCounter";
#else
// If we're not using RDTSC, each fasttimer tick is just a performance counter tick.
// Not redefining the clock frequency itself (in llprocessor.cpp/calculate_cpu_frequency())
// since that would change displayed MHz stats for CPUs
#error "Platform not supported"
#endif
//static
U64 LLFastTimer::countsPerSecond() // counts per second for the *32-bit* timer
{
static bool firstcall = true;
static U64 sCPUClockFrequency;
if (firstcall)
{
QueryPerformanceFrequency((LARGE_INTEGER*)&sCPUClockFrequency);
sCPUClockFrequency = calc_clock_frequency();
firstcall = false;
}
#endif
return sCPUClockFrequency >> 8;
}
#endif
LLFastTimer::FrameState::FrameState(LLFastTimer::NamedTimer* timerp)
: mActiveCount(0),
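The new updateCachedPointers() walks the chain of running timers through mLastTimerData until it reaches the root frame state, which is never invalidated. The following simplified model uses hypothetical `ScopedTimer` and `CurData` types that roughly mirror LLFastTimer and CurTimerData: each nested timer saves the enclosing timer's data when it is constructed and restores it when it is destroyed, and the walk stops at the root state. It assumes, as the viewer does with mAppTimer, that a root timer is on the stack whenever the walk runs.

```cpp
#include <cassert>

struct ScopedTimer;

// Roughly CurTimerData: the running timer and the state it charges time to.
struct CurData {
    ScopedTimer* timer;  // like CurTimerData::mCurTimer
    int* state;          // like CurTimerData::mFrameState
};

struct ScopedTimer {
    static CurData current;  // like LLFastTimer::sCurTimerData
    CurData last;            // like mLastTimerData: the enclosing timer's data

    explicit ScopedTimer(int* state) : last(current)
    {
        current = CurData{this, state};
    }
    ~ScopedTimer() { current = last; }  // restore the enclosing timer

    ScopedTimer(ScopedTimer const&) = delete;
    ScopedTimer& operator=(ScopedTimer const&) = delete;
};

CurData ScopedTimer::current{nullptr, nullptr};

// Walks from the innermost running timer outwards and stops at the root
// state (compare the while loop in updateCachedPointers()). This is where
// each visited state pointer could be refreshed.
int count_running_above_root(int const* root_state)
{
    int n = 0;
    for (CurData const* d = &ScopedTimer::current; d->state != root_state; d = &d->timer->last)
        ++n;
    return n;
}
```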
@@ -409,11 +537,12 @@ void LLFastTimer::NamedTimer::buildHierarchy()
// bootstrap tree construction by attaching to last timer to be on stack
// when this timer was called
if (timer.getFrameState().mLastCaller && timer.mParent == NamedTimerFactory::instance().getRootTimer())
FrameState& frame_state(timer.getFrameState());
if (frame_state.mLastCaller && timer.mParent == NamedTimerFactory::instance().getRootTimer())
{
timer.setParent(timer.getFrameState().mLastCaller->mTimer);
timer.setParent(frame_state.mLastCaller);
// no need to push up tree on first use, flag can be set spuriously
timer.getFrameState().mMoveUpTree = false;
frame_state.mMoveUpTree = false;
}
}
}
@@ -572,15 +701,14 @@ void LLFastTimer::NamedTimer::resetFrame()
timerp->mFrameStateIndex = index;
index++;
llassert_always(timerp->mFrameStateIndex < (S32)getFrameStateList().size());
}
llassert(index == (S32)getFrameStateList().size());
// sort timers by DFS traversal order to improve cache coherency
std::sort(getFrameStateList().begin(), getFrameStateList().end(), SortTimersDFS());
// update pointers into framestatelist now that we've sorted it
DeclareTimer::updateCachedPointers();
updateCachedPointers();
// reset for next frame
{
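resetFrame() assigns each NamedTimer its index during a tree traversal and then sorts the FrameState vector with SortTimersDFS. The sketch below, with hypothetical `Node` and `Slot` types, assigns indices in DFS pre-order and sorts the slots so that `slots[i].owner->index == i`. After such a sort every cached pointer into the vector refers to a different element, which is why updateCachedPointers() has to run right afterwards. In pre-order a deeper node can precede a shallower one: in the test tree, X1 (depth 3) ends up before Y (depth 2).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Node plays the role of NamedTimer, Slot the role of FrameState.
struct Node {
    std::string name;
    std::vector<Node*> children;
    std::size_t index = 0;  // like NamedTimer::mFrameStateIndex
};

struct Slot {
    Node* owner;  // like FrameState::mTimer
};

// Assign indices in DFS pre-order: a parent before its children.
void assign_dfs(Node& n, std::size_t& next)
{
    n.index = next++;
    for (Node* c : n.children)
        assign_dfs(*c, next);
}

// Sort slots by their owner's index; afterwards any previously cached
// Slot* points at a different owner and must be refreshed.
void sort_slots(std::vector<Slot>& slots)
{
    std::sort(slots.begin(), slots.end(),
              [](Slot const& a, Slot const& b) { return a.owner->index < b.owner->index; });
}
```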
@@ -652,7 +780,11 @@ LLFastTimer::info_list_t& LLFastTimer::getFrameStateList()
{
if (!sTimerInfos)
{
sTimerInfos = new info_list_t();
sTimerInfos = new info_list_t;
#if 0
// Avoid the vector being moved in memory by reserving enough memory right away.
sTimerInfos->reserve(1024);
#endif
}
return *sTimerInfos;
}
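The disabled `reserve(1024)` block relies on a guarantee of std::vector: after reserve(n), insertions do not reallocate as long as size() stays at or below capacity(). A small check of that guarantee (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if pushing `count` elements after reserve(count) never
// changed the buffer address.
bool reserve_keeps_address(std::size_t count)
{
    std::vector<int> v;
    v.reserve(count);
    int const* const start = v.data();
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.data() != start)
            return false;  // would mean a reallocation happened
    }
    return true;
}
```

Reserving only postpones the problem: if more timers are declared than were reserved, the vector still moves, so the pointer-updating code remains necessary.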
@@ -784,22 +916,27 @@ const LLFastTimer::NamedTimer* LLFastTimer::getTimerByName(const std::string& na
LLFastTimer::LLFastTimer(LLFastTimer::FrameState* state)
: mFrameState(state)
{
// Only called for mAppTimer with mRootFrameState, which never invalidates.
llassert(state == &NamedTimerFactory::instance().getRootFrameState());
U32 start_time = getCPUClockCount32();
mStartTime = start_time;
mFrameState->mActiveCount++;
LLFastTimer::sCurTimerData.mCurTimer = this;
LLFastTimer::sCurTimerData.mNamedTimer = mFrameState->mTimer;
LLFastTimer::sCurTimerData.mFrameState = mFrameState;
LLFastTimer::sCurTimerData.mChildTime = 0;
// This is the root FastTimer (mAppTimer); mark it as such by having
// mLastTimerData be equal to sCurTimerData (which is a rather arbitrary
// and not very logical way to do that --Aleric).
mLastTimerData = LLFastTimer::sCurTimerData;
}
//////////////////////////////////////////////////////////////////////////////
//
// Important note: These implementations must be FAST!
//
//LL_COMMON_API U64 get_clock_count(); // in lltimer.cpp
// These use QueryPerformanceCounter, which is arguably fine and also works on AMD architectures.
U32 LLFastTimer::getCPUClockCount32()
@@ -812,9 +949,3 @@ U64 LLFastTimer::getCPUClockCount64()
return get_clock_count();
}
#if LL_WINDOWS
std::string LLFastTimer::sClockType = "QueryPerformanceCounter";
#else
std::string LLFastTimer::sClockType = "gettimeofday";
#endif
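The commit message mentions fixing the display of timer values in milliseconds; the display code itself is not part of the hunks shown. As a hedged illustration of the arithmetic only: countsPerSecond() reports the clock frequency shifted right by 8, because the 32-bit timer drops the low-order byte, so a tick count of that timer converts to milliseconds by dividing by the reduced rate. Both function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rate of the 32-bit timer: the raw clock frequency with the low-order byte
// dropped, as countsPerSecond() does with `>> 8`.
std::uint64_t counts_per_second_32(std::uint64_t raw_frequency_hz)
{
    return raw_frequency_hz >> 8;
}

// Converts a tick count of the 32-bit timer to milliseconds.
double ticks_to_ms(std::uint64_t ticks, std::uint64_t raw_frequency_hz)
{
    return static_cast<double>(ticks) * 1000.0 /
           static_cast<double>(counts_per_second_32(raw_frequency_hz));
}
```

For example, with a 1000000 Hz (microsecond resolution) clock the 32-bit timer advances 3906 times per second, so 3906 ticks correspond to one second.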