view doc/Nucleus-change @ 220:0ed36de51973
ABB semaphore protection overhaul
The ABB semaphore protection logic that came with TCS211 from TI
was broken in several ways:
* Some semaphore-protected functions were called from Application_Initialize()
context. NU_Obtain_Semaphore() called with NU_SUSPEND fails with
NU_INVALID_SUSPEND in this context, but the return value wasn't checked,
and NU_Release_Semaphore() would be called unconditionally at the end.
The latter call would increment the semaphore count past 1, making the
semaphore no longer binary and thus no longer effective for resource
protection. The fix is to check the return value from NU_Obtain_Semaphore()
and skip the NU_Release_Semaphore() call if the semaphore wasn't properly
obtained.
* Some SPI hardware manipulation was being done before entering the semaphore-
protected critical section. The fix is to reorder the code: first obtain
the semaphore, then do everything else.
* In the corner case of L1/DSP recovery, l1_abb_power_on() would call some
non-semaphore-protected ABB & SPI init functions. The fix is to skip those
calls in the case of recovery.
* A few additional corner cases existed, all of which are fixed by making
ABB semaphore protection 100% consistent for all ABB functions and code paths.
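The first fix above (check the obtain status, release only on success) can be sketched as follows. This is a minimal single-threaded illustration, not the actual Tourmaline code: mock_semaphore, mock_obtain(), mock_release(), the MOCK_* status values and the abb_op_*() functions are all hypothetical stand-ins for the Nucleus semaphore API and the ABB driver functions, and the status values are placeholders rather than the real Nucleus constants.

```c
#include <assert.h>

/* Hypothetical stand-ins for the Nucleus API, for illustration only. */
typedef int STATUS;
#define MOCK_SUCCESS          0
#define MOCK_INVALID_SUSPEND  (-1)  /* placeholder, not the real Nucleus value */

typedef struct {
    int count;                      /* binary semaphore: 1 = free, 0 = taken */
} mock_semaphore;

/* Nonzero while we pretend to be in Application_Initialize() context. */
static int mock_in_init_context;

/* Mimics an obtain call made with suspension requested. */
static STATUS mock_obtain(mock_semaphore *s)
{
    if (mock_in_init_context)
        return MOCK_INVALID_SUSPEND;    /* suspension not allowed here */
    assert(s->count > 0);   /* a real RTOS would suspend; this sketch cannot */
    s->count--;
    return MOCK_SUCCESS;
}

static void mock_release(mock_semaphore *s)
{
    s->count++;
}

/* Broken pattern: status ignored, release is unconditional. */
static void abb_op_unchecked(mock_semaphore *s)
{
    (void) mock_obtain(s);
    /* ... ABB/SPI access would happen here ... */
    mock_release(s);        /* pushes count to 2 if the obtain failed */
}

/* Fixed pattern: obtain first, release only if we really own it. */
static void abb_op_checked(mock_semaphore *s)
{
    STATUS st = mock_obtain(s);
    /* ... ABB/SPI access would happen here, after the obtain ... */
    if (st == MOCK_SUCCESS)
        mock_release(s);
}
```

With mock_in_init_context set and a semaphore whose count starts at 1, abb_op_unchecked() leaves the count at 2, while abb_op_checked() leaves it at 1, so the semaphore stays binary.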
There is still one remaining problem of priority inversion: suppose a low-
priority task calls an ABB function, and some medium-priority task just happens
to preempt right in the middle of that semaphore-protected ABB operation. Then
the high-priority SPI task is locked out for a non-deterministic time until
that medium-priority task finishes its work and goes back to sleep. This
priority inversion problem remains outstanding for now.
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Mon, 26 Apr 2021 20:55:25 +0000 |
parents | a1799f6d6aa7 |
children |
The specific integration of ATI Nucleus PLUS RTOS in TI's stable TCS211 fw (which served as the baseline for several vendors' production fw) exhibits one hair-raising bug. While we don't know for sure where and how they maintained Nucleus library sources for compilation (the version we got has them censored out), we do see that Nucleus header files (nucleus.h and ??_defs.h) exist in two different locations in the source tree in two different versions:

* One version exists under chipsetsw/os/nucleus
* The other version exists under gpf/inc/nuc & gpf/inc/nuc/arm7

The two versions of these header files under these two paths in TCS211 are not the same! The main nucleus.h header file is the same in both places, cs_defs.h and tm_defs.h versions differ only in comments, but tc_defs.h is the real kicker: the version under gpf/inc/nuc has an extra field added to the TC_HCB aka NU_HISR structure, making this structure one word longer than in the other version! More specifically, in ATI's original Nucleus this structure is 22 words long with 4 unused dummy words at the end; TI's GPF version adds a fifth dummy word (thankfully toward the end, not shifting any actually-used members of the struct), putting the total struct size at 23 words.

It would be one thing if TI had made this change consistently, but they didn't: some modules were compiled with one version of the headers and got the 22-word version of the struct, while other modules were compiled with the other header file version and got the 23-word version of the struct. How can their fw work with this bug in it?
Answer: TCS211 fw works despite this Nucleus integration bug because:

* None of the actually-used members of the struct change offsets between the two versions.
* Some places in the code have 22-word structs allocated in memory while other places have 23-word structs, but when they pass pointers to these structs to Nucleus API functions, those functions don't access past the actually-used part at the beginning (the part before dummy words), and they never do anything like zeroing out the full size of the expected struct.
* The only place in TCS211 fw where the total size of the struct matters is where NU_HISR is embedded in another structure, and there is one such place in GPF. Here breakage would result if different modules using these structs and arrays were compiled with different header file versions, but all modules that touch this part are compiled with the GPF version of nucleus.h, NU_DEBUG and tc_defs.h.

Needless to say, resolving this bogosity has been an important part of FreeCalypso firmware deblobbing. Naturally the ideal solution would have been to remove the bogus extra word added by TI and consistently use the original 22-word struct everywhere, but there is one further complication: I (Mother Mychaela) don't feel comfortable with moving away from the original blob version of the OSL component of GPF, and these COFF objects have been compiled with the 23-word version of TC_HCB aka NU_HISR.

The following alternative approach has been implemented in FC Tourmaline:

* The new source version of Nucleus by Comrade XVilka has been checked in under src/nucleus, and this new source version is the one we are using instead of TI's binary object version.
* The new Nucleus header files src/nucleus/nucleus.h and src/nucleus/??_defs.h are the only ones used in Tourmaline - both old versions have been removed from active -I include paths.
* The new src/nucleus/tc_defs.h header file has been patched to replicate TI's 23-word version of TC_HCB aka NU_HISR, and the NU_HISR_SIZE definition in src/nucleus/nucleus.h has also been adjusted to match.

Thus we are using the 23-word version of TC_HCB aka NU_HISR everywhere, with 5 dummy words at the end rather than 4, adding 4 extra bytes of wasted RAM space to every instance of this struct throughout the firmware - but there are only a small number of these instances, thus the waste is negligible. In return we gain 100% consistency (the same version of the struct is used everywhere in our fw), and we retain the ability to keep the original OSL blobs which I am not ready to give up.
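The layout argument above can be illustrated with a small sketch. It is a simplified model, not the real tc_defs.h: the actual TC_HCB has named members of various types, whereas here the actually-used part is collapsed into an array of 18 words (22 minus 4 dummy words, per the figures above), and the names hisr22, hisr23, wrapper22, wrapper23 and the helper functions are hypothetical. A word is assumed to be 32 bits, as on the ARM7 target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t word;      /* assumption: one word = 32 bits, as on ARM7 */

#define USED_WORDS 18       /* 22 total words minus 4 dummy words */

/* Simplified model of ATI's original layout: 22 words. */
struct hisr22 {
    word used[USED_WORDS];  /* stands in for the actually-used members */
    word dummy[4];
};

/* Simplified model of TI's GPF layout: one more dummy word, 23 words. */
struct hisr23 {
    word used[USED_WORDS];
    word dummy[5];
};

/* Embedding in another struct is where the total size starts to matter. */
struct wrapper22 { struct hisr22 hisr; word after; };
struct wrapper23 { struct hisr23 hisr; word after; };

/* Total struct sizes in words. */
static size_t hisr22_words(void) { return sizeof(struct hisr22) / sizeof(word); }
static size_t hisr23_words(void) { return sizeof(struct hisr23) / sizeof(word); }

/* Returns 1 if the used part sits at the same offset in both layouts. */
static int used_part_matches(void)
{
    return offsetof(struct hisr22, used) == offsetof(struct hisr23, used)
        && offsetof(struct hisr22, dummy) == offsetof(struct hisr23, dummy);
}

/* Byte distance between where the two wrappers place the member
   that follows the embedded HISR struct. */
static size_t wrapper_skew_bytes(void)
{
    return offsetof(struct wrapper23, after) - offsetof(struct wrapper22, after);
}
```

In this model a pointer to either struct is safe to hand to code that only touches the used part, but the member following an embedded struct moves by 4 bytes between the two layouts, which is why every module touching such a wrapper has to agree on one header version.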