FreeCalypso > hg > gsm-codec-lib
changeset 552:6ab066180ec2
doc: new article FR1-Rx-DTX-detail
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Mon, 07 Oct 2024 00:25:50 +0000 |
parents | 8f44d7064c56 |
children | ebcf414b7d99 |
files | doc/FR1-Rx-DTX-detail |
diffstat | 1 files changed, 145 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/FR1-Rx-DTX-detail Mon Oct 07 00:25:50 2024 +0000
@@ -0,0 +1,145 @@
+Rx DTX handler implementation details
+=====================================
+
+As explained in the basic FR1-Rx-DTX article, an Rx DTX handler has to be
+inserted between the output of the Rx radio subsystem and the input to the
+basic GSM 06.10 speech decoder. In ThemWi codec library architecture, we
+normally run a full decoder for GSM-FR that combines the Rx DTX handler and
+the basic 06.10 decoder, and the Rx DTX handler block by itself also serves
+as a TFO transform.
+
+This Rx DTX handler is based on several GSM specs: 06.11 for the error
+concealment function, 06.12 for the comfort noise insertion function, and 06.31
+for overall Rx DTX handling. However, these specs give a lot of leeway to
+implementors, hence it is prudent to document the specific choices made in the
+present ThemWi implementation.
+
+Error concealment implementation
+================================
+
+Error concealment is also called substitution and muting of lost frames. The
+implementation of this function in Themyscira libgsmfr2 is based on the Example
+solution presented in chapter 6 of 3GPP TS 46.011 (formerly GSM 06.11), applying
+the most literal reading to this spec section.
+
+When unusable frames (as defined in GSM 06.31) occur during speech state (i.e.,
+not following a SID), the present logic kicks in. For the first BFI following
+good speech, the last speech frame is repeated verbatim. On the second BFI the
+muting logic of Xmaxc reduction kicks in, decrementing each of the 4 Xmaxc
+parameters by 4 with each emitted frame. RPE grid position parameters are
+randomized at the same time. The frame in which all 4 Xmaxc parameters equal 0
+(either because they were already 0 or because they got reduced to 0 by the
+muting sequence) is the last frame emitted in this state; all subsequent BFIs
+will be turned into fixed-bit-pattern silence frames as given in TS 46.011
+Table 1.
+
+If a BFI comes in when the Rx DTX handler is in its reset (or homed) state, the
+output proceeds directly to silence frames.
+
+Comfort noise insertion
+=======================
+
+Comfort noise generation and updating is specified in GSM 06.12 section 6.1.
+Most of this section is very straightforward, and is implemented in ThemWi
+libgsmfr2 exactly as specified, except for the very last sentence in that
+section:
+
+"When updating the comfort noise, the parameters above should preferably be
+ interpolated over a few frames to obtain smooth transitions."
+
+ThemWi implementation of Rx DTX handler in libgsmfr2 does not do this "should
+preferably" part: no interpolation is done on CN parameters; as soon as each
+SID update comes in, the new parameters are used immediately for all generated
+CN frames.
+
+Because the spec says "should preferably" rather than "shall", we can "get away"
+with not implementing CN interpolation. But there is an even more profound
+issue: we have yet to find anyone else's implementation, which we could use as
+guidance, that does CN parameter interpolation for FRv1. (Such interpolation
+is mandatory and defined in bit-exact terms for HRv1 and EFR, but FRv1 is a
+different story.)
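
The no-interpolation behaviour described above can be illustrated with a minimal Python sketch. This is not the libgsmfr2 code: the names `CNParams`, `ComfortNoiseState`, `accept_sid` and `params_for_next_frame` are hypothetical, and the sketch assumes that the remembered comfort noise state is just the 8 LARc values plus one Xmaxc code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CNParams:
    """Hypothetical container for remembered comfort noise parameters."""
    larc: List[int]  # the 8 LARc values taken from the SID frame
    xmaxc: int       # one Xmaxc code applied to all 4 subframes


class ComfortNoiseState:
    """Illustrative CN parameter store that performs no interpolation."""

    def __init__(self) -> None:
        self.params: Optional[CNParams] = None

    def accept_sid(self, larc: List[int], xmaxc: int) -> None:
        # No smoothing: the new SID parameters replace the old ones outright.
        self.params = CNParams(list(larc), xmaxc)

    def params_for_next_frame(self) -> CNParams:
        # Every generated CN frame uses whatever the latest SID supplied.
        if self.params is None:
            raise RuntimeError("no SID has been accepted yet")
        return self.params
```

An interpolating design would instead step the emitted parameters from the old values toward the new ones over a few frames, which is the part the spec words as "should preferably".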
+
+We had a hope that Nokia TCSM2 (a historical hw implementation of GSM TRAU
+network element) might implement CN interpolation for FRv1 - but our
+experimental findings on that platform are inconclusive:
+
+* When acting as a TFO transform for FRv1, this TRAU does not interpolate CN
+  parameters, it makes abrupt changes in CN output just like our implementation
+  - but it effects a strange delay of 24 frames, suggesting that they have some
+  code paths that assume CN interpolation would be applied.
+
+* When the TRAU acts as a regular speech decoder (not TFO), it is not clear how
+  it performs any of Rx DTX functions: Nokia chose to not implement the optional
+  in-band homing feature for FRv1, thus we have no way to explore bit-exact
+  behaviour of their speech decoder via test sequences.
+
+Another enticing idea would be to statically reverse-engineer the DSP ROM of TI
+Calypso chip and thus recover its complete speech Rx chain - but of course the
+effort would be extremely massive, and is not likely to happen any time soon.
+
+Until we either get around to the far-future task of Calypso DSP static
+reversing or find some other implementation of GSM-FR Rx DTX handler that does
+CN interpolation and whose operation we can replicate, we shall stick to the
+simple approach of not doing CN interpolation.
+
+Handling of SID frames with Xmaxc discrepancy
+=============================================
+
+Per GSM 06.12 section 5.2, all 4 subframe Xmaxc parameters in a SID frame are
+supposed to be equal, encoding the quantized form of mean(Xmax). However, what
+should Rx DTX implementations do when they receive an otherwise-valid SID frame
+in which these 4 parameters are not all equal? In our implementation, we handle
+such discrepancy as follows:
+
+* In those frame positions in which we receive a fresh SID (initial or update),
+  the CN frame we emit is a direct transformation of the received SID, and all
+  4 Xmaxc parameters are passed through intact.
+
+* When we emit CN frames based on remembered LARc and Xmaxc parameters, we use
+  the last-subframe Xmaxc from the most recently received SID frame.
+
+Lost SID handling and CN muting
+===============================
+
+In accord with GSM 06.11 section 5.3, when we receive an unusable frame in a
+TAF position during CN insertion state, we set a flag that remembers this
+condition, but don't switch to CN muting right away. Per section 5.4 of the
+same spec, we initiate CN muting when a second lost SID event occurs (unusable
+frame received in a TAF position) without intervening good speech frames or
+accepted SID frames.
+
+When we do enter CN muting state, we decrement CN Xmaxc (always the same for
+all 4 subframes) by 4 on each output frame, following the Example solution of
+3GPP TS 46.011 (formerly GSM 06.11) chapter 6. Once this CN Xmaxc reaches 0,
+we switch to emitting fixed-bit-pattern silence frames of TS 46.011 Table 1.
+
+Handling of invalid SID frames
+==============================
+
+In agreement with GSM 06.31 spec, we recognize invalid SID and invoke the
+appropriate handler in all 3 combinations: BFI=0 SID=1, BFI=1 SID=1, and
+BFI=1 SID=2. The real complexity, however, lies in what that invalid SID
+handler actually does:
+
+* If invalid SID arrives when we are already in CN insertion state, we treat it
+  the same as an unusable frame (continue CN output with current parameters),
+  but the flag of lost SID is reset, as required by our interpretation of the
+  specs.
+
+* If invalid SID arrives in CN muting state, i.e., after two consecutive lost
+  SID events, the muting continues unaffected, i.e., we don't "rejuvenate"
+  already-started-muting comfort noise upon receiving invalid SID.
+
+* If invalid SID arrives in good speech state, meaning that we are supposed to
+  begin a CN insertion period but we didn't get usable parameters for it, we
+  obtain LARc and mean(Xmax) parameters from the last good speech frame,
+  following the second option permitted by the "NOTE" at the end of GSM 06.31
+  section 6.1.2. To get Xmaxc for CN, we dequantize all 4 Xmaxc parameters of
+  the last good speech frame, average them, then requantize.
+
+* If invalid SID arrives in speech muting state, the invalid SID is ignored and
+  speech muting continues unaffected.
+
+* If invalid SID arrives in NO_DATA state (initial state out of reset, or the
+  state after either speech or CN muting has fully decayed), we emit the fixed
+  silence frame of TS 46.011 Table 1.
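
The per-state handling of invalid SID listed above can be summarized as a small dispatch table. The Python sketch below is only an illustration of that list, not the libgsmfr2 implementation: the `RxState` and `Action` names are hypothetical, `dequantize` and `requantize` are caller-supplied stand-ins for the GSM 06.10 Xmaxc dequantization and quantization steps, and the use of floor division for the average is an assumption, since the text does not say how the mean is rounded.

```python
from enum import Enum, auto
from typing import Callable, Sequence


class RxState(Enum):
    """Hypothetical names for the Rx DTX handler states mentioned above."""
    SPEECH = auto()
    SPEECH_MUTING = auto()
    CN_INSERTION = auto()
    CN_MUTING = auto()
    NO_DATA = auto()


class Action(Enum):
    """Hypothetical labels for what the invalid SID handler does."""
    CONTINUE_CN_AND_CLEAR_LOST_SID_FLAG = auto()
    CONTINUE_CN_MUTING = auto()
    START_CN_FROM_LAST_SPEECH = auto()
    CONTINUE_SPEECH_MUTING = auto()
    EMIT_SILENCE_FRAME = auto()


# (BFI, SID) combinations that the text classifies as invalid SID.
INVALID_SID_COMBOS = {(0, 1), (1, 1), (1, 2)}

# One entry per bullet in the list above.
INVALID_SID_ACTIONS = {
    RxState.CN_INSERTION: Action.CONTINUE_CN_AND_CLEAR_LOST_SID_FLAG,
    RxState.CN_MUTING: Action.CONTINUE_CN_MUTING,
    RxState.SPEECH: Action.START_CN_FROM_LAST_SPEECH,
    RxState.SPEECH_MUTING: Action.CONTINUE_SPEECH_MUTING,
    RxState.NO_DATA: Action.EMIT_SILENCE_FRAME,
}


def is_invalid_sid(bfi: int, sid: int) -> bool:
    """Return True for the three flag combinations treated as invalid SID."""
    return (bfi, sid) in INVALID_SID_COMBOS


def invalid_sid_action(state: RxState) -> Action:
    """Look up the handler action for the current state."""
    return INVALID_SID_ACTIONS[state]


def cn_xmaxc_from_speech(
    xmaxc: Sequence[int],
    dequantize: Callable[[int], int],
    requantize: Callable[[int], int],
) -> int:
    """Derive one CN Xmaxc from the 4 Xmaxc codes of the last speech frame."""
    if len(xmaxc) != 4:
        raise ValueError("expected the 4 subframe Xmaxc codes")
    # Assumption: plain floor division; the rounding rule is not given above.
    mean_xmax = sum(dequantize(code) for code in xmaxc) // 4
    return requantize(mean_xmax)
```

Keeping the state-to-action mapping in one table makes it easy to check the sketch against the list item by item; a real implementation would carry out each action on the frame being emitted.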