view doc/FR1-Rx-DTX-detail @ 553:ebcf414b7d99
doc/TFO-transform: describe details for FRv1, both modes
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Mon, 07 Oct 2024 08:24:24 +0000 |
parents | 6ab066180ec2 |
children | 62943a1ad64e |
Rx DTX handler implementation details
=====================================

As explained in the basic FR1-Rx-DTX article, an Rx DTX handler has to be
inserted between the output of the Rx radio subsystem and the input to the
basic GSM 06.10 speech decoder. In ThemWi codec library architecture, we
normally run a full decoder for GSM-FR that combines the Rx DTX handler and
the basic 06.10 decoder, and the Rx DTX handler block by itself also serves
as a TFO transform.

This Rx DTX handler is based on several GSM specs: 06.11 for the error
concealment function, 06.12 for the comfort noise insertion function, and
06.31 for overall Rx DTX handling. However, these specs give a lot of leeway
to implementors, hence it is prudent to document the specific choices made in
the present ThemWi implementation.

Error concealment implementation
================================

Error concealment is also called substitution and muting of lost frames. The
implementation of this function in Themyscira libgsmfr2 is based on the
Example solution presented in chapter 6 of 3GPP TS 46.011 (formerly
GSM 06.11), applying the most literal reading to this spec section.

When unusable frames (as defined in GSM 06.31) occur during speech state
(i.e., not following a SID), the present logic kicks in. For the first BFI
following good speech, the last speech frame is repeated verbatim. On the
second BFI the muting logic of Xmaxc reduction kicks in, decrementing each of
the 4 Xmaxc parameters by 4 with each emitted frame. RPE grid position
parameters are randomized at the same time. The frame in which all 4 Xmaxc
parameters equal 0 (either because they were already 0 or because they got
reduced to 0 by the muting sequence) is the last frame emitted in this state;
all subsequent BFIs will be turned into fixed-bit-pattern silence frames as
given in TS 46.011 Table 1.

If a BFI comes in when the Rx DTX handler is in its reset (or homed) state,
the output proceeds directly to silence frames.
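
The per-frame muting step described above can be sketched in C as follows.
This is only an illustration, not the actual libgsmfr2 code: the struct, the
function names and the toy PRNG are all hypothetical, and the text above does
not say how the grid positions are randomized, so the generator used here is
an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the per-subframe parameters touched by muting */
struct mute_params {
	uint8_t xmaxc[4];	/* block amplitude, 6 bits per subframe */
	uint8_t mc[4];		/* RPE grid position, 2 bits per subframe */
};

/* Placeholder PRNG (simple LCG); an assumption, any source of bits would do */
static uint32_t lcg_next(uint32_t *state)
{
	*state = *state * 1103515245u + 12345u;
	return *state >> 16;
}

/*
 * Apply one muting step to the remembered speech parameters:
 * decrement each Xmaxc by 4 (clamping at 0) and randomize each
 * RPE grid position.  Returns 1 if all 4 Xmaxc are now 0, meaning
 * this is the last frame to be emitted before switching to the
 * fixed silence frame, or 0 if muting should continue.
 */
static int mute_step(struct mute_params *p, uint32_t *prng)
{
	int i, all_zero = 1;

	for (i = 0; i < 4; i++) {
		assert(p->xmaxc[i] < 64);	/* must fit in 6 bits */
		p->xmaxc[i] = p->xmaxc[i] > 4 ? p->xmaxc[i] - 4 : 0;
		p->mc[i] = lcg_next(prng) & 3;
		if (p->xmaxc[i])
			all_zero = 0;
	}
	return all_zero;
}
```

In this sketch a caller would invoke mute_step() once per BFI, starting with
the second BFI after good speech, emit the resulting frame each time, and
switch to silence frames after the call that returns 1.
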

Comfort noise insertion
=======================

Comfort noise generation and updating is specified in GSM 06.12 section 6.1.
Most of this section is very straightforward, and is implemented in ThemWi
libgsmfr2 exactly as specified, except for the very last sentence in that
section:

"When updating the comfort noise, the parameters above should preferably be
interpolated over a few frames to obtain smooth transitions."

ThemWi implementation of Rx DTX handler in libgsmfr2 does not do this "should
preferably" part: no interpolation is done on CN parameters; as soon as each
SID update comes in, the new parameters are used immediately for all
generated CN frames.

Because the spec says "should preferably" rather than "shall", we can "get
away" with not implementing CN interpolation. But there is an even more
profound issue: we have yet to find anyone else's implementation, which we
could use as guidance, that does CN parameter interpolation for FRv1. (Such
interpolation is mandatory and defined in bit-exact terms for HRv1 and EFR,
but FRv1 is a different story.)

We had a hope that Nokia TCSM2 (a historical hw implementation of GSM TRAU
network element) might implement CN interpolation for FRv1 - but our
experimental findings on that platform are inconclusive:

* When acting as a TFO transform for FRv1, this TRAU does not interpolate CN
  parameters; it makes abrupt changes in CN output just like our
  implementation - but it effects a strange delay of 24 frames, suggesting
  that they have some code paths that assume CN interpolation would be
  applied.

* When the TRAU acts as a regular speech decoder (not TFO), it is not clear
  how it performs any of Rx DTX functions: Nokia chose to not implement the
  optional in-band homing feature for FRv1, thus we have no way to explore
  bit-exact behaviour of their speech decoder via test sequences.

Another enticing idea would be to statically reverse-engineer the DSP ROM of
TI Calypso chip and thus recover its complete speech Rx chain - but of course
the effort would be extremely massive, and is not likely to happen any time
soon.

Until we either get around to the far-future task of Calypso DSP static
reversing or find some other implementation of GSM-FR Rx DTX handler that
does CN interpolation and whose operation we can replicate, we shall stick to
the simple approach of not doing CN interpolation.

Handling of SID frames with Xmaxc discrepancy
=============================================

Per GSM 06.12 section 5.2, all 4 subframe Xmaxc parameters in a SID frame are
supposed to be equal, encoding the quantized form of mean(Xmax). However,
what should Rx DTX implementations do when they receive an otherwise-valid
SID frame in which these 4 parameters are not all equal? In our
implementation, we handle such discrepancy as follows:

* In those frame positions in which we receive a fresh SID (initial or
  update), the CN frame we emit is a direct transformation of the received
  SID, and all 4 Xmaxc parameters are passed through intact.

* When we emit CN frames based on remembered LARc and Xmaxc parameters, we
  use the last-subframe Xmaxc from the most recently received SID frame.

Lost SID handling and CN muting
===============================

In accord with GSM 06.11 section 5.3, when we receive an unusable frame in a
TAF position during CN insertion state, we set a flag that remembers this
condition, but don't switch to CN muting right away. Per section 5.4 of the
same spec, we initiate CN muting when a second lost SID event occurs
(unusable frame received in a TAF position) without intervening good speech
frames or accepted SID frames. When we do enter CN muting state, we decrement
CN Xmaxc (always the same for all 4 subframes) by 4 on each output frame,
following the Example solution of 3GPP TS 46.011 (formerly GSM 06.11)
chapter 6.
Once this CN Xmaxc reaches 0, we switch to emitting fixed-bit-pattern silence
frames of TS 46.011 Table 1.

Handling of invalid SID frames
==============================

In agreement with GSM 06.31 spec, we recognize invalid SID and invoke the
appropriate handler in all 3 combinations: BFI=0 SID=1, BFI=1 SID=1, and
BFI=1 SID=2. The real complexity, however, lies in what that invalid SID
handler actually does:

* If invalid SID arrives when we are already in CN insertion state, we treat
  it the same as an unusable frame (continue CN output with current
  parameters), but the flag of lost SID is reset, as required by our
  interpretation of the specs.

* If invalid SID arrives in CN muting state, i.e., after two consecutive lost
  SID events, the muting continues unaffected, i.e., we don't "rejuvenate"
  already-started-muting comfort noise upon receiving invalid SID.

* If invalid SID arrives in good speech state, meaning that we are supposed
  to begin a CN insertion period but we didn't get usable parameters for it,
  we obtain LARc and mean(Xmax) parameters from the last good speech frame,
  following the second option permitted by the "NOTE" at the end of GSM 06.31
  section 6.1.2. To get Xmaxc for CN, we dequantize all 4 Xmaxc parameters
  of the last good speech frame, average them, then requantize.

* If invalid SID arrives in speech muting state, the invalid SID is ignored
  and speech muting continues unaffected.

* If invalid SID arrives in NO_DATA state (initial state out of reset, or the
  state after either speech or CN muting has fully decayed), we emit the
  fixed silence frame of TS 46.011 Table 1.
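
The invalid SID cases listed above can be summarized as a small dispatch
sketch in C. This is a reading aid rather than the actual libgsmfr2 code: all
type, enum and function names are hypothetical, the classification of the
three non-invalid BFI/SID combinations is taken from the GSM 06.31 frame
classification, and the transition into CN insertion state on invalid SID
after good speech is an assumption inferred from the text.

```c
#include <assert.h>

/* Hypothetical frame classes derived from the BFI and SID flags */
enum frame_class {
	FRAME_GOOD_SPEECH, FRAME_VALID_SID, FRAME_INVALID_SID, FRAME_UNUSABLE
};

/* bfi is 0 or 1; sid is the 3-valued SID flag (0, 1 or 2) */
static enum frame_class classify_frame(int bfi, int sid)
{
	assert(sid >= 0 && sid <= 2);
	if (sid == 0)
		return bfi ? FRAME_UNUSABLE : FRAME_GOOD_SPEECH;
	if (sid == 2 && !bfi)
		return FRAME_VALID_SID;
	/* BFI=0 SID=1, BFI=1 SID=1, BFI=1 SID=2 */
	return FRAME_INVALID_SID;
}

/* Hypothetical handler states, named after the states in the text */
enum rx_state {
	ST_NO_DATA, ST_SPEECH, ST_SPEECH_MUTING, ST_CN_INSERT, ST_CN_MUTING
};

/* What the caller should emit for this frame position */
enum invsid_action {
	ACT_CN_CONTINUE,		/* CN with current parameters */
	ACT_CN_MUTE_CONTINUE,		/* CN muting goes on unaffected */
	ACT_CN_FROM_LAST_SPEECH,	/* CN derived from last good speech */
	ACT_SPEECH_MUTE_CONTINUE,	/* speech muting goes on unaffected */
	ACT_SILENCE			/* fixed silence frame */
};

struct rx_dtx {
	enum rx_state state;
	int lost_sid;		/* set by an unusable frame in TAF position */
};

static enum invsid_action handle_invalid_sid(struct rx_dtx *st)
{
	switch (st->state) {
	case ST_CN_INSERT:
		st->lost_sid = 0;	/* invalid SID resets the flag */
		return ACT_CN_CONTINUE;
	case ST_CN_MUTING:
		return ACT_CN_MUTE_CONTINUE;
	case ST_SPEECH:
		/* assumption: a CN insertion period begins here */
		st->state = ST_CN_INSERT;
		return ACT_CN_FROM_LAST_SPEECH;
	case ST_SPEECH_MUTING:
		return ACT_SPEECH_MUTE_CONTINUE;
	case ST_NO_DATA:
	default:
		return ACT_SILENCE;
	}
}
```

The sketch deliberately leaves out the Xmaxc averaging step for the good
speech case, since the exact dequantization and requantization used there is
not spelled out in this article.
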