doc/TFO-xform/Theory @ 33:e828468b0afd
doc/TFO-xform/Theory: article written
Author: Mychaela Falconia <falcon@freecalypso.org>
Date: Sat, 31 Aug 2024 20:45:25 +0000
TFO transform from uplink to downlink
=====================================

With all 3 classic GSM codecs (FRv1, HRv1, EFR) the original architecture calls for a network-side transcoder (TRAU) on each individual call leg. The implications are:

* The uplink runs from the MS to the speech decoder in the TRAU that turns the mobile-generated speech into 64 kbit/s G.711. The Rx DTX handler, a subblock of that speech decoder in the TRAU, handles error concealment (substitution and muting of lost frames) and comfort noise insertion during DTXu pauses, and once this speech stream has been transcoded to G.711, all trace of these GSM-specific effects disappears.

* The downlink runs from the speech encoder in the TRAU to TCH DL radio output from the BTS. Because the DL frame stream comes from a free-running speech encoder, it never contains errored frames or invalid SID or any other aberrations: without DTXd, this frame stream is 100% good speech frames, and with DTXd, it is a mixture of good speech and valid SID frames.

But suppose you have two mobile call legs (mobile user Alice calls mobile user Bob), and you wish to eliminate the quality-degrading effect of double or tandem transcoding by passing compressed speech frames directly from Alice to Bob and vice-versa - what happens now? The UL frame stream from each call leg will contain BFI frame gaps that are never allowed in DL, and if the network deploys DTX only in the UL direction (DTXu without DTXd, a very sensible choice for small-capacity single-carrier cells), the representation of DTXu pauses coming from each call leg (SID frames followed by prolonged BFI gaps) is also not suitable for direct passing to the DL of the opposite call leg.

The solution offered in the TFO spec (GSM 08.62) is a special transform from call leg A UL to call leg B DL. This transform has no official name that I could find, but I call it "TFO transform".

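The mismatch between what UL can deliver and what DL may carry can be pictured with a small Python sketch. It is only an illustration of the rules stated above, not code from any spec or codec library; the names FrameKind, allowed_on_downlink and needs_transform are made up for this example.

```python
from enum import Enum, auto


class FrameKind(Enum):
    """Frame categories that can show up in an UL frame stream."""
    GOOD_SPEECH = auto()
    VALID_SID = auto()
    INVALID_SID = auto()
    BFI = auto()  # errored or lost frame


def allowed_on_downlink(kind, dtxd_enabled):
    """Return True if a frame of this kind may appear in a DL frame stream.

    Without DTXd the DL stream is 100% good speech frames; with DTXd it is
    a mixture of good speech and valid SID frames. BFI and invalid SID are
    never allowed in DL.
    """
    if kind is FrameKind.GOOD_SPEECH:
        return True
    if kind is FrameKind.VALID_SID:
        return dtxd_enabled
    return False


def needs_transform(ul_stream, dtxd_enabled):
    """Return True if any UL frame cannot be passed to the opposite DL as-is."""
    return any(not allowed_on_downlink(k, dtxd_enabled) for k in ul_stream)
```

For example, a UL stream of good speech followed by a valid SID can be forwarded unchanged only if the destination leg has DTXd, while any stream containing a BFI always needs the transform.
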
In the original GSM 08.62 spec (up to R99) this TFO transform is described in sections 8.2.1 and 8.2.2; when the spec changed to 28.062 with 3GPP Release 4 (adding AMR in GSM and AMR-only UMTS), the description of TFO transform for classic GSM codecs moved to section C.3.2.1.1. However, both spec versions only say what "shall" be done without any guidance on how to do it algorithmically: the spec language is "subject to manufacturer dependent future improvements and is not part of this recommendation."

Distilling the problem to its essence, the addition of TFO introduces a new type of logical transform on codec frames (and a stateful one at that!) that never appeared previously anywhere in classic GSM architecture, is not mentioned in any other spec, and is not addressed at all by any of the reference codec sources. This new transform is implemented only in the TFO block in TRAUs and nowhere else (in classic GSM architecture), and can be exercised only by establishing a TFO call between two interworking TRAUs.

There are 3 main parts to this TFO transform, 3 main areas where anyone who seeks to implement this transform has to think hard and come up with an innovative solution:

1) Error concealment in non-DTX speech: if an errored frame (BFI) appears after non-SID speech frames (meaning non-DTX speech), the transform has to fill in substitution/muting "speech" frames (meaning codec frames that look like valid speech frames) in the stream going to call leg B DL.

2) Comfort noise insertion: if the incoming frame stream from call leg A UL contains SID frames (DTXu) but the same are not allowed on call leg B DL (no DTXd), the transform has to insert "speech" frames (in the same parenthetical meaning) that represent comfort noise, as intended by Alice's phone that transmitted SID with certain CN parameters.

3) Comfort noise muting: handling the case where the incoming UL frame stream goes into CN insertion state (via one or more SID frames), but then goes total BFI, with no more SID update frames appearing in TAF positions. In the case of a single codec leg from a source encoder to an end decoder, standard decoders are required by their respective DTX specs to gradually mute their CN output, to indicate channel breakdown to the user - the TFO transform has to produce the same effect.

All 3 of the just-listed functions are explicitly called out in the TFO spec, in each case with the same language of "shall" followed by "subject to manufacturer dependent future improvements and is not part of this recommendation."

DTXd or no DTXd
===============

When the destination call leg operates without DTXd, the TFO transform can only emit frames that are well-formed speech frames for the respective codec, no SID frames. In this case the transform has to do "everything", all 3 of the listed functions, although the last function of CN muting may be either separate or absorbed into CN generation function depending on the codec.

OTOH, when call leg B has DTXd enabled/allowed, there is more room for additional complexity. The simplest solution would be to not make use of DTXd capability and always emit speech frames - but the problem with this simple approach is teleological. If a GSM network operator runs with DTXd enabled, presumably that operator seeks to reap the benefits of DTXd as in reduction of radio interference, in which case a TFO transform that fails to make use of DTXd capability would defeat the purpose.

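For the no-DTXd case, where every output must be a well-formed speech frame, the 3 functions listed earlier can be pictured as a tiny state machine. The Python sketch below only decides which kind of output frame is called for; it does not generate any codec bits. Everything in it is an assumption made for illustration: the class and enum names are invented, invalid SID input is left out for brevity, and the max_missed_updates threshold is a placeholder, not a value from any DTX spec.

```python
from enum import Enum, auto


class In(Enum):
    SPEECH = auto()  # good speech frame
    SID = auto()     # valid SID frame
    BFI = auto()     # lost or errored frame


class Out(Enum):
    PASS_THROUGH = auto()   # forward the good speech frame unchanged
    SUBSTITUTE = auto()     # function 1: substitution/muting "speech" frame
    COMFORT_NOISE = auto()  # function 2: "speech" frame representing CN
    MUTED_CN = auto()       # function 3: CN output being muted


class TfoTransformSketch:
    """Decision logic only, for a destination leg without DTXd."""

    def __init__(self, max_missed_updates=1):
        # Hypothetical threshold: how many TAF positions may pass without
        # a SID update before CN muting begins. Not taken from any spec.
        self.max_missed_updates = max_missed_updates
        self.in_cn = False  # True while in CN insertion state
        self.missed = 0     # TAF positions seen without a SID update

    def step(self, kind, taf=False):
        """Consume one UL frame and return the kind of DL frame to emit."""
        if kind is In.SPEECH:
            self.in_cn = False
            self.missed = 0
            return Out.PASS_THROUGH
        if kind is In.SID:
            self.in_cn = True
            self.missed = 0
            return Out.COMFORT_NOISE
        # kind is In.BFI from here on
        if not self.in_cn:
            return Out.SUBSTITUTE
        if taf:
            self.missed += 1
        if self.missed > self.max_missed_updates:
            return Out.MUTED_CN
        return Out.COMFORT_NOISE
```

The point of the sketch is that the transform is stateful: the same BFI input leads to a substitution frame, a comfort noise frame or a muted frame depending on what came before it.
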
Hence if someone sets out to implement a TFO transform that supports full utilization of DTXd, they would have to do additional work:

* The function of CN insertion in the transform _mostly_ goes away: if a valid SID frame comes, the TRAU caches it and repeats it continuously until the next SID update, allowing the BTS to select which SID frames it will actually transmit based on its SACCH alignment. But more complex handling is still needed if the first SID frame (the one that begins CN insertion period) came in as invalid SID, and the function of CN muting takes on new significance.

* CN muting: when the cached SID expires and no new SID updates arrive in TAF positions, the TFO transform has to indicate somehow to Bob that Alice's call leg is having trouble, which will be easy or difficult depending on what rules are specified in the codec specs for SID interpolation in the final receiver.

* Error concealment in non-DTX speech: at first glance this function appears to be exactly the same whether DTXd is used or not. But consider the case of total channel breakdown, such that the incoming frame stream becomes all BFI: how should this case be handled? In the absence of DTXd, the output of the TFO transform becomes a stream of silence frames, meaning some kind of "speech" frames that produce total silence at the end decoder. But if the network operates with DTXd with the aim of reducing radio interference, these silence "speech" frames should be replaced with SIDs whose parameters are chosen to produce silent output.

Current approach in Themyscira libraries
========================================

There is a desire to implement TFO transform for all 3 classic GSM codecs in Themyscira Wireless GSM codec libraries suite, and the first question to be decided is the policy with regard to DTXd. The current approach is to not implement any DTXd support, i.e., implement the TFO transform only in its no-DTXd basic form.

This decision is based on the reality of small-capacity single-carrier cells: given that the total number of humans who actually _want_ to use GSM (as opposed to whatever latest 4G/5G/etc is peddled by Big Tech mafia) is vanishingly small, there is currently no justification for building higher-capacity GSM cells that use more than a single 200 kHz radio carrier. And if each GSM cell consists of only one radio carrier (the BCCH carrier, also called C0 in the specs), then physical DTXd (as in actually turning off radio Tx, as opposed to "logical" DTXd where that effect is merely faked for the MS by transmitting dummy bursts or induced-BFI frames) is simply impossible. Therefore, in the present state of the human condition, there is no justification for expending the effort to implement additional complexity for proper DTXd.