# HG changeset patch
# User Mychaela Falconia
# Date 1725137125 0
# Node ID e828468b0afdf46029b6e0154032953a66e3f2ff
# Parent f6bb790e186a2d88913e5fec7ba4e0398ef2b896
doc/TFO-xform/Theory: article written

diff -r f6bb790e186a -r e828468b0afd doc/TFO-xform/Theory
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/TFO-xform/Theory	Sat Aug 31 20:45:25 2024 +0000
@@ -0,0 +1,137 @@
+TFO transform from uplink to downlink
+=====================================
+
+With all 3 classic GSM codecs (FRv1, HRv1, EFR) the original architecture calls
+for a network-side transcoder (TRAU) on each individual call leg. The
+implications are:
+
+* The uplink runs from the MS to the speech decoder in the TRAU that turns the
+  mobile-generated speech into 64 kbit/s G.711. The Rx DTX handler, a subblock
+  of that speech decoder in the TRAU, handles error concealment (substitution
+  and muting of lost frames) and comfort noise insertion during DTXu pauses,
+  and once this speech stream has been transcoded to G.711, all trace of these
+  GSM-specific effects disappears.
+
+* The downlink runs from the speech encoder in the TRAU to TCH DL radio output
+  from the BTS. Because the DL frame stream comes from a free-running speech
+  encoder, it never contains errored frames or invalid SID or any other
+  aberrations: without DTXd, this frame stream is 100% good speech frames, and
+  with DTXd, it is a mixture of good speech and valid SID frames.
+
+But suppose you have two mobile call legs (mobile user Alice calls mobile user
+Bob), and you wish to eliminate the quality-degrading effect of double or tandem
+transcoding by passing compressed speech frames directly from Alice to Bob and
+vice-versa - what happens now? The UL frame stream from each call leg will
+contain BFI frame gaps that are never allowed in DL, and if the network deploys
+DTX only in the UL direction (DTXu without DTXd, a very sensible choice for
+small-capacity single-carrier cells), the representation of DTXu pauses coming
+from each call leg (SID frames followed by prolonged BFI gaps) is also not
+suitable for direct passing to the DL of the opposite call leg.
+
+The solution offered in the TFO spec (GSM 08.62) is a special transform from
+call leg A UL to call leg B DL. This transform has no official name that I
+could find, but I call it the "TFO transform". In the original GSM 08.62 spec (up
+to R99) this TFO transform is described in sections 8.2.1 and 8.2.2; when the
+spec changed to 28.062 with 3GPP Release 4 (adding AMR in GSM and AMR-only
+UMTS), the description of TFO transform for classic GSM codecs moved to section
+C.3.2.1.1.
+
+However, both spec versions only say what "shall" be done without any guidance
+on how to do it algorithmically: the spec language is "subject to manufacturer
+dependent future improvements and is not part of this recommendation."
+Distilling the problem to its essence, the addition of TFO introduces a new type
+of logical transform on codec frames (and a stateful one at that!) that never
+appeared previously anywhere in classic GSM architecture, is not mentioned in
+any other spec, and is not addressed at all by any of the reference codec
+sources. This new transform is implemented only in the TFO block in TRAUs and
+nowhere else (in classic GSM architecture), and can be exercised only by
+establishing a TFO call between two interworking TRAUs.
+
+There are 3 main parts to this TFO transform, 3 main areas where anyone who
+seeks to implement this transform has to think hard and come up with an
+innovative solution:
+
+1) Error concealment in non-DTX speech: if an errored frame (BFI) appears after
+   non-SID speech frames (meaning non-DTX speech), the transform has to fill in
+   substitution/muting "speech" frames (meaning codec frames that look like
+   valid speech frames) in the stream going to call leg B DL.
+
+2) Comfort noise insertion: if the incoming frame stream from call leg A UL
+   contains SID frames (DTXu) but the same are not allowed on call leg B DL
+   (no DTXd), the transform has to insert "speech" frames (in the same
+   parenthetical meaning) that represent comfort noise, as intended by Alice's
+   phone that transmitted SID with certain CN parameters.
+
+3) Comfort noise muting: handling the case where the incoming UL frame stream
+   goes into CN insertion state (via one or more SID frames), but then goes
+   total BFI, with no more SID update frames appearing in TAF positions. In
+   the case of a single codec leg from a source encoder to an end decoder,
+   standard decoders are required by their respective DTX specs to gradually
+   mute their CN output, to indicate channel breakdown to the user - the TFO
+   transform has to produce the same effect.
+
+All 3 of the just-listed functions are explicitly called out in the TFO spec, in
+each case with the same language of "shall" followed by "subject to manufacturer
+dependent future improvements and is not part of this recommendation."
+
+DTXd or no DTXd
+===============
+
+When the destination call leg operates without DTXd, the TFO transform can only
+emit frames that are well-formed speech frames for the respective codec, no SID
+frames. In this case the transform has to do "everything", all 3 of the listed
+functions, although the last function of CN muting may be either separate or
+absorbed into the CN generation function depending on the codec.
+
+OTOH, when call leg B has DTXd enabled/allowed, there is more room for
+additional complexity. The simplest solution would be to not make use of DTXd
+capability and always emit speech frames - but the problem with this simple
+approach is teleological. If a GSM network operator runs with DTXd enabled,
+presumably that operator seeks to reap the benefits of DTXd as in reduction of
+radio interference, in which case a TFO transform that fails to make use of DTXd
+capability would defeat the purpose. Hence if someone sets out to implement a
+TFO transform that supports full utilization of DTXd, they would have to do
+additional work:
+
+* The function of CN insertion in the transform _mostly_ goes away: if a valid
+  SID frame comes, the TRAU caches it and repeats it continuously until the
+  next SID update, allowing the BTS to select which SID frames it will actually
+  transmit based on its SACCH alignment. But more complex handling is still
+  needed if the first SID frame (the one that begins CN insertion period) came
+  in as invalid SID, and the function of CN muting takes on new significance.
+
+* CN muting: when the cached SID expires and no new SID updates arrive in TAF
+  positions, the TFO transform has to indicate somehow to Bob that Alice's call
+  leg is having trouble, which will be easy or difficult depending on what rules
+  are specified in the codec specs for SID interpolation in the final receiver.
+
+* Error concealment in non-DTX speech: at first glance this function appears to
+  be exactly the same whether DTXd is used or not. But consider the case of
+  total channel breakdown, such that the incoming frame stream becomes all BFI:
+  how should this case be handled? In the absence of DTXd, the output of the
+  TFO transform becomes a stream of silence frames, meaning some kind of
+  "speech" frames that produce total silence at the end decoder. But if the
+  network operates with DTXd with the aim of reducing radio interference, these
+  silence "speech" frames should be replaced with SIDs whose parameters are
+  chosen to produce silent output.
+
+Current approach in Themyscira libraries
+========================================
+
+There is a desire to implement the TFO transform for all 3 classic GSM codecs in
+the Themyscira Wireless GSM codec libraries suite, and the first question to be
+decided is the policy with regard to DTXd.
+
+The current approach is to not implement any DTXd support, i.e., implement the
+TFO transform only in its no-DTXd basic form. The reason for this decision is
+based on the reality of small-capacity single-carrier cells: given that the
+total number of humans who actually _want_ to use GSM (as opposed to whatever
+latest 4G/5G/etc is peddled by Big Tech mafia) is vanishingly small, there is
+currently no justification for building higher-capacity GSM cells that use more
+than a single 200 kHz radio carrier. And if each GSM cell consists of only one
+radio carrier (the BCCH carrier, also called C0 in the specs), then physical
+DTXd (as in actually turning off radio Tx, as opposed to "logical" DTXd where
+that effect is merely faked for the MS by transmitting dummy bursts or
+induced-BFI frames) is simply impossible. Therefore, in the present state of
+the human condition, there is no justification for expending the effort to
+implement additional complexity for proper DTXd.