--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/TFO-xform/Theory	Sat Aug 31 20:45:25 2024 +0000
@@ -0,0 +1,137 @@
+TFO transform from uplink to downlink
+=====================================
+
+With all 3 classic GSM codecs (FRv1, HRv1, EFR) the original architecture calls
+for a network-side transcoder (TRAU) on each individual call leg.  The
+implications are:
+
+* The uplink runs from the MS to the speech decoder in the TRAU that turns the
+  mobile-generated speech into 64 kbit/s G.711.  The Rx DTX handler, a subblock
+  of that speech decoder in the TRAU, handles error concealment (substitution
+  and muting of lost frames) and comfort noise insertion during DTXu pauses,
+  and once this speech stream has been transcoded to G.711, all trace of these
+  GSM-specific effects disappears.
+
+* The downlink runs from the speech encoder in the TRAU to TCH DL radio output
+  from the BTS.  Because the DL frame stream comes from a free-running speech
+  encoder, it never contains errored frames or invalid SID or any other
+  aberrations: without DTXd, this frame stream is 100% good speech frames, and
+  with DTXd, it is a mixture of good speech and valid SID frames.
+
+But suppose you have two mobile call legs (mobile user Alice calls mobile user
+Bob), and you wish to eliminate the quality-degrading effect of double or tandem
+transcoding by passing compressed speech frames directly from Alice to Bob and
+vice-versa - what happens now?  The UL frame stream from each call leg will
+contain BFI frame gaps that are never allowed in DL, and if the network deploys
+DTX only in the UL direction (DTXu without DTXd, a very sensible choice for
+small-capacity single-carrier cells), the representation of DTXu pauses coming
+from each call leg (SID frames followed by prolonged BFI gaps) is also not
+suitable for direct passing to the DL of the opposite call leg.
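
The following minimal C sketch makes the UL-vs-DL mismatch concrete. It classifies each UL frame and counts how many could not be relayed unchanged to the opposite DL. The four-way classification (enum frame_class) and the function names are hypothetical, chosen for illustration. They do not come from GSM 08.62 or from any Themyscira library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of one 20 ms UL codec frame. */
enum frame_class {
	FC_GOOD_SPEECH,	/* valid speech frame */
	FC_VALID_SID,	/* valid SID frame (start or update of a DTXu pause) */
	FC_INVALID_SID,	/* SID frame received with errors */
	FC_BFI,		/* bad frame: lost or errored */
};

/* May this frame be sent unchanged on a DL, given the DTXd setting?
 * Without DTXd only good speech is allowed; with DTXd valid SID is too. */
static bool dl_acceptable(enum frame_class fc, bool dtxd)
{
	switch (fc) {
	case FC_GOOD_SPEECH:
		return true;
	case FC_VALID_SID:
		return dtxd;
	default:
		return false;	/* invalid SID and BFI never appear on DL */
	}
}

/* Count the UL frames that cannot be relayed as-is to the other leg's DL. */
static size_t count_unrelayable(const enum frame_class *ul, size_t n, bool dtxd)
{
	size_t bad = 0;

	assert(ul != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!dl_acceptable(ul[i], dtxd))
			bad++;
	return bad;
}
```

Take a UL stream of speech, BFI, speech, valid SID, BFI, BFI. The function counts 4 unrelayable frames without DTXd and 3 with DTXd. Even with DTXd, the BFI gap that follows a SID still cannot be passed through.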
+
+The solution offered in the TFO spec (GSM 08.62) is a special transform from
+call leg A UL to call leg B DL.  This transform has no official name that I
+could find, but I call it "TFO transform".  In the original GSM 08.62 spec (up
+to R99) this TFO transform is described in sections 8.2.1 and 8.2.2; when the
+spec became 3GPP TS 28.062 in Release 4 (adding AMR in GSM and AMR-only UMTS),
+the description of the TFO transform for classic GSM codecs moved to section
+C.3.2.1.1.
+
+However, both spec versions only say what "shall" be done without any guidance
+on how to do it algorithmically: the spec language is "subject to manufacturer
+dependent future improvements and is not part of this recommendation."
+Distilled to its essence, the problem is that TFO introduces a new type of
+logical transform on codec frames (and a stateful one at that!) that never
+appeared previously anywhere in classic GSM architecture, is not mentioned in
+any other spec, and is not addressed at all by any of the reference codec
+sources.  This new transform is implemented only in the TFO block in TRAUs and
+nowhere else (in classic GSM architecture), and can be exercised only by
+establishing a TFO call between two interworking TRAUs.
+
+There are 3 main parts to this TFO transform, 3 main areas where anyone who
+seeks to implement this transform has to think hard and come up with an
+innovative solution:
+
+1) Error concealment in non-DTX speech: if an errored frame (BFI) appears after
+   non-SID speech frames (meaning non-DTX speech), the transform has to fill in
+   substitution/muting "speech" frames (meaning codec frames that look like
+   valid speech frames) in the stream going to call leg B DL.
+
+2) Comfort noise insertion: if the incoming frame stream from call leg A UL
+   contains SID frames (DTXu) but the same are not allowed on call leg B DL
+   (no DTXd), the transform has to insert "speech" frames (in the same
+   parenthetical meaning) that represent comfort noise, as intended by Alice's
+   phone that transmitted SID with certain CN parameters.
+
+3) Comfort noise muting: handling the case where the incoming UL frame stream
+   goes into CN insertion state (via one or more SID frames), but then goes
+   total BFI, with no more SID update frames appearing in TAF positions.  In
+   the case of a single codec leg from a source encoder to an end decoder,
+   standard decoders are required by their respective DTX specs to gradually
+   mute their CN output, to indicate channel breakdown to the user - the TFO
+   transform has to produce the same effect.
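
The three functions above can be pictured as one small per-call state machine. What follows is a minimal sketch under stated assumptions. The frame classes and action names are hypothetical. The two tuning constants (MUTE_STEPS, MISSED_SID_LIMIT) are also hypothetical and are not taken from any codec spec. Building the actual substitution, CN or silence frames is codec-specific and is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of one 20 ms UL frame from call leg A. */
enum frame_class { FC_GOOD_SPEECH, FC_VALID_SID, FC_INVALID_SID, FC_BFI };

/* What to emit on call leg B DL; without DTXd every action must end up
 * as a well-formed speech frame, built by codec-specific code elsewhere. */
enum tfo_action {
	ACT_PASS,	/* forward the good speech frame unchanged */
	ACT_SUBST,	/* substitution frame, attenuated by mute_level */
	ACT_CN,		/* CN "speech" frame from cached SID, attenuated */
	ACT_SILENCE,	/* "speech" frame that decodes to silence */
};

/* Assumed tuning constants, NOT taken from any GSM spec. */
#define MUTE_STEPS		16	/* muted frames from full level to silence */
#define MISSED_SID_LIMIT	2	/* TAF positions w/o valid SID before muting */

enum tfo_state { ST_SPEECH, ST_CN };

struct tfo_xform {
	enum tfo_state state;
	unsigned bfi_run;	/* consecutive BFIs while in speech state */
	unsigned missed_sid;	/* TAF positions since the last valid SID */
	unsigned mute_level;	/* 0 = full level, MUTE_STEPS = silence */
};

static void tfo_init(struct tfo_xform *x)
{
	x->state = ST_SPEECH;
	x->bfi_run = 0;
	x->missed_sid = 0;
	x->mute_level = 0;
}

static void mute_step(struct tfo_xform *x)
{
	if (x->mute_level < MUTE_STEPS)
		x->mute_level++;
}

/* Process one UL frame; taf is nonzero at SID update (TAF) positions. */
static enum tfo_action tfo_step(struct tfo_xform *x, enum frame_class fc,
				int taf)
{
	assert(x != NULL);
	switch (fc) {
	case FC_GOOD_SPEECH:
		tfo_init(x);	/* clean speech: reset all counters */
		return ACT_PASS;
	case FC_VALID_SID:
		/* Start or update of CN insertion; the caller caches this
		 * SID's CN parameters and builds a CN "speech" frame. */
		x->state = ST_CN;
		x->missed_sid = 0;
		x->mute_level = 0;
		return ACT_CN;
	case FC_INVALID_SID:
		/* Assumption: enter CN using previously cached parameters;
		 * an invalid SID does not count as a SID update. */
		if (x->state != ST_CN) {
			x->state = ST_CN;
			x->missed_sid = 0;
		}
		return x->mute_level >= MUTE_STEPS ? ACT_SILENCE : ACT_CN;
	case FC_BFI:
		break;
	}

	if (x->state == ST_SPEECH) {
		/* Error concealment: the first lost frame repeats the last
		 * good one at full level, each further one is attenuated. */
		if (x->bfi_run++ > 0)
			mute_step(x);
		return x->mute_level >= MUTE_STEPS ? ACT_SILENCE : ACT_SUBST;
	}

	/* CN state, frame lost: count TAF positions without a valid SID
	 * and, once too many have passed, mute the CN gradually. */
	if (taf)
		x->missed_sid++;
	if (x->missed_sid >= MISSED_SID_LIMIT)
		mute_step(x);
	return x->mute_level >= MUTE_STEPS ? ACT_SILENCE : ACT_CN;
}
```

With these assumed constants, consider good speech followed by a run of BFIs. The first 16 BFIs yield substitution frames (the first one unattenuated) and the 17th yields silence. In CN state, muting begins once two TAF positions have passed since the last valid SID. The transform is a function of (state, frame class, TAF) only, so recorded UL frame streams can easily be fed through it offline.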
+
+All 3 of the just-listed functions are explicitly called out in the TFO spec, in
+each case with the same language of "shall" followed by "subject to manufacturer
+dependent future improvements and is not part of this recommendation."
+
+DTXd or no DTXd
+===============
+
+When the destination call leg operates without DTXd, the TFO transform can only
+emit well-formed speech frames for the respective codec, never SID frames.  In
+this case the transform has to do "everything", all 3 of the listed functions,
+although the last function of CN muting may be either separate or absorbed into
+the CN generation function depending on the codec.
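
One way to read "absorbed into CN generation" is that the CN generator itself takes a muting count and lowers the level it encodes. Below is a minimal sketch built on two assumptions. First, a hypothetical cached parameter level_idx stands in for whatever quantized CN level the codec's SID carries. Second, the slope is one index step per frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached CN parameters from the last valid SID.  level_idx
 * stands in for the codec's quantized CN level parameter; a real SID also
 * carries spectral parameters, omitted here. */
struct cn_params {
	unsigned level_idx;	/* larger = louder, 0 = lowest encodable level */
};

/* Assumed muting slope: level index steps removed per muted frame. */
#define LEVEL_STEPS_PER_FRAME	1u

/* CN generation with muting folded in: frames_muted counts frames since
 * CN muting began (0 = not muting).  Returns the level index to encode
 * into the next CN "speech" frame, saturating at the lowest level. */
static unsigned cn_level(const struct cn_params *p, unsigned frames_muted)
{
	unsigned atten = frames_muted * LEVEL_STEPS_PER_FRAME;

	assert(p != NULL);
	return atten >= p->level_idx ? 0 : p->level_idx - atten;
}
```

This absorbed form only works if the codec's lowest level index decodes to true silence. If it does not, a separate muting stage that switches to silence frames would still be needed.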
+
+OTOH, when call leg B has DTXd enabled/allowed, there is more room for
+additional complexity.  The simplest solution would be to not make use of DTXd
+capability and always emit speech frames - but the problem with this simple
+approach is teleological.  If a GSM network operator runs with DTXd enabled,
+presumably that operator seeks to reap the benefits of DTXd, such as reduced
+radio interference, in which case a TFO transform that fails to make use of
+DTXd capability would defeat the purpose.  Hence anyone who sets out to
+implement a TFO transform that fully utilizes DTXd would have to do
+additional work:
+
+* The function of CN insertion in the transform _mostly_ goes away: if a valid
+  SID frame arrives, the TRAU caches it and repeats it continuously until the
+  next SID update, allowing the BTS to select which SID frames it will actually
+  transmit based on its SACCH alignment.  But more complex handling is still
+  needed if the first SID frame (the one that begins the CN insertion period)
+  arrives as an invalid SID, and the function of CN muting takes on new
+  significance.
+
+* CN muting: when the cached SID expires and no new SID updates arrive in TAF
+  positions, the TFO transform has to indicate somehow to Bob that Alice's call
+  leg is having trouble, which will be easy or difficult depending on what rules
+  are specified in the codec specs for SID interpolation in the final receiver.
+
+* Error concealment in non-DTX speech: at first glance this function appears to
+  be exactly the same whether DTXd is used or not.  But consider the case of
+  total channel breakdown, such that the incoming frame stream becomes all BFI:
+  how should this case be handled?  In the absence of DTXd, the output of the
+  TFO transform becomes a stream of silence frames, meaning some kind of
+  "speech" frames that produce total silence at the end decoder.  But if the
+  network operates with DTXd with the aim of reducing radio interference, these
+  silence "speech" frames should be replaced with SIDs whose parameters are
+  chosen to produce silent output.
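
For the DTXd case, the SID caching, CN muting and total-breakdown handling above could be combined roughly as follows. This is a minimal sketch, not a worked-out design. FRAME_BYTES, SID_EXPIRY_TAF, BFI_SILENCE_LIMIT and the dl_out actions are hypothetical. The text flags a pause that begins with an invalid SID as needing more complex handling. For that case the sketch uses a placeholder policy: reuse a SID cached from an earlier pause, or else fall back to a silence SID. On cache expiry it simply switches to a silence SID. Whether that abrupt switch is acceptable depends on the codec's SID interpolation rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_BYTES		33	/* assumed frame buffer size, codec-dependent */
#define SID_EXPIRY_TAF		2	/* assumed: missed TAF positions before expiry */
#define BFI_SILENCE_LIMIT	16	/* assumed: BFIs in speech before silence SID */

/* Hypothetical classification of one UL frame from call leg A. */
enum frame_class { FC_GOOD_SPEECH, FC_VALID_SID, FC_INVALID_SID, FC_BFI };

/* What to emit on call leg B DL when DTXd is in use. */
enum dl_out {
	OUT_SPEECH,		/* relay the good speech frame unchanged */
	OUT_SUBST,		/* substitution/muting speech frame */
	OUT_CACHED_SID,		/* repeat the cached valid SID */
	OUT_SILENCE_SID,	/* SID whose parameters decode to silence */
};

struct dtxd_xform {
	int in_cn;			/* nonzero during a DTX pause */
	int have_sid;			/* nonzero while sid[] is usable */
	unsigned missed_sid;		/* TAF positions since last valid SID */
	unsigned bfi_run;		/* consecutive BFIs in speech */
	unsigned char sid[FRAME_BYTES];	/* cached last valid SID */
};

static enum dl_out dtxd_step(struct dtxd_xform *x, enum frame_class fc,
			     const unsigned char *frame, int taf)
{
	switch (fc) {
	case FC_GOOD_SPEECH:
		x->in_cn = 0;
		x->bfi_run = 0;
		return OUT_SPEECH;
	case FC_VALID_SID:
		/* Cache the SID and repeat it until the next update; the
		 * BTS picks which copies to actually transmit. */
		assert(frame != NULL);
		memcpy(x->sid, frame, FRAME_BYTES);
		x->have_sid = 1;
		x->in_cn = 1;
		x->missed_sid = 0;
		return OUT_CACHED_SID;
	case FC_INVALID_SID:
		/* Placeholder policy for a pause that starts with an invalid
		 * SID: reuse a SID cached earlier if any, else silence SID. */
		if (!x->in_cn) {
			x->in_cn = 1;
			x->missed_sid = 0;
		}
		return x->have_sid ? OUT_CACHED_SID : OUT_SILENCE_SID;
	case FC_BFI:
		break;
	}

	if (!x->in_cn) {
		/* Error concealment as without DTXd, but on total channel
		 * breakdown switch to silence SIDs, not silence speech. */
		return ++x->bfi_run > BFI_SILENCE_LIMIT ? OUT_SILENCE_SID
							: OUT_SUBST;
	}

	/* CN muting: expire the cached SID after too many missed updates. */
	if (taf && ++x->missed_sid >= SID_EXPIRY_TAF)
		x->have_sid = 0;
	return x->have_sid ? OUT_CACHED_SID : OUT_SILENCE_SID;
}
```

During a pause the sketch emits OUT_CACHED_SID on every frame. This follows the repeat-until-next-update behaviour described above and leaves the choice of which copies to transmit to the BTS.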
+
+Current approach in Themyscira libraries
+========================================
+
+There is a desire to implement the TFO transform for all 3 classic GSM codecs
+in the Themyscira Wireless GSM codec libraries suite, and the first question to
+be decided is the policy with regard to DTXd.
+
+The current approach is to not implement any DTXd support, i.e., to implement
+the TFO transform only in its basic no-DTXd form.  This decision is based on
+the reality of small-capacity single-carrier cells: given that the
+total number of humans who actually _want_ to use GSM (as opposed to whatever
+latest 4G/5G/etc is peddled by Big Tech mafia) is vanishingly small, there is
+currently no justification for building higher-capacity GSM cells that use more
+than a single 200 kHz radio carrier.  And if each GSM cell consists of only one
+radio carrier (the BCCH carrier, also called C0 in the specs), then physical
+DTXd (as in actually turning off radio Tx, as opposed to "logical" DTXd where
+that effect is merely faked for the MS by transmitting dummy bursts or
+induced-BFI frames) is simply impossible, because the C0 carrier must be
+transmitted continuously on every timeslot.  Therefore, in the present state of
+the human condition, there is no justification for expending the effort to
+implement the additional complexity of proper DTXd.