doc/TFO-xform/Theory @ 33:e828468b0afd

doc/TFO-xform/Theory: article written
author Mychaela Falconia <falcon@freecalypso.org>
date Sat, 31 Aug 2024 20:45:25 +0000

TFO transform from uplink to downlink
=====================================

With all 3 classic GSM codecs (FRv1, HRv1, EFR) the original architecture calls
for a network-side transcoder (TRAU) on each individual call leg. The
implications are:

* The uplink runs from the MS to the speech decoder in the TRAU that turns the
  mobile-generated speech into 64 kbit/s G.711. The Rx DTX handler, a subblock
  of that speech decoder in the TRAU, handles error concealment (substitution
  and muting of lost frames) and comfort noise insertion during DTXu pauses.
  Once this speech stream has been transcoded to G.711, all trace of these
  GSM-specific effects disappears.

* The downlink runs from the speech encoder in the TRAU to TCH DL radio output
  from the BTS. Because the DL frame stream comes from a free-running speech
  encoder, it never contains errored frames or invalid SID or any other
  aberrations: without DTXd, this frame stream is 100% good speech frames, and
  with DTXd, it is a mixture of good speech and valid SID frames.

But suppose you have two mobile call legs (mobile user Alice calls mobile user
Bob), and you wish to eliminate the quality-degrading effect of double or tandem
transcoding by passing compressed speech frames directly from Alice to Bob and
vice versa. What happens now? The UL frame stream from each call leg will
contain BFI frame gaps that are never allowed in DL. Furthermore, if the
network deploys DTX only in the UL direction (DTXu without DTXd, a very sensible
choice for small-capacity single-carrier cells), the representation of DTXu
pauses coming from each call leg (SID frames followed by prolonged BFI gaps) is
also not suitable for direct passing to the DL of the opposite call leg.

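To make the mismatch concrete, here is a minimal Python sketch that encodes the
DL rules stated above and applies them to a UL stream of the kind just
described. It is an illustration only, not taken from any spec or library: the
FrameKind labels and the dl_stream_is_valid() function are hypothetical names,
and the length of the BFI gap is arbitrary.

```python
from enum import Enum


class FrameKind(Enum):
    """Hypothetical labels for the frame classes named in the text."""
    SPEECH = "good speech"
    SID = "valid SID"
    BFI = "BFI"  # errored or missing frame


def dl_stream_is_valid(frames, dtxd):
    """Check a frame stream against the DL rules described above:
    only good speech without DTXd, good speech plus valid SID with DTXd."""
    allowed = {FrameKind.SPEECH}
    if dtxd:
        allowed.add(FrameKind.SID)
    return all(kind in allowed for kind in frames)


# Call leg A UL around a DTXu pause; the gap length of 5 is only illustrative.
ul_from_alice = (
    [FrameKind.SPEECH, FrameKind.SID]
    + [FrameKind.BFI] * 5
    + [FrameKind.SID, FrameKind.SPEECH]
)

print(dl_stream_is_valid(ul_from_alice, dtxd=False))  # False
print(dl_stream_is_valid(ul_from_alice, dtxd=True))   # False
```
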
The solution offered in the TFO spec (GSM 08.62) is a special transform from
call leg A UL to call leg B DL. This transform has no official name that I
could find, but I call it the "TFO transform". In the original GSM 08.62 spec
(up to R99) this TFO transform is described in sections 8.2.1 and 8.2.2; when
the spec changed to 28.062 with 3GPP Release 4 (adding AMR in GSM and AMR-only
UMTS), the description of the TFO transform for classic GSM codecs moved to
section C.3.2.1.1.

However, both spec versions only say what "shall" be done without any guidance
on how to do it algorithmically: the spec language is "subject to manufacturer
dependent future improvements and is not part of this recommendation."
Distilling the problem to its essence, the addition of TFO introduces a new type
of logical transform on codec frames (and a stateful one at that!) that never
appeared previously anywhere in classic GSM architecture, is not mentioned in
any other spec, and is not addressed at all by any of the reference codec
sources. This new transform is implemented only in the TFO block in TRAUs and
nowhere else (in classic GSM architecture), and can be exercised only by
establishing a TFO call between two interworking TRAUs.

There are 3 main parts to this TFO transform, 3 main areas where anyone who
seeks to implement this transform has to think hard and come up with an
innovative solution:

1) Error concealment in non-DTX speech: if an errored frame (BFI) appears after
   non-SID speech frames (meaning non-DTX speech), the transform has to fill in
   substitution/muting "speech" frames (meaning codec frames that look like
   valid speech frames) in the stream going to call leg B DL.

2) Comfort noise insertion: if the incoming frame stream from call leg A UL
   contains SID frames (DTXu) but SID frames are not allowed on call leg B DL
   (no DTXd), the transform has to insert "speech" frames (in the same
   parenthetical meaning) that represent comfort noise, as intended by Alice's
   phone that transmitted SID with certain CN parameters.

3) Comfort noise muting: handling the case where the incoming UL frame stream
   goes into CN insertion state (via one or more SID frames), but then goes
   total BFI, with no more SID update frames appearing in TAF positions. In
   the case of a single codec leg from a source encoder to an end decoder,
   standard decoders are required by their respective DTX specs to gradually
   mute their CN output, to indicate channel breakdown to the user - the TFO
   transform has to produce the same effect.

All 3 of the just-listed functions are explicitly called out in the TFO spec, in
each case with the same language of "shall" followed by "subject to manufacturer
dependent future improvements and is not part of this recommendation."

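Since the spec gives no algorithm, the following Python sketch only shows how
the 3 functions could hang together as one stateful per-frame decision for a
destination leg without DTXd. It is a symbolic model under stated assumptions:
it names the kind of "speech" frame to emit and never touches codec parameters,
the class name and output labels are hypothetical, the cn_mute_after threshold
is a made-up number rather than anything from the DTX specs, and invalid SID
handling is left out entirely.

```python
class TfoTransformSketch:
    """Symbolic model of a no-DTXd TFO transform (hypothetical names)."""

    def __init__(self, cn_mute_after=48):
        # ASSUMPTION: cn_mute_after is an arbitrary illustrative threshold,
        # counted in frames since the last SID; real rules are codec-specific.
        self.cn_mute_after = cn_mute_after
        self.in_cn = False          # True once a SID has started CN insertion
        self.frames_since_sid = 0

    def step(self, kind):
        """Take one UL frame kind ("speech", "sid" or "bfi") and return a
        label for the kind of "speech" frame to send toward call leg B DL."""
        if kind == "speech":
            self.in_cn = False
            return "pass-through"
        if kind == "sid":
            self.in_cn = True
            self.frames_since_sid = 0
            return "comfort-noise"       # function 2: CN insertion
        if kind != "bfi":
            raise ValueError(f"unknown frame kind: {kind!r}")
        if not self.in_cn:
            return "substitute/mute"     # function 1: error concealment
        self.frames_since_sid += 1
        if self.frames_since_sid > self.cn_mute_after:
            return "cn-muting"           # function 3: CN muting
        return "comfort-noise"


xform = TfoTransformSketch(cn_mute_after=2)
ul = ["speech", "bfi", "sid", "bfi", "bfi", "bfi", "speech"]
dl = [xform.step(kind) for kind in ul]
print(dl)
```
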
DTXd or no DTXd
===============

When the destination call leg operates without DTXd, the TFO transform can only
emit frames that are well-formed speech frames for the respective codec, no SID
frames. In this case the transform has to do "everything", all 3 of the listed
functions, although the last function of CN muting may be either separate or
absorbed into the CN generation function, depending on the codec.

On the other hand, when call leg B has DTXd enabled/allowed, there is more room
for additional complexity. The simplest solution would be to not make use of
DTXd capability and always emit speech frames - but the problem with this
simple approach is teleological. If a GSM network operator runs with DTXd
enabled, presumably that operator seeks to reap the benefits of DTXd, namely
reduced radio interference, in which case a TFO transform that fails to make
use of DTXd capability would defeat the purpose. Hence if someone sets out to
implement a TFO transform that supports full utilization of DTXd, they would
have to do additional work:

* The function of CN insertion in the transform _mostly_ goes away: if a valid
  SID frame comes, the TRAU caches it and repeats it continuously until the
  next SID update, allowing the BTS to select which SID frames it will actually
  transmit based on its SACCH alignment. But more complex handling is still
  needed if the first SID frame (the one that begins the CN insertion period)
  came in as invalid SID, and the function of CN muting takes on new
  significance.

* CN muting: when the cached SID expires and no new SID updates arrive in TAF
  positions, the TFO transform has to indicate somehow to Bob that Alice's call
  leg is having trouble, which will be easy or difficult depending on what rules
  are specified in the codec specs for SID interpolation in the final receiver.

* Error concealment in non-DTX speech: at first glance this function appears to
  be exactly the same whether DTXd is used or not. But consider the case of
  total channel breakdown, such that the incoming frame stream becomes all BFI:
  how should this case be handled? In the absence of DTXd, the output of the
  TFO transform becomes a stream of silence frames, meaning some kind of
  "speech" frames that produce total silence at the end decoder. But if the
  network operates with DTXd with the aim of reducing radio interference, these
  silence "speech" frames should be replaced with SIDs whose parameters are
  chosen to produce silent output.

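The extra work listed above can be pictured with another symbolic Python
sketch, this time for a destination leg with DTXd. It only illustrates the
bookkeeping (SID caching, expiry, and the switch from concealment to silence
SIDs). The class name, the output labels and both thresholds (sid_lifetime
and ecu_frames) are assumptions made up for this example, the "cn-muting"
output stands in for whatever codec-specific indication turns out to be
possible, and the invalid-SID case is again not modeled.

```python
class TfoDtxdSketch:
    """Symbolic model of a DTXd-aware TFO transform (hypothetical names)."""

    def __init__(self, sid_lifetime=48, ecu_frames=16):
        # ASSUMPTION: both thresholds are arbitrary illustrative frame counts.
        self.sid_lifetime = sid_lifetime  # how long a cached SID stays usable
        self.ecu_frames = ecu_frames      # concealment frames before silence
        self.cached_sid = None
        self.sid_age = 0
        self.bfi_run = 0

    def step(self, kind, payload=None):
        """Return (output kind, payload) for one UL frame from call leg A."""
        if kind == "speech":
            self.cached_sid = None
            self.bfi_run = 0
            return ("speech", payload)
        if kind == "sid":
            # Cache the valid SID so it can be repeated until the next update.
            self.cached_sid = payload
            self.sid_age = 0
            self.bfi_run = 0
            return ("sid", payload)
        if kind != "bfi":
            raise ValueError(f"unknown frame kind: {kind!r}")
        if self.cached_sid is not None:
            self.sid_age += 1
            if self.sid_age <= self.sid_lifetime:
                return ("sid", self.cached_sid)  # repeat the cached SID
            return ("cn-muting", None)           # cached SID has expired
        self.bfi_run += 1
        if self.bfi_run <= self.ecu_frames:
            return ("substitute/mute", None)     # ordinary error concealment
        return ("silence-sid", None)             # total breakdown: silent SID


xform = TfoDtxdSketch(sid_lifetime=2, ecu_frames=1)
ul = [("sid", "A"), ("bfi", None), ("bfi", None), ("bfi", None),
      ("speech", "S"), ("bfi", None), ("bfi", None)]
dl = [xform.step(kind, payload) for kind, payload in ul]
print(dl)
```
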
Current approach in Themyscira libraries
========================================

There is a desire to implement the TFO transform for all 3 classic GSM codecs in
the Themyscira Wireless GSM codec library suite, and the first question to be
decided is the policy with regard to DTXd.

The current approach is to not implement any DTXd support, i.e., implement the
TFO transform only in its basic no-DTXd form. This decision is based on the
reality of small-capacity single-carrier cells: given that the total number of
humans who actually _want_ to use GSM (as opposed to whatever latest 4G/5G/etc
is peddled by Big Tech mafia) is vanishingly small, there is currently no
justification for building higher-capacity GSM cells that use more than a
single 200 kHz radio carrier. And if each GSM cell consists of only one radio
carrier (the BCCH carrier, also called C0 in the specs), then physical DTXd (as
in actually turning off radio Tx, as opposed to "logical" DTXd where that
effect is merely faked for the MS by transmitting dummy bursts or induced-BFI
frames) is simply impossible. Therefore, in the present state of the human
condition, there is no justification for expending the effort to implement
additional complexity for proper DTXd.