doc: new article FR1-Rx-DTX-detail
author Mychaela Falconia <falcon@freecalypso.org>
date Mon, 07 Oct 2024 00:25:50 +0000

Rx DTX handler implementation details
=====================================

As explained in the basic FR1-Rx-DTX article, an Rx DTX handler has to be
inserted between the output of the Rx radio subsystem and the input to the
basic GSM 06.10 speech decoder. In the ThemWi codec library architecture, we
normally run a full decoder for GSM-FR that combines the Rx DTX handler and
the basic 06.10 decoder, and the Rx DTX handler block by itself also serves
as a TFO transform.

This Rx DTX handler is based on several GSM specs: 06.11 for the error
concealment function, 06.12 for the comfort noise insertion function, and 06.31
for overall Rx DTX handling. However, these specs give a lot of leeway to
implementors, hence it is prudent to document the specific choices made in the
present ThemWi implementation.

Error concealment implementation
================================

Error concealment is also called substitution and muting of lost frames. The
implementation of this function in Themyscira libgsmfr2 is based on the Example
solution presented in chapter 6 of 3GPP TS 46.011 (formerly GSM 06.11), applying
the most literal reading of this spec section.

When unusable frames (as defined in GSM 06.31) occur during speech state (i.e.,
not following a SID), the following logic applies. For the first BFI following
good speech, the last speech frame is repeated verbatim. On the second BFI the
muting logic of Xmaxc reduction kicks in, decrementing each of the 4 Xmaxc
parameters by 4 with each emitted frame. RPE grid position parameters are
randomized at the same time. The frame in which all 4 Xmaxc parameters equal 0
(either because they were already 0 or because they got reduced to 0 by the
muting sequence) is the last frame emitted in this state; all subsequent BFIs
will be turned into fixed-bit-pattern silence frames as given in TS 46.011
Table 1.

If a BFI comes in when the Rx DTX handler is in its reset (or homed) state, the
output proceeds directly to silence frames.
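
The speech-state muting sequence described above can be sketched in C as
follows. This is only an illustration, not the libgsmfr2 code: all names
(ec_ctx, ec_good_speech, ec_handle_bfi) are hypothetical, rand() stands in for
whatever pseudo-random source the library really uses, and clamping Xmaxc at 0
is an assumed reading of how the parameters get "reduced to 0".

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; not the libgsmfr2 API. */
enum ec_state { EC_NO_DATA, EC_SPEECH, EC_FIRST_BFI, EC_MUTING };
enum ec_output { EC_OUT_REPEAT, EC_OUT_MUTED, EC_OUT_SILENCE };

struct ec_ctx {
    enum ec_state state;
    uint8_t xmaxc[4];   /* per-subframe Xmaxc of the saved frame */
    uint8_t mc[4];      /* per-subframe RPE grid position, 0..3 */
};

/* A good speech frame was received: remember its parameters. */
void ec_good_speech(struct ec_ctx *ctx, const uint8_t xmaxc[4],
                    const uint8_t mc[4])
{
    for (int i = 0; i < 4; i++) {
        ctx->xmaxc[i] = xmaxc[i];
        ctx->mc[i] = mc[i];
    }
    ctx->state = EC_SPEECH;
}

/* An unusable frame (BFI) was received: decide what to emit. */
enum ec_output ec_handle_bfi(struct ec_ctx *ctx)
{
    int all_zero = 1;

    switch (ctx->state) {
    case EC_SPEECH:
        /* first BFI: the saved frame is repeated verbatim */
        ctx->state = EC_FIRST_BFI;
        return EC_OUT_REPEAT;
    case EC_FIRST_BFI:
    case EC_MUTING:
        for (int i = 0; i < 4; i++) {
            /* assumption: Xmaxc is clamped at 0 */
            ctx->xmaxc[i] = ctx->xmaxc[i] > 4
                            ? (uint8_t)(ctx->xmaxc[i] - 4) : 0;
            if (ctx->xmaxc[i])
                all_zero = 0;
            /* assumption: rand() as the randomness source */
            ctx->mc[i] = (uint8_t)(rand() & 3);
        }
        /* an all-zero frame is the last one emitted in this state */
        ctx->state = all_zero ? EC_NO_DATA : EC_MUTING;
        return EC_OUT_MUTED;
    default:
        /* reset (homed) state, or muting has fully decayed */
        return EC_OUT_SILENCE;
    }
}
```

With a saved frame whose Xmaxc values are 9, 4, 0 and 12, successive BFIs in
this sketch yield one repeated frame, then muted frames with Xmaxc of
(5, 0, 0, 8), (1, 0, 0, 4) and (0, 0, 0, 0), and silence frames after that.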

Comfort noise insertion
=======================

Comfort noise generation and updating is specified in GSM 06.12 section 6.1.
Most of this section is very straightforward, and is implemented in ThemWi
libgsmfr2 exactly as specified, except for the very last sentence in that
section:

"When updating the comfort noise, the parameters above should preferably be
interpolated over a few frames to obtain smooth transitions."

The ThemWi implementation of the Rx DTX handler in libgsmfr2 does not do this
"should preferably" part: no interpolation is done on CN parameters; as soon as
each SID update comes in, the new parameters are used immediately for all
generated CN frames.

Because the spec says "should preferably" rather than "shall", we can "get away"
with not implementing CN interpolation. But there is an even more profound
issue: we have yet to find anyone else's implementation, which we could use as
guidance, that does CN parameter interpolation for FRv1. (Such interpolation
is mandatory and defined in bit-exact terms for HRv1 and EFR, but FRv1 is a
different story.)

We had hoped that Nokia TCSM2 (a historical hardware implementation of the GSM
TRAU network element) might implement CN interpolation for FRv1 - but our
experimental findings on that platform are inconclusive:

* When acting as a TFO transform for FRv1, this TRAU does not interpolate CN
  parameters; it makes abrupt changes in CN output just like our
  implementation - but it introduces a strange delay of 24 frames, suggesting
  that they have some code paths that assume CN interpolation would be applied.

* When the TRAU acts as a regular speech decoder (not TFO), it is not clear how
  it performs any of the Rx DTX functions: Nokia chose not to implement the
  optional in-band homing feature for FRv1, thus we have no way to explore
  bit-exact behaviour of their speech decoder via test sequences.

Another enticing idea would be to statically reverse-engineer the DSP ROM of
the TI Calypso chip and thus recover its complete speech Rx chain - but of
course the effort would be massive, and is not likely to happen any time soon.

Until we either get around to the far-future task of Calypso DSP static
reversing or find some other implementation of the GSM-FR Rx DTX handler that
does CN interpolation and whose operation we can replicate, we shall stick to
the simple approach of not doing CN interpolation.

Handling of SID frames with Xmaxc discrepancy
=============================================

Per GSM 06.12 section 5.2, all 4 subframe Xmaxc parameters in a SID frame are
supposed to be equal, encoding the quantized form of mean(Xmax). However, what
should Rx DTX implementations do when they receive an otherwise-valid SID frame
in which these 4 parameters are not all equal? In our implementation, we handle
such a discrepancy as follows:

* In those frame positions in which we receive a fresh SID (initial or update),
  the CN frame we emit is a direct transformation of the received SID, and all
  4 Xmaxc parameters are passed through intact.

* When we emit CN frames based on remembered LARc and Xmaxc parameters, we use
  the last-subframe Xmaxc from the most recently received SID frame.
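
The two cases above can be illustrated with a minimal C sketch. The struct and
function names are hypothetical and do not come from libgsmfr2, and only LARc
and Xmaxc are modelled; the remaining parameters of a generated CN frame are
left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical types and names; not the libgsmfr2 API. */
struct frame_params {
    uint8_t larc[8];    /* LARc parameters */
    uint8_t xmaxc[4];   /* one Xmaxc per subframe */
};

struct cn_memory {
    uint8_t larc[8];    /* remembered LARc */
    uint8_t xmaxc;      /* single remembered CN level */
};

/* Fresh SID (initial or update): emit a direct transformation of it. */
void cn_from_fresh_sid(struct cn_memory *mem, const struct frame_params *sid,
                       struct frame_params *out)
{
    memcpy(mem->larc, sid->larc, sizeof mem->larc);
    mem->xmaxc = sid->xmaxc[3];     /* remember the last subframe's value */
    *out = *sid;                    /* all 4 Xmaxc pass through intact */
}

/* Later CN frames: built from the remembered parameters only. */
void cn_from_memory(const struct cn_memory *mem, struct frame_params *out)
{
    memcpy(out->larc, mem->larc, sizeof out->larc);
    memset(out->xmaxc, mem->xmaxc, sizeof out->xmaxc);
}
```

For a SID frame carrying Xmaxc values of 10, 11, 12 and 13, the frame emitted
in the SID position keeps all four values, while every later CN frame in this
sketch uses 13 for all 4 subframes.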

Lost SID handling and CN muting
===============================

In accord with GSM 06.11 section 5.3, when we receive an unusable frame in a
TAF position during CN insertion state, we set a flag that remembers this
condition, but don't switch to CN muting right away. Per section 5.4 of the
same spec, we initiate CN muting when a second lost SID event occurs (unusable
frame received in a TAF position) without intervening good speech frames or
accepted SID frames.

When we do enter CN muting state, we decrement CN Xmaxc (always the same for
all 4 subframes) by 4 on each output frame, following the Example solution of
3GPP TS 46.011 (formerly GSM 06.11) chapter 6. Once this CN Xmaxc reaches 0,
we switch to emitting fixed-bit-pattern silence frames of TS 46.011 Table 1.
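
A rough C sketch of this lost SID logic follows. The names are hypothetical,
and two details the text above leaves open are filled in by assumption and
marked as such in the comments: whether the first Xmaxc decrement happens on
the same frame as the second lost SID event, and whether the frame on which
Xmaxc reaches 0 is still emitted as a CN frame. Resetting the flag on a good
speech frame is left out for brevity.

```c
#include <stdint.h>

/* Hypothetical names; not the libgsmfr2 API. */
enum cn_state { CN_INSERTION, CN_MUTING, CN_NO_DATA };

struct cn_ctx {
    enum cn_state state;
    int lost_sid;       /* set by the first lost SID event */
    uint8_t xmaxc;      /* current CN Xmaxc, same for all 4 subframes */
};

/* A valid SID was accepted: forget any earlier lost SID. */
void cn_valid_sid(struct cn_ctx *ctx, uint8_t xmaxc)
{
    ctx->state = CN_INSERTION;
    ctx->lost_sid = 0;
    ctx->xmaxc = xmaxc;
}

/*
 * An unusable frame arrived; taf is nonzero in a TAF position.
 * Returns 1 if a CN frame with ctx->xmaxc should be emitted,
 * 0 if the fixed silence frame should be emitted instead.
 */
int cn_unusable_frame(struct cn_ctx *ctx, int taf)
{
    if (ctx->state == CN_INSERTION && taf) {
        if (ctx->lost_sid)
            ctx->state = CN_MUTING;     /* second lost SID event */
        else
            ctx->lost_sid = 1;          /* first one: only remember it */
    }
    switch (ctx->state) {
    case CN_INSERTION:
        return 1;                       /* CN continues unchanged */
    case CN_MUTING:
        /* assumption: muting starts on this same frame, clamped at 0 */
        ctx->xmaxc = ctx->xmaxc > 4 ? (uint8_t)(ctx->xmaxc - 4) : 0;
        /* assumption: the Xmaxc=0 frame is the last CN frame emitted */
        if (ctx->xmaxc == 0)
            ctx->state = CN_NO_DATA;
        return 1;
    default:
        return 0;                       /* fixed silence frame */
    }
}
```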

Handling of invalid SID frames
==============================

In agreement with the GSM 06.31 spec, we recognize invalid SID and invoke the
appropriate handler in all 3 combinations: BFI=0 SID=1, BFI=1 SID=1, and
BFI=1 SID=2. The real complexity, however, lies in what that invalid SID
handler actually does:

* If invalid SID arrives when we are already in CN insertion state, we treat it
  the same as an unusable frame (continue CN output with current parameters),
  but the flag of lost SID is reset, as required by our interpretation of the
  specs.

* If invalid SID arrives in CN muting state, i.e., after two consecutive lost
  SID events, the muting continues unaffected, i.e., we don't "rejuvenate"
  already-started-muting comfort noise upon receiving invalid SID.

* If invalid SID arrives in good speech state, meaning that we are supposed to
  begin a CN insertion period but we didn't get usable parameters for it, we
  obtain LARc and mean(Xmax) parameters from the last good speech frame,
  following the second option permitted by the "NOTE" at the end of GSM 06.31
  section 6.1.2. To get Xmaxc for CN, we dequantize all 4 Xmaxc parameters of
  the last good speech frame, average them, then requantize.

* If invalid SID arrives in speech muting state, the invalid SID is ignored and
  speech muting continues unaffected.

* If invalid SID arrives in NO_DATA state (initial state out of reset, or the
  state after either speech or CN muting has fully decayed), we emit the fixed
  silence frame of TS 46.011 Table 1.
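
Two pieces of the logic above lend themselves to a short C sketch: the frame
classification by BFI and SID flags, and the derivation of a CN Xmaxc from the
last good speech frame. All names are hypothetical and do not come from
libgsmfr2. The dequantizer assumes that each Xmaxc code maps to the top of its
interval in the GSM 06.10 block maximum quantizer, and the average is a plain
truncating integer division; the article does not state which rounding rule
the library applies.

```c
#include <stdint.h>

/* Hypothetical names; not the libgsmfr2 API. */
enum rx_class { RX_GOOD_SPEECH, RX_UNUSABLE, RX_VALID_SID, RX_INVALID_SID };

/* Frame classification from the BFI (0/1) and SID (0/1/2) flags. */
enum rx_class rx_classify(int bfi, int sid)
{
    if (sid == 0)
        return bfi ? RX_UNUSABLE : RX_GOOD_SPEECH;
    if (sid == 2 && !bfi)
        return RX_VALID_SID;
    return RX_INVALID_SID;  /* BFI=0 SID=1, BFI=1 SID=1, BFI=1 SID=2 */
}

/* Xmaxc (0..63) -> xmax, assumed to be the top of the quantizer interval. */
static unsigned xmax_dequant(unsigned xmaxc)
{
    unsigned exp = xmaxc > 15 ? (xmaxc >> 3) - 1 : 0;
    unsigned mant = xmaxc - (exp << 3);

    return ((mant + 1) << (exp + 5)) - 1;
}

/* xmax (0..32767) -> Xmaxc: 3-bit exponent plus mantissa. */
static unsigned xmax_quant(unsigned xmax)
{
    unsigned exp = 0;

    for (unsigned t = xmax >> 9; t != 0; t >>= 1)
        exp++;
    return (xmax >> (exp + 5)) + (exp << 3);
}

/* CN Xmaxc derived from the 4 Xmaxc of the last good speech frame. */
uint8_t cn_xmaxc_from_speech(const uint8_t xmaxc[4])
{
    unsigned sum = 0;

    for (int i = 0; i < 4; i++)
        sum += xmax_dequant(xmaxc[i]);
    /* assumption: truncating average; the rounding rule is not stated */
    return (uint8_t)xmax_quant(sum / 4);
}
```

Under these assumptions, a speech frame with Xmaxc values of 10, 20, 30 and 40
dequantizes to 351, 831, 1919 and 4607, averages to 1927, and requantizes to a
CN Xmaxc of 31.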