comparison doc/FR1-Rx-DTX-detail @ 552:6ab066180ec2
doc: new article FR1-Rx-DTX-detail
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Mon, 07 Oct 2024 00:25:50 +0000 |
parents | |
children | 62943a1ad64e |
Rx DTX handler implementation details
=====================================

As explained in the basic FR1-Rx-DTX article, an Rx DTX handler has to be
inserted between the output of the Rx radio subsystem and the input to the
basic GSM 06.10 speech decoder. In the ThemWi codec library architecture, we
normally run a full decoder for GSM-FR that combines the Rx DTX handler and
the basic 06.10 decoder, and the Rx DTX handler block by itself also serves
as a TFO transform.

This Rx DTX handler is based on several GSM specs: 06.11 for the error
concealment function, 06.12 for the comfort noise insertion function, and 06.31
for overall Rx DTX handling. However, these specs give a lot of leeway to
implementors, hence it is prudent to document the specific choices made in the
present ThemWi implementation.
17 Error concealment implementation | |
18 ================================ | |
19 | |
20 Error concealment is also called substitution and muting of lost frames. The | |
21 implementation of this function in Themyscira libgsmfr2 is based on the Example | |
22 solution presented in chapter 6 of 3GPP TS 46.011 (formerly GSM 06.11), applying | |
23 the most literal reading to this spec section. | |
24 | |
25 When unusable frames (as defined in GSM 06.31) occur during speech state (i.e., | |
26 not following a SID), the present logic kicks in. For the first BFI following | |
27 good speech, the last speech frame is repeated verbatim. On the second BFI the | |
28 muting logic of Xmaxc reduction kicks in, decrementing each of the 4 Xmaxc | |
29 parameters by 4 with each emitted frame. RPE grid position parameters are | |
30 randomized at the same time. The frame in which all 4 Xmaxc parameters equal 0 | |
31 (either because they were already 0 or because they got reduced to 0 by the | |
32 muting sequence) is the last frame emitted in this state; all subsequent BFIs | |
33 will be turned into fixed-bit-pattern silence frames as given in TS 46.011 | |
34 Table 1. | |
35 | |
36 If a BFI comes in when the Rx DTX handler is in its reset (or homed) state, the | |
37 output proceeds directly to silence frames. | |
38 | |
39 Comfort noise insertion | |
40 ======================= | |
41 | |
42 Comfort noise generation and updating is specified in GSM 06.12 section 6.1. | |
43 Most of this section is very straightforward, and is implemented in ThemWi | |
44 libgsmfr2 exactly as specified, except for the very last sentence in that | |
45 section: | |
46 | |
47 "When updating the comfort noise, the parameters above should preferably be | |
48 interpolated over a few frames to obtain smooth transitions." | |
49 | |
50 ThemWi implementation of Rx DTX handler in libgsmfr2 does not do this "should | |
51 preferably" part: no interpolation is done on CN parameters; as soon as each | |
52 SID update comes in, the new parameters are used immediately for all generated | |
53 CN frames. | |
54 | |
55 Because the spec says "should preferably" rather than "shall", we can "get away" | |
56 with not implementing CN interpolation. But there is an even more profound | |
57 issue: we have yet to find anyone else's implementation, which we could use as | |
58 guidance, that does CN parameter interpolation for FRv1. (Such interpolation | |
59 is mandatory and defined in bit-exact terms for HRv1 and EFR, but FRv1 is a | |
60 different story.) | |
61 | |
62 We had a hope that Nokia TCSM2 (a historical hw implementation of GSM TRAU | |
63 network element) might implement CN interpolation for FRv1 - but our | |
64 experimental findings on that platform are inconclusive: | |
65 | |
66 * When acting as a TFO transform for FRv1, this TRAU does not interpolate CN | |
67 parameters, it makes abrupt changes in CN output just like our implementation | |
68 - but it effects a strange delay of 24 frames, suggesting that they have some | |
69 code paths that assume CN interpolation would be applied. | |
70 | |
71 * When the TRAU acts as a regular speech decoder (not TFO), it is not clear how | |
72 it performs any of Rx DTX functions: Nokia chose to not implement the optional | |
73 in-band homing feature for FRv1, thus we have no way to explore bit-exact | |
74 behaviour of their speech decoder via test sequences. | |
75 | |
76 Another enticing idea would be to statically reverse-engineer the DSP ROM of TI | |
77 Calypso chip and thus recover its complete speech Rx chain - but of course the | |
78 effort would be extremely massive, and is not likely to happen any time soon. | |
79 | |
80 Until we either get around to the far-future task of Calypso DSP static | |
81 reversing or find some other implementation of GSM-FR Rx DTX handler that does | |
82 CN interpolation and whose operation we can replicate, we shall stick to the | |
83 simple approach of not doing CN interpolation. | |
84 | |
85 Handling of SID frames with Xmaxc discrepancy | |
86 ============================================= | |
87 | |
88 Per GSM 06.12 section 5.2, all 4 subframe Xmaxc parameters in a SID frame are | |
89 supposed to be equal, encoding the quantized form of mean(Xmax). However, what | |
90 should Rx DTX implementations do when they receive an otherwise-valid SID frame | |
91 in which these 4 parameters are not all equal? In our implementation, we handle | |
92 such discrepancy as follows: | |
93 | |
94 * In those frame positions in which we receive a fresh SID (initial or update), | |
95 the CN frame we emit is a direct transformation of the received SID, and all | |
96 4 Xmaxc parameters are passed through intact. | |
97 | |
98 * When we emit CN frames based on remembered LARc and Xmaxc parameters, we use | |
99 the last-subframe Xmaxc from the most recently received SID frame. | |
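
The two cases above can be sketched in Python as follows. The function names
are hypothetical and the sketch handles only the 4 Xmaxc values of a SID frame,
not the rest of the frame.

```python
def cn_xmaxc_for_fresh_sid(sid_xmaxc):
    """Frame position where a SID was just received: pass all 4 through."""
    return list(sid_xmaxc)


def cn_xmaxc_from_memory(sid_xmaxc):
    """Later CN frames: the last-subframe value is used for all 4 subframes."""
    return [sid_xmaxc[3]] * 4
```

For a SID frame carrying Xmaxc values 20, 20, 21, 22, the CN frame emitted in
that same frame position keeps 20, 20, 21, 22, while subsequent CN frames
generated from memory use 22 in all 4 subframes.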

Lost SID handling and CN muting
===============================

In accord with GSM 06.11 section 5.3, when we receive an unusable frame in a
TAF position during CN insertion state, we set a flag that remembers this
condition, but don't switch to CN muting right away. Per section 5.4 of the
same spec, we initiate CN muting when a second lost SID event occurs (unusable
frame received in a TAF position) without intervening good speech frames or
accepted SID frames.

When we do enter CN muting state, we decrement CN Xmaxc (always the same for
all 4 subframes) by 4 on each output frame, following the Example solution of
3GPP TS 46.011 (formerly GSM 06.11) chapter 6. Once this CN Xmaxc reaches 0,
we switch to emitting fixed-bit-pattern silence frames of TS 46.011 Table 1.
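
The lost SID flag and the CN muting sequence can be modeled with the following
Python sketch. CnState and CnLostSidTracker are hypothetical names, and two
details are assumptions rather than documented behaviour: the first decrement
is applied on the same frame as the second lost SID event, and the frame on
which CN Xmaxc reaches 0 is already output as a silence frame.

```python
from enum import Enum, auto


class CnState(Enum):
    CN_INSERT = auto()
    CN_MUTING = auto()
    NO_DATA = auto()


class CnLostSidTracker:
    """Hypothetical model of unusable-frame handling during CN insertion."""

    def __init__(self, cn_xmaxc):
        self.state = CnState.CN_INSERT
        self.cn_xmaxc = cn_xmaxc
        self.lost_sid = False

    def accepted_sid(self, xmaxc):
        # An accepted SID (re)starts CN insertion and clears the flag.
        self.state = CnState.CN_INSERT
        self.cn_xmaxc = xmaxc
        self.lost_sid = False

    def unusable_frame(self, taf):
        if self.state is CnState.CN_INSERT and taf:
            if self.lost_sid:
                self.state = CnState.CN_MUTING  # second lost SID event
            else:
                self.lost_sid = True            # first one: only remember
        if self.state is CnState.CN_MUTING:
            self.cn_xmaxc = max(self.cn_xmaxc - 4, 0)
            if self.cn_xmaxc == 0:
                # Assumption: this frame is already emitted as silence.
                self.state = CnState.NO_DATA
        if self.state is CnState.NO_DATA:
            return ("silence", None)
        return ("cn", self.cn_xmaxc)
```

Starting from CN Xmaxc = 10, the first unusable frame in a TAF position leaves
the output at 10; after the second such event the output goes 6, 2, and then
silence.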

Handling of invalid SID frames
==============================

In agreement with the GSM 06.31 spec, we recognize an invalid SID and invoke
the appropriate handler in all 3 combinations: BFI=0 SID=1, BFI=1 SID=1, and
BFI=1 SID=2. The real complexity, however, lies in what that invalid SID
handler actually does:

* If an invalid SID arrives when we are already in CN insertion state, we treat
  it the same as an unusable frame (continue CN output with current
  parameters), but the flag of lost SID is reset, as required by our
  interpretation of the specs.

* If an invalid SID arrives in CN muting state, i.e., after two consecutive
  lost SID events, the muting continues unaffected, i.e., we don't "rejuvenate"
  already-started-muting comfort noise upon receiving an invalid SID.

* If an invalid SID arrives in good speech state, meaning that we are supposed
  to begin a CN insertion period but we didn't get usable parameters for it, we
  obtain LARc and mean(Xmax) parameters from the last good speech frame,
  following the second option permitted by the "NOTE" at the end of GSM 06.31
  section 6.1.2. To get Xmaxc for CN, we dequantize all 4 Xmaxc parameters of
  the last good speech frame, average them, then requantize.

* If an invalid SID arrives in speech muting state, the invalid SID is ignored
  and speech muting continues unaffected.

* If an invalid SID arrives in NO_DATA state (initial state out of reset, or
  the state after either speech or CN muting has fully decayed), we emit the
  fixed silence frame of TS 46.011 Table 1.
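
The frame classification and the per-state handling listed above can be
summarized in a Python sketch. All names are hypothetical and the state names
are plain strings chosen for illustration. The quantizer follows the
exponent/mantissa scheme of the GSM 06.10 encoder; the dequantizer (upper edge
of each quantization interval), the truncating mean, and the clearing of the
lost SID flag when CN insertion starts from speech state are assumptions that
may differ from what libgsmfr2 does.

```python
def classify(bfi, sid):
    """Classify a received frame from the BFI flag and SID code (0, 1, 2)."""
    if sid == 0:
        return "unusable" if bfi else "good_speech"
    if sid == 2 and not bfi:
        return "valid_sid"
    return "invalid_sid"  # BFI=0 SID=1, BFI=1 SID=1, BFI=1 SID=2


def on_invalid_sid(state, lost_sid):
    """Return (next_state, lost_sid, action) for an invalid SID frame."""
    if state == "CN_INSERT":
        return state, False, "continue CN with current parameters"
    if state == "CN_MUTING":
        return state, lost_sid, "continue CN muting"
    if state == "SPEECH":
        # Assumption: the flag starts out cleared in the new CN period.
        return "CN_INSERT", False, "start CN from last good speech frame"
    if state == "SPEECH_MUTING":
        return state, lost_sid, "continue speech muting"
    return "NO_DATA", lost_sid, "emit silence frame"


def xmaxc_dequantize(xmaxc):
    """ASSUMPTION: map Xmaxc to the upper edge of its quantization step."""
    exp = 0 if xmaxc < 16 else (xmaxc >> 3) - 1
    mant = xmaxc - (exp << 3)
    return ((mant + 1) << (exp + 5)) - 1


def xmax_quantize(xmax):
    """Xmax -> Xmaxc using the GSM 06.10 exponent/mantissa scheme."""
    exp, temp = 0, xmax >> 9
    while temp > 0 and exp < 6:
        temp >>= 1
        exp += 1
    return (xmax >> (exp + 5)) + (exp << 3)


def cn_xmaxc_from_speech(xmaxc4):
    """Dequantize 4 Xmaxc values, average (truncating), requantize."""
    mean = sum(xmaxc_dequantize(c) for c in xmaxc4) // 4
    return xmax_quantize(mean)
```

With these assumptions, speech-frame Xmaxc values 0, 16, 0, 16 dequantize to
31, 575, 31, 575, whose truncated mean 303 requantizes to a CN Xmaxc of 9.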