comparison Theory-and-mystery @ 7:1fd613cec7ab
Theory-and-mystery: document written
author: Mychaela Falconia <falcon@freecalypso.org>
date: Wed, 17 Apr 2024 17:14:41 +0000

Relation between GSM-EFR and 12k2 mode of AMR
=============================================

What are the differences between the GSM-EFR codec and the highest 12k2 mode of
AMR, or MR122 for short?  The most obvious difference is in DTX: the format of
SID frames and even the very paradigm of how DTX works are completely different
between EFR and AMR.  But what about non-DTX operation?  If a codec session
consists solely of good speech frames, no SIDs and no BFI frame gaps, are EFR
and MR122 strictly identical?

The correct answer is that in the absence of SIDs, EFR and MR122 are directly
interoperable in that the output of an EFR encoder can be fed to the input of
an AMR decoder, and vice versa.  However, the two codecs are NOT identical at
the bit-exact level!  The differences are subtle, such that finding them
requires some intense study; here I cover those diffs which I was able to find.

DHF difference and the reason why it occurs
===========================================

In their official form (non-telco-grade corner-cutting libraries don't count,
no matter how popular among FOSS), both EFR and AMR include codec homing as a
mandatory feature, and the mechanism works on the same principle across all
ETSI/3GPP codecs.  The encoder homing frame (EHF) is the same for all codecs:
all 160 samples equal to 0x0008, but each codec has its own decoder homing frame
(DHF).  Each codec's respective DHF is the natural output of its encoder when
the input is EHF and the initial state is the reset state - as simple as that.
Note the natural aspect: every spec-defined DHF came about naturally in that
codec, hence the exact set of codec parameters that constitutes a DHF is not a
detail which some standard-setting committee could define arbitrarily.
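
As a minimal sketch of the EHF definition given above (all 160 samples equal to
0x0008), the following C fragment builds and recognizes such a frame.  The
function names make_ehf() and is_ehf() are hypothetical and are not taken from
any reference source; the sketch also assumes that samples are handled as plain
16-bit words, exactly as they are handed to the encoder.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SAMPLES 160     /* samples per speech frame */
#define EHF_SAMPLE    0x0008  /* value of every sample in an EHF */

/* Hypothetical helper: fill a buffer with the encoder homing frame. */
void make_ehf(int16_t frame[FRAME_SAMPLES])
{
    for (size_t i = 0; i < FRAME_SAMPLES; i++)
        frame[i] = EHF_SAMPLE;
}

/* Hypothetical helper: return 1 if the frame is an EHF, 0 otherwise. */
int is_ehf(const int16_t frame[FRAME_SAMPLES])
{
    for (size_t i = 0; i < FRAME_SAMPLES; i++) {
        if (frame[i] != EHF_SAMPLE)
            return 0;  /* a single deviating sample disqualifies the frame */
    }
    return 1;
}
```

Feeding the buffer produced by make_ehf() to an encoder that is in its reset
state is the experiment that yields that codec's DHF.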

AMR has 8 different DHFs for its 8 different modes, and the DHF for MR122 is
*not* the same as the EFR DHF!  Given that this DHF is nothing but the encoder's
natural response to encoding an EHF input, this difference in DHF between EFR
and MR122 indicates the existence of some difference between the two encoders.
A simple experiment, contained in this source tree, reveals what the key
difference is: see src/cod_12k2.c, #ifdef EFR2_VARIANT.  When this source is
compiled with -DEFR2_VARIANT in the efr2 directory, the resulting encoder
produces a DHF (natural response to EHF received in the reset state) that is
identical to the one defined for MR122, proving that this specific change is
the reason for the diff in DHF parameters between EFR and MR122.

The encoder diff that happens here (change from EFR to MR122) is an artificial
delay of 5 ms.  In EFR, on each invocation of the encoder, a frame of 160 new
speech samples is fed in, and that same frame is subject to encoding.  In AMR,
the input is still 160 samples each time, but the frame being encoded consists
of 40 samples from the tail of the previous input and 120 samples from the new
input.  The newest 40 samples are used for auto-correlation computation in the
lower modes of AMR (see 3GPP TS 26.090 section 5.2), but in MR122 they do
absolutely nothing until the next invocation of the encoder, effecting an
artificial delay of 5 ms.  In true multirate operation this delay is needed to
support seamless mode switching, but in an MR122-only environment it is just
waste.
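
The framing difference just described can be modeled with a short C sketch.
This is not the code from src/cod_12k2.c: the struct and function names are
hypothetical, the sketch covers only the selection of which samples form the
frame to be encoded (it ignores pre-processing and every other encoder stage),
and it assumes that the held-over samples are zero in the reset state.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_SAMPLES     160  /* samples fed in per encoder invocation */
#define LOOKAHEAD_SAMPLES 40   /* 5 ms worth of samples held over */

/* Hypothetical state: the newest 40 samples of the previous input. */
struct amr_framing {
    int16_t held[LOOKAHEAD_SAMPLES];
};

/* Assumption: the held-over samples are all zero in the reset state. */
void amr_framing_reset(struct amr_framing *st)
{
    memset(st->held, 0, sizeof st->held);
}

/* EFR-style framing: the frame to encode is exactly the new input. */
void efr_select_frame(const int16_t input[FRAME_SAMPLES],
                      int16_t to_encode[FRAME_SAMPLES])
{
    memcpy(to_encode, input, FRAME_SAMPLES * sizeof(int16_t));
}

/* AMR-style framing: 40 samples from the tail of the previous input,
 * followed by the first 120 samples of the new input.  The last 40 new
 * samples are only stored for the next invocation. */
void amr_select_frame(struct amr_framing *st,
                      const int16_t input[FRAME_SAMPLES],
                      int16_t to_encode[FRAME_SAMPLES])
{
    const size_t fresh = FRAME_SAMPLES - LOOKAHEAD_SAMPLES;  /* 120 */

    memcpy(to_encode, st->held, sizeof st->held);
    memcpy(to_encode + LOOKAHEAD_SAMPLES, input, fresh * sizeof(int16_t));
    memcpy(st->held, input + fresh, sizeof st->held);
}
```

Under these assumptions, an EHF fed in the reset state gives the AMR-style
framing a first frame of 40 zero samples followed by 120 EHF samples, whereas
the EFR-style framing sees 160 EHF samples, which is consistent with the two
encoders producing different DHFs.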

Other encoder differences
=========================

The 5 ms delay covered above is not the only diff between non-DTX EFR and MR122
encoders.  We know that other diffs must exist because the output of the test
encoder built in the efr2 directory of this repository does not match that of
the official AMR encoder beyond the initial homing frames; however, those
additional differences have not been studied yet.

Decoder diffs between EFR and MR122
===================================

The two decoders are also different at the bit-exact level: if you take a "pure"
stream of 12k2 speech frames (no DHF, no SIDs and no BFI frame gaps or defects)
and feed it to EFR and AMR decoders, both starting from the external reset
state, the resulting outputs will be different.

Two specific differences in the decoder have been identified:

* The AGC module is different: see agc.c vs agc_amr.c in the src directory.
  The diffs inside AGC have not been studied yet.

* The post-processing step described in 3GPP TS 26.090 section 6.2.2 (high-pass
  filtering) is new with AMR.

The code version built in the efr2 directory has these two changes applied; it
passes on all available test sequences (amr122_efr.zip described below), but
there may be other diffs that aren't caught by this test sequence set and which
we therefore have not identified yet.

ETSI/3GPP laxness toward EFR implementors
=========================================

ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
form, and every production implementation was required to match the output of
the official reference bit for bit.  However, once AMR came out, the regulation
on EFR was loosened.  The GSM 06.54 document from 2000-08 (ETSI TS 100 725
V5.2.0) has an appendix-like chapter (chapter 10) whose first paragraph reads:

    The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described
    in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate
    speech coder. An alternative implementation of the Enhanced Full Rate
    speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate
    coder is allowed. Alternative implementations shall implement the
    functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the
    exception that the DTX transmission format (GSM 06.81) and the comfort
    noise generation (GSM 06.62) shall be used.

It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
both) weren't too happy with the prospect of having to include two different
versions of _almost_ the same codec algorithm with a bunch of interspersed
subtle diffs, and so the rules were bent: EFR implementors were given permission
to deviate from the original bit-exact definition of EFR in order to have more
commonality with MR122.

But the devil is in the details.  If I am seeking to implement this "EFR
alternative 2", where is the new bit-exact reference to be followed for this
option?  No such reference C code for this AMR-EFR hybrid appears to have been
published anywhere, but this code must have existed once in unpublished form,
as we do have surviving published _output_ from that mystery code.

The digital companion to the just-quoted GSM 06.54 is a ZIP archive named
ts_100725v050200p0.zip; inside this ZIP archive there are 9 inner ZIPs: 8 ZIPs
for the 8 original EFR test sequence disks, plus a later addendum named
amr122_efr.zip.  The latter ZIP contains *.cod and *.dec test sequence files in
EFR format (*not* AMR), as well as *.out files from the intended decoding of
*.dec.  The transformation from *.cod to *.dec in this set is unchanged EFR
ed_iface, but the encoder run that produced *.cod and the decoder run that
produced *.out were quite special:

* t??_efr.cod contain the same codec parameters as their AMR counterparts in
  the 06.74 test sequence set except for the first two frames in each sequence,
  which are proper EFR DHFs.  It appears that they ran an essentially-unmodified
  AMR encoder in MR122 with DTX disabled, then artificially patched the DHF
  after MR122 encoder output, then packaged the output in EFR *.cod format - but
  it must have been more complicated, as this simplistic approach would not
  support DTX.

* dtx?_efr.cod and dtx?_efr2.cod are more intriguing: they are said to
  correspond to VAD1 and VAD2 in the AMR reference source, yet these sequences
  have EFR SID frames in their silence parts, not AMR DTX.  Thus someone must
  have constructed an encoder that combines most of the AMR code (including AMR
  VAD and the AMR version of 12k2 speech encoding) with EFR Tx DTX logic and EFR
  SID generation - quite a feat!

* In the decoder direction, the hack presented in the efr2 directory of this
  code repository is sufficient to produce a matching *.out for every *.dec in
  the amr122_efr.zip mystery collection, including dtx?_efr.dec and
  dtx?_efr2.dec.  However, we made our hack by starting with the EFR reference
  source and making small surgical changes to it; I wonder if whoever did the
  original feat at ETSI/3GPP started with the AMR source instead and outfitted
  it with the ability to understand EFR SID frames and do comfort noise
  generation per GSM 06.62 - that approach would be a big feat, just like with
  the encoder.
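
The observation in the first bullet above (identical codec parameters except
for the first two frames) is the kind of result one gets from a frame-wise
comparison that skips the leading homing frames.  The following C sketch shows
such a comparison in generic form.  The function name is hypothetical, and the
sketch assumes that both streams have already been loaded into memory and
converted to a common layout of 16-bit words per frame; the actual file
layouts are not described here, so the frame size is left as a caller-supplied
parameter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: compare two parameter streams frame by frame,
 * ignoring the first skip_frames frames.  Returns the index of the first
 * differing frame, or -1 if every compared frame matches.
 */
long first_frame_mismatch(const uint16_t *a, const uint16_t *b,
                          size_t nframes, size_t words_per_frame,
                          size_t skip_frames)
{
    for (size_t f = skip_frames; f < nframes; f++) {
        const uint16_t *fa = a + f * words_per_frame;
        const uint16_t *fb = b + f * words_per_frame;

        if (memcmp(fa, fb, words_per_frame * sizeof(uint16_t)) != 0)
            return (long)f;
    }
    return -1;
}
```

Calling this with skip_frames set to 2 ignores the two leading DHF frames, while
skip_frames set to 0 would report the very first frame as a mismatch.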

The present author considers it a shame that whatever AMR-EFR hybrid programs
were used to generate the sequences in amr122_efr.zip were never published.  In
the absence of such published code, the details of exactly what was done by
those commercial DSP/transcoder vendors who combined AMR with EFR will remain
elusive.