comparison Theory-and-mystery @ 7:1fd613cec7ab

Theory-and-mystery: document written
author Mychaela Falconia <falcon@freecalypso.org>
date Wed, 17 Apr 2024 17:14:41 +0000
6:6119d2c1e7d9 7:1fd613cec7ab
Relation between GSM-EFR and 12k2 mode of AMR
=============================================

What are the differences between GSM-EFR codec and the highest 12k2 mode of AMR,
or MR122 for short? The most obvious difference is in DTX: the format of SID
frames and even the very paradigm of how DTX works are completely different
between EFR and AMR. But what about non-DTX operation? If a codec session
consists solely of good speech frames, no SIDs and no BFI frame gaps, are EFR
and MR122 strictly identical?

The correct answer is that in the absence of SIDs, EFR and MR122 are directly
interoperable in that the output of an EFR encoder can be fed to the input of
an AMR decoder, and vice-versa. However, the two codecs are NOT identical at
the bit-exact level! The differences are subtle, such that finding them
requires some intense study; here I cover those diffs which I was able to find.

DHF difference and the reason why it occurs
===========================================

In their official form (non-telco-grade corner-cutting libraries don't count,
no matter how popular among FOSS), both EFR and AMR include codec homing as a
mandatory feature, and the mechanism works on the same principle across all
ETSI/3GPP codecs. The encoder homing frame (EHF) is the same for all codecs:
all 160 samples equal to 0x0008, but each codec has its own decoder homing frame
(DHF). Each codec's respective DHF is the natural output of its encoder when
the input is EHF and the initial state is the reset state - as simple as that.
Note the natural aspect: every spec-defined DHF came about naturally in that
codec, hence the exact set of codec parameters that constitutes a DHF is not a
detail which some standard-setting committee could define arbitrarily.
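
The EHF definition above is simple enough to state in code. What follows is
only a minimal sketch: the function and macro names are hypothetical (not taken
from the reference sources), and it assumes the input block is held as 16-bit
words in which the EHF value appears literally as 0x0008.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SAMPLES 160     /* samples in one input frame */
#define EHF_SAMPLE    0x0008  /* value of every sample in an EHF */

/* Return 1 if all 160 samples equal 0x0008, 0 otherwise.
   Hypothetical helper, for illustration only. */
int is_encoder_homing_frame(const int16_t *pcm)
{
    size_t i;

    for (i = 0; i < FRAME_SAMPLES; i++) {
        if (pcm[i] != EHF_SAMPLE)
            return 0;
    }
    return 1;
}
```

A caller would run this check on each 160-sample input block; what the encoder
does on a match is defined by the respective codec spec and is not modeled here.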

AMR has 8 different DHFs for its 8 different modes, and the DHF for MR122 is
*not* the same as EFR DHF! Given that this DHF is nothing but the encoder's
natural response to encoding an EHF input, this difference in DHF between EFR
and MR122 indicates the existence of some difference between the two encoders.
A simple experiment, contained in this source tree, reveals what the key
difference is: see src/cod_12k2.c, #ifdef EFR2_VARIANT. When this source is
compiled with -DEFR2_VARIANT in efr2 directory, the resulting encoder produces
DHF (natural response to EHF received in the reset state) that is identical to
the one defined for MR122, proving that this specific change is the reason for
the diff in DHF parameters between EFR and MR122.

The encoder diff that happens here (change from EFR to MR122) is an artificial
delay of 5 ms, i.e., 40 samples at the 8 kHz sampling rate. In EFR, on each
invocation of the encoder, a new frame of 160 speech samples is fed in, and that
same frame is subject to encoding. In AMR, the input is still 160 samples each
time, but the frame being encoded consists of 40 samples from the tail of the
previous input and 120 samples from the new input. The newest 40 samples are
used for auto-correlation computation in the lower modes of AMR (see 3GPP TS
26.090 section 5.2), but in MR122 they do absolutely nothing until the next
invocation of the encoder, effecting an artificial delay of 5 ms. In true
multirate operation this delay is needed to support seamless mode switching,
but in an MR122-only environment it is just waste.
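
The framing difference described above can be modeled in a few lines of C.
This is a conceptual sketch only: all names are hypothetical, the reference
sources need not be organized this way, and the sketch assumes that the
carried-over tail is all zeros in the reset state.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_LEN 160  /* samples fed in per encoder invocation */
#define DELAY_LEN 40   /* 40 samples at 8 kHz = 5 ms */

/* Hypothetical state: the last 40 samples of the previous input block. */
struct mr122_framer {
    int16_t tail[DELAY_LEN];
};

/* Assumption: the reset state holds an all-zero tail. */
void mr122_framer_reset(struct mr122_framer *st)
{
    memset(st->tail, 0, sizeof st->tail);
}

/* EFR-style framing: the frame to be encoded is exactly the new input. */
void efr_select_frame(const int16_t *new_input, int16_t *frame)
{
    memcpy(frame, new_input, FRAME_LEN * sizeof(int16_t));
}

/* MR122-style framing: 40 samples from the previous input followed by
   the first 120 new samples; the newest 40 are held for the next call. */
void mr122_select_frame(struct mr122_framer *st, const int16_t *new_input,
                        int16_t *frame)
{
    memcpy(frame, st->tail, DELAY_LEN * sizeof(int16_t));
    memcpy(frame + DELAY_LEN, new_input,
           (FRAME_LEN - DELAY_LEN) * sizeof(int16_t));
    memcpy(st->tail, new_input + (FRAME_LEN - DELAY_LEN),
           DELAY_LEN * sizeof(int16_t));
}
```

Under this model, the first frame selected after reset is not the same set of
samples in the two variants even when the input is identical, which is
consistent with the two encoders producing different DHFs.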

Other encoder differences
=========================

The 5 ms delay covered above is not the only diff between non-DTX EFR and MR122
encoders. We know that other diffs must exist because the output of the test
encoder built in efr2 directory of this repository does not match that of the
official AMR encoder beyond the initial homing frames; however, those additional
differences have not been studied yet.

Decoder diffs between EFR and MR122
===================================

The two decoders are also different at the bit-exact level: if you take a "pure"
stream of 12k2 speech frames (no DHF, no SIDs and no BFI frame gaps or defects)
and feed it to EFR and AMR decoders, both starting from external reset state,
the resulting outputs will be different.

Two specific differences in the decoder have been identified:

* The AGC module is different: see agc.c vs agc_amr.c in src directory. The
  diffs inside AGC have not been studied yet.

* The post-processing step described in 3GPP TS 26.090 section 6.2.2 (high-pass
  filtering) is new with AMR.

The code version built in efr2 directory has these two changes applied; it
passes on all available test sequences (amr122_efr.zip described below), but
there may be other diffs that aren't caught by this test sequence set and which
we therefore have not identified yet.
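
Passing a test sequence here means that the decoder output matches the expected
output sample for sample. A minimal sketch of such a bit-exact comparison
follows; the function name is hypothetical, and the sketch assumes that both
outputs have already been loaded into memory as equal-length arrays of 16-bit
samples (file reading is omitted).

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first differing sample, or -1 if the two
   buffers are identical over n samples.  Hypothetical helper. */
long first_mismatch(const int16_t *got, const int16_t *expected, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (got[i] != expected[i])
            return (long) i;
    }
    return -1;
}
```

Dividing a non-negative return value by 160 gives the number of the first
frame in which the two outputs diverge.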

ETSI/3GPP laxness toward EFR implementors
=========================================

ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
form, and every production implementation was required to match the output of
the official reference bit for bit. However, once AMR came out, the regulation
on EFR was loosened. The GSM 06.54 document from 2000-08 (ETSI TS 100 725
V5.2.0) has an appendix-like chapter (chapter 10) whose first paragraph reads:

        The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described
        in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate
        speech coder. An alternative implementation of the Enhanced Full Rate
        speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate
        coder is allowed. Alternative implementations shall implement the
        functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the
        exception that the DTX transmission format (GSM 06.81) and the comfort
        noise generation (GSM 06.62) shall be used.

It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
both) weren't too happy with the prospect of having to include two different
versions of _almost_ the same codec algorithm with a bunch of interspersed
subtle diffs, and so the rules were bent: EFR implementors were given permission
to deviate from the original bit-exact definition of EFR in order to have more
commonality with MR122.

But the devil is in the details. If I am seeking to implement this "EFR
alternative 2", where is the new bit-exact reference to be followed for this
option? No such reference C code for this AMR-EFR hybrid appears to have been
published anywhere, but this code must have existed once in unpublished form,
as we do have surviving published _output_ from that mystery code.

The digital companion to just-quoted GSM 06.54 is a ZIP archive named
ts_100725v050200p0.zip; inside this ZIP archive there are 9 inner ZIPs: 8 ZIPs
for the 8 original EFR test sequence disks, plus a later addendum named
amr122_efr.zip. The latter ZIP contains *.cod and *.dec test sequence files in
EFR format (*not* AMR), as well as *.out files from the intended decoding of
*.dec. The transformation from *.cod to *.dec in this set is unchanged EFR
ed_iface, but the encoder run that produced *.cod and the decoder run that
produced *.out were quite special:

* t??_efr.cod contain the same codec parameters as the AMR counterpart in 06.74
  test sequence set except for the first two frames in each sequence, which are
  proper EFR DHFs. It appears that they ran an essentially-unmodified AMR
  encoder in MR122 with DTX disabled, then artificially patched the DHF after
  MR122 encoder output, then packaged the output in EFR *.cod format - but it
  must have been more complicated, as this simplistic approach would not support
  DTX.

* dtx?_efr.cod and dtx?_efr2.cod are more intriguing: they are said to
  correspond to VAD1 and VAD2 in the AMR reference source, yet these sequences
  have EFR SID frames in their silence parts, not AMR DTX. Thus someone must
  have constructed an encoder that combines most of AMR code (including AMR VAD
  and the AMR version of 12k2 speech encoding) with EFR Tx DTX logic and EFR SID
  generation - quite a feat!

* In the decoder direction, the hack presented in efr2 directory of this code
  repository is sufficient to produce a matching *.out for every *.dec in the
  amr122_efr.zip mystery collection, including dtx?_efr.dec and dtx?_efr2.dec.
  However, we made our hack by starting with EFR reference source and making
  small surgical changes to it; I wonder if whoever did the original feat at
  ETSI/3GPP started with AMR source instead and outfitted it with the ability
  to understand EFR SID frames and do comfort noise generation per GSM 06.62 -
  that approach would be a big feat, just like with the encoder.

The present author considers it a shame that whatever AMR-EFR hybrid programs
were used to generate the sequences in amr122_efr.zip were never published. In
the absence of such published code, the details of exactly what was done by
those commercial DSP/transcoder vendors who combined AMR with EFR will remain
elusive.