comparison doc/AMR-EFR-philosophy @ 311:83408f67a96c
doc/AMR-EFR-philosophy: new article
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Wed, 17 Apr 2024 20:53:10 +0000 |
parents | doc/AMR-EFR-conversion@8eb0e7a39409 |
children | 9bcf65088006 |
Relation between GSM-EFR and 12k2 mode of AMR
=============================================

What are the differences between the GSM-EFR codec and the highest 12k2 mode of
AMR, or MR122 for short? The most obvious difference is in DTX: the format of
SID frames and even the very paradigm of how DTX works are completely different
between EFR and AMR. But what about non-DTX operation? If a codec session
consists solely of good speech frames, no SIDs and no BFI frame gaps, are EFR
and MR122 strictly identical?

The correct answer is that in the absence of SIDs, EFR and MR122 are directly
interoperable in that the output of an EFR encoder can be fed to the input of
an AMR decoder, and vice-versa. However, the two codecs are NOT identical at
the bit-exact level! The differences are subtle, such that finding them
requires some intense study; the following article documents some of the
findings of that study:

https://www.freecalypso.org/hg/efr-experiments/file/tip/Theory-and-mystery
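
To make "interoperable, but not bit-exact" concrete, here is a minimal sketch
of how one could compare the PCM output of two decoders that were fed the same
frame stream. It is an illustration only and is not part of gsm-codec-lib: the
function name compare_pcm and the assumption that both outputs are raw 16-bit
native-endian PCM buffers are hypothetical choices made for this example.

```python
from array import array


def compare_pcm(ref: bytes, test: bytes):
    """Compare two buffers of 16-bit native-endian PCM samples.

    Hypothetical helper for illustration.  Returns a tuple:
    (number of differing samples,
     index of the first differing sample or None,
     largest absolute difference between corresponding samples).
    """
    a = array("h")
    a.frombytes(ref)
    b = array("h")
    b.frombytes(test)
    if len(a) != len(b):
        raise ValueError("buffers hold different numbers of samples")

    ndiff = 0
    first = None
    max_abs = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            ndiff += 1
            if first is None:
                first = i
            max_abs = max(max_abs, abs(x - y))
    return ndiff, first, max_abs


if __name__ == "__main__":
    # Made-up sample values, not real decoder output.
    ref = array("h", [0, 100, -100, 32767]).tobytes()
    out = array("h", [0, 101, -100, 32767]).tobytes()
    print(compare_pcm(ref, out))  # (1, 1, 1)
```

A result of (0, None, 0) means the two outputs are bit-exact; any other result
means they are not identical at the bit-exact level, even if they sound the
same.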

What other DSP/transcoder vendors have done
===========================================

ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
form, and every production implementation was required to match the output of
the official reference bit for bit. However, once AMR came out, the regulation
on EFR was loosened. The GSM 06.54 document from 2000-08 (ETSI TS 100 725
V5.2.0) has an appendix-like chapter (chapter 10) whose first paragraph reads:

    The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described
    in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate
    speech coder. An alternative implementation of the Enhanced Full Rate
    speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate
    coder is allowed. Alternative implementations shall implement the
    functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the
    exception that the DTX transmission format (GSM 06.81) and the comfort
    noise generation (GSM 06.62) shall be used.

It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
both) weren't too happy with the prospect of having to include two different
versions of _almost_ the same codec algorithm with a bunch of interspersed
subtle diffs, and so the rules were bent: EFR implementors were given permission
to deviate from the original bit-exact definition of EFR in order to have more
commonality with MR122.

Approach adopted for Themyscira GSM codec libraries suite
=========================================================

I (Mother Mychaela) previously entertained the idea of creating a unified codec
library that supports both AMR and EFR with common code, producing a published-
source, FOSS-culture equivalent of what most proprietary vendors have done.
However, on further reflection, that idea has been rejected. The current vision
(as of 2024-04) is that libgsmefr (stable since early 2023) and libtwamr
(currently a work in progress) shall remain separate and independent libraries,
the former implementing GSM-EFR (the original bit-exact definition) and the
latter implementing AMR. My reasons for this decision are:

* Libgsmefr already exists, and it is already a bit of a jewel compared to the
  sorry state of true GSM codec support in the world of FOSS outside Themyscira.
  Giving up on this library and moving to some nebulous new one does not sound
  appealing.

* There does not exist any formal, bit-exact definition for what we informally
  call "EFR version 2": the realization of EFR as implemented by post-AMR-era
  proprietary vendors, some sort of AMR-EFR hybrid. As I see it, it is not my
  place to try to innovate in speech codec design; instead, it is my job to
  provide 100% correct, bit-exact implementations of existing solid standards -
  and there is no bit-exact standard to follow for "EFR version 2".

* Libtwamr project: the task of turning the original AMR code from 3GPP into a
  proper library, style-consistent with Themyscira libgsmfr2 and libgsmefr,
  without the ugliness of opencore-amr, is already a lot of work as it is.
  There is no need to make it harder by adding the task of supporting AMR-based
  EFR, especially when the latter lacks a formal definition.

Performance issues
==================

Right now the only significant downside of libgsmefr compared to
libopencore-amrnb is that our library is significantly slower: almost 7 times
slower on non-DTX encode and a little over 3 times slower on SID-free decode.
However, solving this performance problem will require profiling the code to
find the slowest spots, comparing the code of individual blocks between ours
and theirs, and porting over whatever performance-optimizing strategies were
implemented in the OpenCORE code base. The latter code base is a derivative
work based on the 3GPP AMR source, hence the guts of the codec are largely the
same between 3GPP AMR and libopencore-amrnb; the latter has been significantly
performance-optimized, but also heavily uglified. But there is no reason why
the same performance fixes can't be applied to the EFR code base - it will
simply take work. This work is currently part of our future roadmap.