comparison doc/AMR-EFR-philosophy @ 311:83408f67a96c

doc/AMR-EFR-philosophy: new article
author Mychaela Falconia <falcon@freecalypso.org>
date Wed, 17 Apr 2024 20:53:10 +0000

Relation between GSM-EFR and 12k2 mode of AMR
=============================================

What are the differences between the GSM-EFR codec and the highest (12k2) mode
of AMR, or MR122 for short? The most obvious difference is in DTX: the format
of SID frames and even the very paradigm of how DTX works are completely
different between EFR and AMR. But what about non-DTX operation? If a codec
session consists solely of good speech frames, with no SIDs and no BFI frame
gaps, are EFR and MR122 strictly identical?

The correct answer is that in the absence of SIDs, EFR and MR122 are directly
interoperable in that the output of an EFR encoder can be fed to the input of
an AMR decoder, and vice versa. However, the two codecs are NOT identical at
the bit-exact level! The differences are subtle, such that finding them
requires some intense study; the following article documents some of the
findings from that study:

https://www.freecalypso.org/hg/efr-experiments/file/tip/Theory-and-mystery
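
As a rough illustration of what "identical at the bit-exact level" means in
practice, one could run the same input through both encoders, bring both outputs
into a common frame representation, and compare them bit for bit. The following
is only a minimal sketch under stated assumptions: the 31-byte frame size and
the function names frame_bit_diff() and first_mismatch() are hypothetical and
are not taken from libgsmefr, libtwamr or any other library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: each frame has already been converted to a common
 * 31-byte representation, regardless of which encoder produced it. */
#define FRAME_BYTES 31

/* Count the bits that differ between two byte buffers of equal length. */
unsigned frame_bit_diff(const uint8_t *a, const uint8_t *b, size_t len)
{
    unsigned diff = 0;
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < len; i++) {
        uint8_t x = a[i] ^ b[i];    /* set bits mark the differences */
        while (x) {
            x &= x - 1;             /* clear the lowest set bit */
            diff++;
        }
    }
    return diff;
}

/* Return the index of the first frame that differs between two streams
 * of nframes frames each, or -1 if the streams are bit-identical. */
long first_mismatch(const uint8_t *s1, const uint8_t *s2, size_t nframes)
{
    size_t n;

    for (n = 0; n < nframes; n++) {
        if (frame_bit_diff(s1 + n * FRAME_BYTES, s2 + n * FRAME_BYTES,
                           FRAME_BYTES) != 0)
            return (long) n;
    }
    return -1;
}
```

A result of -1 from first_mismatch() would mean the two streams match bit for
bit over the tested input only; it says nothing about inputs that were not
tested.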

What other DSP/transcoder vendors have done
===========================================

ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
form, and every production implementation was required to match the output of
the official reference bit for bit. However, once AMR came out, the regulation
on EFR was loosened. The GSM 06.54 document from 2000-08 (ETSI TS 100 725
V5.2.0) has an appendix-like chapter (chapter 10) whose first paragraph reads:

    The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described
    in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate
    speech coder. An alternative implementation of the Enhanced Full Rate
    speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate
    coder is allowed. Alternative implementations shall implement the
    functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the
    exception that the DTX transmission format (GSM 06.81) and the comfort
    noise generation (GSM 06.62) shall be used.

It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
both) weren't too happy with the prospect of having to include two different
versions of _almost_ the same codec algorithm with a bunch of interspersed
subtle diffs, and so the rules were bent: EFR implementors were given permission
to deviate from the original bit-exact definition of EFR in order to have more
commonality with MR122.

Approach adopted for Themyscira GSM codec libraries suite
=========================================================

I (Mother Mychaela) previously entertained the idea of creating a unified codec
library that supports both AMR and EFR with common code, producing a
published-source, FOSS-culture equivalent of what most proprietary vendors have
done. However, on further reflection, that idea has been rejected. The current
vision (as of 2024-04) is that libgsmefr (stable since early 2023) and libtwamr
(currently a work in progress) shall remain separate and independent libraries,
the former implementing GSM-EFR (the original bit-exact definition) and the
latter implementing AMR. My reasons for this decision are:

* Libgsmefr already exists, and it is already a bit of a jewel compared to the
  sorry state of true GSM codec support in the world of FOSS outside Themyscira.
  Giving up on this library and moving to some nebulous new one does not sound
  appealing.

* There does not exist any formal, bit-exact definition for what we informally
  call "EFR version 2": the realization of EFR as implemented by post-AMR-era
  proprietary vendors, some sort of AMR-EFR hybrid. As I see it, it is not my
  place to try to innovate in speech codec design; instead it is my job to
  provide 100% correct, bit-exact implementations of existing solid standards -
  and there is no bit-exact standard to follow for "EFR version 2".

* Libtwamr project: the task of turning the original AMR code from 3GPP into a
  proper library, style-consistent with Themyscira libgsmfr2 and libgsmefr,
  without the ugliness of opencore-amr, is already a lot of work as it is.
  There is no need to make it harder by adding the task of supporting AMR-based
  EFR, especially when the latter lacks a formal definition.

Performance issues
==================

Right now the only significant downside of libgsmefr compared to
libopencore-amrnb is that our library is significantly slower: almost 7 times
slower on non-DTX encode and a little over 3 times slower on SID-free decode.
This performance problem will need to be solved by profiling the code to find
the slowest spots, comparing the code of individual blocks between ours and
theirs, and porting over whatever performance-optimizing strategies were
implemented in the OpenCORE code base. The latter code base is a derivative
work based on the 3GPP AMR source, hence the guts of the codec are largely the
same between 3GPP AMR and libopencore-amrnb; the latter has been significantly
performance-optimized, but also heavily uglified. But there is no reason why
the same performance fixes can't be applied to the EFR code base - it will
simply take work. This work is currently part of our future roadmap.
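
For readers who want to reproduce this kind of "N times slower" comparison, the
following is a minimal timing sketch, not the method actually used to obtain
the figures above. Everything in it is an assumption made for illustration: the
frame_fn callback type, time_frames(), slowdown_ratio() and the count_call()
stand-in workload are hypothetical names. In a real measurement each callback
would wrap one per-frame encode or decode call of the library under test.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: one call processes one 20 ms frame. */
typedef void (*frame_fn)(void *ctx);

/* Return the CPU seconds consumed by calling fn(ctx) nframes times. */
double time_frames(frame_fn fn, void *ctx, unsigned long nframes)
{
    clock_t start;
    unsigned long i;

    assert(fn != NULL);
    start = clock();
    for (i = 0; i < nframes; i++)
        fn(ctx);
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}

/* Ratio of our time to the baseline time; 0.0 if the baseline is unusable. */
double slowdown_ratio(double ours, double baseline)
{
    return baseline > 0.0 ? ours / baseline : 0.0;
}

/* Stand-in workload for illustration only: counts how often it is called. */
void count_call(void *ctx)
{
    (*(unsigned long *) ctx)++;
}
```

Timing both libraries over the same long input with time_frames() and passing
the two results to slowdown_ratio() gives a single figure of merit, but it does
not show where the time goes; finding the slowest spots still requires a
profiler.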