diff doc/AMR-EFR-philosophy @ 311:83408f67a96c
doc/AMR-EFR-philosophy: new article
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Wed, 17 Apr 2024 20:53:10 +0000 |
parents | doc/AMR-EFR-conversion@8eb0e7a39409 |
children | 9bcf65088006 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/AMR-EFR-philosophy Wed Apr 17 20:53:10 2024 +0000
@@ -0,0 +1,89 @@
+Relation between GSM-EFR and 12k2 mode of AMR
+=============================================
+
+What are the differences between GSM-EFR codec and the highest 12k2 mode of AMR,
+or MR122 for short? The most obvious difference is in DTX: the format of SID
+frames and even the very paradigm of how DTX works are completely different
+between EFR and AMR. But what about non-DTX operation? If a codec session
+consists solely of good speech frames, no SIDs and no BFI frame gaps, are EFR
+and MR122 strictly identical?
+
+The correct answer is that in the absence of SIDs, EFR and MR122 are directly
+interoperable in that the output of an EFR encoder can be fed to the input of
+an AMR decoder, and vice-versa. However, the two codecs are NOT identical at
+the bit-exact level! The differences are subtle, such that finding them
+requires some intense study; this article documents some of these study
+findings:
+
+https://www.freecalypso.org/hg/efr-experiments/file/tip/Theory-and-mystery
+
+What other DSP/transcoder vendors have done
+===========================================
+
+ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
+form, and every production implementation was required to match the output of
+the official reference bit for bit. However, once AMR came out, the regulation
+on EFR was loosened. GSM 06.54 document from 2000-08 (ETSI TS 100 725 V5.2.0)
+has an appendix-like chapter (chapter 10) whose first paragraph reads:
+
+    The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described
+    in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate
+    speech coder. An alternative implementation of the Enhanced Full Rate
+    speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate
+    coder is allowed. Alternative implementations shall implement the
+    functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the
+    exception that the DTX transmission format (GSM 06.81) and the comfort
+    noise generation (GSM 06.62) shall be used.
+
+It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
+both) weren't too happy with the prospect of having to include two different
+versions of _almost_ the same codec algorithm with a bunch of interspersed
+subtle diffs, and so the rules were bent: EFR implementors were given permission
+to deviate from the original bit-exact definition of EFR in order to have more
+commonality with MR122.
+
+Approach adopted for Themyscira GSM codec libraries suite
+=========================================================
+
+I (Mother Mychaela) previously entertained the idea of creating a unified codec
+library that supports both AMR and EFR with common code, producing a published-
+source, FOSS-culture equivalent of what most proprietary vendors have done.
+However, on further reflection, that idea has been rejected. The current vision
+(as of 2024-04) is that libgsmefr (stable since early 2023) and libtwamr
+(currently a work in progress) shall remain separate and independent libraries,
+the former implementing GSM-EFR (the original bit-exact definition) and the
+latter implementing AMR. My reasons for this decision are:
+
+* Libgsmefr already exists, and it is already a bit of a jewel compared to the
+  sorry state of true GSM codec support in the world of FOSS outside Themyscira.
+  Giving up on this library and moving to some nebulous new one does not sound
+  appealing.
+
+* There does not exist any formal, bit-exact definition for what we informally
+  call "EFR version 2": the realization of EFR as implemented by post-AMR-era
+  proprietary vendors, some sort of AMR-EFR hybrid. As I see it, it is not my
+  place to try to innovate in speech codec design, instead it is my job to
+  provide 100% correct, bit-exact implementations of existing solid standards -
+  and there is no bit-exact standard to follow for "EFR version 2".
+
+* Libtwamr project: the task of turning the original AMR code from 3GPP into a
+  proper library, style-consistent with Themyscira libgsmfr2 and libgsmefr,
+  without the ugliness of opencore-amr, is already a lot of work as it is.
+  There is no need to make it harder by adding the task of supporting AMR-based
+  EFR, especially when the latter lacks formal definition.
+
+Performance issues
+==================
+
+Right now the only significant downside of libgsmefr compared to
+libopencore-amrnb is that our library is significantly slower: almost 7 times
+slower on non-DTX encode and a little over 3 times slower on SID-free decode.
+However, this performance problem will need to be solved by profiling the code
+to find the slowest spots, comparing the code of individual blocks between ours
+and theirs, and porting over whatever performance-optimizing strategies were
+implemented in OpenCORE code base. The latter code base is a derivative work
+based on 3GPP AMR source, hence the guts of the codec are largely the same
+between 3GPP AMR and libopencore-amrnb; the latter has been significantly
+performance-optimized, but also heavily uglified. But there is no reason why
+the same performance fixes can't be applied to EFR code base - it will simply
+take work. This work is currently part of our future roadmap.