view doc/AMR-EFR-philosophy @ 408:8847c1740e78
libtwamr: integrate VAD1
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Tue, 07 May 2024 00:56:10 +0000 |
Relation between GSM-EFR and 12k2 mode of AMR
=============================================

What are the differences between GSM-EFR codec and the highest 12k2 mode of
AMR, or MR122 for short? The most obvious difference is in DTX: the format of
SID frames and even the very paradigm of how DTX works are completely different
between EFR and AMR. But what about non-DTX operation? If a codec session
consists solely of good speech frames, no SIDs and no BFI frame gaps, are EFR
and MR122 strictly identical?

The correct answer is that in the absence of SIDs, EFR and MR122 are directly
interoperable in that the output of an EFR encoder can be fed to the input of
an AMR decoder, and vice-versa. However, the two codecs are NOT identical at
the bit-exact level! The differences are subtle, such that finding them
requires some intense study; this article documents some of these study
findings:

https://www.freecalypso.org/hg/efr-experiments/file/tip/Theory-and-mystery

What other DSP/transcoder vendors have done
===========================================

ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
form, and every production implementation was required to match the output of
the official reference bit for bit. However, once AMR came out, the regulation
on EFR was loosened. GSM 06.54 document from 2000-08 (ETSI TS 100 725 V5.2.0)
has an appendix-like chapter (chapter 10) whose first paragraph reads:

    The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described in
    TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate speech
    coder. An alternative implementation of the Enhanced Full Rate speech
    service based on the 12.2 kbit/s mode of the Adaptive Multi Rate coder is
    allowed. Alternative implementations shall implement the functionality
    specified in TS 26.071 for the 12.2 kbit/s mode, with the exception that
    the DTX transmission format (GSM 06.81) and the comfort noise generation
    (GSM 06.62) shall be used.

It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
both) weren't too happy with the prospect of having to include two different
versions of _almost_ the same codec algorithm with a bunch of interspersed
subtle diffs, and so the rules were bent: EFR implementors were given
permission to deviate from the original bit-exact definition of EFR in order to
have more commonality with MR122.

Approach adopted for Themyscira GSM codec libraries suite
=========================================================

I (Mother Mychaela) previously entertained the idea of creating a unified codec
library that supports both AMR and EFR with common code, producing a
published-source, FOSS-culture equivalent of what most proprietary vendors have
done. However, on further reflection, that idea has been rejected. The current
vision (as of 2024-04) is that libgsmefr (stable since early 2023) and libtwamr
(currently a work in progress) shall remain separate and independent libraries,
the former implementing GSM-EFR (the original bit-exact definition) and the
latter implementing AMR. My reasons for this decision are:

* Libgsmefr already exists, and it is already a bit of a jewel compared to the
  sorry state of true GSM codec support in the world of FOSS outside
  Themyscira. Giving up on this library and moving to some nebulous new one
  does not sound appealing.

* There does not exist any formal, bit-exact definition for what we informally
  call "EFR version 2": the realization of EFR as implemented by post-AMR-era
  proprietary vendors, some sort of AMR-EFR hybrid. As I see it, it is not my
  place to try to innovate in speech codec design; instead, it is my job to
  provide 100% correct, bit-exact implementations of existing solid standards -
  and there is no bit-exact standard to follow for "EFR version 2".

* Libtwamr project: the task of turning the original AMR code from 3GPP into a
  proper library, style-consistent with Themyscira libgsmfr2 and libgsmefr,
  without the ugliness of opencore-amr, is already a lot of work as it is.
  There is no need to make it harder by adding the task of supporting AMR-based
  EFR, especially when the latter lacks formal definition.

Performance issues
==================

Right now the only significant downside of libgsmefr compared to
libopencore-amrnb is that our library is significantly slower: almost 7 times
slower on non-DTX encode and a little over 3 times slower on SID-free decode.
However, this performance problem will need to be solved by profiling the code
to find the slowest spots, comparing the code of individual blocks between ours
and theirs, and porting over whatever performance-optimizing strategies were
implemented in OpenCORE code base. The latter code base is a derivative work
based on 3GPP AMR source, hence the guts of the codec are largely the same
between 3GPP AMR and libopencore-amrnb; the latter has been significantly
performance-optimized, but also heavily uglified. But there is no reason why
the same performance fixes can't be applied to EFR code base - it will simply
take work. This work is currently part of our future roadmap.
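
As a rough illustration of how slowdown figures like the ones above can be
measured, here is a minimal timing sketch in Python. It is not part of any
Themyscira library: the two "encoder" functions are hypothetical stand-ins that
merely burn CPU time, and in real use they would be replaced by calls into the
two codec implementations being compared. The only codec facts assumed are that
a speech frame covers 20 ms, i.e. 160 samples at 8 kHz.

```python
import time

FRAME_SAMPLES = 160     # one 20 ms speech frame at 8 kHz sampling
FRAME_SECONDS = 0.020


def time_per_frame(process_frame, frames, repeat=3):
    """Return the best-of-`repeat` average wall time (seconds) per frame."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for frame in frames:
            process_frame(frame)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(frames))
    return best


def slowdown(ours, theirs):
    """How many times slower `ours` is than `theirs` (per-frame times)."""
    return ours / theirs


def realtime_factor(seconds_per_frame):
    """Fraction of real time used per frame; below 1.0 keeps up with live audio."""
    return seconds_per_frame / FRAME_SECONDS


# Hypothetical stand-ins for two implementations of the same codec.
def stub_slow_encoder(frame):
    for _ in range(7):                  # several passes over the frame
        sum(s * s for s in frame)


def stub_fast_encoder(frame):
    sum(s * s for s in frame)           # a single pass over the frame


if __name__ == "__main__":
    frames = [[i % 256 for i in range(FRAME_SAMPLES)] for _ in range(500)]
    t_slow = time_per_frame(stub_slow_encoder, frames)
    t_fast = time_per_frame(stub_fast_encoder, frames)
    print("slowdown: %.1fx" % slowdown(t_slow, t_fast))
    print("real-time factor (slow stub): %.4f" % realtime_factor(t_slow))
```

A whole-codec ratio like this only says that a gap exists; finding the slowest
spots inside the codec still calls for a real profiler run against the C code.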