diff doc/AMR-EFR-philosophy @ 311:83408f67a96c

doc/AMR-EFR-philosophy: new article
author Mychaela Falconia <falcon@freecalypso.org>
date Wed, 17 Apr 2024 20:53:10 +0000
parents doc/AMR-EFR-conversion@8eb0e7a39409
children 9bcf65088006
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/AMR-EFR-philosophy	Wed Apr 17 20:53:10 2024 +0000
@@ -0,0 +1,89 @@
+Relation between GSM-EFR and 12k2 mode of AMR
+=============================================
+
+What are the differences between the GSM-EFR codec and the highest 12k2 mode of
+AMR, or MR122 for short?  The most obvious difference is in DTX: the format of
+SID frames and even the very paradigm of how DTX works are completely different
+between EFR and AMR.  But what about non-DTX operation?  If a codec session
+consists solely of good speech frames, with no SIDs and no BFI frame gaps, are
+EFR and MR122 strictly identical?
+
+The correct answer is that in the absence of SIDs, EFR and MR122 are directly
+interoperable in that the output of an EFR encoder can be fed to the input of
+an AMR decoder, and vice versa.  However, the two codecs are NOT identical at
+the bit-exact level!  The differences are subtle, and finding them requires
+some intense study; the following article documents some of the findings from
+that study:
+
+https://www.freecalypso.org/hg/efr-experiments/file/tip/Theory-and-mystery
+
+What other DSP/transcoder vendors have done
+===========================================
+
+ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
+form, and every production implementation was required to match the output of
+the official reference bit for bit.  However, once AMR came out, this rule was
+relaxed for EFR: the GSM 06.54 document from 2000-08 (ETSI TS 100 725 V5.2.0)
+has an appendix-like chapter (chapter 10) whose first paragraph reads:
+
+	The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described
+	in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate
+	speech coder.  An alternative implementation of the Enhanced Full Rate
+	speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate
+	coder is allowed.  Alternative implementations shall implement the
+	functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the
+	exception that the DTX transmission format (GSM 06.81) and the comfort
+	noise generation (GSM 06.62) shall be used.
+
+It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
+both) weren't too happy with the prospect of having to include two different
+versions of _almost_ the same codec algorithm with a bunch of interspersed
+subtle diffs, and so the rules were bent: EFR implementors were given permission
+to deviate from the original bit-exact definition of EFR in order to have more
+commonality with MR122.
+
+Approach adopted for Themyscira GSM codec libraries suite
+=========================================================
+
+I (Mother Mychaela) previously entertained the idea of creating a unified codec
+library that supports both AMR and EFR with common code, producing a published-
+source, FOSS-culture equivalent of what most proprietary vendors have done.
+However, on further reflection, that idea has been rejected.  The current vision
+(as of 2024-04) is that libgsmefr (stable since early 2023) and libtwamr
+(currently a work in progress) shall remain separate and independent libraries,
+the former implementing GSM-EFR (the original bit-exact definition) and the
+latter implementing AMR.  My reasons for this decision are:
+
+* Libgsmefr already exists, and it is a bit of a jewel compared to the
+  sorry state of true GSM codec support in the world of FOSS outside Themyscira.
+  Giving up on this library and moving to some nebulous new one does not sound
+  appealing.
+
+* There does not exist any formal, bit-exact definition for what we informally
+  call "EFR version 2": the realization of EFR as implemented by post-AMR-era
+  proprietary vendors, some sort of AMR-EFR hybrid.  As I see it, it is not my
+  place to innovate in speech codec design; instead, my job is to provide 100%
+  correct, bit-exact implementations of existing solid standards - and there is
+  no bit-exact standard to follow for "EFR version 2".
+
+* Libtwamr project: the task of turning the original AMR code from 3GPP into a
+  proper library, style-consistent with Themyscira libgsmfr2 and libgsmefr,
+  without the ugliness of opencore-amr, is a lot of work as it is.  There is
+  no need to make it harder by adding the task of supporting AMR-based EFR,
+  especially when the latter lacks a formal definition.
+
+Performance issues
+==================
+
+Right now the only significant downside of libgsmefr compared to
+libopencore-amrnb is that our library is significantly slower: almost 7 times
+slower on non-DTX encode and a little over 3 times slower on SID-free decode.
+This performance problem will need to be solved by profiling the code to find
+the slowest spots, comparing the code of individual blocks between ours and
+theirs, and porting over whatever performance optimizations were implemented
+in the OpenCORE code base.  That code base is a derivative work of the 3GPP
+AMR source, hence the guts of the codec are largely the same between 3GPP AMR
+and libopencore-amrnb; the latter has been significantly performance-optimized,
+but also heavily uglified.  Given that EFR and MR122 are _almost_ the same
+codec algorithm, there is no reason why the same performance fixes can't be
+applied to the EFR code base - it will simply take work.  This work is
+currently part of our future roadmap.