view doc/AMR-EFR-philosophy @ 408:8847c1740e78

libtwamr: integrate VAD1
author Mychaela Falconia <falcon@freecalypso.org>
date Tue, 07 May 2024 00:56:10 +0000

Relation between GSM-EFR and 12k2 mode of AMR
=============================================

What are the differences between the GSM-EFR codec and the highest (12k2) mode
of AMR, or MR122 for short?  The most obvious difference is in DTX: the format
of SID frames and even the very paradigm of how DTX works are completely
different between EFR and AMR.  But what about non-DTX operation?  If a codec
session consists solely of good speech frames, with no SIDs and no BFI frame
gaps, are EFR and MR122 strictly identical?

The correct answer is that in the absence of SIDs, EFR and MR122 are directly
interoperable in that the output of an EFR encoder can be fed to the input of
an AMR decoder, and vice versa.  However, the two codecs are NOT identical at
the bit-exact level!  The differences are subtle, and finding them requires
some intense study; some of the findings from that study are documented in this
article:

https://www.freecalypso.org/hg/efr-experiments/file/tip/Theory-and-mystery
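One way to hunt for such bit-exact differences is to run the same input through
both encoders and compare the resulting frame streams record by record.  The
following is only a minimal sketch of that comparison step, not a tool from this
code base: it assumes both outputs have already been converted to the same
fixed-size record format, and the 31-byte record size (244 codec bits plus 4
extra bits) as well as the function names are assumptions made for illustration.

```python
# Assumption: one speech frame per fixed-size record, same bit order in both
# streams.  31 bytes = 244 codec bits + 4 extra bits, chosen for illustration.
FRAME_BYTES = 31


def split_frames(data, frame_bytes=FRAME_BYTES):
    """Cut a byte string into equal-sized frame records."""
    if len(data) % frame_bytes != 0:
        raise ValueError("stream length is not a whole number of frames")
    return [data[i:i + frame_bytes] for i in range(0, len(data), frame_bytes)]


def differing_frames(stream_a, stream_b, frame_bytes=FRAME_BYTES):
    """Return the indices of frames that are not bit-identical."""
    frames_a = split_frames(stream_a, frame_bytes)
    frames_b = split_frames(stream_b, frame_bytes)
    if len(frames_a) != len(frames_b):
        raise ValueError("streams hold different numbers of frames")
    return [n for n, (a, b) in enumerate(zip(frames_a, frames_b)) if a != b]


if __name__ == "__main__":
    # Synthetic demo data: three all-zero frames, one bit flipped in frame 1.
    reference = bytes(FRAME_BYTES * 3)
    altered = bytearray(reference)
    altered[40] ^= 0x01
    print(differing_frames(reference, bytes(altered)))
```

An empty result means the two streams match bit for bit; any listed frame index
marks a point where the two encoders diverged and is a starting point for
further study.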

What other DSP/transcoder vendors have done
===========================================

ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
form, and every production implementation was required to match the output of
the official reference bit for bit.  However, once AMR came out, the regulation
on EFR was loosened.  The GSM 06.54 document from 2000-08 (ETSI TS 100 725
V5.2.0) has an appendix-like chapter (chapter 10) whose first paragraph reads:

	The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described
	in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate
	speech coder.  An alternative implementation of the Enhanced Full Rate
	speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate
	coder is allowed.  Alternative implementations shall implement the
	functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the
	exception that the DTX transmission format (GSM 06.81) and the comfort
	noise generation (GSM 06.62) shall be used.

It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
both) weren't too happy with the prospect of having to include two different
versions of _almost_ the same codec algorithm with a bunch of interspersed
subtle diffs, and so the rules were bent: EFR implementors were given permission
to deviate from the original bit-exact definition of EFR in order to have more
commonality with MR122.

Approach adopted for Themyscira GSM codec libraries suite
=========================================================

I (Mother Mychaela) previously entertained the idea of creating a unified codec
library that supports both AMR and EFR with common code, producing a published-
source, FOSS-culture equivalent of what most proprietary vendors have done.
However, on further reflection, that idea has been rejected.  The current vision
(as of 2024-04) is that libgsmefr (stable since early 2023) and libtwamr
(currently a work in progress) shall remain separate and independent libraries,
the former implementing GSM-EFR (the original bit-exact definition) and the
latter implementing AMR.  My reasons for this decision are:

* Libgsmefr already exists, and it is already a bit of a jewel compared to the
  sorry state of true GSM codec support in the world of FOSS outside Themyscira.
  Giving up on this library and moving to some nebulous new one does not sound
  appealing.

* There does not exist any formal, bit-exact definition for what we informally
  call "EFR version 2": the realization of EFR as implemented by post-AMR-era
  proprietary vendors, some sort of AMR-EFR hybrid.  As I see it, it is not my
  place to try to innovate in speech codec design; instead it is my job to
  provide 100% correct, bit-exact implementations of existing solid standards -
  and there is no bit-exact standard to follow for "EFR version 2".

* Libtwamr project: the task of turning the original AMR code from 3GPP into a
  proper library, style-consistent with Themyscira libgsmfr2 and libgsmefr,
  without the ugliness of opencore-amr, is already a lot of work as it is.
  There is no need to make it harder by adding the task of supporting AMR-based
  EFR, especially when the latter lacks formal definition.

Performance issues
==================

Right now the only significant downside of libgsmefr compared to
libopencore-amrnb is that our library is much slower: almost 7 times slower on
non-DTX encode and a little over 3 times slower on SID-free decode.  This
performance problem will need to be solved by profiling the code to find the
slowest spots, comparing the code of individual blocks between ours and theirs,
and porting over whatever performance-optimizing strategies were implemented in
the OpenCORE code base.  The latter code base is a derivative work based on the
3GPP AMR source, hence the guts of the codec are largely the same between 3GPP
AMR and libopencore-amrnb; the latter has been significantly
performance-optimized, but also heavily uglified.  There is no reason why the
same performance fixes can't be applied to the EFR code base - it will simply
take work.  This work is currently part of our future roadmap.
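Slowdown figures like the ones above come from timing the same frame-processing
job with both implementations.  The sketch below shows only the bookkeeping for
such a measurement; it is not the harness used to obtain the numbers quoted
here.  The function names are hypothetical, and the callables passed in are
stand-ins for whatever wrapper invokes each library's encoder or decoder.

```python
import time


def time_per_frame(process_frame, frames, repeats=3):
    """Best-of-N average wall-clock seconds spent per frame.

    process_frame is a hypothetical callable wrapping one codec operation
    (encode or decode of a single frame); frames is the list of inputs.
    """
    if not frames:
        raise ValueError("need at least one frame")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for frame in frames:
            process_frame(frame)
        best = min(best, time.perf_counter() - start)
    return best / len(frames)


def slowdown(ours_sec, theirs_sec):
    """How many times slower 'ours' is than 'theirs'."""
    return ours_sec / theirs_sec


if __name__ == "__main__":
    # Dummy workloads standing in for two codec implementations.
    dummy_frames = [bytes(320)] * 200
    ours = time_per_frame(lambda f: sum(f) + sum(f), dummy_frames)
    theirs = time_per_frame(lambda f: sum(f), dummy_frames)
    print("slowdown factor: %.2f" % slowdown(ours, theirs))
```

Taking the best of several runs reduces noise from other system activity.  A
whole-run timing like this only shows how large the gap is; finding the slowest
spots inside the codec still requires a real profiler.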