author Mychaela Falconia <falcon@freecalypso.org>
date Sat, 18 May 2024 22:30:42 +0000

Relation between GSM-EFR and 12k2 mode of AMR
=============================================

What are the differences between the GSM-EFR codec and the highest (12k2) mode
of AMR, or MR122 for short?  The most obvious difference is in DTX: the format
of SID
frames and even the very paradigm of how DTX works are completely different
between EFR and AMR.  But what about non-DTX operation?  If a codec session
consists solely of good speech frames, no SIDs and no BFI frame gaps, are EFR
and MR122 strictly identical?

The correct answer is that in the absence of SIDs, EFR and MR122 are directly
interoperable in that the output of an EFR encoder can be fed to the input of
an AMR decoder, and vice-versa.  However, the two codecs are NOT identical at
the bit-exact level!  The differences are subtle, such that finding them
requires intensive study; the following article documents some of the
findings:

https://www.freecalypso.org/hg/efr-experiments/file/tip/Theory-and-mystery
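
One way to check whether two implementations agree bit for bit is to compare
their encoder outputs frame by frame.  The following is a minimal sketch,
assuming both outputs have already been converted to the same storage format of
consecutive 31-byte frames (the size of the RFC 3551 RTP payload for GSM-EFR).
The function name and the storage format are assumptions made for illustration;
this helper is not part of libgsmefr or libtwamr.

```c
#include <stddef.h>
#include <string.h>

/* Assumed storage format: each frame occupies 31 bytes (244 EFR bits
 * plus a 4-bit signature, as in the RFC 3551 RTP payload for GSM-EFR). */
#define EFR_FRAME_BYTES 31

/*
 * Hypothetical helper, not part of libgsmefr or libtwamr: compare two
 * buffers of nframes frames each and return how many frames differ.
 * If first_diff is non-NULL and at least one frame differs, the index
 * of the first differing frame is stored there; otherwise *first_diff
 * is left untouched.
 */
size_t count_frame_mismatches(const unsigned char *a, const unsigned char *b,
                              size_t nframes, size_t *first_diff)
{
    size_t i, mismatches = 0;

    for (i = 0; i < nframes; i++) {
        if (memcmp(a + i * EFR_FRAME_BYTES, b + i * EFR_FRAME_BYTES,
                   EFR_FRAME_BYTES) != 0) {
            if (mismatches == 0 && first_diff != NULL)
                *first_diff = i;
            mismatches++;
        }
    }
    return mismatches;
}
```

A nonzero count on speech-only input would indicate a bit-exact difference
between the two encoders; a zero count on one particular recording does not
prove that they are identical.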

What other DSP/transcoder vendors have done
===========================================

ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
form, and every production implementation was required to match the output of
the official reference bit for bit.  However, once AMR came out, the regulation
on EFR was loosened.  The GSM 06.54 document from 2000-08 (ETSI TS 100 725
V5.2.0) has an appendix-like chapter (chapter 10) whose first paragraph reads:

	The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described
	in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate
	speech coder.  An alternative implementation of the Enhanced Full Rate
	speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate
	coder is allowed.  Alternative implementations shall implement the
	functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the
	exception that the DTX transmission format (GSM 06.81) and the comfort
	noise generation (GSM 06.62) shall be used.

It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
both) weren't too happy with the prospect of having to include two different
versions of _almost_ the same codec algorithm with a bunch of interspersed
subtle diffs, and so the rules were bent: EFR implementors were given permission
to deviate from the original bit-exact definition of EFR in order to have more
commonality with MR122.

Approach adopted for Themyscira GSM codec libraries suite
=========================================================

I (Mother Mychaela) previously entertained the idea of creating a unified codec
library that supports both AMR and EFR with common code, producing a published-
source, FOSS-culture equivalent of what most proprietary vendors have done.
However, on further reflection, that idea has been rejected.  The current
situation as of 2024-05 is as follows:

* Libgsmefr is our production-oriented implementation of the GSM-EFR codec.  It
  implements the original bit-exact definition of EFR, not the AMR-EFR hybrid
  version, and it includes full support for DTX encoding and SID decoding with
  comfort noise generation per GSM 06.62.

* Libtwamr is our librification of the 3GPP AMR reference code.  The library is
  structured in such a way that libtwamr stateful encoder and decoder functions
  can be combined with stateless EFR frame packing and unpacking functions from
  libgsmefr, allowing AMR-EFR hybrid encoders and decoders to be built.  The
  decoder homing function in libtwamr can be told to trigger on the EFR DHF
  instead of the MR122 version, and for the encoder direction there is a simple
  utility function that artificially transforms MR122 DHF into EFR DHF
  post-encoder.
  However, there is no support for AMR-EFR hybrid encoding with DTX enabled,
  and the low-effort version of AMR-EFR hybrid decoder constructed in this
  manner cannot grok EFR SID frames or generate CN per GSM 06.62.

Production implementations of GSM network elements that need to perform EFR
speech transcoding should use libgsmefr, not libtwamr.  The limited support
that is provided for AMR-EFR hybrid encoding and decoding with the combination
of libtwamr and libgsmefr is intended for experimentation and reverse
engineering of other people's implementations, for times when it becomes
necessary to model, simulate or replicate bit-exact operation of someone else's
network element.