view doc/AMR-EFR-philosophy @ 581:e2d5cad04cbf

author Mychaela Falconia <falcon@freecalypso.org>
date Thu, 13 Feb 2025 10:02:45 +0000

Relation between GSM-EFR and 12k2 mode of AMR
=============================================

What are the differences between the GSM-EFR codec and the highest 12k2 mode of
AMR, or MR122 for short?  The most obvious difference is in DTX: the format of
SID frames and even the very paradigm of how DTX works are completely different
between EFR and AMR.  But what about non-DTX operation?  If a codec session
consists solely of good speech frames, with no SIDs and no BFI frame gaps, are
EFR and MR122 strictly identical?

The correct answer is that in the absence of SIDs, EFR and MR122 are directly
interoperable in that the output of an EFR encoder can be fed to the input of
an AMR decoder, and vice-versa.  However, the two codecs are NOT identical at
the bit-exact level!  The differences are subtle, such that finding them
requires some intense study; this article documents some of the findings:

https://www.freecalypso.org/hg/efr-experiments/file/tip/Theory-and-mystery

What other DSP/transcoder vendors have done
===========================================

ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
form, and every production implementation was required to match the output of
the official reference bit for bit.  However, once AMR came out, the regulation
on EFR was loosened.  The GSM 06.54 document from 2000-08 (ETSI TS 100 725
V5.2.0) has an appendix-like chapter (chapter 10) whose first paragraph reads:

	The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described
	in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate
	speech coder.  An alternative implementation of the Enhanced Full Rate
	speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate
	coder is allowed.  Alternative implementations shall implement the
	functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the
	exception that the DTX transmission format (GSM 06.81) and the comfort
	noise generation (GSM 06.62) shall be used.

It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
both) weren't too happy with the prospect of having to include two different
versions of _almost_ the same codec algorithm with a bunch of interspersed
subtle diffs, and so the rules were bent: EFR implementors were given permission
to deviate from the original bit-exact definition of EFR in order to have more
commonality with MR122.

Approach adopted for Themyscira GSM codec libraries suite
=========================================================

I (Mother Mychaela) previously entertained the idea of creating a unified codec
library that supports both AMR and EFR with common code, producing a published-
source, FOSS-culture equivalent of what most proprietary vendors have done.
However, on further reflection, that idea has been rejected.  The current
situation as of 2024-05 is as follows:

* Libgsmefr is our production-oriented implementation of the GSM-EFR codec.  It
  implements the original bit-exact definition of EFR, not the AMR-EFR hybrid
  version, and it includes full support for DTX encoding and SID decoding with
  comfort noise generation per GSM 06.62.

* Libtwamr is our librification of the 3GPP AMR reference code.  The library is
  structured in such a way that the stateful encoder and decoder functions of
  libtwamr can be combined with the stateless EFR frame packing and unpacking
  functions from libgsmefr, allowing AMR-EFR hybrid encoders and decoders to be
  built.  The decoder homing function in libtwamr can be told to trigger on EFR
  DHF instead of the MR122 version, and for the encoder direction there is a
  simple utility function that artificially transforms MR122 DHF into EFR DHF
  post-encoder.  However, there is no support for AMR-EFR hybrid encoding with
  DTX enabled, and the low-effort version of an AMR-EFR hybrid decoder
  constructed in this manner cannot grok EFR SID frames or generate CN per
  GSM 06.62.
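
To make the post-encoder DHF transformation mentioned above more concrete, here
is a minimal sketch of what such a step could look like.  It is NOT the actual
libtwamr function: the name dhf_mr122_to_efr(), its signature and the
compare-then-substitute approach are all assumptions made for illustration,
and the two homing frame patterns are passed in by the caller rather than
reproduced here.  The sketch also assumes that one 12.2 kbit/s frame is held as
an array of 57 codec parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: one 12.2 kbit/s frame is handled as an array of 57 codec
 * parameters, one parameter per 16-bit word. */
#define N_PARAMS 57

/*
 * Hypothetical helper, not the real libtwamr API.  If the encoder output
 * in params[] equals the MR122 decoder homing frame, overwrite it with
 * the EFR decoder homing frame.  Both patterns are supplied by the
 * caller; their actual values are not reproduced in this sketch.
 *
 * Returns 1 if the frame was replaced, 0 if it was left untouched.
 */
int dhf_mr122_to_efr(int16_t *params,
                     const int16_t *mr122_dhf,
                     const int16_t *efr_dhf)
{
    assert(params != NULL && mr122_dhf != NULL && efr_dhf != NULL);

    /* Anything other than an exact MR122 DHF passes through unchanged. */
    if (memcmp(params, mr122_dhf, N_PARAMS * sizeof(int16_t)) != 0)
        return 0;

    memcpy(params, efr_dhf, N_PARAMS * sizeof(int16_t));
    return 1;
}
```

In a hybrid encoder built along these lines, a call like this would sit
between the AMR encoder and the EFR frame packing step, so that every ordinary
speech frame passes through untouched and only the homing frame is rewritten.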

Production implementations of GSM network elements that need to perform EFR
speech transcoding should use libgsmefr, not libtwamr.  The limited support
that is provided for AMR-EFR hybrid encoding and decoding with the combination
of libtwamr and libgsmefr is intended for experimentation and reverse
engineering of other people's implementations, for times when it becomes
necessary to model, simulate or replicate bit-exact operation of someone else's
network element.
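
When the goal is to replicate someone else's bit-exact behavior, the basic
check is a sample-by-sample comparison of two decoder outputs.  The following
is a small sketch of such a check; the function name first_pcm_mismatch() is
made up for this example and is not part of libgsmefr or libtwamr.  It assumes
both outputs are 16-bit linear PCM, with the usual 160 samples per 20 ms frame.

```c
#include <stddef.h>
#include <stdint.h>

/* 20 ms of speech at 8000 samples/s */
#define SAMPLES_PER_FRAME 160

/*
 * Hypothetical helper for bit-exactness checks: compare two PCM buffers
 * of nsamples 16-bit samples each.
 *
 * Returns the index of the first sample that differs, or -1 if the two
 * buffers are identical.
 */
long first_pcm_mismatch(const int16_t *a, const int16_t *b, size_t nsamples)
{
    for (size_t i = 0; i < nsamples; i++) {
        if (a[i] != b[i])
            return (long)i;
    }
    return -1;
}
```

Dividing a non-negative result by SAMPLES_PER_FRAME gives the number of the
first frame in which the two outputs diverge, which is usually the place to
start looking.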