author Mychaela Falconia <falcon@freecalypso.org>
date Wed, 17 Apr 2024 17:30:25 +0000

Relation between GSM-EFR and 12k2 mode of AMR
=============================================

What are the differences between GSM-EFR codec and the highest 12k2 mode of AMR,
or MR122 for short?  The most obvious difference is in DTX: the format of SID
frames and even the very paradigm of how DTX works are completely different
between EFR and AMR.  But what about non-DTX operation?  If a codec session
consists solely of good speech frames, no SIDs and no BFI frame gaps, are EFR
and MR122 strictly identical?

The correct answer is that in the absence of SIDs, EFR and MR122 are directly
interoperable in that the output of an EFR encoder can be fed to the input of
an AMR decoder, and vice-versa.  However, the two codecs are NOT identical at
the bit-exact level!  The differences are subtle, such that finding them
requires some intense study; here I cover those diffs which I was able to find.

DHF difference and the reason why it occurs
===========================================

In their official form (non-telco-grade corner-cutting libraries don't count,
no matter how popular among FOSS), both EFR and AMR include codec homing as a
mandatory feature, and the mechanism works on the same principle across all
ETSI/3GPP codecs.  The encoder homing frame (EHF) is the same for all codecs:
all 160 samples equal to 0x0008, but each codec has its own decoder homing frame
(DHF).  Each codec's respective DHF is the natural output of its encoder when
the input is EHF and the initial state is the reset state - as simple as that.
Note the natural aspect: every spec-defined DHF came about naturally in that
codec, hence the exact set of codec parameters that constitutes a DHF is not a
detail which some standard-setting committee could define arbitrarily.
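
The EHF definition given above (all 160 samples equal to 0x0008) can be
illustrated with a minimal C sketch.  The function and macro names here are
hypothetical, made up for this illustration; they are not taken from the
ETSI/3GPP reference sources or from this repository.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SAMPLES 160     /* one 20 ms frame at 8 kHz */
#define EHF_SAMPLE    0x0008  /* every sample of an encoder homing frame */

/* Hypothetical helper: fill a 160-sample buffer with the EHF pattern. */
void make_encoder_homing_frame(int16_t *frame)
{
    size_t i;

    for (i = 0; i < FRAME_SAMPLES; i++)
        frame[i] = EHF_SAMPLE;
}

/* Hypothetical helper: return 1 if all 160 samples equal 0x0008, else 0. */
int is_encoder_homing_frame(const int16_t *frame)
{
    size_t i;

    for (i = 0; i < FRAME_SAMPLES; i++)
        if (frame[i] != EHF_SAMPLE)
            return 0;
    return 1;
}
```

There is no comparably simple generator for a DHF: as explained above, each
codec's DHF is whatever its encoder emits for this input from the reset state.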

AMR has 8 different DHFs for its 8 different modes, and the DHF for MR122 is
*not* the same as EFR DHF!  Given that this DHF is nothing but the encoder's
natural response to encoding an EHF input, this difference in DHF between EFR
and MR122 indicates the existence of some difference between the two encoders.
A simple experiment, contained in this source tree, reveals what the key
difference is: see src/cod_12k2.c, #ifdef EFR2_VARIANT.  When this source is
compiled with -DEFR2_VARIANT in the efr2 directory, the resulting encoder
produces a DHF (natural response to EHF received in the reset state) that is
identical to the one defined for MR122, proving that this specific change is
the reason for the diff in DHF parameters between EFR and MR122.

The encoder diff that happens here (change from EFR to MR122) is an artificial
delay of 5 ms.  In EFR, on each invocation of the encoder, a frame of 160 new
speech samples is fed in, and that same frame is subject to encoding.  In AMR,
the input is still 160 samples each time, but the frame being encoded consists
of 40 samples from the tail of the previous input and 120 samples from the new
input.  The newest 40 samples are used for auto-correlation computation in the
lower modes of AMR (see 3GPP TS 26.090 section 5.2), but in MR122 they do
absolutely nothing until the next invocation of the encoder, effecting an
artificial delay of 5 ms (40 samples at the 8 kHz sampling rate).  In true
multirate operation this delay is needed to support seamless mode switching,
but in an MR122-only environment it is just waste.
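
The framing shift described above can be modeled with a minimal C sketch.  This
is only a model of the observable effect, not how the reference encoder is
structured internally; the struct and function names are hypothetical, and the
all-zero initial tail is an assumption made for this illustration.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_SAMPLES 160  /* samples fed in per encoder invocation */
#define DELAY_SAMPLES 40   /* 40 samples / 8000 Hz = 5 ms */

/* Hypothetical state: the last 40 samples of the previous input. */
struct delayed_framer {
    int16_t tail[DELAY_SAMPLES];
};

/* Assumption for this sketch: the held-back tail starts out as zeros. */
void delayed_framer_reset(struct delayed_framer *st)
{
    memset(st->tail, 0, sizeof st->tail);
}

/*
 * in:  the 160 new input samples for this invocation
 * out: the 160 samples that actually get encoded in the MR122-style
 *      arrangement: 40 from the previous tail, then the first 120 new ones.
 * The newest 40 input samples are held back until the next call.
 */
void delayed_framer_next(struct delayed_framer *st, const int16_t *in,
                         int16_t *out)
{
    memcpy(out, st->tail, sizeof st->tail);
    memcpy(out + DELAY_SAMPLES, in,
           (FRAME_SAMPLES - DELAY_SAMPLES) * sizeof(int16_t));
    memcpy(st->tail, in + (FRAME_SAMPLES - DELAY_SAMPLES), sizeof st->tail);
}
```

In the EFR arrangement the equivalent step would be a plain copy of in to out,
with no state carried between invocations.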

Other encoder differences
=========================

The 5 ms delay covered above is not the only diff between non-DTX EFR and MR122
encoders.  We know that other diffs must exist because the output of the test
encoder built in the efr2 directory of this repository does not match that of
the official AMR encoder beyond the initial homing frames; however, those
additional differences have not been studied yet.

Decoder diffs between EFR and MR122
===================================

The two decoders are also different at the bit-exact level: if you take a "pure"
stream of 12k2 speech frames (no DHF, no SIDs and no BFI frame gaps or defects)
and feed it to EFR and AMR decoders, both starting from external reset state,
the resulting outputs will be different.
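
One way to check this kind of claim is to compare two decoder outputs frame by
frame and report where they first diverge.  The following is a minimal sketch
under the assumption that both outputs have already been loaded into memory as
16-bit PCM samples; the function name is hypothetical and is not part of this
repository.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SAMPLES 160  /* one 20 ms frame of decoder output */

/*
 * Hypothetical helper: compare two decoder outputs of nframes frames each.
 * Returns the index of the first 160-sample frame that differs,
 * or -1 if the two outputs match bit for bit.
 */
long first_mismatched_frame(const int16_t *a, const int16_t *b,
                            size_t nframes)
{
    size_t n;

    for (n = 0; n < nframes; n++) {
        if (memcmp(a + n * FRAME_SAMPLES, b + n * FRAME_SAMPLES,
                   FRAME_SAMPLES * sizeof(int16_t)) != 0)
            return (long) n;
    }
    return -1;
}
```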

Two specific differences in the decoder have been identified:

* The AGC module is different: see agc.c vs agc_amr.c in src directory.  The
  diffs inside AGC have not been studied yet.

* The post-processing step described in 3GPP TS 26.090 section 6.2.2 (high-pass
  filtering) is new with AMR.

The code version built in the efr2 directory has these two changes applied; it
passes on all available test sequences (amr122_efr.zip described below), but
there may be other diffs that aren't caught by this test sequence set and which
we therefore have not identified yet.

ETSI/3GPP laxness toward EFR implementors
=========================================

ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
form, and every production implementation was required to match the output of
the official reference bit for bit.  However, once AMR came out, the regulation
on EFR was loosened.  The GSM 06.54 document from 2000-08 (ETSI TS 100 725
V5.2.0) has an appendix-like chapter (chapter 10) whose first paragraph reads:

	The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described
	in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate
	speech coder.  An alternative implementation of the Enhanced Full Rate
	speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate
	coder is allowed.  Alternative implementations shall implement the
	functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the
	exception that the DTX transmission format (GSM 06.81) and the comfort
	noise generation (GSM 06.62) shall be used.

It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
both) weren't too happy with the prospect of having to include two different
versions of _almost_ the same codec algorithm with a bunch of interspersed
subtle diffs, and so the rules were bent: EFR implementors were given permission
to deviate from the original bit-exact definition of EFR in order to have more
commonality with MR122.

But the devil is in the details.  If I am seeking to implement this "EFR
alternative 2", where is the new bit-exact reference to be followed for this
option?  No such reference C code for this AMR-EFR hybrid appears to have been
published anywhere, but this code must have existed once in unpublished form,
as we do have surviving published _output_ from that mystery code.

The digital companion to the just-quoted GSM 06.54 is a ZIP archive named
ts_100725v050200p0.zip; inside this ZIP archive there are 9 inner ZIPs: 8 ZIPs
for the 8 original EFR test sequence disks, plus a later addendum named
amr122_efr.zip.  The latter ZIP contains *.cod and *.dec test sequence files in
EFR format (*not* AMR), as well as *.out files from the intended decoding of
*.dec.  The transformation from *.cod to *.dec in this set is unchanged EFR
ed_iface, but the encoder run that produced *.cod and the decoder run that
produced *.out were quite special:

* t??_efr.cod contain the same codec parameters as their AMR counterparts in the
  06.74 test sequence set, except for the first two frames in each sequence,
  which are proper EFR DHFs.  It appears that they ran an essentially-unmodified
  AMR encoder in MR122 mode with DTX disabled, then artificially patched the DHF
  after MR122 encoder output, then packaged the output in EFR *.cod format - but
  it must have been more complicated, as this simplistic approach would not
  support DTX.

* dtx?_efr.cod and dtx?_efr2.cod are more intriguing: they are said to
  correspond to VAD1 and VAD2 in the AMR reference source, yet these sequences
  have EFR SID frames in their silence parts, not AMR DTX.  Thus someone must
  have constructed an encoder that combines most of AMR code (including AMR VAD
  and the AMR version of 12k2 speech encoding) with EFR Tx DTX logic and EFR SID
  generation - quite a feat!

* In the decoder direction, the hack presented in the efr2 directory of this
  code repository is sufficient to produce a matching *.out for every *.dec in
  the amr122_efr.zip mystery collection, including dtx?_efr.dec and
  dtx?_efr2.dec.  However, we made our hack by starting with the EFR reference
  source and making small surgical changes to it; I wonder if whoever did the
  original feat at ETSI/3GPP started with AMR source instead and outfitted it
  with the ability to understand EFR SID frames and do comfort noise generation
  per GSM 06.62 - that approach would be a big feat, just like with the encoder.

The present author considers it a shame that whatever AMR-EFR hybrid programs
were used to generate the sequences in amr122_efr.zip were never published.  In
the absence of such published code, the details of exactly what was done by
those commercial DSP/transcoder vendors who combined AMR with EFR will remain
elusive.