author Mychaela Falconia <falcon@freecalypso.org>
date Sat, 18 May 2024 22:30:42 +0000

Emulation of other people's AMR-EFR hybrid implementations
==========================================================

[Please see the AMR-EFR-philosophy article for background information on the
 differences between classic GSM-EFR and the 12k2 mode of AMR, and how ETSI/3GPP
 loosened their regulation on bit-exactness of EFR, then continue here.]

Experiments reveal that the extant commercial GSM networks of T-Mobile USA and
Telcel Mexico (and likely other countries' GSM networks too) use a GSM speech
transcoder implementation that performs EFR encoding and decoding (for times
when the MS declares no support for AMR and the network falls back to EFR) per
the alternative which we call AMR-EFR hybrid.  The needed experiments are done
by using a FreeCalypso phone or devboard as the MS (declaring yourself to the
network as non-AMR-capable via AT%SPVER), capturing TCH DL and feeding TCH UL
with FreeCalypso tools, and using a SIP-to-PSTN connectivity provider (BulkVS
or Anveo) on the other end of the test call that allows the experimenter to
receive the PCMU or PCMA sample stream coming out of the GSM network's speech
transcoder and feed a crafted PCMU/PCMA sample stream in the other direction.

In this experimental setup, bit-exact details of how the GSM network under study
(NUS) implements EFR decoding can be tested by feeding a controlled sequence of
EFR codec frames (beginning with at least two DHFs) to GSM Um uplink and
observing the PCMU or PCMA sample stream received on the IP-PSTN end of the
call.  Similarly, bit-exact details of how the NUS implements EFR encoding can
be tested by feeding controlled PCMU/PCMA sample streams into the call from
IP-PSTN and observing what the network emits on GSM Um downlink.  In the latter
case, frame synchronization finding tricks described in ETSI/3GPP test sequence
specs need to be included as part of the experiment.

When these experiments were performed on the GSM networks of T-Mobile USA and
Telcel Mexico, it was immediately apparent that they do not implement EFR
following the original bit-exact code of GSM 06.53: feeding any of the original
EFR test sequences from GSM 06.54 to the NUS does not produce matching results.
However, when I tried feeding EFR codec frame sequences from amr122_efr.zip
(the late addendum to GSM 06.54 for the AMR-EFR hybrid option) to GSM UL, the
PCMU (T-Mobile USA) or PCMA (Telcel Mexico) output from the GSM network's EFR
decoder matched _those_ test sequences, indicating that these networks use the
AMR-EFR alternative implementation.

Creating tinkerer-oriented FOSS tools that can emulate or replicate the poorly
defined "EFR alternative 2" implemented by these extant commercial networks has
been a sportive challenge ever since.  The present development in Themyscira
GSM codec libraries and utilities suite is a step toward conquering that
challenge: we are now able to replicate the mystery commercial transcoder in
non-DTX operation, specifically:

a) We can feed a SID-free stream of EFR codec frames to GSM UL, beginning with
   DHF, and get the expected result on PCMU or PCMA;

b) In the encoder direction, for the first 7 frames after EHF, before DTX is
   allowed to kick in, we can get GSM DL output from the network that matches
   our expectations.

Encoder 5 ms delay and DHF transformation
=========================================

One of the differences between classic EFR and MR122 in the encoder direction is
the artificial delay of 5 ms introduced in the AMR version.  In true multirate
operation this delay is needed to support seamless switching between codec
modes, but when the only allowed codec rate is 12k2 (which is the case with EFR
by definition), this delay is pure waste.  (Needless to say, an extra delay of
5 ms is nothing compared to the egregious latencies introduced by today's ugly
and horrible world of IP-based transport everywhere, but still...)  This
artificial 5 ms delay in the encoder is the reason for the DHF difference
between EFR and MR122 - but here is the wild part: instead of recognizing this
artificial delay as unnecessary and wasteful for 12k2-only EFR and removing it
from the AMR-EFR hybrid contraption, those commercial transcoder vendors and
the people who prepared amr122_efr.zip for ETSI/3GPP (were they the same
people?) kept this 5 ms encoder delay, keeping the whole encoder unchanged AMR
except for whatever insane trickery they did to fit EFR DTX logic and EFR SID
generation into it, but added special DHF transformation logic on the output of
this AMR encoder to produce compliant EFR DHF when the input is EHF.

Exactly how this DHF transformation is done in those actually-deployed AMR-EFR
hybrid encoders is a bit of a mystery.  My first thought was to compare the
speech parameters emitted by the AMR encoder against MR122 DHF, and if the
result is a match, replace that MR122-DHF parameter set with EFR DHF.  This
approach is implemented in the simple amr_dhf_subst_efr() function in libtwamr.
One distinctive signature of this approach is that the output of a hybrid
encoder following this method can never equal MR122 DHF: this one particular
bit pattern is precluded from the set of possible outputs under all conditions.
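
The first-cut logic described above can be sketched in C as follows.  This is
only an illustration, not the actual libtwamr code: the function name, the
table names and the return value convention are made up for this sketch, the
count of 57 parameters per 12k2 frame is an assumption, and the two DHF tables
are placeholders rather than the real bit patterns from the specs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_PARAMS	57	/* assumed: speech parameters per 12k2 frame */

/*
 * Placeholder tables: the real DHF parameter values are defined in the
 * EFR and AMR specs and are NOT reproduced here.  Only the first element
 * is nonzero, just so that the two patterns differ from each other.
 */
static const int16_t v1_mr122_dhf[N_PARAMS] = { 1 };
static const int16_t v1_efr_dhf[N_PARAMS] = { 2 };

/*
 * First-cut model: if the AMR encoder emitted MR122 DHF, overwrite it
 * with EFR DHF.  Returns 1 if the substitution was made, 0 otherwise.
 */
static int v1_dhf_subst(int16_t *prm)
{
	assert(prm != NULL);
	if (memcmp(prm, v1_mr122_dhf, sizeof v1_mr122_dhf) != 0)
		return 0;	/* anything else passes through untouched */
	memcpy(prm, v1_efr_dhf, sizeof v1_efr_dhf);
	return 1;
}
```

Because every parameter set equal to MR122 DHF gets overwritten, the output of
this function can never be MR122 DHF, which is the signature noted above.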

However, subsequent experiments quickly revealed that the logic implemented by
the transcoder in the network of T-Mobile USA must be different.  One of the
counter-intuitive effects of the 5 ms artificial delay in the MR122 encoder is
what happens when the encoder is in its homed state and you feed it an input
frame whose first 120 samples are all 0x0008, but some (as few as one or as many
as all) of the last 40 samples are different.  This frame does not meet the
definition of EHF and won't be recognized as such - the encoder won't get
rehomed once again after processing this frame - yet the output will be
bit-exact MR122 DHF.  How do those AMR-EFR hybrid encoders handle *this* case?

Experiments on T-Mobile reveal that in the case in question, the encoded frame
is emitted with the bit pattern of MR122 DHF, *not* transformed into EFR DHF.
Because MR122-DHF output is impossible with an encoder that implements logic
like our amr_dhf_subst_efr() first cut, we know (by modus tollens) that
T-Mobile's implementation uses some different logic.
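
The difference between a true EHF and the troublesome input frame discussed
above can be expressed in a few lines of C.  This is a hedged sketch: the
function names are hypothetical, and the test is done on whole 16-bit sample
words exactly as the text describes, which may or may not be how any given
implementation does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_SAMPLES	160	/* 20 ms at 8 kHz */
#define DELAY_SAMPLES	40	/* the 5 ms artificial encoder delay */
#define EHF_SAMPLE	0x0008

/* Count how many leading samples of the frame equal the EHF value. */
static unsigned ehf_prefix_len(const int16_t *pcm)
{
	unsigned n;

	assert(pcm != NULL);
	for (n = 0; n < FRAME_SAMPLES; n++)
		if (pcm[n] != EHF_SAMPLE)
			break;
	return n;
}

/* True EHF: all 160 samples match. */
static int ehf_is_true_ehf(const int16_t *pcm)
{
	return ehf_prefix_len(pcm) == FRAME_SAMPLES;
}

/*
 * The troublesome case: the first 120 samples match, but at least one
 * of the last 40 does not, so the frame as a whole is not EHF.
 */
static int ehf_is_near_ehf(const int16_t *pcm)
{
	unsigned n = ehf_prefix_len(pcm);

	return n >= FRAME_SAMPLES - DELAY_SAMPLES && n < FRAME_SAMPLES;
}
```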

Our new (current) working model is implemented in amr_dhf_subst_efr2(): we
replace the output of the AMR encoder with EFR DHF if the raw encoder output
was MR122 DHF *and* the input frame was EHF.  This version appears to match
the observed behavior of T-Mobile USA so far.
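
A minimal C sketch of this second working model follows.  As before, it is an
illustration under stated assumptions and not the actual libtwamr code: all
names are hypothetical, the DHF tables are placeholders, and the parameter
count of 57 is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_PARAMS	57	/* assumed: speech parameters per 12k2 frame */
#define FRAME_SAMPLES	160	/* 20 ms at 8 kHz */
#define EHF_SAMPLE	0x0008

/* Placeholder patterns, NOT the real DHF values from the specs. */
static const int16_t v2_mr122_dhf[N_PARAMS] = { 1 };
static const int16_t v2_efr_dhf[N_PARAMS] = { 2 };

/* Returns 1 if all 160 input samples equal the EHF value. */
static int v2_input_is_ehf(const int16_t *pcm)
{
	int i;

	for (i = 0; i < FRAME_SAMPLES; i++)
		if (pcm[i] != EHF_SAMPLE)
			return 0;
	return 1;
}

/*
 * Second model: substitute EFR DHF only if the raw encoder output was
 * MR122 DHF *and* the input frame was EHF.  Returns 1 if substituted.
 */
static int v2_dhf_subst(int16_t *prm, const int16_t *pcm_in)
{
	assert(prm != NULL && pcm_in != NULL);
	if (memcmp(prm, v2_mr122_dhf, sizeof v2_mr122_dhf) != 0)
		return 0;
	if (!v2_input_is_ehf(pcm_in))
		return 0;	/* MR122 DHF goes out unmodified */
	memcpy(prm, v2_efr_dhf, sizeof v2_efr_dhf);
	return 1;
}
```

With this version, an input frame whose first 120 samples are 0x0008 but whose
tail differs leaves the MR122 DHF bit pattern in the output.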

EFR DHF in the decoder direction
================================

The way decoder homing works in all ETSI/3GPP-defined speech codecs, there is
an explicit check against the known DHF bit pattern (up to the first subframe
only) at the beginning of the decoder (if the decoder is homed and the input is
DHF per this reduced check, artificially emit EHF, stay homed and do nothing
more), and a second similar check against the known DHF bit pattern (full frame
comparison this time) at the end of the decoder, triggering the state reset
function on match.  These checks are (and can only be) implemented by explicit
comparison against a known hard-coded DHF pattern - hence it doesn't matter in
the decoder case whether the DHF is natural (as in all properly ETSI-defined
codecs) or artificial as in AMR-EFR hybrid.  Thus the "correct" handling of DHF
in the AMR-EFR hybrid decoder is a matter of replacing the check against MR122
DHF bit pattern with a check against the different bit pattern of EFR DHF.
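
The two-check structure described above can be sketched in C as shown below.
This is a simplified illustration and not the libtwamr decoder: the struct, the
function names and the stand-in decode body are hypothetical, bad frame
handling is omitted, and the parameter counts (57 per frame, 18 through the
first subframe) are assumptions derived from 5 LSP parameters plus 13
parameters per subframe.  The DHF pattern is a pointer in the state structure
to show that swapping MR122 DHF for EFR DHF changes nothing else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_PARAMS	57	/* assumed: 5 LSP + 4 subframes x 13 */
#define N_PARAMS_SF1	18	/* assumed: 5 LSP + first subframe's 13 */
#define FRAME_SAMPLES	160
#define EHF_SAMPLE	0x0008

struct dec_sketch {
	const int16_t *dhf;	/* DHF pattern to check against */
	int homed;		/* nonzero while in the home state */
};

/* Stand-in for the real state reset function. */
static void dec_sketch_reset(struct dec_sketch *st)
{
	st->homed = 1;
}

/* Stand-in for the real decoder body: emits silence, leaves home state. */
static void dec_sketch_body(struct dec_sketch *st, const int16_t *prm,
			    int16_t *pcm)
{
	(void) prm;
	memset(pcm, 0, FRAME_SAMPLES * sizeof(int16_t));
	st->homed = 0;
}

static void dec_sketch_frame(struct dec_sketch *st, const int16_t *prm,
			     int16_t *pcm)
{
	int i;

	assert(st != NULL && st->dhf != NULL);
	/* Check 1: homed, and input matches DHF through first subframe */
	if (st->homed &&
	    memcmp(prm, st->dhf, N_PARAMS_SF1 * sizeof(int16_t)) == 0) {
		for (i = 0; i < FRAME_SAMPLES; i++)
			pcm[i] = EHF_SAMPLE;	/* artificially emit EHF */
		return;				/* stay homed, nothing more */
	}
	dec_sketch_body(st, prm, pcm);
	/* Check 2: full-frame comparison at the end triggers the reset */
	if (memcmp(prm, st->dhf, N_PARAMS * sizeof(int16_t)) == 0)
		dec_sketch_reset(st);
}
```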

The decoder engine in libtwamr supports this different-DHF option for MR122
decoding by way of a bit set in the mode field in struct amr_param_frame - see
the detailed description in the AMR-library-API article.

Command line utilities for AMR-EFR hybrid
=========================================

The present package includes a small set of command line utilities that work
with the AMR-EFR hybrid described above:

amrefr-encode-r
amrefr-decode-r

	These two utilities function just like gsmefr-encode-r and
	gsmefr-decode-r described in the Codec-utils article, but implement the
	AMR-EFR hybrid version of the codec instead of original EFR.  The
	no-DTX limitation applies: amrefr-encode-r lacks the -d option, and the
	input to amrefr-decode-r must not contain any SID frames.

amrefr-tseq-enc
amrefr-tseq-dec

	These two utilities are AMR-EFR counterparts to the gsmefr-etsi-enc and
	gsmefr-etsi-dec test programs described in the EFR-testing article.
	They pass all tests on the non-DTX t??_efr.* sequences in ETSI's
	amr122_efr.zip, but not on any of the DTX sequences included in the
	same ZIP.  Just like amrefr-encode-r, amrefr-tseq-enc lacks the -d
	option, and amrefr-tseq-dec rejects input containing SID frames.