diff Theory-and-mystery @ 7:1fd613cec7ab

Theory-and-mystery: document written
author Mychaela Falconia <falcon@freecalypso.org>
date Wed, 17 Apr 2024 17:14:41 +0000
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Theory-and-mystery	Wed Apr 17 17:14:41 2024 +0000
@@ -0,0 +1,152 @@
+Relation between GSM-EFR and 12k2 mode of AMR
+=============================================
+
+What are the differences between the GSM-EFR codec and the highest 12k2 mode of AMR,
+or MR122 for short?  The most obvious difference is in DTX: the format of SID
+frames and even the very paradigm of how DTX works are completely different
+between EFR and AMR.  But what about non-DTX operation?  If a codec session
+consists solely of good speech frames, no SIDs and no BFI frame gaps, are EFR
+and MR122 strictly identical?
+
+The correct answer is that in the absence of SIDs, EFR and MR122 are directly
+interoperable in that the output of an EFR encoder can be fed to the input of
+an AMR decoder, and vice-versa.  However, the two codecs are NOT identical at
+the bit-exact level!  The differences are subtle enough that finding them
+requires some intense study; here I cover the diffs I have been able to find.
+
+DHF difference and the reason why it occurs
+===========================================
+
+In their official form (non-telco-grade corner-cutting libraries don't count,
+no matter how popular in FOSS), both EFR and AMR include codec homing as a
+mandatory feature, and the mechanism works on the same principle across all
+ETSI/3GPP codecs.  The encoder homing frame (EHF) is the same for all codecs:
+all 160 samples equal to 0x0008, but each codec has its own decoder homing frame
+(DHF).  Each codec's respective DHF is the natural output of its encoder when
+the input is EHF and the initial state is the reset state - as simple as that.
+Note the word "natural" here: every spec-defined DHF came about naturally in
+its codec, hence the exact set of codec parameters that constitutes a DHF is
+not a detail that a standard-setting committee could define arbitrarily.
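+
+The EHF side of this mechanism is simple enough to sketch in C.  The
+following is a minimal illustration, assuming a frame is handed over as an
+array of 160 16-bit PCM samples; the names make_ehf() and is_ehf() are
+hypothetical and are not taken from the reference code.  Producing the DHF
+itself requires running the actual encoder from its reset state, which is
+not shown.
+
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Frame length shared by EFR and every AMR mode: 20 ms at 8 kHz. */
#define FRAME_LEN 160

/* Every sample of the encoder homing frame (EHF) has this value. */
#define EHF_SAMPLE 0x0008

/* Hypothetical helper: fill a frame buffer with the EHF pattern. */
static void make_ehf(int16_t frame[FRAME_LEN])
{
    for (size_t i = 0; i < FRAME_LEN; i++)
        frame[i] = EHF_SAMPLE;
}

/* Hypothetical helper: true only if all 160 input samples equal 0x0008.
   Per the text above, encoding such a frame from the reset state yields
   the codec's own DHF. */
static bool is_ehf(const int16_t frame[FRAME_LEN])
{
    for (size_t i = 0; i < FRAME_LEN; i++)
        if (frame[i] != EHF_SAMPLE)
            return false;
    return true;
}
```
+
+A single deviating sample anywhere in the frame is enough for is_ehf() to
+reject it, which matches the all-or-nothing definition of the EHF.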
+
+AMR has 8 different DHFs for its 8 different modes, and the DHF for MR122 is
+*not* the same as the EFR DHF!  Given that this DHF is nothing but the encoder's
+natural response to encoding an EHF input, this difference in DHF between EFR
+and MR122 indicates the existence of some difference between the two encoders.
+A simple experiment, contained in this source tree, reveals what the key
+difference is: see src/cod_12k2.c, #ifdef EFR2_VARIANT.  When this source is
+compiled with -DEFR2_VARIANT in the efr2 directory, the resulting encoder
+produces a DHF (its natural response to an EHF received in the reset state)
+that is identical to the one defined for MR122, proving that this specific
+change is the reason for the diff in DHF parameters between EFR and MR122.
+
+The encoder diff at work here (in the change from EFR to MR122) is an artificial
+delay of 5 ms.  In EFR, on each invocation of the encoder, a frame of 160 new
+speech samples is fed in, and that same frame is the one encoded.  In AMR,
+the input is still 160 samples each time, but the frame being encoded consists
+of 40 samples from the tail of the previous input and 120 samples from the new
+input.  The newest 40 samples are used for auto-correlation computation in the
+lower modes of AMR (see 3GPP TS 26.090 section 5.2), but in MR122 they do
+absolutely nothing until the next invocation of the encoder, effecting an
+artificial delay of 5 ms.  In true multirate operation this delay is needed to
+support seamless mode switching, but in an MR122-only environment it is just
+waste.
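+
+The buffering that produces this delay can be sketched as follows.  This is
+a minimal C illustration of the frame assembly described above, not the AMR
+reference code, which need not structure it this way; struct delay_state,
+delay_reset() and assemble_mr122_frame() are hypothetical names.  In EFR, by
+contrast, the frame encoded is simply the new input frame.
+
```c
#include <stdint.h>
#include <string.h>

#define FRAME_LEN 160  /* samples per 20 ms frame at 8 kHz */
#define LOOKAHEAD 40   /* 5 ms at 8 kHz */

/* Hypothetical state: the last 40 samples of the previous input frame. */
struct delay_state {
    int16_t tail[LOOKAHEAD];
};

/* Reset: the "previous tail" starts out as silence. */
static void delay_reset(struct delay_state *st)
{
    memset(st->tail, 0, sizeof st->tail);
}

/* Build the 160-sample frame that MR122 actually encodes on this call:
   40 samples held over from the previous input, then the first 120
   samples of the new input.  The newest 40 samples are saved for the
   next call; that hold-over is the 5 ms delay. */
static void assemble_mr122_frame(struct delay_state *st,
                                 const int16_t in[FRAME_LEN],
                                 int16_t out[FRAME_LEN])
{
    memcpy(out, st->tail, LOOKAHEAD * sizeof(int16_t));
    memcpy(out + LOOKAHEAD, in, (FRAME_LEN - LOOKAHEAD) * sizeof(int16_t));
    memcpy(st->tail, in + FRAME_LEN - LOOKAHEAD, LOOKAHEAD * sizeof(int16_t));
}
```
+
+Feeding samples 1..160 and then 161..320 through this function from the
+reset state yields a first frame of 40 zeros followed by 1..120, and a second
+frame starting with 121..160: every sample reaches the speech encoding 40
+positions, i.e. 5 ms, later than it would in EFR.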
+
+Other encoder differences
+=========================
+
+The 5 ms delay covered above is not the only diff between non-DTX EFR and MR122
+encoders.  We know that other diffs must exist because the output of the test
+encoder built in the efr2 directory of this repository does not match that of the
+official AMR encoder beyond the initial homing frames; however, those additional
+differences have not been studied yet.
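+
+Locating where two encoder outputs start to diverge is easy to script.  As a
+minimal sketch, assuming both outputs have already been read into flat arrays
+of 16-bit codec parameters with a fixed count per frame, the hypothetical
+helper below reports the first frame at which they differ after skipping a
+given number of leading homing frames; it does not parse any particular test
+sequence file format.
+
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare two encoder output streams frame by frame.  Each stream is a
   flat array of nframes * prm_per_frame 16-bit codec parameters.  The
   first `skip` frames (e.g. the initial homing frames) are ignored.
   Returns the index of the first differing frame, or -1 if none differ. */
static long first_mismatch(const int16_t *a, const int16_t *b,
                           size_t nframes, size_t prm_per_frame, size_t skip)
{
    for (size_t f = skip; f < nframes; f++) {
        const int16_t *fa = a + f * prm_per_frame;
        const int16_t *fb = b + f * prm_per_frame;
        if (memcmp(fa, fb, prm_per_frame * sizeof(int16_t)) != 0)
            return (long)f;
    }
    return -1;
}
```
+
+With skip set to the number of leading homing frames, a return value of -1
+means the two streams agree everywhere else.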
+
+Decoder diffs between EFR and MR122
+===================================
+
+The two decoders are also different at the bit-exact level: if you take a "pure"
+stream of 12k2 speech frames (no DHF, no SIDs and no BFI frame gaps or defects)
+and feed it to EFR and AMR decoders, both starting from external reset state,
+the resulting outputs will be different.
+
+Two specific differences in the decoder have been identified:
+
+* The AGC module is different: see agc.c vs agc_amr.c in the src directory.  The
+  diffs inside AGC have not been studied yet.
+
+* The post-processing step described in 3GPP TS 26.090 section 6.2.2 (high-pass
+  filtering) is new with AMR.
+
+The code version built in the efr2 directory has these two changes applied; it
+passes all available test sequences (amr122_efr.zip, described below), but
+there may be other diffs that this test sequence set does not catch and which
+we therefore have not identified yet.
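+
+To illustrate what the extra AMR post-processing stage does, here is a
+second-order high-pass filter applied to the decoder output.  The sketch is
+floating point, not bit-exact, and covers only the high-pass filtering.  The
+Q13 coefficient values are the ones I recall from the AMR fixed-point
+reference code (post_pro.c, 60 Hz cutoff) and should be treated as an
+assumption to be checked against 3GPP TS 26.090 section 6.2.2; the struct and
+function names are hypothetical.
+
```c
#include <stddef.h>

/* Filter memory: previous two inputs and previous two outputs. */
struct hpf_state {
    double x1, x2;
    double y1, y2;
};

/* Direct-form-I biquad:
   y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] + a1 y[n-1] + a2 y[n-2]
   Coefficients are the assumed Q13 values divided by 8192.
   Floating point only; the reference works in 16/32-bit fixed point. */
static void hpf60(struct hpf_state *st, const double *in, double *out,
                  size_t n)
{
    const double b0 = 7699.0 / 8192.0;
    const double b1 = -15398.0 / 8192.0;
    const double b2 = 7699.0 / 8192.0;
    const double a1 = 15836.0 / 8192.0;
    const double a2 = -7667.0 / 8192.0;

    for (size_t i = 0; i < n; i++) {
        double x0 = in[i];
        double y0 = b0 * x0 + b1 * st->x1 + b2 * st->x2
                  + a1 * st->y1 + a2 * st->y2;
        st->x2 = st->x1;
        st->x1 = x0;
        st->y2 = st->y1;
        st->y1 = y0;
        out[i] = y0;
    }
}
```
+
+Because b0 + b1 + b2 = 0 (7699 - 15398 + 7699), this filter has zero gain at
+DC, while an alternating-sign input (4 kHz at 8 kHz sampling) passes with a
+gain of about 0.97.  An EFR decoder has no counterpart to this stage.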
+
+ETSI/3GPP laxness toward EFR implementors
+=========================================
+
+ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
+form, and every production implementation was required to match the output of
+the official reference bit for bit.  However, once AMR came out, the regulation
+on EFR was loosened.  The GSM 06.54 document dated 2000-08 (ETSI TS 100 725 V5.2.0)
+has an appendix-like chapter (chapter 10) whose first paragraph reads:
+
+	The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described
+	in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate
+	speech coder.  An alternative implementation of the Enhanced Full Rate
+	speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate
+	coder is allowed.  Alternative implementations shall implement the
+	functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the
+	exception that the DTX transmission format (GSM 06.81) and the comfort
+	noise generation (GSM 06.62) shall be used.
+
+It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
+both) weren't too happy with the prospect of having to include two different
+versions of _almost_ the same codec algorithm with a bunch of interspersed
+subtle diffs, and so the rules were bent: EFR implementors were given permission
+to deviate from the original bit-exact definition of EFR in order to have more
+commonality with MR122.
+
+But the devil is in the details.  If I am seeking to implement this "EFR
+alternative 2", where is the new bit-exact reference to be followed for this
+option?  No such reference C code for this AMR-EFR hybrid appears to have been
+published anywhere, but this code must have existed once in unpublished form,
+as we do have surviving published _output_ from that mystery code.
+
+The digital companion to the just-quoted GSM 06.54 is a ZIP archive named
+ts_100725v050200p0.zip; inside this ZIP archive there are 9 inner ZIPs: 8 ZIPs
+for the 8 original EFR test sequence disks, plus a later addendum named
+amr122_efr.zip.  The latter ZIP contains *.cod and *.dec test sequence files in
+EFR format (*not* AMR), as well as *.out files from the intended decoding of
+*.dec.  The transformation from *.cod to *.dec in this set is unchanged EFR
+ed_iface, but the encoder run that produced *.cod and the decoder run that
+produced *.out were quite special:
+
+* t??_efr.cod contain the same codec parameters as their AMR counterparts in
+  the 06.74 test sequence set, except for the first two frames in each
+  sequence, which are proper EFR DHFs.  It appears that they ran an
+  essentially-unmodified AMR encoder in MR122 with DTX disabled, then
+  artificially patched the DHFs in the MR122 encoder output, then packaged that
+  output in EFR *.cod format - but it must have been more complicated, as this
+  simplistic approach would not support DTX.
+
+* dtx?_efr.cod and dtx?_efr2.cod are more intriguing: they are said to
+  correspond to VAD1 and VAD2 in the AMR reference source, yet these sequences
+  have EFR SID frames in their silence parts, not AMR DTX.  Thus someone must
+  have constructed an encoder that combines most of AMR code (including AMR VAD
+  and the AMR version of 12k2 speech encoding) with EFR Tx DTX logic and EFR SID
+  generation - quite a feat!
+
+* In the decoder direction, the hack presented in efr2 directory of this code
+  repository is sufficient to produce a matching *.out for every *.dec in the
+  amr122_efr.zip mystery collection, including dtx?_efr.dec and dtx?_efr2.dec.
+  However, we made our hack by starting with the EFR reference source and
+  making small surgical changes to it; I wonder if whoever did the original feat
+  at ETSI/3GPP started with the AMR source instead and outfitted it with the ability to
+  understand EFR SID frames and do comfort noise generation per GSM 06.62 -
+  that approach would be a big feat, just like with the encoder.
+
+The present author considers it a shame that whatever AMR-EFR hybrid programs
+were used to generate the sequences in amr122_efr.zip were never published.  In
+the absence of such published code, the details of exactly what was done by
+those commercial DSP/transcoder vendors who combined AMR with EFR will remain
+elusive.