view Theory-and-mystery @ 7:1fd613cec7ab
Theory-and-mystery: document written
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Wed, 17 Apr 2024 17:14:41 +0000 |
Relation between GSM-EFR and 12k2 mode of AMR
=============================================

What are the differences between GSM-EFR codec and the highest 12k2 mode of
AMR, or MR122 for short? The most obvious difference is in DTX: the format of
SID frames and even the very paradigm of how DTX works are completely different
between EFR and AMR. But what about non-DTX operation? If a codec session
consists solely of good speech frames, no SIDs and no BFI frame gaps, are EFR
and MR122 strictly identical?

The correct answer is that in the absence of SIDs, EFR and MR122 are directly
interoperable in that the output of an EFR encoder can be fed to the input of
an AMR decoder, and vice-versa. However, the two codecs are NOT identical at
the bit-exact level! The differences are subtle, such that finding them
requires some intense study; here I cover those diffs which I was able to find.

DHF difference and the reason why it occurs
===========================================

In their official form (non-telco-grade corner-cutting libraries don't count,
no matter how popular among FOSS), both EFR and AMR include codec homing as a
mandatory feature, and the mechanism works on the same principle across all
ETSI/3GPP codecs. The encoder homing frame (EHF) is the same for all codecs:
all 160 samples equal to 0x0008, but each codec has its own decoder homing
frame (DHF). Each codec's respective DHF is the natural output of its encoder
when the input is EHF and the initial state is the reset state - as simple as
that. Note the natural aspect: every spec-defined DHF came about naturally in
that codec, hence the exact set of codec parameters that constitutes a DHF is
not a detail which some standard-setting committee could define arbitrarily.

AMR has 8 different DHFs for its 8 different modes, and the DHF for MR122 is
*not* the same as EFR DHF!

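As an illustration of the EHF definition given above (all 160 samples equal to
0x0008), here is a minimal sketch in C. The names make_ehf() and is_ehf() are
hypothetical helpers invented for this example; they are not functions from the
EFR or AMR reference sources, and the homing frame test in the reference code
may be written differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_SAMPLES 160     /* one 20 ms frame at 8 kHz */
#define EHF_SAMPLE    0x0008  /* value of every sample in an EHF */

/* Hypothetical helper: fill a buffer with the encoder homing frame. */
void make_ehf(int16_t frame[FRAME_SAMPLES])
{
    assert(frame != NULL);
    for (size_t i = 0; i < FRAME_SAMPLES; i++)
        frame[i] = EHF_SAMPLE;
}

/* Hypothetical helper: true if all 160 samples equal 0x0008. */
bool is_ehf(const int16_t frame[FRAME_SAMPLES])
{
    assert(frame != NULL);
    for (size_t i = 0; i < FRAME_SAMPLES; i++)
        if (frame[i] != EHF_SAMPLE)
            return false;
    return true;
}
```

Feeding the buffer produced by make_ehf() to an encoder that is in its reset
state is what should yield that codec's DHF as output.
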
Given that this DHF is nothing but the encoder's natural response to encoding
an EHF input, this difference in DHF between EFR and MR122 indicates the
existence of some difference between the two encoders. A simple experiment,
contained in this source tree, reveals what the key difference is: see
src/cod_12k2.c, #ifdef EFR2_VARIANT. When this source is compiled with
-DEFR2_VARIANT in efr2 directory, the resulting encoder produces DHF (natural
response to EHF received in the reset state) that is identical to the one
defined for MR122, proving that this specific change is the reason for the
diff in DHF parameters between EFR and MR122.

The encoder diff that happens here (change from EFR to MR122) is an artificial
delay of 5 ms. In EFR, on each invocation of the encoder, a frame of new 160
speech samples is fed in, and that same frame is subject to encoding. In AMR,
the input is still 160 samples each time, but the frame being encoded consists
of 40 samples from the tail of the previous input and 120 samples from the new
input. The newest 40 samples are used for auto-correlation computation in the
lower modes of AMR (see 3GPP TS 26.090 section 5.2), but in MR122 they do
absolutely nothing until the next invocation of the encoder, effecting an
artificial delay of 5 ms. In true multirate operation this delay is needed to
support seamless mode switching, but in an MR122-only environment it is just
waste.

Other encoder differences
=========================

The 5 ms delay covered above is not the only diff between non-DTX EFR and MR122
encoders. We know that other diffs must exist because the output of the test
encoder built in efr2 directory of this repository does not match that of the
official AMR encoder beyond the initial homing frames; however, those
additional differences have not been studied yet.

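To make the 5 ms framing difference described earlier more concrete (40
samples from the tail of the previous input plus 120 samples from the new
input, with 40 samples at 8 kHz being 5 ms), here is a minimal sketch in C. It
models only which samples end up in the frame being encoded; it is not the
code from src/cod_12k2.c. The struct and function names are hypothetical, and
the assumption that the held-over samples are zero in the reset state is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SAMPLES 160  /* samples fed in per encoder invocation */
#define DELAY_SAMPLES 40   /* 40 samples at 8 kHz = 5 ms */

/* Hypothetical state: tail of the previous input, held over. */
struct mr122_framer {
    int16_t held[DELAY_SAMPLES];
};

/* Assumption: the held-over samples are all zero in the reset state. */
void framer_reset(struct mr122_framer *st)
{
    assert(st != NULL);
    memset(st->held, 0, sizeof st->held);
}

/* EFR-style framing: the frame being encoded is the new input itself. */
void frame_efr(const int16_t *in, int16_t *out)
{
    assert(in != NULL && out != NULL);
    memcpy(out, in, FRAME_SAMPLES * sizeof in[0]);
}

/*
 * MR122-style framing: 40 held-over samples followed by the first 120
 * samples of the new input; the newest 40 samples are kept for next time.
 * The in and out buffers must not overlap.
 */
void frame_mr122(struct mr122_framer *st, const int16_t *in, int16_t *out)
{
    assert(st != NULL && in != NULL && out != NULL);
    memcpy(out, st->held, sizeof st->held);
    memcpy(out + DELAY_SAMPLES, in,
           (FRAME_SAMPLES - DELAY_SAMPLES) * sizeof in[0]);
    memcpy(st->held, in + (FRAME_SAMPLES - DELAY_SAMPLES), sizeof st->held);
}
```

In this model the last 40 samples of each input only become part of an encoded
frame on the following call, which is the artificial delay in question.
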
Decoder diffs between EFR and MR122
===================================

The two decoders are also different at the bit-exact level: if you take a
"pure" stream of 12k2 speech frames (no DHF, no SIDs and no BFI frame gaps or
defects) and feed it to EFR and AMR decoders, both starting from external
reset state, the resulting outputs will be different. Two specific differences
in the decoder have been identified:

* The AGC module is different: see agc.c vs agc_amr.c in src directory. The
  diffs inside AGC have not been studied yet.

* The post-processing step described in 3GPP TS 26.090 section 6.2.2 (high-pass
  filtering) is new with AMR.

The code version built in efr2 directory has these two changes applied; it
passes on all available test sequences (amr122_efr.zip described below), but
there may be other diffs that aren't caught by this test sequence set and which
we therefore have not identified yet.

ETSI/3GPP laxness toward EFR implementors
=========================================

ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
form, and every production implementation was required to match the output of
the official reference bit for bit. However, once AMR came out, the regulation
on EFR was loosened. GSM 06.54 document from 2000-08 (ETSI TS 100 725 V5.2.0)
has an appendix-like chapter (chapter 10) whose first paragraph reads:

The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described in
TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate speech
coder. An alternative implementation of the Enhanced Full Rate speech service
based on the 12.2 kbit/s mode of the Adaptive Multi Rate coder is allowed.
Alternative implementations shall implement the functionality specified in
TS 26.071 for the 12.2 kbit/s mode, with the exception that the DTX
transmission format (GSM 06.81) and the comfort noise generation (GSM 06.62)
shall be used.

It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
both) weren't too happy with the prospect of having to include two different
versions of _almost_ the same codec algorithm with a bunch of interspersed
subtle diffs, and so the rules were bent: EFR implementors were given
permission to deviate from the original bit-exact definition of EFR in order
to have more commonality with MR122.

But the devil is in the details. If I am seeking to implement this "EFR
alternative 2", where is the new bit-exact reference to be followed for this
option? No such reference C code for this AMR-EFR hybrid appears to have been
published anywhere, but this code must have existed once in unpublished form,
as we do have surviving published _output_ from that mystery code.

The digital companion to just-quoted GSM 06.54 is a ZIP archive named
ts_100725v050200p0.zip; inside this ZIP archive there are 9 inner ZIPs: 8 ZIPs
for the 8 original EFR test sequence disks, plus a later addendum named
amr122_efr.zip. The latter ZIP contains *.cod and *.dec test sequence files in
EFR format (*not* AMR), as well as *.out files from the intended decoding of
*.dec. The transformation from *.cod to *.dec in this set is unchanged EFR
ed_iface, but the encoder run that produced *.cod and the decoder run that
produced *.out were quite special:

* t??_efr.cod contain the same codec parameters as the AMR counterpart in 06.74
  test sequence set except for the first two frames in each sequence, which are
  proper EFR DHFs. It appears that they ran an essentially-unmodified AMR
  encoder in MR122 with DTX disabled, then artificially patched the DHF after
  MR122 encoder output, then packaged the output in EFR *.cod format - but it
  must have been more complicated, as this simplistic approach would not
  support DTX.

* dtx?_efr.cod and dtx?_efr2.cod are more intriguing: they are said to
  correspond to VAD1 and VAD2 in the AMR reference source, yet these sequences
  have EFR SID frames in their silence parts, not AMR DTX. Thus someone must
  have constructed an encoder that combines most of AMR code (including AMR VAD
  and the AMR version of 12k2 speech encoding) with EFR Tx DTX logic and EFR
  SID generation - quite a feat!

* In the decoder direction, the hack presented in efr2 directory of this code
  repository is sufficient to produce a matching *.out for every *.dec in the
  amr122_efr.zip mystery collection, including dtx?_efr.dec and dtx?_efr2.dec.
  However, we made our hack by starting with EFR reference source and making
  small surgical changes to it; I wonder if whoever did the original feat at
  ETSI/3GPP started with AMR source instead and outfitted it with ability to
  understand EFR SID frames and do comfort noise generation per GSM 06.62 -
  that approach would be a big feat, just like with the encoder.

The present author considers it a shame that whatever AMR-EFR hybrid programs
were used to generate the sequences in amr122_efr.zip were never published. In
the absence of such published code, the details of exactly what was done by
those commercial DSP/transcoder vendors who combined AMR with EFR will remain
elusive.
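
For readers who wish to repeat the kind of check behind the t??_efr.cod
observation above (same codec parameters as the AMR counterpart except for the
first two frames), here is a minimal sketch in C. It assumes that both
sequences have already been unpacked into the same in-memory layout of
fixed-size frames of 16-bit words; the function name, the layout and the
words_per_frame argument are hypothetical and do not describe the actual *.cod
file formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKIP_FRAMES 2  /* leading homing frames that are expected to differ */

/*
 * Hypothetical helper: compare two frame sequences word for word,
 * ignoring the first SKIP_FRAMES frames.  Returns the index of the
 * first differing frame, or -1 if the remaining frames are identical.
 */
long first_diff_frame(const int16_t *a, const int16_t *b,
                      size_t nframes, size_t words_per_frame)
{
    assert(a != NULL && b != NULL);
    for (size_t f = SKIP_FRAMES; f < nframes; f++) {
        const int16_t *fa = a + f * words_per_frame;
        const int16_t *fb = b + f * words_per_frame;
        for (size_t i = 0; i < words_per_frame; i++)
            if (fa[i] != fb[i])
                return (long) f;
    }
    return -1;
}
```

A return value of -1 means that the two sequences agree everywhere past the
initial homing frames; any other value names the first frame worth inspecting.
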