changeset 7:1fd613cec7ab
Theory-and-mystery: document written
author: Mychaela Falconia <falcon@freecalypso.org>
date: Wed, 17 Apr 2024 17:14:41 +0000
parents: 6119d2c1e7d9
children: 8b17df8f6340
files: Theory-and-mystery
diffstat: 1 files changed, 152 insertions(+), 0 deletions(-)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Theory-and-mystery Wed Apr 17 17:14:41 2024 +0000
@@ -0,0 +1,152 @@
+Relation between GSM-EFR and 12k2 mode of AMR
+=============================================
+
+What are the differences between GSM-EFR codec and the highest 12k2 mode of AMR,
+or MR122 for short? The most obvious difference is in DTX: the format of SID
+frames and even the very paradigm of how DTX works are completely different
+between EFR and AMR. But what about non-DTX operation? If a codec session
+consists solely of good speech frames, no SIDs and no BFI frame gaps, are EFR
+and MR122 strictly identical?
+
+The correct answer is that in the absence of SIDs, EFR and MR122 are directly
+interoperable in that the output of an EFR encoder can be fed to the input of
+an AMR decoder, and vice-versa. However, the two codecs are NOT identical at
+the bit-exact level! The differences are subtle, such that finding them
+requires some intense study; here I cover those diffs which I was able to find.
+
+DHF difference and the reason why it occurs
+===========================================
+
+In their official form (non-telco-grade corner-cutting libraries don't count,
+no matter how popular among FOSS), both EFR and AMR include codec homing as a
+mandatory feature, and the mechanism works on the same principle across all
+ETSI/3GPP codecs. The encoder homing frame (EHF) is the same for all codecs:
+all 160 samples equal to 0x0008, but each codec has its own decoder homing frame
+(DHF). Each codec's respective DHF is the natural output of its encoder when
+the input is EHF and the initial state is the reset state - as simple as that.
+Note the natural aspect: every spec-defined DHF came about naturally in that
+codec, hence the exact set of codec parameters that constitutes a DHF is not a
+detail which some standard-setting committee could define arbitrarily.
+
+AMR has 8 different DHFs for its 8 different modes, and the DHF for MR122 is
+*not* the same as EFR DHF! Given that this DHF is nothing but the encoder's
+natural response to encoding an EHF input, this difference in DHF between EFR
+and MR122 indicates the existence of some difference between the two encoders.
+A simple experiment, contained in this source tree, reveals what the key
+difference is: see src/cod_12k2.c, #ifdef EFR2_VARIANT. When this source is
+compiled with -DEFR2_VARIANT in efr2 directory, the resulting encoder produces
+DHF (natural response to EHF received in the reset state) that is identical to
+the one defined for MR122, proving that this specific change is the reason for
+the diff in DHF parameters between EFR and MR122.
+
+The encoder diff that happens here (change from EFR to MR122) is an artificial
+delay of 5 ms. In EFR, on each invocation of the encoder, a frame of 160 new
+speech samples is fed in, and that same frame is subject to encoding. In AMR,
+the input is still 160 samples each time, but the frame being encoded consists
+of 40 samples from the tail of the previous input and 120 samples from the new
+input. The newest 40 samples are used for auto-correlation computation in the
+lower modes of AMR (see 3GPP TS 26.090 section 5.2), but in MR122 they do
+absolutely nothing until the next invocation of the encoder, effecting an
+artificial delay of 5 ms. In true multirate operation this delay is needed to
+support seamless mode switching, but in an MR122-only environment it is just
+waste.
+
+Other encoder differences
+=========================
+
+The 5 ms delay covered above is not the only diff between non-DTX EFR and MR122
+encoders. We know that other diffs must exist because the output of the test
+encoder built in efr2 directory of this repository does not match that of the
+official AMR encoder beyond the initial homing frames; however, those additional
+differences have not been studied yet.
+
+Decoder diffs between EFR and MR122
+===================================
+
+The two decoders are also different at the bit-exact level: if you take a "pure"
+stream of 12k2 speech frames (no DHF, no SIDs and no BFI frame gaps or defects)
+and feed it to EFR and AMR decoders, both starting from external reset state,
+the resulting outputs will be different.
+
+Two specific differences in the decoder have been identified:
+
+* The AGC module is different: see agc.c vs agc_amr.c in src directory. The
+  diffs inside AGC have not been studied yet.
+
+* The post-processing step described in 3GPP TS 26.090 section 6.2.2 (high-pass
+  filtering) is new with AMR.
+
+The code version built in efr2 directory has these two changes applied; it
+passes on all available test sequences (amr122_efr.zip described below), but
+there may be other diffs that aren't caught by this test sequence set and which
+we therefore have not identified yet.
+
+ETSI/3GPP laxness toward EFR implementors
+=========================================
+
+ETSI had a tradition of defining standard GSM codecs (FR, HR, EFR) in bit-exact
+form, and every production implementation was required to match the output of
+the official reference bit for bit. However, once AMR came out, the regulation
+on EFR was loosened. GSM 06.54 document from 2000-08 (ETSI TS 100 725 V5.2.0)
+has an appendix-like chapter (chapter 10) whose first paragraph reads:
+
+    The 12.2 kbit/s mode of the Adaptive Multi Rate speech coder described
+    in TS 26.071 is functionally equivalent to the GSM Enhanced Full Rate
+    speech coder. An alternative implementation of the Enhanced Full Rate
+    speech service based on the 12.2 kbit/s mode of the Adaptive Multi Rate
+    coder is allowed. Alternative implementations shall implement the
+    functionality specified in TS 26.071 for the 12.2 kbit/s mode, with the
+    exception that the DTX transmission format (GSM 06.81) and the comfort
+    noise generation (GSM 06.62) shall be used.
+
+It appears that DSP vendors (for GSM MS or for network transcoders, or perhaps
+both) weren't too happy with the prospect of having to include two different
+versions of _almost_ the same codec algorithm with a bunch of interspersed
+subtle diffs, and so the rules were bent: EFR implementors were given permission
+to deviate from the original bit-exact definition of EFR in order to have more
+commonality with MR122.
+
+But the devil is in the details. If I am seeking to implement this "EFR
+alternative 2", where is the new bit-exact reference to be followed for this
+option? No such reference C code for this AMR-EFR hybrid appears to have been
+published anywhere, but this code must have existed once in unpublished form,
+as we do have surviving published _output_ from that mystery code.
+
+The digital companion to just-quoted GSM 06.54 is a ZIP archive named
+ts_100725v050200p0.zip; inside this ZIP archive there are 9 inner ZIPs: 8 ZIPs
+for the 8 original EFR test sequence disks, plus a later addendum named
+amr122_efr.zip. The latter ZIP contains *.cod and *.dec test sequence files in
+EFR format (*not* AMR), as well as *.out files from the intended decoding of
+*.dec. The transformation from *.cod to *.dec in this set is unchanged EFR
+ed_iface, but the encoder run that produced *.cod and the decoder run that
+produced *.out were quite special:
+
+* t??_efr.cod contain the same codec parameters as the AMR counterpart in 06.74
+  test sequence set except for the first two frames in each sequence, which are
+  proper EFR DHFs. It appears that they ran an essentially-unmodified AMR
+  encoder in MR122 with DTX disabled, then artificially patched the DHF after
+  MR122 encoder output, then packaged the output in EFR *.cod format - but it
+  must have been more complicated, as this simplistic approach would not support
+  DTX.
+
+* dtx?_efr.cod and dtx?_efr2.cod are more intriguing: they are said to
+  correspond to VAD1 and VAD2 in the AMR reference source, yet these sequences
+  have EFR SID frames in their silence parts, not AMR DTX. Thus someone must
+  have constructed an encoder that combines most of AMR code (including AMR VAD
+  and the AMR version of 12k2 speech encoding) with EFR Tx DTX logic and EFR SID
+  generation - quite a feat!
+
+* In the decoder direction, the hack presented in efr2 directory of this code
+  repository is sufficient to produce a matching *.out for every *.dec in the
+  amr122_efr.zip mystery collection, including dtx?_efr.dec and dtx?_efr2.dec.
+  However, we made our hack by starting with EFR reference source and making
+  small surgical changes to it; I wonder if whoever did the original feat at
+  ETSI/3GPP started with AMR source instead and outfitted it with ability to
+  understand EFR SID frames and do comfort noise generation per GSM 06.62 -
+  that approach would be a big feat, just like with the encoder.
+
+The present author considers it a shame that whatever AMR-EFR hybrid programs
+were used to generate the sequences in amr122_efr.zip were never published. In
+the absence of such published code, the details of exactly what was done by
+those commercial DSP/transcoder vendors who combined AMR with EFR will remain
+elusive.
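
Editorial illustration (not part of the changeset): the two mechanisms described in the DHF section, EHF detection and the 5 ms framing offset, can be sketched in C roughly as follows. This is a minimal sketch and is not taken from src/cod_12k2.c or from the ETSI/3GPP reference sources. All identifiers (is_ehf, frame_efr, frame_mr122, struct mr122_framer) are hypothetical, and the assumption that the 40-sample carry-over buffer is all zeros in the reset state is made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_LEN  160      /* samples per encoder invocation (20 ms at 8 kHz) */
#define DELAY_LEN  40       /* 5 ms at 8 kHz */
#define EHF_SAMPLE 0x0008   /* every sample of an encoder homing frame */

/* Return 1 if the input block is an encoder homing frame, else 0. */
int is_ehf(const int16_t in[FRAME_LEN])
{
    for (int i = 0; i < FRAME_LEN; i++)
        if (in[i] != EHF_SAMPLE)
            return 0;
    return 1;
}

/* EFR-style framing: the 160 new samples are the frame being encoded. */
void frame_efr(const int16_t in[FRAME_LEN], int16_t frame[FRAME_LEN])
{
    memcpy(frame, in, FRAME_LEN * sizeof(int16_t));
}

/* MR122-style framing state: the 40 newest samples held over from the
 * previous invocation (hypothetical structure, for illustration). */
struct mr122_framer {
    int16_t tail[DELAY_LEN];
};

/* ASSUMPTION: the carry-over buffer is zero in the reset state. */
void mr122_framer_reset(struct mr122_framer *st)
{
    assert(st != NULL);
    memset(st->tail, 0, sizeof st->tail);
}

/* MR122-style framing: the frame being encoded is the 40-sample tail of
 * the previous input followed by the first 120 samples of the new input.
 * The newest 40 samples are only stored for the next call, which is what
 * produces the 5 ms delay. */
void frame_mr122(struct mr122_framer *st, const int16_t in[FRAME_LEN],
                 int16_t frame[FRAME_LEN])
{
    assert(st != NULL);
    memcpy(frame, st->tail, DELAY_LEN * sizeof(int16_t));
    memcpy(frame + DELAY_LEN, in, (FRAME_LEN - DELAY_LEN) * sizeof(int16_t));
    memcpy(st->tail, in + (FRAME_LEN - DELAY_LEN), DELAY_LEN * sizeof(int16_t));
}
```

With this framing, an EHF fed to an encoder in the reset state is seen by the EFR-style path as 160 samples of 0x0008, while the MR122-style path sees 40 carried-over samples followed by 120 samples of 0x0008. That is consistent with the observation above that the two encoders produce different natural responses to the same EHF input.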