FreeCalypso gsm-codec-lib: doc/AMR-EFR-hybrid-emu @ 477:4c9222d95647
Author: Mychaela Falconia <falcon@freecalypso.org>
Date: Sat, 18 May 2024 22:30:42 +0000

Emulation of other people's AMR-EFR hybrid implementations
==========================================================

[Please see AMR-EFR-philosophy article for background information on the differences between classic GSM-EFR and the 12k2 mode of AMR, and how ETSI/3GPP loosened their regulation on bit-exactness of EFR, then continue here.]

Experiments reveal that the extant commercial GSM networks of T-Mobile USA and Telcel Mexico (and likely other countries' GSM networks too) use a GSM speech transcoder implementation that performs EFR encoding and decoding per the alternative which we call AMR-EFR hybrid. This applies to times when the MS declares no support for AMR and the network falls back to EFR.

The needed experiments are done as follows:

* use a FreeCalypso phone or devboard as the MS, declaring yourself to the network as non-AMR-capable via AT%SPVER;
* capture TCH DL and feed TCH UL with FreeCalypso tools;
* use a SIP-to-PSTN connectivity provider (BulkVS or Anveo) on the other end of the test call. Such a provider allows the experimenter to receive the PCMU or PCMA sample stream coming out of the GSM network's speech transcoder, and to feed a crafted PCMU/PCMA sample stream in the other direction.

In this experimental setup, bit-exact details of how the GSM network under study (NUS) implements EFR decoding can be tested by feeding a controlled sequence of EFR codec frames to GSM Um uplink, beginning with at least two DHFs. The experimenter then observes the PCMU or PCMA sample stream received on the IP-PSTN end of the call.

Similarly, bit-exact details of how the NUS implements EFR encoding can be tested by feeding controlled PCMU/PCMA sample streams into the call from IP-PSTN and observing what the network emits on GSM Um downlink. In the latter case, the frame synchronization finding tricks described in ETSI/3GPP test sequence specs need to be included as part of the experiment.

When these experiments were performed on the GSM networks of T-Mobile USA and Telcel Mexico, it was immediately apparent that they do not implement EFR following the original bit-exact code of GSM 06.53. Feeding any of the original EFR test sequences from GSM 06.54 to the NUS does not produce matching results.

However, I also tried feeding EFR codec frame sequences from amr122_efr.zip to GSM UL. This ZIP is the late addendum to GSM 06.54 for the AMR-EFR hybrid option. With these inputs, the PCMU (T-Mobile USA) or PCMA (Telcel Mexico) output from the GSM network's EFR decoder matched _those_ test sequences. This result indicates that these networks use the AMR-EFR alternative implementation.

Ever since, it has been a sportive challenge to create tinkerer-oriented FOSS tools that can emulate or replicate the poorly defined "EFR alternative 2" implemented by these extant commercial networks. The present development in the Themyscira GSM codec libraries and utilities suite is a step toward conquering that challenge. We are now able to replicate the mystery commercial transcoder in non-DTX operation, specifically:

a) We can feed a SID-free stream of EFR codec frames to GSM UL, beginning with DHF, and get the expected result on PCMU or PCMA;

b) In the encoder direction, we can get GSM DL output from the network that matches our expectations for the first 7 frames after EHF, before DTX is allowed to kick in.

Encoder 5 ms delay and DHF transformation
=========================================

One of the diffs between classic EFR and MR122 in the encoder direction is the artificial delay of 5 ms introduced in the AMR version. At the 8 kHz sampling rate, this delay equals 40 samples.

In true multirate operation, this delay is needed to support seamless switching between codec modes. EFR, however, by definition allows only the 12k2 codec rate, and in that case the delay is pure waste. (Needless to say, an extra delay of 5 ms is nothing compared to the egregious latencies introduced by today's ugly and horrible world of IP-based transport everywhere, but still...)
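
At the 8 kHz sampling rate, a 20 ms frame holds 160 samples, and 5 ms corresponds to 8000 * 0.005 = 40 samples. The following C sketch models the delay as a plain 40-sample delay line placed in front of the encoder core. The names delay_line, delay_line_reset() and delay_frame() are hypothetical, as is the zero-filled reset history. This is an illustration of the timing only, under those assumptions. It does not reflect the actual buffer handling in libtwamr or in the 3GPP reference code.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_LEN 160	/* 20 ms frame at 8000 samples/s */
#define DELAY_LEN  40	/* 5 ms at 8000 samples/s: 8000 * 0.005 = 40 */

/*
 * Hypothetical model of the 5 ms encoder delay: the encoder core analyzes
 * the input delayed by DELAY_LEN samples.  Illustration of timing only,
 * not the actual buffer handling in libtwamr or in 3GPP reference code.
 */
struct delay_line {
	int16_t hist[DELAY_LEN];	/* tail of the previous input frame */
};

/* Assumed reset (homed) state: zero-filled history */
static void delay_line_reset(struct delay_line *dl)
{
	memset(dl->hist, 0, sizeof dl->hist);
}

/*
 * Build the frame that the encoder core would analyze for this input frame:
 * 40 samples of history followed by the first 120 input samples.  The last
 * 40 input samples are saved and only become visible on the next call.
 */
static void delay_frame(struct delay_line *dl, const int16_t in[FRAME_LEN],
			int16_t out[FRAME_LEN])
{
	memcpy(out, dl->hist, DELAY_LEN * sizeof(int16_t));
	memcpy(out + DELAY_LEN, in, (FRAME_LEN - DELAY_LEN) * sizeof(int16_t));
	memcpy(dl->hist, in + (FRAME_LEN - DELAY_LEN),
	       DELAY_LEN * sizeof(int16_t));
}
```

Under this model, the samples analyzed for a given frame are the 40-sample tail of the previous input frame followed by the first 120 samples of the current one. Changing only the last 40 samples of a frame therefore cannot affect what the core analyzes for that frame.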

This artificial 5 ms delay in the encoder is the reason for the DHF difference between EFR and MR122. But here is the wild part. The delay is unnecessary and wasteful for 12k2-only EFR, yet it was not removed from the AMR-EFR hybrid contraption. Instead, those commercial transcoder vendors kept this 5 ms encoder delay, and so did the people who prepared amr122_efr.zip for ETSI/3GPP (were they the same people?).

They kept the whole encoder as unchanged AMR, except for whatever insane trickery they did to fit EFR DTX logic and EFR SID generation into it. On top of that, they added special DHF transformation logic on the output of this AMR encoder, to produce compliant EFR DHF when the input is EHF.

Exactly how this DHF transformation is done in those actually-deployed AMR-EFR hybrid encoders is a bit of a mystery. My first thought was to compare the speech parameters emitted by the AMR encoder against MR122 DHF, and on a match, replace that MR122-DHF parameter set with EFR DHF. This approach is implemented in the simple amr_dhf_subst_efr() function in libtwamr.

One distinctive signature of this approach is that the output of a hybrid encoder following this method can never equal MR122 DHF. This one particular bit pattern is precluded from the set of possible outputs under all conditions.

However, subsequent experiments quickly revealed that the logic implemented by the transcoder in the network of T-Mobile USA must be different. One of the counter-intuitive effects of the 5 ms artificial delay in the MR122 encoder shows up in the following situation:

* the encoder is in its homed state;
* you feed it an input frame whose first 120 samples are all 0x0008;
* some of the last 40 samples are different (as few as one or as many as all).

This frame does not meet the definition of EHF and won't be recognized as such, so the encoder won't get rehomed once again after processing it. Yet the output will be bit-exact MR122 DHF. How do those AMR-EFR hybrid encoders handle *this* case?
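
Two DHF substitution rules can be contrasted in a short C sketch: the first-cut rule described above, and a variant that additionally requires the input frame to be EHF. The function names dhf_subst_v1() and dhf_subst_v2() are hypothetical. They are not libtwamr's functions and do not use libtwamr's signatures. The DHF bit patterns are passed in by the caller instead of being hard-coded. The parameter count of 57 is assumed to match the MR122/EFR speech parameter set.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FRAME_LEN     160
#define EHF_SAMPLE    0x0008
#define MR122_NPARAMS 57	/* assumed MR122/EFR speech parameter count */

/* True iff all 160 input samples equal 0x0008, i.e., the frame is EHF */
static bool is_ehf(const int16_t pcm[FRAME_LEN])
{
	for (int i = 0; i < FRAME_LEN; i++)
		if (pcm[i] != EHF_SAMPLE)
			return false;
	return true;
}

static bool params_equal(const int16_t *a, const int16_t *b)
{
	return memcmp(a, b, MR122_NPARAMS * sizeof(int16_t)) == 0;
}

/* Hypothetical rule 1: substitute EFR DHF whenever output is MR122 DHF */
static void dhf_subst_v1(int16_t params[MR122_NPARAMS],
			 const int16_t mr122_dhf[MR122_NPARAMS],
			 const int16_t efr_dhf[MR122_NPARAMS])
{
	if (params_equal(params, mr122_dhf))
		memcpy(params, efr_dhf, MR122_NPARAMS * sizeof(int16_t));
}

/* Hypothetical rule 2: substitute only if the input frame was also EHF */
static void dhf_subst_v2(int16_t params[MR122_NPARAMS],
			 const int16_t mr122_dhf[MR122_NPARAMS],
			 const int16_t efr_dhf[MR122_NPARAMS],
			 const int16_t pcm_in[FRAME_LEN])
{
	if (is_ehf(pcm_in) && params_equal(params, mr122_dhf))
		memcpy(params, efr_dhf, MR122_NPARAMS * sizeof(int16_t));
}
```

Consider the near-EHF frame described above, which the encoder turns into MR122 DHF. The first rule would still emit EFR DHF for it, while the second rule leaves MR122 DHF in place. This difference is what makes the two rules distinguishable from the outside.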

Experiments on T-Mobile reveal that in the case in question, the encoded frame is emitted with the bit pattern of MR122 DHF, *not* transformed into EFR DHF. MR122-DHF output is impossible with an encoder that implements logic like our amr_dhf_subst_efr() first cut. We therefore know (by modus tollens) that T-Mobile's implementation uses some different logic.

Our new (current) working model is implemented in amr_dhf_subst_efr2(). It replaces the output of the AMR encoder with EFR DHF if the raw encoder output was MR122 DHF *and* the input frame was EHF. This version appears to match the observed behavior of T-Mobile USA so far.

EFR DHF in the decoder direction
================================

In all ETSI/3GPP-defined speech codecs, decoder homing works by two explicit checks against the known DHF bit pattern:

1. At the beginning of the decoder, a reduced check compares the input only up to the first subframe. If the decoder is homed and the input is DHF per this reduced check, the decoder artificially emits EHF, stays homed and does nothing more.

2. At the end of the decoder, a second similar check compares the full frame this time. On a match, it triggers the state reset function.

These checks are (and can only be) implemented by explicit comparison against a known hard-coded DHF pattern. Hence in the decoder case it doesn't matter whether the DHF is natural (as in all properly ETSI-defined codecs) or artificial (as in AMR-EFR hybrid).

Thus the "correct" handling of DHF in the AMR-EFR hybrid decoder is a matter of replacing the check against the MR122 DHF bit pattern with a check against the different bit pattern of EFR DHF. The decoder engine in libtwamr supports this different-DHF option for MR122 decoding by way of a bit set in the mode field in struct amr_param_frame. See the detailed description in AMR-library-API article.
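
The two decoder homing checks described above can be modeled in a hypothetical C sketch. This is not libtwamr code, and it makes three simplifying assumptions:

* speech decoding itself is replaced by a placeholder;
* the DHF pattern to check (MR122 or EFR) is selected once at setup through a pointer, instead of through the mode-field bit that libtwamr uses;
* the reduced check covers 18 parameters, namely 5 LSF indices plus 13 first-subframe parameters.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FRAME_LEN        160
#define EHF_SAMPLE       0x0008
#define NPARAMS           57	/* assumed MR122/EFR speech parameters */
#define FIRST_SF_PARAMS   18	/* assumed: 5 LSF indices + 13 for subframe 1 */

/* Hypothetical decoder-homing model; real speech decoding is not done */
struct dec_model {
	bool homed;		/* true after reset or after a full-DHF frame */
	const int16_t *dhf;	/* DHF pattern to check: MR122 or EFR */
};

static void dec_model_init(struct dec_model *d, const int16_t *dhf)
{
	d->homed = true;	/* a freshly reset decoder starts out homed */
	d->dhf = dhf;
}

/* Returns true if the frame was answered with EHF output by the early check */
static bool dec_model_frame(struct dec_model *d, const int16_t params[NPARAMS],
			    int16_t pcm_out[FRAME_LEN])
{
	int i;

	/* Reduced check at the beginning: parameters up to first subframe */
	if (d->homed &&
	    !memcmp(params, d->dhf, FIRST_SF_PARAMS * sizeof(int16_t))) {
		for (i = 0; i < FRAME_LEN; i++)
			pcm_out[i] = EHF_SAMPLE;
		return true;	/* stay homed and do nothing more */
	}

	/* Placeholder for actual speech decoding */
	memset(pcm_out, 0, FRAME_LEN * sizeof(int16_t));

	/* Full-frame check at the end; a real decoder resets its state here */
	d->homed = !memcmp(params, d->dhf, NPARAMS * sizeof(int16_t));
	return false;
}
```

The reduced check compares only a prefix of the full DHF pattern. A frame that fails it cannot pass the full-frame check either, so the model can fall through to decoding and then apply the full check unconditionally.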

Command line utilities for AMR-EFR hybrid
=========================================

The present package includes a small set of command line utilities that work with the AMR-EFR hybrid described above:

amrefr-encode-r
amrefr-decode-r

These two utilities function just like gsmefr-encode-r and gsmefr-decode-r, described in Codec-utils article, but implement the AMR-EFR hybrid version of the codec instead of original EFR. The no-DTX limitation applies: amrefr-encode-r lacks the -d option, and the input to amrefr-decode-r must not contain any SID frames.

amrefr-tseq-enc
amrefr-tseq-dec

These two utilities are AMR-EFR counterparts to the gsmefr-etsi-enc and gsmefr-etsi-dec test programs described in EFR-testing article. They pass all tests on the non-DTX t??_efr.* sequences in ETSI's amr122_efr.zip, but not on any of the DTX sequences included in the same ZIP. Just like amrefr-encode-r, amrefr-tseq-enc lacks the -d option, and amrefr-tseq-dec rejects input containing SID frames.