# HG changeset patch
# User Mychaela Falconia
# Date 1670662261 0
# Node ID b33f2168fdec5d528dfe890bf072d8a2017d5464
# Parent b51295fcbbae9685aaed2d28a2c3645121268163
doc/EFR-rationale article written

diff -r b51295fcbbae -r b33f2168fdec doc/EFR-rationale
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/EFR-rationale Sat Dec 10 08:51:01 2022 +0000
@@ -0,0 +1,82 @@
+Problem in need of solving
+==========================
+
+When the Themyscira libgsmefr project was undertaken (late 2022), there was no
+readily available library implementation of the GSM EFR codec. The FOSS
+community offers the classic libgsm from the 1990s for the FR1 codec (an
+implementation of GSM 06.10, on top of which we had to implement our own Rx DTX
+handler) and opencore-amrnb for AMR (based on the Android OpenCORE framework),
+but nothing for EFR. This situation creates a problem for anyone seeking to
+deploy their own GSM network with a voice interface to the PSTN or other
+networks: such an interface generally requires a transcoder, and a transcoder
+in turn requires a library that implements each codec to be supported. In the
+absence of an EFR library, anyone who wishes to implement a speech transcoder
+for GSM networks can easily support the FR1 and AMR codecs, but not EFR.
+
+EFR is more than just 12k2 mode of AMR!
+=======================================
+
+There is a common misconception in the GSM hacker community that EFR is nothing
+but the highest 12k2 mode of AMR, and that any library that implements AMR,
+such as opencore-amrnb, is thus sufficient to support EFR as well. However,
+the reality is more complex:
+
+* If an AMR encoder operates with DTX disabled, such that the output contains
+  only speech frames and no SID, and the mode is forced to 12k2, then indeed a
+  simple reshuffling of bits will produce speech frames that can be fed to an
+  EFR decoder on the other end. Note that the two encoders (EFR and AMR 12k2)
+  will produce *different* encoded speech parameters from the same input, and
+  the decoded speech output on the other end will also be different, but the
+  two versions are expected to be equally good to human ears.
+
+* In the other direction, if an EFR input stream contains only good speech
+  frames (no SID and no lost, FACCH-stolen or DTX-suppressed frames), one can
+  likewise do a simple bit reordering and feed these frames to an AMR decoder.
+  The output of this AMR decoder will once again differ from that of a proper
+  (bit-exact) EFR decoder given the same speech parameters, but as long as the
+  EFR input stream consists entirely of good speech frames, the output will be
+  good enough for human ears.
+
+* The real problem arises when the EFR input stream contains SID frames and BFI
+  frame gaps, as will always happen in practice if this stream is the uplink of
+  a GSM call. The AMR SID mechanism is different from that of EFR, and an AMR
+  decoder will NOT recognize EFR SID frames. A quick experiment confirms this:
+  when a real GSM EFR uplink RTP capture is converted to AMR by non-SID-aware
+  bit reshuffling and then fed to amrnb-dec from opencore-amrnb, unpleasant
+  sounds appear in the output whenever the GSM uplink goes into SID.
+
+EFR reference code from ETSI
+============================
+
+A published-source, bit-exact implementation of the GSM EFR encoder and
+decoder, complete with all beyond-speech functions (DTX, VAD, comfort noise
+generation, error concealment etc.), does exist in the form of reference code
+from ETSI. However, nobody before us had turned this code into a usable codec
+library (at least nobody who freely published their work), and such
+librification (producing an EFR analogue of what the Android OpenCORE people
+did with AMR) is no easy feat! The original EFR code from ETSI has two problems
+that need to be remedied in the librification project:
+
+1) The original code keeps all codec state in global variables (lots of them)
+   scattered throughout. The 3GPP reference code for AMR (which naturally came
+   later than EFR) is better in this regard: there the global vars have been
+   gathered into structs and pointers to these structs are passed around,
+   although it still uses many separately malloc'ed structs rather than a
+   single unified encoder state and decoder state. But we need the EFR version
+   for correct handling of all beyond-speech aspects, and that version is all
+   global vars.
+
+2) This reference code from ETSI/3GPP (both the EFR and AMR versions, it seems)
+   was intended to serve as a simulation, not as production code, and it is
+   very inefficient.
+
+Themyscira libgsmefr
+====================
+
+Libgsmefr, presented in this code repository, is our current solution for EFR.
+It is a library styled after the classic libgsm for FR1, but its guts consist
+of a librified derivative of the ETSI EFR code. The problem of global vars has
+been solved in this version: they have been gathered into one unified struct
+for encoder state and another unified struct for decoder state. The problem of
+poor performance (significantly worse than opencore-amrnb), however, remains
+for now.
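The librification step the article describes is gathering the ETSI code's scattered global variables into one state struct per direction and passing a pointer to it on every call. That pattern can be sketched in C roughly as follows. This is a minimal, hypothetical illustration. The names `toy_dec_state`, `toy_dec_create`, `toy_dec_frame` and `toy_dec_destroy`, the two state fields, and the halve-the-last-gain "concealment" are all invented for the example. They are not the libgsmefr API and not the EFR algorithm.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical sketch, NOT the libgsmefr API: names and fields are invented.
 *
 * ETSI-style "before": state lives in file-scope globals, e.g.
 *     static int16_t past_gain;
 *     static int16_t prev_bfi;
 * which allows only one decoder instance per process.
 *
 * Librified "after": the same variables gathered into one struct that the
 * caller allocates per channel and passes to every call.
 */
struct toy_dec_state {
    int16_t past_gain;     /* stand-in for a gain-history variable */
    int16_t prev_bfi;      /* stand-in for a "previous frame was bad" flag */
    uint32_t frame_count;  /* frames processed by this instance */
};

/* Allocate one decoder instance. calloc zeroes every field, which is the
 * initial state of this toy decoder (a real codec may need other values). */
struct toy_dec_state *toy_dec_create(void)
{
    return calloc(1, sizeof(struct toy_dec_state));
}

void toy_dec_destroy(struct toy_dec_state *st)
{
    free(st);
}

/* Process one frame. Only *st is read or written, so independent
 * instances never interfere with each other. */
int16_t toy_dec_frame(struct toy_dec_state *st, int16_t gain, int bfi)
{
    int16_t out;

    if (bfi)
        out = (int16_t)(st->past_gain / 2);  /* toy "concealment": halve last gain */
    else
        out = gain;

    st->past_gain = out;
    st->prev_bfi = (int16_t)(bfi != 0);
    st->frame_count++;
    return out;
}
```

Every call touches only the struct it is handed. A transcoder can therefore keep one encoder state and one decoder state per call leg, whereas file-scope globals allow only one codec instance per process.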