diff doc/EFR-rationale @ 122:b33f2168fdec

doc/EFR-rationale article written
author Mychaela Falconia <falcon@freecalypso.org>
date Sat, 10 Dec 2022 08:51:01 +0000
parents
children 4af99bf8671a
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/EFR-rationale	Sat Dec 10 08:51:01 2022 +0000
@@ -0,0 +1,82 @@
+Problem in need of solving
+==========================
+
+When the Themyscira libgsmefr project was undertaken (late 2022), there was no
+readily available library solution for the GSM EFR codec.
+The community of FOSS offers classic libgsm from 1990s for FR1 codec (it's an
+implementation of GSM 06.10, on top of which we had to implement our own Rx DTX
+handler) and opencore-amrnb for AMR (based on Android OpenCORE framework) - but
+nothing for EFR.  This situation creates a problem for anyone seeking to deploy
+their own GSM network with a voice interface to PSTN or other networks: such
+voice interface generally requires implementing a transcoder, and doing the
+latter in turn requires a library that implements the codec to be supported.
+In the present situation, anyone who wishes to implement a speech transcoder
+for GSM networks can easily support FR1 and AMR codecs, but not EFR.
+
+EFR is more than just 12k2 mode of AMR!
+=======================================
+
+There is a common misconception in the GSM hacker community that EFR is nothing
+but the highest 12k2 mode of AMR, and that any library that implements AMR,
+such as opencore-amrnb, is thus sufficient to support EFR as well.  However,
+the reality is more complex:
+
+* If an AMR encoder operates with DTX disabled, such that the output contains
+  only speech frames and no SID, and the mode is forced to 12k2, then indeed a
+  simple reshuffling of bits will produce speech frames that can be fed to an
+  EFR decoder on the other end.  Note that the two encoders (EFR and AMR 12k2)
+  will produce *different* encoded speech parameters from the same input, and
+  the decoded speech output on the other end will also be different, but the
+  two versions are expected to be equally good for human ears.
+
+* In the other direction, if an EFR input stream contains only good speech
+  frames (no SID and no lost, FACCH-stolen or DTX-suppressed frames), one can
+  likewise do a simple bit reordering and feed these frames to an AMR decoder.
+  The output of this AMR decoder will once again be different from a proper
+  (bit-exact) EFR decoder for the same speech parameter inputs, but as long as
+  the EFR input stream is all good speech frames, the output will be good enough
+  for human ears.
+
+* The real problem occurs when the EFR input stream contains SID frames and BFI
+  frame gaps, as will always happen in reality if this stream is an uplink from
+  a GSM call.  AMR SID mechanism is different from that of EFR, and an AMR
+  decoder will NOT recognize EFR SID frames.  A quick experiment confirms that
+  when a real GSM EFR uplink RTP capture is converted to AMR by non-SID-aware
+  bit reshuffling and then fed to amrnb-dec from opencore-amrnb, unpleasant
+  sounds appear in the output whenever GSM uplink goes into SID.
+
+EFR reference code from ETSI
+============================
+
+A published-source bit-exact implementation of GSM EFR encoder and decoder,
+complete with all beyond-speech functions of DTX, VAD, comfort noise generation,
+error concealment etc., does exist in the form of reference code from ETSI.
+However, this code has never been turned into a usable codec library by anyone
+prior to us (at least not by anyone who freely published their work), and doing
+such librification (producing an EFR analogue to what Android OpenCORE people
+did with AMR) is no easy feat!  The original EFR code from ETSI exhibits two
+problems which need to be remedied in the librification project:
+
+1) The original code maintains all codec state in global variables (lots of
+   them) that are scattered throughout.  3GPP reference code for AMR (naturally
+   later than EFR in chronological order) is better in this regard: in the AMR
+   version they gathered their global vars into structs and pass pointers to
+   these structs, although still as many separately-malloc'ed structs instead
+   of a single unified encoder state and decoder state.  However, we need the
+   EFR version for correct handling of all beyond-speech aspects, and this
+   version is all global vars.
+
+2) These reference codes from ETSI/3GPP (both EFR and AMR versions, it seems)
+   were intended to serve as simulations, not as production code, and the code
+   is very inefficient.
+
+Themyscira libgsmefr
+====================
+
+Libgsmefr presented in this code repository is our current solution for EFR.
+It is a library styled after classic libgsm for FR1, but its guts consist of a
+librified derivative of ETSI EFR code.  The problem of global vars has been
+solved in this library version - they've been gathered into one unified struct
+for encoder state and another unified struct for decoder state - but the problem
+of poor performance (significantly worse than opencore-amrnb) still remains for
+now.
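
The article describes replacing global codec state with one unified state
struct per direction. The following is a minimal sketch of that pattern only,
written for illustration: every name in it (toy_decoder_state,
toy_decoder_create, toy_decoder_process, toy_decoder_free) is hypothetical and
is not the libgsmefr API, and the "decoder" is a toy that handles one sample
per call rather than real EFR frames.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Before librification, state of this kind would live in file-scope
 * globals, so only one decoder could exist per process, e.g.:
 *     static int16_t prev_sample;
 *     static int bad_frame_count;
 */

/* After: the same variables gathered into one struct per decoder instance.
 * All names here are hypothetical, not the libgsmefr API. */
struct toy_decoder_state {
    int16_t prev_sample;     /* last output sample, reused on bad frames */
    int     bad_frame_count; /* consecutive bad frames seen so far */
};

/* Allocate and zero one independent decoder instance (NULL on failure). */
struct toy_decoder_state *toy_decoder_create(void)
{
    struct toy_decoder_state *st = malloc(sizeof *st);
    if (st)
        memset(st, 0, sizeof *st);
    return st;
}

/* Process one toy "frame" (a single sample).  A good frame is passed
 * through; a bad frame (bfi != 0) repeats the previous output at half
 * amplitude.  All state is reached through the st pointer. */
int16_t toy_decoder_process(struct toy_decoder_state *st, int16_t in, int bfi)
{
    assert(st != NULL);
    if (bfi) {
        st->bad_frame_count++;
        st->prev_sample = (int16_t)(st->prev_sample / 2);
    } else {
        st->bad_frame_count = 0;
        st->prev_sample = in;
    }
    return st->prev_sample;
}

/* Release an instance created by toy_decoder_create(). */
void toy_decoder_free(struct toy_decoder_state *st)
{
    free(st);
}
```

Because each call receives its state pointer explicitly, a transcoder process
can run many independent calls side by side, which the all-globals layout
cannot do.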