author Mychaela Falconia <falcon@freecalypso.org>
date Sun, 23 Apr 2023 04:09:24 +0000

Problem in need of solving
==========================

When the Themyscira libgsmefr project was undertaken (late 2022), no readily
available library solution existed for the GSM EFR codec.  The FOSS community
offers classic libgsm from the 1990s for the FR1 codec (it is an implementation
of GSM 06.10, on top of which we had to implement our own Rx DTX handler) and
opencore-amrnb for AMR (based on the Android OpenCORE framework), but nothing
for EFR.  This situation creates a problem for anyone seeking to deploy their
own GSM network with a voice interface to PSTN or other networks: such a voice
interface generally requires implementing a transcoder, which in turn requires
a library that implements each codec to be supported.  As things stood, anyone
wishing to implement a speech transcoder for GSM networks could easily support
the FR1 and AMR codecs, but not EFR.

EFR is more than just 12k2 mode of AMR!
=======================================

There is a common misconception in the GSM hacker community that EFR is nothing
but the highest 12k2 mode of AMR, and that any library that implements AMR,
such as opencore-amrnb, is thus sufficient to support EFR as well.  However,
the reality is more complex:

* If an AMR encoder operates with DTX disabled, such that the output contains
  only speech frames and no SID, and the mode is forced to 12k2, then indeed a
  simple reshuffling of bits will produce speech frames that can be fed to an
  EFR decoder on the other end.  Note that the two encoders (EFR and AMR 12k2)
  will produce *different* encoded speech parameters from the same input, and
  the decoded speech output on the other end will also be different, but the
  two versions are expected to be equally good for human ears.

* In the other direction, if an EFR input stream contains only good speech
  frames (no SID and no lost, FACCH-stolen or DTX-suppressed frames), one can
  likewise do a simple bit reordering and feed these frames to an AMR decoder.
  The output of this AMR decoder will once again be different from a proper
  (bit-exact) EFR decoder for the same speech parameter inputs, but as long as
  the EFR input stream is all good speech frames, the output will be good enough
  for human ears.

* The real problem occurs when the EFR input stream contains SID frames and BFI
  frame gaps, as will always happen in reality if this stream is an uplink from
  a GSM call.  The AMR SID mechanism differs from that of EFR, and an AMR
  decoder will NOT recognize EFR SID frames.  A quick experiment confirms that
  when a real GSM EFR uplink RTP capture is converted to AMR by non-SID-aware
  bit reshuffling and then fed to amrnb-dec from opencore-amrnb, unpleasant
  sounds appear in the output whenever GSM uplink goes into SID.
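
The three cases above can be summed up in a minimal C sketch: the bit
reshuffling is applied only to good speech frames, and everything else (SID,
BFI) is refused instead of being blindly converted.  All names here
(frame_kind, reshuffle_speech_frame, the map argument) are hypothetical and are
not part of libgsmefr or opencore-amrnb; the actual EFR-to-AMR bit order table
has to be taken from the specs and is not reproduced here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical frame classification; real code would derive it from
 * BFI signalling and from EFR SID code word detection. */
enum frame_kind { FRAME_SPEECH_GOOD, FRAME_SID, FRAME_BAD };

/* Read bit number pos from a packed buffer (bit 0 = MSB of byte 0). */
static int get_bit(const uint8_t *buf, unsigned pos)
{
    return (buf[pos >> 3] >> (7 - (pos & 7))) & 1;
}

/* Set bit number pos in a packed buffer (bit 0 = MSB of byte 0). */
static void set_bit(uint8_t *buf, unsigned pos)
{
    buf[pos >> 3] |= (uint8_t)(0x80 >> (pos & 7));
}

/*
 * Reorder nbits bits: output bit i is taken from input bit map[i].
 * The map is an assumption supplied by the caller.
 * Returns 0 on success, or -1 (leaving out untouched) if the frame is
 * not a good speech frame and thus cannot be converted this way.
 */
int reshuffle_speech_frame(enum frame_kind kind, const uint8_t *in,
                           uint8_t *out, const uint16_t *map,
                           unsigned nbits)
{
    unsigned i;

    if (kind != FRAME_SPEECH_GOOD)
        return -1;
    memset(out, 0, (nbits + 7) / 8);
    for (i = 0; i < nbits; i++)
        if (get_bit(in, map[i]))
            set_bit(out, i);
    return 0;
}
```

A caller that gets -1 back still has to do something sensible with that frame
(comfort noise generation, error concealment), which is exactly the part that
simple bit reshuffling cannot provide.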

EFR reference code from ETSI
============================

A published-source bit-exact implementation of GSM EFR encoder and decoder,
complete with all beyond-speech functions of DTX, VAD, comfort noise generation,
error concealment etc., does exist in the form of reference code from ETSI.
However, this code has never been turned into a usable codec library by anyone
prior to us (at least not by anyone who freely published their work), and doing
such librification (producing an EFR analogue to what Android OpenCORE people
did with AMR) is no easy feat!  The original EFR code from ETSI exhibits two
problems which need to be remedied in the librification project:

1) The original code maintains all codec state in global variables (lots of
   them) that are scattered throughout.  The 3GPP reference code for AMR
   (naturally later than EFR in chronological order) is better in this regard:
   there the global vars were gathered into structs, and pointers to these
   structs are passed around, although there are still many separately
   malloc'ed structs instead of a single unified encoder state and decoder
   state.  However, we need the EFR version for correct handling of all
   beyond-speech aspects, and this version is all global vars.

2) These reference codes from ETSI/3GPP (both EFR and AMR versions, it seems)
   were intended to serve as simulations, not as production code, and the code
   is very inefficient.

Themyscira libgsmefr
====================

Libgsmefr presented in this code repository is our current solution for EFR.
It is a library styled after classic libgsm for FR1, but its guts consist of a
librified derivative of ETSI EFR code.  The problem of global vars has been
solved in this library version - they've been gathered into one unified struct
for encoder state and another unified struct for decoder state - but the problem
of poor performance (significantly worse than opencore-amrnb) still remains for
now.
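
To illustrate the kind of transformation involved, here is a minimal C sketch of
moving state from global variables into one unified struct.  This is not the
actual libgsmefr code or API: the struct members and the demo_decoder_* names
are made up for illustration, and the "processing step" is a trivial stand-in
for real codec functions.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Before librification, state of this kind would live in file-scope
 * globals, for example (hypothetical names):
 *
 *     static int16_t synth_mem[10];
 *     static int16_t prev_gain;
 *
 * With globals, only one decoder instance can exist per process.
 */

/* After: one unified state struct per decoder instance. */
struct demo_decoder_state {
    int16_t synth_mem[10];  /* stand-in for filter memory */
    int16_t prev_gain;      /* stand-in for gain history */
};

/* Allocate a zero-initialized decoder state; returns NULL on failure. */
struct demo_decoder_state *demo_decoder_create(void)
{
    return calloc(1, sizeof(struct demo_decoder_state));
}

/*
 * Stand-in processing step: every function that used to touch globals
 * now takes the state pointer as its first argument.
 * Returns the average of the previous and the current gain.
 */
int16_t demo_decoder_step(struct demo_decoder_state *st, int16_t gain)
{
    int16_t smoothed = (int16_t)((st->prev_gain + gain) / 2);

    st->prev_gain = gain;
    return smoothed;
}
```

With state handled this way, a transcoder process can run any number of
independent encoder and decoder instances, one per call leg, each released
with a single free() call.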

Future roadmap
==============

If someone is implementing a DSP vocoder block for a GSM MS or a network-side
speech transcoder that needs to support all standard GSM codecs, at some point
they will need to implement both EFR and AMR.  Given the close relation between
these two codecs (they are not perfectly compatible, as explained above, but
they are still very closely related), keeping two entirely separate library
implementations for AMR and EFR will be very inefficient in the long run, and
getting them to perform equally well will be a nightmare.  It seems to me
(Mother Mychaela) that the correct solution will be to produce a single codec
library that implements both AMR and EFR, probably by starting with an AMR
library and extending it with special modes to handle those aspects where EFR
differs.  It is my forecast that we are going to end up doing something along
these lines in Themyscira, but it will be much later down the road; for the
time being, our initial version of ThemWi will support only FR1 and EFR, not
AMR.