author Mychaela Falconia <falcon@freecalypso.org>
date Thu, 13 Feb 2025 10:02:45 +0000

Please see our AMR-EFR-philosophy article for an analysis of differences between
EFR and MR122 (12k2 mode of AMR), and for a discussion of how we handle the
relation between these two codecs.  The following article was written in late
2022, before these issues were properly understood:

2022-December description
-------------------------

We have two simple utilities that allow one to experiment with "dumb" bit-
shuffling conversion between AMR 12k2 and EFR codec formats, to explore
capabilities and limitations of this approach.
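At its core, such "dumb" conversion is a fixed permutation of the 244 speech
bits between the EFR and MR122 packed formats.  The C fragment below is a
minimal sketch of that operation, not code taken from gsm-amr2efr or
gsm-efr2amr.  The function names are hypothetical, bits are assumed to be
numbered MSB-first from the start of each buffer, and the actual permutation
table is not reproduced here.  That table would be derived from the MR122
bit-ordering table in 3GPP TS 26.101 and from the EFR parameter order.

```c
#include <stddef.h>
#include <stdint.h>

/* Read bit number n (0 = MSB of byte 0) from a packed bit array. */
static int get_bit(const uint8_t *buf, unsigned n)
{
    return (buf[n >> 3] >> (7 - (n & 7))) & 1;
}

/* Set bit number n (0 = MSB of byte 0) of a packed bit array to val. */
static void put_bit(uint8_t *buf, unsigned n, int val)
{
    uint8_t mask = (uint8_t)(0x80u >> (n & 7));

    if (val)
        buf[n >> 3] |= mask;
    else
        buf[n >> 3] &= (uint8_t)~mask;
}

/*
 * Hypothetical "dumb" reshuffle: output bit i is copied from input bit
 * order[i].  in_off and out_off skip leading header bits in either buffer.
 * order[] must be a permutation of 0..nbits-1; for EFR <-> MR122 nbits
 * would be 244, with a table taken from the codec specifications.
 */
void reshuffle_bits(const uint8_t *in, unsigned in_off,
                    uint8_t *out, unsigned out_off,
                    const uint16_t *order, unsigned nbits)
{
    for (unsigned i = 0; i < nbits; i++)
        put_bit(out, out_off + i, get_bit(in, in_off + order[i]));
}
```

The offsets let the same routine skip a leading header, for example the 4-bit
0xC signature that precedes the 244 bits in the RTP payload format for EFR.
Whether gsmx files use exactly that layout is an assumption here.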

gsm-amr2efr reads an AMR speech recording in RFC 4867 storage format (the common
.amr format) and converts it to EFR in gsmx format.  The AMR input to this
utility must consist of MR122 frames only - no other AMR modes, no SID and no
NO_DATA gaps.  The intent is that one can take a starting speech sample in WAV
format, encode it into AMR with amrnb-enc from opencore-amrnb (by default that
utility produces MR122 encoding without DTX), and then convert the AMR output to
EFR with gsm-amr2efr.  One can then encode the same starting-point WAV speech
sample with gsmefr-encode (matching official EFR from ETSI) and compare the two
EFR outputs.  When you do this experiment, you will see that the two EFR outputs
differ (you can then analyze encoded speech parameter diffs with
gsmrec-dump), but each version can be fed to an EFR decoder, resulting in
OK-sounding speech.
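The MR122-only restriction on gsm-amr2efr input is easy to check mechanically.
The sketch below is our own illustration with a hypothetical function name,
not the utility's actual code.  It walks an in-memory .amr file according to
the single-channel storage format of RFC 4867.  That format starts with the
6-byte magic "#!AMR\n".  Each frame then consists of a one-octet header, with
the 4-bit frame type FT in bits 6..3, followed by the speech bits padded to
whole octets.  MR122 is FT 7 with 244 bits, i.e., 31 data octets.  Any other
frame type, including SID (FT 8) and NO_DATA (FT 15), makes the input
unacceptable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AMR_MAGIC      "#!AMR\n"
#define AMR_MAGIC_LEN  6
#define AMR_FT_MR122   7
#define MR122_DATA_LEN 31   /* 244 speech bits padded to whole octets */

/*
 * Returns the number of MR122 frames in an in-memory .amr file, or -1 if
 * the file is not a single-channel AMR file made solely of MR122 frames
 * (bad magic, any other frame type, or a truncated last frame).
 * The Q bit is not examined.
 */
long count_mr122_frames(const uint8_t *file, size_t len)
{
    size_t pos = AMR_MAGIC_LEN;
    long nframes = 0;

    if (len < AMR_MAGIC_LEN || memcmp(file, AMR_MAGIC, AMR_MAGIC_LEN) != 0)
        return -1;
    while (pos < len) {
        unsigned ft = (file[pos] >> 3) & 0x0F;  /* frame type field */

        if (ft != AMR_FT_MR122)
            return -1;
        if (len - pos < 1 + MR122_DATA_LEN)
            return -1;                          /* truncated frame */
        pos += 1 + MR122_DATA_LEN;
        nframes++;
    }
    return nframes;
}
```

Because the Q bit is ignored, frames flagged as damaged (Q=0) still pass.  A
stricter check could reject them as well.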

gsm-efr2amr performs the opposite conversion: it reads an EFR session recording
in gsmx format and converts it to AMR storage format.  The input to gsm-efr2amr
is allowed to contain Themyscira BFI markers in addition to EFR frames; these
BFI markers will be turned into AMR NO_DATA frames.  The same input can also
contain EFR SID frames - however, gsm-efr2amr will not detect them and won't
give them any special handling; instead they will be bit-reshuffled into MR122
just like EFR speech frames.  The result of such "dumb" conversion is invalid
AMR, and when you decode it with amrnb-dec, you will hear some strange noises.
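The per-frame output logic of such a converter can be sketched as follows.
This is an illustration under stated assumptions, not gsm-efr2amr's actual
code.  The input classification enum and the function names are hypothetical,
and the 31 MR122 data octets are assumed to have already been produced by the
bit reshuffle.  The Q bit is set in both frame headers, which gives header
octets 0x3C for MR122 and 0x7C for NO_DATA.  Nothing here looks for the EFR
SID codeword, which is exactly why SID frames come out as bogus MR122.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification of one gsmx input record. */
enum efr_input_kind {
    EFR_INPUT_FRAME,   /* any EFR frame: speech or SID, not distinguished */
    EFR_INPUT_BFI      /* Themyscira BFI marker */
};

#define AMR_FT_MR122    7
#define AMR_FT_NO_DATA  15

/* RFC 4867 storage frame header: P(1) FT(4) Q(1) P(2), with Q set to 1. */
static uint8_t amr_frame_header(unsigned ft)
{
    return (uint8_t)((ft << 3) | 0x04);
}

/*
 * Emits one AMR storage-format frame into out[] and returns its length.
 * For EFR_INPUT_FRAME, mr122_bits must hold the 31 already-reshuffled
 * MR122 data octets; for EFR_INPUT_BFI it is ignored.
 */
size_t emit_amr_frame(enum efr_input_kind kind,
                      const uint8_t mr122_bits[31], uint8_t out[32])
{
    if (kind == EFR_INPUT_BFI) {
        out[0] = amr_frame_header(AMR_FT_NO_DATA);  /* BFI -> NO_DATA */
        return 1;                                   /* no payload */
    }
    /* Dumb path: EFR SID frames are NOT detected and go out as "MR122". */
    out[0] = amr_frame_header(AMR_FT_MR122);
    memcpy(out + 1, mr122_bits, 31);
    return 32;
}
```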

2024-April addendum
-------------------

In addition to the SID issue noted above (if the input to gsm-efr2amr contains
any SID frames, the output will be invalid AMR), these dumb conversion methods
fail to take action on any embedded decoder homing frames.  The correct DHF is
different between EFR and MR122, hence a better converter could be made to
recognize EFR DHFs in the EFR->AMR direction and convert them to MR122 DHF, and
do the opposite in the AMR->EFR direction.  However, the implementation of AMR in
libopencore-amrnb has the homing feature stripped out altogether, hence doing
DHF conversion would be pointless as long as amrnb-enc and amrnb-dec utilities
are involved.
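Such a better converter would check each frame for the source codec's DHF
before reshuffling, and would substitute the destination codec's DHF on a
match.  A minimal sketch follows.  The function name is hypothetical, and the
actual EFR and MR122 DHF bit patterns are deliberately passed in as parameters
rather than written out, because they have to be taken from the respective
specifications.  Comparing packed frames byte for byte is a simplification on
our part; a real implementation might compare decoded codec parameters
instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * If frame[] (len octets) is bit-exactly the source codec's decoder homing
 * frame src_dhf, return dst_dhf, the pre-packed DHF of the destination
 * codec, which the caller should emit verbatim.  Otherwise return NULL,
 * meaning "bit-reshuffle this frame as usual".
 */
const uint8_t *map_dhf(const uint8_t *frame, size_t len,
                       const uint8_t *src_dhf, const uint8_t *dst_dhf)
{
    if (memcmp(frame, src_dhf, len) == 0)
        return dst_dhf;
    return NULL;
}
```

The same function serves both directions.  For EFR->AMR, pass the EFR DHF as
src_dhf and the MR122 DHF as dst_dhf.  For AMR->EFR, swap the two.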

Thoughts on more proper conversion
==================================

Imagine this hypothetical scenario: you operate a GSM network, and you
preferentially use EFR codec.  You are then able to obtain TrFO interconnection
with some other mobile network of more "modern" kind, and that "modern" network
uses AMR exclusively, with no ability to use any GSM-only codecs.  (The latter
situation holds for UMTS and VoLTE, for example.)  Ordinarily, under these
circumstances TrFO won't be possible - instead you have to interconnect in
G.711, have each side transcode its respective codec, and put up with double
transcoding.  But what if the AMR side can be told to use MR122 only, without
any of the lower modes?  Such an arrangement would make no sense in GSM (just use
EFR instead and save the headache of dealing with AMR), but it might be sensible
to ask the UMTS/VoLTE side for that MR122-only config of AMR-NB.

In this hypothetical scenario, would it be possible to pass speech frames
transparently, doing only the necessary bit reshuffling, and only invoke some
slick innovative algorithm during speech pauses to translate between EFR and
AMR SID paradigms?
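Purely to make the idea concrete, the per-frame decision in such a TrFO
converter could look like the small state machine sketched below.  Everything
here is hypothetical: the frame classes and action names are our own, and so
is the rule that empty slots and erasures inside a pause go to the CN
translator.  The "slick innovative algorithm" itself (the SID translator) is
not sketched at all.

```c
#include <stdbool.h>

/* Hypothetical classes of incoming frames on either side of the link. */
enum frame_class {
    FC_SPEECH,    /* good speech frame: EFR or MR122 */
    FC_SID,       /* EFR SID, or AMR SID_FIRST/SID_UPDATE */
    FC_NO_DATA,   /* nothing transmitted in this frame slot */
    FC_BAD        /* BFI / erased frame */
};

enum action {
    ACT_RESHUFFLE,        /* pass through with bit reordering only */
    ACT_TRANSLATE_CN,     /* hand to the (not yet existing) SID translator */
    ACT_FORWARD_ERASURE   /* forward as BFI or NO_DATA */
};

struct trfo_state {
    bool in_pause;        /* true between a SID and the next speech frame */
};

enum action trfo_dispatch(struct trfo_state *st, enum frame_class fc)
{
    switch (fc) {
    case FC_SPEECH:
        st->in_pause = false;      /* speech resumes: back to pass-through */
        return ACT_RESHUFFLE;
    case FC_SID:
        st->in_pause = true;       /* a SID starts or continues a pause */
        return ACT_TRANSLATE_CN;
    case FC_NO_DATA:
    case FC_BAD:
    default:
        /* Inside a pause, gaps and erasures belong to the CN translator;
         * during speech they are simply forwarded as erasures. */
        return st->in_pause ? ACT_TRANSLATE_CN : ACT_FORWARD_ERASURE;
    }
}
```

With only a pause flag, speech frames always take the cheap bit-reshuffle
path, and all of the differences between the EFR and AMR SID paradigms stay
confined to the translator.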

Right now this idea is fantasy only.  I don't know enough about VoLTE to tell
whether or not an MR122-only config of AMR-NB would work there, I have no idea
what codec config VoLTE operators run with currently when the other end of the
call is G.711 PSTN, and there is very little chance that any of the nation-scale
mobile operators would agree to a private peering interconnect with some tiny
community GSM network - while interconnection through fully public, open-to-
everyone IP-PSTN routes allows only G.711 and nothing else, no cellular TrFO.

Nonetheless, the idea of TrFO conversion between EFR and MR122-only AMR remains
interesting as a theoretical exercise, and we currently leave it there, just as
food for thought.