view doc/AMR-EFR-conversion @ 477:4c9222d95647

libtwamr encoder: always emit frame->mode = mode; In the original implementation of amr_encode_frame(), the 'mode' member of the output struct was set to 0xFF if the output frame type is TX_NO_DATA. This design was made to mimic the mode field (16-bit word) being set to 0xFFFF (or -1) in 3GPP test sequence format - but nothing actually depends on this struct member being set in any way, and amr_frame_to_tseq() generates the needed 0xFFFF on its own, based on frame->type being equal to TX_NO_DATA. It is simpler and more efficient to always set frame->mode to the actual encoding mode in amr_encode_frame(), and this new behavior has already been documented in doc/AMR-library-API description in anticipation of the present change.
author Mychaela Falconia <falcon@freecalypso.org>
date Sat, 18 May 2024 22:30:42 +0000
parents 78739fda2856

Please see our AMR-EFR-philosophy article for an analysis of differences between
EFR and MR122 (12k2 mode of AMR), and for a discussion of how we handle the
relation between these two codecs.  The following article was written in late
2022, before these issues were properly understood:

2022-December description
-------------------------

We have two simple utilities that allow one to experiment with "dumb" bit-
shuffling conversion between AMR 12k2 and EFR codec formats, to explore
capabilities and limitations of this approach.

gsm-amr2efr reads an AMR speech recording in RFC 4867 storage format (the common
.amr format) and converts it to EFR in gsmx format.  The AMR input to this
utility must consist of MR122 frames only - no other AMR modes, no SID and no
NO_DATA gaps.  The intent is that one can take a starting speech sample in WAV
format, encode it into AMR with amrnb-enc from opencore-amrnb (by default that
utility produces MR122 encoding without DTX), and then convert the AMR output to
EFR with gsm-amr2efr.  One can then encode the same starting-point WAV speech
sample with gsmefr-encode (matching official EFR from ETSI) and compare the two
EFR outputs.  When you do this experiment, you will see that the two EFR outputs
differ (you can then analyze the encoded speech parameter diffs with
gsmrec-dump), but each version can be fed to an EFR decoder, resulting in
OK-sounding speech.
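The "MR122 frames only" input requirement can be made concrete with a short check. The following minimal C sketch is not code from gsm-amr2efr itself. It walks an in-memory RFC 4867 single-channel storage file and accepts it only if every frame is a good-quality MR122 frame. The function name count_mr122_frames() is hypothetical. The format details it relies on are the "#!AMR\n" magic, the FT field in bits 6..3 of each frame header byte, the Q bit in bit 2, and the 31-byte payload that carries MR122's 244 bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 4867 single-channel AMR storage format: 6-byte magic, then frames */
static const char amr_magic[] = "#!AMR\n";

#define AMR_FT_MR122      7   /* frame type index of the 12k2 mode */
#define MR122_PAYLOAD_LEN 31  /* 244 speech bits padded to whole octets */

/*
 * Hypothetical helper: returns the number of MR122 frames in an in-memory
 * .amr file, or -1 if the magic is wrong, a frame is truncated, or any
 * frame is not a good-quality (Q=1) MR122 speech frame.
 */
long count_mr122_frames(const uint8_t *buf, size_t len)
{
    size_t pos = sizeof(amr_magic) - 1;
    long nframes = 0;

    if (len < pos || memcmp(buf, amr_magic, pos) != 0)
        return -1;
    while (pos < len) {
        uint8_t toc = buf[pos];
        unsigned ft = (toc >> 3) & 0x0F;  /* frame type, bits 6..3 */
        unsigned q = (toc >> 2) & 0x01;   /* frame quality indicator */

        if (ft != AMR_FT_MR122 || !q)
            return -1;  /* other AMR mode, SID, NO_DATA or bad frame */
        if (len - pos < 1 + MR122_PAYLOAD_LEN)
            return -1;  /* truncated final frame */
        pos += 1 + MR122_PAYLOAD_LEN;
        nframes++;
    }
    return nframes;
}
```

An empty file (magic only) passes with zero frames. Any SID or NO_DATA frame causes rejection, because the dumb reshuffling has nothing sensible to map it to.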

gsm-efr2amr performs the opposite conversion: it reads an EFR session recording
in gsmx format and converts it to AMR storage format.  The input to gsm-efr2amr
is allowed to contain Themyscira BFI markers in addition to EFR frames; these
BFI markers will be turned into AMR NO_DATA frames.  The same input can also
contain EFR SID frames - however, gsm-efr2amr will not detect them and won't
give them any special handling; instead they will be bit-reshuffled into MR122
just like EFR speech frames.  The result of such "dumb" conversion is invalid
AMR, and when you decode it with amrnb-dec, you will hear some strange noises.
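The record-level logic of such a dumb EFR->AMR conversion might look like the following C sketch, which is not the gsm-efr2amr source. Two parts of it are assumptions. First, the Themyscira BFI marker is taken to be a 2-byte record starting with 0xBF. Second, the EFR-to-MR122 bit permutation is passed in as a caller-supplied table perm[] and is not reproduced here. EFR frames are recognized by their RFC 3551 layout: 31 octets, with the 0xC signature in the top nibble. Like gsm-efr2amr, the sketch makes no attempt to recognize SID frames. They pass the EFR frame test and get reshuffled exactly like speech.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EFR_RTP_LEN    31    /* 0xC signature nibble + 244 bits */
#define MR122_BITS     244
#define AMR_TOC_MR122  0x3C  /* FT=7, Q=1 */
#define AMR_TOC_NODATA 0x7C  /* FT=15, Q=1 */

/* Read bit n (MSB-first numbering) of a byte buffer. */
static int get_bit(const uint8_t *buf, unsigned n)
{
    return (buf[n >> 3] >> (7 - (n & 7))) & 1;
}

/* Set bit n (MSB-first numbering) of a byte buffer that starts zeroed. */
static void set_bit(uint8_t *buf, unsigned n)
{
    buf[n >> 3] |= (uint8_t)(0x80 >> (n & 7));
}

/*
 * Hypothetical helper: convert one gsmx record to one AMR storage frame.
 * ASSUMPTION: a BFI marker is the 2-byte record 0xBF, TAF.
 * perm[i] (each entry 0..243) is the EFR bit that lands in MR122 storage
 * bit i; the real EFR-to-MR122 table is supplied by the caller.
 * Returns the number of bytes written to out (1 or 32), or -1.
 */
int efr_record_to_amr(const uint8_t *rec, size_t len,
                      const uint16_t perm[MR122_BITS], uint8_t out[32])
{
    if (len == 2 && rec[0] == 0xBF) {
        out[0] = AMR_TOC_NODATA;  /* BFI becomes NO_DATA */
        return 1;
    }
    if (len != EFR_RTP_LEN || (rec[0] & 0xF0) != 0xC0)
        return -1;
    /* no SID check here: SID frames are reshuffled like speech */
    out[0] = AMR_TOC_MR122;
    memset(out + 1, 0, 31);
    for (unsigned i = 0; i < MR122_BITS; i++) {
        /* EFR payload bits follow the 4-bit signature */
        if (get_bit(rec, 4 + perm[i]))
            set_bit(out + 1, i);
    }
    return 32;
}
```

With an identity table, the 244 EFR bits are simply shifted left past the 4 signature bits. A real converter would plug in the proper reordering table instead.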

2024-April addendum
-------------------

In addition to the SID issue noted above (if the input to gsm-efr2amr contains
any SID frames, the output will be invalid AMR), these dumb conversion methods
fail to take action on any embedded decoder homing frames.  The correct DHF is
different between EFR and MR122, hence a better converter could be made to
recognize EFR DHFs in EFR->AMR direction and convert them to MR122 DHF, and do
the opposite in AMR->EFR direction.  However, the implementation of AMR in
libopencore-amrnb has the homing feature stripped out altogether, hence doing
DHF conversion would be pointless as long as amrnb-enc and amrnb-dec utilities
are involved.
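The key point for the "better converter" is that the DHF check has to happen on the input codec's bits before any reshuffling. A reshuffled EFR DHF is not the MR122 DHF, so the converter must recognize the input DHF and emit the output codec's DHF verbatim. The minimal C sketch below covers that substitution step. The name dhf_substitute() is hypothetical. The two reference frames are parameters because this sketch does not reproduce the actual EFR and MR122 DHF bit patterns. Both are assumed to be len bytes long, which holds for a 31-byte RTP-format EFR frame and a 31-byte MR122 payload without its header byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: if frame matches the input codec's decoder homing
 * frame (src_dhf), overwrite it in place with the output codec's DHF
 * (dst_dhf). Returns 1 if a substitution was made, 0 otherwise; on 1 the
 * caller must emit the frame as-is instead of bit-reshuffling it.
 */
int dhf_substitute(uint8_t *frame, size_t len,
                   const uint8_t *src_dhf, const uint8_t *dst_dhf)
{
    if (memcmp(frame, src_dhf, len) != 0)
        return 0;  /* ordinary frame: reshuffle as usual */
    memcpy(frame, dst_dhf, len);
    return 1;
}
```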

Thoughts on more proper conversion
==================================

Imagine this hypothetical scenario: you operate a GSM network, and you
preferentially use EFR codec.  You are then able to obtain TrFO interconnection
with some other mobile network of more "modern" kind, and that "modern" network
uses AMR exclusively, with no ability to use any GSM-only codecs.  (The latter
situation holds for UMTS and VoLTE, for example.)  Ordinarily, under these
circumstances TrFO won't be possible - instead you have to interconnect in
G.711, have each side transcode its respective codec, and put up with double
transcoding.  But what if the AMR side can be told to use MR122 only, without
any of the lower modes?  Such an arrangement would make no sense in GSM (just use
EFR instead and save the headache of dealing with AMR), but it might be sensible
to ask the UMTS/VoLTE side for that MR122-only config of AMR-NB.

In this hypothetical scenario, would it be possible to pass speech frames
transparently, doing only the necessary bit reshuffling, and only invoke some
slick innovative algorithm during speech pauses to translate between EFR and
AMR SID paradigms?
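The division of labor in that hypothetical gateway can be sketched as a per-frame dispatch on the AMR side, covering the AMR->EFR direction only. This C sketch classifies each frame by the FT field of its RFC 4867 header byte:

- FT 7 (MR122 speech) passes through with bit reshuffling only.
- FT 8 (AMR SID) and FT 15 (NO_DATA) are the speech-pause cases, where the still-hypothetical SID translation algorithm would have to step in.
- Any other mode means the far end is not honoring the MR122-only config.

The enum and function names are made up for illustration. The Q bit is ignored for brevity.

```c
#include <stdint.h>

/* Hypothetical per-frame actions of an MR122-only AMR -> EFR TrFO gateway */
enum trfo_action {
    TRFO_PASS_SPEECH,  /* MR122 speech: bit-reshuffle to EFR, no transcoding */
    TRFO_PAUSE,        /* AMR SID or NO_DATA: hand off to SID translation */
    TRFO_REJECT        /* any other AMR mode violates the MR122-only config */
};

/* Classify one AMR frame by the FT field (bits 6..3) of its header byte. */
enum trfo_action trfo_classify_amr(uint8_t toc)
{
    unsigned ft = (toc >> 3) & 0x0F;

    if (ft == 7)
        return TRFO_PASS_SPEECH;
    if (ft == 8 || ft == 15)
        return TRFO_PAUSE;
    return TRFO_REJECT;
}
```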

Right now this idea is fantasy only.  I don't know enough about VoLTE to tell
whether or not an MR122-only config of AMR-NB would work there, I have no idea
what codec config VoLTE operators run with currently when the other end of the
call is G.711 PSTN, and there is very little chance that any of the nation-scale
mobile operators would agree to a private peering interconnect with some tiny
community GSM network - while interconnection through fully public, open-to-
everyone IP-PSTN routes allows only G.711 and nothing else, no cellular TrFO.

Nonetheless, the idea of TrFO conversion between EFR and MR122-only AMR remains
interesting as a theoretical exercise, and we currently leave it there, just as
food for thought.