FreeCalypso > hg > gsm-codec-lib
view doc/AMR-EFR-conversion @ 477:4c9222d95647
libtwamr encoder: always emit frame->mode = mode;
In the original implementation of amr_encode_frame(), the 'mode' member
of the output struct was set to 0xFF if the output frame type is TX_NO_DATA.
This design was made to mimic the mode field (16-bit word) being set to
0xFFFF (or -1) in 3GPP test sequence format - but nothing actually depends
on this struct member being set in any way, and amr_frame_to_tseq()
generates the needed 0xFFFF on its own, based on frame->type being equal
to TX_NO_DATA.
It is simpler and more efficient to always set frame->mode to the actual
encoding mode in amr_encode_frame(), and this new behavior has already
been documented in doc/AMR-library-API description in anticipation of
the present change.
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Sat, 18 May 2024 22:30:42 +0000 |
parents | 78739fda2856 |
Please see our AMR-EFR-philosophy article for an analysis of differences between
EFR and MR122 (12k2 mode of AMR), and for a discussion of how we handle the
relation between these two codecs. The following article was written in late
2022, before these issues were properly understood:

2022-December description
-------------------------

We have two simple utilities that allow one to experiment with "dumb"
bit-shuffling conversion between AMR 12k2 and EFR codec formats, to explore
capabilities and limitations of this approach.

gsm-amr2efr reads an AMR speech recording in RFC 4867 storage format (the common
.amr format) and converts it to EFR in gsmx format. The AMR input to this
utility must consist of MR122 frames only - no other AMR modes, no SID and no
NO_DATA gaps. The intent is that one can take a starting speech sample in WAV
format, encode it into AMR with amrnb-enc from opencore-amrnb (by default that
utility produces MR122 encoding without DTX), and then convert the AMR output to
EFR with gsm-amr2efr. One can then encode the same starting-point WAV speech
sample with gsmefr-encode (matching official EFR from ETSI) and compare the two
EFR outputs. When you do this experiment, you will see that the two EFR outputs
differ (you can then analyze encoded speech parameter diffs with gsmrec-dump),
but each version can be fed to an EFR decoder, resulting in OK-sounding speech.

gsm-efr2amr performs the opposite conversion: it reads an EFR session recording
in gsmx format and converts it to AMR storage format. The input to gsm-efr2amr
is allowed to contain Themyscira BFI markers in addition to EFR frames; these
BFI markers will be turned into AMR NO_DATA frames. The same input can also
contain EFR SID frames - however, gsm-efr2amr will not detect them and won't
give them any special handling; instead they will be bit-reshuffled into MR122
just like EFR speech frames.
The result of such "dumb" conversion is invalid AMR, and when you decode it with
amrnb-dec, you will hear some strange noises.

2024-April addendum
-------------------

In addition to the SID issue noted above (if the input to gsm-efr2amr contains
any SID frames, the output will be invalid AMR), these dumb conversion methods
fail to take action on any embedded decoder homing frames. The correct DHF is
different between EFR and MR122, hence a better converter could be made to
recognize EFR DHFs in EFR->AMR direction and convert them to MR122 DHF, and do
the opposite in AMR->EFR direction. However, the implementation of AMR in
libopencore-amrnb has the homing feature stripped out altogether, hence doing
DHF conversion would be pointless as long as amrnb-enc and amrnb-dec utilities
are involved.

Thoughts on more proper conversion
==================================

Imagine this hypothetical scenario: you operate a GSM network, and you
preferentially use EFR codec. You are then able to obtain TrFO interconnection
with some other mobile network of more "modern" kind, and that "modern" network
uses AMR exclusively, with no ability to use any GSM-only codecs. (The latter
situation holds for UMTS and VoLTE, for example.) Ordinarily, under these
circumstances TrFO won't be possible - instead you have to interconnect in
G.711, have each side transcode its respective codec, and put up with double
transcoding.

But what if the AMR side can be told to use MR122 only, without any of the lower
modes? Such an arrangement would make no sense in GSM (just use EFR instead and
save the headache of dealing with AMR), but it might be sensible to ask the
UMTS/VoLTE side for that MR122-only config of AMR-NB. In this hypothetical
scenario, would it be possible to pass speech frames transparently, doing only
the necessary bit reshuffling, and only invoke some slick innovative algorithm
during speech pauses to translate between EFR and AMR SID paradigms?

Right now this idea is fantasy only.
I don't know enough about VoLTE to tell whether or not an MR122-only config of
AMR-NB would work there, I have no idea what codec config VoLTE operators run
with currently when the other end of the call is G.711 PSTN, and there is very
little chance that any of the nation-scale mobile operators would agree to a
private peering interconnect with some tiny community GSM network - while
interconnection through fully public, open-to-everyone IP-PSTN routes allows
only G.711 and nothing else, no cellular TrFO.

Nonetheless, the idea of TrFO conversion between EFR and MR122-only AMR remains
interesting as a theoretical exercise, and we currently leave it there, just as
food for thought.
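
The MR122-only constraint comes up twice in this article: as the input
requirement of gsm-amr2efr, and as the hypothetical AMR config in the TrFO
thought experiment. The following is a minimal sketch, in Python, of how one
might check that a recording in RFC 4867 storage format satisfies it. This is
not how gsm-amr2efr is implemented; the names scan_amr and is_mr122_only are
made up for this illustration. The sketch assumes a single-channel AMR-NB file:
the "#!AMR\n" magic, then frames that each begin with a one-byte header
carrying the frame type in bits 6..3.

```python
import io

AMR_MAGIC = b"#!AMR\n"  # single-channel AMR-NB magic per RFC 4867

# Payload size in bytes (not counting the 1-byte frame header) for each
# AMR-NB frame type that may appear in a storage-format file.
PAYLOAD_BYTES = {0: 12, 1: 13, 2: 15, 3: 17, 4: 19, 5: 20, 6: 26, 7: 31,
                 8: 5,    # SID
                 15: 0}   # NO_DATA
FT_MR122 = 7


def scan_amr(stream):
    """Return the list of frame type codes found in an .amr byte stream.

    Hypothetical helper for illustration only.
    """
    if stream.read(len(AMR_MAGIC)) != AMR_MAGIC:
        raise ValueError("not a single-channel AMR-NB storage file")
    frame_types = []
    while True:
        header = stream.read(1)
        if not header:
            break  # clean end of file
        ft = (header[0] >> 3) & 0x0F  # frame type field, bits 6..3
        if ft not in PAYLOAD_BYTES:
            raise ValueError(f"unsupported frame type {ft}")
        payload = stream.read(PAYLOAD_BYTES[ft])
        if len(payload) != PAYLOAD_BYTES[ft]:
            raise ValueError("truncated frame")
        frame_types.append(ft)
    return frame_types


def is_mr122_only(stream):
    """True if the stream holds at least one frame and all are MR122."""
    frame_types = scan_amr(stream)
    return bool(frame_types) and all(ft == FT_MR122 for ft in frame_types)
```

To try it on a real recording, open the file in binary mode and pass the file
object, for example is_mr122_only(open("sample.amr", "rb")), where sample.amr
is a placeholder name. Any SID or NO_DATA frame, or any frame in a lower AMR
mode, makes the check return False.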