changeset 476:c84bf526c7eb (FreeCalypso gsm-codec-lib)
beginning of libtwamr documentation
author:   Mychaela Falconia <falcon@freecalypso.org>
date:     Sat, 18 May 2024 21:22:07 +0000
parents:  e512f0d25409
children: 4c9222d95647
files:    doc/AMR-library-API doc/AMR-library-desc
diffstat: 2 files changed, 340 insertions(+), 0 deletions(-)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/AMR-library-API Sat May 18 21:22:07 2024 +0000
@@ -0,0 +1,252 @@
+Libtwamr general usage
+======================
+
+The external public interface to Themyscira libtwamr consists of a single
+header file <tw_amr.h>; it should be installed in some system include
+directory.
+
+The dialect of C used by all Themyscira GSM codec libraries is ANSI C (with
+function prototypes). The const qualifier is used where appropriate, and the
+interface is defined in terms of <stdint.h> types (<tw_amr.h> itself includes
+<stdint.h>).
+
+Public #define constant definitions
+===================================
+
+The libtwamr public API header file <tw_amr.h> defines these constants:
+
+#define AMR_MAX_PRM       57   /* max. num. of params */
+#define AMR_IETF_MAX_PL   32   /* max bytes in RFC 4867 frame */
+#define AMR_IETF_HDR_LEN  6    /* .amr file header bytes */
+#define AMR_COD_WORDS     250  /* # of words in 3GPP test seq format */
+
+Explanation:
+
+* AMR_MAX_PRM is the maximum number of broken-down speech parameters, reached
+  in the highest AMR mode (12k2, MR122); this definition is needed for
+  struct amr_param_frame, covered later in this document.
+
+* AMR_IETF_MAX_PL is the size of the output buffer that must be provided for
+  amr_frame_to_ietf(); it is also the size most applications will likely use
+  for the staging buffer in which they gather input for amr_frame_from_ietf().
+
+* AMR_IETF_HDR_LEN is the size of the amr_file_header_magic[] public const
+  datum, covered later in this document; any application that reads or writes
+  the fixed header at the beginning of .amr files will also need this
+  constant.
+
+* AMR_COD_WORDS is the number of 16-bit words in one encoded frame in 3GPP
+  test sequence format (.cod); the public definition is needed for sizing the
+  arrays used with the amr_frame_to_tseq() and amr_frame_from_tseq() API
+  functions.
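The following is a minimal sketch of where AMR_IETF_HDR_LEN and AMR_IETF_MAX_PL come into play when reading an .amr file. The C code checks the 6-byte file header and derives the length of each stored frame from its ToC octet. It is self-contained and does not call libtwamr. The header bytes ("#!AMR" followed by a newline) and the per-frame-type lengths come from RFC 4867 (octet-aligned storage format), not from <tw_amr.h>. The helpers amr_header_ok() and amr_frame_len_from_toc() are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Value as defined in <tw_amr.h>. */
#define AMR_IETF_HDR_LEN 6

/* Assumption: the single-channel AMR storage-format magic from RFC 4867,
 * section 5.  libtwamr's amr_file_header_magic[] presumably holds the same
 * 6 bytes, but this sketch keeps its own local copy. */
static const uint8_t amr_magic_sketch[AMR_IETF_HDR_LEN] = {
	'#', '!', 'A', 'M', 'R', '\n'
};

/* Hypothetical helper: returns 1 if buf begins with the .amr header. */
int amr_header_ok(const uint8_t *buf, size_t len)
{
	return len >= AMR_IETF_HDR_LEN &&
	       memcmp(buf, amr_magic_sketch, AMR_IETF_HDR_LEN) == 0;
}

/* Total storage-format frame length in bytes, ToC octet included, indexed
 * by the 4-bit frame type (FT) field; 0 marks FT values this sketch does
 * not handle (9-11 are SIDs of other codecs, 12-14 are reserved). */
static const uint8_t amr_frame_len_by_ft[16] = {
	13, 14, 16, 18, 20, 21, 27, 32,	/* FT 0-7: MR475 .. MR122 */
	6,				/* FT 8: AMR SID (MRDTX) */
	0, 0, 0, 0, 0, 0,		/* FT 9-14 */
	1				/* FT 15: NO_DATA */
};

/* Hypothetical helper: in the octet-aligned ToC octet, bit 7 is F,
 * bits 6..3 are FT, bit 2 is Q and bits 1..0 are padding. */
unsigned amr_frame_len_from_toc(uint8_t toc)
{
	return amr_frame_len_by_ft[(toc >> 3) & 0x0F];
}
```

For example, the ToC octet 0x3C (FT 7, i.e. MR122, with the Q bit set) gives 32 bytes. That is the largest value in the table and exactly AMR_IETF_MAX_PL, so a staging buffer of that size can hold any single frame before it is handed to amr_frame_from_ietf().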
+
+Libtwamr enumerated types
+=========================
+
+The libtwamr public API header file <tw_amr.h> defines these 3 enums:
+
+enum RXFrameType {
+	RX_SPEECH_GOOD = 0,
+	RX_SPEECH_DEGRADED,
+	RX_ONSET,
+	RX_SPEECH_BAD,
+	RX_SID_FIRST,
+	RX_SID_UPDATE,
+	RX_SID_BAD,
+	RX_NO_DATA,
+	RX_N_FRAMETYPES		/* number of frame types */
+};
+
+enum TXFrameType {
+	TX_SPEECH_GOOD = 0,
+	TX_SID_FIRST,
+	TX_SID_UPDATE,
+	TX_NO_DATA,
+	TX_SPEECH_DEGRADED,
+	TX_SPEECH_BAD,
+	TX_SID_BAD,
+	TX_ONSET,
+	TX_N_FRAMETYPES		/* number of frame types */
+};
+
+enum Mode {
+	MR475 = 0,
+	MR515,
+	MR59,
+	MR67,
+	MR74,
+	MR795,
+	MR102,
+	MR122,
+	MRDTX
+};
+
+Rx and Tx frame types are as defined by 3GPP, and the numeric values assigned
+to each type are the same as those used by the official TS 26.073 encoder and
+decoder programs. Note that Rx and Tx frame types are NOT equal: the same
+frame type name maps to different numeric values in the two enums (for
+example, TX_SID_FIRST is 1 whereas RX_SID_FIRST is 4), so values of one enum
+must never be used where the other is expected.
+
+enum Mode should be self-explanatory: it covers the 8 codec modes of AMR,
+from 4.75 kbit/s (MR475) to 12.2 kbit/s (MR122), plus the pseudo-mode MRDTX
+used for packing and format manipulation of SID frames.
+
+State allocation and freeing
+============================
+
+To use the AMR encoder, you need to allocate an encoder state structure; to
+use the AMR decoder, you need to allocate a decoder state structure. The
+state allocation functions are:
+
+struct amr_encoder_state *amr_encoder_create(int dtx, int use_vad2);
+struct amr_decoder_state *amr_decoder_create(void);
+
+struct amr_encoder_state and struct amr_decoder_state are opaque structures to
+library users: you only get pointers which you remember and pass around, but
+<tw_amr.h> does not give you full definitions of these structs. As a library
+user, you don't even get to know the size of these structs, hence the necessary
+malloc() operation happens inside amr_encoder_create() and amr_decoder_create().
+However, each structure is malloc'ed as a single chunk, so when you are done
+with it, simply call free() to release that encoder or decoder state
+instance.
+
+The amr_encoder_create() and amr_decoder_create() functions can fail if the
+malloc() call inside them fails, in which case they return NULL.
+
+The dtx argument to amr_encoder_create() is a Boolean flag represented as an
+int; it tells the AMR encoder whether it should operate with DTX enabled or
+disabled. (Note that DTX is also called SCR for Source-Controlled Rate in some
+AMR specs.) The use_vad2 argument is another Boolean flag, also represented as
+an int; it tells the AMR encoder to use the VAD2 algorithm instead of VAD1.
+Having both VAD versions included and selectable at run time is a novel
+feature of libtwamr; see the AMR-library-desc article for the details.
+
+State reset functions
+---------------------
+
+The state of an already-allocated AMR encoder or AMR decoder can be reset at
+any time with these functions:
+
+void amr_encoder_reset(struct amr_encoder_state *st, int dtx, int use_vad2);
+void amr_decoder_reset(struct amr_decoder_state *st);
+
+Note that the two extra arguments to amr_encoder_reset() are the same as the
+arguments to amr_encoder_create(): the reset operation is complete, leaving
+the state exactly as a fresh create would. In fact, amr_encoder_create() is a
+wrapper around malloc() followed by amr_encoder_reset(), and
+amr_decoder_create() is a wrapper around malloc() followed by
+amr_decoder_reset().
+
+Using the AMR encoder
+=====================
+
+To encode one 20 ms audio frame with AMR, call amr_encode_frame():
+
+void amr_encode_frame(struct amr_encoder_state *st, enum Mode mode,
+			const int16_t *pcm, struct amr_param_frame *frame);
+
+You need to provide an encoder state structure allocated earlier with
+amr_encoder_create(), the selection of which codec mode to use, and a block of
+160 linear PCM samples (20 ms at the 8 kHz sampling rate of AMR).
+Only modes MR475 through MR122 are valid for the 'mode' argument to
+amr_encode_frame(); MRDTX is not allowed in this context.
+
+The output from amr_encode_frame() is written into this structure:
+
+struct amr_param_frame {
+	uint8_t type;
+	uint8_t mode;
+	int16_t param[AMR_MAX_PRM];
+};
+
+This structure is public, but it is defined by libtwamr (not by any external
+standard), and it is generally intended as an intermediate stage before output
+encoding. Library functions exist for generating 3 output formats: 3GPP AMR
+test sequence format, IETF RFC 4867 format, and AMR-EFR hybrid.
+
+Native encoder output
+---------------------
+
+The output structure is filled as follows:
+
+type:  Set to one of TX_SPEECH_GOOD, TX_SID_FIRST, TX_SID_UPDATE or
+       TX_NO_DATA, as defined by 3GPP. The last 3 are possible only when the
+       encoder operates with DTX enabled.
+
+mode:  One of MR475 through MR122, same as the 'mode' argument to
+       amr_encode_frame().
+
+param: Array of codec parameters: from 17 (MR475) to 57 (MR122) of them in
+       the case of TX_SPEECH_GOOD output, or the 5 parameters of MRDTX in the
+       case of TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA DTX output.
+
+3GPP AMR test sequence output
+-----------------------------
+
+The following function converts the above encoder output into the test
+sequence format which 3GPP defined for AMR, the insanely inefficient one with
+250 (AMR_COD_WORDS) 16-bit words per frame:
+
+void amr_frame_to_tseq(const struct amr_param_frame *frame, uint16_t *cod);
+
+This function allows the libtwamr encoder to be tested for correctness
+against the set of test sequences in 3GPP TS 26.074. The output is in the
+local machine's native byte order.
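The param: entry above ties the number of meaningful param[] elements to both the type and the mode fields. The hypothetical helper below makes that rule explicit. The enum and struct definitions are copied from this document. The intermediate per-mode counts (19 for MR515 through MR74, 23 for MR795, 39 for MR102) are an assumption taken from the PRMNO_* constants of the 3GPP reference code, since this document only states the values 17, 57 and 5.

```c
#include <stdint.h>

/* Public definitions copied from <tw_amr.h> as documented above. */
#define AMR_MAX_PRM 57

enum TXFrameType {
	TX_SPEECH_GOOD = 0,
	TX_SID_FIRST,
	TX_SID_UPDATE,
	TX_NO_DATA,
	TX_SPEECH_DEGRADED,
	TX_SPEECH_BAD,
	TX_SID_BAD,
	TX_ONSET,
	TX_N_FRAMETYPES
};

enum Mode { MR475 = 0, MR515, MR59, MR67, MR74, MR795, MR102, MR122, MRDTX };

struct amr_param_frame {
	uint8_t type;
	uint8_t mode;
	int16_t param[AMR_MAX_PRM];
};

/* Parameter counts per mode; assumed to match the PRMNO_* values of the
 * 3GPP reference code (an outside source, not this document). */
static const uint8_t prm_count[MRDTX + 1] = {
	17, 19, 19, 19, 19, 23, 39, 57,	/* MR475 .. MR122 */
	5				/* MRDTX (SID) */
};

/* Hypothetical helper: number of meaningful param[] entries in a frame
 * produced by amr_encode_frame(). */
unsigned amr_params_in_frame(const struct amr_param_frame *frame)
{
	if (frame->type == TX_SPEECH_GOOD)
		return prm_count[frame->mode];	/* MR475 .. MR122 */
	/* TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA: MRDTX parameters */
	return prm_count[MRDTX];
}
```

A consumer that logs or compares encoder output frame by frame could use such a count to avoid looking at the unused tail of param[], whose contents this document does not specify.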
+
+RFC 4867 output
+---------------
+
+To turn libtwamr encoder output into an octet-aligned RFC 4867 single-frame
+payload or storage-format frame (ToC octet followed by speech or SID data, but
+no CMR payload header), call this function:
+
+unsigned amr_frame_to_ietf(const struct amr_param_frame *frame, uint8_t *bytes);
+
+The output buffer must have room for up to 32 bytes (AMR_IETF_MAX_PL); the
+return value is the actual number of bytes used. The shortest possible output
+is 1 byte in the case of TX_NO_DATA; the longest possible output is 32 bytes in
+the case of TX_SPEECH_GOOD, mode MR122.
+
+Additional notes regarding output conversion functions
+------------------------------------------------------
+
+The struct amr_param_frame that is input to amr_frame_to_ietf() or
+amr_frame_to_tseq() is expected to be a valid output from amr_encode_frame().
+These output conversion functions contain no guards against invalid input
+(anything that cannot occur in the output from amr_encode_frame()), and may
+therefore segfault, corrupt memory, etc. if fed such invalid input.
+
+This lack of guards is justified in the present instance because struct
+amr_param_frame is not intended to ever function as an external interface to
+untrusted entities; instead, it is intended only as an intermediate staging
+buffer between the call to amr_encode_frame() and an immediately following
+call to one of the provided output conversion functions.
+
+AMR-EFR hybrid encoder
+======================
+
+To use libtwamr as an AMR-EFR hybrid encoder, follow these constraints:
+
+* The 'dtx' argument must be 0 (no DTX) on the call to amr_encoder_create() or
+  amr_encoder_reset() that establishes the state for the encoder session.
+
+* The 'mode' argument to amr_encode_frame() must be MR122 on every frame.
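An application building an AMR-EFR hybrid encoder might want to verify these constraints on each frame before handing param[] to EFR_params2frame(). The sketch below does so with a hypothetical helper, efr_hybrid_check(), using the public types copied from <tw_amr.h> as documented above. It relies only on two facts stated in this document: with DTX disabled, the encoder emits nothing but TX_SPEECH_GOOD, and the mode field echoes the mode passed to amr_encode_frame().

```c
#include <stddef.h>
#include <stdint.h>

/* Public definitions copied from <tw_amr.h> as documented above. */
#define AMR_MAX_PRM 57
enum TXFrameType { TX_SPEECH_GOOD = 0, TX_SID_FIRST, TX_SID_UPDATE,
		   TX_NO_DATA, TX_SPEECH_DEGRADED, TX_SPEECH_BAD,
		   TX_SID_BAD, TX_ONSET, TX_N_FRAMETYPES };
enum Mode { MR475 = 0, MR515, MR59, MR67, MR74, MR795, MR102, MR122, MRDTX };
struct amr_param_frame {
	uint8_t type;
	uint8_t mode;
	int16_t param[AMR_MAX_PRM];
};

/* Hypothetical helper: returns NULL if a frame from amr_encode_frame()
 * meets the AMR-EFR hybrid constraints, else a short description of the
 * violated constraint. */
const char *efr_hybrid_check(const struct amr_param_frame *frame)
{
	/* With dtx = 0, amr_encode_frame() only ever emits TX_SPEECH_GOOD. */
	if (frame->type != TX_SPEECH_GOOD)
		return "DTX frame: encoder state must be set up with dtx = 0";
	/* The mode field echoes the mode passed to amr_encode_frame(). */
	if (frame->mode != MR122)
		return "wrong mode: amr_encode_frame() must be called with MR122";
	/* All AMR_MAX_PRM (57) entries of param[] are now in use. */
	return NULL;
}
```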
+
+After getting struct amr_param_frame out of amr_encode_frame(), call one of
+these functions to generate the correct EFR DHF (decoder homing frame) under
+the right conditions:
+
+void amr_dhf_subst_efr(struct amr_param_frame *frame);
+void amr_dhf_subst_efr2(struct amr_param_frame *frame, const int16_t *pcm);
+
+Both functions check if the encoded frame is MR122 DHF (type equals
+TX_SPEECH_GOOD, mode equals MR122, param array equals the fixed bit pattern of
+MR122 DHF), and if so, overwrite the param[] array in the structure with the
+different bit pattern of EFR DHF. The difference between the two functions is
+that amr_dhf_subst_efr() performs this substitution whenever the encoded frame
+is MR122 DHF, without regard to the PCM input, whereas amr_dhf_subst_efr2()
+applies it only if the PCM input (passed in its pcm argument) is EHF (encoder
+homing frame). The latter function matches the observed behavior of T-Mobile
+USA, but perhaps other operators implemented the simpler logic equivalent to
+our first function.
+
+After this transformation, call EFR_params2frame() from libgsmefr (see
+EFR-library-API) with the param[] array from struct amr_param_frame as input.
+
+Using the AMR decoder
+=====================
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/AMR-library-desc Sat May 18 21:22:07 2024 +0000
@@ -0,0 +1,88 @@
+Themyscira libtwamr general description
+=======================================
+
+Libtwamr is a librification of the official AMR reference C code from 3GPP,
+produced by Themyscira Wireless and styled to match our libraries for more
+classic GSM codecs. This library has been created with the following two goals
+in mind:
+
+1) At the present time we (ThemWi) operate our GSM network with only GSM-FR and
+   GSM-EFR codecs, with the latter being preferred. We do not currently operate
+   with AMR because the conditions under which AMR becomes advantageous do not
+   currently exist in our network operation. However, we need to be prepared
+   for the possibility that the conditions which make AMR desirable may arise
+   some day, and we may need to start deploying AMR. In order to make AMR
+   deployment a possibility, many parts will need to be implemented, one of
+   which is a speech transcoding library that implements the AMR codec in the
+   same way that libgsmfr2 and libgsmefr implement the more classic codecs
+   which we use currently.
+
+2) Many other commercial GSM networks have implemented EFR speech service using
+   a type of AMR-EFR hybrid described in the AMR-EFR-philosophy and
+   AMR-EFR-hybrid-emu articles. As part of certain behavioral reverse
+   engineering experiments, we sometimes need to model the bit-exact operation
+   of those other-people-controlled commercial implementations of AMR-EFR, and
+   our current libtwamr provides one way to do so. Because a proper
+   implementation of an AMR codec library is likely to be needed some day for
+   reason 1 above anyway, the effort of producing the present libtwamr was
+   justified.
+
+Compared to other plausible ways in which someone could reasonably approach the
+task of librifying the AMR reference code from 3GPP, the design of libtwamr
+includes two somewhat original choices:
+
+* Separation of core and I/O: the stateful encoder and decoder engines in
+  libtwamr operate on a custom frame structure that includes the array of codec
+  parameters in their broken-down form (e.g., 57 parameters for MR122), the
+  frame type (as in the original 3GPP RXFrameType and TXFrameType), and the
+  codec mode. Conversion between this internal canonical form (which is most
+  native to the guts of the encoder and decoder engines) and external I/O
+  formats (the 3GPP test sequence format and the more practical RFC 4867
+  format used in RTP and in .amr recording files) is relegated to stateless
+  utility functions.
+
+* Both VAD1 and VAD2 included: the reference code from 3GPP includes two
+  alternative versions of the Voice Activity Detection algorithm, VAD1 and
+  VAD2. Implementors are allowed to use either version and remain compliant;
+  the 3GPP code uses conditional compilation to select between the two, and it
+  appears that no thought was given to the possibility that a real
+  implementation would incorporate both VAD versions, to be selected at run
+  time. However, given our (ThemWi) desire for bit-exact testing against other
+  people's implementations, it made no sense for us to arbitrarily select one
+  VAD version and drop the other; hence we took the unconventional route of
+  incorporating both VAD1 and VAD2 into libtwamr, and designing our encoder API
+  so that library users get to select which VAD they wish to apply.
+
+Like all other Themyscira GSM codec libraries, libtwamr includes the codec
+homing feature in both encoder and decoder directions, as required by 3GPP
+specs.
+Furthermore, libtwamr's implementation of this codec homing feature includes
+the following simple extensions (simple in terms of low implementation cost) to
+facilitate construction of an AMR-EFR hybrid encoder and decoder:
+
+* In the decoder direction, the main AMR frame decoder function includes a DHF
+  detector as required by 3GPP architecture. In libtwamr this function can be
+  told to trigger on EFR DHF instead of the MR122 version, by way of a flag set
+  in the mode field of the frame structure passed to amr_decode_frame().
+
+* In the encoder direction, the regular call to amr_encode_frame() (standard
+  for AMR) can be followed by a call to amr_dhf_subst_efr() or
+  amr_dhf_subst_efr2() before passing the array of encoded parameters to
+  EFR_params2frame() from libgsmefr. See the AMR-EFR-hybrid-emu article for
+  more information. The AMR-EFR hybrid test sequences in amr122_efr.zip pass
+  with both amr_dhf_subst_efr() and amr_dhf_subst_efr2(), but the latter
+  additionally matches the observable behavior of T-Mobile USA.
+
+The mechanism that allows libtwamr to be used for AMR-EFR hybrid implementation
+(as opposed to the more conventional use case of implementing standard AMR-NB)
+is kept out of the main stateful paths: there are no separate AMR-EFR hybrid
+encoder or decoder sessions that are distinguishable from regular AMR encoding
+and decoding in terms of state. In the decoder direction, the main AMR frame
+decoder function needs to know which DHF it should check for, but this
+indication is embedded in the mode field in struct amr_param_frame and not in
+the state. In the encoder direction, the mechanism is a separate (stateless)
+function that needs to be called between amr_encode_frame() and
+EFR_params2frame(). This approach dovetails nicely with the core vs I/O
+separation: the option of AMR-EFR hybrid can be viewed as a different I/O front
+end to the same AMR engine, alongside the 3GPP AMR test sequence and RFC 4867
+I/O options.
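The encoder-direction substitution step is small enough to sketch. In the C code below, dhf_subst_sketch() is hypothetical. It follows the logic that the AMR-library-API article describes for amr_dhf_subst_efr(). Because the actual MR122 and EFR DHF bit patterns are not reproduced here, it takes them as caller-supplied arrays instead of having them built in. The public types are copied from <tw_amr.h> as documented in that article.

```c
#include <stdint.h>
#include <string.h>

/* Public definitions copied from <tw_amr.h>. */
#define AMR_MAX_PRM 57
enum TXFrameType { TX_SPEECH_GOOD = 0, TX_SID_FIRST, TX_SID_UPDATE,
		   TX_NO_DATA, TX_SPEECH_DEGRADED, TX_SPEECH_BAD,
		   TX_SID_BAD, TX_ONSET, TX_N_FRAMETYPES };
enum Mode { MR475 = 0, MR515, MR59, MR67, MR74, MR795, MR102, MR122, MRDTX };
struct amr_param_frame {
	uint8_t type;
	uint8_t mode;
	int16_t param[AMR_MAX_PRM];
};

/* Hypothetical sketch of the amr_dhf_subst_efr() logic.  The real library
 * function has the DHF patterns built in; here they are passed in as
 * AMR_MAX_PRM-entry arrays.  No codec state is read or written. */
void dhf_subst_sketch(struct amr_param_frame *frame,
		      const int16_t *mr122_dhf, const int16_t *efr_dhf)
{
	if (frame->type != TX_SPEECH_GOOD || frame->mode != MR122)
		return;		/* not an MR122 speech frame */
	if (memcmp(frame->param, mr122_dhf, sizeof frame->param) != 0)
		return;		/* MR122 speech, but not the DHF pattern */
	memcpy(frame->param, efr_dhf, sizeof frame->param);
}
```

The function reads and rewrites only the frame it is given and touches no encoder state. That is what lets the AMR-EFR hybrid option sit beside the test sequence and RFC 4867 converters as just another I/O front end.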
+
+Please refer to the AMR-library-API article for further details.