doc/AMR-library-API: document AMR-EFR hybrid decoder
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Sun, 19 May 2024 22:22:40 +0000 |
Libtwamr general usage
======================

The external public interface to Themyscira libtwamr consists of a single
header file <tw_amr.h>; it should be installed in some system include
directory.

The dialect of C used by all Themyscira GSM codec libraries is ANSI C
(function prototypes), const qualifier is used where appropriate, and the
interface is defined in terms of <stdint.h> types; <tw_amr.h> includes
<stdint.h>.

Public #define constant definitions
===================================

Libtwamr public API header file <tw_amr.h> defines these constants:

#define AMR_MAX_PRM       57   /* max. num. of params */
#define AMR_IETF_MAX_PL   32   /* max bytes in RFC 4867 frame */
#define AMR_IETF_HDR_LEN   6   /* .amr file header bytes */
#define AMR_COD_WORDS    250   /* # of words in 3GPP test seq format */

Explanation:

* AMR_MAX_PRM is the maximum number of broken-down speech parameters in the
  highest 12k2 mode of AMR; this definition is needed for struct
  amr_param_frame covered later in this document.

* AMR_IETF_MAX_PL is the size of the output buffer that must be provided for
  amr_frame_to_ietf(), and also most commonly the size of the staging buffer
  which most applications will likely use for gathering the input to
  amr_frame_from_ietf().

* AMR_IETF_HDR_LEN is the size of amr_file_header_magic[] public const datum
  covered later in this document, and this constant will also be needed by
  any application that needs to read or write the fixed header at the
  beginning of .amr files.

* AMR_COD_WORDS is the number of 16-bit words in one encoded frame in 3GPP
  test sequence format (.cod); the public definition is needed for sizing the
  arrays used with amr_frame_to_tseq() and amr_frame_from_tseq() API
  functions.

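As an illustration of where AMR_IETF_HDR_LEN comes into play, here is a
minimal sketch of checking whether a buffer begins with the fixed .amr file
header.  It assumes the single-channel magic string "#!AMR\n" from RFC 4867;
the local array and the function name looks_like_amr_file() are hypothetical
stand-ins written for this example, and a real application would compare
against the library's own amr_file_header_magic[] instead.

```c
#include <stdint.h>
#include <string.h>

#define AMR_IETF_HDR_LEN 6   /* same value as in <tw_amr.h> */

/*
 * Local stand-in for the library's amr_file_header_magic[] (assumption:
 * the 6-byte single-channel magic "#!AMR\n" defined in RFC 4867).
 */
const uint8_t amr_magic_sketch[AMR_IETF_HDR_LEN] = {
    '#', '!', 'A', 'M', 'R', '\n'
};

/*
 * Hypothetical helper: return 1 if the first len bytes at buf begin with
 * the .amr file header, 0 otherwise (including when len is too short).
 */
int looks_like_amr_file(const uint8_t *buf, size_t len)
{
    if (len < AMR_IETF_HDR_LEN)
        return 0;
    return memcmp(buf, amr_magic_sketch, AMR_IETF_HDR_LEN) == 0;
}
```

An application reading a .amr file would typically read AMR_IETF_HDR_LEN
bytes first, run a check of this kind, and only then start reading frames.
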
Libtwamr enumerated types
=========================

Libtwamr public API header file <tw_amr.h> defines these 3 enums:

enum RXFrameType {
    RX_SPEECH_GOOD = 0,
    RX_SPEECH_DEGRADED,
    RX_ONSET,
    RX_SPEECH_BAD,
    RX_SID_FIRST,
    RX_SID_UPDATE,
    RX_SID_BAD,
    RX_NO_DATA,
    RX_N_FRAMETYPES     /* number of frame types */
};

enum TXFrameType {
    TX_SPEECH_GOOD = 0,
    TX_SID_FIRST,
    TX_SID_UPDATE,
    TX_NO_DATA,
    TX_SPEECH_DEGRADED,
    TX_SPEECH_BAD,
    TX_SID_BAD,
    TX_ONSET,
    TX_N_FRAMETYPES     /* number of frame types */
};

enum Mode {
    MR475 = 0,
    MR515,
    MR59,
    MR67,
    MR74,
    MR795,
    MR102,
    MR122,
    MRDTX
};

Rx and Tx frame types are as defined by 3GPP, and the numeric values assigned
to each type are the same as those used by the official TS 26.073 encoder and
decoder programs.  Note that Rx and Tx frame types are NOT equal!

enum Mode should be self-explanatory: it covers the 8 possible codec modes of
AMR, plus the pseudo-mode of MRDTX used for packing and format manipulation
of SID frames.

State allocation and freeing
============================

In order to use the AMR encoder, you will need to allocate an encoder state
structure, and to use the AMR decoder, you will need to allocate a decoder
state structure.  The necessary state allocation functions are:

struct amr_encoder_state *amr_encoder_create(int dtx, int use_vad2);
struct amr_decoder_state *amr_decoder_create(void);

struct amr_encoder_state and struct amr_decoder_state are opaque structures
to library users: you only get pointers which you remember and pass around,
but <tw_amr.h> does not give you full definitions of these structs.  As a
library user, you don't even get to know the size of these structs, hence the
necessary malloc() operation happens inside amr_encoder_create() and
amr_decoder_create().  However, each structure is malloc'ed as a single
chunk, hence when you are done with it, simply call free() to relinquish each
encoder or decoder state instance.

amr_encoder_create() and amr_decoder_create() functions can fail if the
malloc() call inside fails, in which case the two libtwamr functions in
question return NULL.

The dtx argument to amr_encoder_create() is a Boolean flag represented as an
int; it tells the AMR encoder whether it should operate with DTX enabled or
disabled.  (Note that DTX is also called SCR for Source-Controlled Rate in
some AMR specs.)  The use_vad2 argument is another Boolean flag, also
represented as an int; it tells the AMR encoder to use VAD2 algorithm instead
of VAD1.  It is a novel feature of libtwamr in that both VAD versions are
included and selectable at run time; see AMR-library-desc article for the
details.

State reset functions
---------------------

The state of an already-allocated AMR encoder or AMR decoder can be reset at
any time with these functions:

void amr_encoder_reset(struct amr_encoder_state *st, int dtx, int use_vad2);
void amr_decoder_reset(struct amr_decoder_state *st);

Note that the two extra arguments to amr_encoder_reset() are the same as the
arguments to amr_encoder_create() - the reset operation is complete.
amr_encoder_create() is a wrapper around malloc() followed by
amr_encoder_reset(), and amr_decoder_create() is a wrapper around malloc()
followed by amr_decoder_reset().

Using the AMR encoder
=====================

To encode one 20 ms audio frame per AMR, call amr_encode_frame():

void amr_encode_frame(struct amr_encoder_state *st, enum Mode mode,
                      const int16_t *pcm, struct amr_param_frame *frame);

You need to provide an encoder state structure allocated earlier with
amr_encoder_create(), the selection of which codec mode to use, and a block
of 160 linear PCM samples.  Only modes MR475 through MR122 are valid for
'mode' argument to amr_encode_frame(); MRDTX is not allowed in this context.

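As a minimal sketch of the input constraints just described (and not part of
libtwamr itself), the helpers below check that a mode value is acceptable for
amr_encode_frame() and count how many whole 160-sample frames fit in a PCM
buffer.  The enum is a local copy of enum Mode so that the fragment compiles
without <tw_amr.h>; the helper names and the SAMPLES_PER_FRAME constant are
hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* 160 samples per 20 ms frame, i.e. 8000 samples/s narrowband audio */
#define SAMPLES_PER_FRAME 160

/* Local copy of enum Mode from <tw_amr.h>, repeated only for this sketch */
enum Mode {
    MR475 = 0, MR515, MR59, MR67, MR74, MR795, MR102, MR122, MRDTX
};

/*
 * Hypothetical helper: nonzero if 'mode' may be passed as the mode
 * argument to amr_encode_frame(), i.e. MR475 through MR122 but not MRDTX.
 */
int mode_ok_for_encoder(int mode)
{
    return mode >= MR475 && mode <= MR122;
}

/*
 * Hypothetical helper: number of complete frames that can be encoded from
 * a buffer of nsamples PCM samples; any trailing partial frame is left
 * for the application to pad or carry over.
 */
size_t whole_frames(size_t nsamples)
{
    return nsamples / SAMPLES_PER_FRAME;
}
```

An application would call amr_encode_frame() once per whole frame, advancing
its PCM pointer by SAMPLES_PER_FRAME samples each time.
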
The output from amr_encode_frame() is written into this structure:

struct amr_param_frame {
    uint8_t type;
    uint8_t mode;
    int16_t param[AMR_MAX_PRM];
};

This structure is public, but it is defined by libtwamr (not by any external
standard), and it is generally intended to be an intermediate stage before
output encoding.  Library functions exist for generating 3 output formats:
3GPP AMR test sequence format, IETF RFC 4867 format, and AMR-EFR hybrid.

Native encoder output
---------------------

The output structure is filled as follows:

type:   Set to one of TX_SPEECH_GOOD, TX_SID_FIRST, TX_SID_UPDATE or
        TX_NO_DATA, as defined by 3GPP.  The last 3 are possible only when
        the encoder operates with DTX enabled.

mode:   One of MR475 through MR122, same as the 'mode' argument to
        amr_encode_frame().

param:  Array of codec parameters, from 17 to 57 of them for modes MR475
        through MR122 in the case of TX_SPEECH_GOOD output, or 5 parameters
        for MRDTX in the case of TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA
        DTX output.

3GPP AMR test sequence output
-----------------------------

The following function exists to convert the above encoder output into the
test sequence format which 3GPP defined for AMR, the insanely inefficient one
with 250 (AMR_COD_WORDS) 16-bit words per frame:

void amr_frame_to_tseq(const struct amr_param_frame *frame, uint16_t *cod);

This function allows libtwamr encoder to be tested for correctness against
the set of test sequences in 3GPP TS 26.074.  The output is in the local
machine's native byte order.

RFC 4867 output
---------------

To turn libtwamr encoder output into an octet-aligned RFC 4867 single-frame
payload or storage-format frame (ToC octet followed by speech or SID data,
but no CMR payload header), call this function:

unsigned amr_frame_to_ietf(const struct amr_param_frame *frame,
                           uint8_t *bytes);

The output buffer must have room for up to 32 bytes (AMR_IETF_MAX_PL); the
return value is the actual number of bytes used.
The shortest possible output is 1 byte in the case of TX_NO_DATA; the longest
possible output is 32 bytes in the case of TX_SPEECH_GOOD, mode MR122.

Additional notes regarding output conversion functions
------------------------------------------------------

The struct amr_param_frame that is input to amr_frame_to_ietf() or
amr_frame_to_tseq() is expected to be a valid output from amr_encode_frame().
These output conversion functions contain no guards against invalid input
(anything that cannot occur in the output from amr_encode_frame()), and are
thus allowed to segfault or corrupt memory etc if fed such invalid input.
This lack of guard is justified in the present instance because struct
amr_param_frame is not intended to ever function as an external interface to
untrusted entities, instead this struct is intended to be only an
intermediate staging buffer between the call to amr_encode_frame() and an
immediately following call to one of the provided output conversion
functions.

AMR-EFR hybrid encoder
======================

To use libtwamr as an AMR-EFR hybrid encoder, follow these constraints:

* 'dtx' argument must be 0 (no DTX) on the call to amr_encoder_create() or
  amr_encoder_reset() that establishes the state for the encoder session.

* 'mode' argument to amr_encode_frame() must be MR122 on every frame.

After getting struct amr_param_frame out of amr_encode_frame(), call one of
these functions to generate the correct EFR DHF under the right conditions:

void amr_dhf_subst_efr(struct amr_param_frame *frame);
void amr_dhf_subst_efr2(struct amr_param_frame *frame, const int16_t *pcm);

Both functions check if the encoded frame is MR122 DHF (type equals
TX_SPEECH_GOOD, mode equals MR122, param array equals the fixed bit pattern
of MR122 DHF), and if so, overwrite param[] array in the structure with the
different bit pattern of EFR DHF.
The difference between the two functions is that amr_dhf_subst_efr() performs
the just-described substitution unconditionally, whereas amr_dhf_subst_efr2()
applies this substitution only if the PCM input is EHF.  The latter function
matches the observed behavior of T-Mobile USA, but perhaps some others
implemented the simpler logic equivalent to our first function.

After this transformation, call EFR_params2frame() from libgsmefr (see
EFR-library-API) with param[] array in struct amr_param_frame as input.

Using the AMR decoder: native interface
=======================================

The internal native form of the stateful AMR decoder engine is:

void amr_decode_frame(struct amr_decoder_state *st,
                      const struct amr_param_frame *frame, int16_t *pcm);

The input frame is given as struct amr_param_frame, same structure as is used
for the output of the encoder.  However, the required input to
amr_decode_frame() is different from amr_encode_frame() output:

* The 'type' member of the struct must be a code from enum RXFrameType, *not*
  enum TXFrameType!

* All 3GPP-defined Rx frame types are allowed.

* The 'mode' member of the input struct is ignored if the Rx frame type is
  RX_NO_DATA, but must be valid for every other frame type.

If frame->type is not RX_NO_DATA, frame->mode is interpreted as follows:

* The 3 least significant bits (mask 0x07) are taken to indicate the codec
  mode used for this frame;

* The most significant bit (mask 0x80) has meaning only if the mode is MR122
  and frame->type is RX_SPEECH_GOOD.  Under these conditions, if this bit is
  set, the DHF check is modified to match against the bit pattern of EFR DHF
  instead of regular MR122 DHF.

amr_decode_frame() contains no guards against invalid (undefined) frame types
in frame->type, or against any of the codec parameters being out of range.
struct amr_param_frame coming into this function must come only from trusted
sources inside the application program, usually from one of the provided
input format conversion functions.

Decoder homing frame check
--------------------------

The definition of AMR decoder per 3GPP includes two mandatory checks for the
possibility of the input frame being one of the defined per-mode decoder
homing frames (DHFs): one check at the beginning of the decoder, checking
only up to the first subframe and acting only when the current state is
homed, and the second check at the end of the decoder, checking all
parameters (the full frame) and resetting the decoder on match.

This DHF check operation, called from those two places in the stateful
decoder as just described, is factored out into its own function that is
exported as part of the public API:

int amr_check_dhf(const struct amr_param_frame *frame, int first_sub_only);

The struct amr_param_frame passed to amr_check_dhf() must be prepared exactly
as it would be for amr_decode_frame(); the latter function in fact calls
amr_check_dhf() on its input.  The Boolean flag argument (first_sub_only)
tells the function to check only to the end of the first subframe if nonzero,
or check the entire frame if zero.  The return value is 1 if the input
matches DHF, 0 otherwise.  frame->type must be RX_SPEECH_GOOD for the frame
to be a DHF candidate, and the interpretation of frame->mode, including the
special mode of matching against EFR DHF, is implemented in this function.

Using the AMR decoder: input preparation
========================================

Stateless utility functions are provided for preparing decoder inputs,
converting from RFC 4867 or 3GPP test sequence format into the internal form
described above.

Decoding RFC 4867 input
-----------------------

If the entire RFC 4867 frame (read from .amr storage format or received in
RTP as an octet-aligned payload) is already in memory, decode it with this
function:

int amr_frame_from_ietf(const uint8_t *bytes, struct amr_param_frame *frame);

The string of bytes input to this function must begin with the ToC octet.
Out of this ToC octet, only bits falling under the mask 0x7C (FT and Q bit
fields) are checked.  The remaining 3 bits are not checked: in the case of
.amr storage format, RFC 4867 describes these bits as "padding" (P bits) and
stipulates that they MUST be ignored by readers.  However, in the case of RTP
payloads received in a live session, the uppermost bit of the ToC octet
becomes F rather than P, and it is the responsibility of the application to
ensure that F=0: multiframe payloads are NOT supported.

FT in the input frame may be [0,7] (MR475 through MR122), 8 (MRDTX) or 15
(AMR_FT_NODATA).  In all of these cases amr_frame_from_ietf() succeeds and
returns 0 to indicate so; the resulting struct amr_param_frame is then good
to be passed to amr_decode_frame().  OTOH, if FT falls into the invalid range
of [9,14], amr_frame_from_ietf() returns -1 to indicate invalid input.

Applications that read from a .amr file will need to read just the ToC (aka
frame header) octet and decode it to determine how many additional octets
need to be read to absorb one frame.  Similarly, RTP applications may need to
validate incoming payloads by cross-checking between the FT indicated in the
ToC octet and the received payload length.  Both applications can use this
function:

int amr_ietf_grok_first_octet(uint8_t fo);

The argument is the first octet, and the function only considers the FT field
thereof.

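To make the ToC octet layout concrete, here is a rough sketch of pulling the
F, FT and Q fields out of the first octet and looking up how many bytes
follow it.  This is an independent illustration written for this document,
not the code of amr_ietf_grok_first_octet(); the function names and the table
name are hypothetical, and the byte counts are the octet-aligned frame sizes
from RFC 4867 (speech bits rounded up to whole octets).  The return
convention of extra_bytes_for_toc() is chosen to mirror the one documented
for amr_ietf_grok_first_octet().

```c
#include <stdint.h>

/*
 * Bytes following the ToC octet for FT 0..8: modes MR475 through MR122
 * carry 95, 103, 118, 134, 148, 159, 204 and 244 bits, SID carries 39.
 */
const int8_t extra_bytes_sketch[9] = { 12, 13, 15, 17, 19, 20, 26, 31, 5 };

/* ToC octet layout (octet-aligned): F | FT (4 bits) | Q | 2 padding bits */
int toc_f(uint8_t toc)  { return (toc >> 7) & 1; }
int toc_ft(uint8_t toc) { return (toc >> 3) & 0x0F; }
int toc_q(uint8_t toc)  { return (toc >> 2) & 1; }

/*
 * Hypothetical helper: -1 for invalid FT [9,14], 0 for FT=15 (no data),
 * otherwise the number of bytes that follow the ToC octet.
 */
int extra_bytes_for_toc(uint8_t toc)
{
    int ft = toc_ft(toc);

    if (ft == 15)
        return 0;
    if (ft > 8)
        return -1;
    return extra_bytes_sketch[ft];
}
```

For example, a ToC octet of 0x3C carries FT=7 (MR122) with Q=1, so 31 more
bytes follow, for a total of 32 bytes (AMR_IETF_MAX_PL).
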
The return value of amr_ietf_grok_first_octet() is:

-1  for invalid FT [9,14]
 0  for FT=15 (the ToC octet is the entirety of the payload)
>0  for valid FT [0,8], indicating the number of additional bytes to be read

Decoding 3GPP test sequence input
---------------------------------

To decode a frame from 3GPP .cod file format, call this function:

int amr_frame_from_tseq(const uint16_t *cod, int use_rxtype,
                        struct amr_param_frame *frame);

The argument 'use_rxtype' should be 1 if the input uses Rx frame types (enum
RXFrameType) or 0 if it uses Tx frame types (enum TXFrameType); this argument
directly corresponds to -rxframetype command line option in the reference
decoder program from 3GPP.

Unlike raw amr_decode_frame(), amr_frame_from_tseq() does guard against
invalid input.  The return value from this function is:

 0  means the input was good and the output is good to pass to
    amr_decode_frame();
-1  means the frame type field in the input is invalid;
-2  means the mode field in the input is invalid.

Frame type conversion
---------------------

The operation of mapping from enum TXFrameType to enum RXFrameType,
optionally but very commonly invoked from amr_frame_from_tseq(), is factored
out into its own function, exported as part of the public API:

int amr_txtype_to_rxtype(enum TXFrameType tx_type,
                         enum RXFrameType *rx_type);

The return value is 0 if tx_type is valid and *rx_type has been filled
accordingly, or -1 if tx_type is invalid.

AMR-EFR hybrid decoder
======================

To use libtwamr as an AMR-EFR hybrid decoder, follow these steps:

* Turn the input frame from EFR RTP format into array-of-parameters form with
  libgsmefr function EFR_frame2params(), writing the output into the param[]
  array in struct amr_param_frame.

* Set 'type' in the struct to RX_SPEECH_GOOD for good frames, RX_SPEECH_BAD
  for BFI with payload bits present, or RX_NO_DATA for BFI without payload.

* Set 'mode' to 0x87 always, indicating a variation of MR122 with EFR DHF
  instead of the different native MR122 DHF.

* Call amr_decode_frame() with this input.

Fundamental limitation: the AMR decoder in libtwamr, derived from 3GPP AMR
reference source and only minimally extended to support EFR DHF, does not
support EFR SID frames.  Therefore, the option of AMR-EFR hybrid emulation
via libtwamr is limited to lab experiments where the input to the decoder can
be ensured to be SID-free, and is not suitable for production use.  See
AMR-EFR-philosophy article for more information.
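
As a sketch of the 'type' and 'mode' setup steps listed for the hybrid
decoder, the fragment below fills those two members from a pair of flags.
The enums and the struct are local copies of the definitions quoted in this
document, repeated so that the fragment compiles without <tw_amr.h>; the
function names, the 'bfi' and 'have_payload' flags and the EFR_HYBRID_MODE
macro are hypothetical names invented for this example.  Filling param[] with
EFR_frame2params() and the final amr_decode_frame() call are not shown.

```c
#include <stdint.h>

#define AMR_MAX_PRM 57

/* Local copies of <tw_amr.h> definitions, as quoted in this document */
enum RXFrameType {
    RX_SPEECH_GOOD = 0, RX_SPEECH_DEGRADED, RX_ONSET, RX_SPEECH_BAD,
    RX_SID_FIRST, RX_SID_UPDATE, RX_SID_BAD, RX_NO_DATA, RX_N_FRAMETYPES
};
enum Mode {
    MR475 = 0, MR515, MR59, MR67, MR74, MR795, MR102, MR122, MRDTX
};
struct amr_param_frame {
    uint8_t type;
    uint8_t mode;
    int16_t param[AMR_MAX_PRM];
};

/* MR122 in the low 3 bits, plus the 0x80 bit selecting the EFR DHF check */
#define EFR_HYBRID_MODE (0x80 | MR122)

/* Hypothetical helper: pick the Rx frame type for one received EFR frame */
int efr_hybrid_rx_type(int bfi, int have_payload)
{
    if (!bfi)
        return RX_SPEECH_GOOD;
    return have_payload ? RX_SPEECH_BAD : RX_NO_DATA;
}

/* Hypothetical helper: set 'type' and 'mode'; param[] is left untouched */
void efr_hybrid_set_type_mode(struct amr_param_frame *frame, int bfi,
                              int have_payload)
{
    frame->type = (uint8_t) efr_hybrid_rx_type(bfi, have_payload);
    frame->mode = EFR_HYBRID_MODE;
}
```

Because MR122 has the numeric value 7, EFR_HYBRID_MODE works out to the 0x87
value given in the steps above.
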