view doc/AMR-library-API @ 476:c84bf526c7eb
beginning of libtwamr documentation
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Sat, 18 May 2024 21:22:07 +0000 |
Libtwamr general usage
======================

The external public interface to Themyscira libtwamr consists of a single header
file <tw_amr.h>; it should be installed in some system include directory.

The dialect of C used by all Themyscira GSM codec libraries is ANSI C (function
prototypes), const qualifier is used where appropriate, and the interface is
defined in terms of <stdint.h> types; <tw_amr.h> includes <stdint.h>.

Public #define constant definitions
===================================

Libtwamr public API header file <tw_amr.h> defines these constants:

    #define AMR_MAX_PRM         57      /* max. num. of params */
    #define AMR_IETF_MAX_PL     32      /* max bytes in RFC 4867 frame */
    #define AMR_IETF_HDR_LEN    6       /* .amr file header bytes */
    #define AMR_COD_WORDS       250     /* # of words in 3GPP test seq format */

Explanation:

* AMR_MAX_PRM is the maximum number of broken-down speech parameters in the
  highest 12k2 mode of AMR; this definition is needed for struct amr_param_frame
  covered later in this document.

* AMR_IETF_MAX_PL is the size of the output buffer that must be provided for
  amr_frame_to_ietf(), and also most commonly the size of the staging buffer
  which most applications will likely use for gathering the input to
  amr_frame_from_ietf().

* AMR_IETF_HDR_LEN is the size of amr_file_header_magic[] public const datum
  covered later in this document, and this constant will also be needed by any
  application that needs to read or write the fixed header at the beginning of
  .amr files.

* AMR_COD_WORDS is the number of 16-bit words in one encoded frame in 3GPP test
  sequence format (.cod); the public definition is needed for sizing the arrays
  used with amr_frame_to_tseq() and amr_frame_from_tseq() API functions.
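To illustrate where AMR_IETF_HDR_LEN comes in, here is a minimal sketch of
checking the fixed header at the beginning of an .amr file. It assumes the
single-channel AMR magic string from RFC 4867 ("#!AMR" followed by a newline
character). The array example_amr_magic[] and the function
example_check_amr_header() are hypothetical names local to this sketch, and the
constant is repeated so that the sketch compiles without the library; real code
would include <tw_amr.h> and use amr_file_header_magic[] instead.

```c
#include <stdint.h>
#include <string.h>

/* Repeated from <tw_amr.h> so that this sketch compiles on its own. */
#define AMR_IETF_HDR_LEN 6

/*
 * Local stand-in (an assumption of this sketch) for the library's
 * amr_file_header_magic[]: the single-channel magic from RFC 4867.
 */
static const uint8_t example_amr_magic[AMR_IETF_HDR_LEN] =
    { '#', '!', 'A', 'M', 'R', '\n' };

/* Return 1 if buf (len bytes long) begins with the .amr header, else 0. */
int example_check_amr_header(const uint8_t *buf, size_t len)
{
    if (len < AMR_IETF_HDR_LEN)
        return 0;
    return memcmp(buf, example_amr_magic, AMR_IETF_HDR_LEN) == 0;
}
```

An application reading an .amr file would typically read AMR_IETF_HDR_LEN bytes
from the start of the file, pass them to a check of this kind, and only then
proceed to read individual frames.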
Libtwamr enumerated types
=========================

Libtwamr public API header file <tw_amr.h> defines these 3 enums:

    enum RXFrameType {
        RX_SPEECH_GOOD = 0,
        RX_SPEECH_DEGRADED,
        RX_ONSET,
        RX_SPEECH_BAD,
        RX_SID_FIRST,
        RX_SID_UPDATE,
        RX_SID_BAD,
        RX_NO_DATA,
        RX_N_FRAMETYPES     /* number of frame types */
    };

    enum TXFrameType {
        TX_SPEECH_GOOD = 0,
        TX_SID_FIRST,
        TX_SID_UPDATE,
        TX_NO_DATA,
        TX_SPEECH_DEGRADED,
        TX_SPEECH_BAD,
        TX_SID_BAD,
        TX_ONSET,
        TX_N_FRAMETYPES     /* number of frame types */
    };

    enum Mode {
        MR475 = 0,
        MR515,
        MR59,
        MR67,
        MR74,
        MR795,
        MR102,
        MR122,
        MRDTX
    };

Rx and Tx frame types are as defined by 3GPP, and the numeric values assigned to
each type are the same as those used by the official TS 26.073 encoder and
decoder programs.  Note that Rx and Tx frame types are NOT equal!

enum Mode should be self-explanatory: it covers the 8 possible codec modes of
AMR, plus the pseudo-mode of MRDTX used for packing and format manipulation of
SID frames.

State allocation and freeing
============================

In order to use the AMR encoder, you will need to allocate an encoder state
structure, and to use the AMR decoder, you will need to allocate a decoder state
structure.  The necessary state allocation functions are:

    struct amr_encoder_state *amr_encoder_create(int dtx, int use_vad2);
    struct amr_decoder_state *amr_decoder_create(void);

struct amr_encoder_state and struct amr_decoder_state are opaque structures to
library users: you only get pointers which you remember and pass around, but
<tw_amr.h> does not give you full definitions of these structs.  As a library
user, you don't even get to know the size of these structs, hence the necessary
malloc() operation happens inside amr_encoder_create() and amr_decoder_create().
However, each structure is malloc'ed as a single chunk, hence when you are done
with it, simply call free() to relinquish each encoder or decoder state
instance.
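The single-chunk allocation described above is what makes a plain free()
sufficient.  As a toy illustration of that pattern (not libtwamr's actual code),
the sketch below uses a hypothetical struct toy_state together with hypothetical
toy_create(), toy_reset() and toy_get_flag() functions.  In a real library only
the forward declaration and the prototypes would appear in the public header,
while the full structure definition would stay private.

```c
#include <stdlib.h>

/* What a public header would expose: an incomplete type plus prototypes. */
struct toy_state;
struct toy_state *toy_create(int flag);
void toy_reset(struct toy_state *st, int flag);
int toy_get_flag(const struct toy_state *st);

/* What stays private to the library: the full definition, with no
 * pointers to separately allocated memory inside. */
struct toy_state {
    int flag;
    int history[4];
};

/* Bring an already-allocated state to its initial condition. */
void toy_reset(struct toy_state *st, int flag)
{
    int i;

    st->flag = flag;
    for (i = 0; i < 4; i++)
        st->history[i] = 0;
}

/* Allocate the state as one chunk; return NULL if malloc() fails. */
struct toy_state *toy_create(int flag)
{
    struct toy_state *st = malloc(sizeof *st);

    if (!st)
        return NULL;
    toy_reset(st, flag);
    return st;
}

/* Accessor, needed because callers cannot see inside the struct. */
int toy_get_flag(const struct toy_state *st)
{
    return st->flag;
}
```

The caller's side has the same shape with the real library: keep the pointer
returned by amr_encoder_create() or amr_decoder_create(), pass it to the other
API functions, and hand it to free() when the session is over.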
amr_encoder_create() and amr_decoder_create() functions can fail if the malloc()
call inside fails, in which case the two libtwamr functions in question return
NULL.

The dtx argument to amr_encoder_create() is a Boolean flag represented as an
int; it tells the AMR encoder whether it should operate with DTX enabled or
disabled.  (Note that DTX is also called SCR for Source-Controlled Rate in some
AMR specs.)  The use_vad2 argument is another Boolean flag, also represented as
an int; it tells the AMR encoder to use VAD2 algorithm instead of VAD1.  It is a
novel feature of libtwamr in that both VAD versions are included and selectable
at run time; see AMR-library-desc article for the details.

State reset functions
---------------------

The state of an already-allocated AMR encoder or AMR decoder can be reset at any
time with these functions:

    void amr_encoder_reset(struct amr_encoder_state *st, int dtx, int use_vad2);
    void amr_decoder_reset(struct amr_decoder_state *st);

Note that the two extra arguments to amr_encoder_reset() are the same as the
arguments to amr_encoder_create() - the reset operation is complete.
amr_encoder_create() is a wrapper around malloc() followed by
amr_encoder_reset(), and amr_decoder_create() is a wrapper around malloc()
followed by amr_decoder_reset().

Using the AMR encoder
=====================

To encode one 20 ms audio frame per AMR, call amr_encode_frame():

    void amr_encode_frame(struct amr_encoder_state *st, enum Mode mode,
                          const int16_t *pcm, struct amr_param_frame *frame);

You need to provide an encoder state structure allocated earlier with
amr_encoder_create(), the selection of which codec mode to use, and a block of
160 linear PCM samples.  Only modes MR475 through MR122 are valid for 'mode'
argument to amr_encode_frame(); MRDTX is not allowed in this context.
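An application that takes the codec mode from a command line or a configuration
file may want to validate it before calling amr_encode_frame().  The following
is a minimal sketch of such a check.  enum Mode is repeated from <tw_amr.h> so
that the sketch compiles on its own, while EXAMPLE_FRAME_SAMPLES and
example_mode_ok_for_encoder() are hypothetical names that are not part of
libtwamr.

```c
/* Repeated from <tw_amr.h> so that this sketch compiles on its own. */
enum Mode {
    MR475 = 0, MR515, MR59, MR67, MR74, MR795, MR102, MR122, MRDTX
};

/* One 20 ms frame is a block of 160 linear PCM samples. */
#define EXAMPLE_FRAME_SAMPLES 160

/*
 * Return 1 if the given integer is a codec mode that may be passed to
 * the encoder (MR475 through MR122), 0 otherwise.  MRDTX and any
 * out-of-range value are rejected.
 */
int example_mode_ok_for_encoder(int mode)
{
    return mode >= MR475 && mode <= MR122;
}
```

A caller would then declare its input as int16_t pcm[EXAMPLE_FRAME_SAMPLES] and
invoke the encoder only after the mode check has passed.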
The output from amr_encode_frame() is written into this structure:

    struct amr_param_frame {
        uint8_t type;
        uint8_t mode;
        int16_t param[AMR_MAX_PRM];
    };

This structure is public, but it is defined by libtwamr (not by any external
standard), and it is generally intended to be an intermediate stage before
output encoding.  Library functions exist for generating 3 output formats: 3GPP
AMR test sequence format, IETF RFC 4867 format, and AMR-EFR hybrid.

Native encoder output
---------------------

The output structure is filled as follows:

type:   Set to one of TX_SPEECH_GOOD, TX_SID_FIRST, TX_SID_UPDATE or
        TX_NO_DATA, as defined by 3GPP.  The last 3 are possible only when the
        encoder operates with DTX enabled.

mode:   One of MR475 through MR122, same as the 'mode' argument to
        amr_encode_frame().

param:  Array of codec parameters, from 17 to 57 of them for modes MR475
        through MR122 in the case of TX_SPEECH_GOOD output, or 5 parameters for
        MRDTX in the case of TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA DTX
        output.

3GPP AMR test sequence output
-----------------------------

The following function exists to convert the above encoder output into the test
sequence format which 3GPP defined for AMR, the insanely inefficient one with
250 (AMR_COD_WORDS) 16-bit words per frame:

    void amr_frame_to_tseq(const struct amr_param_frame *frame, uint16_t *cod);

This function allows libtwamr encoder to be tested for correctness against the
set of test sequences in 3GPP TS 26.074.  The output is in the local machine's
native byte order.

RFC 4867 output
---------------

To turn libtwamr encoder output into an octet-aligned RFC 4867 single-frame
payload or storage-format frame (ToC octet followed by speech or SID data, but
no CMR payload header), call this function:

    unsigned amr_frame_to_ietf(const struct amr_param_frame *frame,
                               uint8_t *bytes);

The output buffer must have room for up to 32 bytes (AMR_IETF_MAX_PL); the
return value is the actual number of bytes used.
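The length of an octet-aligned frame follows from the number of speech or SID
bits carried by each AMR frame type.  The sketch below computes the expected
length (one ToC octet plus the payload rounded up to whole bytes) from the
standard AMR bit counts.  example_amr_bits[] and example_ietf_frame_len() are
hypothetical helpers written for illustration only, not part of libtwamr; enum
Mode and AMR_IETF_MAX_PL are repeated from <tw_amr.h> so that the sketch
compiles on its own.

```c
/* Repeated from <tw_amr.h> so that this sketch compiles on its own. */
enum Mode {
    MR475 = 0, MR515, MR59, MR67, MR74, MR795, MR102, MR122, MRDTX
};
#define AMR_IETF_MAX_PL 32

/*
 * Number of payload bits per frame for each codec mode, with the last
 * entry (MRDTX) being the 39-bit SID frame.
 */
static const unsigned example_amr_bits[MRDTX + 1] = {
    95, 103, 118, 134, 148, 159, 204, 244, 39
};

/*
 * Expected length in bytes of an octet-aligned frame for the given
 * mode: 1 ToC octet plus the payload bits rounded up to whole bytes.
 * The caller must pass a value from MR475 through MRDTX.
 */
unsigned example_ietf_frame_len(enum Mode mode)
{
    return 1 + (example_amr_bits[mode] + 7) / 8;
}
```

A table of this kind can serve as a sanity check on the value returned by
amr_frame_to_ietf(), or for sizing reads when parsing a stream of such frames.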
The shortest possible output is 1 byte in the case of TX_NO_DATA; the longest
possible output is 32 bytes in the case of TX_SPEECH_GOOD, mode MR122.

Additional notes regarding output conversion functions
------------------------------------------------------

The struct amr_param_frame that is input to amr_frame_to_ietf() or
amr_frame_to_tseq() is expected to be a valid output from amr_encode_frame().
These output conversion functions contain no guards against invalid input
(anything that cannot occur in the output from amr_encode_frame()), and are thus
allowed to segfault or corrupt memory etc. if fed such invalid input.  This lack
of guard is justified in the present instance because struct amr_param_frame is
not intended to ever function as an external interface to untrusted entities;
instead, this struct is intended to be only an intermediate staging buffer
between the call to amr_encode_frame() and an immediately following call to one
of the provided output conversion functions.

AMR-EFR hybrid encoder
======================

To use libtwamr as an AMR-EFR hybrid encoder, follow these constraints:

* 'dtx' argument must be 0 (no DTX) on the call to amr_encoder_create() or
  amr_encoder_reset() that establishes the state for the encoder session.

* 'mode' argument to amr_encode_frame() must be MR122 on every frame.

After getting struct amr_param_frame out of amr_encode_frame(), call one of
these functions to generate the correct EFR DHF under the right conditions:

    void amr_dhf_subst_efr(struct amr_param_frame *frame);
    void amr_dhf_subst_efr2(struct amr_param_frame *frame, const int16_t *pcm);

Both functions check if the encoded frame is MR122 DHF (type equals
TX_SPEECH_GOOD, mode equals MR122, param array equals the fixed bit pattern of
MR122 DHF), and if so, overwrite param[] array in the structure with the
different bit pattern of EFR DHF.
The difference between the two functions is that amr_dhf_subst_efr() performs
the just-described substitution unconditionally, whereas amr_dhf_subst_efr2()
applies this substitution only if the PCM input is EHF.  The latter function
matches the observed behavior of T-Mobile USA, but perhaps some others
implemented the simpler logic equivalent to our first function.

After this transformation, call EFR_params2frame() from libgsmefr (see
EFR-library-API) with param[] array in struct amr_param_frame as input.

Using the AMR decoder
=====================