gsm-codec-lib: doc/AMR-library-API @ 476:c84bf526c7eb (new file)
Changeset description: beginning of libtwamr documentation
Author: Mychaela Falconia <falcon@freecalypso.org>
Date: Sat, 18 May 2024 21:22:07 +0000
Child changeset: 936a08cc73ce

Libtwamr general usage
======================

The external public interface to Themyscira libtwamr consists of a single
header file <tw_amr.h>; it should be installed in some system include
directory.

The dialect of C used by all Themyscira GSM codec libraries is ANSI C with
function prototypes. The const qualifier is used where appropriate, and the
interface is defined in terms of <stdint.h> types (<tw_amr.h> includes
<stdint.h>).

Public #define constant definitions
===================================

Libtwamr public API header file <tw_amr.h> defines these constants:

#define AMR_MAX_PRM       57   /* max. num. of params */
#define AMR_IETF_MAX_PL   32   /* max bytes in RFC 4867 frame */
#define AMR_IETF_HDR_LEN  6    /* .amr file header bytes */
#define AMR_COD_WORDS     250  /* # of words in 3GPP test seq format */

Explanation:

* AMR_MAX_PRM is the maximum number of broken-down speech parameters, reached
  in the highest mode of AMR (12k2). This definition is needed for struct
  amr_param_frame, which is covered later in this document.

* AMR_IETF_MAX_PL is the size of the output buffer that must be provided for
  amr_frame_to_ietf(). It is also the size of the staging buffer that most
  applications will use for gathering the input to amr_frame_from_ietf().

* AMR_IETF_HDR_LEN is the size of the amr_file_header_magic[] public const
  datum (covered later in this document). Any application that reads or writes
  the fixed header at the beginning of .amr files will also need this constant.

* AMR_COD_WORDS is the number of 16-bit words in one encoded frame in 3GPP test
  sequence format (.cod). The public definition is needed for sizing the arrays
  used with the amr_frame_to_tseq() and amr_frame_from_tseq() API functions.
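
The following minimal sketch shows how AMR_IETF_HDR_LEN and AMR_IETF_MAX_PL fit together. It walks an in-memory .amr file, checks the 6-byte header, then reads the FT field of each ToC octet to find the frame length. Each frame is copied into a staging buffer of AMR_IETF_MAX_PL bytes. This is not libtwamr code, and several pieces are assumptions:

* The magic bytes ("#!AMR\n", per RFC 4867 section 5) and the per-FT storage frame sizes are local copies, assumed to match what the library uses.
* The helper name count_amr_frames() is hypothetical.
* Only AMR speech modes, AMR SID (FT 8) and NO_DATA (FT 15) are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AMR_IETF_MAX_PL   32   /* max bytes in RFC 4867 frame */
#define AMR_IETF_HDR_LEN  6    /* .amr file header bytes */

/* Local copy of the .amr magic "#!AMR\n" (RFC 4867, section 5); assumed to
 * match libtwamr's amr_file_header_magic[]. */
static const uint8_t amr_magic[AMR_IETF_HDR_LEN] = {
    '#', '!', 'A', 'M', 'R', '\n'
};

/* Storage-format frame size (ToC octet included), indexed by the 4-bit FT
 * field of the ToC; 0 marks frame types this sketch does not handle. */
static const unsigned frame_len_by_ft[16] = {
    13, 14, 16, 18, 20, 21, 27, 32,  /* FT 0..7: MR475 through MR122 */
    6,                               /* FT 8: AMR SID */
    0, 0, 0, 0, 0, 0,                /* FT 9..14: non-AMR SID or reserved */
    1                                /* FT 15: NO_DATA, ToC octet only */
};

/* Hypothetical helper: counts the frames in an in-memory .amr file,
 * returning -1 on a bad header, an unhandled FT or a truncated frame. */
static int count_amr_frames(const uint8_t *buf, size_t len)
{
    uint8_t staging[AMR_IETF_MAX_PL];
    size_t pos = AMR_IETF_HDR_LEN;
    int nframes = 0;

    if (len < AMR_IETF_HDR_LEN || memcmp(buf, amr_magic, AMR_IETF_HDR_LEN) != 0)
        return -1;
    while (pos < len) {
        unsigned ft = (buf[pos] >> 3) & 0x0F;  /* ToC: F(1) FT(4) Q(1) P(2) */
        unsigned flen = frame_len_by_ft[ft];

        if (flen == 0 || len - pos < flen)
            return -1;
        assert(flen <= sizeof staging);
        memcpy(staging, buf + pos, flen);
        /* here a real application would pass staging[] to amr_frame_from_ietf() */
        pos += flen;
        nframes++;
    }
    return nframes;
}
```

The largest entry in the table is 32 bytes (MR122), which is why a staging buffer of AMR_IETF_MAX_PL bytes always suffices. Rejecting FT 9 through 14 outright is a simplification made for this sketch.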

Libtwamr enumerated types
=========================

Libtwamr public API header file <tw_amr.h> defines these 3 enums:

enum RXFrameType {
    RX_SPEECH_GOOD = 0,
    RX_SPEECH_DEGRADED,
    RX_ONSET,
    RX_SPEECH_BAD,
    RX_SID_FIRST,
    RX_SID_UPDATE,
    RX_SID_BAD,
    RX_NO_DATA,
    RX_N_FRAMETYPES     /* number of frame types */
};

enum TXFrameType {
    TX_SPEECH_GOOD = 0,
    TX_SID_FIRST,
    TX_SID_UPDATE,
    TX_NO_DATA,
    TX_SPEECH_DEGRADED,
    TX_SPEECH_BAD,
    TX_SID_BAD,
    TX_ONSET,
    TX_N_FRAMETYPES     /* number of frame types */
};

enum Mode {
    MR475 = 0,
    MR515,
    MR59,
    MR67,
    MR74,
    MR795,
    MR102,
    MR122,
    MRDTX
};

Rx and Tx frame types are as defined by 3GPP. The numeric values assigned to
each type are the same as those used by the official TS 26.073 encoder and
decoder programs.

Note that Rx and Tx frame types are NOT equal! The two enums list the types
in different orders, so their numeric values are not interchangeable. For
example, TX_SID_FIRST is 1, but RX_SID_FIRST is 4.

enum Mode should be self-explanatory. It covers the 8 possible codec modes of
AMR, plus the pseudo-mode MRDTX used for packing and format manipulation of
SID frames.

State allocation and freeing
============================

To use the AMR encoder, you need to allocate an encoder state structure. To
use the AMR decoder, you need to allocate a decoder state structure. The
necessary state allocation functions are:

struct amr_encoder_state *amr_encoder_create(int dtx, int use_vad2);
struct amr_decoder_state *amr_decoder_create(void);

struct amr_encoder_state and struct amr_decoder_state are opaque to library
users. You only get pointers, which you keep and pass around; <tw_amr.h> does
not give you full definitions of these structs. As a library user, you don't
even get to know the size of these structs, hence the necessary malloc()
operation happens inside amr_encoder_create() and amr_decoder_create().
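
The opaque-state arrangement described above can be sketched with a hypothetical demo_state type. demo_create() and demo_reset() below only mimic the structure this document describes for libtwamr: create is malloc() followed by reset. They are not the library's code. In a real library the struct definition would live in a .c file, so callers would see only the incomplete type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* What a public header would declare: an incomplete type plus functions. */
struct demo_state;
struct demo_state *demo_create(int dtx);
void demo_reset(struct demo_state *st, int dtx);

/* What only the library's own .c file would see: the full definition,
 * whose size the caller never needs to know. */
struct demo_state {
    int dtx;
    short memory[160];    /* placeholder for the codec's internal history */
};

/* Reset is a complete reinitialization of an existing instance. */
void demo_reset(struct demo_state *st, int dtx)
{
    assert(st != NULL);
    memset(st, 0, sizeof *st);
    st->dtx = dtx;
}

/* Create = malloc() of one chunk, then reset; NULL if malloc() fails. */
struct demo_state *demo_create(int dtx)
{
    struct demo_state *st = malloc(sizeof *st);

    if (st == NULL)
        return NULL;
    demo_reset(st, dtx);
    return st;
}

/* Caller's side: handle only the pointer and release it with free(). */
static int demo_session(void)
{
    struct demo_state *st = demo_create(1);

    if (st == NULL)
        return -1;
    /* ... encode frames with st ... */
    free(st);
    return 0;
}
```

demo_session() shows the caller's side of the arrangement. The caller checks for NULL after create and releases the state with plain free(), without ever needing sizeof on the struct.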

Each state structure is malloc'ed as a single chunk, so when you are done
with it, simply call free() to relinquish that encoder or decoder state
instance.

amr_encoder_create() and amr_decoder_create() can fail if the malloc() call
inside them fails, in which case they return NULL.

The dtx argument to amr_encoder_create() is a Boolean flag represented as an
int. It tells the AMR encoder whether to operate with DTX enabled or disabled.
(Note that DTX is also called SCR, for Source-Controlled Rate, in some AMR
specs.)

The use_vad2 argument is another Boolean flag, also represented as an int. It
tells the AMR encoder to use the VAD2 algorithm instead of VAD1. Having both
VAD versions included and selectable at run time is a novel feature of
libtwamr; see the AMR-library-desc article for details.

State reset functions
---------------------

The state of an already-allocated AMR encoder or AMR decoder can be reset at
any time with these functions:

void amr_encoder_reset(struct amr_encoder_state *st, int dtx, int use_vad2);
void amr_decoder_reset(struct amr_decoder_state *st);

Note that the two extra arguments to amr_encoder_reset() are the same as the
arguments to amr_encoder_create(). The reset is a complete reinitialization,
including the DTX and VAD settings. In fact, amr_encoder_create() is a wrapper
around malloc() followed by amr_encoder_reset(), and amr_decoder_create() is
a wrapper around malloc() followed by amr_decoder_reset().

Using the AMR encoder
=====================

To encode one 20 ms audio frame per AMR, call amr_encode_frame():

void amr_encode_frame(struct amr_encoder_state *st, enum Mode mode,
                      const int16_t *pcm, struct amr_param_frame *frame);

You need to provide an encoder state structure allocated earlier with
amr_encoder_create(), the codec mode to use, and a block of 160 linear PCM
samples (20 ms at the 8 kHz sampling rate of AMR).
Only modes MR475 through MR122 are valid for the 'mode' argument to
amr_encode_frame(); MRDTX is not allowed in this context.

The output from amr_encode_frame() is written into this structure:

struct amr_param_frame {
    uint8_t type;
    uint8_t mode;
    int16_t param[AMR_MAX_PRM];
};

This structure is public, but it is defined by libtwamr, not by any external
standard. It is generally intended to be an intermediate stage before output
encoding. Library functions exist for generating 3 output formats:

* 3GPP AMR test sequence format;
* IETF RFC 4867 format;
* AMR-EFR hybrid.

Native encoder output
---------------------

The output structure is filled as follows:

type:  Set to one of TX_SPEECH_GOOD, TX_SID_FIRST, TX_SID_UPDATE or
       TX_NO_DATA, as defined by 3GPP. The last 3 are possible only when
       the encoder operates with DTX enabled.

mode:  One of MR475 through MR122, the same as the 'mode' argument to
       amr_encode_frame().

param: Array of codec parameters. For TX_SPEECH_GOOD output there are from
       17 to 57 of them, for modes MR475 through MR122. For TX_SID_FIRST,
       TX_SID_UPDATE or TX_NO_DATA DTX output there are 5 parameters, for
       MRDTX.

3GPP AMR test sequence output
-----------------------------

The following function converts the above encoder output into the test
sequence format that 3GPP defined for AMR. This is the insanely inefficient
format with 250 (AMR_COD_WORDS) 16-bit words per frame:

void amr_frame_to_tseq(const struct amr_param_frame *frame, uint16_t *cod);

This function allows the libtwamr encoder to be tested for correctness against
the set of test sequences in 3GPP TS 26.074. The output is in the local
machine's native byte order.
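
amr_frame_to_tseq() produces words in native byte order. A program that compares its output against stored reference frames may therefore need to fix the byte order explicitly.

The sketch below serializes one frame of AMR_COD_WORDS words as little-endian bytes and compares it with a reference frame. Little-endian is an assumption here, so check the byte order of the test-sequence files you actually use. The function names cod_words_to_le() and cod_frame_matches_le() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AMR_COD_WORDS 250   /* # of words in 3GPP test seq format */

/* Serialize one native-order .cod frame as little-endian bytes. */
static void cod_words_to_le(const uint16_t *cod, uint8_t *out)
{
    size_t i;

    assert(cod != NULL && out != NULL);
    for (i = 0; i < AMR_COD_WORDS; i++) {
        out[2 * i]     = (uint8_t)(cod[i] & 0xFF);  /* low byte first */
        out[2 * i + 1] = (uint8_t)(cod[i] >> 8);    /* then high byte */
    }
}

/* Returns 1 if the frame matches a little-endian reference frame. */
static int cod_frame_matches_le(const uint16_t *cod, const uint8_t *ref)
{
    uint8_t tmp[AMR_COD_WORDS * 2];

    cod_words_to_le(cod, tmp);
    return memcmp(tmp, ref, sizeof tmp) == 0;
}
```

Serializing byte by byte with shifts, rather than swapping conditionally on a detected host byte order, gives the same result on any machine.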

RFC 4867 output
---------------

The following function turns libtwamr encoder output into an octet-aligned
RFC 4867 single-frame payload or storage-format frame. Its output is the ToC
octet followed by speech or SID data, with no CMR payload header:

unsigned amr_frame_to_ietf(const struct amr_param_frame *frame, uint8_t *bytes);

The output buffer must have room for up to 32 bytes (AMR_IETF_MAX_PL). The
return value is the actual number of bytes used. The shortest possible output
is 1 byte, in the case of TX_NO_DATA. The longest possible output is 32 bytes,
in the case of TX_SPEECH_GOOD in mode MR122.

Additional notes regarding output conversion functions
------------------------------------------------------

The struct amr_param_frame that is input to amr_frame_to_ietf() or
amr_frame_to_tseq() is expected to be a valid output from amr_encode_frame().
These output conversion functions contain no guards against invalid input,
meaning anything that cannot occur in the output from amr_encode_frame(). They
may therefore segfault, corrupt memory, etc. if fed such input.

This lack of guards is justified here because struct amr_param_frame is never
intended to function as an external interface to untrusted entities. Instead,
this struct is intended only as an intermediate staging buffer between the
call to amr_encode_frame() and an immediately following call to one of the
provided output conversion functions.

AMR-EFR hybrid encoder
======================

To use libtwamr as an AMR-EFR hybrid encoder, observe these constraints:

* The 'dtx' argument must be 0 (no DTX) on the call to amr_encoder_create() or
  amr_encoder_reset() that establishes the state for the encoder session.

* The 'mode' argument to amr_encode_frame() must be MR122 on every frame.

After getting struct amr_param_frame out of amr_encode_frame(), call one of
these functions to generate the correct EFR DHF (decoder homing frame) under
the right conditions:

void amr_dhf_subst_efr(struct amr_param_frame *frame);
void amr_dhf_subst_efr2(struct amr_param_frame *frame, const int16_t *pcm);

Both functions check whether the encoded frame is MR122 DHF, meaning:

* type equals TX_SPEECH_GOOD;
* mode equals MR122;
* the param array equals the fixed bit pattern of MR122 DHF.

If so, they overwrite the param[] array in the structure with the different
bit pattern of EFR DHF.

The two functions differ in when they apply this substitution.
amr_dhf_subst_efr() performs it whenever the encoded frame is MR122 DHF,
regardless of the PCM input. amr_dhf_subst_efr2() applies it only if the PCM
input (the pcm argument) is also EHF (encoder homing frame). The latter
function matches the observed behavior of T-Mobile USA, but other
implementations may use the simpler logic of our first function.

After this transformation, call EFR_params2frame() from libgsmefr (see
EFR-library-API) with the param[] array of struct amr_param_frame as input.

Using the AMR decoder
=====================