diff doc/AMR-library-API @ 476:c84bf526c7eb

beginning of libtwamr documentation
author Mychaela Falconia <falcon@freecalypso.org>
date Sat, 18 May 2024 21:22:07 +0000
parents
children 936a08cc73ce
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/AMR-library-API	Sat May 18 21:22:07 2024 +0000
@@ -0,0 +1,252 @@
+Libtwamr general usage
+======================
+
+The external public interface to Themyscira libtwamr consists of a single
+header file <tw_amr.h>; it should be installed in some system include
+directory.
+
+The dialect of C used by all Themyscira GSM codec libraries is ANSI C (function
+prototypes); the const qualifier is used where appropriate, and the interface
+is defined in terms of <stdint.h> types.  <tw_amr.h> itself includes <stdint.h>.
+
+Public #define constant definitions
+===================================
+
+Libtwamr public API header file <tw_amr.h> defines these constants:
+
+#define	AMR_MAX_PRM		57	/* max. num. of params      */
+#define	AMR_IETF_MAX_PL		32	/* max bytes in RFC 4867 frame */
+#define	AMR_IETF_HDR_LEN	6	/* .amr file header bytes */
+#define	AMR_COD_WORDS		250	/* # of words in 3GPP test seq format */
+
+Explanation:
+
+* AMR_MAX_PRM is the maximum number of broken-down speech parameters in the
+  highest 12k2 mode of AMR; this definition is needed for struct amr_param_frame
+  covered later in this document.
+
+* AMR_IETF_MAX_PL is the size of the output buffer that must be provided for
+  amr_frame_to_ietf(), and also most commonly the size of the staging buffer
+  which most applications will likely use for gathering the input to
+  amr_frame_from_ietf().
+
+* AMR_IETF_HDR_LEN is the size of amr_file_header_magic[] public const datum
+  covered later in this document, and this constant will also be needed by any
+  application that needs to read or write the fixed header at the beginning of
+  .amr files.
+
+* AMR_COD_WORDS is the number of 16-bit words in one encoded frame in 3GPP test
+  sequence format (.cod); the public definition is needed for sizing the arrays
+  used with amr_frame_to_tseq() and amr_frame_from_tseq() API functions.
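+
+As a rough illustration of how these constants size application buffers, here
+is a minimal standalone sketch.  The constants are mirrored locally only so
+that the sketch compiles without libtwamr; the demo_* names are hypothetical
+and not part of the library, and the magic string is a local copy of the
+single-channel AMR file header from RFC 4867, not the library's own datum.
+
```c
#include <stdint.h>
#include <string.h>

/* Values mirrored from the list above; real code gets them from <tw_amr.h>. */
#ifndef AMR_IETF_MAX_PL
#define AMR_IETF_MAX_PL		32
#define AMR_IETF_HDR_LEN	6
#define AMR_COD_WORDS		250
#endif

/* ASSUMPTION: local copy of the RFC 4867 single-channel AMR file magic. */
static const uint8_t demo_amr_magic[AMR_IETF_HDR_LEN] =
	{ '#', '!', 'A', 'M', 'R', '\n' };

/* Hypothetical helper: does buf (at least AMR_IETF_HDR_LEN bytes long)
 * begin with the .amr file header? */
int demo_is_amr_header(const uint8_t *buf)
{
	return memcmp(buf, demo_amr_magic, AMR_IETF_HDR_LEN) == 0;
}

/* Hypothetical helper: bytes occupied by one frame in .cod format. */
size_t demo_cod_frame_bytes(void)
{
	return AMR_COD_WORDS * sizeof(uint16_t);
}
```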
+
+Libtwamr enumerated types
+=========================
+
+Libtwamr public API header file <tw_amr.h> defines these 3 enums:
+
+enum RXFrameType {
+	RX_SPEECH_GOOD = 0,
+	RX_SPEECH_DEGRADED,
+	RX_ONSET,
+	RX_SPEECH_BAD,
+	RX_SID_FIRST,
+	RX_SID_UPDATE,
+	RX_SID_BAD,
+	RX_NO_DATA,
+	RX_N_FRAMETYPES		/* number of frame types */
+};
+
+enum TXFrameType {
+	TX_SPEECH_GOOD = 0,
+	TX_SID_FIRST,
+	TX_SID_UPDATE,
+	TX_NO_DATA,
+	TX_SPEECH_DEGRADED,
+	TX_SPEECH_BAD,
+	TX_SID_BAD,
+	TX_ONSET,
+	TX_N_FRAMETYPES		/* number of frame types */
+};
+
+enum Mode {
+	MR475 = 0,
+	MR515,
+	MR59,
+	MR67,
+	MR74,
+	MR795,
+	MR102,
+	MR122,
+	MRDTX
+};
+
+Rx and Tx frame types are as defined by 3GPP, and the numeric values assigned
+to each type are the same as those used by the official TS 26.073 encoder and
+decoder programs.  Note that Rx and Tx frame types are NOT numerically equal:
+for example, TX_SID_FIRST is 1 whereas RX_SID_FIRST is 4.
+
+enum Mode should be self-explanatory: it covers the 8 possible codec modes of
+AMR, plus the pseudo-mode of MRDTX used for packing and format manipulation of
+SID frames.
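+
+Because the two enums number the same frame types differently, an application
+that feeds encoder output straight into a decoder has to translate by name
+rather than by value.  The following standalone sketch shows one way to do it;
+demo_tx_to_rx() is a hypothetical helper, not a libtwamr function, and the
+enum definitions are repeated from above only so that the sketch compiles on
+its own.
+
```c
/* Repeated from the definitions above; real code includes <tw_amr.h>. */
enum RXFrameType {
	RX_SPEECH_GOOD = 0, RX_SPEECH_DEGRADED, RX_ONSET, RX_SPEECH_BAD,
	RX_SID_FIRST, RX_SID_UPDATE, RX_SID_BAD, RX_NO_DATA,
	RX_N_FRAMETYPES
};

enum TXFrameType {
	TX_SPEECH_GOOD = 0, TX_SID_FIRST, TX_SID_UPDATE, TX_NO_DATA,
	TX_SPEECH_DEGRADED, TX_SPEECH_BAD, TX_SID_BAD, TX_ONSET,
	TX_N_FRAMETYPES
};

/* Hypothetical helper: map a Tx frame type to the Rx type of the same
 * name.  Returns -1 for a value that is not a valid Tx frame type. */
int demo_tx_to_rx(enum TXFrameType tx)
{
	switch (tx) {
	case TX_SPEECH_GOOD:		return RX_SPEECH_GOOD;
	case TX_SPEECH_DEGRADED:	return RX_SPEECH_DEGRADED;
	case TX_SPEECH_BAD:		return RX_SPEECH_BAD;
	case TX_SID_FIRST:		return RX_SID_FIRST;
	case TX_SID_UPDATE:		return RX_SID_UPDATE;
	case TX_SID_BAD:		return RX_SID_BAD;
	case TX_ONSET:			return RX_ONSET;
	case TX_NO_DATA:		return RX_NO_DATA;
	default:			return -1;
	}
}
```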
+
+State allocation and freeing
+============================
+
+In order to use the AMR encoder, you will need to allocate an encoder state
+structure, and to use the AMR decoder, you will need to allocate a decoder state
+structure.  The necessary state allocation functions are:
+
+struct amr_encoder_state *amr_encoder_create(int dtx, int use_vad2);
+struct amr_decoder_state *amr_decoder_create(void);
+
+struct amr_encoder_state and struct amr_decoder_state are opaque structures to
+library users: you only get pointers which you remember and pass around, but
+<tw_amr.h> does not give you full definitions of these structs.  As a library
+user, you don't even get to know the size of these structs, hence the necessary
+malloc() operation happens inside amr_encoder_create() and amr_decoder_create().
+However, each structure is malloc'ed as a single chunk, hence when you are done
+with it, simply call free() to relinquish each encoder or decoder state
+instance.
+
+amr_encoder_create() and amr_decoder_create() functions can fail if the malloc()
+call inside fails, in which case the two libtwamr functions in question return
+NULL.
+
+The dtx argument to amr_encoder_create() is a Boolean flag represented as an
+int; it tells the AMR encoder whether it should operate with DTX enabled or
+disabled.  (Note that DTX is also called SCR for Source-Controlled Rate in some
+AMR specs.)  The use_vad2 argument is another Boolean flag, also represented as
+an int; it tells the AMR encoder to use the VAD2 algorithm instead of VAD1.  A
+novel feature of libtwamr is that both VAD versions are included and selectable
+at run time; see AMR-library-desc article for the details.
+
+State reset functions
+---------------------
+
+The state of an already-allocated AMR encoder or AMR decoder can be reset at
+any time with these functions:
+
+void amr_encoder_reset(struct amr_encoder_state *st, int dtx, int use_vad2);
+void amr_decoder_reset(struct amr_decoder_state *st);
+
+Note that the two extra arguments to amr_encoder_reset() are the same as the
+arguments to amr_encoder_create(): the reset operation is complete, i.e., it
+reinitializes the entire state, including the DTX and VAD selection settings.
+amr_encoder_create() is a wrapper around malloc() followed by
+amr_encoder_reset(), and amr_decoder_create() is a wrapper around malloc()
+followed by amr_decoder_reset().
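+
+The create-as-malloc-plus-reset pattern just described can be pictured with a
+toy example.  Everything below (struct demo_state and the demo_* functions) is
+invented for illustration and says nothing about the real contents of the
+libtwamr state structures; it only shows why a single free() is enough and why
+a reset needs the same arguments as a create.
+
```c
#include <stdlib.h>

/* Toy stand-in for an opaque codec state; NOT libtwamr's real struct. */
struct demo_state {
	int dtx;
	int use_vad2;
	int frame_count;
};

/* Complete reset: every field is reinitialized, so the caller must
 * supply the configuration flags again. */
void demo_reset(struct demo_state *st, int dtx, int use_vad2)
{
	st->dtx = dtx;
	st->use_vad2 = use_vad2;
	st->frame_count = 0;
}

/* Create is a single malloc() followed by reset; NULL on failure. */
struct demo_state *demo_create(int dtx, int use_vad2)
{
	struct demo_state *st = malloc(sizeof *st);

	if (st == NULL)
		return NULL;
	demo_reset(st, dtx, use_vad2);
	return st;
}
```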
+
+Using the AMR encoder
+=====================
+
+To encode one 20 ms audio frame as AMR, call amr_encode_frame():
+
+void amr_encode_frame(struct amr_encoder_state *st, enum Mode mode,
+			const int16_t *pcm, struct amr_param_frame *frame);
+
+You need to provide an encoder state structure allocated earlier with
+amr_encoder_create(), the selection of which codec mode to use, and a block of
+160 linear PCM samples.  Only modes MR475 through MR122 are valid for 'mode'
+argument to amr_encode_frame(); MRDTX is not allowed in this context.
+
+The output from amr_encode_frame() is written into this structure:
+
+struct amr_param_frame {
+	uint8_t	type;
+	uint8_t	mode;
+	int16_t	param[AMR_MAX_PRM];
+};
+
+This structure is public, but it is defined by libtwamr (not by any external
+standard), and it is generally intended to be an intermediate stage before
+output encoding.  Library functions exist for generating 3 output formats: 3GPP
+AMR test sequence format, IETF RFC 4867 format, and AMR-EFR hybrid.
+
+Native encoder output
+---------------------
+
+The output structure is filled as follows:
+
+type:	Set to one of TX_SPEECH_GOOD, TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA,
+	as defined by 3GPP.  The last 3 are possible only when the encoder
+	operates with DTX enabled.
+
+mode:	One of MR475 through MR122, same as the 'mode' argument to
+	amr_encode_frame().
+
+param:	Array of codec parameters, from 17 to 57 of them for modes MR475 through
+	MR122 in the case of TX_SPEECH_GOOD output, or 5 parameters for MRDTX
+	in the case of TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA DTX output.
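+
+For code that needs to know how many entries of param[] are meaningful, the
+rule above can be sketched as a lookup.  demo_param_count() is a hypothetical
+helper, not part of libtwamr; the per-mode counts between 17 and 57 are an
+assumption taken from the PRMNO_* constants of the 3GPP TS 26.073 reference
+code, and the enums are repeated only so that the sketch compiles standalone.
+
```c
#include <stdint.h>

/* Repeated from the definitions earlier in this document. */
enum TXFrameType {
	TX_SPEECH_GOOD = 0, TX_SID_FIRST, TX_SID_UPDATE, TX_NO_DATA,
	TX_SPEECH_DEGRADED, TX_SPEECH_BAD, TX_SID_BAD, TX_ONSET,
	TX_N_FRAMETYPES
};

enum Mode {
	MR475 = 0, MR515, MR59, MR67, MR74, MR795, MR102, MR122, MRDTX
};

/* Hypothetical helper: number of valid param[] entries for a frame with
 * the given type and mode fields, or -1 if the combination cannot occur
 * in encoder output. */
int demo_param_count(uint8_t type, uint8_t mode)
{
	/* ASSUMPTION: counts per mode as in the 3GPP reference code */
	static const int prm_per_mode[8] = { 17, 19, 19, 19, 19, 23, 39, 57 };

	switch (type) {
	case TX_SPEECH_GOOD:
		if (mode > MR122)
			return -1;
		return prm_per_mode[mode];
	case TX_SID_FIRST:
	case TX_SID_UPDATE:
	case TX_NO_DATA:
		return 5;	/* MRDTX parameter set */
	default:
		return -1;
	}
}
```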
+
+3GPP AMR test sequence output
+-----------------------------
+
+The following function exists to convert the above encoder output into the test
+sequence format which 3GPP defined for AMR, the insanely inefficient one with
+250 (AMR_COD_WORDS) 16-bit words per frame:
+
+void amr_frame_to_tseq(const struct amr_param_frame *frame, uint16_t *cod);
+
+This function allows libtwamr encoder to be tested for correctness against the
+set of test sequences in 3GPP TS 26.074.  The output is in the local machine's
+native byte order.
+
+RFC 4867 output
+---------------
+
+To turn libtwamr encoder output into an octet-aligned RFC 4867 single-frame
+payload or storage-format frame (ToC octet followed by speech or SID data, but
+no CMR payload header), call this function:
+
+unsigned amr_frame_to_ietf(const struct amr_param_frame *frame, uint8_t *bytes);
+
+The output buffer must have room for up to 32 bytes (AMR_IETF_MAX_PL); the
+return value is the actual number of bytes used.  The shortest possible output
+is 1 byte in the case of TX_NO_DATA; the longest possible output is 32 bytes in
+the case of TX_SPEECH_GOOD, mode MR122.
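+
+The byte counts above follow from the RFC 4867 octet-aligned layout: one ToC
+octet (F bit, 4-bit FT, Q bit, 2 padding bits) plus the speech or SID bits
+rounded up to whole octets.  A reader of such frames can derive the total
+frame length from the ToC octet alone, as in the sketch below.
+demo_ietf_frame_len() is a hypothetical helper, not a libtwamr function, and
+the table is an assumption based on the per-mode bit counts in RFC 4867.
+
```c
#include <stdint.h>

/* Hypothetical helper: total length in bytes (ToC octet included) of an
 * octet-aligned RFC 4867 frame, given its ToC octet.  Returns 0 for
 * frame types 9 through 14, which are not used for AMR-NB speech or SID. */
unsigned demo_ietf_frame_len(uint8_t toc)
{
	static const uint8_t len_by_ft[16] = {
		13, 14, 16, 18, 20, 21, 27, 32,	/* MR475 .. MR122 */
		6,				/* SID */
		0, 0, 0, 0, 0, 0,		/* FT 9..14 */
		1				/* NO_DATA: ToC octet only */
	};

	return len_by_ft[(toc >> 3) & 0x0F];
}
```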
+
+Additional notes regarding output conversion functions
+------------------------------------------------------
+
+The struct amr_param_frame that is input to amr_frame_to_ietf() or
+amr_frame_to_tseq() is expected to be a valid output from amr_encode_frame().
+These output conversion functions contain no guards against invalid input
+(anything that cannot occur in the output from amr_encode_frame()), and are
+thus allowed to segfault, corrupt memory, etc. if fed such invalid input.
+
+This lack of guard is justified in the present instance because struct
+amr_param_frame is not intended to ever function as an external interface to
+untrusted entities; instead, this struct is intended to be only an intermediate
+staging buffer between the call to amr_encode_frame() and an immediately
+following call to one of the provided output conversion functions.
+
+AMR-EFR hybrid encoder
+======================
+
+To use libtwamr as an AMR-EFR hybrid encoder, follow these constraints:
+
+* 'dtx' argument must be 0 (no DTX) on the call to amr_encoder_create() or
+  amr_encoder_reset() that establishes the state for the encoder session.
+
+* 'mode' argument to amr_encode_frame() must be MR122 on every frame.
+
+After getting struct amr_param_frame out of amr_encode_frame(), call one of
+these functions to generate the correct EFR DHF under the right conditions:
+
+void amr_dhf_subst_efr(struct amr_param_frame *frame);
+void amr_dhf_subst_efr2(struct amr_param_frame *frame, const int16_t *pcm);
+
+Both functions check if the encoded frame is MR122 DHF (decoder homing frame:
+type equals TX_SPEECH_GOOD, mode equals MR122, param array equals the fixed bit
+pattern of MR122 DHF), and if so, overwrite param[] array in the structure with
+the different bit pattern of EFR DHF.  The difference between the two functions
+is that amr_dhf_subst_efr() performs this substitution whenever the encoded
+frame matches MR122 DHF, whereas amr_dhf_subst_efr2() additionally requires the
+PCM input to be EHF (encoder homing frame).  The latter function matches the
+observed behavior of T-Mobile USA, but perhaps some others implemented the
+simpler logic equivalent to our first function.
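+
+For readers unfamiliar with the EHF condition mentioned above, here is a
+minimal sketch of such a test.  It assumes the definition used by the 3GPP
+reference code, where an encoder homing frame is a block of 160 samples all
+equal to 0x0008; demo_pcm_is_ehf() is a hypothetical helper, and whether
+libtwamr applies any masking of low-order bits before the comparison is not
+covered here.
+
```c
#include <stdint.h>

#define DEMO_FRAME_SAMPLES	160	/* one 20 ms frame at 8 kHz */
#define DEMO_EHF_SAMPLE		0x0008	/* ASSUMPTION: per 3GPP ref code */

/* Hypothetical helper: returns 1 if all 160 samples match the encoder
 * homing frame pattern, 0 otherwise. */
int demo_pcm_is_ehf(const int16_t *pcm)
{
	int i;

	for (i = 0; i < DEMO_FRAME_SAMPLES; i++) {
		if (pcm[i] != DEMO_EHF_SAMPLE)
			return 0;
	}
	return 1;
}
```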
+
+After this transformation, call EFR_params2frame() from libgsmefr (see
+EFR-library-API) with param[] array in struct amr_param_frame as input.
+
+Using the AMR decoder
+=====================
+