view doc/AMR-library-API @ 476:c84bf526c7eb

beginning of libtwamr documentation
author Mychaela Falconia <falcon@freecalypso.org>
date Sat, 18 May 2024 21:22:07 +0000

Libtwamr general usage
======================

The external public interface to Themyscira libtwamr consists of a single
header file <tw_amr.h>; it should be installed in some system include
directory.

All Themyscira GSM codec libraries are written in ANSI C (with function
prototypes) and use the const qualifier where appropriate.  The interface is
defined in terms of <stdint.h> types; <tw_amr.h> includes <stdint.h> itself.

Public #define constant definitions
===================================

Libtwamr public API header file <tw_amr.h> defines these constants:

#define	AMR_MAX_PRM		57	/* max. num. of params      */
#define	AMR_IETF_MAX_PL		32	/* max bytes in RFC 4867 frame */
#define	AMR_IETF_HDR_LEN	6	/* .amr file header bytes */
#define	AMR_COD_WORDS		250	/* # of words in 3GPP test seq format */

Explanation:

* AMR_MAX_PRM is the maximum number of broken-down speech parameters in the
  highest 12k2 mode of AMR; this definition is needed for struct amr_param_frame
  covered later in this document.

* AMR_IETF_MAX_PL is the size of the output buffer that must be provided for
  amr_frame_to_ietf(); it is also the natural size for the staging buffer that
  most applications will use to gather the input to amr_frame_from_ietf().

* AMR_IETF_HDR_LEN is the size of amr_file_header_magic[] public const datum
  covered later in this document, and this constant will also be needed by any
  application that needs to read or write the fixed header at the beginning of
  .amr files.

* AMR_COD_WORDS is the number of 16-bit words in one encoded frame in 3GPP test
  sequence format (.cod); the public definition is needed for sizing the arrays
  used with amr_frame_to_tseq() and amr_frame_from_tseq() API functions.
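
As an illustration of where AMR_IETF_HDR_LEN comes into play, here is a minimal
sketch of checking the fixed header of a .amr file.  The function name and the
local copy of the magic string are hypothetical; the magic value "#!AMR\n" is
the single-channel AMR header from RFC 4867, and real code would most likely
compare against the library's own amr_file_header_magic[] instead.

```c
#include <stdint.h>
#include <string.h>

/* Copied from the constant list above so that this sketch compiles on its
 * own; real code would get it from <tw_amr.h>. */
#ifndef AMR_IETF_HDR_LEN
#define AMR_IETF_HDR_LEN	6
#endif

/* Hypothetical local copy of the RFC 4867 single-channel magic "#!AMR\n";
 * this is an assumption standing in for amr_file_header_magic[]. */
static const uint8_t example_amr_magic[AMR_IETF_HDR_LEN] =
	{ '#', '!', 'A', 'M', 'R', '\n' };

/* Return 1 if the first AMR_IETF_HDR_LEN bytes of buf match the magic,
 * 0 otherwise.  buf must hold at least AMR_IETF_HDR_LEN bytes. */
int example_is_amr_header(const uint8_t *buf)
{
	return memcmp(buf, example_amr_magic, AMR_IETF_HDR_LEN) == 0;
}
```

A reader would typically fread() exactly AMR_IETF_HDR_LEN bytes from the start
of the file and reject the file if this check fails.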

Libtwamr enumerated types
=========================

Libtwamr public API header file <tw_amr.h> defines these 3 enums:

enum RXFrameType {
	RX_SPEECH_GOOD = 0,
	RX_SPEECH_DEGRADED,
	RX_ONSET,
	RX_SPEECH_BAD,
	RX_SID_FIRST,
	RX_SID_UPDATE,
	RX_SID_BAD,
	RX_NO_DATA,
	RX_N_FRAMETYPES		/* number of frame types */
};

enum TXFrameType {
	TX_SPEECH_GOOD = 0,
	TX_SID_FIRST,
	TX_SID_UPDATE,
	TX_NO_DATA,
	TX_SPEECH_DEGRADED,
	TX_SPEECH_BAD,
	TX_SID_BAD,
	TX_ONSET,
	TX_N_FRAMETYPES		/* number of frame types */
};

enum Mode {
	MR475 = 0,
	MR515,
	MR59,
	MR67,
	MR74,
	MR795,
	MR102,
	MR122,
	MRDTX
};

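Rx and Tx frame types are as defined by 3GPP, and the numeric values assigned
to each type are the same as those used by the official TS 26.073 encoder and
decoder programs.  Note that the Rx and Tx enums assign different numeric
values to corresponding frame types (for example, TX_SID_FIRST is 1 while
RX_SID_FIRST is 4), so the two are NOT interchangeable!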

enum Mode should be self-explanatory: it covers the 8 possible codec modes of
AMR, plus the pseudo-mode of MRDTX used for packing and format manipulation of
SID frames.
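
Because the numeric values differ between the two enums, an application that
feeds encoder output straight into a decoder (a loopback test, for instance)
has to translate frame types by name rather than by value.  The following is a
sketch of such a translation; example_tx_to_rx() is a hypothetical helper, not
a libtwamr function, and the enum definitions are repeated only so that the
sketch compiles on its own.

```c
/* Enum definitions copied from the listings above; real code would
 * include <tw_amr.h> instead of repeating them. */
enum RXFrameType {
	RX_SPEECH_GOOD = 0,
	RX_SPEECH_DEGRADED,
	RX_ONSET,
	RX_SPEECH_BAD,
	RX_SID_FIRST,
	RX_SID_UPDATE,
	RX_SID_BAD,
	RX_NO_DATA,
	RX_N_FRAMETYPES
};

enum TXFrameType {
	TX_SPEECH_GOOD = 0,
	TX_SID_FIRST,
	TX_SID_UPDATE,
	TX_NO_DATA,
	TX_SPEECH_DEGRADED,
	TX_SPEECH_BAD,
	TX_SID_BAD,
	TX_ONSET,
	TX_N_FRAMETYPES
};

/* Hypothetical helper: map a Tx frame type to the Rx frame type of the
 * same name.  Returns RX_N_FRAMETYPES for an out-of-range input. */
enum RXFrameType example_tx_to_rx(enum TXFrameType tx)
{
	switch (tx) {
	case TX_SPEECH_GOOD:		return RX_SPEECH_GOOD;
	case TX_SID_FIRST:		return RX_SID_FIRST;
	case TX_SID_UPDATE:		return RX_SID_UPDATE;
	case TX_NO_DATA:		return RX_NO_DATA;
	case TX_SPEECH_DEGRADED:	return RX_SPEECH_DEGRADED;
	case TX_SPEECH_BAD:		return RX_SPEECH_BAD;
	case TX_SID_BAD:		return RX_SID_BAD;
	case TX_ONSET:			return RX_ONSET;
	default:			return RX_N_FRAMETYPES;
	}
}
```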

State allocation and freeing
============================

In order to use the AMR encoder, you will need to allocate an encoder state
structure, and to use the AMR decoder, you will need to allocate a decoder state
structure.  The necessary state allocation functions are:

struct amr_encoder_state *amr_encoder_create(int dtx, int use_vad2);
struct amr_decoder_state *amr_decoder_create(void);

struct amr_encoder_state and struct amr_decoder_state are opaque structures to
library users: you only get pointers which you remember and pass around, but
<tw_amr.h> does not give you full definitions of these structs.  As a library
user, you don't even get to know the size of these structs, hence the necessary
malloc() operation happens inside amr_encoder_create() and amr_decoder_create().
However, each structure is malloc'ed as a single chunk, hence when you are done
with it, simply call free() to relinquish each encoder or decoder state
instance.

amr_encoder_create() and amr_decoder_create() functions can fail if the malloc()
call inside fails, in which case the two libtwamr functions in question return
NULL.

The dtx argument to amr_encoder_create() is a Boolean flag represented as an
int; it tells the AMR encoder whether it should operate with DTX enabled or
disabled.  (Note that DTX is also called SCR for Source-Controlled Rate in some
AMR specs.)  The use_vad2 argument is another Boolean flag, also represented as
an int; it tells the AMR encoder to use the VAD2 algorithm instead of VAD1.  A
novel feature of libtwamr is that both VAD versions are included and
selectable at run time; see AMR-library-desc article for the details.

State reset functions
---------------------

The state of an already-allocated AMR encoder or AMR decoder can be reset at
any time with these functions:

void amr_encoder_reset(struct amr_encoder_state *st, int dtx, int use_vad2);
void amr_decoder_reset(struct amr_decoder_state *st);

Note that the two extra arguments to amr_encoder_reset() are the same as the
arguments to amr_encoder_create().  The reset operation is complete: it
reinitializes the entire state, including the DTX and VAD selections.
amr_encoder_create() is a wrapper around malloc() followed by
amr_encoder_reset(), and amr_decoder_create() is a wrapper around malloc()
followed by amr_decoder_reset().

Using the AMR encoder
=====================

To encode one 20 ms audio frame with AMR, call amr_encode_frame():

void amr_encode_frame(struct amr_encoder_state *st, enum Mode mode,
			const int16_t *pcm, struct amr_param_frame *frame);

You need to provide an encoder state structure allocated earlier with
amr_encoder_create(), the selection of which codec mode to use, and a block of
160 linear PCM samples.  Only modes MR475 through MR122 are valid for 'mode'
argument to amr_encode_frame(); MRDTX is not allowed in this context.

The output from amr_encode_frame() is written into this structure:

struct amr_param_frame {
	uint8_t	type;
	uint8_t	mode;
	int16_t	param[AMR_MAX_PRM];
};

This structure is public, but it is defined by libtwamr (not by any external
standard), and it is generally intended to be an intermediate stage before
output encoding.  Library functions exist for generating 3 output formats: 3GPP
AMR test sequence format, IETF RFC 4867 format, and AMR-EFR hybrid.

Native encoder output
---------------------

The output structure is filled as follows:

type:	Set to one of TX_SPEECH_GOOD, TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA,
	as defined by 3GPP.  The last 3 are possible only when the encoder
	operates with DTX enabled.

mode:	One of MR475 through MR122, same as the 'mode' argument to
	amr_encode_frame().

param:	Array of codec parameters, from 17 to 57 of them for modes MR475 through
	MR122 in the case of TX_SPEECH_GOOD output, or 5 parameters for MRDTX
	in the case of TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA DTX output.
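
To make the "from 17 to 57" range concrete, here is a sketch of a lookup that
tells how many entries of param[] are meaningful for a given mode.  The helper
name and the table are hypothetical and not part of libtwamr; the per-mode
counts are an assumption based on the parameter tables of the 3GPP reference
code, and only the 17, 57 and 5 figures are confirmed by the text above.

```c
#include <stdint.h>

#define EXAMPLE_N_MODES	9	/* MR475 through MR122, plus MRDTX */

/* Assumed number of codec parameters per mode, indexed by the numeric
 * value of enum Mode (MR475 = 0 ... MRDTX = 8). */
static const uint8_t example_prm_count[EXAMPLE_N_MODES] = {
	17,	/* MR475 */
	19,	/* MR515 */
	19,	/* MR59  */
	19,	/* MR67  */
	19,	/* MR74  */
	23,	/* MR795 */
	39,	/* MR102 */
	57,	/* MR122 */
	5	/* MRDTX */
};

/* Hypothetical helper: return the parameter count for a mode value,
 * or 0 if the value is outside the range of enum Mode. */
unsigned example_param_count(unsigned mode)
{
	if (mode >= EXAMPLE_N_MODES)
		return 0;
	return example_prm_count[mode];
}
```

For DTX output (TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA) the MRDTX count
applies, even though the mode member of the structure still holds the speech
mode that was passed to amr_encode_frame().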

3GPP AMR test sequence output
-----------------------------

The following function exists to convert the above encoder output into the test
sequence format which 3GPP defined for AMR, the insanely inefficient one with
250 (AMR_COD_WORDS) 16-bit words per frame:

void amr_frame_to_tseq(const struct amr_param_frame *frame, uint16_t *cod);

This function allows libtwamr encoder to be tested for correctness against the
set of test sequences in 3GPP TS 26.074.  The output is in the local machine's
native byte order.
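
Since the words come out in native byte order, an application that writes them
to a file with a fixed byte order has to serialize them explicitly.  The sketch
below assumes that little-endian output is wanted; that choice is an
assumption to be checked against the .cod files in question, and
example_cod_to_le() is a hypothetical helper, not a libtwamr function.

```c
#include <stdint.h>

/* Copied from the constant list earlier in this document so that this
 * sketch compiles on its own; real code would get it from <tw_amr.h>. */
#ifndef AMR_COD_WORDS
#define AMR_COD_WORDS	250
#endif

/* Hypothetical helper: convert one frame of native-order 16-bit words
 * into little-endian bytes, independent of the host byte order.
 * out must have room for 2 * AMR_COD_WORDS bytes. */
void example_cod_to_le(const uint16_t *cod, uint8_t *out)
{
	unsigned i;

	for (i = 0; i < AMR_COD_WORDS; i++) {
		out[2 * i] = cod[i] & 0xFF;	/* low byte first */
		out[2 * i + 1] = cod[i] >> 8;	/* then high byte */
	}
}
```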

RFC 4867 output
---------------

To turn libtwamr encoder output into an octet-aligned RFC 4867 single-frame
payload or storage-format frame (ToC octet followed by speech or SID data, but
no CMR payload header), call this function:

unsigned amr_frame_to_ietf(const struct amr_param_frame *frame, uint8_t *bytes);

The output buffer must have room for up to 32 bytes (AMR_IETF_MAX_PL); the
return value is the actual number of bytes used.  The shortest possible output
is 1 byte in the case of TX_NO_DATA; the longest possible output is 32 bytes in
the case of TX_SPEECH_GOOD, mode MR122.
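
The byte counts quoted above follow from the octet-aligned frame sizes of
RFC 4867.  As a sketch of how a reader of such frames could derive the total
length from the ToC octet alone, consider the following; the helper and table
names are hypothetical, and the table reflects the RFC 4867 frame type (FT)
assignments rather than anything taken from libtwamr itself.

```c
#include <stdint.h>

/* Octet-aligned payload bytes that follow the ToC octet, indexed by the
 * FT field: FT 0-7 are the speech modes MR475 through MR122, FT 8 is
 * SID.  Values are the per-frame bit counts rounded up to whole bytes. */
static const uint8_t example_payload_bytes[9] = {
	12, 13, 15, 17, 19, 20, 26, 31, 5
};

/* Hypothetical helper: return the total frame length in bytes,
 * including the ToC octet itself, or 0 if the FT value is not one
 * handled by this sketch. */
unsigned example_ietf_frame_len(uint8_t toc)
{
	unsigned ft = (toc >> 3) & 0x0F;	/* FT is bits 6..3 of ToC */

	if (ft <= 8)
		return 1 + example_payload_bytes[ft];
	if (ft == 15)			/* NO_DATA: ToC octet only */
		return 1;
	return 0;
}
```

The largest value this function can return is 32, which agrees with
AMR_IETF_MAX_PL.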

Additional notes regarding output conversion functions
------------------------------------------------------

The struct amr_param_frame that is input to amr_frame_to_ietf() or
amr_frame_to_tseq() is expected to be a valid output from amr_encode_frame().
These output conversion functions contain no guards against invalid input
(anything that cannot occur in the output from amr_encode_frame()), and are
thus allowed to segfault, corrupt memory, etc. if fed such invalid input.

This lack of guards is justified in the present instance because struct
amr_param_frame is not intended to ever function as an external interface to
untrusted entities.  Instead, this struct is intended to be only an intermediate
staging buffer between the call to amr_encode_frame() and an immediately
following call to one of the provided output conversion functions.

AMR-EFR hybrid encoder
======================

To use libtwamr as an AMR-EFR hybrid encoder, follow these constraints:

* 'dtx' argument must be 0 (no DTX) on the call to amr_encoder_create() or
  amr_encoder_reset() that establishes the state for the encoder session.

* 'mode' argument to amr_encode_frame() must be MR122 on every frame.

After getting struct amr_param_frame out of amr_encode_frame(), call one of
these functions to generate the correct EFR DHF under the right conditions:

void amr_dhf_subst_efr(struct amr_param_frame *frame);
void amr_dhf_subst_efr2(struct amr_param_frame *frame, const int16_t *pcm);

Both functions check if the encoded frame is the MR122 decoder homing frame
(DHF): type equals TX_SPEECH_GOOD, mode equals MR122, and the param array
equals the fixed bit pattern of MR122 DHF.  If this check succeeds,
amr_dhf_subst_efr() always overwrites the param[] array in the structure with
the different bit pattern of EFR DHF, whereas amr_dhf_subst_efr2() does so only
if the PCM input is also the encoder homing frame (EHF).  The latter function
matches the observed behavior of T-Mobile USA, but perhaps some others
implemented the simpler logic equivalent to our first function.

After this transformation, call EFR_params2frame() from libgsmefr (see
EFR-library-API) with param[] array in struct amr_param_frame as input.
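
For reference, the encoder homing frame (EHF) on the PCM side is a block of 160
samples that all have the value 0x0008.  The sketch below shows how an
application could generate or recognize such a block on its own, for example
to home an encoder in a test setup.  Both helpers are hypothetical; this is not
the check that amr_dhf_subst_efr2() performs internally, only an illustration
of the same condition.

```c
#include <stdint.h>

#define EXAMPLE_FRAME_SAMPLES	160	/* one 20 ms frame at 8 kHz */
#define EXAMPLE_EHF_SAMPLE	0x0008	/* value of every EHF sample */

/* Hypothetical helper: fill a PCM frame buffer with the EHF pattern. */
void example_fill_ehf(int16_t *pcm)
{
	unsigned i;

	for (i = 0; i < EXAMPLE_FRAME_SAMPLES; i++)
		pcm[i] = EXAMPLE_EHF_SAMPLE;
}

/* Hypothetical helper: return 1 if all 160 samples equal 0x0008,
 * 0 otherwise. */
int example_pcm_is_ehf(const int16_t *pcm)
{
	unsigned i;

	for (i = 0; i < EXAMPLE_FRAME_SAMPLES; i++) {
		if (pcm[i] != EXAMPLE_EHF_SAMPLE)
			return 0;
	}
	return 1;
}
```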

Using the AMR decoder
=====================