changeset 476:c84bf526c7eb

beginning of libtwamr documentation
author Mychaela Falconia <falcon@freecalypso.org>
date Sat, 18 May 2024 21:22:07 +0000
parents e512f0d25409
children 4c9222d95647
files doc/AMR-library-API doc/AMR-library-desc
diffstat 2 files changed, 340 insertions(+), 0 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/AMR-library-API	Sat May 18 21:22:07 2024 +0000
@@ -0,0 +1,252 @@
+Libtwamr general usage
+======================
+
+The external public interface to Themyscira libtwamr consists of a single
+header file <tw_amr.h>; it should be installed in some system include
+directory.
+
+All Themyscira GSM codec libraries are written in ANSI C (with function
+prototypes).  The const qualifier is used where appropriate, and the interface
+is defined in terms of <stdint.h> types; <tw_amr.h> includes <stdint.h>.
+
+Public #define constant definitions
+===================================
+
+Libtwamr public API header file <tw_amr.h> defines these constants:
+
+#define	AMR_MAX_PRM		57	/* max. num. of params      */
+#define	AMR_IETF_MAX_PL		32	/* max bytes in RFC 4867 frame */
+#define	AMR_IETF_HDR_LEN	6	/* .amr file header bytes */
+#define	AMR_COD_WORDS		250	/* # of words in 3GPP test seq format */
+
+Explanation:
+
+* AMR_MAX_PRM is the maximum number of broken-down speech parameters in the
+  highest 12k2 mode of AMR; this definition is needed for struct amr_param_frame
+  covered later in this document.
+
+* AMR_IETF_MAX_PL is the size of the output buffer that must be provided for
+  amr_frame_to_ietf(); it is also the natural size for the staging buffer in
+  which most applications will gather the input to amr_frame_from_ietf().
+
+* AMR_IETF_HDR_LEN is the size of amr_file_header_magic[] public const datum
+  covered later in this document, and this constant will also be needed by any
+  application that needs to read or write the fixed header at the beginning of
+  .amr files.
+
+* AMR_COD_WORDS is the number of 16-bit words in one encoded frame in 3GPP test
+  sequence format (.cod); the public definition is needed for sizing the arrays
+  used with amr_frame_to_tseq() and amr_frame_from_tseq() API functions.
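+
+As a hedged illustration of where AMR_IETF_HDR_LEN comes in, the sketch below
+checks a buffer against the single-channel AMR magic string "#!AMR\n" defined
+in RFC 4867.  It deliberately avoids <tw_amr.h> so that it stands alone:
+DEMO_AMR_HDR_LEN, demo_amr_magic[] and demo_is_amr_header() are hypothetical
+names invented for this example, standing in for AMR_IETF_HDR_LEN and
+amr_file_header_magic[], whose contents are assumed to match.
+
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for AMR_IETF_HDR_LEN and amr_file_header_magic[] */
#define DEMO_AMR_HDR_LEN	6

static const uint8_t demo_amr_magic[DEMO_AMR_HDR_LEN] = {
	'#', '!', 'A', 'M', 'R', '\n'
};

/* Return 1 if buf begins with the RFC 4867 single-channel AMR magic. */
int demo_is_amr_header(const uint8_t *buf, size_t len)
{
	if (len < DEMO_AMR_HDR_LEN)
		return 0;
	return memcmp(buf, demo_amr_magic, DEMO_AMR_HDR_LEN) == 0;
}
```
+
+An application reading a .amr file would run such a check on the first 6 bytes
+before treating the rest of the file as a sequence of frames.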
+
+Libtwamr enumerated types
+=========================
+
+Libtwamr public API header file <tw_amr.h> defines these 3 enums:
+
+enum RXFrameType {
+	RX_SPEECH_GOOD = 0,
+	RX_SPEECH_DEGRADED,
+	RX_ONSET,
+	RX_SPEECH_BAD,
+	RX_SID_FIRST,
+	RX_SID_UPDATE,
+	RX_SID_BAD,
+	RX_NO_DATA,
+	RX_N_FRAMETYPES		/* number of frame types */
+};
+
+enum TXFrameType {
+	TX_SPEECH_GOOD = 0,
+	TX_SID_FIRST,
+	TX_SID_UPDATE,
+	TX_NO_DATA,
+	TX_SPEECH_DEGRADED,
+	TX_SPEECH_BAD,
+	TX_SID_BAD,
+	TX_ONSET,
+	TX_N_FRAMETYPES		/* number of frame types */
+};
+
+enum Mode {
+	MR475 = 0,
+	MR515,
+	MR59,
+	MR67,
+	MR74,
+	MR795,
+	MR102,
+	MR122,
+	MRDTX
+};
+
+Rx and Tx frame types are as defined by 3GPP, and the numeric values assigned
+to each type are the same as those used by the official TS 26.073 encoder and
+decoder programs.  Note that corresponding Rx and Tx frame types do NOT share
+numeric values: for example, TX_SID_FIRST is 1 whereas RX_SID_FIRST is 4.
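+
+Because the two numbering schemes differ, any code that loops encoder output
+back into a decoder has to translate frame types by name rather than by value.
+The following is only a sketch of such a translation: demo_tx_to_rx() is a
+hypothetical helper (not claimed to be a libtwamr function), and the two enums
+are copied from the listing above so that the example compiles on its own.
+
```c
/* Copied from the listing above; real code would include <tw_amr.h>. */
enum RXFrameType {
	RX_SPEECH_GOOD = 0,
	RX_SPEECH_DEGRADED,
	RX_ONSET,
	RX_SPEECH_BAD,
	RX_SID_FIRST,
	RX_SID_UPDATE,
	RX_SID_BAD,
	RX_NO_DATA,
	RX_N_FRAMETYPES
};

enum TXFrameType {
	TX_SPEECH_GOOD = 0,
	TX_SID_FIRST,
	TX_SID_UPDATE,
	TX_NO_DATA,
	TX_SPEECH_DEGRADED,
	TX_SPEECH_BAD,
	TX_SID_BAD,
	TX_ONSET,
	TX_N_FRAMETYPES
};

/* Hypothetical helper: map a Tx frame type to the same-named Rx type. */
enum RXFrameType demo_tx_to_rx(enum TXFrameType tx)
{
	switch (tx) {
	case TX_SPEECH_GOOD:		return RX_SPEECH_GOOD;
	case TX_SPEECH_DEGRADED:	return RX_SPEECH_DEGRADED;
	case TX_SPEECH_BAD:		return RX_SPEECH_BAD;
	case TX_ONSET:			return RX_ONSET;
	case TX_SID_FIRST:		return RX_SID_FIRST;
	case TX_SID_UPDATE:		return RX_SID_UPDATE;
	case TX_SID_BAD:		return RX_SID_BAD;
	case TX_NO_DATA:		return RX_NO_DATA;
	default:
		/* Out-of-range input: arbitrary fallback for this sketch */
		return RX_NO_DATA;
	}
}
```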
+
+enum Mode should be self-explanatory: it covers the 8 possible codec modes of
+AMR, plus the pseudo-mode of MRDTX used for packing and format manipulation of
+SID frames.
+
+State allocation and freeing
+============================
+
+In order to use the AMR encoder, you will need to allocate an encoder state
+structure, and to use the AMR decoder, you will need to allocate a decoder state
+structure.  The necessary state allocation functions are:
+
+struct amr_encoder_state *amr_encoder_create(int dtx, int use_vad2);
+struct amr_decoder_state *amr_decoder_create(void);
+
+struct amr_encoder_state and struct amr_decoder_state are opaque structures to
+library users: you only get pointers which you remember and pass around, but
+<tw_amr.h> does not give you full definitions of these structs.  As a library
+user, you don't even get to know the size of these structs, hence the necessary
+malloc() operation happens inside amr_encoder_create() and amr_decoder_create().
+However, each structure is malloc'ed as a single chunk, hence when you are done
+with it, simply call free() to relinquish each encoder or decoder state
+instance.
+
+amr_encoder_create() and amr_decoder_create() functions can fail if the malloc()
+call inside fails, in which case the two libtwamr functions in question return
+NULL.
+
+The dtx argument to amr_encoder_create() is a Boolean flag represented as an
+int; it tells the AMR encoder whether it should operate with DTX enabled or
+disabled.  (Note that DTX is also called SCR for Source-Controlled Rate in some
+AMR specs.)  The use_vad2 argument is another Boolean flag, also represented as
+an int; it tells the AMR encoder to use the VAD2 algorithm instead of VAD1.
+Including both VAD versions and making them selectable at run time is a novel
+feature of libtwamr; see the AMR-library-desc article for details.
+
+State reset functions
+---------------------
+
+The state of an already-allocated AMR encoder or AMR decoder can be reset at
+any time with these functions:
+
+void amr_encoder_reset(struct amr_encoder_state *st, int dtx, int use_vad2);
+void amr_decoder_reset(struct amr_decoder_state *st);
+
+Note that the two extra arguments to amr_encoder_reset() are the same as the
+arguments to amr_encoder_create(): the reset operation is complete, leaving
+nothing of the previous state behind.
+amr_encoder_create() is a wrapper around malloc() followed by
+amr_encoder_reset(), and amr_decoder_create() is a wrapper around malloc()
+followed by amr_decoder_reset().
+
+Using the AMR encoder
+=====================
+
+To encode one 20 ms audio frame with AMR, call amr_encode_frame():
+
+void amr_encode_frame(struct amr_encoder_state *st, enum Mode mode,
+			const int16_t *pcm, struct amr_param_frame *frame);
+
+You need to provide an encoder state structure allocated earlier with
+amr_encoder_create(), the selection of which codec mode to use, and a block of
+160 linear PCM samples.  Only modes MR475 through MR122 are valid for 'mode'
+argument to amr_encode_frame(); MRDTX is not allowed in this context.
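+
+Since AMR operates on 8000 samples/s audio, 160 samples correspond to exactly
+one 20 ms frame.  A caller holding a longer recording has to slice it into such
+blocks before each amr_encode_frame() call.  The sketch below shows one
+possible way to do that, zero-padding a short final block; demo_get_frame()
+and DEMO_FRAME_SAMPLES are hypothetical names, and the padding policy is an
+assumption of this example rather than anything libtwamr prescribes.
+
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FRAME_SAMPLES	160	/* 20 ms at 8000 samples/s */

/*
 * Copy frame number idx out of a longer recording into out[], zero-padding
 * a short final frame.  Returns the number of real samples copied, which is
 * 0 when idx lies at or past the end of the recording.
 */
size_t demo_get_frame(const int16_t *pcm, size_t total, size_t idx,
		      int16_t out[DEMO_FRAME_SAMPLES])
{
	size_t start, avail;

	memset(out, 0, DEMO_FRAME_SAMPLES * sizeof(int16_t));
	if (idx > total / DEMO_FRAME_SAMPLES)
		return 0;
	start = idx * DEMO_FRAME_SAMPLES;
	avail = total - start;
	if (avail > DEMO_FRAME_SAMPLES)
		avail = DEMO_FRAME_SAMPLES;
	memcpy(out, pcm + start, avail * sizeof(int16_t));
	return avail;
}
```
+
+Each block obtained this way is suitable as the 'pcm' argument for one call to
+amr_encode_frame().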
+
+The output from amr_encode_frame() is written into this structure:
+
+struct amr_param_frame {
+	uint8_t	type;
+	uint8_t	mode;
+	int16_t	param[AMR_MAX_PRM];
+};
+
+This structure is public, but it is defined by libtwamr (not by any external
+standard), and it is generally intended to be an intermediate stage before
+output encoding.  Library functions exist for generating 3 output formats: 3GPP
+AMR test sequence format, IETF RFC 4867 format, and AMR-EFR hybrid.
+
+Native encoder output
+---------------------
+
+The output structure is filled as follows:
+
+type:	Set to one of TX_SPEECH_GOOD, TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA,
+	as defined by 3GPP.  The last 3 are possible only when the encoder
+	operates with DTX enabled.
+
+mode:	One of MR475 through MR122, same as the 'mode' argument to
+	amr_encode_frame().
+
+param:	Array of codec parameters, from 17 to 57 of them for modes MR475 through
+	MR122 in the case of TX_SPEECH_GOOD output, or 5 parameters for MRDTX
+	in the case of TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA DTX output.
+
+3GPP AMR test sequence output
+-----------------------------
+
+The following function exists to convert the above encoder output into the test
+sequence format which 3GPP defined for AMR, the insanely inefficient one with
+250 (AMR_COD_WORDS) 16-bit words per frame:
+
+void amr_frame_to_tseq(const struct amr_param_frame *frame, uint16_t *cod);
+
+This function allows libtwamr encoder to be tested for correctness against the
+set of test sequences in 3GPP TS 26.074.  The output is in the local machine's
+native byte order.
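+
+Native byte order means that the caller decides how the 250 words are laid out
+when they are written to a file or compared against data produced on another
+machine.  As a minimal sketch, the hypothetical helper below serializes the
+words into an explicit byte order; little-endian is chosen purely for
+illustration, and DEMO_COD_WORDS stands in for AMR_COD_WORDS so that the
+example does not depend on <tw_amr.h>.
+
```c
#include <stdint.h>

#define DEMO_COD_WORDS	250	/* stand-in for AMR_COD_WORDS */

/*
 * Serialize one frame of native-order 16-bit words into bytes with the
 * least significant byte of each word first.
 */
void demo_tseq_to_le_bytes(const uint16_t cod[DEMO_COD_WORDS],
			   uint8_t out[2 * DEMO_COD_WORDS])
{
	unsigned i;

	for (i = 0; i < DEMO_COD_WORDS; i++) {
		out[2 * i] = (uint8_t)(cod[i] & 0xFF);
		out[2 * i + 1] = (uint8_t)(cod[i] >> 8);
	}
}
```
+
+One frame in this format occupies 500 bytes, compared with at most 32 bytes in
+the RFC 4867 format described next.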
+
+RFC 4867 output
+---------------
+
+To turn libtwamr encoder output into an octet-aligned RFC 4867 single-frame
+payload or storage-format frame (ToC octet followed by speech or SID data, but
+no CMR payload header), call this function:
+
+unsigned amr_frame_to_ietf(const struct amr_param_frame *frame, uint8_t *bytes);
+
+The output buffer must have room for up to 32 bytes (AMR_IETF_MAX_PL); the
+return value is the actual number of bytes used.  The shortest possible output
+is 1 byte in the case of TX_NO_DATA; the longest possible output is 32 bytes in
+the case of TX_SPEECH_GOOD, mode MR122.
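+
+The possible output lengths follow from the frame type (FT) field, which in the
+RFC 4867 octet-aligned format occupies bits 6 through 3 of the ToC octet.  The
+sketch below maps a ToC octet to the total frame length, ToC octet included;
+demo_ietf_frame_len() is a hypothetical helper written for this example, not a
+libtwamr function, and it returns 0 for FT values other than the eight speech
+modes, SID and NO_DATA.
+
```c
#include <stdint.h>

/*
 * Total octet-aligned frame length (ToC octet included) implied by a ToC
 * octet, or 0 for frame types this sketch does not cover.
 */
unsigned demo_ietf_frame_len(uint8_t toc)
{
	/* Indexed by the 4-bit FT field */
	static const uint8_t len_by_ft[16] = {
		13, 14, 16, 18, 20, 21, 27, 32,	/* FT 0-7: MR475 .. MR122 */
		6,				/* FT 8: SID */
		0, 0, 0, 0, 0, 0,		/* FT 9-14: not covered here */
		1				/* FT 15: NO_DATA */
	};

	return len_by_ft[(toc >> 3) & 0x0F];
}
```
+
+A reader of .amr files could use such a table to learn how many more bytes to
+fetch after the ToC octet before passing the frame to amr_frame_from_ietf().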
+
+Additional notes regarding output conversion functions
+------------------------------------------------------
+
+The struct amr_param_frame that is input to amr_frame_to_ietf() or
+amr_frame_to_tseq() is expected to be a valid output from amr_encode_frame().
+These output conversion functions contain no guards against invalid input
+(anything that cannot occur in the output from amr_encode_frame()), and are
+thus allowed to segfault or corrupt memory etc. if fed such invalid input.
+
+This lack of guards is justified in the present instance because struct
+amr_param_frame is not intended to ever function as an external interface to
+untrusted entities; instead, this struct is intended to be only an intermediate
+staging buffer between the call to amr_encode_frame() and an immediately
+following call to one of the provided output conversion functions.
+
+AMR-EFR hybrid encoder
+======================
+
+To use libtwamr as an AMR-EFR hybrid encoder, follow these constraints:
+
+* 'dtx' argument must be 0 (no DTX) on the call to amr_encoder_create() or
+  amr_encoder_reset() that establishes the state for the encoder session.
+
+* 'mode' argument to amr_encode_frame() must be MR122 on every frame.
+
+After getting struct amr_param_frame out of amr_encode_frame(), call one of
+these functions to generate the correct EFR DHF under the right conditions:
+
+void amr_dhf_subst_efr(struct amr_param_frame *frame);
+void amr_dhf_subst_efr2(struct amr_param_frame *frame, const int16_t *pcm);
+
+Both functions check if the encoded frame is MR122 DHF (decoder homing frame:
+type equals TX_SPEECH_GOOD, mode equals MR122, param array equals the fixed bit
+pattern of MR122 DHF), and if so, overwrite param[] array in the structure with
+the different bit pattern of EFR DHF.  The difference between the two functions
+is that amr_dhf_subst_efr() performs this substitution whenever the frame
+matches, whereas amr_dhf_subst_efr2() additionally requires that the PCM input
+be EHF (encoder homing frame).  The latter function matches the observed
+behavior of T-Mobile USA, but perhaps some others implemented the simpler logic
+equivalent to our first function.
+
+After this transformation, call EFR_params2frame() from libgsmefr (see
+EFR-library-API) with param[] array in struct amr_param_frame as input.
+
+Using the AMR decoder
+=====================
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/AMR-library-desc	Sat May 18 21:22:07 2024 +0000
@@ -0,0 +1,88 @@
+Themyscira libtwamr general description
+=======================================
+
+Libtwamr is a librification of the official AMR reference C code from 3GPP,
+produced by Themyscira Wireless and styled to match our libraries for more
+classic GSM codecs.  This library has been created with the following two goals
+in mind:
+
+1) At the present time we (ThemWi) operate our GSM network with only GSM-FR and
+   GSM-EFR codecs, with the latter being preferred.  We do not currently operate
+   with AMR because the conditions under which AMR becomes advantageous do not
+   currently exist in our network operation.  However, we need to be prepared
+   for the possibility that the conditions which make AMR desirable may arise
+   some day, and we may need to start deploying AMR.  In order to make AMR
+   deployment a possibility, many parts will need to be implemented, one of
+   which is a speech transcoding library that implements the AMR codec in the
+   same way that libgsmfr2 and libgsmefr implement the more classic codecs which
+   we use currently.
+
+2) Many other commercial GSM networks have implemented EFR speech service using
+   a type of AMR-EFR hybrid described in AMR-EFR-philosophy and
+   AMR-EFR-hybrid-emu articles.  As part of certain behavioral reverse
+   engineering experiments, we sometimes need to model the bit-exact operation
+   of those other-people-controlled commercial implementations of AMR-EFR, and
+   our current libtwamr provides one way to do so.  Knowing that a proper
+   implementation of an AMR codec library is likely to be needed some day for
+   reason 1 above, justification was obtained for expending the effort to
+   produce the present libtwamr.
+
+Compared to other plausible ways in which someone could reasonably approach the
+task of librifying the AMR reference code from 3GPP, the design of libtwamr
+includes two somewhat original choices:
+
+* Separation of core and I/O: the stateful encoder and decoder engines in
+  libtwamr operate on a custom frame structure that includes the array of codec
+  parameters in their broken-down form (e.g., 57 parameters for MR122), the
+  frame type as in original RXFrameType and TXFrameType, and the codec mode.
+  Conversion between this internal canonical form (which is most native to the
+  guts of the encoder and decoder engines) and external I/O formats (the 3GPP
+  test sequence format and the more practical RFC 4867 format used in RTP and
+  in .amr recording files) is relegated to stateless utility functions.
+
+* Both VAD1 and VAD2 included: the reference code from 3GPP includes two
+  alternative versions of Voice Activity Detection algorithm, VAD1 and VAD2.
+  Implementors are allowed to use either version and be compliant; 3GPP code
+  uses conditional compilation to select between the two, and it appears that
+  no thought was given to the possibility that a real implementation would
+  incorporate both VAD versions, to be selected at run time.  However, given our
+  (ThemWi) desire for bit-exact testing against other people's implementations,
+  it made no sense for us to arbitrarily select one VAD version and drop the
+  other - hence we took the unconventional route of incorporating both VAD1 and
+  VAD2 into libtwamr, and designing our encoder API so that library users get
+  to select which VAD they wish to apply.
+
+Like all other Themyscira GSM codec libraries, libtwamr includes the codec
+homing feature in both encoder and decoder directions, as required by 3GPP
+specs.  Furthermore, libtwamr implementation of this codec homing feature
+includes the following simple extensions (simple in terms of low implementation
+cost) to facilitate construction of an AMR-EFR hybrid encoder and decoder:
+
+* In the decoder direction, the main AMR frame decoder function includes a DHF
+  detector as required by 3GPP architecture.  In libtwamr this function can be
+  told to trigger on EFR DHF instead of MR122 version, by way of a flag set in
+  the mode field of the frame structure passed to amr_decode_frame().
+
+* In the encoder direction, the regular call to amr_encode_frame() - standard
+  for AMR - can be followed with a call to amr_dhf_subst_efr() or
+  amr_dhf_subst_efr2() before passing the array of encoded parameters to
+  EFR_params2frame() from libgsmefr.  See AMR-EFR-hybrid-emu article for more
+  information.  The AMR-EFR hybrid test sequences in amr122_efr.zip pass on
+  both amr_dhf_subst_efr() and amr_dhf_subst_efr2() versions, but the latter
+  additionally matches the observable behavior of T-Mobile USA.
+
+The mechanism that allows libtwamr to be used for AMR-EFR hybrid implementation
+(as opposed to the more conventional use case of implementing standard AMR-NB)
+is kept out of the main stateful paths: there are no separate AMR-EFR hybrid
+encoder or decoder sessions that are distinguishable from regular AMR encoding
+and decoding in terms of state.  In the decoder direction, the main AMR frame
+decoder function needs to know which DHF it should check for, but this
+indication is embedded in the mode field in struct amr_param_frame and not in
+the state.  In the encoder direction, the mechanism is a separate function
+(stateless) that needs to be called between amr_encode_frame() and
+EFR_params2frame().  This approach dovetails nicely with the core vs I/O
+separation: the option of AMR-EFR hybrid can be viewed as a different I/O front
+end to the same AMR engine, alongside the 3GPP AMR test sequence and RFC 4867
+I/O options.
+
+Please refer to AMR-library-API article for further details.