view doc/EFR-library-API @ 130:1c529bb31219

doc/EFR-library-API: explain dtx argument to EFR_encoder_reset()
author Mychaela Falconia <falcon@freecalypso.org>
date Sun, 11 Dec 2022 04:11:57 +0000

The external public interface to Themyscira libgsmefr consists of a single
header file <gsm_efr.h>; it should be installed in the same system include
directory as <gsm.h> from classic libgsm (a 1990s free software product for the
original FR codec).  The API of libgsmefr is modeled after that of libgsm.

The dialect of C we chose for libgsmefr is ANSI C (function prototypes); the
const qualifier is used where appropriate, and the interface is defined in terms
of <stdint.h> types.  <gsm_efr.h> includes <stdint.h> itself.

State allocation and freeing
============================

In order to use the EFR encoder, you will need to allocate an encoder state
structure, and to use the EFR decoder, you will need to allocate a decoder state
structure.  The necessary state allocation functions are:

extern struct EFR_encoder_state *EFR_encoder_create(int dtx);
extern struct EFR_decoder_state *EFR_decoder_create(void);

struct EFR_encoder_state and struct EFR_decoder_state are opaque structures to
library users: you only get pointers which you remember and pass around, but
<gsm_efr.h> does not give you full definitions of these structs.  As a library
user, you don't even get to know the size of these structs, hence the necessary
malloc() operation happens inside EFR_encoder_create() and EFR_decoder_create().
However, each structure is malloc'ed as a single chunk, hence when you are done
with it, simply call free() to relinquish each encoder or decoder state
instance.

The EFR_encoder_create() and EFR_decoder_create() functions can fail if the
malloc() call inside fails, in which case they return NULL.
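
The allocation rules above can be sketched as follows.  This is only an
illustration under stated assumptions: the function name run_encoder_session()
and the SKETCH_USE_LIBGSMEFR build switch are hypothetical, and when that switch
is not defined, the block supplies a trivial stand-in for EFR_encoder_create()
so that it builds without the library installed.

```c
#include <stdint.h>
#include <stdlib.h>

#ifdef SKETCH_USE_LIBGSMEFR	/* hypothetical build switch */
#include <gsm_efr.h>
#else
/* Stand-in for illustration only; the real structure is opaque. */
struct EFR_encoder_state { int dtx; };

struct EFR_encoder_state *EFR_encoder_create(int dtx)
{
	struct EFR_encoder_state *st = malloc(sizeof *st);

	if (st)
		st->dtx = dtx;
	return st;
}
#endif

/* Hypothetical caller: returns 0 on success, -1 on allocation failure. */
int run_encoder_session(int dtx)
{
	struct EFR_encoder_state *st = EFR_encoder_create(dtx);

	if (!st)
		return -1;	/* the malloc() call inside failed */
	/* ... encode frames with st here ... */
	free(st);		/* state is one malloc'ed chunk: plain free() */
	return 0;
}
```

The same pattern would apply to EFR_decoder_create(), which takes no arguments.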

The dtx argument to EFR_encoder_create() is a Boolean flag represented as an
int; it tells the EFR encoder whether it should operate with DTX enabled (run
GSM 06.82 VAD and emit SID frames instead of speech frames per GSM 06.81) or DTX
disabled (skip VAD and always emit speech frames).

Using the EFR encoder
=====================

To encode one 20 ms audio frame per EFR, call EFR_encode_frame():

extern void EFR_encode_frame(struct EFR_encoder_state *st, const int16_t *pcm,
			     uint8_t *frame, int *sp, int *vad);

You need to provide an encoder state structure allocated earlier with
EFR_encoder_create(), a block of 160 linear PCM samples, and an output buffer of
31 bytes (EFR_RTP_FRAME_LEN constant also defined in <gsm_efr.h>) into which the
encoded EFR frame will be written; the frame format is that defined in ETSI TS
101 318 for EFR in RTP, including the 0xC signature in the upper nibble of the
first byte.

The last two arguments of type (int *) are optional pointers to extra output
flags SP and VAD, defined in GSM 06.81 section 5.1.1; either pointer or both of
them can be NULL if these extra output flags aren't needed.  Both of these flags
are needed in order to test our libgsmefr encoder implementation against
official ETSI test sequences (GSM 06.54), but they typically aren't needed
otherwise.

Using the EFR decoder
=====================

The main interface to our EFR decoder is this function:

extern void EFR_decode_frame(struct EFR_decoder_state *st, const uint8_t *frame,
			     int bfi, int taf, int16_t *pcm);

The inputs consist of 244 bits of frame payload (the 4 upper bits of the first
byte are ignored - there is NO enforcement of 0xC signature in our frame
decoder) and BFI and TAF flags defined in GSM 06.81 section 6.1.1.  Note the
absence of a SID flag argument: EFR_decode_frame() calls our own utility
function EFR_sid_classify() to determine SID from the frame itself per the rules
of GSM 06.81 section 6.1.1.
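
Because the decoder does not enforce the 0xC signature, an application that
wants to reject frames lacking it has to check for itself.  The following is a
minimal sketch of such a check; the helper name efr_frame_has_signature() and
the SKETCH_EFR_FRAME_LEN constant are hypothetical and not part of libgsmefr
(real code would use EFR_RTP_FRAME_LEN from <gsm_efr.h>).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of EFR_RTP_FRAME_LEN:
 * 4 signature bits + 244 payload bits = 248 bits = 31 bytes. */
#define SKETCH_EFR_FRAME_LEN	((4 + 244) / 8)

/* Hypothetical helper, not a libgsmefr function: returns 1 if the upper
 * nibble of the first byte carries the 0xC signature, 0 otherwise. */
int efr_frame_has_signature(const uint8_t *frame)
{
	assert(frame != NULL);
	return (frame[0] & 0xF0) == 0xC0;
}
```

Only the upper nibble is examined; the lower nibble of the first byte already
belongs to the 244-bit payload.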

Many EFR decoder applications will also be faced with a situation where they
receive a frame gap (no data at all), and they need to run the EFR decoder with
BFI=1, but don't have any frame-bits input.  If you find yourself in this
situation, call the following function:

extern void EFR_decode_bfi_nodata(struct EFR_decoder_state *st, int taf,
				  int16_t *pcm);

EFR_decode_bfi_nodata() is equivalent to calling EFR_decode_frame() with a frame
buffer of 31 zero bytes (or 0xC signature followed by 244 zero bits) and BFI=1,
but is slightly more efficient in that the internal steps of EFR_frame2params()
and EFR_sid_classify() are skipped, and the made-up "frame" of 244 zero bits is
passed to the decoder core at the params array level.
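
To make the stated equivalence concrete, here is a minimal sketch of building
the made-up "no data" frame by hand.  The helper name efr_make_nodata_frame()
and the SKETCH_NODATA_FRAME_LEN constant are hypothetical; real code would use
EFR_RTP_FRAME_LEN from <gsm_efr.h>.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirror of EFR_RTP_FRAME_LEN (31 bytes). */
#define SKETCH_NODATA_FRAME_LEN	31

/* Hypothetical helper, not a libgsmefr function: fills the buffer with
 * the 0xC signature followed by 244 zero bits. */
void efr_make_nodata_frame(uint8_t *frame)
{
	memset(frame, 0, SKETCH_NODATA_FRAME_LEN);
	frame[0] = 0xC0;	/* signature in the upper nibble only */
}
```

Per the description above, passing such a buffer to EFR_decode_frame() with
BFI=1 should give the same result as calling EFR_decode_bfi_nodata(), just with
the extra unpacking and SID classification steps.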

Note that the official EFR decoder from ETSI, which we've replicated in our
librified form in libgsmefr, does make use of some presumed-invalid frame data
bits under BFI=1 conditions: see the description in GSM 06.61 section 6.1, where
the last sentence reads "The received fixed codebook excitation pulses from the
erroneous frame are always used as such."  With our current implementation, the
"erroneous frame" in the case of completely lost or missing frames is a made-up
frame of 244 zero bits; the question of whether this approach is good enough or
if we need to do something more complex remains for further study.

Stateless utility functions
===========================

All functions in this section are stateless (no encoder state or decoder state
structure is needed); they merely manipulate bit fields.

extern void EFR_frame2params(const uint8_t *frame, int16_t *params);

This function unpacks an EFR codec frame in ETSI TS 101 318 RTP encoding (the
upper nibble of the first byte is NOT checked, i.e., there is NO enforcement of
0xC signature) into an array of 57 (EFR_NUM_PARAMS) parameter words for the
codec.  int16_t signed type is used for the params array (even though all
parameters are actually unsigned) in order to match the guts of ETSI-based EFR
codec, and EFR_frame2params() is called internally by EFR_decode_frame().

extern void EFR_params2frame(const int16_t *params, uint8_t *frame);

This function takes an array of 57 (EFR_NUM_PARAMS) EFR codec parameter words
and packs them into a 31-byte (EFR_RTP_FRAME_LEN) frame in ETSI TS 101 318
format.  The 0xC signature is generated by this function, and every byte of the
output buffer is fully written without regard to any previous content.  This
function is called internally by EFR_encode_frame().

extern int EFR_sid_classify(const uint8_t *frame);

This function analyzes an RTP-encoded EFR frame (the upper nibble of the first
byte is NOT checked for 0xC signature) for the SID codeword of GSM 06.62 and
classifies the frame as SID=0, SID=1 or SID=2 per the rules of GSM 06.81
section 6.1.1.

extern void EFR_insert_sid_codeword(uint8_t *frame);

This function inserts the SID codeword of GSM 06.62 into the frame in the
pointed-to buffer; specifically, the 95 bits that make up the SID field are all
set to 1s, but all other bits remain unchanged.  This function is arguably the
least useful to external users of libgsmefr, but it exists because of how the
original code from ETSI generates the SID frames produced by the encoder in DTX
mode.

Parameter-based encoder and decoder functions
=============================================

The EFR_encode_frame() and EFR_decode_frame() functions described earlier in
this document are the interfaces intended for practical use of our EFR encoder
and decoder, but they are actually wrappers around these parameter-based
functions:

extern void EFR_encode_params(struct EFR_encoder_state *st, const int16_t *pcm,
			      int16_t *params, int *sp, int *vad);

This function is similar to EFR_encode_frame(), but the output is an array of
57 (EFR_NUM_PARAMS) codec parameter words rather than a finished frame.  The two
extra output flags are optional (pointers may be NULL) just like with
EFR_encode_frame(), but there is a catch: if the output frame is a SID (which
can only happen if DTX is enabled), the bits inside parameter words that would
correspond to SID codeword bits are NOT set; instead, one MUST call
EFR_insert_sid_codeword() after packing the frame with EFR_params2frame().  The
wrapper in EFR_encode_frame() does exactly as described, and the overall logic
follows the original code structure from ETSI.
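
The call sequence described above might look like the following sketch.  It is
not the actual body of EFR_encode_frame(): the function name
encode_via_params() and the SKETCH_USE_LIBGSMEFR build switch are hypothetical,
and the test for a SID frame assumes that SP=0 marks a SID frame, which should
be verified against GSM 06.81.  When the build switch is not defined, the block
supplies do-nothing stand-ins for the three library functions so that it builds
without the library installed.

```c
#include <stdint.h>

#ifdef SKETCH_USE_LIBGSMEFR	/* hypothetical build switch */
#include <gsm_efr.h>
#else
/* Stand-ins for illustration only; they do not encode anything. */
#define EFR_NUM_PARAMS		57
#define EFR_RTP_FRAME_LEN	31
struct EFR_encoder_state { int dtx; };
int sketch_sid_inserted;	/* counts stand-in codeword insertions */

void EFR_encode_params(struct EFR_encoder_state *st, const int16_t *pcm,
		       int16_t *params, int *sp, int *vad)
{
	int i;

	(void) pcm;
	for (i = 0; i < EFR_NUM_PARAMS; i++)
		params[i] = 0;
	if (sp)
		*sp = !st->dtx;	/* pretend every DTX frame is a SID */
	if (vad)
		*vad = 0;
}

void EFR_params2frame(const int16_t *params, uint8_t *frame)
{
	int i;

	(void) params;
	frame[0] = 0xC0;
	for (i = 1; i < EFR_RTP_FRAME_LEN; i++)
		frame[i] = 0;
}

void EFR_insert_sid_codeword(uint8_t *frame)
{
	(void) frame;
	sketch_sid_inserted++;
}
#endif

/* Hypothetical caller showing the required order of operations. */
void encode_via_params(struct EFR_encoder_state *st, const int16_t *pcm,
		       uint8_t *frame)
{
	int16_t params[EFR_NUM_PARAMS];
	int sp;

	EFR_encode_params(st, pcm, params, &sp, 0);	/* VAD not needed */
	EFR_params2frame(params, frame);
	if (!sp)	/* assumption: SP=0 marks a SID frame */
		EFR_insert_sid_codeword(frame);
}
```

The point of the ordering is that the SID codeword is written into the packed
frame, after EFR_params2frame() has filled every byte of the output buffer.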

extern void EFR_decode_params(struct EFR_decoder_state *st,
			      const int16_t *params, int bfi, int sid, int taf,
			      int16_t *pcm);

This function is similar to EFR_decode_frame() with the frame input replaced
with params array input, but the SID classification per the rules of GSM 06.81
section 6.1.1 needs to be provided by the caller.  The wrapper in
EFR_decode_frame() calls both EFR_frame2params() and EFR_sid_classify() before
passing the work to EFR_decode_params().

State reset functions
=====================

extern void EFR_encoder_reset(struct EFR_encoder_state *st, int dtx);
extern void EFR_decoder_reset(struct EFR_decoder_state *st);

These functions reset the state of the encoder or the decoder, respectively;
the entire state structure is fully initialized to the respective home state
defined in GSM 06.60 section 8.5 for the encoder or section 8.6 for the decoder.

EFR_encoder_reset() is called internally by EFR_encoder_create() and by the
encoder itself when it encounters the ETSI-prescribed encoder homing frame;
EFR_decoder_reset() is called internally by EFR_decoder_create() and by the
decoder itself when it encounters the ETSI-prescribed decoder homing frame.
Therefore, there is generally no need for libgsmefr users to call these
functions directly - but they are made public for the sake of completeness.

If you call EFR_encoder_reset() manually, you can change the DTX enable/disable
flag from its initial value given to EFR_encoder_create() - the new value of
this flag passed to EFR_encoder_reset() always takes effect.  There is no
provision for changing this mode within an encoder session without a full reset.