view doc/EFR-library-API @ 408:8847c1740e78

libtwamr: integrate VAD1
author Mychaela Falconia <falcon@freecalypso.org>
date Tue, 07 May 2024 00:56:10 +0000

The external public interface to Themyscira libgsmefr consists of a single
header file <gsm_efr.h>.  It should be installed in the same system include
directory as <gsm.h> from classic libgsm (a 1990s free software product
implementing the original FR codec).  The API of libgsmefr is modeled after that
of libgsm.

The dialect of C we chose for libgsmefr is ANSI C (with function prototypes).
The const qualifier is used where appropriate, and the interface is defined in
terms of <stdint.h> types.  <gsm_efr.h> includes <stdint.h> itself.

State allocation and freeing
============================

In order to use the EFR encoder, you will need to allocate an encoder state
structure, and to use the EFR decoder, you will need to allocate a decoder state
structure.  The necessary state allocation functions are:

extern struct EFR_encoder_state *EFR_encoder_create(int dtx);
extern struct EFR_decoder_state *EFR_decoder_create(void);

struct EFR_encoder_state and struct EFR_decoder_state are opaque structures to
library users: you only get pointers which you remember and pass around, but
<gsm_efr.h> does not give you full definitions of these structs.  As a library
user, you don't even get to know the size of these structs, hence the necessary
malloc() operation happens inside EFR_encoder_create() and EFR_decoder_create().
However, each structure is malloc'ed as a single chunk, so when you are done
with an encoder or decoder state instance, simply call free() on it.

EFR_encoder_create() and EFR_decoder_create() can fail if their internal
malloc() call fails.  In that case they return NULL.

The dtx argument to EFR_encoder_create() is a Boolean flag represented as an
int; it tells the EFR encoder whether it should operate with DTX enabled (run
GSM 06.82 VAD and emit SID frames instead of speech frames per GSM 06.81) or DTX
disabled (skip VAD and always emit speech frames).

Using the EFR encoder
=====================

To encode one 20 ms audio frame per EFR, call EFR_encode_frame():

extern void EFR_encode_frame(struct EFR_encoder_state *st, const int16_t *pcm,
			     uint8_t *frame, int *sp, int *vad);

You need to provide three things:

- an encoder state structure allocated earlier with EFR_encoder_create();
- a block of 160 linear PCM samples;
- an output buffer of 31 bytes (the EFR_RTP_FRAME_LEN constant, also defined in
  <gsm_efr.h>) into which the encoded EFR frame will be written.

The frame format is the one defined in ETSI TS 101 318 for EFR in RTP,
including the 0xC signature in the upper nibble of the first byte.

The last two arguments of type (int *) are optional pointers to extra output
flags SP and VAD, defined in GSM 06.81 section 5.1.1; either pointer or both of
them can be NULL if these extra output flags aren't needed.  Both of these flags
are needed in order to test our libgsmefr encoder implementation against
official ETSI test sequences (GSM 06.54), but they typically aren't needed
otherwise.

Using the EFR decoder
=====================

The main interface to our EFR decoder is this function:

extern void EFR_decode_frame(struct EFR_decoder_state *st, const uint8_t *frame,
			     int bfi, int taf, int16_t *pcm);

The inputs consist of 244 bits of frame payload (the 4 upper bits of the first
byte are ignored - there is NO enforcement of 0xC signature in our frame
decoder) and BFI and TAF flags defined in GSM 06.81 section 6.1.1.  Note the
absence of a SID flag argument: EFR_decode_frame() calls our own utility
function EFR_sid_classify() to determine SID from the frame itself per the rules
of GSM 06.81 section 6.1.1.

The canonical EFR decoder always expects frame bits input to be present, even
during BFI conditions!  What happens to those bits depends on the decoder state:

- If a BFI=1 decoding call comes in when the decoder is in comfort noise
  generation state (after a SID), all frame bits passed along with BFI=1 are
  ignored.  This is what one would naturally expect for frames that typically
  aren't transmitted at all.
- If a BFI=1 decoding call comes in when the decoder is in regular speech mode,
  the canonical decoder uses the "fixed codebook excitation pulses" part of the
  erroneous frame (one already declared to be garbage) as part of its decoding
  operation!  This part constitutes 35 bits per subframe, or 140 of the 244
  bits per frame.
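The 35-bit and 140-bit figures can be checked against the per-parameter bit
allocation.  The table below is our assumption: it is taken from the ETSI
reference code's bit allocation, not from <gsm_efr.h>, which does not expose
it.  Please verify it against the libgsmefr sources before relying on it.  The
helper functions are hypothetical.

```c
/* Assumed bit widths of the 57 EFR parameters in frame order, as in the
   ETSI reference code.  Each subframe row is: pitch lag, pitch gain index,
   10 fixed codebook pulse words, fixed codebook gain index. */
static const int efr_bitno[57] = {
    7, 8, 9, 8, 6,                          /* LSP quantizer indices */
    9, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5,  /* subframe 1 */
    6, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5,  /* subframe 2 */
    9, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5,  /* subframe 3 */
    6, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5,  /* subframe 4 */
};

/* Total payload bits in one frame (expected: 244). */
static int efr_frame_bits(void)
{
    int sum = 0;
    for (int i = 0; i < 57; i++)
        sum += efr_bitno[i];
    return sum;
}

/* Bits taken by the 10 pulse words of subframe sf, 0..3 (expected: 35). */
static int efr_subframe_pulse_bits(int sf)
{
    int base = 5 + 13 * sf + 2;   /* skip LSP words, then lag and pitch gain */
    int sum = 0;
    for (int k = 0; k < 10; k++)
        sum += efr_bitno[base + k];
    return sum;
}
```

Under this layout, the first five pulse words of each subframe are 4 bits wide
and the last five are 3 bits wide.  That gives 5 * 4 + 5 * 3 = 35 bits.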

BFI with no data
================

Many EFR decoder applications will be faced with a situation where they receive
a frame gap (no data at all), and they need to run the EFR decoder with BFI=1 -
but the application doesn't have any frame-bits input.  Yet the canonical EFR
decoder requires *some* erroneous frame bits to be fed to it - so what gives?
Our initial approach was to feed the decoder all zeros in place of the codec
parameters, but further analysis revealed that approach to be bad.  (To see for
yourself, study the code in d1035pf.c and think about what it will do when its
input is fixed at all zeros.)  Our new approach is to generate pseudorandom bits
for these pulse parameters, as detailed below.

If you find yourself in the situation of needing to feed BFI=1 with no frame
data bits to the decoder, call the following function in the place of
EFR_decode_frame():

extern void EFR_decode_bfi_nodata(struct EFR_decoder_state *st, int taf,
				  int16_t *pcm);

This function begins by checking the internal state flag RX_SP_FLAG, indicating
whether the decoder is in speech or comfort noise generation mode.  If
RX_SP_FLAG is set, indicating speech state, then the main body of the decoder
will be making use of fixed codebook pulse parameters even for erroneous frames,
and EFR_decode_bfi_nodata() will invoke a PRNG to fill in pseudorandom bits.
If RX_SP_FLAG is clear, then the decoder is generating comfort noise following
reception of a SID, and BFI conditions are fully expected because the
transmitter is expected to be off.  In this case EFR_decode_bfi_nodata() feeds
all-zeros parameters to the main body of the decoder, as none of them will be
used.
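The logic just described can be sketched in C, under several assumptions:

- The parameter layout is taken from the ETSI reference bit allocation.  It has
  5 LSP words, then 13 words per subframe, with the pulse words at offsets
  2..11.  The first five pulse words are 4 bits wide and the last five are
  3 bits wide.
- The 32-bit LCG is only a stand-in, since the PRNG actually used by libgsmefr
  is not described here.
- fill_nodata_params() is a hypothetical name.
- Non-pulse parameters are simply zeroed in this sketch.

```c
#include <stdint.h>
#include <string.h>

#define NUM_PARAMS 57   /* same value as EFR_NUM_PARAMS */

/* Stand-in PRNG: a 32-bit linear congruential generator with the
   well-known 1664525 / 1013904223 constants.  Not necessarily what
   libgsmefr uses. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;         /* low LCG bits are weak; use high ones */
}

/* Hypothetical sketch of how a BFI-with-no-data call could build its
   parameter array, depending on the speech/CNG state flag. */
static void fill_nodata_params(int16_t *params, int rx_sp_flag,
                               uint32_t *seed)
{
    memset(params, 0, NUM_PARAMS * sizeof params[0]);
    if (!rx_sp_flag)
        return;                  /* comfort noise state: nothing is used */
    for (int sf = 0; sf < 4; sf++) {
        int16_t *pulse = params + 5 + 13 * sf + 2;  /* first pulse word */
        for (int k = 0; k < 10; k++) {
            unsigned width = (k < 5) ? 4 : 3;       /* assumed widths */
            pulse[k] = (int16_t)(lcg_next(seed) & ((1u << width) - 1));
        }
    }
}
```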

Stateless utility functions
===========================

All functions in this section are stateless (no encoder state or decoder state
structure is needed); they merely manipulate bit fields.

extern void EFR_frame2params(const uint8_t *frame, int16_t *params);

This function unpacks an EFR codec frame in ETSI TS 101 318 RTP encoding (the
upper nibble of the first byte is NOT checked, i.e., there is NO enforcement of
0xC signature) into an array of 57 (EFR_NUM_PARAMS) parameter words for the
codec.  The signed int16_t type is used for the params array (even though all
parameters are actually unsigned) in order to match the guts of the ETSI-based
EFR codec.  EFR_frame2params() is called internally by EFR_decode_frame().

extern void EFR_params2frame(const int16_t *params, uint8_t *frame);

This function takes an array of 57 (EFR_NUM_PARAMS) EFR codec parameter words
and packs them into a 31-byte (EFR_RTP_FRAME_LEN) frame in ETSI TS 101 318
format.  The 0xC signature is generated by this function, and every byte of the
output buffer is fully written without regard to any previous content.  This
function is called internally by EFR_encode_frame().
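The following is a minimal C sketch of packing and unpacking.  It assumes that
the 244 payload bits follow the 57 parameters in order, each written MSB first,
directly after the 0xC nibble.  The bit-width table is again our assumption from
the ETSI reference code.  params2frame_sketch() and frame2params_sketch() are
hypothetical names, not the libgsmefr implementation.

```c
#include <stdint.h>

#define NUM_PARAMS 57
#define FRAME_LEN  31

/* Assumed parameter widths (ETSI reference bit allocation). */
static const uint8_t efr_bitno[NUM_PARAMS] = {
    7, 8, 9, 8, 6,
    9, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5,
    6, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5,
    9, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5,
    6, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5,
};

static void params2frame_sketch(const int16_t *params, uint8_t *frame)
{
    unsigned acc = 0xC;   /* bit accumulator, seeded with the signature */
    int nacc = 4;         /* number of pending bits in acc */
    int out = 0;

    for (int i = 0; i < NUM_PARAMS; i++) {
        unsigned w = efr_bitno[i];
        acc = (acc << w) | ((unsigned)params[i] & ((1u << w) - 1));
        nacc += (int)w;
        while (nacc >= 8) {          /* flush whole bytes, MSB first */
            nacc -= 8;
            frame[out++] = (uint8_t)(acc >> nacc);
        }
        acc &= (1u << nacc) - 1;     /* keep only the pending bits */
    }
}

static void frame2params_sketch(const uint8_t *frame, int16_t *params)
{
    unsigned acc = frame[0] & 0x0F;  /* signature nibble is ignored */
    int nacc = 4;
    int in = 1;

    for (int i = 0; i < NUM_PARAMS; i++) {
        int w = efr_bitno[i];
        while (nacc < w) {           /* refill one byte at a time */
            acc = (acc << 8) | frame[in++];
            nacc += 8;
        }
        nacc -= w;
        params[i] = (int16_t)((acc >> nacc) & ((1u << w) - 1));
        acc &= (1u << nacc) - 1;     /* drop consumed bits */
    }
}
```

The 4 signature bits plus 244 payload bits fill exactly 31 bytes.  As a result,
this packer writes every output byte, which is the same property described above
for EFR_params2frame().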

extern int EFR_sid_classify(const uint8_t *frame);

This function analyzes an RTP-encoded EFR frame (the upper nibble of the first
byte is NOT checked for 0xC signature) for the SID codeword of GSM 06.62 and
classifies the frame as SID=0, SID=1 or SID=2 per the rules of GSM 06.81
section 6.1.1.
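The classification can be sketched as counting the 0 bits inside the SID
codeword field.  The thresholds below are our reading of GSM 06.81 section
6.1.1 and should be checked against the spec:

- fewer than 2 zeros: SID=2
- fewer than 16 zeros: SID=1
- otherwise: SID=0

The 95-bit field position mask is taken as an input, because its exact layout
(from GSM 06.62) is not reproduced here.  sid_classify_sketch() is a
hypothetical name.

```c
#include <stdint.h>

#define FRAME_LEN 31

/* Count set bits in one byte (portable, no compiler builtins). */
static int popcount8(uint8_t v)
{
    int n = 0;
    for (; v; v = (uint8_t)(v & (v - 1)))
        n++;
    return n;
}

/* Hypothetical classifier: sid_mask has a 1 bit at each of the 95 SID
   codeword positions of GSM 06.62 (mask contents not given here). */
static int sid_classify_sketch(const uint8_t *frame, const uint8_t *sid_mask)
{
    int zeros = 0;
    for (int i = 0; i < FRAME_LEN; i++)
        zeros += popcount8((uint8_t)(~frame[i] & sid_mask[i]));
    if (zeros < 2)
        return 2;   /* valid SID */
    if (zeros < 16)
        return 1;   /* invalid SID */
    return 0;       /* speech */
}
```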

extern void EFR_insert_sid_codeword(uint8_t *frame);

This function inserts the SID codeword of GSM 06.62 into the frame in the
pointed-to buffer; specifically, the 95 bits that make up the SID field are all
set to 1s, but all other bits remain unchanged.  This function is arguably the
least useful to external users of libgsmefr.  It exists because of how the
original ETSI code generates the SID frames emitted by the encoder in DTX mode.
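Suppose we have a mask marking the 95 SID codeword bits (from GSM 06.62, not
reproduced here).  Setting those bits to 1s while leaving all other bits
unchanged is then just a bitwise OR, as in this hypothetical sketch.
insert_sid_codeword_sketch() is not the libgsmefr implementation.

```c
#include <stdint.h>

#define FRAME_LEN 31

/* Hypothetical sketch: force every bit marked in sid_mask to 1 and leave
   all unmarked bits of the frame untouched. */
static void insert_sid_codeword_sketch(uint8_t *frame, const uint8_t *sid_mask)
{
    for (int i = 0; i < FRAME_LEN; i++)
        frame[i] |= sid_mask[i];
}
```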

Parameter-based encoder and decoder functions
=============================================

The EFR_encode_frame() and EFR_decode_frame() functions described earlier in
this document constitute the most practically useful (intended for actual use)
interfaces to our EFR encoder and decoder, but they are actually wrappers around
these parameter-based functions:

extern void EFR_encode_params(struct EFR_encoder_state *st, const int16_t *pcm,
			      int16_t *params, int *sp, int *vad);

This function is similar to EFR_encode_frame(), but the output is an array of
57 (EFR_NUM_PARAMS) codec parameter words rather than a finished frame.  The two
extra output flags are optional (pointers may be NULL) just like with
EFR_encode_frame(), but there is a catch.  If the output frame is a SID (which
can only happen if DTX is enabled), the bits inside parameter words that would
correspond to SID codeword bits are NOT set.  Instead, one MUST call
EFR_insert_sid_codeword() after packing the frame with EFR_params2frame().  The
wrapper in EFR_encode_frame() does exactly that, and the overall logic follows
the original code structure from ETSI.

extern void EFR_decode_params(struct EFR_decoder_state *st,
			      const int16_t *params, int bfi, int sid, int taf,
			      int16_t *pcm);

This function is similar to EFR_decode_frame() with the frame input replaced
with params array input, but the SID classification per the rules of GSM 06.81
section 6.1.1 needs to be provided by the caller.  The wrapper in
EFR_decode_frame() calls both EFR_frame2params() and EFR_sid_classify() before
passing the work to EFR_decode_params().

State reset functions
=====================

extern void EFR_encoder_reset(struct EFR_encoder_state *st, int dtx);
extern void EFR_decoder_reset(struct EFR_decoder_state *st);

These functions reset the state of the encoder or the decoder, respectively;
the entire state structure is fully initialized to the respective home state
defined in GSM 06.60 section 8.5 for the encoder or section 8.6 for the decoder.

EFR_encoder_reset() is called internally by EFR_encoder_create() and by the
encoder itself when it encounters the ETSI-prescribed encoder homing frame;
EFR_decoder_reset() is called internally by EFR_decoder_create() and by the
decoder itself when it encounters the ETSI-prescribed decoder homing frame.
Therefore, there is generally no need for libgsmefr users to call these
functions directly - but they are made public for the sake of completeness.

If you call EFR_encoder_reset() manually, you can change the DTX enable/disable
flag from its initial value given to EFR_encoder_create() - the new value of
this flag passed to EFR_encoder_reset() always takes effect.  There is no
provision for changing this mode within an encoder session without a full reset.