view doc/EFR-library-API @ 408:8847c1740e78
libtwamr: integrate VAD1
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Tue, 07 May 2024 00:56:10 +0000 |
parents | fe5aceaf51e0 |
children | 9208db14b4b9 |
The external public interface to Themyscira libgsmefr consists of a single
header file <gsm_efr.h>; it should be installed in the same system include
directory as <gsm.h> from classic libgsm (1990s free software product) for the
original FR codec, and the API of libgsmefr is modeled after that of libgsm.

The dialect of C we chose for libgsmefr is ANSI C (function prototypes), const
qualifier is used where appropriate, and the interface is defined in terms of
<stdint.h> types; <gsm_efr.h> includes <stdint.h>.

State allocation and freeing
============================

In order to use the EFR encoder, you will need to allocate an encoder state
structure, and to use the EFR decoder, you will need to allocate a decoder
state structure. The necessary state allocation functions are:

extern struct EFR_encoder_state *EFR_encoder_create(int dtx);
extern struct EFR_decoder_state *EFR_decoder_create(void);

struct EFR_encoder_state and struct EFR_decoder_state are opaque structures to
library users: you only get pointers which you remember and pass around, but
<gsm_efr.h> does not give you full definitions of these structs. As a library
user, you don't even get to know the size of these structs, hence the necessary
malloc() operation happens inside EFR_encoder_create() and
EFR_decoder_create(). However, each structure is malloc'ed as a single chunk,
hence when you are done with it, simply call free() to relinquish each encoder
or decoder state instance.

EFR_encoder_create() and EFR_decoder_create() functions can fail if the
malloc() call inside fails, in which case the two libgsmefr functions in
question return NULL.

The dtx argument to EFR_encoder_create() is a Boolean flag represented as an
int; it tells the EFR encoder whether it should operate with DTX enabled (run
GSM 06.82 VAD and emit SID frames instead of speech frames per GSM 06.81) or
DTX disabled (skip VAD and always emit speech frames).
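
The opaque-state arrangement described above can be sketched in a few lines of
C. This is only an illustration of the general pattern, not libgsmefr's actual
source: struct demo_codec_state, its fields, demo_codec_create() and
demo_codec_reset() are hypothetical stand-ins invented for this sketch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for an opaque codec state (NOT a libgsmefr type).
 * A public header would carry only the forward declaration
 * "struct demo_codec_state;", so users never learn the size or layout.
 */
struct demo_codec_state {
	int dtx;		/* DTX enable flag, a Boolean stored as int */
	int16_t history[160];	/* placeholder for real codec memory */
};

/* Put the entire structure into a known initial state. */
void demo_codec_reset(struct demo_codec_state *st, int dtx)
{
	memset(st, 0, sizeof(*st));
	st->dtx = dtx ? 1 : 0;
}

/* Allocate the state as a single malloc'ed chunk; return NULL on failure. */
struct demo_codec_state *demo_codec_create(int dtx)
{
	struct demo_codec_state *st = malloc(sizeof(*st));

	if (!st)
		return NULL;
	demo_codec_reset(st, dtx);
	return st;
}
```

Because the state in this sketch is one chunk with no nested allocations, the
caller releases it with a plain free(st). Per the text above, the same holds
for the pointers returned by EFR_encoder_create() and EFR_decoder_create().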

Using the EFR encoder
=====================

To encode one 20 ms audio frame per EFR, call EFR_encode_frame():

extern void EFR_encode_frame(struct EFR_encoder_state *st, const int16_t *pcm,
                             uint8_t *frame, int *sp, int *vad);

You need to provide an encoder state structure allocated earlier with
EFR_encoder_create(), a block of 160 linear PCM samples, and an output buffer
of 31 bytes (EFR_RTP_FRAME_LEN constant also defined in <gsm_efr.h>) into which
the encoded EFR frame will be written; the frame format is that defined in ETSI
TS 101 318 for EFR in RTP, including the 0xC signature in the upper nibble of
the first byte.

The last two arguments of type (int *) are optional pointers to extra output
flags SP and VAD, defined in GSM 06.81 section 5.1.1; either pointer or both of
them can be NULL if these extra output flags aren't needed. Both of these flags
are needed in order to test our libgsmefr encoder implementation against
official ETSI test sequences (GSM 06.54), but they typically aren't needed
otherwise.

Using the EFR decoder
=====================

The main interface to our EFR decoder is this function:

extern void EFR_decode_frame(struct EFR_decoder_state *st, const uint8_t *frame,
                             int bfi, int taf, int16_t *pcm);

The inputs consist of 244 bits of frame payload (the 4 upper bits of the first
byte are ignored - there is NO enforcement of 0xC signature in our frame
decoder) and BFI and TAF flags defined in GSM 06.81 section 6.1.1. Note the
absence of a SID flag argument: EFR_decode_frame() calls our own utility
function EFR_sid_classify() to determine SID from the frame itself per the
rules of GSM 06.81 section 6.1.1.

The canonical EFR decoder always expects frame bits input to be present, even
during BFI conditions!
More specifically, if a BFI=1 decoding call comes in when the decoder is in
comfort noise generation state (after a SID), then all frame bits passed along
with BFI=1 are ignored as one would naturally expect for frames that typically
aren't transmitted at all - but if a BFI=1 decoding call comes in when the
decoder is in regular speech mode, the canonical decoder will use the "fixed
codebook excitation pulses" part of the erroneous frame (one declared to be
garbage) as part of its decoding operation! (This part constitutes 35 bits per
subframe or 140 bits out of 244 per frame.)

BFI with no data
================

Many EFR decoder applications will be faced with a situation where they receive
a frame gap (no data at all), and they need to run the EFR decoder with BFI=1 -
but the application doesn't have any frame-bits input. Yet the canonical EFR
decoder requires *some* erroneous frame bits to be fed to it - so what gives?

Our initial approach was to feed the decoder all zeros in the place of codec
parameters - but further analysis reveals that approach to be bad. (To see for
yourself, study the code in d1035pf.c and think what it will do when the input
is fixed at all zeros.) Our new approach is to generate pseudorandom bits for
these pulse parameters, as detailed below.

If you find yourself in the situation of needing to feed BFI=1 with no frame
data bits to the decoder, call the following function in the place of
EFR_decode_frame():

extern void EFR_decode_bfi_nodata(struct EFR_decoder_state *st, int taf,
                                  int16_t *pcm);

This function begins by checking the internal state flag RX_SP_FLAG, indicating
whether the decoder is in speech or comfort noise generation mode. If
RX_SP_FLAG is set, indicating speech state, then the main body of the decoder
will be making use of fixed codebook pulse parameters even for erroneous
frames, and EFR_decode_bfi_nodata() will invoke a PRNG to fill in pseudorandom
bits.
If RX_SP_FLAG is clear, then the decoder is generating comfort noise following
reception of a SID, and BFI conditions are fully expected because the
transmitter is expected to be off. In this case EFR_decode_bfi_nodata() feeds
all-zeros parameters to the main body of the decoder, as none of them will be
used.

Stateless utility functions
===========================

All functions in this section are stateless (no encoder state or decoder state
structure is needed); they merely manipulate bit fields.

extern void EFR_frame2params(const uint8_t *frame, int16_t *params);

This function unpacks an EFR codec frame in ETSI TS 101 318 RTP encoding (the
upper nibble of the first byte is NOT checked, i.e., there is NO enforcement of
0xC signature) into an array of 57 (EFR_NUM_PARAMS) parameter words for the
codec. int16_t signed type is used for the params array (even though all
parameters are actually unsigned) in order to match the guts of ETSI-based EFR
codec, and EFR_frame2params() is called internally by EFR_decode_frame().

extern void EFR_params2frame(const int16_t *params, uint8_t *frame);

This function takes an array of 57 (EFR_NUM_PARAMS) EFR codec parameter words
and packs them into a 31-byte (EFR_RTP_FRAME_LEN) frame in ETSI TS 101 318
format. The 0xC signature is generated by this function, and every byte of the
output buffer is fully written without regard to any previous content. This
function is called internally by EFR_encode_frame().

extern int EFR_sid_classify(const uint8_t *frame);

This function analyzes an RTP-encoded EFR frame (the upper nibble of the first
byte is NOT checked for 0xC signature) for the SID codeword of GSM 06.62 and
classifies the frame as SID=0, SID=1 or SID=2 per the rules of GSM 06.81
section 6.1.1.
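
Since none of the frame-reading functions described so far checks the 0xC
signature, an application that receives frames from an outside source may want
to do that check itself before handing the buffer to the library. The
following is a minimal sketch of such a caller-side check; the demo_ function
names and DEMO_ constants are hypothetical and are not part of <gsm_efr.h>.
Only the numbers (31 bytes, 244 payload bits, 0xC in the upper nibble of the
first byte) come from this document.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of values stated in this document (hypothetical names). */
#define DEMO_EFR_FRAME_LEN	31	/* same value as EFR_RTP_FRAME_LEN */
#define DEMO_EFR_PAYLOAD_BITS	244	/* codec bits after the signature */

/* Return 1 if the upper nibble of the first byte is the 0xC signature. */
int demo_efr_signature_ok(const uint8_t *frame)
{
	return (frame[0] & 0xF0) == 0xC0;
}

/* Return 1 if a received buffer has the EFR frame length and signature. */
int demo_efr_payload_plausible(const uint8_t *payload, size_t len)
{
	if (len != DEMO_EFR_FRAME_LEN)
		return 0;
	return demo_efr_signature_ok(payload);
}
```

The 4 signature bits plus 244 payload bits add up to 248 bits, which is exactly
31 bytes, so a length check and a nibble check together cover the whole
framing. What to do with a buffer that fails the check is an application
decision that this sketch does not make.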

extern void EFR_insert_sid_codeword(uint8_t *frame);

This function inserts the SID codeword of GSM 06.62 into the frame in the
pointed-to buffer; specifically, the 95 bits that make up the SID field are all
set to 1s, but all other bits remain unchanged. This function is arguably least
useful to external users of libgsmefr, but it exists because of how the
original code from ETSI generates SID frames produced by the encoder in DTX
mode.

Parameter-based encoder and decoder functions
=============================================

The EFR_encode_frame() and EFR_decode_frame() functions described earlier in
this document constitute the most practically useful (intended for actual use)
interfaces to our EFR encoder and decoder, but they are actually wrappers
around these parameter-based functions:

extern void EFR_encode_params(struct EFR_encoder_state *st, const int16_t *pcm,
                              int16_t *params, int *sp, int *vad);

This function is similar to EFR_encode_frame(), but the output is an array of
57 (EFR_NUM_PARAMS) codec parameter words rather than a finished frame. The two
extra output flags are optional (pointers may be NULL) just like with
EFR_encode_frame(), but there is a catch: if the output frame is a SID (which
can only happen if DTX is enabled), the bits inside parameter words that would
correspond to SID codeword bits are NOT set; instead one MUST call
EFR_insert_sid_codeword() after packing the frame with EFR_params2frame(). The
wrapper in EFR_encode_frame() does exactly as described, and the overall logic
follows the original code structure from ETSI.

extern void EFR_decode_params(struct EFR_decoder_state *st,
                              const int16_t *params, int bfi, int sid, int taf,
                              int16_t *pcm);

This function is similar to EFR_decode_frame() with the frame input replaced
with params array input, but the SID classification per the rules of GSM 06.81
section 6.1.1 needs to be provided by the caller.
The wrapper in EFR_decode_frame() calls both EFR_frame2params() and
EFR_sid_classify() before passing the work to EFR_decode_params().

State reset functions
=====================

extern void EFR_encoder_reset(struct EFR_encoder_state *st, int dtx);
extern void EFR_decoder_reset(struct EFR_decoder_state *st);

These functions reset the state of the encoder or the decoder, respectively;
the entire state structure is fully initialized to the respective home state
defined in GSM 06.60 section 8.5 for the encoder or section 8.6 for the
decoder.

EFR_encoder_reset() is called internally by EFR_encoder_create() and by the
encoder itself when it encounters the ETSI-prescribed encoder homing frame;
EFR_decoder_reset() is called internally by EFR_decoder_create() and by the
decoder itself when it encounters the ETSI-prescribed decoder homing frame.
Therefore, there is generally no need for libgsmefr users to call these
functions directly - but they are made public for the sake of completeness.

If you call EFR_encoder_reset() manually, you can change the DTX enable/disable
flag from its initial value given to EFR_encoder_create() - the new value of
this flag passed to EFR_encoder_reset() always takes effect. There is no
provision for changing this mode within an encoder session without a full
reset.