view doc/FR1-library-API @ 542:f2d0f2f15d5f

libgsmefr: add wrapper for TW-TS-001 RTP input
author Mychaela Falconia <falcon@freecalypso.org>
date Sat, 28 Sep 2024 06:38:08 +0000
parents a3300483ae74

Libgsmfr2 general usage
=======================

The external public interface to Themyscira libgsmfr2 consists of a single
header file <tw_gsmfr.h>; it should be installed in some system include
directory.

All Themyscira GSM codec libraries are written in ANSI C with function
prototypes.  The const qualifier is used where appropriate, and the interface
is defined in terms of <stdint.h> types; <tw_gsmfr.h> includes <stdint.h>.  The
old libgsm defined types (gsm_byte, gsm_frame and gsm_signal) were abolished in
the migration from libgsm+libgsmfrp to libgsmfr2.

GSM 06.10 encoder and decoder
=============================

Both the encoder and the decoder are stateful; each running instance of either
element needs its own state structure.  However, this GSM 06.10 component of
libgsmfr2 shares a peculiar property with old libgsm from which it was derived:
the same state structure (struct gsmfr_0610_state) is used by both entities.
Needless to say, each given instance of struct gsmfr_0610_state must be used
for only one purpose, either for the encoder or for the decoder; mixing calls
to encoder and decoder functions with the same state structure is an invalid
operation with undefined results.

State structures for the basic encoder or decoder are allocated with this
function:

struct gsmfr_0610_state *gsmfr_0610_create(void);

This function allocates dynamic memory for the state structure with malloc()
and returns a pointer to the allocated and initialized struct if successful, or
NULL if malloc() fails.  The state structure is malloc'ed as a single chunk,
hence when you are done with it, simply free() it.

The initialization or reset portion of gsmfr_0610_create() operation can always
be repeated with this function:

void gsmfr_0610_reset(struct gsmfr_0610_state *state);

To support applications that need (or prefer) to use some different method of
managing their memory allocations, the library also exports this const datum:

extern const unsigned gsmfr_0610_state_size;

Using this feature, one can replace gsmfr_0610_create() with something like the
following (example for applications based on Osmocom libraries):

	struct gsmfr_0610_state *st;
	st = talloc_size(ctx, gsmfr_0610_state_size);
	if (st)
		gsmfr_0610_reset(st);

Immediately after gsmfr_0610_create() or gsmfr_0610_reset(), the "virgin" state
structure can be used either for the encoder or for the decoder; however, once
that state struct has been passed to functions of either group, it can only be
used for that functional group.

Encoder specifics
-----------------

The most elementary single-frame processing function of libgsmfr2 GSM 06.10
encoder is:

void gsmfr_0610_encode_params(struct gsmfr_0610_state *st, const int16_t *pcm,
			      struct gsmfr_param_frame *param);

The input is an array of 160 linear PCM samples (left-justified in int16_t),
and the output is this structure:

struct gsmfr_param_frame {
	int16_t	LARc[8];
	int16_t	Nc[4];
	int16_t	bc[4];
	int16_t	Mc[4];
	int16_t	xmaxc[4];
	int16_t	xMc[4][13];
};

Most of the time the following wrapper function is more useful:

void gsmfr_0610_encode_frame(struct gsmfr_0610_state *st, const int16_t *pcm,
			     uint8_t *frame);

The output is a 33-byte buffer, filled with the encoded GSM-FR speech frame in
the RTP format specified in ETSI TS 101 318 and IETF RFC 3551.
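Both encoder entry points expect left-justified PCM input.  As a minimal sketch
of what that means, assuming the 13-bit uniform PCM format of GSM 06.10 (the
helper name below is hypothetical and not part of libgsmfr2): the 13 significant
bits occupy the upper bits of the int16_t word, so a sample expressed as a
13-bit number has to be scaled by 8 before it is handed to the encoder.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of libgsmfr2: convert a 13-bit signed
 * sample (range -4096..4095) to the left-justified int16_t form.
 * Multiplication is used instead of a left shift because shifting
 * a negative value is undefined behavior in C.
 */
int16_t pcm13_to_left_justified(int16_t s13)
{
	return (int16_t) (s13 * 8);
}
```

Sources that already produce 16-bit linear PCM (such as a G.711 expander) are
normally left-justified as they stand and need no such scaling.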

If the optional encoder homing feature is desired, call this function right
after the call to gsmfr_0610_encode_frame() or gsmfr_0610_encode_params():

void gsmfr_0610_encoder_homing(struct gsmfr_0610_state *st, const int16_t *pcm);

This function checks whether the PCM frame (160 linear PCM samples) is an
encoder homing frame (EHF); if it is, the function calls gsmfr_0610_reset().
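
For orientation, the EHF test amounts to checking that all 160 samples carry
one fixed value.  The sketch below assumes that the encoder homing frame of
GSM 06.10 consists of 160 samples equal to 0x0008 in left-justified form
(13-bit value 1).  The function name is hypothetical, and the library's own
check may differ in detail, for example by masking the 3 low-order bits first:

```c
#include <stdint.h>

#define	FRAME_SAMPLES	160	/* samples per 20 ms GSM-FR frame */
#define	EHF_SAMPLE	0x0008	/* assumed EHF sample value, left-justified */

/*
 * Hypothetical helper, not the library's implementation:
 * returns 1 if pcm[] looks like an encoder homing frame, 0 otherwise.
 */
int is_encoder_homing_frame(const int16_t *pcm)
{
	int i;

	for (i = 0; i < FRAME_SAMPLES; i++) {
		if (pcm[i] != EHF_SAMPLE)
			return 0;
	}
	return 1;
}
```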

Decoder specifics
-----------------

The internal native form of the 06.10 decoder once again uses
struct gsmfr_param_frame:

void gsmfr_0610_decode_params(struct gsmfr_0610_state *st,
			      const struct gsmfr_param_frame *param,
			      int16_t *pcm);

The more commonly used RTP-format version is:

void gsmfr_0610_decode_frame(struct gsmfr_0610_state *st, const uint8_t *frame,
			     int16_t *pcm);

Please note:

1) The basic GSM 06.10 decoder is just that: there is no SID recognition or DTX
   handling, every possible input bit pattern will be interpreted and decoded
   as a GSM 06.10 speech frame.

2) There is no decoder homing function at this layer, and no check for DHF.

3) The RTP signature nibble 0xD is ignored (not checked) by
   gsmfr_0610_decode_frame().

Rx DTX preprocessor block
=========================

The Rx DTX preprocessor is its own stateful element, independent from the 06.10
decoder to which it is usually coupled.  Libgsmfr2 provides a "fulldec" wrapper
that incorporates both elements, but the ability to use the Rx DTX preprocessor
by itself still remains, unchanged from our previous libgsmfrp offering.  One
significant application for this preprocessor by itself, without immediately
following it with the GSM 06.10 decode step, is the TFO/TrFO transform of 3GPP
TS 28.062 section C.3.2.1.1 for GSM-FR: our Rx DTX preprocessor does exactly
what that section calls for, specifically in "case 1" where the input UL frame
stream may contain SIDs and BFI frame gaps, but the output must be 100% valid
frames and SID-free.  The current version of libgsmfr2 includes some additional
provisions for using our preprocessor block as a TFO transform in both non-DTXd
and DTXd-enabled configurations, as detailed in a later section of this
document.

The state structure for this block is struct gsmfr_preproc_state, and it is
allocated with this function:

struct gsmfr_preproc_state *gsmfr_preproc_create(void);

Like other state structures in Themyscira GSM codec libraries, this opaque
state is malloc'ed as a single chunk and can be simply freed afterward.  A
reset function is also provided:

void gsmfr_preproc_reset(struct gsmfr_preproc_state *state);

There is also a public const datum with the size of this structure, allowing
use of talloc and other alternative schemes:

extern const unsigned gsmfr_preproc_state_size;

Preprocessing good frames
-------------------------

For every good traffic frame (BFI=0) you receive from the radio subsystem, you
need to call this preprocessor function:

void gsmfr_preproc_good_frame(struct gsmfr_preproc_state *state,
				uint8_t *frame);

The second argument is both input and output, i.e., the frame is modified in
place.  If the received frame is not SID (specifically, if the SID field
deviates from the SID codeword by 16 or more bits, per GSM 06.31 section 6.1.1),
then the frame (considered a good speech frame) will be left unmodified (i.e.,
it is to be passed unchanged to the GSM 06.10 decoder), but preprocessor state
will be updated.  OTOH, if the received frame is classified as either valid or
invalid SID per GSM 06.31, then the output frame will contain comfort noise
generated by the preprocessor using a PRNG, or a speech muting or silence frame
in some corner cases involving invalid SID.

GSM-FR RTP (originally libgsm) 0xD magic: the upper nibble of the first byte
can be anything on input to gsmfr_preproc_good_frame(), but the output frame
will always have the correct magic in it.

There is also a variant of this function (implemented as a wrapper) that applies
homing logic:

void gsmfr_preproc_good_frame_hm(struct gsmfr_preproc_state *state,
				 uint8_t *frame);

This function operates just like plain gsmfr_preproc_good_frame() except for
one difference: if the input matches the decoder homing frame (DHF), the state
is reset with an internal call to gsmfr_preproc_reset().  (Because the DHF is
still a good speech frame, it is always passed through to the output unchanged
by both functions - the only difference is the effect on subsequent state.)
The homing version of good frame preproc is intended for TFO applications, and
is invoked internally by gsmfr_tfo_xfrm_main() function described in a later
section of this document.

Handling BFI conditions
-----------------------

If you received a lost/missing frame indication instead of a good traffic frame,
call one of these preprocessor functions:

void gsmfr_preproc_bfi(struct gsmfr_preproc_state *state, int taf,
			uint8_t *frame_out);

or

void gsmfr_preproc_bfi_bits(struct gsmfr_preproc_state *state,
			    const uint8_t *bad_frame, int taf,
			    uint8_t *frame_out);

gsmfr_preproc_bfi_bits() should be called if you received payload bits along
with the BFI flag; plain gsmfr_preproc_bfi() should be called if you received
BFI with no data.  The bad frame passed to gsmfr_preproc_bfi_bits() is used
only to check if the BFI should be handled as an invalid SID rather than the
more common case of an unusable frame - see GSM 06.31 for definitions of these
terms.  Past the SID check, the bad frame content is a don't-care, and there is
no provision for making any use of erroneous frames like in EFR.

TAF is a flag defined in GSM 06.31 section 6.1.1; if you don't have this flag,
pass 0 - you will lose the function of comfort noise muting in the event of
prolonged SID loss, but all other Rx DTX functions will still work the same.

With both functions the 33-byte buffer pointed to by frame_out is only an
output, i.e., prior buffer content is a don't-care.  The frame generated by the
preprocessor may be substitution/muting, comfort noise or silence depending on
the state.

gsmfr_preproc_bfi_bits() arguments bad_frame and frame_out can point to the
same memory: the function finishes analyzing bad_frame input before it starts
writing to frame_out.
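
The choice among the three preprocessor entry points described above can be
summarized as a small decision function.  This is only a sketch of the dispatch
rule: the enum and function names are hypothetical, and the actual library
calls are indicated in comments rather than made.

```c
/* Which preprocessor function the application should call for one frame */
enum rx_action {
	RX_GOOD_FRAME,		/* gsmfr_preproc_good_frame() */
	RX_BFI_BITS,		/* gsmfr_preproc_bfi_bits() */
	RX_BFI_NO_DATA		/* gsmfr_preproc_bfi() */
};

/*
 * bfi:       nonzero if the radio subsystem marked the frame as bad
 * have_bits: nonzero if payload bits were delivered along with the frame
 *
 * A good frame (bfi == 0) always comes with payload bits, so have_bits
 * matters only in the BFI case.
 */
enum rx_action select_rx_action(int bfi, int have_bits)
{
	if (!bfi)
		return RX_GOOD_FRAME;
	return have_bits ? RX_BFI_BITS : RX_BFI_NO_DATA;
}
```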

GSM-FR full decoder
===================

The full decoder is a high-level feature of libgsmfr2, incorporating both the
Rx DTX preprocessor block and the GSM 06.10 decoder block.  The state structure
for the full decoder (struct gsmfr_fulldec_state) internally incorporates both
struct gsmfr_0610_state and gsmfr_preproc_state, but because it is implemented
inside libgsmfr2, it is still malloc'ed as a single chunk and can thus be
released with a single free() call.  The functions for allocating and
initializing this state follow the established pattern:

struct gsmfr_fulldec_state *gsmfr_fulldec_create(void);

void gsmfr_fulldec_reset(struct gsmfr_fulldec_state *state);

extern const unsigned gsmfr_fulldec_state_size;

The reset function internally calls gsmfr_0610_reset() and
gsmfr_preproc_reset(), initializing both processing blocks.

Frame processing functions are also straightforward:

void gsmfr_fulldec_good_frame(struct gsmfr_fulldec_state *state,
				const uint8_t *frame, int16_t *pcm);

void gsmfr_fulldec_bfi(struct gsmfr_fulldec_state *state, int taf,
			int16_t *pcm);

void gsmfr_fulldec_bfi_bits(struct gsmfr_fulldec_state *state,
			    const uint8_t *bad_frame, int taf, int16_t *pcm);

These functions follow the same pattern as gsmfr_preproc_good_frame(),
gsmfr_preproc_bfi() and gsmfr_preproc_bfi_bits(), but the output is a 160-sample
linear PCM buffer.  Also note that the frame input to gsmfr_fulldec_good_frame()
is const, unlike the situation with gsmfr_preproc_good_frame() - the copying
into a scratchpad buffer (on the stack) happens inside this "fulldec" wrapper.

The "fulldec" layer also adds the decoder homing feature:
gsmfr_fulldec_good_frame() detects decoder homing frames and invokes
gsmfr_fulldec_reset() when required, and also implements EHF output per the
spec.

Full decoder RTP input
----------------------

If a network element is receiving GSM-FR input via RTP and needs to feed this
input to the decoder, the RTP payload handler needs to support both the basic
RTP format of ETSI TS 101 318 (also RFC 3551) and the extended RTP format of
TW-TS-001.  Depending on the format received, and depending on bit flags in the
TEH octet in the case of TW-TS-001, one of the 3 main processing functions
listed above will need to be called.  Seeing that this complex logic should be
abstracted away from applications into the library, we've added the following
wrapper function:

int gsmfr_fulldec_rtp_in(struct gsmfr_fulldec_state *state,
			 const uint8_t *rtp_pl, unsigned rtp_pl_len,
			 int16_t *pcm);

The input is the received RTP payload: array of bytes and length.  It is
acceptable to pass 0 as rtp_pl_len, in which case rtp_pl pointer can be NULL.
The function proceeds as follows:

* If the input is valid RTP format for GSM-FR (either basic or extended), it is
  passed to the appropriate main processing function.  Unlike the permissive
  stance taken in lower-level functions, RTP input validation includes a check
  of 0xD signature of GSM-FR, as well as validation of TEH octet signature and
  consistency in the case of TW-TS-001.  The return value is 0, indicating that
  good input was received.

* If the input is a zero-length payload (rtp_pl_len is 0, rtp_pl may be NULL),
  it is treated like BFI-no-data with TAF=0.  The return value is 0, meaning
  that this input is still considered valid.

* All other inputs are considered invalid.  Linear PCM output is still generated
  by calling gsmfr_fulldec_bfi(), but the return value is -1, signaling invalid
  RTP input.
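
To illustrate the kind of screening involved, here is a partial sketch covering
only the two cases that the text above fully specifies: the zero-length payload
and the basic 33-byte format with its 0xD signature.  TW-TS-001 validation (TEH
octet checks) is deliberately not attempted here, and all names are
hypothetical.  An application that calls gsmfr_fulldec_rtp_in() does not need
any of this, as the library performs the complete check itself.

```c
#include <stdint.h>

enum pl_kind {
	PL_EMPTY,	/* zero-length payload: BFI-no-data with TAF=0 */
	PL_BASIC_FR,	/* 33 bytes with 0xD signature (TS 101 318 / RFC 3551) */
	PL_OTHER	/* TW-TS-001 or invalid: not examined by this sketch */
};

/* Hypothetical pre-screening helper; pl may be NULL only when len is 0 */
enum pl_kind prescreen_rtp_payload(const uint8_t *pl, unsigned len)
{
	if (len == 0)
		return PL_EMPTY;
	if (len == 33 && (pl[0] & 0xF0) == 0xD0)
		return PL_BASIC_FR;
	return PL_OTHER;
}
```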

TFO transform
=============

"TFO transform" is the term adopted by Themyscira Wireless for the non-trivial
transform on GSM codec frames called for by the TFO spec, 3GPP TS 28.062
section C.3.2.1.1.  For each of the 3 classic GSM codecs, this transform can
operate in two modes:

DTXd=0: the input UL frame stream from call leg A may contain SIDs and BFI
frame gaps, but the output to call leg B DL must be 100% valid frames and
SID-free.

DTXd=1: the output to call leg B DL is allowed to contain both good speech and
valid SID frames, just like the output of a DTX-enabled speech encoder.
Furthermore, it can be presumed that network operators who enable DTXd seek to
reap its benefits in terms of radio interference reduction, hence the
DTXd-enabled TFO transform should actually make use of DTXd capability.

In the case of GSM-FR codec, the TFO transform with DTXd=0 is identical to the
Rx DTX preprocessor part of the standard endpoint decoder, hence our "preproc"
block is directly suited to serve as such.  OTOH, the case of DTXd=1 is
different: heeding the implied need to actually make use of DTXd when possible
requires implementing a transform that is not the same as the preprocessor to
be applied just prior to local GSM 06.10 decoding, hence the DTXd-enabled TFO
transform is a different entity.

The approach implemented in Themyscira libgsmfr2 is a hybrid:

* The preprocessor block described earlier in this document functions both as
  the necessary component of the full endpoint decoder and as the TFO transform
  for DTXd=0.

* TFO transform for DTXd=1 is implemented as a two-step process:

1) Regular main processing functions of the preproc block produce output that
   is SID-free, containing synthetic "speech" frames in the case of comfort
   noise or silence.

2) A special post-processor function needs to be called immediately afterward.
   This function selectively transforms some output frames into SIDs based on
   a flag set in the state structure.

In order to make this approach possible, all main processing functions of the
preproc block do a little bit of extra housekeeping to keep track of whether or
not their output can be replaced with SID, logic that is unnecessary when this
block functions as part of the full endpoint decoder or as non-DTXd TFO
transform.  However, this logic is very simple and the overhead is very light.

TFO transform API
-----------------

The state structure was already described earlier: it is
struct gsmfr_preproc_state, created either with gsmfr_preproc_create() or by
externally allocating the needed memory based on gsmfr_preproc_state_size and
then initializing it with gsmfr_preproc_reset().  The following API functions
are then available:

int gsmfr_tfo_xfrm_main(struct gsmfr_preproc_state *state,
			const uint8_t *rtp_in, unsigned rtp_in_len,
			uint8_t *frame_out);

int gsmfr_tfo_xfrm_dtxd(struct gsmfr_preproc_state *state, uint8_t *frame_out);

gsmfr_tfo_xfrm_main() is the TFO transform counterpart to
gsmfr_fulldec_rtp_in(), described in detail earlier.  It is also possible (and
allowed) to call gsmfr_preproc_* main processing functions directly, but the
RTP wrapper is convenient for the same reasons as in the case of the full
decoder.  In this mode of usage, the only difference between the full decoder
and the TFO transform is that the former emits linear PCM output, whereas the
latter emits 33-byte GSM-FR codec frames to be sent to call leg B downlink.

The return value from gsmfr_tfo_xfrm_main() is the same as that of
gsmfr_fulldec_rtp_in(): 0 if the RTP input was considered good or -1 if it
is invalid.  In the case of invalid RTP input that produces -1 return value,
gsmfr_tfo_xfrm_main() calls gsmfr_preproc_bfi(), just like how
gsmfr_fulldec_rtp_in() calls gsmfr_fulldec_bfi() under the same conditions.

If DTXd is in use, then the call to gsmfr_tfo_xfrm_main() needs to be directly
followed by a call to gsmfr_tfo_xfrm_dtxd(), operating on the same output buffer
with the same state structure.  The output will then be changed to SID when
appropriate for the current state.

The return value from gsmfr_tfo_xfrm_dtxd() is the SP flag of GSM 06.31: 1 if
the output frame is speech or 0 if it is SID.

TFO transform homing
--------------------

3GPP specs are silent on whether or not TFO transforms should implement homing,
i.e., whether or not they should reset to home state when a decoder homing frame
passes through.  However, at Themyscira Wireless we believe in building
deterministic systems whose bit-exact behavior can be modeled and relied upon;
for this reason, our implementation of TFO transform does include in-band
homing.  In accord with this design decision, gsmfr_tfo_xfrm_main() internally
calls gsmfr_preproc_good_frame_hm() described earlier instead of plain
gsmfr_preproc_good_frame().

With DTXd=1, if a stream of DHFs is input to the TFO transform, the same stream
of DHFs will appear on the output, i.e., DTXd won't kick in.  (The same behavior
occurs in a standard 3GPP-compliant speech encoder whose input is a stream of
0xD5 octets in PCMA or 0xFE in PCMU.)  However, any BFIs following this DHF
will be immediately converted to SID, under the same conditions when our TFO
transform with DTXd=0 emits silence frames of GSM 06.11.
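
The parenthetical remark about PCMA and PCMU can be checked with the standard
G.711 expansion formulas.  In the sketch below (function names are ours, not
part of any Themyscira library), both octets expand to the linear value 0x0008,
which we take to be the sample value of the encoder homing frame in
left-justified form:

```c
#include <stdint.h>

/* G.711 A-law octet -> 16-bit linear (13 significant bits, left-justified) */
int16_t alaw_to_linear(uint8_t a)
{
	int t, seg;

	a ^= 0x55;			/* undo the alternate-bit inversion */
	t = (a & 0x0F) << 4;		/* mantissa */
	seg = (a & 0x70) >> 4;		/* segment number */
	if (seg == 0)
		t += 8;
	else
		t = (t + 0x108) << (seg - 1);
	return (int16_t) ((a & 0x80) ? t : -t);
}

/* G.711 mu-law octet -> 16-bit linear (14 significant bits, left-justified) */
int16_t ulaw_to_linear(uint8_t u)
{
	int t;

	u = (uint8_t) ~u;		/* mu-law octets are stored inverted */
	t = (((u & 0x0F) << 3) + 0x84) << ((u & 0x70) >> 4);
	return (int16_t) ((u & 0x80) ? (0x84 - t) : (t - 0x84));
}
```

Feeding 0xD5 to alaw_to_linear() or 0xFE to ulaw_to_linear() yields 8 in both
cases, so a constant stream of either octet presents the speech encoder with
nothing but 0x0008 samples.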

Stateless utility functions
===========================

Conversions between RTP packed format and broken-down codec parameters are
stateless and implemented with highly efficient code.  There are two versions;
this version converts between packed frames and struct gsmfr_param_frame used
by 06.10 encoder and decoder functions:

void gsmfr_pack_frame(const struct gsmfr_param_frame *param, uint8_t *frame);
void gsmfr_unpack_frame(const uint8_t *frame, struct gsmfr_param_frame *param);

and this version converts between packed frames and a straight linear array of
76 parameters:

void gsmfr_pack_from_array(const int16_t *params, uint8_t *frame);
void gsmfr_unpack_to_array(const uint8_t *frame, int16_t *params);

The latter functions gsmfr_pack_from_array() and gsmfr_unpack_to_array() are
drop-in replacements for gsm_implode() and gsm_explode() from old libgsm.  The
order of parameters in this array is the canonical one: first all LARc, then
all params for the first subframe, then the second subframe, then the third and
the fourth.  OTOH, struct gsmfr_param_frame uses functional grouping, chosen
for ease of porting of original libgsm code.
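
The relation between the two parameter layouts can be sketched as follows.  The
order of parameters within each subframe (Nc, bc, Mc, xmaxc, then 13 xMc
pulses) is our assumption, based on GSM 06.10 and old libgsm; the helper names
are hypothetical, and the struct definition is repeated from earlier in this
document only to make the example self-contained.

```c
#include <stdint.h>

/* Repeated from above; normally provided by <tw_gsmfr.h> */
struct gsmfr_param_frame {
	int16_t	LARc[8];
	int16_t	Nc[4];
	int16_t	bc[4];
	int16_t	Mc[4];
	int16_t	xmaxc[4];
	int16_t	xMc[4][13];
};

/* Hypothetical helper: canonical 76-parameter array -> grouped struct */
void array_to_param_frame(const int16_t *params, struct gsmfr_param_frame *pf)
{
	const int16_t *p = params;
	int sub, i;

	for (i = 0; i < 8; i++)
		pf->LARc[i] = *p++;
	for (sub = 0; sub < 4; sub++) {
		/* assumed per-subframe order: Nc, bc, Mc, xmaxc, xMc[0..12] */
		pf->Nc[sub] = *p++;
		pf->bc[sub] = *p++;
		pf->Mc[sub] = *p++;
		pf->xmaxc[sub] = *p++;
		for (i = 0; i < 13; i++)
			pf->xMc[sub][i] = *p++;
	}
}

/* Hypothetical helper: grouped struct -> canonical 76-parameter array */
void param_frame_to_array(const struct gsmfr_param_frame *pf, int16_t *params)
{
	int16_t *p = params;
	int sub, i;

	for (i = 0; i < 8; i++)
		*p++ = pf->LARc[i];
	for (sub = 0; sub < 4; sub++) {
		*p++ = pf->Nc[sub];
		*p++ = pf->bc[sub];
		*p++ = pf->Mc[sub];
		*p++ = pf->xmaxc[sub];
		for (i = 0; i < 13; i++)
			*p++ = pf->xMc[sub][i];
	}
}
```

Under this layout each subframe occupies 17 consecutive array elements, which
gives 8 + 4 * 17 = 76 parameters in total.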

Both unpacking functions (gsmfr_unpack_frame() and gsmfr_unpack_to_array())
ignore the upper nibble of the first byte, i.e., the 0xD signature is not
enforced.  However, this signature is always set correctly by gsmfr_pack_frame()
and gsmfr_pack_from_array(), and also by gsmfr_0610_encode_frame() function
which calls gsmfr_pack_frame() as its finishing step.
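
For readers who want to see the packed layout itself, the following is a plain,
unoptimized reference sketch of the RTP format as we understand it from RFC
3551: the 4-bit 0xD signature, followed by all 76 parameters in canonical order,
each written MSB first.  This is not the library's code (the real functions are
considerably more efficient); the sketch_ names are hypothetical, and the
per-subframe parameter order and bit widths are taken from GSM 06.10.

```c
#include <stdint.h>
#include <string.h>

#define	FR_NUM_PARAMS	76
#define	FR_FRAME_LEN	33

/* Bit widths: 8 LARc, then per subframe Nc, bc, Mc, xmaxc and 13 xMc */
static const uint8_t lar_bits[8] = {6, 6, 5, 5, 4, 4, 3, 3};
static const uint8_t sub_bits[17] = {7, 2, 2, 6,
				     3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3};

static unsigned param_width(unsigned n)
{
	return n < 8 ? lar_bits[n] : sub_bits[(n - 8) % 17];
}

/* Write the nbits low-order bits of val, MSB first, at bit offset *pos */
static void put_bits(uint8_t *frame, unsigned *pos, unsigned val,
		     unsigned nbits)
{
	while (nbits--) {
		if ((val >> nbits) & 1)
			frame[*pos >> 3] |= 0x80 >> (*pos & 7);
		(*pos)++;
	}
}

/* Read nbits bits, MSB first, starting at bit offset *pos */
static unsigned get_bits(const uint8_t *frame, unsigned *pos, unsigned nbits)
{
	unsigned val = 0;

	while (nbits--) {
		val = (val << 1) | ((frame[*pos >> 3] >> (7 - (*pos & 7))) & 1);
		(*pos)++;
	}
	return val;
}

void sketch_pack_from_array(const int16_t *params, uint8_t *frame)
{
	unsigned pos = 0, n;

	memset(frame, 0, FR_FRAME_LEN);
	put_bits(frame, &pos, 0xD, 4);		/* signature nibble */
	for (n = 0; n < FR_NUM_PARAMS; n++)
		put_bits(frame, &pos, (unsigned) params[n], param_width(n));
}

void sketch_unpack_to_array(const uint8_t *frame, int16_t *params)
{
	unsigned pos = 4, n;	/* skip the signature nibble unchecked */

	for (n = 0; n < FR_NUM_PARAMS; n++)
		params[n] = (int16_t) get_bits(frame, &pos, param_width(n));
}
```

The widths add up to 36 bits of LARc plus 4 * 56 bits of subframe parameters,
i.e., 260 bits; together with the 4-bit signature this fills exactly 33 bytes.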

The last remaining stateless utility function performs SID classification of
received GSM-FR frames:

int gsmfr_preproc_sid_classify(const uint8_t *frame);

This function analyzes an RTP-encoded FR frame (the upper nibble of the first
byte is NOT checked for 0xD signature) for the SID codeword of GSM 06.12 and
classifies the frame as SID=0, SID=1 or SID=2 per the rules of GSM 06.31
section 6.1.1.  This classification is the first processing step performed by
gsmfr_preproc_good_frame().
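
The final decision step of this classification can be sketched as a threshold
test on the number of SID field bits that deviate from the codeword.  The
threshold of 16 is the one quoted earlier in this document; the threshold of 2
separating valid from invalid SID, and the assignment of SID=2 to valid SID and
SID=1 to invalid SID, reflect our reading of GSM 06.31 section 6.1.1 and should
be verified against the spec.  Counting the deviating bits (which requires
knowing the bit positions of the SID field) is not shown, and the function name
is hypothetical.

```c
/*
 * Hypothetical helper: map the count of SID field bits that differ
 * from the SID codeword to the SID classification value.
 */
int sid_class_from_deviation(unsigned n_deviating)
{
	if (n_deviating < 2)
		return 2;	/* valid SID (assumed threshold) */
	if (n_deviating < 16)
		return 1;	/* invalid SID */
	return 0;		/* not SID: treated as a speech frame */
}
```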

Public constant definitions
===========================

Our public header file <tw_gsmfr.h> provides these constant definitions, which
should be self-explanatory:

#define	GSMFR_RTP_FRAME_LEN	33
#define	GSMFR_NUM_PARAMS	76

Public const data items
=======================

There are two special GSM-FR frame bit patterns defined in the specs: there is
the silence frame of GSM 06.11, and there is the decoder homing frame specified
in later versions of GSM 06.10.  RTP-packed representations of both frames are
included in libgsmfr2, and are made public:

extern const uint8_t gsmfr_preproc_silence_frame[GSMFR_RTP_FRAME_LEN];
extern const uint8_t gsmfr_decoder_homing_frame[GSMFR_RTP_FRAME_LEN];