view doc/FR1-Rx-DTX @ 275:5fbb323b2978

libgsmfr2: implement encoder homing
author Mychaela Falconia <falcon@freecalypso.org>
date Sun, 14 Apr 2024 03:06:03 +0000

At the level of provided functionality and architectural structure, ETSI GSM
specifications for DTX (discontinuous transmission) are very symmetric between
FR and EFR: the same DTX functionality is specified for both codecs, with the
same overall architecture.  However, there is one important difference: in the
case of EFR the complete implementation of all DTX functions (for both Tx and
Rx) forms an integral and inseparable part of the reference codec (implemented
in C) from the beginning, whereas in the case of FR1 the addition of DTX is
somewhat of an afterthought.  GSM 06.10 defines a "pure" FR codec without any
DTX functions, and this most basic spec can be and has been implemented in this
"pure" form: the classic Unix libgsm from the 1990s is a proper, fully
compliant implementation of GSM 06.10, but only of this spec, without any DTX.
In contrast, there has never existed a "pure" implementation of the GSM 06.60
EFR codec without associated Tx and Rx DTX functions.  Furthermore, there is an
important distinction between Tx and Rx DTX handlers for FR1:

* Anyone who seeks to implement Tx DTX for FR1 would have to dig into the guts
  of GSM 06.10 encoder and augment it with VAD and SID encoding functions per
  GSM 06.32 and 06.12 specs.

* In contrast, the Rx DTX handler for FR1 is modular: as specified in GSM
  06.11, 06.12 and 06.31, it is a front-end to an unmodified GSM 06.10 decoder.
  On the Rx side, the interface from the radio subsystem to the Rx DTX handler
  consists of 260 bits of frame plus BFI and TAF flags (the spec also defines a
  SID flag, but it is determined from frame payload bits), and then the
  interface from the Rx DTX handler to the GSM 06.10 decoder is another FR frame
  of 260 bits.

What are the implications of this situation for the GSM published-source
software community?  Prior to the present libgsmfrp offering, there has always
been libgsm, but no Rx DTX handler.  If you are working with a GSM uplink RTP
stream from a BTS or a GSM downlink frame stream read out of TI Calypso DSP or
some other GSM MS PHY, feeding that stream directly to libgsm (without passing
through an Rx DTX handler) is NOT acceptable: a "bare" GSM 06.10 decoder won't
recognize SID frames and won't produce the expected comfort noise output, and
what are you going to do in those 20 ms windows in which no good traffic frame
was received?  The situation becomes especially bad if you are reading received
downlink frames out of TI Calypso DSP: the DSP's buffer will have *some* bit
content in every 20 ms window, but naturally this bit content will be garbage
during those frame windows when no good frame was received; feeding that
garbage to libgsm produces noises that are very unkind to the ears.

The correct solution is to implement an Rx DTX handler, pass the stream of
frames and flags from the BTS or the MS PHY to this handler first, and then pass
the output of this handler to libgsm 06.10 decoder.  Themyscira libgsmfrp is a
Free Software implementation of Rx DTX handler for GSM FR, implementing SID
classification, comfort noise generation and error concealment.

Effect of extra preprocessing
=============================

One key detail deserves extra emphasis before going into library API details:
if the input to libgsmfrp consists entirely of good speech frames (no SID frames
and no BFIs), then the preprocessor becomes an identity transform.  Therefore,
if the output of our libgsmfrp preprocessor were to be fed to an additional
instance of the same preprocessor further down the processing chain, no extra
transformation of any kind would happen.

Using libgsmfrp
===============

The external public interface to Themyscira libgsmfrp consists of a single
header file <gsm_fr_preproc.h>; it should be installed in the same system
include directory as <gsm.h> from libgsm.  Please note that <gsm_fr_preproc.h>
includes <gsm.h>, as needed for gsm_byte and gsm_frame defined types.

The dialect of C we chose for libgsmfrp is ANSI C (function prototypes), and
the const qualifier is used where appropriate.  However, unlike libgsmefr, the
interface to libgsmfrp is defined in terms of the gsm_byte type defined in
<gsm.h>, which is included from <gsm_fr_preproc.h>.

State allocation and freeing
============================

The Rx DTX handler is stateful, hence you will need to allocate a preprocessor
state structure in addition to the usual libgsm state structure for your GSM FR
Rx session.  The necessary function is:

extern struct gsmfr_preproc_state *gsmfr_preproc_create(void);

struct gsmfr_preproc_state is an opaque structure to library users: you only get
a pointer which you remember and pass around, but <gsm_fr_preproc.h> does not
give you a full definition of this struct.  As a library user, you don't even
get to know the size of this struct, hence the necessary malloc() operation
happens inside gsmfr_preproc_create().  However, the structure is malloc'ed as
a single chunk, hence when you are done with it, simply call free() on the
pointer you got from gsmfr_preproc_create().

gsmfr_preproc_create() can fail if the malloc() call inside fails, in which case
it returns NULL.

Preprocessing good frames
=========================

For every good traffic frame (BFI=0) you receive from the radio subsystem, you
need to call this preprocessor function:

extern void gsmfr_preproc_good_frame(struct gsmfr_preproc_state *state,
				     gsm_byte *frame);

The second argument is both input and output, i.e., the frame is modified in
place.  If the received frame is not SID (specifically, if the SID field
deviates from the SID codeword by 16 or more bits, per GSM 06.31 section 6.1.1),
then the frame (considered a good speech frame) will be left unmodified (i.e.,
it is to be passed unchanged to the GSM 06.10 decoder), but preprocessor state
will be updated.  On the other hand, if the received frame is classified as
either valid or invalid SID per GSM 06.31, then the output frame will contain
comfort noise generated by the preprocessor using a PRNG, or a silence frame in
one particular corner case.

GSM-FR RTP (or libgsm) 0xD magic: the upper nibble of the first byte can be
anything on input to gsmfr_preproc_good_frame(), but the output frame will
always have the correct magic in it.

Handling BFI conditions
=======================

If you received a lost/missing frame indication instead of a good traffic frame,
call this preprocessor function:

extern void gsmfr_preproc_bfi(struct gsmfr_preproc_state *state, int taf,
			      gsm_byte *frame_out);

TAF is a flag defined in GSM 06.31 section 6.1.1; if you don't have this flag,
pass 0 - you will lose the function of comfort noise muting in the event of
prolonged SID loss, but all other Rx DTX functions will still work the same.

With this function the 33-byte frame buffer is only an output, i.e., prior
buffer content is a don't-care, and there is no provision for making any use of
erroneous frames as there is in EFR.  The frame generated by the preprocessor
may be substitution/muting, comfort noise or silence depending on the state.
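
The two calls described above can be combined into a single per-frame step.
The following is only a sketch of how an application might do it, not code
taken from the library: rx_dtx_step(), FR_FRAME_LEN and the
USE_REAL_LIBGSMFRP macro are hypothetical names invented for this
illustration.  So that the sketch compiles without the library installed, it
carries stand-in versions of the three library functions it touches; these
stand-ins use the prototypes shown in this document but do NOT reproduce the
real preprocessing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FR_FRAME_LEN	33	/* bytes in an RTP-format GSM FR frame */

#ifdef USE_REAL_LIBGSMFRP	/* hypothetical macro, for this sketch only */
#include <gsm_fr_preproc.h>	/* real thing: link with libgsmfrp */
#else
/*
 * Stand-ins so that the sketch is self-contained.  They only mimic the
 * calling convention: no SID handling, no comfort noise, no muting.
 */
typedef unsigned char gsm_byte;
struct gsmfr_preproc_state { unsigned good_count, bfi_count; };

struct gsmfr_preproc_state *gsmfr_preproc_create(void)
{
	return calloc(1, sizeof(struct gsmfr_preproc_state));
}

void gsmfr_preproc_good_frame(struct gsmfr_preproc_state *st, gsm_byte *frame)
{
	st->good_count++;
	frame[0] = 0xD0 | (frame[0] & 0x0F);	/* fix up 0xD signature */
}

void gsmfr_preproc_bfi(struct gsmfr_preproc_state *st, int taf,
		       gsm_byte *frame_out)
{
	(void) taf;
	st->bfi_count++;
	memset(frame_out, 0, FR_FRAME_LEN);	/* dummy, NOT the real output */
	frame_out[0] = 0xD0;
}
#endif

/*
 * Hypothetical helper: handle one 20 ms frame window.
 * rx_frame may be NULL when nothing was received; that case is treated
 * the same as BFI=1.  On return, to_decoder holds a frame that is ready
 * to be passed to the GSM 06.10 decoder.
 */
void rx_dtx_step(struct gsmfr_preproc_state *st, const gsm_byte *rx_frame,
		 int bfi, int taf, gsm_byte *to_decoder)
{
	assert(st != NULL && to_decoder != NULL);
	if (!bfi && rx_frame != NULL) {
		/* good frame: copy, then let the preprocessor edit in place */
		memcpy(to_decoder, rx_frame, FR_FRAME_LEN);
		gsmfr_preproc_good_frame(st, to_decoder);
	} else {
		/* lost frame: the buffer is purely an output */
		gsmfr_preproc_bfi(st, taf, to_decoder);
	}
}
```

In a real application the state would come from gsmfr_preproc_create() once
per Rx session, rx_dtx_step() (or its equivalent) would run once per 20 ms
window, and the state would be released with free() at the end.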

Other miscellaneous functions
=============================

extern void gsmfr_preproc_reset(struct gsmfr_preproc_state *state);

This function resets the preprocessor state to what it is right out of
gsmfr_preproc_create(), which is naturally just a combination of malloc() and
gsmfr_preproc_reset().  Given that our Rx DTX handler state is much simpler
than, for example, EFR codec state, there does not seem to be any need for
explicit resets, but the reset function is made public for the sake of
completeness.

extern int gsmfr_preproc_sid_classify(const gsm_byte *frame);

This function analyzes an RTP-encoded FR frame (the upper nibble of the first
byte is NOT checked for 0xD signature) for the SID codeword of GSM 06.12 and
classifies the frame as SID=0, SID=1 or SID=2 per the rules of GSM 06.31
section 6.1.1.
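
The classification rule can be illustrated with a small standalone sketch.
This is not the library's implementation.  It assumes that the SID codeword
is all zeros, so that every set bit inside the SID field counts as one
deviation, and it assumes that the boundary between valid and invalid SID is
"fewer than 2 deviations"; only the 16-bit threshold is stated in this
document, so the other two points should be checked against GSM 06.12 and
06.31.  The bit positions of the SID field are not reproduced here: the mask
is supplied by the caller, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the bits that are set in frame[] under a caller-supplied mask.
 * With an all-zeros codeword (assumption), each such bit is a deviation.
 */
unsigned count_deviations(const unsigned char *frame,
			  const unsigned char *sid_field_mask, size_t len)
{
	unsigned count = 0;
	unsigned char bits;
	size_t i;

	assert(frame != NULL && sid_field_mask != NULL);
	for (i = 0; i < len; i++) {
		/* clear the lowest set bit on each pass, counting as we go */
		for (bits = frame[i] & sid_field_mask[i]; bits;
		     bits &= bits - 1)
			count++;
	}
	return count;
}

/*
 * Map a deviation count to the SID flag value.
 * The threshold of 16 is from this document; the threshold of 2 is an
 * assumption to be verified against GSM 06.31 section 6.1.1.
 */
int sid_class_from_deviations(unsigned n)
{
	if (n < 2)
		return 2;	/* valid SID */
	if (n < 16)
		return 1;	/* invalid SID */
	return 0;		/* not SID: treat as a speech frame */
}
```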

Silence frame datum
===================

extern const gsm_frame gsmfr_preproc_silence_frame;

Many implementors make the mistake of thinking that a GSM FR silence frame is a
frame of 260 zero bits, but the official specs disagree: the silence frame given
in GSM 06.11 (3GPP TS 46.011, at the very end of the spec) is quite different.
Themyscira libgsmfrp implements the correct silence frame per the spec, and that
datum is also made public.

libgsmfrp change history: version 1.0.1 to version 1.0.2
========================================================

There are only two changes, both involving corner cases with invalid SID frames
being received:

1) An invalid SID frame was received immediately following a good speech frame.
   In this case we start CN generation, but we take the needed LARc and Xmaxc
   parameters from the last speech frame, instead of the usual procedure of
   extracting them from a valid SID frame.  The change from 1.0.1 to 1.0.2
   concerns the Xmaxc parameter in this corner case: in 1.0.1 we took Xmaxc
   from the last subframe and used it for ensuing CN generation, but in 1.0.2
   we compute a more proper mean Xmaxc from all 4 subframes, by dequantizing,
   summing and requantizing.

2) An invalid SID frame was received in the speech muting state.  The sequence
   of inputs would have to be:

   - a good speech frame;
   - one or more BFIs, but not too many, so that the cached speech frame
     does not decay fully by Xmaxc reduction;
   - an invalid SID frame.

   In version 1.0.1 we handled this even more obscure corner case by entering
   the CN muting state, i.e., the state that is normally entered upon the
   second lost SID.  In version 1.0.2 we ignore invalid SID in the speech
   muting state and act as if we got BFI, i.e., continue speech muting rather
   than switch to CN muting.
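
The "dequantize, sum and requantize" step from corner case 1 can be sketched
as follows.  This is an illustration under stated assumptions, not the
library's code: the function names are hypothetical, the quantizer follows
the logarithmic Xmaxc scale of the GSM 06.10 encoder as we understand it, and
the choice of the lower edge of each quantization interval as the dequantized
value is purely an assumption of this sketch (the library may use a different
representative value).

```c
#include <assert.h>

/*
 * Dequantize a 6-bit Xmaxc code to the lower edge of its xmax interval.
 * The choice of the lower edge is an assumption made for this sketch.
 */
unsigned xmaxc_dequant(unsigned xmaxc)
{
	unsigned exp, mant;

	assert(xmaxc < 64);
	if (xmaxc < 16)
		return xmaxc << 5;	/* linear region, step of 32 */
	exp = (xmaxc >> 3) - 1;
	mant = (xmaxc & 7) + 8;
	return mant << (exp + 5);	/* step doubles every 8 codes */
}

/* Quantize xmax (0..32767) back to a 6-bit Xmaxc code. */
unsigned xmaxc_quant(unsigned xmax)
{
	unsigned exp = 0, temp;

	assert(xmax < 32768);
	/* exp = number of significant bits at or above bit 9 */
	for (temp = xmax >> 9; temp > 0; temp >>= 1)
		exp++;
	return (xmax >> (exp + 5)) + (exp << 3);
}

/* Mean Xmaxc over the 4 subframes of one speech frame. */
unsigned xmaxc_mean(const unsigned xmaxc[4])
{
	unsigned sum = 0;
	int sub;

	for (sub = 0; sub < 4; sub++)
		sum += xmaxc_dequant(xmaxc[sub]);
	return xmaxc_quant(sum / 4);
}
```

Averaging in the dequantized domain matters because the Xmaxc scale is
logarithmic: with the assumptions above, subframe codes 16, 16, 24 and 24
average to code 20, and a frame with three silent subframes and one at full
scale (0, 0, 0, 63) averages to code 47 rather than to the arithmetic mean of
the codes.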

libgsmfrp change history: version 1.0.0 to version 1.0.1
========================================================

Version 1.0.0 exhibited the following defects, which are fixed in 1.0.1:

1) The last received valid SID was cached forever for the purpose of
   handling future invalid SIDs - we could have received some valid
   SID ages ago, then lots of speech or NO_DATA, and if we then get
   an invalid SID, we would resurrect the last valid SID from ancient
   history - a bad design.  In our new design, we handle invalid SID
   based on the current state, much like BFI.

2) GSM 06.11 spec says clearly that after the second lost SID
   (received BFI=1 && TAF=1 in CN state) we need to gradually decrease
   the output level, rather than jump directly to emitting silence
   frames - we previously failed to implement such logic.

3) Per GSM 06.12 section 5.2, Xmaxc should be the same in all 4 subframes
   in a SID frame.  What should we do if we receive an otherwise valid
   SID frame with different Xmaxc?  Our previous approach would
   replicate this Xmaxc oddity in every subsequent generated CN frame,
   which is rather bad.  In our new design, the very first CN frame
   (which can be seen as a transformation of the SID frame itself)
   retains the original 4 distinct Xmaxc, but all subsequent CN frames
   are based on the Xmaxc from the last subframe of the most recent SID.
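
The idea of gradually lowering the output level by Xmaxc reduction, as
opposed to jumping straight to silence, can be sketched like this.  It is a
minimal illustration and not the library's code: the function name is
hypothetical, the four Xmaxc values are assumed to be already unpacked from
the frame, and the per-frame step of 4 is an assumption that should be
checked against GSM 06.11.  Whether the library applies exactly this
procedure in both speech muting and CN muting is not stated in this document.

```c
#include <assert.h>
#include <stddef.h>

#define XMAXC_MUTE_STEP	4	/* assumed per-frame reduction step */

/*
 * Apply one muting step to the four subframe Xmaxc values, clamping at 0.
 * Returns 1 when all four values have reached 0 (nothing left to decay,
 * so the caller may switch to emitting silence frames), otherwise 0.
 */
int xmaxc_mute_step(unsigned xmaxc[4])
{
	int sub, fully_muted = 1;

	assert(xmaxc != NULL);
	for (sub = 0; sub < 4; sub++) {
		assert(xmaxc[sub] < 64);	/* Xmaxc is a 6-bit field */
		if (xmaxc[sub] > XMAXC_MUTE_STEP) {
			xmaxc[sub] -= XMAXC_MUTE_STEP;
			fully_muted = 0;
		} else {
			xmaxc[sub] = 0;
		}
	}
	return fully_muted;
}
```

Calling this once per lost frame makes a cached frame fade out over several
20 ms windows; a frame that starts with Xmaxc values of 10, 4, 5 and 0 reaches
all zeros on the third call.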