view doc/FR1-Rx-DTX @ 408:8847c1740e78

libtwamr: integrate VAD1
author Mychaela Falconia <falcon@freecalypso.org>
date Tue, 07 May 2024 00:56:10 +0000
parents 4034c2b06ec8

At the level of provided functionality and architectural structure, ETSI GSM
specifications for DTX (discontinuous transmission) are very symmetric between
FR and EFR: the same DTX functionality is specified for both codecs, with the
same overall architecture.  However, there is one important difference: in the
case of EFR the complete implementation of all DTX functions (for both Tx and
Rx) forms an integral and inseparable part of the reference codec (implemented
in C) from the beginning, whereas in the case of FR1 the addition of DTX is
somewhat of an afterthought.  GSM 06.10 defines a "pure" FR codec without any
DTX functions, and this most basic spec can be and has been implemented in this
"pure" form - the classic Unix libgsm from the 1990s is a proper, fully
compliant implementation of GSM 06.10, but only this spec, without any DTX.  In
contrast, there has never existed a "pure" implementation of the GSM 06.60 EFR
codec without associated Tx and Rx DTX functions.  Furthermore, there is an
important distinction between Tx and Rx DTX handlers for FR1:

* Anyone who seeks to implement Tx DTX for FR1 would have to dig into the guts
  of the GSM 06.10 encoder and augment it with VAD and SID encoding functions
  per GSM 06.32 and 06.12 specs.

* In contrast, the Rx DTX handler for FR1 is modular: as specified in GSM 06.11,
  06.12 and 06.31, it is a front-end to an unmodified GSM 06.10 decoder.  On the
  Rx side, the interface from the radio subsystem to the Rx DTX handler consists
  of the 260 bits of the frame plus BFI and TAF flags (the spec also defines a
  SID flag, but it is determined from frame payload bits), and the interface
  from the Rx DTX handler to the GSM 06.10 decoder is another FR frame of
  260 bits.
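
The shape of this Rx-side interface can be sketched in C as follows.  This is
only an illustration, not the libgsmfr2 API: the type and function names are
hypothetical, the 33-byte frame packing of libgsm is assumed, and the
substitution policy (repeat the previous output frame on BFI) is a naive
placeholder for the SID classification, comfort noise generation and error
concealment that the specs actually call for.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed libgsm packing: 4-bit 0xD signature followed by 260 payload bits */
#define FR_FRAME_BYTES 33

/* Hypothetical input record: what the radio subsystem delivers every 20 ms */
struct rx_frame_in {
	uint8_t frame[FR_FRAME_BYTES];	/* payload bits; garbage when bfi is set */
	bool bfi;			/* Bad Frame Indication */
	bool taf;			/* Time Alignment Flag */
};

/* Hypothetical handler state: just the last frame passed to the decoder */
struct rx_dtx_state {
	uint8_t last_out[FR_FRAME_BYTES];
};

/* The caller supplies the frame to emit if BFI arrives before any good frame */
void rx_dtx_init(struct rx_dtx_state *st, const uint8_t *fallback)
{
	memcpy(st->last_out, fallback, FR_FRAME_BYTES);
}

/*
 * Front-end step: always produces one frame for the unmodified GSM 06.10
 * decoder.  Placeholder policy only: good frames pass through, bad frames
 * are replaced with the previous output; TAF is accepted but unused here.
 */
void rx_dtx_front_end(struct rx_dtx_state *st, const struct rx_frame_in *in,
		      uint8_t *out)
{
	if (!in->bfi)
		memcpy(st->last_out, in->frame, FR_FRAME_BYTES);
	memcpy(out, st->last_out, FR_FRAME_BYTES);
}
```

The point of the sketch is the data flow: every 20 ms window yields exactly one
33-byte frame for the decoder, whether or not a good frame was received, so the
decoder itself needs no changes.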

What are the implications of this situation for the GSM published-source
software community?  Prior to the present Themyscira offering, there has always
been libgsm, but no Rx DTX handler.  If you are working with a GSM uplink RTP
stream from a BTS or a GSM downlink frame stream read out of TI Calypso DSP or
some other GSM MS PHY, feeding that stream directly to libgsm (without passing
through an Rx DTX handler) is NOT acceptable: a "bare" GSM 06.10 decoder won't
recognize SID frames and won't produce the expected comfort noise output, and
what are you going to do in those 20 ms windows in which no good traffic frame
was received?  The situation becomes especially bad if you are reading received
downlink frames out of TI Calypso DSP: the DSP's buffer will have *some* bit
content in every 20 ms window, but naturally this bit content will be garbage
during those frame windows when no good frame was received; feeding that
garbage to libgsm produces noises that are very unkind to the ears.

The correct solution is to implement an Rx DTX handler, pass the stream of
frames and flags from the BTS or the MS PHY to this handler first, and then pass
the output of this handler to the standard GSM 06.10 decoder (classic libgsm or
some updated port thereof).  Themyscira libgsmfrp was our first Free Software
implementation of an Rx DTX handler for GSM-FR, implementing SID classification,
comfort noise generation and error concealment.  Our new libgsmfr2 offering
takes the harmonization effort (between GSM-FR and other GSM codecs) one step
further, eliminating the dependency on old libgsm and putting all GSM-FR codec
functions "under one roof".

libgsmfrp/libgsmfr2 API documentation
=====================================

The Rx DTX component of libgsmfr2 has the same API as our previous libgsmfrp,
except for dropping the use of <gsm.h> and its types and needing to include our
new API header <tw_gsmfr.h>.  The present article previously contained the full
description of this API; that description has now been moved to the
FR1-library-API article, where the whole of libgsmfr2 is documented.

Standalone exerciser utility
============================

The present GSM codec libraries and utilities package includes a standalone
utility that exercises our Rx DTX handler for GSM-FR.  This utility is
gsmfr-preproc, to be run as follows:

gsmfr-preproc input.gsmx output.gsm

The input is an extended-libgsm file that can contain SIDs and BFI frame gaps
in addition to regular GSM 06.10 speech frames (see the Binary-file-format
article); the output is GSM 06.10 speech frames only.

False SID detection
===================

The intent of the GSM-FR spec authors was that the sets of possible speech
frames and possible SID frames be disjoint.  Prior to the introduction of DTX,
there were only regular speech frames per GSM 06.10, no SID, and a receiver had
to deal with only two possibilities: either a good speech frame was received, or
the frame was lost to radio errors or FACCH stealing (unusable frame).  When SID
frames were introduced for the purpose of intentional DTX as distinct from
radio errors, the intent was that SID was to be a "new animal" not seen before,
distinct from regular speech frames.  There is, however, a small blemish in the
actual system as realized: if the SID frame detector and the Rx DTX handler
that follows it in the Rx chain follow the rules of GSM 06.31 sections 6.1.1
and 6.1.2, respectively (as our implementation does), then some speech frames
may be mistaken for invalid SID, or perhaps even for valid SID, producing a
nonzero failure rate in this mechanism.
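
The decision rule behind this classification can be sketched as below, assuming
the thresholds implied by the figures quoted later in this article: the SID
codeword occupies 95 bit positions that are all zeros in a true SID frame, a
frame with at most 1 deviating bit is treated as valid SID, one with 2 to 15
deviating bits as invalid SID, and anything else as speech.  The function name
and the pre-extracted bit array are hypothetical; locating the 95 bit positions
within the frame (GSM 06.12) is left to the caller, and this is not the
libgsmfr2 implementation.

```c
#include <stdint.h>

#define SID_CODEWORD_BITS 95

/* Numeric values follow the SID flag convention: SID=1 means invalid SID */
enum sid_class {
	SID_NONE = 0,		/* regular speech frame */
	SID_INVALID = 1,	/* damaged SID, or a speech frame mistaken for one */
	SID_VALID = 2		/* codeword matched with at most one bit error */
};

/*
 * Classify a frame from its 95 SID codeword bit positions, supplied by the
 * caller as one array element per bit (only the LSB of each element counts).
 */
enum sid_class classify_sid(const uint8_t codeword_bits[SID_CODEWORD_BITS])
{
	unsigned ones = 0;

	/* Every nonzero bit is one deviation from the all-zeros codeword */
	for (unsigned i = 0; i < SID_CODEWORD_BITS; i++)
		ones += codeword_bits[i] & 1u;

	if (ones < 2)
		return SID_VALID;	/* at least 94 of 95 positions are zero */
	if (ones < 16)
		return SID_INVALID;	/* at least 80 of 95 positions are zero */
	return SID_NONE;
}
```

Because any bit pattern with fewer than 16 ones in those positions lands in one
of the two SID classes, an unlucky speech frame can end up there as well, which
is the blemish described above.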

Official test sequence 02 in the set of 5 provided by ETSI exhibits this effect:
Seq02.inp is a legitimate 13-bit linear PCM input to the speech encoder, and the
corresponding output of GSM 06.10 encoder is contained in Seq02.cod.  However,
that output contains some frames that are mistakenly classified as SID=1
(invalid SID) by the rules of GSM 06.31 section 6.1.1!  It is true that these
ancient test sequences chronologically predate the invention of DTX and
GSM 06.31, but we still need to bear in mind that this problematic Seq02.cod is
not an artificially constructed sequence of 06.10 codec parameters: it is the
required output of the prescribed bit-exact encoder given a legitimate PCM
input!  There does not exist a perfect solution to this problem: as usual,
real-world engineering is all about trade-offs and compromises, and occasionally
a gear will slip.  The best we can do is to model the probability of such
gear-slip or wrong detection events, and engineer our systems to reduce this
probability to a level that is deemed acceptable - which is exactly what GSM
spec designers did here.

As of gsm-codec-lib-r3, the gsmrec-dump utility shows the SID classification
result (GSM 06.31 section 6.1.1) in addition to parsed 06.10 codec parameters
for each frame, thus one can inspect FR-encoded streams and check for this
blemish.

Effect of extra preprocessing
=============================

What will happen if the output of our Rx DTX preprocessor (e.g., the output of
gsmfr-preproc utility) is fed to another utility such as gsmfr-decode that also
applies the same preprocessor to its input?  In other words, what is the effect
of a secondary preprocessor application to previous preprocessor output?

Most of the time, the second preprocessor pass will be an identity transform
under these conditions, as the input to that second pass will consist entirely
of good speech frames, no SIDs and no BFIs.  Any speech frames in the original
input that were mistakenly classified as SID (valid or invalid) have already
been converted to comfort noise (or to the silence frame in one corner case of
invalid SID), hence they are no longer present in the output to trigger this
effect a second time.  However, there is still a small possibility that a
second pass will be a non-identity transform: pseudorandom RPE pulse parameters
in our comfort noise output are uniformly distributed between 1 and 6 (GSM 06.12
section 6.1), and if the PRNG dice roll such that at least 80 out of 95 SID
codeword bit positions (all in the xMc part of the frame) contain zeros, the
resulting CN frame will be liable to misinterpretation as SID if fed to the
preprocessor a second time: invalid SID most of the time, or even more rarely
valid SID if at least 94 out of 95 SID codeword bit positions contain zeros.
That second pass would then further alter those affected frames, but no others.
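
One way to check whether a second pass really was an identity transform is to
compare the two outputs frame by frame.  The helper below is a minimal sketch
under the assumption that both outputs have been read into memory as arrays of
33-byte libgsm frames; the function name is hypothetical and not part of any
library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed libgsm packing: one GSM 06.10 speech frame per 33 bytes */
#define FR_FRAME_BYTES 33

/*
 * Count the frames that differ between the first-pass output and the
 * second-pass output.  Both buffers must hold nframes complete frames.
 */
size_t count_changed_frames(const uint8_t *first_pass,
			    const uint8_t *second_pass, size_t nframes)
{
	size_t changed = 0;

	for (size_t i = 0; i < nframes; i++) {
		const uint8_t *a = first_pass + i * FR_FRAME_BYTES;
		const uint8_t *b = second_pass + i * FR_FRAME_BYTES;

		/* Any differing byte marks the whole frame as altered */
		if (memcmp(a, b, FR_FRAME_BYTES) != 0)
			changed++;
	}
	return changed;
}
```

A result of zero means the second pass changed nothing; a nonzero count
suggests that some comfort noise frames from the first pass were reclassified
as SID in the way described above.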