view doc/FR1-Rx-DTX-detail @ 552:6ab066180ec2

doc: new article FR1-Rx-DTX-detail
author Mychaela Falconia <falcon@freecalypso.org>
date Mon, 07 Oct 2024 00:25:50 +0000

Rx DTX handler implementation details
=====================================

As explained in the basic FR1-Rx-DTX article, an Rx DTX handler has to be
inserted between the output of the Rx radio subsystem and the input to the
basic GSM 06.10 speech decoder.  In the ThemWi codec library architecture, we
normally run a full decoder for GSM-FR that combines the Rx DTX handler and
the basic 06.10 decoder; the Rx DTX handler block by itself also serves as a
TFO transform.

This Rx DTX handler is based on several GSM specs: 06.11 for the error
concealment function, 06.12 for the comfort noise insertion function, and 06.31
for overall Rx DTX handling.  However, these specs give a lot of leeway to
implementors, hence it is prudent to document the specific choices made in the
present ThemWi implementation.

Error concealment implementation
================================

Error concealment is also called substitution and muting of lost frames.  The
implementation of this function in Themyscira libgsmfr2 is based on the Example
solution presented in chapter 6 of 3GPP TS 46.011 (formerly GSM 06.11), applying
the most literal reading of that spec section.

When unusable frames (as defined in GSM 06.31) occur during speech state (i.e.,
not following a SID), the logic below applies.  For the first BFI following
good speech, the last speech frame is repeated verbatim.  On the second BFI the
muting logic of Xmaxc reduction begins, decrementing each of the 4 Xmaxc
parameters by 4 with each emitted frame.  RPE grid position parameters are
randomized at the same time.  The frame in which all 4 Xmaxc parameters equal 0
(either because they were already 0 or because they got reduced to 0 by the
muting sequence) is the last frame emitted in this state; all subsequent BFIs
will be turned into fixed-bit-pattern silence frames as given in TS 46.011
Table 1.

If a BFI comes in when the Rx DTX handler is in its reset (or homed) state, the
output proceeds directly to silence frames.
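
The speech-state concealment sequence described above can be sketched in C as
follows.  This is only an illustration under stated assumptions, not the actual
libgsmfr2 code: all struct and function names are hypothetical, the frame is
reduced to the two parameter groups that muting modifies, rand() stands in for
whatever random source a real implementation would use, and the decrement is
assumed to clamp at 0.  The silence frame of TS 46.011 Table 1 is represented
only by an output code, and a frame whose Xmaxc were already all 0 is assumed
to go through one muted frame before silence begins.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified view of a GSM-FR frame: only the fields that
 * the muting logic touches.  A real frame also carries LARc, Nc, bc and
 * the xMc pulses, which are repeated unchanged. */
struct fr_frame {
    uint8_t mc[4];      /* RPE grid position per subframe, 0..3 */
    uint8_t xmaxc[4];   /* block amplitude per subframe, 0..63 */
};

enum ecu_state { ECU_NO_DATA, ECU_SPEECH, ECU_FIRST_BFI, ECU_MUTING };
enum ecu_output { OUT_REPEAT, OUT_MUTED, OUT_SILENCE };

struct ecu {
    enum ecu_state state;
    struct fr_frame saved;  /* last speech frame, modified in place by muting */
};

/* Reset (homed) state: a BFI arriving now yields silence directly. */
void ecu_reset(struct ecu *e)
{
    e->state = ECU_NO_DATA;
}

/* A good speech frame is remembered for possible later substitution. */
void ecu_good_speech(struct ecu *e, const struct fr_frame *f)
{
    int i;

    for (i = 0; i < 4; i++)
        assert(f->mc[i] <= 3 && f->xmaxc[i] <= 63);
    e->saved = *f;
    e->state = ECU_SPEECH;
}

/* Handle one unusable frame; the frame to emit is e->saved unless the
 * return value is OUT_SILENCE. */
enum ecu_output ecu_bfi(struct ecu *e)
{
    int i, all_zero = 1;

    switch (e->state) {
    case ECU_NO_DATA:
        return OUT_SILENCE;
    case ECU_SPEECH:
        /* first BFI after good speech: repeat verbatim */
        e->state = ECU_FIRST_BFI;
        return OUT_REPEAT;
    default:
        /* second and later BFIs: reduce Xmaxc by 4, randomize grid */
        for (i = 0; i < 4; i++) {
            uint8_t x = e->saved.xmaxc[i];

            e->saved.xmaxc[i] = (uint8_t)(x > 4 ? x - 4 : 0);
            e->saved.mc[i] = (uint8_t)(rand() & 3);
            if (e->saved.xmaxc[i] != 0)
                all_zero = 0;
        }
        /* the all-zero frame is the last one emitted before silence */
        e->state = all_zero ? ECU_NO_DATA : ECU_MUTING;
        return OUT_MUTED;
    }
}
```

With a last speech frame whose Xmaxc values are 10, 3, 0 and 8, this sketch
emits one verbatim repeat, then three muted frames (the third with all Xmaxc
at 0), and silence from the fifth BFI onward.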

Comfort noise insertion
=======================

Comfort noise generation and updating is specified in GSM 06.12 section 6.1.
Most of this section is very straightforward, and is implemented in ThemWi
libgsmfr2 exactly as specified, except for the very last sentence in that
section:

"When updating the comfort noise, the parameters above should preferably be
 interpolated over a few frames to obtain smooth transitions."

The ThemWi implementation of the Rx DTX handler in libgsmfr2 does not do this
"should preferably" part: no interpolation is done on CN parameters; as soon as
each SID update comes in, the new parameters are used immediately for all
generated CN frames.

Because the spec says "should preferably" rather than "shall", we can "get away"
with not implementing CN interpolation.  But there is an even more profound
issue: we have yet to find anyone else's implementation, which we could use as
guidance, that does CN parameter interpolation for FRv1.  (Such interpolation
is mandatory and defined in bit-exact terms for HRv1 and EFR, but FRv1 is a
different story.)

We had hoped that Nokia TCSM2 (a historical hardware implementation of the GSM
TRAU network element) might implement CN interpolation for FRv1, but our
experimental findings on that platform are inconclusive:

* When acting as a TFO transform for FRv1, this TRAU does not interpolate CN
  parameters: it makes abrupt changes in CN output just like our implementation.
  However, it introduces a strange delay of 24 frames, suggesting that it has
  some code paths that assume CN interpolation would be applied.

* When the TRAU acts as a regular speech decoder (not TFO), it is not clear how
  it performs any of the Rx DTX functions: Nokia chose not to implement the
  optional in-band homing feature for FRv1, thus we have no way to explore
  bit-exact behaviour of their speech decoder via test sequences.

Another enticing idea would be to statically reverse-engineer the DSP ROM of the
TI Calypso chip and thus recover its complete speech Rx chain, but the effort
would be massive and is not likely to happen any time soon.

Until we either get around to the far-future task of Calypso DSP static
reversing or find some other implementation of GSM-FR Rx DTX handler that does
CN interpolation and whose operation we can replicate, we shall stick to the
simple approach of not doing CN interpolation.

Handling of SID frames with Xmaxc discrepancy
=============================================

Per GSM 06.12 section 5.2, all 4 subframe Xmaxc parameters in a SID frame are
supposed to be equal, encoding the quantized form of mean(Xmax).  However, what
should Rx DTX implementations do when they receive an otherwise-valid SID frame
in which these 4 parameters are not all equal?  In our implementation, we handle
such discrepancy as follows:

* In those frame positions in which we receive a fresh SID (initial or update),
  the CN frame we emit is a direct transformation of the received SID, and all
  4 Xmaxc parameters are passed through intact.

* When we emit CN frames based on remembered LARc and Xmaxc parameters, we use
  the last-subframe Xmaxc from the most recently received SID frame.
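
The two cases above can be illustrated with a short C sketch.  It is written
under assumptions and is not the libgsmfr2 source: the struct and function
names are made up for this example, and the other parameters of a generated CN
frame (Nc, bc, Mc and the xMc pulses prescribed by GSM 06.12) are omitted so
that only the LARc and Xmaxc handling is visible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container used both for a received SID and for an emitted
 * CN frame; only LARc and Xmaxc are modelled here. */
struct cn_params {
    uint8_t larc[8];
    uint8_t xmaxc[4];
};

/* Remembered CN parameters: a single Xmaxc value is kept. */
struct cn_state {
    uint8_t larc[8];
    uint8_t xmaxc;
};

/* Fresh SID (initial or update): the emitted CN frame carries all 4 Xmaxc
 * exactly as received, while the state remembers the last-subframe value. */
void cn_accept_sid(struct cn_state *st, const struct cn_params *sid,
                   struct cn_params *out)
{
    int i;

    for (i = 0; i < 4; i++)
        assert(sid->xmaxc[i] <= 63);
    memcpy(st->larc, sid->larc, sizeof st->larc);
    st->xmaxc = sid->xmaxc[3];
    *out = *sid;
}

/* CN frame generated from remembered parameters: the one remembered
 * Xmaxc is used for every subframe. */
void cn_generate(const struct cn_state *st, struct cn_params *out)
{
    int i;

    memcpy(out->larc, st->larc, sizeof out->larc);
    for (i = 0; i < 4; i++)
        out->xmaxc[i] = st->xmaxc;
}
```

For a SID carrying Xmaxc values 20, 20, 21, 22, the frame emitted in the SID
position keeps those four values, and every later CN frame generated from
memory uses 22 in all 4 subframes.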

Lost SID handling and CN muting
===============================

In accordance with GSM 06.11 section 5.3, when we receive an unusable frame in
a TAF position during CN insertion state, we set a flag that remembers this
condition, but don't switch to CN muting right away.  Per section 5.4 of the
same spec, we initiate CN muting when a second lost SID event occurs (unusable
frame received in a TAF position) without intervening good speech frames or
accepted SID frames.

When we do enter CN muting state, we decrement CN Xmaxc (always the same for
all 4 subframes) by 4 on each output frame, following the Example solution of
3GPP TS 46.011 (formerly GSM 06.11) chapter 6.  Once this CN Xmaxc reaches 0,
we switch to emitting fixed-bit-pattern silence frames of TS 46.011 Table 1.
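
A minimal C sketch of this lost SID bookkeeping follows.  Several details are
assumptions made for the sake of the example rather than facts about
libgsmfr2: the names are hypothetical, muting is assumed to start on the same
frame as the second lost SID event, the decrement is assumed to clamp at 0,
the frame on which Xmaxc reaches 0 is assumed to be output as silence already,
and a valid SID is assumed to restart normal CN insertion.

```c
#include <assert.h>
#include <stdint.h>

enum cn_output { CN_FRAME, CN_SILENCE };

/* Hypothetical control state for the CN insertion period. */
struct cn_ctl {
    int lost_sid;   /* one lost SID event has been remembered */
    int muting;     /* CN muting is in progress */
    uint8_t xmaxc;  /* current CN Xmaxc, same for all 4 subframes */
};

/* An accepted SID clears the lost-SID memory and sets a new level. */
void cn_valid_sid(struct cn_ctl *c, uint8_t xmaxc)
{
    assert(xmaxc <= 63);
    c->lost_sid = 0;
    c->muting = 0;
    c->xmaxc = xmaxc;
}

/* Unusable frame during CN insertion; taf is nonzero in a TAF position.
 * Returns what kind of frame to emit for this position. */
enum cn_output cn_unusable(struct cn_ctl *c, int taf)
{
    if (taf) {
        if (c->lost_sid)
            c->muting = 1;      /* second lost SID: start muting */
        else
            c->lost_sid = 1;    /* first lost SID: only remember it */
    }
    if (c->muting) {
        c->xmaxc = (uint8_t)(c->xmaxc > 4 ? c->xmaxc - 4 : 0);
        if (c->xmaxc == 0)
            return CN_SILENCE;
    }
    return CN_FRAME;
}
```

Starting from a CN Xmaxc of 10, the first lost SID leaves the output
unchanged; the second one begins the 6, 2, 0 sequence, with silence output
once 0 is reached.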

Handling of invalid SID frames
==============================

In agreement with the GSM 06.31 spec, we recognize an invalid SID and invoke
the appropriate handler in all 3 combinations: BFI=0 SID=1, BFI=1 SID=1, and
BFI=1 SID=2.  The real complexity, however, lies in what that invalid SID
handler actually does:

* If invalid SID arrives when we are already in CN insertion state, we treat it
  the same as an unusable frame (continue CN output with current parameters),
  but the flag of lost SID is reset, as required by our interpretation of the
  specs.

* If invalid SID arrives in CN muting state, i.e., after two consecutive lost
  SID events, the muting continues unaffected: we don't "rejuvenate"
  already-started-muting comfort noise upon receiving invalid SID.

* If invalid SID arrives in good speech state, meaning that we are supposed to
  begin a CN insertion period but we didn't get usable parameters for it, we
  obtain LARc and mean(Xmax) parameters from the last good speech frame,
  following the second option permitted by the "NOTE" at the end of GSM 06.31
  section 6.1.2.  To get Xmaxc for CN, we dequantize all 4 Xmaxc parameters of
  the last good speech frame, average them, then requantize.

* If invalid SID arrives in speech muting state, the invalid SID is ignored and
  speech muting continues unaffected.

* If invalid SID arrives in NO_DATA state (initial state out of reset, or the
  state after either speech or CN muting has fully decayed), we emit the fixed
  silence frame of TS 46.011 Table 1.
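
Two pieces of the above lend themselves to a short C illustration: the frame
classification by BFI and SID flags, and the derivation of CN Xmaxc from the
last good speech frame.  The sketch below is not taken from libgsmfr2 and
rests on assumptions that should be checked against the specs: the function
names are hypothetical, the dequantizer maps each Xmaxc code to the upper end
of its quantization interval as we read the Xmax quantization table of GSM
06.10, the quantizer follows the exponent-and-mantissa scheme of the 06.10
encoder, and the average is assumed to be a truncating integer division.

```c
#include <assert.h>
#include <stdint.h>

enum rx_frame_type {
    RX_GOOD_SPEECH, RX_VALID_SID, RX_INVALID_SID, RX_UNUSABLE
};

/* Classify a received frame from BFI (0 or 1) and the SID flag (0, 1, 2).
 * Invalid SID covers BFI=0 SID=1, BFI=1 SID=1 and BFI=1 SID=2. */
enum rx_frame_type classify_frame(int bfi, int sid)
{
    if (sid == 0)
        return bfi ? RX_UNUSABLE : RX_GOOD_SPEECH;
    if (!bfi && sid == 2)
        return RX_VALID_SID;
    return RX_INVALID_SID;
}

/* Assumed dequantizer: 6-bit Xmaxc code to a linear value 31..32767. */
uint16_t xmax_dequant(uint8_t xmaxc)
{
    unsigned expon, mant;

    assert(xmaxc <= 63);
    expon = xmaxc > 15 ? (unsigned)(xmaxc >> 3) - 1 : 0;
    mant = xmaxc - (expon << 3);
    return (uint16_t)(((mant + 1) << (expon + 5)) - 1);
}

/* Assumed quantizer: linear value 0..32767 back to a 6-bit Xmaxc code. */
uint8_t xmax_quant(uint16_t xmax)
{
    unsigned expon = 0, temp;

    assert(xmax <= 32767);
    for (temp = xmax >> 9; temp != 0; temp >>= 1)
        expon++;    /* count significant bits above bit 8 */
    return (uint8_t)((xmax >> (expon + 5)) + (expon << 3));
}

/* Dequantize the 4 Xmaxc of the last good speech frame, average them,
 * then requantize to get the Xmaxc used for comfort noise. */
uint8_t cn_xmaxc_from_speech(const uint8_t xmaxc[4])
{
    unsigned sum = 0;
    int i;

    for (i = 0; i < 4; i++)
        sum += xmax_dequant(xmaxc[i]);
    return xmax_quant((uint16_t)(sum / 4));
}
```

Under these assumptions, a speech frame with Xmaxc values 10, 12, 14, 16
dequantizes to 351, 415, 479, 575, averages to 455, and requantizes to a CN
Xmaxc of 14; a frame whose 4 Xmaxc are all equal yields that same value.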