view doc/TFO-transform @ 581:e2d5cad04cbf

libgsmhr1 RxFE: store CN R0+LPC separately from speech In the original GSM 06.06 code the ECU for speech mode is entirely separate from the CN generator, maintaining separate state. (The main intertie between them is the speech vs CN state variable, distinguishing between speech and CN BFIs, in addition to the CN-specific function of distinguishing between initial and update SIDs.) In the present RxFE implementation I initially thought that we could use the same saved_frame buffer for both ECU and CN, overwriting just the first 4 params (R0 and LPC) when a valid SID comes in. However, I now realize it was a bad idea: the original code has a corner case (long sequence of speech-mode BFIs to put the ECU in state 6, then SID and CN-mode BFIs, then a good speech frame) that would be broken by that buffer reuse approach. We could eliminate this corner case by resetting the ECU state when passing through a CN insertion period, but doing so would needlessly increase the behavioral diffs between GSM 06.06 and our version. Solution: use a separate CN-specific buffer for CN R0+LPC parameters, and match the behavior of GSM 06.06 code in this regard.
author Mychaela Falconia <falcon@freecalypso.org>
date Thu, 13 Feb 2025 10:02:45 +0000
parents ebcf414b7d99

TFO transform: general definition and goal
==========================================

"TFO transform" is the term adopted by Themyscira Wireless for the non-trivial
transform on GSM codec frames called for by the TFO spec, 3GPP TS 28.062
section C.3.2.1.1.  We have a goal of implementing TFO transform for all 3
classic GSM codecs (FR, HR and EFR) in our Themyscira codec libraries; in the
present release, only the GSM-FR version has been implemented.

The input to this transform is the stream of received uplink frames from call
leg A, possibly containing BFI frame gaps and SID frames if call leg A uses
DTXu.  The output from the transform is a "pristine" stream of good codec frames
to be transmitted on the radio downlink for call leg B: good speech frames only
in the non-DTXd case, or a mixture of good speech and valid SID frames with
DTXd.  TFO transform is expected to be an identity transform when the input is
100% good speech frames, but it becomes non-trivial when it has to insert
synthetic "speech" frames for comfort noise or as error concealment.
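
As a rough illustration of the output contract described above, the following
Python sketch checks whether a sequence of frame classifications qualifies as
"pristine" for a given DTXd setting.  The FrameKind names and the is_pristine()
helper are hypothetical and exist only for this example; they are not part of
any Themyscira library.

```python
from enum import Enum


class FrameKind(Enum):
    """Hypothetical classification of a single 20 ms codec frame."""
    SPEECH = "good speech"
    VALID_SID = "valid SID"
    INVALID_SID = "invalid SID"
    BFI = "BFI / no data"


def is_pristine(kinds, dtxd: bool) -> bool:
    """Return True if every frame kind is allowed on the transform output.

    DTXd=0: only good speech frames may appear.
    DTXd=1: good speech and valid SID frames may appear.
    """
    allowed = {FrameKind.SPEECH}
    if dtxd:
        allowed.add(FrameKind.VALID_SID)
    return all(kind in allowed for kind in kinds)
```

The input side of the transform may contain any of the four kinds; only the
output is constrained in this way.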

TFO transform for FRv1
======================

This transform is implemented in libgsmfr2 in both DTXd=0 and DTXd=1
configurations.  The DTXd=0 version of FRv1 TFO transform is mostly identical
to the Rx DTX handler preprocessor stage of regular speech decoding (the only
difference is in details of the in-band homing function); the DTXd=1 version is
specific to this TFO/TrFO application.

In addition to libgsmfr2 functions documented in the FR1-library-API article,
there is a command line test program that exercises our implementation of this
TFO transform.  Its usage is:

gsmfr-tfo-xfrm [-d] input.hex output.hex

Both input and output files are in TW-TS-005 Annex A hexadecimal format.  The
input will typically contain frames in TW-TS-001 extended RTP format, whereas
the output is always emitted in the basic format: pure GSM-FR codec frames
only.

The -d option enables DTXd, which is disabled by default.

Details of FRv1 TFO transform with DTXd=0
-----------------------------------------

Our implementation of TFO transform in DTXd=0 configuration is mostly identical
to the Rx DTX handler preprocessor stage of regular speech decoding; the
details are covered in the FR1-Rx-DTX-detail article.

The ThemWi implementation of TFO transform includes in-band homing: if the
input to the transform is the spec-defined decoder homing frame (DHF), this DHF
is passed through to the output just like any other good speech frame, but the
internal state is reset to the initial "home" state.
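
The homing rule can be pictured with a toy model like the one below.  This is
only a sketch under stated assumptions: the HomingSketch class, its
frames_since_home counter (a stand-in for the real internal state) and the
PLACEHOLDER_DHF value are all hypothetical.  In particular, the placeholder is
NOT the actual DHF bit pattern from the spec.

```python
# Placeholder only: 33 bytes with the 0xD signature nibble, NOT the real DHF.
PLACEHOLDER_DHF = bytes([0xD0]) + bytes(32)


class HomingSketch:
    """Toy model of in-band homing applied to good speech frames."""

    def __init__(self, dhf: bytes):
        self.dhf = dhf
        self.reset()

    def reset(self) -> None:
        # Stand-in for resetting all internal state to the "home" state.
        self.frames_since_home = 0

    def process_good_speech(self, frame: bytes) -> bytes:
        # The frame is always passed through unchanged, DHF included.
        if frame == self.dhf:
            self.reset()
        else:
            self.frames_since_home += 1
        return frame
```

The point of the sketch is that homing affects only the state kept for later
frames; the DHF itself still appears on the output.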

Details of FRv1 TFO transform with DTXd=1
-----------------------------------------

We implement the DTXd=1 version of TFO transform as a post-processor stage
after executing the "regular" logic for DTXd=0 case; more precisely, our
"regular" Rx DTX handler code sets some flags that are only used by the TFO
DTXd=1 post-processor, and the latter element acts on one of those flags.

The resulting visible behaviour of our TFO transform is as follows:

* Whenever a valid SID frame comes in, it is re-emitted on the output in the
  same frame position with the same parameters, even if it has different Xmaxc
  in different subframes.  However, it is "rejuvenated" in that any possible
  single bit error in the SID codeword is corrected, and all unused bits are
  also cleared.  This behaviour agrees with GSM 08.62 section 8.2.2.

* Also in agreement with GSM 08.62 section 8.2.2, any unusable frames or invalid
  SID frames that come in after that valid SID (but before that cached SID
  expires by way of two lost SID events, or a good speech frame ends the DTX
  pause) are replaced with output that repeats the last processed valid SID.
  This output consists of repeated SID frames just like the original, but with
  all 4 Xmaxc parameters set to the one from the last subframe.

* If an invalid SID frame is received directly after good speech, indicating a
  need to start comfort noise insertion but lacking usable parameters for it,
  the output from the TFO transform is just like that described in the
  FR1-Rx-DTX-detail article, but in the form of SID frames rather than "speech"
  frames that represent CN.

* If two consecutive lost SID events occur and the Rx DTX handler has to enter
  CN muting state, our TFO transform breaks out of DTX and emits the CN muting
  sequence as "speech" frames rather than altered SID.  This tactic is used in
  order to produce an immediate effect on the receiving end.  Once the muting
  fully decays, the transform emits 4 silence frames of GSM 06.11 Table 1, then
  switches to endlessly emitting SIDs derived from this silence frame (same
  LARc, Xmaxc=0).

* Any other time the Rx DTX handler is in NO_DATA state (initial reset state or
  fully decayed state after speech muting), the TFO transform in DTXd=1 mode
  emits SIDs derived from the silence frame instead of actual silence frames.
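
The SID parameter rules in the list above can be summarized with a small Python
sketch.  It works on a hypothetical unpacked view of a SID frame (SidParams,
holding only LARc and the four Xmaxc values); the type and the three helper
functions are invented for this illustration and do not reflect the actual
libgsmfr2 data structures or API.  The silence frame LARc values are taken as
an argument rather than hard-coded.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class SidParams:
    """Hypothetical unpacked SID frame: the fields that carry CN information."""
    larc: Tuple[int, ...]    # 8 LARc parameters
    xmaxc: Tuple[int, ...]   # 4 Xmaxc parameters, one per subframe


def reemit_valid_sid(sid: SidParams) -> SidParams:
    # A valid SID goes out with the same parameters, even if Xmaxc differs
    # between subframes.
    return sid


def repeat_cached_sid(cached: SidParams) -> SidParams:
    # Repeats of the cached SID carry the last subframe's Xmaxc in all 4 slots.
    return replace(cached, xmaxc=(cached.xmaxc[-1],) * 4)


def sid_from_silence(silence_larc: Tuple[int, ...]) -> SidParams:
    # SID derived from the silence frame: same LARc, Xmaxc=0.
    return SidParams(larc=tuple(silence_larc), xmaxc=(0, 0, 0, 0))


sid = SidParams(larc=(1, 2, 3, 4, 5, 6, 7, 8), xmaxc=(10, 11, 12, 13))
print(repeat_cached_sid(sid).xmaxc)   # (13, 13, 13, 13)
```

The sketch leaves out SID codeword correction and clearing of unused bits,
which operate on the packed frame rather than on these parameters.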

Emission of transform-synthesized SID frames during muting states is done in
order to help achieve the presumed network operator's goal of DTX maximization
and radio interference reduction.  However, if the input to the transform is
all good speech frames without DTX pauses, the transform does not attempt to
apply VAD and make its own DTXd.