view doc/TFO-transform @ 553:ebcf414b7d99

doc/TFO-transform: describe details for FRv1, both modes
author Mychaela Falconia <falcon@freecalypso.org>
date Mon, 07 Oct 2024 08:24:24 +0000
parents 8f44d7064c56

TFO transform: general definition and goal
==========================================

"TFO transform" is the term adopted by Themyscira Wireless for the non-trivial
transform on GSM codec frames called for by the TFO spec, 3GPP TS 28.062
section C.3.2.1.1.  We have a goal of implementing TFO transform for all 3
classic GSM codecs (FR, HR and EFR) in our Themyscira codec libraries; in the
present release, only the GSM-FR version has been implemented.

The input to this transform is the stream of received uplink frames from call
leg A, possibly containing BFI frame gaps and SID frames if call leg A uses
DTXu.  The output from the transform is a "pristine" stream of good codec frames
to be transmitted on the radio downlink for call leg B: good speech frames only
in the non-DTXd case, or a mixture of good speech and valid SID frames with
DTXd.  TFO transform is expected to be an identity transform when the input is
100% good speech frames, but it becomes non-trivial when it has to insert
synthetic "speech" frames for comfort noise or as error concealment.

TFO transform for FRv1
======================

This transform is implemented in libgsmfr2 in both DTXd=0 and DTXd=1
configurations.  The DTXd=0 version of FRv1 TFO transform is mostly identical
to the Rx DTX handler preprocessor stage of regular speech decoding (the only
difference is in details of the in-band homing function); the DTXd=1 version is
specific to this TFO/TrFO application.

In addition to the libgsmfr2 functions documented in the FR1-library-API
article, there is a command line test program that exercises our implementation
of this TFO transform.  Its usage is:

gsmfr-tfo-xfrm [-d] input.hex output.hex

Both input and output files are in TW-TS-005 Annex A hexadecimal format.  The
input will typically be in TW-TS-001 extended RTP format, whereas the output is
always emitted in the basic format, pure GSM-FR codec frames only.

The -d option enables DTXd, which is disabled by default.

Details of FRv1 TFO transform with DTXd=0
-----------------------------------------

Our implementation of TFO transform in DTXd=0 configuration is mostly identical
to the Rx DTX handler preprocessor stage of regular speech decoding; the
details are covered in the FR1-Rx-DTX-detail article.

ThemWi implementation of TFO transform includes the feature of in-band homing:
if the input to the transform is the spec-defined decoder homing frame (DHF),
this DHF is passed through to the output just like any other good speech frame,
but the internal state is reset to the initial "home" state.
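
The in-band homing rule above can be illustrated with a minimal sketch.  This
is not the libgsmfr2 code: the state structure, the function names and the
full-frame byte comparison are all assumptions made for illustration, and the
DHF bit pattern itself is supplied by the caller rather than reproduced here.

```c
#include <assert.h>
#include <string.h>

#define FR_FRAME_LEN 33   /* packed GSM-FR frame: 4-bit signature + 260 bits */

/* Hypothetical transform state; the real libgsmfr2 state differs. */
struct xfrm_state {
    unsigned char last_good[FR_FRAME_LEN]; /* most recent good speech frame */
    int have_good;                         /* nonzero once one was seen */
    int bfi_count;                         /* consecutive unusable frames */
};

/* Return the state to its initial "home" condition. */
static void xfrm_reset(struct xfrm_state *st)
{
    memset(st, 0, sizeof *st);
}

/*
 * Handle one good speech frame.  The frame always passes through
 * unchanged; if it equals the caller-supplied DHF pattern, the state
 * is homed instead of being updated from the frame.
 */
static void xfrm_good_speech(struct xfrm_state *st, const unsigned char *dhf,
                             const unsigned char *in, unsigned char *out)
{
    assert(st && dhf && in && out);
    memcpy(out, in, FR_FRAME_LEN);          /* identity on good speech */
    if (memcmp(in, dhf, FR_FRAME_LEN) == 0) {
        xfrm_reset(st);                     /* in-band homing */
        return;
    }
    memcpy(st->last_good, in, FR_FRAME_LEN);
    st->have_good = 1;
    st->bfi_count = 0;
}
```

The point of the sketch is the ordering: the DHF reaches the output first, and
only the internal state is affected by the homing action.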

Details of FRv1 TFO transform with DTXd=1
-----------------------------------------

We implement the DTXd=1 version of TFO transform as a post-processor stage
after executing the "regular" logic for DTXd=0 case; more precisely, our
"regular" Rx DTX handler code sets some flags that are only used by the TFO
DTXd=1 post-processor, and the latter element acts on one of those flags.

The resulting visible behaviour of our TFO transform is as follows:

* Whenever a valid SID frame comes in, it is re-emitted on the output in the
  same frame position with the same parameters, even if it has different Xmaxc
  in different subframes.  However, it is "rejuvenated" in that any possible
  single bit error in the SID codeword is corrected, and all unused bits are
  also cleared.  This behaviour agrees with GSM 08.62 section 8.2.2.

* Also in agreement with GSM 08.62 section 8.2.2, any unusable frames or invalid
  SID frames that come in after that valid SID (but before that cached SID
  expires by way of two lost SID events, or a good speech frame ends the DTX
  pause) are replaced with output that repeats the last processed valid SID.
  This output consists of repeated SID frames just like the original, but with
  all 4 Xmaxc parameters set to the one from the last subframe.

* If an invalid SID frame is received directly after good speech, indicating a
  need to start comfort noise insertion but lacking usable parameters for it,
  the output from the TFO transform is just like that described in the
  FR1-Rx-DTX-detail article, but in the form of SID frames rather than "speech"
  frames that represent CN.

* If two consecutive lost SID events occur and the Rx DTX handler has to enter
  CN muting state, our TFO transform breaks out of DTX and emits the CN muting
  sequence as "speech" frames rather than altered SIDs.  This tactic is chosen
  in order to produce an immediate effect on the receiving end.  Once the muting
  has fully decayed, the transform emits 4 silence frames of GSM 06.11 Table 1,
  then switches to endlessly emitting SIDs derived from this silence frame (same
  LARc, Xmaxc=0).

* Any other time the Rx DTX handler is in NO_DATA state (initial reset state or
  fully decayed state after speech muting), the TFO transform in DTXd=1 mode
  emits SIDs derived from the silence frame instead of actual silence frames.
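
The three kinds of SID output described in the list above (rejuvenated, repeated
and silence-derived) can be sketched on unpacked frame parameters.  This is a
hedged illustration, not the libgsmfr2 implementation: the structure and
function names are hypothetical, and the sketch assumes that the SID codeword
and all unused bits live in the Nc, bc, Mc and xMc fields, so that zeroing
those fields both restores the all-zeros codeword and clears the unused bits.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical unpacked GSM-FR parameter set; not the libgsmfr2 layout. */
struct fr_params {
    unsigned char larc[8];     /* LAR coefficients, once per frame */
    unsigned char nc[4];       /* LTP lag, per subframe */
    unsigned char bc[4];       /* LTP gain, per subframe */
    unsigned char mc[4];       /* RPE grid position, per subframe */
    unsigned char xmaxc[4];    /* block amplitude, per subframe */
    unsigned char xmc[4][13];  /* RPE pulses, per subframe */
};

/* Valid SID in: keep LARc and each subframe's Xmaxc, zero the rest. */
static void sid_rejuvenate(const struct fr_params *in, struct fr_params *out)
{
    assert(in && out && in != out);
    memset(out, 0, sizeof *out);
    memcpy(out->larc, in->larc, sizeof out->larc);
    memcpy(out->xmaxc, in->xmaxc, sizeof out->xmaxc);
}

/* Repeat of the cached SID: all 4 Xmaxc take the last subframe's value. */
static void sid_repeat(const struct fr_params *cached, struct fr_params *out)
{
    int sub;

    sid_rejuvenate(cached, out);
    for (sub = 0; sub < 4; sub++)
        out->xmaxc[sub] = cached->xmaxc[3];
}

/* SID derived from a caller-supplied silence frame: same LARc, Xmaxc=0. */
static void sid_from_silence(const struct fr_params *silence,
                             struct fr_params *out)
{
    assert(silence && out && silence != out);
    memset(out, 0, sizeof *out);
    memcpy(out->larc, silence->larc, sizeof out->larc);
}
```

The silence frame parameters are passed in by the caller here; the actual
values come from GSM 06.11 Table 1 and are not reproduced in this sketch.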

Emission of transform-synthesized SID frames during muting states is done in
order to help achieve the network operator's presumed goal of DTX maximization
and radio interference reduction.  However, if the input to the transform is
all good speech frames without DTX pauses, the transform does not attempt to
apply VAD and make its own DTXd.
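
As a rough summary of the DTXd=1 behaviour described in this section, the choice
between emitting a "speech" frame and emitting a SID can be sketched as a
function of the Rx DTX handler state.  The state names and the silence_owed
counter are hypothetical; the real code works from flags set by the Rx DTX
handler.  The sketch also assumes that speech muting (error concealment) output
is emitted as "speech" frames, which the text above implies but does not state
outright.

```c
#include <assert.h>

/* Hypothetical Rx DTX handler states, after processing the current input. */
enum rx_state {
    ST_SPEECH,          /* good speech frame received */
    ST_SPEECH_MUTING,   /* error concealment after speech */
    ST_COMFORT_NOISE,   /* CN insertion, SID parameters or their substitute */
    ST_CN_MUTING,       /* two consecutive lost SID events */
    ST_NO_DATA          /* initial reset state or fully decayed muting */
};

enum tx_kind { TX_SPEECH, TX_SID };

/*
 * silence_owed: number of real silence frames still to be sent after
 * CN muting has fully decayed (4 at that point, 0 in all other cases).
 */
static enum tx_kind dtxd_output_kind(enum rx_state st, int silence_owed)
{
    switch (st) {
    case ST_SPEECH:         /* passed through as is; no VAD is applied */
    case ST_SPEECH_MUTING:  /* assumption: concealment stays "speech" */
    case ST_CN_MUTING:      /* break out of DTX for immediate effect */
        return TX_SPEECH;
    case ST_COMFORT_NOISE:
        return TX_SID;
    case ST_NO_DATA:
        return silence_owed > 0 ? TX_SPEECH : TX_SID;
    }
    assert(0);              /* not reached with a valid state */
    return TX_SPEECH;
}
```

Note that ST_SPEECH always maps to TX_SPEECH: good speech input never turns
into SID output, because the transform does not run its own VAD.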