author Mychaela Falconia <falcon@freecalypso.org>
date Tue, 03 Sep 2024 07:08:24 +0000

TFO transform for EFR
=====================

Unlike the situation with FRv1 and HRv1, the standard endpoint decoder for EFR
provides no help for implementing a TFO transform.  The reference EFR decoder
source from ETSI includes bad frame handling and Rx DTX functions.  However, the
logic that implements these functions is interwoven throughout the body of the
decoder and does not form a separable front-end.  Most saliently, this Rx DTX
and ECU (error concealment unit) logic in the reference decoder does not operate
on coded parameters, as a TFO transform would need.  Instead, it operates on
linear values deeper in the decoder, after parameter dequantization.
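
To make the contrast concrete: a TFO transform has to work on the 57 coded
parameters that make up each 244-bit EFR speech frame, not on decoded linear
values.  The following minimal sketch tabulates those parameters.  It assumes
the parameter order and bit widths of the bit allocation table in the ETSI
reference code: 5 LPC indices, then 13 parameters per subframe (LTP lag, LTP
gain, ten fixed codebook indices and the fixed codebook gain).  All names in it
are hypothetical and are not taken from libgsmefr.

```c
#include <stddef.h>

/* Number of coded parameters in one EFR speech frame. */
#define EFR_NUM_PARAMS 57

/* Bit width of each coded parameter, in transmission order (assumed to
 * follow the bit allocation table of the ETSI reference code). */
static const unsigned char efr_param_bits[EFR_NUM_PARAMS] = {
    7, 8, 9, 8, 6,                           /* LPC: 5 split matrix VQ indices */
    9, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5,   /* subframe 1 */
    6, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5,   /* subframe 2 */
    9, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5,   /* subframe 3 */
    6, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 5,   /* subframe 4 */
};

/* Offsets within the 13-parameter group of each subframe. */
#define EFR_SF_LAG        0   /* 9 bits absolute (sf 1, 3), 6 bits relative (sf 2, 4) */
#define EFR_SF_LTP_GAIN   1   /* LTP gain index, 4 bits */
#define EFR_SF_FCB_FIRST  2   /* 10 fixed codebook indices, 35 bits in total */
#define EFR_SF_FCB_COUNT 10
#define EFR_SF_FCB_GAIN  12   /* fixed codebook gain index, 5 bits */
#define EFR_SF_NPARAMS   13

/* Position in the 57-parameter array of parameter 'off' in subframe sf (0..3). */
static size_t efr_param_index(unsigned sf, unsigned off)
{
    return 5 + (size_t)sf * EFR_SF_NPARAMS + off;
}

/* Total bit width of 'count' consecutive parameters starting at 'first'. */
static unsigned efr_bits_in_range(size_t first, size_t count)
{
    unsigned total = 0;
    for (size_t i = first; i < first + count; i++)
        total += efr_param_bits[i];
    return total;
}
```

With this layout, a transform can address the fixed codebook gain of the 4th
subframe as parameter 56 without dequantizing anything.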

Given that Abis is a de facto proprietary interface that is not interoperable
between different vendors (and the same holds for Ater in those BSS designs
that separate the TRAU from the BSC), and given how daunting it seems to
implement a true TFO transform for EFR, before we got our Nokia TCSM2 lab setup
I wondered whether historical TRAU vendors really did implement this TFO
transform.  Perhaps they instead used some kind of "cheating" trick on their
Abis, similar to what we did in OsmoBTS in mid-2023.  However, once I got our
Nokia TCSM2 gear working, set up a TFO connection between two active TRAU
channels in EFR mode and passed some test sequences through it, it became clear
that Nokia did implement a real "honest-to-god" TFO transform for EFR.  The
TRAU-DL frame stream consists of 100% valid "speech" frames, with no idle frames
or other aberrations inserted.  This holds even when the TRAU-UL stream fed via
TFO contains BFI speech frames and DTXu pauses: the TRAU really does apply bad
frame handling and comfort noise insertion at the parameter level.
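
Seen from the outside, the behaviour described above reduces to a per-frame
decision: whatever arrives via TFO, exactly one valid speech frame goes out on
TRAU-DL.  The following minimal sketch shows that decision and assumes the
incoming frames have already been classified.  The enum and function names are
hypothetical.  The real Rx DTX rules also distinguish cases this sketch omits,
such as invalid SID frames and missed SID updates.

```c
#include <stdbool.h>

/* Hypothetical classification of each frame received via TFO. */
enum tfo_in_class {
    TFO_IN_GOOD_SPEECH,   /* speech frame with BFI=0 */
    TFO_IN_BAD_SPEECH,    /* speech frame with BFI=1 */
    TFO_IN_SID,           /* valid SID frame (start or update of a DTXu pause) */
    TFO_IN_NO_DATA,       /* nothing usable received, e.g. inside a DTXu pause */
};

/* What the transform has to do to produce the next DL speech frame. */
enum tfo_out_action {
    TFO_OUT_PASS_THROUGH,   /* forward the good frame (possibly gain-adjusted) */
    TFO_OUT_SUBSTITUTE,     /* speech-mode bad frame handling */
    TFO_OUT_COMFORT_NOISE,  /* comfort noise expressed as speech parameters */
};

/* in_dtx_pause: a SID has been received and no good speech has followed yet. */
static enum tfo_out_action tfo_decide(enum tfo_in_class in, bool in_dtx_pause)
{
    switch (in) {
    case TFO_IN_GOOD_SPEECH:
        return TFO_OUT_PASS_THROUGH;
    case TFO_IN_SID:
        return TFO_OUT_COMFORT_NOISE;
    case TFO_IN_BAD_SPEECH:
    case TFO_IN_NO_DATA:
    default:
        /* Inside a DTXu pause, bad or missing frames continue comfort noise;
         * outside of one, they trigger substitution/muting. */
        return in_dtx_pause ? TFO_OUT_COMFORT_NOISE : TFO_OUT_SUBSTITUTE;
    }
}
```

In this sketch, the output side never emits anything other than a speech frame.
That matches the "100% valid speech frames" property seen on TCSM2.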

Seeing that at least one major historical vendor did implement a TFO transform
for EFR, and seeing the output of that transform, has set a sporting challenge
for me: I no longer have a valid excuse not to do it.  I now want to produce a
FOSS implementation of the TFO transform for EFR in Themyscira libraries
(probably in libgsmefr), one that is no worse than Nokia's implementation in
TCSM2.

Bad frame handling in speech mode
=================================

Looking at the DL speech frames that were synthesized by the TRAU in those
frame positions where the incoming UL stream via TFO had BFIs, we can make the
following observations:

* The 5 LPC parameters are different in each generated substitution/muting
  frame.  It therefore appears that the TFO transform runs the LSF quantization
  algorithm anew for each output frame, producing LPC parameters that aim for
  the substitution/muting LSFs of the official "example solution".

* LTP lag parameters remain constant for each run of BFIs between good speech
  frames; the lag value encoded therein matches the LTP lag (integer part only)
  from the 4th subframe of the last good speech frame, just like in the official
  endpoint decoder.

* Surprising bit: the 4 LTP gain values from the last good speech frame are
  endlessly regurgitated verbatim in each substitution/muting frame, without
  any signs of the attenuation I expected to see based on the official "example
  solution".

* Another surprising bit: the 35-bit fixed codebook sequence in each subframe
  is taken from the corresponding subframe of the last good speech frame.  This
  is contrary to the official "example solution", which takes these bits from
  the errored frames.

* The four fixed codebook gain parameters in the emitted substitution/muting
  frames differ from one frame to the next when there are multiple BFI frames
  in a row.  They also differ between subframes in the same frame, so these
  parameters are clearly being regenerated as output progresses.  However, the
  quantization algorithm for this parameter is so complex that I haven't yet
  been able to analyze it in more depth.
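
Taken together, the LTP and fixed codebook observations amount to a simple
recipe for everything that is not regenerated.  The sketch below assembles one
substitution/muting frame in the way the TCSM2 output appears to behave:

* LTP gains and fixed codebook bits are copied per subframe from the last good
  frame.
* The integer lag of that frame's 4th subframe is re-encoded into all four lag
  parameters.
* LPC indices and fixed codebook gains come from caller-supplied hooks, because
  those two algorithms are exactly what remains unknown.

The sketch assumes the 57-parameter layout of the ETSI reference code.  The lag
index arithmetic is my reading of that code's lag encoder, restricted to integer
lags in the 18..143 range, and should be verified against it.  All function and
type names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define EFR_NUM_PARAMS  57
#define EFR_NUM_LPC      5
#define EFR_SF_NPARAMS  13
#define EFR_SF_LAG       0   /* offset of LTP lag within a subframe group */
#define EFR_SF_FCB_GAIN 12   /* offset of fixed codebook gain */
#define EFR_PIT_MIN     18
#define EFR_PIT_MAX    143

/* Hypothetical hooks for the two parameter types that get regenerated. */
typedef void (*lpc_regen_fn)(void *state, uint16_t lpc_out[EFR_NUM_LPC]);
typedef uint16_t (*fcb_gain_regen_fn)(void *state, unsigned subframe);

/* Encode integer lag t0 (fraction 0, assumed 18..143) for subframe sf (0..3):
 * 9-bit absolute index in sf 0 and 2; 6-bit index relative to the search
 * window around the preceding absolute lag (also t0 here) in sf 1 and 3. */
static uint16_t efr_encode_int_lag(unsigned sf, int t0)
{
    if (sf % 2 == 0)
        return (uint16_t)(t0 <= 94 ? 6 * t0 - 105 : t0 + 368);

    int t0_min = t0 - 5;
    if (t0_min < EFR_PIT_MIN)
        t0_min = EFR_PIT_MIN;
    if (t0_min + 9 > EFR_PIT_MAX)
        t0_min = EFR_PIT_MAX - 9;
    return (uint16_t)(6 * (t0 - t0_min) + 3);
}

/* Build one substitution/muting frame from the last good speech frame.
 * lag4 is the integer LTP lag of that frame's 4th subframe. */
static void efr_build_subst_frame(const uint16_t last_good[EFR_NUM_PARAMS],
                                  int lag4, uint16_t out[EFR_NUM_PARAMS],
                                  lpc_regen_fn lpc_regen,
                                  fcb_gain_regen_fn gain_regen, void *state)
{
    /* A verbatim copy carries over the 4 LTP gain indices and the 35-bit
     * fixed codebook sequence of each subframe. */
    memcpy(out, last_good, EFR_NUM_PARAMS * sizeof out[0]);

    /* LPC indices differ in every generated frame: regenerate them. */
    lpc_regen(state, out);

    for (unsigned sf = 0; sf < 4; sf++) {
        uint16_t *p = out + EFR_NUM_LPC + sf * EFR_SF_NPARAMS;
        p[EFR_SF_LAG] = efr_encode_int_lag(sf, lag4);  /* same lag everywhere */
        p[EFR_SF_FCB_GAIN] = gain_regen(state, sf);    /* regenerated per subframe */
    }
}
```

Because the copied lag is re-encoded rather than copied, the lag parameters
stay constant across a whole run of BFIs even if the last good frame had
different lags in its four subframes.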
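
For the LPC parameters, the quantizer's target would be the substitution LSF
vector of the example solution.  As I read the reference decoder, that vector is
the previous LSF vector pulled slightly towards the long-term mean, with weights
of 0.95 and 0.05 expressed in Q15.  The same vector serves as both LSF sets of
the frame.  The sketch below computes only that target, assuming those weights.
The split matrix quantization that would turn it into the 5 LPC indices is not
shown, and the function names are hypothetical.

```c
#include <stdint.h>

#define EFR_LSF_ORDER  10
#define LSF_ALPHA      31128   /* 0.95 in Q15 (assumed, per reference decoder) */
#define LSF_ONE_ALPHA   1639   /* 1 - 0.95 in Q15 */

/* Q15 multiply equivalent to the ETSI basic operator mult() for the
 * non-negative operands used here (LSF values are never negative). */
static int16_t q15_mult(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b) >> 15);
}

/* Substitution LSF target: past LSFs shifted slightly towards the mean. */
static void efr_lsf_subst_target(const int16_t past_lsf[EFR_LSF_ORDER],
                                 const int16_t mean_lsf[EFR_LSF_ORDER],
                                 int16_t target[EFR_LSF_ORDER])
{
    for (int i = 0; i < EFR_LSF_ORDER; i++)
        target[i] = (int16_t)(q15_mult(past_lsf[i], LSF_ALPHA) +
                              q15_mult(mean_lsf[i], LSF_ONE_ALPHA));
}
```

Each generated frame's LSFs become the "past" LSFs for the next one.  The target
therefore moves from frame to frame during a run of BFIs, which would by itself
be enough to explain why the emitted LPC indices differ in every frame.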

Looking at the first good speech frame that follows each BFI substitution/muting
insert, we see that it is mostly unaltered; in particular, no alterations were
seen in the LPC or LTP parameters.  The fixed codebook gain parameter, however,
shows a different behavioral pattern.  Most of the time it is also unaltered,
but sometimes it is reduced, and even then only in certain subframes.  Are we
perhaps seeing a capping of the fixed codebook gain in the first good frame
following BFI, similar to that implemented in the reference endpoint decoder?
A better understanding of the quantization mechanism for this parameter will be
needed.