# HG changeset patch
# User Mychaela Falconia
# Date 1725347304 0
# Node ID d9553c7ac6eaab087963a3d801d0cf151e5765c1
# Parent 0979407719f0ab7332058fa5883fd7a129d8737a
doc/TFO-xform/EFR: beginning of article

diff -r 0979407719f0 -r d9553c7ac6ea doc/TFO-xform/EFR
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/TFO-xform/EFR Tue Sep 03 07:08:24 2024 +0000
@@ -0,0 +1,77 @@
+TFO transform for EFR
+=====================
+
+Unlike the situation with FRv1 and HRv1, the standard endpoint decoder for EFR
+provides no help for implementing a TFO transform. The reference EFR decoder
+source from ETSI includes bad frame handling and Rx DTX functions, but the logic
+that implements these functions is interwoven throughout the body of the decoder
+and does not form a separable front-end. Most saliently, this Rx DTX and ECU
+logic in the reference decoder does not operate on coded parameters as would be
+needed for a TFO transform; instead it operates on linear values deeper in the
+decoder after parameter dequantization.
+
+Given that Abis is a de facto proprietary interface that is not interoperable
+between different vendors (and the same holds for Ater in those BSS designs
+that separate the TRAU from the BSC), and given how daunting it seems to
+implement a true TFO transform for EFR, prior to getting our Nokia TCSM2 lab
+setup I was wondering if historical TRAU vendors really did implement this
+TFO transform, or if perhaps they used some kind of "cheating" trick on their
+Abis similar to what we did in OsmoBTS in mid-2023.
However, once I got our
+Nokia TCSM2 gear working, set up a TFO connection between two active TRAU
+channels in EFR mode and passed some test sequences through it, it became clear
+that Nokia did implement a real "honest-to-god" TFO transform for EFR: the
+TRAU-DL frame stream is 100% valid "speech" frames (no idle frames or other
+aberrations inserted) even when the TRAU-UL stream fed via TFO contains BFI
+speech frames and DTXu pauses - the TRAU really does apply bad frame handling
+and comfort noise insertion at the parameter level.
+
+Seeing that at least one major historical vendor did implement TFO transform
+for EFR, and seeing the output from that transform, has set up a sportive
+challenge for me: I no longer have a valid excuse to not do it. I now have a
+desire to produce a FOSS implementation of TFO transform for EFR in Themyscira
+libraries (probably in libgsmefr), and make it no worse than Nokia's
+implementation in TCSM2.
+
+Bad frame handling in speech mode
+=================================
+
+Looking at the DL speech frames that were synthesized by the TRAU in those
+frame positions where the incoming UL stream via TFO had BFIs, we can make the
+following observations:
+
+* The 5 LPC parameters are different in each generated substitution/muting
+  frame, hence it looks like the TFO transform is running the quantization
+  algorithm for each output frame to produce LPC parameters that aim for the
+  substitution/muting LSFs of the official "example solution".
+
+* LTP lag parameters remain constant for each run of BFIs between good speech
+  frames; the lag value encoded therein matches the LTP lag (integer part only)
+  from the 4th subframe of the last good speech frame, just like in the official
+  endpoint decoder.
+
+* Surprising bit: the 4 LTP gain values from the last good speech frame are
+  endlessly regurgitated verbatim in each substitution/muting frame, without
+  any signs of the attenuation I expected to see based on the official "example
+  solution".
+
+* Another surprising bit: the 35-bit fixed codebook sequence in each subframe
+  is taken from the corresponding subframe of the last good speech frame,
+  contrary to the official "example solution" that takes these bits from the
+  errored frames.
+
+* The four fixed codebook gain parameters in the emitted substitution/muting
+  frames differ from one frame to the next in the case of multiple BFI frames
+  in a row, and they also differ between subframes in the same frame - hence
+  these parameters are clearly being regenerated as output progresses. However,
+  the quantization algorithm for this parameter is so complex that I haven't
+  been able to make a more intelligent analysis yet.
+
+Looking at the first good speech frame that follows each BFI substitution/muting
+insert, we see that it is mostly unaltered: no alterations were seen to LPC or
+LTP parameters, in particular. However, in the case of the fixed codebook gain
+parameter we see a different behavioral pattern: most of the time it is also
+unaltered, but sometimes we see reduction in this parameter, and even then it
+is only in certain subframes. Are we perhaps seeing a capping of the fixed
+codebook gain in the first good frame following BFI, similar to that implemented
+in the reference endpoint decoder? A better understanding of the quantization
+mechanism for this parameter will be needed.
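
Illustrative sketch (not part of the changeset): the copy-only parts of the BFI substitution behaviour observed above can be written down roughly as follows. This is a minimal sketch, not Nokia's algorithm and not libgsmefr code. The struct layout, the function name efr_bfi_substitute and the idea of holding the LTP lag as an already-decoded integer are all assumptions made for illustration. The LPC parameters and the fixed codebook gains are passed in by the caller, because the article has not yet worked out how the TRAU regenerates them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EFR_NUM_LPC    5   /* LPC parameters per frame */
#define EFR_NUM_SUBFR  4   /* subframes per frame */
#define EFR_NUM_PULSE 10   /* fixed codebook parameters per subframe (35 bits) */

/* Hypothetical parameter-level view of one EFR frame; not a libgsmefr type. */
struct efr_subframe {
    int16_t ltp_lag_int;            /* integer part of LTP lag, assumed decoded */
    int16_t ltp_gain;               /* quantized LTP gain index */
    int16_t pulses[EFR_NUM_PULSE];  /* fixed codebook pulse parameters */
    int16_t fcb_gain;               /* quantized fixed codebook gain index */
};

struct efr_frame {
    int16_t lpc[EFR_NUM_LPC];
    struct efr_subframe sub[EFR_NUM_SUBFR];
};

/*
 * Build one substitution/muting frame following the observations above.
 * new_lpc and new_fcb_gain are caller-supplied (hypothetical inputs): how
 * they should be regenerated is still an open question in the article.
 * 'out' must not alias 'last_good'.
 */
void efr_bfi_substitute(const struct efr_frame *last_good,
                        const int16_t new_lpc[EFR_NUM_LPC],
                        const int16_t new_fcb_gain[EFR_NUM_SUBFR],
                        struct efr_frame *out)
{
    int i;

    assert(last_good && new_lpc && new_fcb_gain && out);
    assert(out != last_good);

    /* LPC parameters differ in every substitution frame: take them as given. */
    memcpy(out->lpc, new_lpc, sizeof out->lpc);

    for (i = 0; i < EFR_NUM_SUBFR; i++) {
        /* LTP lag: integer lag of the 4th subframe of the last good frame. */
        out->sub[i].ltp_lag_int = last_good->sub[EFR_NUM_SUBFR - 1].ltp_lag_int;
        /* LTP gain: repeated verbatim per subframe, with no attenuation. */
        out->sub[i].ltp_gain = last_good->sub[i].ltp_gain;
        /* Fixed codebook bits: copied from the corresponding subframe. */
        memcpy(out->sub[i].pulses, last_good->sub[i].pulses,
               sizeof out->sub[i].pulses);
        /* Fixed codebook gain: regenerated elsewhere, supplied by caller. */
        out->sub[i].fcb_gain = new_fcb_gain[i];
    }
}
```

A real implementation would additionally have to re-encode the integer lag into the lag parameter format of each subframe and run the LPC and gain quantizers; those steps are deliberately left out here because the sketch only covers what the observations state directly.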