diff doc/TFO-xform/EFR @ 36:d9553c7ac6ea

doc/TFO-xform/EFR: beginning of article
author Mychaela Falconia <falcon@freecalypso.org>
date Tue, 03 Sep 2024 07:08:24 +0000
parents
children 4ab7cc414ed2
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/TFO-xform/EFR	Tue Sep 03 07:08:24 2024 +0000
@@ -0,0 +1,77 @@
+TFO transform for EFR
+=====================
+
+Unlike the situation with FRv1 and HRv1, the standard endpoint decoder for EFR
+provides no help for implementing a TFO transform.  The reference EFR decoder
+source from ETSI includes bad frame handling and Rx DTX functions, but the logic
+that implements these functions is interwoven throughout the body of the decoder
+and does not form a separable front-end.  Most saliently, this Rx DTX and ECU
+logic in the reference decoder does not operate on coded parameters as would be
+needed for a TFO transform; instead it operates on linear values deeper in the
+decoder after parameter dequantization.
+
+Given that Abis is a de facto proprietary interface that is not interoperable
+between different vendors (and the same holds for Ater in those BSS designs
+that separate the TRAU from the BSC), and given how daunting it seems to
+implement a true TFO transform for EFR, prior to getting our Nokia TCSM2 lab
+setup I was wondering if historical TRAU vendors really did implement this
+TFO transform, or if perhaps they used some kind of "cheating" trick on their
+Abis similar to what we did in OsmoBTS in mid-2023.  However, once I got our
+Nokia TCSM2 gear working, set up a TFO connection between two active TRAU
+channels in EFR mode and passed some test sequences through it, it became clear
+that Nokia did implement a real "honest-to-god" TFO transform for EFR: the
+TRAU-DL frame stream is 100% valid "speech" frames (no idle frames or other
+aberrations inserted) even when the TRAU-UL stream fed via TFO contains BFI
+speech frames and DTXu pauses - the TRAU really does apply bad frame handling
+and comfort noise insertion at the parameter level.
+
+Seeing that at least one major historical vendor did implement a TFO transform
+for EFR, and seeing the output from that transform, has set up a sporting
+challenge for me: I no longer have a valid excuse not to do it.  I now want to
+produce a FOSS implementation of a TFO transform for EFR in Themyscira
+libraries (probably in libgsmefr), and make it no worse than Nokia's
+implementation in TCSM2.
+
+Bad frame handling in speech mode
+=================================
+
+Looking at the DL speech frames that were synthesized by the TRAU in those
+frame positions where the incoming UL stream via TFO had BFIs, we can make the
+following observations:
+
+* The 5 LPC parameters are different in each generated substitution/muting
+  frame, hence it looks like the TFO transform is running the quantization
+  algorithm for each output frame to produce LPC parameters that aim for the
+  substitution/muting LSFs of the official "example solution".
+
+* LTP lag parameters remain constant for each run of BFIs between good speech
+  frames; the lag value encoded therein matches the LTP lag (integer part only)
+  from the 4th subframe of the last good speech frame, just like in the official
+  endpoint decoder.
+
+* Surprising bit: the 4 LTP gain values from the last good speech frame are
+  endlessly regurgitated verbatim in each substitution/muting frame, without
+  any signs of the attenuation I expected to see based on the official "example
+  solution".
+
+* Another surprising bit: the 35-bit fixed codebook sequence in each subframe
+  is taken from the corresponding subframe of the last good speech frame,
+  contrary to the official "example solution" that takes these bits from the
+  errored frames.
+
+* The four fixed codebook gain parameters in the emitted substitution/muting
+  frames differ from one frame to the next in the case of multiple BFI frames
+  in a row, and they also differ between subframes in the same frame - hence
+  these parameters are clearly being regenerated as output progresses.  However,
+  the quantization algorithm for this parameter is so complex that I haven't
+  been able to make a more intelligent analysis yet.
+
+Looking at the first good speech frame that follows each BFI substitution/muting
+insert, we see that it is mostly unaltered: no alterations were seen to LPC or
+LTP parameters, in particular.  However, in the case of the fixed codebook gain
+parameter we see a different behavioral pattern: most of the time it is also
+unaltered, but sometimes we see a reduction in this parameter, and even then it
+is only in certain subframes.  Are we perhaps seeing a capping of the fixed
+codebook gain in the first good frame following BFI, similar to that implemented
+in the reference endpoint decoder?  A better understanding of the quantization
+mechanism for this parameter will be needed.
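
The per-parameter observations about the BFI substitution/muting frames can be
summarized as a minimal C sketch.  Everything named here (struct efr_frame,
efr_bfi_substitute() and its arguments) is hypothetical and is not an existing
libgsmefr API.  The sketch only records which of the 57 coded parameters appear
to be copied from the last good frame and which appear to be regenerated; the
regenerated values (LPC parameters, the held LTP lag in coded form and the
fixed codebook gains) are supplied by the caller, because the algorithm that
produces them in the TCSM2 has not been worked out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EFR_NUM_LPC     5   /* LPC (LSF quantizer) parameters per frame */
#define EFR_NUM_SUBFR   4   /* subframes per 20 ms frame */
#define EFR_NUM_PULSES 10   /* fixed codebook parameters, 35 bits per subframe */

/* Hypothetical container for the coded parameters of one subframe. */
struct efr_subframe {
    int16_t ltp_lag;                 /* coded LTP lag */
    int16_t ltp_gain;                /* coded LTP gain */
    int16_t pulses[EFR_NUM_PULSES];  /* coded fixed codebook sequence */
    int16_t fcb_gain;                /* coded fixed codebook gain */
};

/* Hypothetical container for one EFR frame: 5 + 4 * 13 = 57 parameters. */
struct efr_frame {
    int16_t lpc[EFR_NUM_LPC];
    struct efr_subframe sf[EFR_NUM_SUBFR];
};

/*
 * Assemble one substitution/muting frame following the observed pattern.
 * new_lpc, held_lag and new_fcb_gain are assumed to come from logic that
 * is outside the scope of this sketch.  out must not alias last_good.
 */
void efr_bfi_substitute(const struct efr_frame *last_good,
                        const int16_t new_lpc[EFR_NUM_LPC],
                        const int16_t held_lag[EFR_NUM_SUBFR],
                        const int16_t new_fcb_gain[EFR_NUM_SUBFR],
                        struct efr_frame *out)
{
    int i;

    assert(last_good && new_lpc && held_lag && new_fcb_gain && out);
    assert(out != last_good);

    /* LPC parameters differ in every substitution frame: regenerated. */
    memcpy(out->lpc, new_lpc, sizeof out->lpc);

    for (i = 0; i < EFR_NUM_SUBFR; i++) {
        /* LTP lag stays constant for the whole run of BFIs. */
        out->sf[i].ltp_lag = held_lag[i];
        /* LTP gain is repeated verbatim, with no attenuation observed. */
        out->sf[i].ltp_gain = last_good->sf[i].ltp_gain;
        /* Fixed codebook bits come from the same subframe of the
           last good frame, not from the errored frame. */
        memcpy(out->sf[i].pulses, last_good->sf[i].pulses,
               sizeof out->sf[i].pulses);
        /* Fixed codebook gain changes per subframe: regenerated. */
        out->sf[i].fcb_gain = new_fcb_gain[i];
    }
}
```

A caller would keep a copy of the last good frame and invoke this function once
for every frame position marked with BFI.  Splitting the inputs this way keeps
the parts that are understood (verbatim copies) apart from the parts that still
need analysis.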