view doc/TFO-xform/EFR @ 56:b32b644b7d96
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Wed, 25 Sep 2024 07:42:04 +0000 |
TFO transform for EFR
=====================

Unlike the situation with FRv1 and HRv1, the standard endpoint decoder for
EFR provides no help for implementing a TFO transform. The reference EFR
decoder source from ETSI includes bad frame handling and Rx DTX functions,
but the logic that implements these functions is interwoven throughout the
body of the decoder and does not form a separable front-end. Most saliently,
this Rx DTX and ECU logic in the reference decoder does not operate on coded
parameters as would be needed for a TFO transform; instead, it operates on
linear values deeper in the decoder, after parameter dequantization.

Given that Abis is a de facto proprietary interface that is not
interoperable between different vendors (and the same holds for Ater in
those BSS designs that separate the TRAU from the BSC), and given how
daunting it seems to implement a true TFO transform for EFR, prior to
getting our Nokia TCSM2 lab setup I was wondering if historical TRAU vendors
really did implement this TFO transform, or if perhaps they used some kind
of "cheating" trick on their Abis similar to what we did in OsmoBTS in
mid-2023.

However, once I got our Nokia TCSM2 gear working, set up a TFO connection
between two active TRAU channels in EFR mode and passed some test sequences
through it, it became clear that Nokia did implement a real "honest-to-god"
TFO transform for EFR: the TRAU-DL frame stream is 100% valid "speech"
frames (no idle frames or other aberrations inserted) even when the TRAU-UL
stream fed via TFO contains BFI speech frames and DTXu pauses - the TRAU
really does apply bad frame handling and comfort noise insertion at the
parameter level.

Seeing that at least one major historical vendor did implement a TFO
transform for EFR, and seeing the output from that transform, has set up a
sporting challenge for me: I no longer have a valid excuse to not do it.
I now have a desire to produce a FOSS implementation of TFO transform for
EFR in Themyscira libraries (probably in libgsmefr), and make it no worse
than Nokia's implementation in TCSM2.

Bad frame handling in speech mode
=================================

Looking at the DL speech frames that were synthesized by the TRAU in those
frame positions where the incoming UL stream via TFO had BFIs, we can make
the following observations:

* The 5 LPC parameters are different in each generated substitution/muting
  frame, hence it looks like the TFO transform is running the quantization
  algorithm for each output frame to produce LPC parameters that aim for the
  substitution/muting LSFs of the official "example solution". If the series
  of BFI inputs continues for a while, the emitted LPC parameters settle into
  an oscillating pattern that alternates between two sets of numbers.

* LTP lag parameters remain constant for each run of BFIs between good
  speech frames; the lag value encoded therein matches the LTP lag (integer
  part only) from the 4th subframe of the last good speech frame, just like
  in the official endpoint decoder.

* Surprising bit: the 4 LTP gain values from the last good speech frame are
  endlessly regurgitated verbatim in each substitution/muting frame, without
  any signs of the attenuation I expected to see based on the official
  "example solution".

* Another surprising bit: the 35-bit fixed codebook sequence in each
  subframe is taken from the corresponding subframe of the last good speech
  frame, contrary to the official "example solution" that takes these bits
  from the errored frames.

* The four fixed codebook gain parameters in the emitted substitution/muting
  frames differ from one frame to the next in the case of multiple BFI
  frames in a row, and they also differ between subframes in the same frame -
  hence these parameters are clearly being regenerated as output progresses.
  However, the quantization algorithm for this parameter is so complex that
  I haven't been able to make a more intelligent analysis yet. If the series
  of BFI inputs continues for a while, the emitted fixed codebook gain
  parameters slowly go down and eventually become all zeros - although the
  exact meaning is still unclear given the highly non-intuitive quantization
  algorithm.

Looking at the first good speech frame that follows each BFI
substitution/muting insert, we see that it is mostly unaltered: no
alterations were seen to LPC or LTP parameters, in particular. However, in
the case of the fixed codebook gain parameter we see a different behavioral
pattern: most of the time it is also unaltered, but sometimes we see
reduction in this parameter, and even then it is only in certain subframes.
Are we perhaps seeing a capping of the fixed codebook gain in the first good
frame following BFI, similar to that implemented in the reference endpoint
decoder? A better understanding of the quantization mechanism for this
parameter will be needed.

CN insertion by TFO transform
=============================

Looking at the DL speech frames that were synthesized by the TRAU in those
frame positions where the incoming UL stream via TFO had DTXu pauses (valid
SID frames followed by BFIs), we can make the following observations:

* The 5 LPC parameters appear to be generated anew on each output frame just
  like in the substitution/muting case, and it likewise appears that the TFO
  transform is running the regular LSF quantization algorithm taken from the
  encoder.

* The 4 LTP lag parameters are set to {135, 33, 135, 33} in each generated
  CN frame, in agreement with how the official endpoint decoder sets the
  pitch delay to constant value 40.

* The 4 LTP gain parameters are all set to 0, also in agreement with CN
  generation in the official endpoint decoder.

* The 35-bit fixed codebook part of each subframe appears to be set to a
  pseudorandom sequence, different in each emitted frame and subframe. My
  analysis tells me it should be possible to construct fixed codebook
  sequences in "speech" output frames that would produce the same excitation
  as the official bit-exact CN - although the final PCM output probably
  won't match the official bit-exact CN because of LSF and fixed codebook
  gain requantization. However, we won't know whether or not the output from
  Nokia's TFO transform matches our idea of official-CN-matching fixed
  codebook excitation until we have our own implementation of this idea and
  compare the two.

* The four fixed codebook gain parameters in the emitted CN frames are once
  again too difficult to understand for now - but they are definitely being
  recomputed anew for each emitted CN frame and subframe.

If CN muting kicks in on the second lost SID (BFI instead of SID received in
TAF position), we see the following additional behaviour:

* On the TAF-position frame that initiates CN muting, the emitted LPC
  parameters break out of the alternating pattern they previously settled
  into. They go through a few unique number sets, then settle into a
  two-state oscillating pattern once again. Is the TFO transform perhaps
  making a switch from last-SID LSF numbers to the static "mean" ones when
  it goes into CN muting?

* The emitted fixed codebook gain parameters start going down and eventually
  become all zeros.

Looking at the first good speech frame that follows each CN insertion
period, we see only two alterations made by the TFO transform: the 5 LPC
parameters and the first subframe fixed codebook gain parameter are
modified, presumably to compensate for the lack of quantizer state reset
that happens when the end decoder has seen a CN insert. No more speech
parameter alterations are seen past the first subframe of the first frame
following the DTXu pause.
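
The LTP lag observations in both sections above can be cross-checked with a
small sketch. It shows how one constant integer lag maps to the four LTP lag
parameters of a frame, and how to recover the 4th subframe integer lag of a
received frame. The index formulas are an assumption, recalled from the lag
coding in the GSM 06.60 reference code, and should be verified against that
source. All function names are hypothetical and are not taken from libgsmefr.
The only confirmation available from the observations above is that lag 40
comes out as {135, 33, 135, 33}.

```python
# Sketch only: the index formulas are an assumption recalled from the
# GSM 06.60 reference coder and need checking against it.

PIT_MIN = 18   # smallest integer LTP lag
PIT_MAX = 143  # largest integer LTP lag


def lag_window(prev_lag):
    """Lowest lag reachable by a relative index that follows prev_lag."""
    t0_min = max(prev_lag - 5, PIT_MIN)
    if t0_min + 9 > PIT_MAX:
        t0_min = PIT_MAX - 9
    return t0_min


def encode_abs(lag):
    """9-bit index for subframes 1 and 3, fraction fixed at 0."""
    return lag * 6 - 105 if lag <= 94 else lag + 368


def encode_rel(lag, prev_lag):
    """6-bit index for subframes 2 and 4, fraction fixed at 0."""
    t0_min = lag_window(prev_lag)
    if not t0_min <= lag <= t0_min + 9:
        raise ValueError("lag outside the relative coding window")
    return 6 * (lag - t0_min) + 3


def decode_abs(index):
    """Integer part of the lag encoded by a subframe 1/3 index."""
    return (index + 5) // 6 + 17 if index < 463 else index - 368


def decode_rel(index, prev_lag):
    """Integer part of the lag encoded by a subframe 2/4 index."""
    return (index + 5) // 6 - 1 + lag_window(prev_lag)


def constant_lag_params(lag):
    """Four LTP lag parameters that hold one integer lag for a whole frame."""
    return [encode_abs(lag), encode_rel(lag, lag)] * 2


def last_subframe_lag(lag_params):
    """Integer lag of the 4th subframe, given a frame's 4 lag parameters."""
    return decode_rel(lag_params[3], decode_abs(lag_params[2]))


print(constant_lag_params(40))  # prints [135, 33, 135, 33]
```

Under these assumptions, a BFI substitution frame would carry
constant_lag_params(last_subframe_lag(p)), where p holds the four LTP lag
parameters of the last good speech frame, and a CN frame would carry
constant_lag_params(40).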