# HG changeset patch
# User Mychaela Falconia
# Date 1725398447 0
# Node ID 4ab7cc414ed21ad795a59acee7a8df2f075d9cc8
# Parent  d9553c7ac6eaab087963a3d801d0cf151e5765c1
doc/TFO-xform/EFR: document CN insertion

diff -r d9553c7ac6ea -r 4ab7cc414ed2 doc/TFO-xform/EFR
--- a/doc/TFO-xform/EFR	Tue Sep 03 07:08:24 2024 +0000
+++ b/doc/TFO-xform/EFR	Tue Sep 03 21:20:47 2024 +0000
@@ -44,6 +44,10 @@
   algorithm for each output frame to produce LPC parameters that aim for the
   substitution/muting LSFs of the official "example solution".
 
+  If the series of BFI inputs continues for a while, the emitted LPC parameters
+  settle into an oscillating pattern that alternates between two sets of
+  numbers.
+
 * LTP lag parameters remain constant for each run of BFIs between good speech
   frames; the lag value encoded therein matches the LTP lag (integer part only)
   from the 4th subframe of the last good speech frame, just like in the official
@@ -66,6 +70,11 @@
   the quantization algorithm for this parameter is so complex that I haven't
   been able to make a more intelligent analysis yet.
 
+  If the series of BFI inputs continues for a while, the emitted fixed codebook
+  gain parameters slowly go down and eventually become all zeros - although the
+  exact meaning is still unclear given the highly non-intuitive quantization
+  algorithm.
+
 Looking at the first good speech frame that follows each BFI substitution/muting
 insert, we see that it is mostly unaltered: no alterations were seen to LPC or
 LTP parameters, in particular. However, in the case of the fixed codebook gain
@@ -75,3 +84,56 @@
 codebook gain in the first good frame following BFI, similar to that implemented
 in the reference endpoint decoder? A better understanding of the quantization
 mechanism for this parameter will be needed.
+
+CN insertion by TFO transform
+=============================
+
+Looking at the DL speech frames that were synthesized by the TRAU in those
+frame positions where the incoming UL stream via TFO had DTXu pauses (valid SID
+frames followed by BFIs), we can make the following observations:
+
+* The 5 LPC parameters appear to be generated anew on each output frame just
+  like in the substitution/muting case, and it likewise appears that the TFO
+  transform is running the regular LSF quantization algorithm taken from the
+  encoder.
+
+* The 4 LTP lag parameters are set to {135, 33, 135, 33} in each generated CN
+  frame, in agreement with how the official endpoint decoder sets the pitch
+  delay to constant value 40.
+
+* The 4 LTP gain parameters are all set to 0, also in agreement with CN
+  generation in the official endpoint decoder.
+
+* The 35-bit fixed codebook part of each subframe appears to be set to a
+  pseudorandom sequence, different in each emitted frame and subframe. My
+  analysis tells me it should be possible to construct fixed codebook sequences
+  in "speech" output frames that would produce the same excitation as the
+  official bit-exact CN - although the final PCM output probably won't match
+  the official bit-exact CN because of LSF and fixed codebook gain
+  requantization. However, we won't know whether or not the output from
+  Nokia's TFO transform matches our idea of official-CN-matching fixed codebook
+  excitation until we have our own implementation of this idea and compare
+  the two.
+
+* The four fixed codebook gain parameters in the emitted CN frames are once
+  again too difficult to understand for now - but they are definitely being
+  recomputed anew for each emitted CN frame and subframe.
+
+If CN muting kicks in on the second lost SID (BFI instead of SID received in
+TAF position), we see the following additional behaviour:
+
+* On the TAF-position frame that initiates CN muting, the emitted LPC parameters
+  break out of the alternating pattern they previously settled into. They go
+  through a few unique number sets, then settle into a two-state oscillating
+  pattern once again. Is the TFO transform perhaps making a switch from
+  last-SID LSF numbers to the static "mean" ones when it goes into CN muting?
+
+* The emitted fixed codebook gain parameters start going down and eventually
+  become all zeros.
+
+Looking at the first good speech frame that follows each CN insertion period,
+we see only two alterations made by the TFO transform: the 5 LPC parameters and
+the first subframe fixed codebook gain parameter are modified, presumably to
+compensate for the lack of quantizer state reset that happens when the end
+decoder has seen a CN insert. No more speech parameter alterations are seen
+past the first subframe of the first frame following the DTXu pause.
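
Reviewer's note, not part of the changeset: the observed LTP lag parameters
{135, 33, 135, 33} can be checked against the constant pitch delay of 40 with
a few lines of Python. This is only a sketch. The index formulas and the
18..143 lag range are written from memory of the lag encoding in the GSM 06.60
reference encoder and should be verified against the spec; the function names
are made up for this illustration.

```python
# Sketch of EFR LTP lag index encoding (1/6-sample resolution).
# ASSUMPTION: formulas and range recalled from the GSM 06.60 reference
# encoder; all names below are hypothetical, chosen for this example.

PIT_MIN, PIT_MAX = 18, 143  # assumed integer pitch lag range


def encode_abs_lag(t0, frac=0):
    """9-bit absolute lag index used in subframes 1 and 3."""
    if t0 <= 94:
        return t0 * 6 - 105 + frac  # region with fractional resolution
    return t0 + 368                 # integer-only region


def encode_rel_lag(t0, prev_t0, frac=0):
    """6-bit lag index for subframes 2 and 4, relative to the previous lag."""
    t0_min = max(prev_t0 - 5, PIT_MIN)
    if t0_min + 9 > PIT_MAX:
        t0_min = PIT_MAX - 9
    return (t0 - t0_min) * 6 + 3 + frac


def cn_lag_params(t0=40):
    """Lag parameters of a frame whose four subframes all use integer lag t0."""
    return [encode_abs_lag(t0), encode_rel_lag(t0, t0),
            encode_abs_lag(t0), encode_rel_lag(t0, t0)]


print(cn_lag_params(40))
```

Under these assumptions, an integer lag of 40 with zero fraction encodes to
135 in the absolute-coded subframes and 33 in the relative-coded ones, which
agrees with the values seen in the emitted CN frames.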