diff doc/TFO-xform/EFR @ 37:4ab7cc414ed2

doc/TFO-xform/EFR: document CN insertion
author Mychaela Falconia <falcon@freecalypso.org>
date Tue, 03 Sep 2024 21:20:47 +0000
parents d9553c7ac6ea
--- a/doc/TFO-xform/EFR	Tue Sep 03 07:08:24 2024 +0000
+++ b/doc/TFO-xform/EFR	Tue Sep 03 21:20:47 2024 +0000
@@ -44,6 +44,10 @@
   algorithm for each output frame to produce LPC parameters that aim for the
   substitution/muting LSFs of the official "example solution".
 
+  If the series of BFI inputs continues for a while, the emitted LPC parameters
+  settle into an oscillating pattern that alternates between two sets of
+  numbers.
+
 * LTP lag parameters remain constant for each run of BFIs between good speech
   frames; the lag value encoded therein matches the LTP lag (integer part only)
   from the 4th subframe of the last good speech frame, just like in the official
@@ -66,6 +70,11 @@
   the quantization algorithm for this parameter is so complex that I haven't
   been able to make a more intelligent analysis yet.
 
+  If the series of BFI inputs continues for a while, the emitted fixed codebook
+  gain parameters slowly go down and eventually become all zeros - although the
+  exact meaning is still unclear given the highly non-intuitive quantization
+  algorithm.
+
 Looking at the first good speech frame that follows each BFI substitution/muting
 insert, we see that it is mostly unaltered: no alterations were seen to LPC or
 LTP parameters, in particular.  However, in the case of the fixed codebook gain
@@ -75,3 +84,56 @@
 codebook gain in the first good frame following BFI, similar to that implemented
 in the reference endpoint decoder?  A better understanding of the quantization
 mechanism for this parameter will be needed.
+
+CN insertion by TFO transform
+=============================
+
+Looking at the DL speech frames that were synthesized by the TRAU in those
+frame positions where the incoming UL stream via TFO had DTXu pauses (valid SID
+frames followed by BFIs), we can make the following observations:
+
+* The 5 LPC parameters appear to be generated anew on each output frame just
+  like in the substitution/muting case, and it likewise appears that the TFO
+  transform is running the regular LSF quantization algorithm taken from the
+  encoder.
+
+* The 4 LTP lag parameters are set to {135, 33, 135, 33} in each generated CN
+  frame, in agreement with how the official endpoint decoder sets the pitch
+  delay to a constant value of 40.
+
+* The 4 LTP gain parameters are all set to 0, also in agreement with CN
+  generation in the official endpoint decoder.
+
+* The 35-bit fixed codebook part of each subframe appears to be set to a
+  pseudorandom sequence, different in each emitted frame and subframe.  My
+  analysis tells me it should be possible to construct fixed codebook sequences
+  in "speech" output frames that would produce the same excitation as the
+  official bit-exact CN - although the final PCM output probably won't match
+  the official bit-exact CN because of LSF and fixed codebook gain
+  requantization.  However, we won't know whether or not the output from
+  Nokia's TFO transform matches our idea of official-CN-matching fixed codebook
+  excitation until we have our own implementation of this idea and compare
+  the two.
+
+* The four fixed codebook gain parameters in the emitted CN frames are once
+  again too difficult to understand for now - but they are definitely being
+  computed anew for each emitted CN frame and subframe.
+
+If CN muting kicks in on the second lost SID (BFI instead of SID received in
+TAF position), we see the following additional behaviour:
+
+* On the TAF-position frame that initiates CN muting, the emitted LPC parameters
+  break out of the alternating pattern they previously settled into.  They go
+  through a few unique number sets, then settle into a two-state oscillating
+  pattern once again.  Is the TFO transform perhaps making a switch from
+  last-SID LSF numbers to the static "mean" ones when it goes into CN muting?
+
+* The emitted fixed codebook gain parameters start going down and eventually
+  become all zeros.
+
+Looking at the first good speech frame that follows each CN insertion period,
+we see only two alterations made by the TFO transform: the 5 LPC parameters and
+the first subframe fixed codebook gain parameter are modified, presumably to
+compensate for the lack of quantizer state reset that happens when the end
+decoder has seen a CN insert.  No more speech parameter alterations are seen
+past the first subframe of the first frame following the DTXu pause.
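Reviewer note on the LTP lag observation in the CN insertion section: the
values {135, 33, 135, 33} can be checked against a pitch delay of 40 with a
small sketch of the EFR lag index encoding.  The formulas below are written
from my recollection of the lag encoding in the reference encoder (absolute
9-bit coding in subframes 1 and 3, relative 6-bit coding in subframes 2 and 4)
and should be verified against the GSM 06.60 source before being relied upon.
All function names are hypothetical, chosen only for this illustration.

```python
PIT_MIN = 18    # smallest integer pitch lag (assumed from the EFR encoder)
PIT_MAX = 143   # largest integer pitch lag (assumed from the EFR encoder)


def encode_lag_abs(t0, frac=0):
    """Index for subframes 1 and 3: absolute coding of the pitch lag."""
    if t0 <= 94:
        # 1/6 sample resolution for lags up to 94
        return t0 * 6 - 105 + frac
    # integer-only resolution for lags 95..143
    return t0 + 368


def relative_window_min(t0):
    """Lower bound T0_min of the window used for subframes 2 and 4."""
    t0_min = max(t0 - 5, PIT_MIN)
    if t0_min + 9 > PIT_MAX:
        t0_min = PIT_MAX - 9
    return t0_min


def encode_lag_rel(t0, t0_min, frac=0):
    """Index for subframes 2 and 4: lag coded relative to T0_min."""
    return 6 * (t0 - t0_min) + 3 + frac


def cn_lag_params(t0=40):
    """The 4 LTP lag parameters of a frame whose lag is t0 in every subframe."""
    params = []
    for _ in range(2):
        params.append(encode_lag_abs(t0))
        params.append(encode_lag_rel(t0, relative_window_min(t0)))
    return params


print(cn_lag_params())   # [135, 33, 135, 33]
```

Under these assumptions an integer lag of 40 with zero fraction encodes to 135
in subframes 1 and 3 and to 33 in subframes 2 and 4, which is consistent with
the numbers reported in the document.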