gsm-net-reveng: doc/TFO-xform/EFR comparison

comparison doc/TFO-xform/EFR @ 37:4ab7cc414ed2

doc/TFO-xform/EFR: document CN insertion

author	Mychaela Falconia <falcon@freecalypso.org>
date	Tue, 03 Sep 2024 21:20:47 +0000
parents	d9553c7ac6ea
children


-:d9553c7ac6ea
+:4ab7cc414ed2
 * The 5 LPC parameters are different in each generated substitution/muting
 frame, hence it looks like the TFO transform is running the quantization
 algorithm for each output frame to produce LPC parameters that aim for the
 substitution/muting LSFs of the official "example solution".
+If the series of BFI inputs continues for a while, the emitted LPC parameters
+settle into an oscillating pattern that alternates between two sets of
+numbers.
 * LTP lag parameters remain constant for each run of BFIs between good speech
 frames; the lag value encoded therein matches the LTP lag (integer part only)
 from the 4th subframe of the last good speech frame, just like in the official
 endpoint decoder.
 in a row, and they also differ between subframes in the same frame - hence
 these parameters are clearly being regenerated as output progresses.  However,
 the quantization algorithm for this parameter is so complex that I haven't
 been able to make a more intelligent analysis yet.
+If the series of BFI inputs continues for a while, the emitted fixed codebook
+gain parameters slowly go down and eventually become all zeros - although the
+exact meaning is still unclear given the highly non-intuitive quantization
+algorithm.
 Looking at the first good speech frame that follows each BFI substitution/muting
 insert, we see that it is mostly unaltered: no alterations were seen to LPC or
 LTP parameters, in particular.  However, in the case of the fixed codebook gain
 parameter we see a different behavioral pattern: most of the time it is also
 unaltered, but sometimes we see reduction in this parameter, and even then it
 is only in certain subframes.  Are we perhaps seeing a capping of the fixed
 codebook gain in the first good frame following BFI, similar to that implemented
 in the reference endpoint decoder?  A better understanding of the quantization
 mechanism for this parameter will be needed.
+CN insertion by TFO transform
+=============================
+Looking at the DL speech frames that were synthesized by the TRAU in those
+frame positions where the incoming UL stream via TFO had DTXu pauses (valid SID
+frames followed by BFIs), we can make the following observations:
+* The 5 LPC parameters appear to be generated anew on each output frame just
+like in the substitution/muting case, and it likewise appears that the TFO
+transform is running the regular LSF quantization algorithm taken from the
+encoder.
+* The 4 LTP lag parameters are set to {135, 33, 135, 33} in each generated CN
+frame, in agreement with how the official endpoint decoder sets the pitch
+delay to the constant value 40.
+* The 4 LTP gain parameters are all set to 0, also in agreement with CN
+generation in the official endpoint decoder.
+* The 35-bit fixed codebook part of each subframe appears to be set to a
+pseudorandom sequence, different in each emitted frame and subframe.  My
+analysis tells me it should be possible to construct fixed codebook sequences
+in "speech" output frames that would produce the same excitation as the
+official bit-exact CN - although the final PCM output probably won't match
+the official bit-exact CN because of LSF and fixed codebook gain
+requantization.  However, we won't know whether or not the output from
+Nokia's TFO transform matches our idea of official-CN-matching fixed codebook
+excitation until we have our own implementation of this idea and compare
+the two.
+* The four fixed codebook gain parameters in the emitted CN frames are once
+again too difficult to interpret for now - but they are definitely being
+recomputed for each emitted CN frame and subframe.
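
The {135, 33, 135, 33} observation above can be cross-checked with a short
sketch.  It assumes the usual EFR lag coding (absolute 9-bit code with 1/6
resolution in subframes 1 and 3, relative 6-bit code in subframes 2 and 4)
and an integer lag range of 18..143; these formulas and all function names
below are illustrative assumptions and should be verified against GSM 06.60
before being relied upon.

```python
PIT_MIN = 18    # assumed minimum integer pitch lag in EFR
PIT_MAX = 143   # assumed maximum integer pitch lag in EFR


def encode_lag_abs(t0, frac=0):
    """Absolute lag code for subframes 1 and 3 (assumed formula)."""
    if t0 <= 94:
        # 1/6 sample resolution for short lags
        return t0 * 6 - 105 + frac
    # integer resolution only for lags above 94
    return t0 + 368


def encode_lag_rel(t0, prev_t0, frac=0):
    """Relative lag code for subframes 2 and 4 (assumed formula).

    The search window starts 5 below the previous subframe's integer lag
    and spans 10 integer lags, clamped to the allowed lag range.
    """
    t0_min = max(prev_t0 - 5, PIT_MIN)
    if t0_min + 9 > PIT_MAX:
        t0_min = PIT_MAX - 9
    return (t0 - t0_min) * 6 + 3 + frac


def cn_lag_params(t0=40):
    """Lag codes for a frame whose pitch delay is constant in all subframes."""
    return [
        encode_lag_abs(t0),
        encode_lag_rel(t0, t0),
        encode_lag_abs(t0),
        encode_lag_rel(t0, t0),
    ]


print(cn_lag_params())   # [135, 33, 135, 33]
```

Under these assumptions a constant pitch delay of 40 with zero fraction
encodes to exactly the four values seen in the emitted CN frames.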
+If CN muting kicks in on the second lost SID (BFI instead of SID received in
+TAF position), we see the following additional behaviour:
+* On the TAF-position frame that initiates CN muting, the emitted LPC parameters
+break out of the alternating pattern they previously settled into.  They go
+through a few unique number sets, then settle into a two-state oscillating
+pattern once again.  Is the TFO transform perhaps making a switch from
+last-SID LSF numbers to the static "mean" ones when it goes into CN muting?
+* The emitted fixed codebook gain parameters start going down and eventually
+become all zeros.
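
The two patterns described above (LPC parameters settling into a two-state
oscillation, fixed codebook gains decaying to all zeros) can be located in a
capture automatically.  The following is only a sketch of such a helper: it
assumes each emitted frame has already been unpacked into a tuple of LPC
parameters and a tuple of four gain parameters, and both function names are
hypothetical.

```python
def find_two_state_onset(param_sets):
    """Return the index of the first frame from which the sequence strictly
    alternates between two distinct parameter sets through to the end of
    the capture, or None if the tail does not alternate."""
    onset = None
    i = len(param_sets) - 3
    # Walk backwards for as long as frame i repeats two frames later
    # and differs from its immediate successor.
    while (i >= 0
           and param_sets[i] == param_sets[i + 2]
           and param_sets[i] != param_sets[i + 1]):
        onset = i
        i -= 1
    return onset


def first_all_zero_gains(gain_sets):
    """Return the index of the first frame whose gain parameters are all
    zero, or None if no such frame exists."""
    for idx, gains in enumerate(gain_sets):
        if all(g == 0 for g in gains):
            return idx
    return None


# Made-up numbers purely for illustration, not taken from any capture.
lpc = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 9), (7, 8), (9, 9), (7, 8)]
gains = [(12, 11, 10, 9), (5, 4, 3, 2), (1, 0, 0, 0), (0, 0, 0, 0)]
print(find_two_state_onset(lpc))     # 3
print(first_all_zero_gains(gains))   # 3
```

Running the onset finder separately on the frames before and after the
TAF-position frame that initiates CN muting would show whether the two
oscillating pairs are the same or different.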
+Looking at the first good speech frame that follows each CN insertion period,
+we see only two alterations made by the TFO transform: the 5 LPC parameters and
+the first subframe fixed codebook gain parameter are modified, presumably to
+compensate for the lack of quantizer state reset that happens when the end
+decoder has seen a CN insert.  No more speech parameter alterations are seen
+past the first subframe of the first frame following the DTXu pause.
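
Observations of this kind (which parameters the TFO transform altered in a
given frame) can be produced mechanically by comparing the input and output
frames parameter by parameter.  The sketch below assumes each EFR frame has
been unpacked into 57 parameters in the order used by the reference codec:
5 LPC parameters followed by 4 subframes of 13 parameters each (LTP lag,
LTP gain, 10 fixed codebook pulse parameters, fixed codebook gain).  The
function names are hypothetical.

```python
EFR_NUM_PARAMS = 57       # 5 LPC + 4 subframes * 13 parameters
PARAMS_PER_SUBFRAME = 13


def param_group(idx):
    """Map a parameter index (0..56) to a human-readable group name."""
    if idx < 5:
        return "LPC"
    subframe, pos = divmod(idx - 5, PARAMS_PER_SUBFRAME)
    if pos == 0:
        name = "LTP lag"
    elif pos == 1:
        name = "LTP gain"
    elif pos == 12:
        name = "fixed codebook gain"
    else:
        name = "fixed codebook pulses"
    return f"{name} (subframe {subframe + 1})"


def altered_groups(frame_in, frame_out):
    """List the parameter groups that differ between two unpacked frames."""
    if len(frame_in) != EFR_NUM_PARAMS or len(frame_out) != EFR_NUM_PARAMS:
        raise ValueError("expected 57 parameters per EFR frame")
    groups = []
    for idx, (a, b) in enumerate(zip(frame_in, frame_out)):
        if a != b:
            group = param_group(idx)
            if group not in groups:
                groups.append(group)
    return groups


# Made-up frames: LPC and first-subframe fixed codebook gain modified.
frame_in = [0] * EFR_NUM_PARAMS
frame_out = list(frame_in)
frame_out[0] = 1
frame_out[3] = 7
frame_out[17] = 2
print(altered_groups(frame_in, frame_out))
```

For the made-up frames above the helper reports "LPC" and "fixed codebook
gain (subframe 1)", which is the same shape of result as the observation
for the first good speech frame after a CN insertion period.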
