comparison doc/TFO-xform/EFR @ 37:4ab7cc414ed2

doc/TFO-xform/EFR: document CN insertion
author Mychaela Falconia <falcon@freecalypso.org>
date Tue, 03 Sep 2024 21:20:47 +0000
parents d9553c7ac6ea
36:d9553c7ac6ea 37:4ab7cc414ed2
42 * The 5 LPC parameters are different in each generated substitution/muting
43 frame, hence it looks like the TFO transform is running the quantization
44 algorithm for each output frame to produce LPC parameters that aim for the
45 substitution/muting LSFs of the official "example solution".
46
47 If the series of BFI inputs continues for a while, the emitted LPC parameters
48 settle into an oscillating pattern that alternates between two sets of
49 numbers.
50
51 * LTP lag parameters remain constant for each run of BFIs between good speech
52 frames; the lag value encoded therein matches the LTP lag (integer part only)
53 from the 4th subframe of the last good speech frame, just like in the official
54 endpoint decoder.
55
68 in a row, and they also differ between subframes in the same frame - hence
69 these parameters are clearly being regenerated as output progresses. However,
70 the quantization algorithm for this parameter is so complex that I haven't
71 been able to make a more intelligent analysis yet.
72
73 If the series of BFI inputs continues for a while, the emitted fixed codebook
74 gain parameters slowly go down and eventually become all zeros - although the
75 exact meaning is still unclear given the highly non-intuitive quantization
76 algorithm.
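
The two-state oscillation of the emitted LPC parameters noted above can be located mechanically in a captured frame sequence. The following is only a minimal sketch: the function name is hypothetical, each frame is assumed to be available as a tuple of its 5 LPC parameter values, and the sample numbers are made up for illustration, not taken from any capture.

```python
def oscillation_start(frames, period=2):
    """Return the index of the first frame from which the sequence
    alternates with the given period through to its end, or None if
    the sequence does not end in such a pattern."""
    n = len(frames)
    start = n - period
    # Walk backward for as long as each frame equals the one `period` later.
    while start > 0 and frames[start - 1] == frames[start - 1 + period]:
        start -= 1
    if n - start < 2 * period:
        return None  # fewer than two full cycles observed
    if len(set(frames[start:start + period])) < period:
        return None  # constant output, not an alternation
    return start


# Made-up LPC parameter sets, for illustration only.
a = (10, 20, 30, 40, 50)
b = (11, 21, 31, 41, 51)
frames = [(1, 2, 3, 4, 5), (6, 7, 8, 9, 10), (2, 4, 6, 8, 10), a, b, a, b, a]
print(oscillation_start(frames))  # 3
```

The same helper could be run over the LPC parameters of CN muting output to find where the second oscillating pattern begins.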
77
78 Looking at the first good speech frame that follows each BFI substitution/muting
79 insert, we see that it is mostly unaltered: no alterations were seen to LPC or
80 LTP parameters, in particular. However, in the case of the fixed codebook gain
81 parameter we see a different behavioral pattern: most of the time it is also
82 unaltered, but sometimes we see reduction in this parameter, and even then it
83 is only in certain subframes. Are we perhaps seeing a capping of the fixed
84 codebook gain in the first good frame following BFI, similar to that implemented
85 in the reference endpoint decoder? A better understanding of the quantization
86 mechanism for this parameter will be needed.
87
88 CN insertion by TFO transform
89 =============================
90
91 Looking at the DL speech frames that were synthesized by the TRAU in those
92 frame positions where the incoming UL stream via TFO had DTXu pauses (valid SID
93 frames followed by BFIs), we can make the following observations:
94
95 * The 5 LPC parameters appear to be generated anew on each output frame just
96 like in the substitution/muting case, and it likewise appears that the TFO
97 transform is running the regular LSF quantization algorithm taken from the
98 encoder.
99
100 * The 4 LTP lag parameters are set to {135, 33, 135, 33} in each generated CN
101 frame, in agreement with how the official endpoint decoder sets the pitch
102 delay to the constant value 40.
103
104 * The 4 LTP gain parameters are all set to 0, also in agreement with CN
105 generation in the official endpoint decoder.
106
107 * The 35-bit fixed codebook part of each subframe appears to be set to a
108 pseudorandom sequence, different in each emitted frame and subframe. My
109 analysis tells me it should be possible to construct fixed codebook sequences
110 in "speech" output frames that would produce the same excitation as the
111 official bit-exact CN - although the final PCM output probably won't match
112 the official bit-exact CN because of LSF and fixed codebook gain
113 requantization. However, we won't know whether or not the output from
114 Nokia's TFO transform matches our idea of official-CN-matching fixed codebook
115 excitation until we have our own implementation of this idea and compare
116 the two.
117
118 * The four fixed codebook gain parameters in the emitted CN frames are once
119 again too difficult to understand for now - but they are definitely being
120 recomputed anew for each emitted CN frame and subframe.
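
The {135, 33, 135, 33} pattern in the LTP lag bullet above can be checked against the EFR lag coding rules. The sketch below assumes the rules of the reference encoder: an absolute 9-bit index in subframes 1 and 3, and a 6-bit index relative to a search range derived from the preceding subframe's integer lag in subframes 2 and 4. The Python function names are illustrative and do not come from any existing code.

```python
def enc_lag6(t0, t0_frac, t0_min=None):
    """Encode a pitch lag with 1/6 sample resolution.

    t0_min is None for subframes 1 and 3 (absolute coding); otherwise it
    is the lower bound of the range used for relative coding in
    subframes 2 and 4.
    """
    if t0_min is None:
        if t0 <= 94:
            return t0 * 6 - 105 + t0_frac   # fractional-resolution region
        return t0 + 368                     # integer-only region
    return (t0 - t0_min) * 6 + 3 + t0_frac


def relative_range_min(prev_t0, pit_min=18, pit_max=143):
    """Lower bound of the 10-lag range around the previous integer lag."""
    t0_min = max(prev_t0 - 5, pit_min)
    if t0_min + 9 > pit_max:
        t0_min = pit_max - 9
    return t0_min


def cn_lag_params(pitch_delay=40):
    """Four LTP lag parameters for a frame with one constant integer lag."""
    params = []
    for _ in range(2):  # two (absolute, relative) subframe pairs
        params.append(enc_lag6(pitch_delay, 0))
        params.append(enc_lag6(pitch_delay, 0, relative_range_min(pitch_delay)))
    return params


print(cn_lag_params())  # [135, 33, 135, 33]
```

Under these assumptions a constant pitch delay of 40 with zero fractional part encodes to exactly the observed values, which is consistent with the bullet's reading.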
121
122 If CN muting kicks in on the second lost SID (BFI instead of SID received in
123 TAF position), we see the following additional behaviour:
124
125 * On the TAF-position frame that initiates CN muting, the emitted LPC parameters
126 break out of the alternating pattern they previously settled into. They go
127 through a few unique number sets, then settle into a two-state oscillating
128 pattern once again. Is the TFO transform perhaps making a switch from
129 last-SID LSF numbers to the static "mean" ones when it goes into CN muting?
130
131 * The emitted fixed codebook gain parameters start going down and eventually
132 become all zeros.
133
134 Looking at the first good speech frame that follows each CN insertion period,
135 we see only two alterations made by the TFO transform: the 5 LPC parameters and
136 the first subframe fixed codebook gain parameter are modified, presumably to
137 compensate for the lack of quantizer state reset that happens when the end
138 decoder has seen a CN insert. No more speech parameter alterations are seen
139 past the first subframe of the first frame following the DTXu pause.
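
A comparison of this kind, between the frame that entered the TFO transform and the frame that came out, can be sketched as follows. This is a hedged illustration only: it assumes each frame is held as an array of 57 parameters in the order used by the reference code (5 LPC parameters, then for each of the 4 subframes the LTP lag, LTP gain, 10 pulse parameters and fixed codebook gain), the function names are hypothetical, and the sample frames are made up.

```python
EFR_NUM_PARAMS = 57      # 5 LPC + 4 subframes * 13 parameters
PARAMS_PER_SUBFRAME = 13


def param_name(index):
    """Human-readable name for a position in the 57-parameter array."""
    if index < 5:
        return f"LPC{index + 1}"
    subframe, pos = divmod(index - 5, PARAMS_PER_SUBFRAME)
    if pos == 0:
        field = "LTP lag"
    elif pos == 1:
        field = "LTP gain"
    elif pos == 12:
        field = "fixed codebook gain"
    else:
        field = f"pulse {pos - 1}"
    return f"subframe {subframe + 1} {field}"


def altered_params(frame_in, frame_out):
    """List the names of the parameters that differ between two frames."""
    if len(frame_in) != EFR_NUM_PARAMS or len(frame_out) != EFR_NUM_PARAMS:
        raise ValueError("expected 57 parameters per frame")
    return [param_name(i) for i in range(EFR_NUM_PARAMS)
            if frame_in[i] != frame_out[i]]


# Made-up frames: alter the 5 LPC parameters and the first subframe's
# fixed codebook gain, mimicking the pattern described above.
frame_in = [0] * EFR_NUM_PARAMS
frame_out = list(frame_in)
for i in range(5):
    frame_out[i] = 1
frame_out[17] = 1
print(altered_params(frame_in, frame_out))
```

For the pattern described above, the report would list LPC1 through LPC5 and the subframe 1 fixed codebook gain, and nothing else.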