gsm-net-reveng: doc/TFO-xform/HRv1 comparison

comparison doc/TFO-xform/HRv1 @ 35:0979407719f0

doc/TFO-xform/HRv1: article written

author	Mychaela Falconia <falcon@freecalypso.org>
date	Mon, 02 Sep 2024 07:32:09 +0000

-:35d38348c880
+:0979407719f0
+HRv1: relation between regular end decoder and TFO transform
+============================================================
+The reference decoder source published by ETSI in GSM 06.06 exhibits a nearly
+modular design: the Rx DTX handler front end is almost a separable piece.
+Breaking it down more precisely, we can make these observations:
+0) Most aspects of bad frame handling and comfort noise generation are done by
+generating new coded speech parameters, such that the output of those
+algorithms can be packaged into new HRv1 codec frames to be sent to a distant
+decoder.  There are only two exceptions to this modularity:
+1) Handling of unreliable speech frames (BFI=0 UFI=1 in speech rather than CN
+state) has a modular and a non-modular aspect:
+1a) Modular aspect: if the R0 increment from the last good frame to the
+unreliable frame exceeds a certain threshold, UFI is turned into BFI,
+which is then handled in a fully modular fashion.
+1b) Non-modular aspect: if the R0 increment does not meet the threshold for
+turning UFI into BFI but meets another slightly lower threshold, a flag
+is set that is passed into the guts of the speech decoder.  That flag
+effects speech muting on the decoder output level.
+2) GSM 06.22 section 6.2 (Comfort noise generation and updating) says in the
+very last sentence:
+"When updating the comfort noise parameters (frame energy and LPC
+coefficients), these parameters shall be interpolated over the SID update
+period to obtain smooth transitions."
+Note the change in language: the corresponding spec for FRv1 says "should
+preferably", but the HRv1 spec says "shall".  Furthermore, the bit-exact
+implementation in the reference C code is considered normative in this
+respect, and is exercised by the test sequences of GSM 06.07.
+This CN interpolation aspect is non-modular: R0 and the set of LPC
+coefficients are decoded from bit parameters into linear form when CN frames
+(initial and updates) are received, interpolation is done on this linear
+form, and the interpolated values are passed to the main body of the speech
+decoder.
+Based on these observations, we can conclude that if we wish to detach this
+reference Rx DTX handler for HRv1 from the reference decoder and make it into
+an implementation of TFO transform for this codec, we have to solve two
+problems:
+1) Decide how to handle those UFI frames that aren't being turned into BFI;
+2) Decide how to handle R0 and LPC parameters during CN insertion.
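The UFI decision described in items 1a and 1b can be sketched as a small classifier. This is only an illustration, not the GSM 06.06 code: the function name, the enum, and both threshold parameters are hypothetical, the actual threshold values have to be taken from the reference source, and the choice of ">" versus ">=" in the comparisons is an assumption.

```c
#include <assert.h>

/* Possible outcomes for a frame received with BFI=0 UFI=1 in speech state. */
enum ufi_action {
    UFI_TREAT_AS_GOOD,   /* R0 increase too small to matter */
    UFI_MUTE_FLAG,       /* case 1b: muting flag for the decoder internals */
    UFI_UPGRADE_TO_BFI   /* case 1a: handle the frame exactly like BFI=1 */
};

/*
 * r0_prev, r0_cur: 5-bit R0 values (0..31) of the last good frame and of
 * the unreliable frame.
 * bfi_thresh, mute_thresh: HYPOTHETICAL parameters standing in for the
 * two thresholds of the reference code; mute_thresh is the lower one.
 */
enum ufi_action classify_ufi(int r0_prev, int r0_cur,
                             int bfi_thresh, int mute_thresh)
{
    int incr;

    assert(r0_prev >= 0 && r0_prev <= 31);
    assert(r0_cur >= 0 && r0_cur <= 31);
    assert(mute_thresh <= bfi_thresh);

    incr = r0_cur - r0_prev;
    if (incr > bfi_thresh)      /* assumed comparison: strictly exceeds */
        return UFI_UPGRADE_TO_BFI;
    if (incr >= mute_thresh)    /* assumed comparison: meets or exceeds */
        return UFI_MUTE_FLAG;
    return UFI_TREAT_AS_GOOD;
}
```

In a TFO transform only the UFI_UPGRADE_TO_BFI outcome maps onto parameter-level processing; UFI_MUTE_FLAG has no frame-level equivalent, which is problem 1 above.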
+Nokia TCSM2 TRAU implementation
+===============================
+Now that we have a working historical bank-of-TRAUs apparatus in our lab, let's
+take a look at how this vendor (Nokia) implemented the TFO transform for HRv1
+in their TRAU.  Here are our findings:
+* Handling of BFI=1 frames in speech state (not in DTX) exhibits a
+simplification relative to GSM 06.06 reference code.  The reference code
+checks to see if the last saved frame and the received errored frame have the
+same voiced vs unvoiced mode: if this mode matches, codevector parameters are
+taken from the errored frame, otherwise the last saved frame is regurgitated
+without taking any bits from the errored frame.  Nokia's TFO transform always
+does the latter (no bits are taken from the errored frame) regardless of
+whether the voiced vs unvoiced mode matches.
+* Aside from this just-described simplification, all other aspects of BFI=1
+handling for speech frames appear to match the reference code.
+* UFI handling appears to have been taken out altogether: even the part that
+"upgrades" UFI to BFI when the R0 increment is huge appears to have been
+omitted.  I fed a test sequence from the TFO side that has a good speech frame
+with R0=2 followed by a UFI frame with R0=31, and the TRAU happily passed the
+latter frame (now treated as perfectly good) to the DL output.
+* Comfort noise generation (DTXd=0) is done exactly as the reference code would
+do it, except that neither R0 nor LPC parameters are interpolated.  During
+each CN output interval between SID updates, R0 and LPC parameters in every
+emitted CN frame are exactly equal to those received in the most recent SID
+frame, as simple as that.  When a new SID update comes in, the change in
+emitted R0 and LPC is abrupt.
+* The lost SID criterion for CN muting appears to be slightly different between
+Nokia's TFO implementation and my reading of the spec and the reference C
+code.  My interpretation of GSM 06.22 spec sections 5.2.3 and 5.2.4 is that
+unlike FR and EFR, in the case of HR codec the second lost SID (second
+occurrence of BFI instead of SID update in TAF position) does _not_ trigger
+CN muting; instead this muting is supposed to kick in on the _third_ lost SID
+occurrence.  (The difference in the spec was likely motivated by TAF positions
+occurring every 240 ms with HR instead of every 480 ms with FR & EFR.)  My
+reading of the reference C code agrees with my reading of the spec - yet
+Nokia's TFO implementation initiates CN muting in the frame following the
+second lost SID, not the third.
+* Aside from the criterion for its initiation, the actual CN muting logic
+behaves exactly like the reference C code: R0 is decremented by 2 on each
+output frame following the TAF that initiates this sequence, and once R0
+reaches 0, it stays there while this zero-magnitude CN output continues
+indefinitely.
+* With DTXd=1, CN output is replaced with repeated retransmission of the same
+SID whose parameters would have been used for non-interpolated CN with DTXd=0,
+which also agrees with the rules of GSM 08.62 section 8.2.2 paragraph 2.
+* CN muting with DTXd=1 is implemented poorly.  The TRAU emits SID frames with
+R0 decrementing by 2 on each frame, just as it does for generated CN
+output that's in the process of being slowly muted, but this design is a poor
+choice: because the BTS will only transmit one of every 12 SID update frames
+and the TRAU has no way of knowing which SID will be transmitted, a slow
+decrement cadence on SID frames themselves (not on CN output) makes no sense.
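The lost-SID criterion and the R0 ramp-down described in the findings above can be sketched as a tiny state machine. All names here are hypothetical. The mute_after parameter selects between the observed Nokia behaviour (2) and the spec reading given above (3). Two details are assumptions that the findings do not settle: an odd R0 is clamped to 0 rather than going negative, and a valid SID update cancels muting.

```c
#include <assert.h>

struct cn_mute_state {
    int lost_sid_count;  /* consecutive TAF positions with BFI, no SID */
    int muting;          /* nonzero once the muting sequence has begun */
    int r0;              /* R0 currently emitted in CN output, 0..31 */
};

/*
 * Call once at each TAF position.  mute_after is the number of
 * consecutive lost SIDs that triggers muting.
 */
void cn_taf_event(struct cn_mute_state *st, int sid_update_received,
                  int new_r0, int mute_after)
{
    assert(mute_after >= 1);
    if (sid_update_received) {
        assert(new_r0 >= 0 && new_r0 <= 31);
        st->lost_sid_count = 0;
        st->muting = 0;          /* assumption: a fresh SID ends muting */
        st->r0 = new_r0;
        return;
    }
    st->lost_sid_count++;
    if (st->lost_sid_count >= mute_after)
        st->muting = 1;
}

/* Call once per CN output frame; returns the R0 value to emit. */
int cn_next_r0(struct cn_mute_state *st)
{
    if (st->muting) {
        st->r0 -= 2;
        if (st->r0 < 0)          /* assumption: clamp odd values at 0 */
            st->r0 = 0;
    }
    return st->r0;
}
```

With mute_after set to 2, the first output frame after the second lost SID already carries the decremented R0, matching what the TRAU was observed to do.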
+Thoughts for Themyscira implementation
+======================================
+Prior to getting Nokia TCSM2 working in our lab and being able to experiment
+with this TRAU, when I was contemplating the idea of potentially implementing
+TFO transform for HRv1 in Themyscira libraries, my main trepidation was how to
+produce comfort noise in the form of "speech" parameter output.  For endpoint
+decoders GSM 06.22 prescribes a bit-exact algorithm with interpolation, but
+that smoothly interpolated CN cannot be readily expressed in terms of parameter
+bits that can be packed into a new HRv1 codec frame.  I thought about
+requantizing the interpolated LPC reflection coefficients on every CN output
+frame, using the same computationally intensive vector quantization algorithm
+as in speech encoding - but because I am not an expert in codec design, it is
+not obvious to me whether or not such an approach would produce good results.
+However, seeing that Nokia got away with simply passing R0 and LPC parameters
+along from incoming SID frames to CN output without any interpolation or other
+transformation gives us a huge confidence boost - if Nokia did it, so can we!
+This approach is of course simple, and lends itself readily to an elegant
+implementation.
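A minimal sketch of this pass-through approach follows, assuming that R0 and the three LPC index fields are the only parameters carried over from the SID frame. The struct layout and function names are hypothetical and are not taken from any existing library; the remaining fields of each emitted CN frame are out of scope here.

```c
#include <assert.h>
#include <stdint.h>

#define HR_NUM_LPC 3    /* assumed: LPC1..LPC3 index fields */

/* Quantized parameters copied verbatim from SID to CN output. */
struct hr_cn_params {
    uint8_t  r0;                /* 5-bit frame energy index */
    uint16_t lpc[HR_NUM_LPC];   /* LPC indices, as received */
};

struct cn_hold_state {
    struct hr_cn_params cur;    /* parameters of the most recent SID */
    int have_sid;               /* nonzero once any SID has been seen */
};

/* Called when a valid SID (initial or update) arrives. */
void cn_hold_sid_update(struct cn_hold_state *st,
                        const struct hr_cn_params *sid)
{
    st->cur = *sid;             /* abrupt change, no interpolation */
    st->have_sid = 1;
}

/* Called for every CN output frame between SID updates. */
void cn_hold_emit(const struct cn_hold_state *st,
                  struct hr_cn_params *out)
{
    assert(st->have_sid);
    *out = st->cur;             /* identical bits in every frame */
}
```

Because the emitted values are the received bit fields themselves, no requantization step is needed, which is what makes this approach expressible as a new HRv1 codec frame.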
+Seeing that Nokia got away with effectively discarding UFI in their TFO
+transform is also a confidence boost - once again if Nokia did it, so can we.
+I plan on keeping the logic that "upgrades" UFI to BFI under certain conditions
+(not sure why Nokia omitted it), but the effect of potentially muting speech in
+the guts of the decoder (past parameter-level manipulation) is not really
+feasible to implement in a TFO transform.
+Finally, regarding the logic that takes codevector parameters from errored
+(BFI) frames when the voicing mode matches between the last saved frame and the
+errored frame (logic that exists in the reference C code but not in Nokia's
+TFO transform): I plan on keeping this logic in our version.  However, Nokia's
+approach will come in handy for handling BFI-no-data frames, a condition that
+does not exist in TDM-based Abis transport or in TFO, but does unfortunately
+exist in IP-based GSM RAN.
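The two BFI substitution policies can be put side by side in a short sketch. It covers only the choice of where the codevector parameters come from and omits every other aspect of BFI handling. The struct layout, the N_CODEVEC count, and the function names are hypothetical, and treating MODE 0 as unvoiced and 1..3 as voiced is an assumption to verify against the codec spec. A NULL errored-frame pointer stands for the BFI-no-data case.

```c
#include <assert.h>
#include <string.h>

#define N_CODEVEC 12    /* HYPOTHETICAL number of codevector fields */

struct hr_params {
    int r0;
    int lpc[3];
    int mode;               /* assumed: 0 = unvoiced, 1..3 = voiced */
    int codevec[N_CODEVEC]; /* codevector-related parameters */
};

enum bfi_policy {
    BFI_POLICY_REFERENCE,   /* use errored-frame bits if voicing matches */
    BFI_POLICY_NOKIA        /* never use bits from the errored frame */
};

static int is_voiced(const struct hr_params *p)
{
    return p->mode != 0;
}

/*
 * saved:   last saved frame parameters
 * errored: parameters of the BFI frame, or NULL for BFI-no-data
 * out:     substituted frame to emit
 */
void bfi_substitute(const struct hr_params *saved,
                    const struct hr_params *errored,
                    enum bfi_policy policy, struct hr_params *out)
{
    assert(saved != NULL && out != NULL);
    *out = *saved;          /* start from the last saved frame */
    if (policy == BFI_POLICY_REFERENCE && errored != NULL &&
        is_voiced(saved) == is_voiced(errored))
        memcpy(out->codevec, errored->codevec, sizeof out->codevec);
}
```

With no errored frame available, both policies collapse into repeating the saved frame, which is why the Nokia-style path is the natural fallback for BFI-no-data.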
