view doc/TFO-xform/HRv1 @ 35:0979407719f0
doc/TFO-xform/HRv1: article written
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Mon, 02 Sep 2024 07:32:09 +0000 |
HRv1: relation between regular end decoder and TFO transform
============================================================

The reference decoder source published by ETSI in GSM 06.06 exhibits an almost
modular design: the Rx DTX handler front-end is almost a separable piece.
Breaking it down more precisely, we can make these observations:

0) Most aspects of bad frame handling and comfort noise generation are done by
   generating new coded speech parameters, such that the output of those
   algorithms can be packaged into new HRv1 codec frames to be sent to a
   distant decoder.

There are only two exceptions to this modularity:

1) Handling of unreliable speech frames (BFI=0 UFI=1 in speech rather than CN
   state) has a modular and a non-modular aspect:

1a) Modular aspect: if the R0 increment from the last good frame to the
    unreliable frame exceeds a certain threshold, UFI is turned into BFI,
    which is then handled in a fully modular fashion.

1b) Non-modular aspect: if the R0 increment does not meet the threshold for
    turning UFI into BFI but meets another slightly lower threshold, a flag is
    set that is passed into the guts of the speech decoder. That flag causes
    speech muting at the decoder output level.

2) GSM 06.22 section 6.2 (Comfort noise generation and updating) says in the
   very last sentence:

   "When updating the comfort noise parameters (frame energy and LPC
   coefficients), these parameters shall be interpolated over the SID update
   period to obtain smooth transitions."

   Note the change in language: the corresponding spec for FRv1 says "should
   preferably", but the HRv1 spec says "shall". Furthermore, the bit-exact
   implementation in the reference C code is considered normative in this
   aspect, and is exercised by the test sequences of GSM 06.07.
   This CN interpolation aspect is non-modular: R0 and the set of LPC
   coefficients are decoded from bit parameters into linear form when CN
   frames (initial and updates) are received, interpolation is done on this
   linear form, and the interpolated values are passed to the main body of the
   speech decoder.

Based on these observations, we can conclude that if we wish to detach this
reference Rx DTX handler for HRv1 from the reference decoder and make it into
an implementation of TFO transform for this codec, we have to solve two
problems:

1) Decide how to handle those UFI frames that aren't being turned into BFI;

2) Decide how to handle R0 and LPC parameters during CN insertion.

Nokia TCSM2 TRAU implementation
===============================

Now that we have a working historical bank-of-TRAUs apparatus in our lab, let's
take a look at how this vendor (Nokia) implemented the TFO transform for HRv1
in their TRAU. Here are our findings:

* Handling of BFI=1 frames in speech state (not in DTX) exhibits a
  simplification relative to GSM 06.06 reference code. The reference code
  checks to see if the last saved frame and the received errored frame have
  the same voiced vs unvoiced mode: if this mode matches, codevector
  parameters are taken from the errored frame, otherwise the last saved frame
  is regurgitated without taking any bits from the errored frame. Nokia's TFO
  transform always does the latter (no bits are taken from the errored frame)
  irrespective of voiced vs unvoiced mode matching or not.

* Aside from this just-described simplification, all other aspects of BFI=1
  handling for speech frames appear to match the reference code.

* UFI handling appears to have been taken out altogether: even the part that
  "upgrades" UFI to BFI when the R0 increment is huge appears to have been
  omitted.
  I fed a test sequence from TFO side that has a good speech frame with R0=2
  followed by a UFI frame with R0=31, and the TRAU happily passed the latter
  frame (now treated as perfectly good) to the DL output.

* Comfort noise generation (DTXd=0) is done exactly as the reference code
  would do it, except that neither R0 nor LPC parameters are interpolated.
  During each CN output interval between SID updates, R0 and LPC parameters
  in every emitted CN frame are exactly equal to those received in the most
  recent SID frame, as simple as that. When a new SID update comes in, the
  change in emitted R0 and LPC is abrupt.

* The lost SID criterion for CN muting appears to be slightly different
  between Nokia's TFO implementation and my reading of the spec and the
  reference C code. My interpretation of GSM 06.22 spec sections 5.2.3 and
  5.2.4 is that unlike FR and EFR, in the case of HR codec the second lost SID
  (second occurrence of BFI instead of SID update in TAF position) does _not_
  trigger CN muting; instead this muting is supposed to kick in on the _third_
  lost SID occurrence. (The difference in the spec was likely motivated by
  TAF positions occurring every 240 ms with HR instead of every 480 ms with
  FR & EFR.) My reading of the reference C code agrees with my reading of the
  spec - yet Nokia's TFO implementation initiates CN muting in the frame
  following the second lost SID, not the third.

* Aside from the criterion for its initiation, the actual CN muting logic
  behaves exactly like the reference C code: R0 is decremented by 2 on each
  output frame following the TAF that initiates this sequence, and once R0
  reaches 0, it stays there while this zero-magnitude CN output continues
  indefinitely.

* With DTXd=1 CN output is replaced with repeated retransmission of the same
  SID whose parameters would have been used for non-interpolated CN with
  DTXd=0, which also agrees with the rules of GSM 08.62 section 8.2.2
  paragraph 2.

* CN muting with DTXd=1 is implemented poorly.
  The TRAU emits SID frames with R0 decrementing by 2 on each frame, just as
  it does for generated CN output that's in the process of being slowly muted,
  but this design is a poor choice: because the BTS will only transmit one of
  every 12 SID update frames and the TRAU has no way of knowing which SID will
  be transmitted, slow decrement cadence on SID frames themselves (not on CN
  output) makes no sense.

Thoughts for Themyscira implementation
======================================

Prior to getting Nokia TCSM2 working in our lab and being able to experiment
with this TRAU, when I was contemplating the idea of potentially implementing
TFO transform for HRv1 in Themyscira libraries, my main trepidation was how to
produce comfort noise in the form of "speech" parameter output. For endpoint
decoders GSM 06.22 prescribes a bit-exact algorithm with interpolation, but
that smoothly interpolated CN cannot be readily expressed in terms of parameter
bits that can be packed into a new HRv1 codec frame. I thought about
requantizing the interpolated LPC reflection coefficients on every CN output
frame, using the same computationally intensive vector quantization algorithm
as in speech encoding - but because I am not an expert in codec design, it is
not obvious to me whether or not such an approach would produce good results.

However, seeing that Nokia got away with simply passing R0 and LPC parameters
along from incoming SID frames to CN output without any interpolation or other
transformation gives us a huge confidence boost - if Nokia did it, so can we!
This approach is of course simple, and lends itself readily to elegant
implementation.

Seeing that Nokia got away with effectively discarding UFI in their TFO
transform is also a confidence boost - once again if Nokia did it, so can we.
I plan on keeping the logic that "upgrades" UFI to BFI under certain conditions
(not sure why Nokia omitted it), but the effect of potentially muting speech in
the guts of the decoder (past parameter-level manipulation) is not really
feasible to implement in a TFO transform.

Finally, regarding the logic that takes codevector parameters from errored
(BFI) frames when the voicing mode matches between the last saved frame and the
errored frame, the logic that exists in the reference C code but not in Nokia's
TFO transform: I plan on keeping this logic in our version, but Nokia's
approach will come in handy for handling BFI-no-data frames, a condition that
does not exist in TDM-based Abis transport or in TFO, but does unfortunately
exist in IP-based GSM RAN.
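
The choice between the two BFI substitution behaviours described in this
article (reference logic that borrows codevector parameters when the voicing
mode matches, versus Nokia's unconditional repeat of the last saved frame) can
be illustrated with a minimal Python sketch. Everything here is an assumption
made for illustration: the HrFrame layout, the field names, the function names
and the convention that mode 0 means unvoiced are not taken from the reference
C code, and the other aspects of BFI handling in GSM 06.06 are not modelled.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class HrFrame:
    """Hypothetical, simplified view of the parameters of one HRv1 frame."""
    r0: int                       # frame energy parameter
    lpc: Tuple[int, ...]          # LPC parameter indices
    mode: int                     # voicing mode (assumed: 0 = unvoiced)
    codevectors: Tuple[int, ...]  # per-subframe codevector parameters
    other: Tuple[int, ...]        # all remaining per-subframe parameters


def is_voiced(frame: HrFrame) -> bool:
    # Assumption for this sketch: any nonzero mode counts as voiced.
    return frame.mode != 0


def substitute_bad_frame(last_saved: HrFrame,
                         errored: Optional[HrFrame],
                         nokia_style: bool = False) -> HrFrame:
    """Pick the parameters to emit in place of a BFI=1 speech frame.

    errored=None stands for a BFI-no-data frame: there are no bits to
    borrow, so the last saved frame is repeated as-is.
    """
    if errored is None or nokia_style:
        # Nokia's observed behaviour: take no bits from the errored frame.
        return last_saved
    if is_voiced(errored) == is_voiced(last_saved):
        # Voicing matches: borrow only the codevector parameters.
        return replace(last_saved, codevectors=errored.codevectors)
    # Voicing differs: repeat the last saved frame unchanged.
    return last_saved
```

Passing errored=None selects the repeat path regardless of nokia_style, which
is how a BFI-no-data frame arriving from an IP-based RAN could be handled under
these assumptions.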
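
The non-interpolated comfort noise behaviour observed in the Nokia TRAU with
DTXd=0 (hold R0 and LPC from the most recent SID, start muting after some
number of lost SIDs, then decrement R0 by 2 per output frame down to 0) can be
sketched in Python as below. This is only a sketch under stated assumptions:
the CnState class, the cn_output_frame function and the idea that any valid
SID resets the lost-SID count are hypothetical, and the generation of the
remaining CN frame fields is left out. The lost_sid_threshold parameter is
left to the caller because the article finds 2 in Nokia's implementation and
3 in the author's reading of GSM 06.22.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MUTE_STEP = 2  # R0 decrement per output frame while muting, per the article

SidParams = Tuple[int, Tuple[int, ...]]  # (R0, LPC parameter indices)


@dataclass
class CnState:
    """Hypothetical state of a non-interpolating CN generator."""
    r0: int
    lpc: Tuple[int, ...]
    lost_sid_threshold: int   # 2 = Nokia as observed, 3 = spec as read
    lost_sid_count: int = 0
    muting: bool = False


def cn_output_frame(state: CnState,
                    taf: bool = False,
                    sid: Optional[SidParams] = None) -> SidParams:
    """Return (R0, LPC) to place in the next emitted CN frame.

    taf=True marks a frame position where a SID update is expected;
    sid carries the parameters of a valid SID update, if one arrived.
    """
    if sid is not None:
        # Valid SID update: parameters change abruptly, no interpolation.
        state.r0, state.lpc = sid
        state.lost_sid_count = 0   # assumption: a valid SID clears muting
        state.muting = False
    elif state.muting:
        # Muting in progress: step R0 down and hold it at 0.
        state.r0 = max(state.r0 - MUTE_STEP, 0)
    elif taf:
        # BFI in TAF position counts as a lost SID.
        state.lost_sid_count += 1
        if state.lost_sid_count >= state.lost_sid_threshold:
            state.muting = True    # takes effect from the following frame
    return state.r0, state.lpc
```

With lost_sid_threshold=2 the first decremented R0 appears in the frame after
the second lost SID, matching the Nokia observation; with 3 it appears after
the third.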