gsm-codec-lib: doc/FR1-Rx-DTX-detail comparison

comparison doc/FR1-Rx-DTX-detail @ 552:6ab066180ec2

doc: new article FR1-Rx-DTX-detail

author	Mychaela Falconia <falcon@freecalypso.org>
date	Mon, 07 Oct 2024 00:25:50 +0000
parents
children	62943a1ad64e


-:8f44d7064c56
+:6ab066180ec2
+Rx DTX handler implementation details
+=====================================
+As explained in the basic FR1-Rx-DTX article, an Rx DTX handler has to be
+inserted between the output of the Rx radio subsystem and the input to the
+basic GSM 06.10 speech decoder.  In the ThemWi codec library architecture, we
+normally run a full decoder for GSM-FR that combines the Rx DTX handler and
+the basic 06.10 decoder, and the Rx DTX handler block by itself also serves
+as a TFO transform.
+This Rx DTX handler is based on several GSM specs: 06.11 for the error
+concealment function, 06.12 for the comfort noise insertion function, and 06.31
+for overall Rx DTX handling.  However, these specs give a lot of leeway to
+implementors, hence it is prudent to document the specific choices made in the
+present ThemWi implementation.
+Error concealment implementation
+================================
+Error concealment is also called substitution and muting of lost frames.  The
+implementation of this function in Themyscira libgsmfr2 is based on the Example
+solution presented in chapter 6 of 3GPP TS 46.011 (formerly GSM 06.11), following
+the most literal reading of that spec section.
+When unusable frames (as defined in GSM 06.31) occur during speech state (i.e.,
+not following a SID), error concealment takes effect.  For the first BFI
+following good speech, the last speech frame is repeated verbatim.  On the
+second BFI, muting by Xmaxc reduction begins: each of the 4 Xmaxc parameters
+is decremented by 4 (stopping at 0) with each emitted frame, and the RPE grid
+position parameters are randomized at the same time.  The frame in which all
+4 Xmaxc parameters equal 0
+(either because they were already 0 or because they got reduced to 0 by the
+muting sequence) is the last frame emitted in this state; all subsequent BFIs
+will be turned into fixed-bit-pattern silence frames as given in TS 46.011
+Table 1.
+If a BFI comes in when the Rx DTX handler is in its reset (or homed) state, the
+output proceeds directly to silence frames.
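The concealment rules above can be sketched in C as follows. This is a minimal sketch, not libgsmfr2 source: the types and functions (`fr_frame`, `ec_good_frame`, `ec_bfi`) are hypothetical, only LARc, Mc and Xmaxc are modelled, and using `rand()` for grid position randomization is an assumption (the library's actual random source may differ).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified frame: only the fields that error
 * concealment touches are modelled (Nc, bc and xMc are omitted). */
struct fr_frame {
    uint8_t larc[8];
    uint8_t mc[4];      /* RPE grid position per subframe, 0..3 */
    uint8_t xmaxc[4];   /* block amplitude per subframe, 0..63 */
};

/* EC_NO_DATA doubles as the reset (homed) state. */
enum ec_state { EC_NO_DATA, EC_SPEECH, EC_SPEECH_MUTING };

struct ec {
    enum ec_state state;
    struct fr_frame last;   /* last frame emitted on the speech path */
};

/* Good speech frame: remember it and pass it through unchanged. */
void ec_good_frame(struct ec *st, const struct fr_frame *in,
                   struct fr_frame *out)
{
    for (int i = 0; i < 4; i++)
        assert(in->xmaxc[i] <= 63 && in->mc[i] <= 3);
    st->last = *in;
    st->state = EC_SPEECH;
    *out = *in;
}

/* Unusable frame (BFI). Returns 1 with a concealment frame in *out,
 * or 0 if the caller should emit the TS 46.011 Table 1 silence frame. */
int ec_bfi(struct ec *st, struct fr_frame *out)
{
    int all_zero = 1;

    switch (st->state) {
    case EC_SPEECH:
        /* first BFI after good speech: repeat last frame verbatim */
        st->state = EC_SPEECH_MUTING;
        *out = st->last;
        return 1;
    case EC_SPEECH_MUTING:
        /* second and later BFIs: Xmaxc -= 4 (stopping at 0), random Mc */
        for (int i = 0; i < 4; i++) {
            uint8_t x = st->last.xmaxc[i];
            st->last.xmaxc[i] = x > 4 ? x - 4 : 0;
            st->last.mc[i] = (uint8_t)(rand() & 3);
            if (st->last.xmaxc[i] != 0)
                all_zero = 0;
        }
        *out = st->last;
        /* the all-zero frame is the last one emitted in this state */
        if (all_zero)
            st->state = EC_NO_DATA;
        return 1;
    default:
        /* reset/homed state or fully decayed muting: silence */
        return 0;
    }
}
```

The caller is assumed to substitute the Table 1 silence frame whenever `ec_bfi()` returns 0, which covers both the homed state and a fully decayed muting sequence.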
+Comfort noise insertion
+=======================
+Comfort noise generation and updating is specified in GSM 06.12 section 6.1.
+Most of this section is very straightforward, and is implemented in ThemWi
+libgsmfr2 exactly as specified, except for the very last sentence in that
+section:
+"When updating the comfort noise, the parameters above should preferably be
+interpolated over a few frames to obtain smooth transitions."
+The ThemWi implementation of the Rx DTX handler in libgsmfr2 does not do this
+"should preferably" part: no interpolation is done on CN parameters; as soon as each
+SID update comes in, the new parameters are used immediately for all generated
+CN frames.
+Because the spec says "should preferably" rather than "shall", we can "get away"
+with not implementing CN interpolation.  But there is an even more profound
+issue: we have yet to find any other implementation of CN parameter
+interpolation for FRv1 that we could use as guidance.  (Such interpolation
+is mandatory and defined in bit-exact terms for HRv1 and EFR, but FRv1 is a
+different story.)
+We had hoped that Nokia TCSM2 (a historical hardware implementation of the GSM
+TRAU network element) might implement CN interpolation for FRv1, but our
+experimental findings on that platform are inconclusive:
+* When acting as a TFO transform for FRv1, this TRAU does not interpolate CN
+parameters; it makes abrupt changes in CN output just like our implementation.
+However, it introduces a strange delay of 24 frames, suggesting that their code
+has paths that assume CN interpolation would be applied.
+* When the TRAU acts as a regular speech decoder (not TFO), it is not clear how
+it performs any of the Rx DTX functions: Nokia chose not to implement the
+optional in-band homing feature for FRv1, so we have no way to explore the
+bit-exact behaviour of their speech decoder via test sequences.
+Another enticing idea would be to statically reverse-engineer the DSP ROM of the
+TI Calypso chip and thus recover its complete speech Rx chain, but the effort
+would be enormous and is not likely to happen any time soon.
+Until we either get around to the far-future task of Calypso DSP static
+reversing or find some other implementation of GSM-FR Rx DTX handler that does
+CN interpolation and whose operation we can replicate, we shall stick to the
+simple approach of not doing CN interpolation.
+Handling of SID frames with Xmaxc discrepancy
+=============================================
+Per GSM 06.12 section 5.2, all 4 subframe Xmaxc parameters in a SID frame are
+supposed to be equal, encoding the quantized form of mean(Xmax).  However, what
+should Rx DTX implementations do when they receive an otherwise-valid SID frame
+in which these 4 parameters are not all equal?  In our implementation, we handle
+such discrepancy as follows:
+* In those frame positions in which we receive a fresh SID (initial or update),
+the CN frame we emit is a direct transformation of the received SID, and all
+4 Xmaxc parameters are passed through intact.
+* When we emit CN frames based on remembered LARc and Xmaxc parameters, we use
+the last-subframe Xmaxc from the most recently received SID frame.
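The discrepancy rule above, which also reflects the no-interpolation choice (fresh SID parameters take effect immediately), can be sketched as follows. Names are hypothetical, and only LARc and Xmaxc are shown; the remaining CN parameters generated per GSM 06.12 section 6.1 are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of CN frame parameters (the remaining CN
 * parameters generated per GSM 06.12 section 6.1 are omitted). */
struct cn_params {
    uint8_t larc[8];
    uint8_t xmaxc[4];
};

/* Remembered CN state: one Xmaxc shared by all 4 subframes. */
struct cn_mem {
    uint8_t larc[8];
    uint8_t xmaxc;
};

/* Fresh SID (initial or update) in this frame position: emit its
 * parameters intact and remember them.  No interpolation, so the
 * new values take effect at once. */
void cn_from_sid(struct cn_mem *mem, const struct cn_params *sid,
                 struct cn_params *out)
{
    for (int i = 0; i < 4; i++)
        assert(sid->xmaxc[i] <= 63);
    *out = *sid;                      /* all 4 Xmaxc passed through */
    memcpy(mem->larc, sid->larc, sizeof mem->larc);
    mem->xmaxc = sid->xmaxc[3];       /* last-subframe Xmaxc is kept */
}

/* No fresh SID in this position: build CN from remembered parameters. */
void cn_from_memory(const struct cn_mem *mem, struct cn_params *out)
{
    memcpy(out->larc, mem->larc, sizeof out->larc);
    for (int i = 0; i < 4; i++)
        out->xmaxc[i] = mem->xmaxc;
}
```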
+Lost SID handling and CN muting
+===============================
+In accordance with GSM 06.11 section 5.3, when we receive an unusable frame in
+a TAF position (a frame position where a SID update is expected) during CN
+insertion state, we set a flag that remembers this condition, but don't
+switch to CN muting right away.  Per section 5.4 of the same spec, we initiate
+CN muting when a second lost SID event occurs (unusable frame received in a
+TAF position) without intervening good speech frames or accepted SID frames.
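The lost SID bookkeeping above amounts to a one-bit flag plus one state transition. A minimal sketch with hypothetical names; it models only the flag and the entry into CN muting, not the frames being emitted or the transition back to speech.

```c
#include <assert.h>
#include <stdbool.h>

enum cn_state { CN_INSERTION, CN_MUTING };

struct lost_sid_tracker {
    enum cn_state state;
    bool lost_sid;   /* one lost SID seen since last good speech/accepted SID */
};

/* Unusable frame during CN; taf is true if this is a TAF position. */
void on_unusable_frame(struct lost_sid_tracker *t, bool taf)
{
    if (t->state != CN_INSERTION || !taf)
        return;               /* not a lost SID event: CN output continues */
    if (t->lost_sid)
        t->state = CN_MUTING; /* second lost SID: start CN muting */
    else
        t->lost_sid = true;   /* first lost SID: only remember it */
}

/* Good speech frame or accepted SID clears the lost SID condition
 * (the accompanying state changes are handled elsewhere). */
void on_good_speech_or_accepted_sid(struct lost_sid_tracker *t)
{
    t->lost_sid = false;
}
```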
+When we do enter CN muting state, we decrement CN Xmaxc (always the same for
+all 4 subframes) by 4 on each output frame, following the Example solution of
+3GPP TS 46.011 (formerly GSM 06.11) chapter 6.  Once this CN Xmaxc reaches 0,
+we switch to emitting fixed-bit-pattern silence frames of TS 46.011 Table 1.
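CN muting can be sketched the same way, again with hypothetical names. One detail is an assumption: the text does not say whether the frame in which CN Xmaxc reaches 0 is itself emitted; this sketch emits it (mirroring the speech muting rule) and only then switches to silence.

```c
#include <assert.h>
#include <stdint.h>

struct cn_muting {
    uint8_t xmaxc;   /* CN Xmaxc, common to all 4 subframes */
    int decayed;     /* set once xmaxc has reached 0 */
};

/* One output frame in CN muting state.  Returns 1 with the Xmaxc to use
 * for all 4 subframes, or 0 when the caller should emit the TS 46.011
 * Table 1 silence frame instead. */
int cn_muting_step(struct cn_muting *m, uint8_t *xmaxc_out)
{
    assert(m->xmaxc <= 63);
    if (m->decayed)
        return 0;
    m->xmaxc = m->xmaxc > 4 ? m->xmaxc - 4 : 0;
    *xmaxc_out = m->xmaxc;
    if (m->xmaxc == 0)
        m->decayed = 1;
    return 1;
}
```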
+Handling of invalid SID frames
+==============================
+In agreement with the GSM 06.31 spec, we recognize invalid SID frames and invoke
+the appropriate handler in all 3 combinations: BFI=0 SID=1, BFI=1 SID=1, and
+BFI=1 SID=2.  The real complexity, however, lies in what that invalid SID
+handler actually does:
+* If invalid SID arrives when we are already in CN insertion state, we treat it
+the same as an unusable frame (continue CN output with current parameters),
+but the lost SID flag is reset, as required by our interpretation of the
+specs.
+* If invalid SID arrives in CN muting state (i.e., after two consecutive lost
+SID events), the muting continues unaffected: receiving an invalid SID does not
+"rejuvenate" comfort noise that has already started muting.
+* If invalid SID arrives in good speech state, meaning that we are supposed to
+begin a CN insertion period but we didn't get usable parameters for it, we
+obtain LARc and mean(Xmax) parameters from the last good speech frame,
+following the second option permitted by the "NOTE" at the end of GSM 06.31
+section 6.1.2.  To get Xmaxc for CN, we dequantize all 4 Xmaxc parameters of
+the last good speech frame, average them, then requantize.
+* If invalid SID arrives in speech muting state, the invalid SID is ignored and
+speech muting continues unaffected.
+* If invalid SID arrives in NO_DATA state (initial state out of reset, or the
+state after either speech or CN muting has fully decayed), we emit the fixed
+silence frame of TS 46.011 Table 1.
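The invalid SID classification and the per-state handler above can be sketched as follows. This is a hedged sketch, not libgsmfr2 source: state and function names are hypothetical, LARc copying is omitted, and two numeric details are assumptions: each Xmaxc is dequantized to the lower bound of its GSM 06.10 quantization interval, and the mean is truncated before requantizing with a GSM 06.10-style APCM block amplitude quantizer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GSM 06.31 invalid SID: BFI=0 SID=1, BFI=1 SID=1, or BFI=1 SID=2. */
bool is_invalid_sid(int bfi, int sid)
{
    return sid == 1 || (bfi && sid == 2);
}

/* Assumption: map Xmaxc (0..63) to the lower bound of its
 * GSM 06.10 quantization interval. */
unsigned xmax_dequant(unsigned xmaxc)
{
    assert(xmaxc <= 63);
    if (xmaxc < 16)
        return xmaxc << 5;
    unsigned exp = (xmaxc >> 3) - 1;
    unsigned mant = xmaxc & 7;
    return (8 + mant) << (exp + 5);
}

/* Block amplitude quantizer in the style of GSM 06.10 APCM
 * (xmax in 0..32767, result in 0..63). */
unsigned xmax_quant(unsigned xmax)
{
    unsigned exp = 0, temp = xmax >> 9;
    bool itest = false;

    for (int i = 0; i <= 5; i++) {
        itest |= (temp == 0);
        temp >>= 1;
        if (!itest)
            exp++;
    }
    return (xmax >> (exp + 5)) + (exp << 3);
}

/* CN Xmaxc from the 4 Xmaxc of the last good speech frame:
 * dequantize, average (truncating), requantize. */
unsigned cn_xmaxc_from_speech(const uint8_t xmaxc[4])
{
    unsigned sum = 0;
    for (int i = 0; i < 4; i++)
        sum += xmax_dequant(xmaxc[i]);
    return xmax_quant(sum / 4);
}

enum rx_state { RX_NO_DATA, RX_SPEECH, RX_SPEECH_MUTING,
                RX_CN_INSERTION, RX_CN_MUTING };

struct rx {
    enum rx_state state;
    bool lost_sid;
    uint8_t last_speech_xmaxc[4];
    unsigned cn_xmaxc;
};

/* Invalid SID handler; frame output is left to the caller. */
void on_invalid_sid(struct rx *rx)
{
    switch (rx->state) {
    case RX_CN_INSERTION:
        /* like an unusable frame, but the lost SID flag is reset */
        rx->lost_sid = false;
        break;
    case RX_SPEECH:
        /* begin CN from the last good speech frame (LARc copy omitted) */
        rx->cn_xmaxc = cn_xmaxc_from_speech(rx->last_speech_xmaxc);
        rx->state = RX_CN_INSERTION;
        break;
    case RX_CN_MUTING:      /* no rejuvenation: muting continues */
    case RX_SPEECH_MUTING:  /* ignored: speech muting continues */
    case RX_NO_DATA:        /* caller emits the Table 1 silence frame */
        break;
    }
}
```

Because the assumed dequantizer returns interval lower bounds, requantizing a single dequantized value gives back the original Xmaxc, so a speech frame whose 4 Xmaxc values are already equal yields that same value for CN.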
