diff doc/FR1-Rx-DTX-detail @ 552:6ab066180ec2

doc: new article FR1-Rx-DTX-detail
author Mychaela Falconia <falcon@freecalypso.org>
date Mon, 07 Oct 2024 00:25:50 +0000
parents
children 62943a1ad64e
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/FR1-Rx-DTX-detail	Mon Oct 07 00:25:50 2024 +0000
@@ -0,0 +1,145 @@
+Rx DTX handler implementation details
+=====================================
+
+As explained in the basic FR1-Rx-DTX article, an Rx DTX handler has to be
+inserted between the output of the Rx radio subsystem and the input to the
+basic GSM 06.10 speech decoder.  In the ThemWi codec library architecture, we
+normally run a full decoder for GSM-FR that combines the Rx DTX handler and
+the basic 06.10 decoder, and the Rx DTX handler block by itself also serves
+as a TFO transform.
+
+This Rx DTX handler is based on several GSM specs: 06.11 for the error
+concealment function, 06.12 for the comfort noise insertion function, and 06.31
+for overall Rx DTX handling.  However, these specs give a lot of leeway to
+implementors, hence it is prudent to document the specific choices made in the
+present ThemWi implementation.
+
+Error concealment implementation
+================================
+
+Error concealment is also called substitution and muting of lost frames.  The
+implementation of this function in Themyscira libgsmfr2 is based on the Example
+solution presented in chapter 6 of 3GPP TS 46.011 (formerly GSM 06.11), applying
+the most literal reading to this spec section.
+
+When unusable frames (as defined in GSM 06.31) occur during speech state (i.e.,
+not following a SID), the present logic kicks in.  For the first BFI following
+good speech, the last speech frame is repeated verbatim.  On the second BFI the
+muting logic of Xmaxc reduction kicks in, decrementing each of the 4 Xmaxc
+parameters by 4 with each emitted frame.  RPE grid position parameters are
+randomized at the same time.  The frame in which all 4 Xmaxc parameters equal 0
+(either because they were already 0 or because they got reduced to 0 by the
+muting sequence) is the last frame emitted in this state; all subsequent BFIs
+will be turned into fixed-bit-pattern silence frames as given in TS 46.011
+Table 1.
+
+If a BFI comes in when the Rx DTX handler is in its reset (or homed) state, the
+output proceeds directly to silence frames.
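+
+The speech-state sequence described above can be sketched as a small state
+machine.  This is only an illustration under stated assumptions, not the
+actual libgsmfr2 code: the unpacked frame structure, the state names and the
+function names are all hypothetical, rand() stands in for whatever random
+source a real implementation would use, and the silence frame of TS 46.011
+Table 1 is represented by a return code rather than spelled out.
+
```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical unpacked view of one GSM-FR frame (names are illustrative). */
struct fr_params {
    uint8_t LARc[8];
    uint8_t Nc[4], bc[4], Mc[4], xmaxc[4];
    uint8_t xMc[4][13];
};

/* ST_NO_DATA is 0, so a zero-initialized handler starts in the reset state. */
enum ecu_state { ST_NO_DATA, ST_SPEECH, ST_SPEECH_MUTING };
enum ecu_out { OUT_FRAME, OUT_SILENCE };

struct ecu {
    enum ecu_state state;
    struct fr_params saved;     /* last speech frame, muted in place */
};

/* Call for every good speech frame: remember it for possible repetition. */
void ecu_good_speech(struct ecu *e, const struct fr_params *in)
{
    e->saved = *in;
    e->state = ST_SPEECH;
}

/*
 * Call for every BFI outside of comfort noise.  Returns OUT_FRAME when *out
 * has been filled in, or OUT_SILENCE when the caller should emit the fixed
 * silence frame instead.
 */
enum ecu_out ecu_bfi(struct ecu *e, struct fr_params *out)
{
    int i, all_zero = 1;

    switch (e->state) {
    case ST_SPEECH:
        /* first BFI after good speech: repeat the last frame verbatim */
        *out = e->saved;
        e->state = ST_SPEECH_MUTING;
        return OUT_FRAME;
    case ST_SPEECH_MUTING:
        for (i = 0; i < 4; i++) {
            /* reduce Xmaxc by 4, clamping at 0 */
            e->saved.xmaxc[i] =
                e->saved.xmaxc[i] > 4 ? e->saved.xmaxc[i] - 4 : 0;
            if (e->saved.xmaxc[i])
                all_zero = 0;
            /* randomize the RPE grid position (2-bit field) */
            e->saved.Mc[i] = rand() & 3;
        }
        *out = e->saved;
        /* the all-zero frame is the last one emitted in this state */
        if (all_zero)
            e->state = ST_NO_DATA;
        return OUT_FRAME;
    default:
        /* reset/homed state, or muting already finished */
        return OUT_SILENCE;
    }
}
```
+
+With a last good frame whose Xmaxc values are all 9, successive BFIs in this
+sketch produce Xmaxc of 9, 5, 1 and 0, followed by silence frames.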
+
+Comfort noise insertion
+=======================
+
+Comfort noise generation and updating is specified in GSM 06.12 section 6.1.
+Most of this section is very straightforward, and is implemented in ThemWi
+libgsmfr2 exactly as specified, except for the very last sentence in that
+section:
+
+"When updating the comfort noise, the parameters above should preferably be
+ interpolated over a few frames to obtain smooth transitions."
+
+The ThemWi implementation of the Rx DTX handler in libgsmfr2 does not do this
+"should preferably" part: no interpolation is done on CN parameters; as soon as
+each SID update comes in, the new parameters are used immediately for all
+generated CN frames.
+
+Because the spec says "should preferably" rather than "shall", we can "get away"
+with not implementing CN interpolation.  But there is an even more profound
+issue: we have yet to find anyone else's implementation, which we could use as
+guidance, that does CN parameter interpolation for FRv1.  (Such interpolation
+is mandatory and defined in bit-exact terms for HRv1 and EFR, but FRv1 is a
+different story.)
+
+We had hoped that Nokia TCSM2 (a historical hw implementation of the GSM TRAU
+network element) might implement CN interpolation for FRv1 - but our
+experimental findings on that platform are inconclusive:
+
+* When acting as a TFO transform for FRv1, this TRAU does not interpolate CN
+  parameters; it makes abrupt changes in CN output just like our implementation
+  - but it effects a strange delay of 24 frames, suggesting that they have some
+  code paths that assume CN interpolation would be applied.
+
+* When the TRAU acts as a regular speech decoder (not TFO), it is not clear how
+  it performs any of Rx DTX functions: Nokia chose to not implement the optional
+  in-band homing feature for FRv1, thus we have no way to explore bit-exact
+  behaviour of their speech decoder via test sequences.
+
+Another enticing idea would be to statically reverse-engineer the DSP ROM of TI
+Calypso chip and thus recover its complete speech Rx chain - but of course the
+effort would be extremely massive, and is not likely to happen any time soon.
+
+Until we either get around to the far-future task of Calypso DSP static
+reversing or find some other implementation of GSM-FR Rx DTX handler that does
+CN interpolation and whose operation we can replicate, we shall stick to the
+simple approach of not doing CN interpolation.
+
+Handling of SID frames with Xmaxc discrepancy
+=============================================
+
+Per GSM 06.12 section 5.2, all 4 subframe Xmaxc parameters in a SID frame are
+supposed to be equal, encoding the quantized form of mean(Xmax).  However, what
+should Rx DTX implementations do when they receive an otherwise-valid SID frame
+in which these 4 parameters are not all equal?  In our implementation, we handle
+such discrepancy as follows:
+
+* In those frame positions in which we receive a fresh SID (initial or update),
+  the CN frame we emit is a direct transformation of the received SID, and all
+  4 Xmaxc parameters are passed through intact.
+
+* When we emit CN frames based on remembered LARc and Xmaxc parameters, we use
+  the last-subframe Xmaxc from the most recently received SID frame.
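+
+The two rules above, together with the no-interpolation choice, can be
+illustrated with a short sketch.  The structure and function names here are
+hypothetical and are not taken from libgsmfr2; only LARc and Xmaxc are shown,
+and the remaining CN frame parameters prescribed by GSM 06.12 are left out.
+
```c
#include <stdint.h>
#include <string.h>

/* Hypothetical CN parameter memory kept between SID updates. */
struct cn_memory {
    uint8_t LARc[8];
    uint8_t xmaxc;      /* single value reused for all 4 subframes */
};

/*
 * A valid SID (initial or update) arrives.  The CN frame emitted in this
 * frame position keeps all 4 received Xmaxc values intact, while the memory
 * keeps only the last-subframe value.  The new parameters replace the old
 * ones at once; no interpolation is performed.
 */
void cn_accept_sid(struct cn_memory *m, const uint8_t sid_LARc[8],
                   const uint8_t sid_xmaxc[4], uint8_t out_xmaxc[4])
{
    memcpy(m->LARc, sid_LARc, 8);
    m->xmaxc = sid_xmaxc[3];
    memcpy(out_xmaxc, sid_xmaxc, 4);
}

/* Later CN frames are generated from the remembered parameters only. */
void cn_from_memory(const struct cn_memory *m, uint8_t out_LARc[8],
                    uint8_t out_xmaxc[4])
{
    memcpy(out_LARc, m->LARc, 8);
    memset(out_xmaxc, m->xmaxc, 4);
}
```
+
+For a SID frame carrying Xmaxc values 10, 11, 12, 13, the frame emitted in
+the SID position would carry the same four values, and every following CN
+frame would carry 13 in all 4 subframes until the next SID update.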
+
+Lost SID handling and CN muting
+===============================
+
+In accord with GSM 06.11 section 5.3, when we receive an unusable frame in a
+TAF position during CN insertion state, we set a flag that remembers this
+condition, but don't switch to CN muting right away.  Per section 5.4 of the
+same spec, we initiate CN muting when a second lost SID event occurs (unusable
+frame received in a TAF position) without intervening good speech frames or
+accepted SID frames.
+
+When we do enter CN muting state, we decrement CN Xmaxc (always the same for
+all 4 subframes) by 4 on each output frame, following the Example solution of
+3GPP TS 46.011 (formerly GSM 06.11) chapter 6.  Once this CN Xmaxc reaches 0,
+we switch to emitting fixed-bit-pattern silence frames of TS 46.011 Table 1.
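+
+The lost SID flag and the CN muting ramp can be sketched as follows.  All
+names are hypothetical.  Two details are assumptions of this sketch rather
+than statements from the text above: the first Xmaxc decrement is applied in
+the same frame position as the second lost SID event, and the CN frame with
+Xmaxc equal to 0 is still emitted before the switch to silence frames,
+mirroring the speech muting case.
+
```c
#include <stdint.h>

enum cn_state { CN_INSERT, CN_MUTING, CN_NO_DATA };

struct cn_ctx {
    enum cn_state state;
    int lost_sid;       /* set by the first unusable frame in a TAF position */
    uint8_t xmaxc;      /* CN Xmaxc, the same for all 4 subframes */
};

/* An accepted SID (re)starts CN insertion and clears the lost SID flag. */
void cn_sid_accepted(struct cn_ctx *c, uint8_t xmaxc)
{
    c->state = CN_INSERT;
    c->lost_sid = 0;
    c->xmaxc = xmaxc;
}

/*
 * An unusable frame arrives during CN insertion or CN muting; taf is nonzero
 * in a TAF position.  Returns 1 if a CN frame using c->xmaxc should be
 * emitted, or 0 if the fixed silence frame should be emitted instead.
 */
int cn_unusable_frame(struct cn_ctx *c, int taf)
{
    if (c->state == CN_INSERT && taf) {
        if (c->lost_sid)
            c->state = CN_MUTING;   /* second lost SID event */
        else
            c->lost_sid = 1;        /* first one: just remember it */
    }
    switch (c->state) {
    case CN_INSERT:
        return 1;                   /* keep going with current parameters */
    case CN_MUTING:
        c->xmaxc = c->xmaxc > 4 ? c->xmaxc - 4 : 0;
        if (c->xmaxc == 0)
            c->state = CN_NO_DATA;  /* this is the last CN frame */
        return 1;
    default:
        return 0;
    }
}
```
+
+Starting from CN Xmaxc of 9, the first lost SID leaves the output unchanged;
+from the second lost SID onward this sketch emits Xmaxc of 5, 1 and 0, then
+silence frames.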
+
+Handling of invalid SID frames
+==============================
+
+In agreement with the GSM 06.31 spec, we recognize invalid SID and invoke the
+appropriate handler in all 3 combinations: BFI=0 SID=1, BFI=1 SID=1, and
+BFI=1 SID=2.  The real complexity, however, lies in what that invalid SID
+handler actually does:
+
+* If invalid SID arrives when we are already in CN insertion state, we treat it
+  the same as an unusable frame (continue CN output with current parameters),
+  but the flag of lost SID is reset, as required by our interpretation of the
+  specs.
+
+* If invalid SID arrives in CN muting state, i.e., after two consecutive lost
+  SID events, the muting continues unaffected: we don't "rejuvenate"
+  already-started-muting comfort noise upon receiving invalid SID.
+
+* If invalid SID arrives in good speech state, meaning that we are supposed to
+  begin a CN insertion period but we didn't get usable parameters for it, we
+  obtain LARc and mean(Xmax) parameters from the last good speech frame,
+  following the second option permitted by the "NOTE" at the end of GSM 06.31
+  section 6.1.2.  To get Xmaxc for CN, we dequantize all 4 Xmaxc parameters of
+  the last good speech frame, average them, then requantize.
+
+* If invalid SID arrives in speech muting state, the invalid SID is ignored and
+  speech muting continues unaffected.
+
+* If invalid SID arrives in NO_DATA state (initial state out of reset, or the
+  state after either speech or CN muting has fully decayed), we emit the fixed
+  silence frame of TS 46.011 Table 1.
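+
+The dequantize-average-requantize step used for invalid SID in good speech
+state can be sketched as below.  The function names are hypothetical and this
+is not the libgsmfr2 code.  The quantizer follows the exponent-and-mantissa
+coding of Xmaxc in GSM 06.10; reconstructing each value at the top of its
+quantization interval and truncating the average are assumptions of this
+sketch, and the actual implementation may differ in those details.
+
```c
#include <stdint.h>

/*
 * 6-bit Xmaxc -> reconstructed xmax.  Assumption: the reconstruction point
 * is the top of the quantization interval (31 for code 0, 32767 for 63).
 */
int xmaxc_dequant(unsigned xmaxc)
{
    unsigned exp = 0, mant = xmaxc;

    if (xmaxc > 15) {
        exp = (xmaxc >> 3) - 1;
        mant = xmaxc - (exp << 3);
    }
    return (int)(((mant + 1) << (exp + 5)) - 1);
}

/* xmax (0..32767) -> 6-bit Xmaxc: step 32 up to 511, doubling per octave. */
unsigned xmax_quant(int xmax)
{
    unsigned exp = 0;

    while (exp < 6 && (xmax >> (exp + 9)) > 0)
        exp++;
    return (unsigned)(xmax >> (exp + 5)) + (exp << 3);
}

/* Derive a single CN Xmaxc from the 4 Xmaxc of the last good speech frame. */
unsigned cn_xmaxc_from_speech(const uint8_t xmaxc[4])
{
    int i, sum = 0;

    for (i = 0; i < 4; i++)
        sum += xmaxc_dequant(xmaxc[i]);
    return xmax_quant(sum >> 2);    /* truncating mean of 4 values */
}
```
+
+When all 4 input values are equal, this sketch returns that same value.
+Because the averaging happens in the linear domain, one loud subframe
+dominates the result: inputs of 0, 0, 0, 63 give 48 here.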