view doc/FR1-Rx-DTX @ 250:731c98b67da1
doc/FR1-Rx-DTX: document changes from 1.0.1 to 1.0.2
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Fri, 12 May 2023 04:49:09 +0000 |
parents | fcc0887ff0d0 |
children | 4034c2b06ec8 |
At the level of provided functionality and architectural structure, ETSI GSM specifications for DTX (discontinuous transmission) are very symmetric between FR and EFR: the same DTX functionality is specified for both codecs, with the same overall architecture. However, there is one important difference: in the case of EFR the complete implementation of all DTX functions (for both Tx and Rx) forms an integral and inseparable part of the reference codec (implemented in C) from the beginning, whereas in the case of FR1 the addition of DTX is somewhat of an afterthought. GSM 06.10 defines a "pure" FR codec without any DTX functions, and this most basic spec can be and has been implemented in this "pure" form - classic Unix libgsm from the 1990s is a proper, fully compliant implementation of GSM 06.10, but only of this spec, without any DTX. In contrast, there has never existed a "pure" implementation of the GSM 06.60 EFR codec without the associated Tx and Rx DTX functions.

Furthermore, there is an important distinction between Tx and Rx DTX handlers for FR1:

* Anyone who seeks to implement Tx DTX for FR1 would have to dig into the guts of the GSM 06.10 encoder and augment it with VAD and SID encoding functions per GSM 06.32 and 06.12 specs.

* In contrast, the Rx DTX handler for FR1 is modular: the way it is specified in GSM 06.11, 06.12 and 06.31, it is a front-end to an unmodified GSM 06.10 decoder. On the Rx side, the interface from the radio subsystem to the Rx DTX handler consists of 260 bits of frame plus BFI and TAF flags (the spec also defines a SID flag, but it is determined from frame payload bits), and then the interface from the Rx DTX handler to the GSM 06.10 decoder is another FR frame of 260 bits.

What are the implications of this situation for the GSM published-source software community? Prior to the present libgsmfrp offering, there has always been libgsm, but no Rx DTX handler.
If you are working with a GSM uplink RTP stream from a BTS or a GSM downlink frame stream read out of TI Calypso DSP or some other GSM MS PHY, feeding that stream directly to libgsm (without passing through an Rx DTX handler) is NOT acceptable: a "bare" GSM 06.10 decoder won't recognize SID frames and won't produce the expected comfort noise output, and what are you going to do in those 20 ms windows in which no good traffic frame was received? The situation becomes especially bad if you are reading received downlink frames out of TI Calypso DSP: the DSP's buffer will have *some* bit content in every 20 ms window, but naturally this bit content will be garbage during those frame windows when no good frame was received; feeding that garbage to libgsm produces noises that are very unkind on ears.

The correct solution is to implement an Rx DTX handler, pass the stream of frames and flags from the BTS or the MS PHY to this handler first, and then pass the output of this handler to the libgsm 06.10 decoder.

Themyscira libgsmfrp is a Free Software implementation of the Rx DTX handler for GSM FR, implementing SID classification, comfort noise generation and error concealment.

Effect of extra preprocessing
=============================

One key detail deserves extra emphasis before going into library API details: if the input to libgsmfrp consists entirely of good speech frames (no SID frames and no BFIs), then the preprocessor becomes an identity transform. Therefore, if the output of our libgsmfrp preprocessor is fed to an additional instance of the same preprocessor further down the processing chain, no extra transformation of any kind will happen.

Using libgsmfrp
===============

The external public interface to Themyscira libgsmfrp consists of a single header file <gsm_fr_preproc.h>; it should be installed in the same system include directory as <gsm.h> from libgsm.
Please note that <gsm_fr_preproc.h> includes <gsm.h>, as needed for gsm_byte and gsm_frame defined types.

The dialect of C we chose for libgsmfrp is ANSI C (function prototypes), and the const qualifier is used where appropriate; however, unlike libgsmefr, the interface to libgsmfrp is defined in terms of the gsm_byte type defined in <gsm.h>, included from <gsm_fr_preproc.h>.

State allocation and freeing
============================

The Rx DTX handler is stateful, hence you will need to allocate a preprocessor state structure in addition to the usual libgsm state structure for your GSM FR Rx session. The necessary function is:

extern struct gsmfr_preproc_state *gsmfr_preproc_create(void);

struct gsmfr_preproc_state is an opaque structure to library users: you only get a pointer which you remember and pass around, but <gsm_fr_preproc.h> does not give you a full definition of this struct. As a library user, you don't even get to know the size of this struct, hence the necessary malloc() operation happens inside gsmfr_preproc_create(). However, the structure is malloc'ed as a single chunk, hence when you are done with it, simply call free() on the pointer you got from gsmfr_preproc_create().

gsmfr_preproc_create() can fail if the malloc() call inside fails, in which case it returns NULL.

Preprocessing good frames
=========================

For every good traffic frame (BFI=0) you receive from the radio subsystem, you need to call this preprocessor function:

extern void gsmfr_preproc_good_frame(struct gsmfr_preproc_state *state,
					gsm_byte *frame);

The second argument is both input and output, i.e., the frame is modified in place. If the received frame is not SID (specifically, if the SID field deviates from the SID codeword by 16 or more bits, per GSM 06.31 section 6.1.1), then the frame (considered a good speech frame) will be left unmodified (i.e., it is to be passed unchanged to the GSM 06.10 decoder), but preprocessor state will be updated.
OTOH, if the received frame is classified as either valid or invalid SID per GSM 06.31, then the output frame will contain comfort noise generated by the preprocessor using a PRNG, or a silence frame in one particular corner case.

GSM-FR RTP (or libgsm) 0xD magic: the upper nibble of the first byte can be anything on input to gsmfr_preproc_good_frame(), but the output frame will always have the correct magic in it.

Handling BFI conditions
=======================

If you received a lost/missing frame indication instead of a good traffic frame, call this preprocessor function:

extern void gsmfr_preproc_bfi(struct gsmfr_preproc_state *state, int taf,
				gsm_byte *frame_out);

TAF is a flag defined in GSM 06.31 section 6.1.1; if you don't have this flag, pass 0 - you will lose the function of comfort noise muting in the event of prolonged SID loss, but all other Rx DTX functions will still work the same.

With this function the 33-byte frame buffer is only an output, i.e., prior buffer content is a don't-care and there is no provision for making any use of erroneous frames like in EFR. The frame generated by the preprocessor may be substitution/muting, comfort noise or silence depending on the state.

Other miscellaneous functions
=============================

extern void gsmfr_preproc_reset(struct gsmfr_preproc_state *state);

This function resets the preprocessor state to what it is right out of gsmfr_preproc_create(); the latter is naturally just a combination of malloc() and gsmfr_preproc_reset(). Given that our Rx DTX handler state is much simpler than, for example, EFR codec state, there does not seem to be any need for explicit resets, but the reset function is made public for the sake of completeness.
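
The SID classification that gsmfr_preproc_good_frame() applies to every good frame boils down to thresholding the number of bits that deviate from the SID codeword. The following is only a minimal sketch of that thresholding step, not the library's code: the function name sid_class_from_deviations() is hypothetical, the counting of deviating bits is left out entirely, and the lower threshold (fewer than 2 deviating bits for a valid SID) is our reading of GSM 06.31 section 6.1.1 rather than something stated in this document, which quotes only the threshold of 16.

```c
/*
 * Hypothetical helper, NOT part of libgsmfrp: map the number of bits
 * by which the SID field deviates from the SID codeword to the SID
 * class of GSM 06.31 section 6.1.1.
 *
 * Assumption: the lower threshold of 2 is our reading of the spec;
 * only the threshold of 16 is quoted in this document.
 */
int sid_class_from_deviations(unsigned deviations)
{
	if (deviations >= 16)
		return 0;	/* SID=0: treat as a good speech frame */
	if (deviations >= 2)
		return 1;	/* SID=1: invalid SID */
	return 2;		/* SID=2: valid SID */
}
```

Under these assumptions a frame with 0 or 1 deviating bits is treated as valid SID, 2 through 15 deviating bits make it an invalid SID, and anything from 16 up is a good speech frame that the preprocessor passes through unmodified.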
extern int gsmfr_preproc_sid_classify(const gsm_byte *frame);

This function analyzes an RTP-encoded FR frame (the upper nibble of the first byte is NOT checked for 0xD signature) for the SID codeword of GSM 06.12 and classifies the frame as SID=0, SID=1 or SID=2 per the rules of GSM 06.31 section 6.1.1.

Silence frame datum
===================

extern const gsm_frame gsmfr_preproc_silence_frame;

Many implementors make the mistake of thinking that a GSM FR silence frame is a frame of 260 zero bits, but the official specs disagree: the silence frame given in GSM 06.11 (3GPP TS 46.011, at the very end of the spec) is quite different. Themyscira libgsmfrp implements the correct silence frame per the spec, and that datum is also made public.

libgsmfrp change history: version 1.0.1 to version 1.0.2
========================================================

There are only two changes, both involving corner cases with invalid SID frames being received:

1) An invalid SID frame was received immediately following a good speech frame. In this case we start CN generation, but we take the needed LARc and Xmaxc parameters from the last speech frame, instead of the usual procedure of extracting them from a valid SID frame. The change from 1.0.1 to 1.0.2 concerns the Xmaxc parameter in this corner case: in 1.0.1 we took Xmaxc from the last subframe and used it for ensuing CN generation, but in 1.0.2 we compute a more proper mean Xmaxc from all 4 subframes, by dequantizing, summing and requantizing.

2) An invalid SID frame was received in the speech muting state. The sequence of inputs would have to be:

   - a good speech frame;
   - one or more BFIs, but not too many, so that the cached speech frame does not decay fully by Xmaxc reduction;
   - an invalid SID frame.

   In version 1.0.1 we handled this even more obscure corner case by entering the CN muting state, i.e., the state that is normally entered upon the second lost SID.
   In version 1.0.2 we ignore invalid SID in the speech muting state and act as if we got BFI, i.e., continue speech muting rather than switch to CN muting.

libgsmfrp change history: version 1.0.0 to version 1.0.1
========================================================

Version 1.0.0 exhibited the following defects, which are fixed in 1.0.1:

1) The last received valid SID was cached forever for the purpose of handling future invalid SIDs - we could have received some valid SID ages ago, then lots of speech or NO_DATA, and if we then get an invalid SID, we would resurrect the last valid SID from ancient history - a bad design. In our new design, we handle invalid SID based on the current state, much like BFI.

2) GSM 06.11 spec says clearly that after the second lost SID (received BFI=1 && TAF=1 in CN state) we need to gradually decrease the output level, rather than jump directly to emitting silence frames - we previously failed to implement such logic.

3) Per GSM 06.12 section 5.2, Xmaxc should be the same in all 4 subframes in a SID frame. What should we do if we receive an otherwise valid SID frame with different Xmaxc? Our previous approach would replicate this Xmaxc oddity in every subsequent generated CN frame, which is rather bad. In our new design, the very first CN frame (which can be seen as a transformation of the SID frame itself) retains the original 4 distinct Xmaxc, but all subsequent CN frames are based on the Xmaxc from the last subframe of the most recent SID.