view TCH-tap-modes @ 98:915ff61137ee
Speech-codec-selection: document MSCAP
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Tue, 06 Jun 2023 01:47:36 +0000 |
It has been discovered that the DSP ROM in the Calypso GSM baseband processor makes it possible to "tap" into speech traffic on GSM traffic channels (TCH):

1) In the downlink direction, the signal processing chain which every GSM MS must implement includes a GSM 05.03 channel decoder, operating in one of several variants as necessary for each supported TCH mode, followed by speech decoders for each supported codec. TI's DSP naturally implements this required signal processing chain, and this implementation includes one nifty feature: the bits that make up the internal interface from GSM 05.03 channel decoder output to the input of speech decoders are written into the NDB API RAM page that is also accessible to the ARM core, and these bits can be externally read out. The act of reading these bits is completely non-invasive (we are only reading bits that are already there, not modifying anything), thus we can sniff TCH downlink on any voice call in real time without disrupting or impacting standard type-approved GSM MS operation in any way.

2) In the uplink direction, there is a reverse signal processing chain in which the output of the internal speech encoder for the selected codec feeds into the input of the corresponding GSM 05.03 channel encoder. In this direction there are two tapping possibilities:

2a) There is a buffer in the NDB API RAM page from which one can read the bits that pass from the speech encoder output to the channel encoder input - let's call this form of TCH tap "uplink sniffing";

2b) There is a special mode in which the output of the internal speech encoder is effectively suppressed and the input to the channel encoder comes from another NDB API RAM buffer that needs to be filled by ARM firmware - let's call this form of TCH tap "uplink substitution".

Sources of knowledge about these DSP functions
==============================================

For the functions of TCH DL sniffing (tap 1 in the above summary) and TCH UL substitution (tap 2b in the above summary), the primary source of knowledge is the defunct '#if TRACE_TYPE==3' code in TSM30 and LoCosto L1 sources. I call this code defunct because the TRACE_TYPE preprocessor symbol is set to 4 (not 3) in both TCS211 and LoCosto versions, and appears to be set to 0 (all trace disabled) in the ancient TSM30 build. This code appears to be some very old test mode, apparently sending some test bit patterns into TCH UL and expecting the same bit patterns back on TCH DL, presumably with a test instrument such as CMU200 providing a loopback from UL to DL on this test TCH, and has only survived in an incomplete form:

* There are '#if TRACE_TYPE==3' stanzas in l1_cmplx.c, in both TSM30 and LoCosto versions, that implement DSP buffer writing for TCH UL substitution (TCH/F only) and timing control for TCH DL buffer reading (both TCH/F and TCH/H), calling a function named play_trace() for the latter.

* There is no play_trace() code in the LoCosto source, but there is an hw_debug.c source module in the TSM30 code drop under MCU/Layer1/L1c/Src, and it contains (presumed) TI-legacy play_trace() and play_diagnostics() functions, once again under '#if (TRACE_TYPE==3)'. play_trace() reads the DSP's TCH DL buffer and saves the bits in an ARM firmware RAM buffer, and then play_diagnostics() analyzes the captured booty - and studying the second function is how we learn the apparent original intent of doing test bit patterns on TCH.

* The code that feeds "UL play" test bit patterns to the earlier-mentioned '#if TRACE_TYPE==3' TCH UL substitution code in l1_cmplx.c (apparently once hacked into dll_read_dcch() and tx_tch_data()) has not been found anywhere.

For TCH tap 2a in our summary at the beginning of this article (non-invasive sniffing of TCH UL bits produced by the internal speech encoder) there does not exist any authoritative source of knowledge. It naturally follows from otherwise-known Calypso DSP architecture that these internally produced TCH UL bits should reside in the "main" a_du_0 buffer (or in a_du_1 when TCH/H subchannel 1 is active), and I (Mother Mychaela) have heard an anecdotal report (from someone who once worked with Calypso in a non-community-based manner) that these UL bits could indeed be read out of this buffer - but in the absence of an authoritative source, we don't know when the correct time to read this buffer would be.

In our current state of knowledge, only TCH DL sniffing can be exercised safely: for UL sniffing we don't know the correct time when the buffer would need to be read, while active UL substitution is obviously an invasive hack involving a DSP debug or test feature that is never used in standard GSM MS operation.

Support for different speech codecs
===================================

When it comes to passively sniffing TCH DL and/or UL, we are merely reading bits that are already there, and basic reasoning tells us that the DSP's DL and UL buffers involved in this exercise exist in all speech TCH modes supported by the DSP: FR1, HR1, EFR and AMR. However:

* The ancient '#if TRACE_TYPE==3' reference code exists only for FR1, HR1 and EFR - it clearly predates the addition of AMR in the later Calypso DSP versions.

* FR1, HR1 and EFR are the only codecs for which we (FreeCalypso community) know the format in which TCH DL bits appear in the DSP's a_dd_0 and a_dd_1 buffers.

* I (Mother Mychaela) have heard an anecdotal report (from the same non-community-based party mentioned earlier) that TCH DL bits could be read out of a_dd_0 buffer in TCH/AFS (AMR) mode - but I never got any details.

In contrast with passive sniffing, active TCH UL substitution requires explicit support from the DSP - and this explicit DSP support is known to exist for certain only for TCH/FS and TCH/EFS channel modes, i.e., for FR1 and EFR codecs only. In the case of TCH/HS channel mode (HR1 codec), it *appears* that the DSP supports UL substitution in this mode too, but this combination has only been exercised by OsmocomBB people (the original '#if TRACE_TYPE==3' code for UL play only supports TCH/F), and FreeCalypso policy is to treat everything coming out of OBB as highly suspect.

What about AMR? The anecdotal report (from the same already-mentioned party) is that TCH UL substitution that works for FR1 and EFR appears to NOT work for AMR - that's all I know - but frankly speaking, given that it's a weird DSP debug mode that is never needed in standard GSM MS operation, I find it more surprising that it works for FR1 and EFR than the observation that it doesn't work for AMR.

FreeCalypso support for TCH tap functions
=========================================

TCH DL sniffing and UL substitution provisions were initially implemented in FreeCalypso back in 2016, but only in the Citrine version, which was deemed to be a dead end later that same year. However, this functionality is now being resurrected, and it has been incorporated into our production FC Tourmaline firmware as of 2022-12-13.

In order to activate the function of TCH DL sniffing and save the recording of a TCH DL session into a file, one needs to use the fc-shell utility from FC host tools, specifically the tch record command in an interactive fc-shell session.

The format in which TCH DL tap traffic is passed over RVTMUX (an original FreeCalypso invention) has changed in a slight but incompatible way between the original hackish version from 2016 and the new production version as of 2022, and capturing TCH DL with new firmware requires the updated version of fc-shell that will be released as part of fc-host-tools-r18.

The current (late 2022) incarnation of FreeCalypso TCH DL sniffing feature supports FR1, HR1 and EFR codecs, although only FR1 and EFR have been tested so far.

The function of TCH UL substitution is currently implemented in FC Tourmaline only for FR1 and EFR (no HR1, no AMR), and it likewise requires running an interactive fc-shell session in which you would invoke the tool's tch play command. In the case of TCH UL play feature there has been NO change in the RVTMUX transport format between 2016 and 2022 versions.

TCH DL DSP buffers and capture format
=====================================

The DSP's NDB API page has two buffers in which TCH DL bits appear: a_dd_0 and a_dd_1. All TCH/F modes use a_dd_0, but TCH/H uses one buffer or the other depending on the subchannel: subchannel 0 uses a_dd_0, subchannel 1 uses a_dd_1. (It is certainly a strange design - the DSP won't be able to receive and decode the "wrong" subchannel because it doesn't know the ciphering key for the other MS - but perhaps the designers of this DSP architecture aeons ago found this design to somehow flow more naturally with their scheduling of DSP tasks.) Each buffer consists of 22 16-bit words - they were originally 20 words, but then extended to 22 words to support CSD 14.4 kbps mode.

Each TCH buffer in the DSP's NDB API page consists of 3 status or header words followed by N words of payload, where N depends on TCH mode: 17 for TCH/FS and TCH/EFS, 8 for TCH/HS, and not-yet-studied for AMR and CSD.
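
The buffer layout just described can be illustrated with a small host-side Python sketch. It is only an illustration under stated assumptions: the function names and the "FS"/"EFS"/"HS" mode strings are hypothetical and do not come from TI or FreeCalypso sources, and the input is assumed to be a list of 22 integers already read out of a_dd_0 or a_dd_1 by some other means.

```python
# Sketch only: names and mode strings here are hypothetical.
HEADER_WORDS = 3      # status word 0, status word 1, status word 2
BUFFER_WORDS = 22     # total size of a_dd_0 / a_dd_1 in 16-bit words

# Payload length in 16-bit words for the modes covered in the text.
# AMR and CSD are deliberately absent: their layout has not been studied.
PAYLOAD_WORDS = {"FS": 17, "EFS": 17, "HS": 8}


def select_dl_buffer_name(mode, subchannel=0):
    """Return which NDB buffer carries TCH DL bits for this channel."""
    if mode == "HS" and subchannel == 1:
        return "a_dd_1"
    return "a_dd_0"   # all TCH/F modes, and TCH/H subchannel 0


def split_tch_dl_buffer(words, mode):
    """Split a 22-word buffer into (3 status words, payload words)."""
    if len(words) != BUFFER_WORDS:
        raise ValueError("expected %d words" % BUFFER_WORDS)
    if mode not in PAYLOAD_WORDS:
        raise ValueError("payload layout not known for mode %r" % mode)
    n = PAYLOAD_WORDS[mode]
    header = tuple(words[:HEADER_WORDS])
    payload = list(words[HEADER_WORDS:HEADER_WORDS + n])
    return header, payload
```

For TCH/FS and TCH/EFS the payload then spans buffer words 3 through 19, and for TCH/HS words 3 through 10; the remaining words up to index 21 are not part of the payload in these modes.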

Let's begin our analysis with the 3 status words that make up the buffer header:

Status word 0 (a_dd_0[0] or a_dd_1[0]) is a word of flag bits. We don't know the meaning of every bit in this word, but at least for TCH/FS and TCH/EFS (we haven't exercised TCH/HS at all) we know the following bits:

* Bit 15 (B_BLUD) is a "buffer filled" or "data present" flag. This flag is observed as 1 in *almost* every 20 ms window in which a traffic frame is expected (fn_report_mod13_mod4 == 0 in l1s_read_dedic_dl(), case TCHTF), except for certain instances early in the call setup process which remain to be studied.

* Bit 14 (B_AF) will be set if the block of 8 half-bursts (block diagonal interleaving of GSM 05.03) corresponding to this buffer was channel-decoded as speech rather than as FACCH - see further analysis below.

* Bit 9 (B_ECRC) has only ever been observed as 1 when B_AF is set, i.e., when the speech-not-FACCH channel decoder was invoked. In the case of TCH/EFS this bit is set to 1 if the EFR-added CRC-8 was bad, and cleared if this CRC-8 was good; in the case of TCH/FS this bit has always been observed as 1 and should be ignored because there is no CRC-8 in TCH/FS.

* Bit 7 has always been observed as 1 whenever B_BLUD is set but B_AF is cleared, i.e., whenever the block was channel-decoded in FACCH rather than speech mode.

* Bits 6:5 indicate the result of FIRE decoding in the event that the FACCH decoder was invoked.

* Bits 4:3 carry the ternary SID flag encoded as in section 6.1.1 of GSM 06.31 and 06.81, but only when the speech-not-FACCH channel decoder was invoked as indicated by B_AF.

* Bit 2 is BFI as defined in section 6.1.1 of GSM 06.31 and 06.81. Whenever the block was decoded as FACCH (bit 14 clear, bit 7 set), bit 2 has always been observed as set, agreeing with the stipulation in GSM 06.31 and 06.81 that BFI=1 whenever a FACCH frame has been received.
  However, in the case of TCH/EFS it appears that CRC-8 status (reported in bit 9) is NOT factored into the logic that sets bit 2 - it appears that the subsequent speech decoding logic is expected to OR bits 2 and 9 together to get the BFI flag for the Rx DTX handler of GSM 06.81.

In the case of 20 ms blocks (reassembled from 8 half-bursts) that were channel-decoded as speech rather than FACCH, the observed behavior is that bits 15 and 14 are set, the payload portion of the buffer is filled with the output from the channel decoder, and bits 4:3 are set from this payload by the bit-counting rule of section 6.1.1 of GSM 06.31 and 06.81 irrespective of the good-or-bad status in bits 2 and 9. However, when bit 14 is clear and bit 7 is set, indicating that the block (from 8 half-bursts) was channel-decoded in FACCH mode, the following additional behavior is observed:

* The payload portion of the buffer remains unchanged from its previous content, last written when a frame was channel-decoded in speech-not-FACCH mode;

* Bit 2 is set, bit 9 is cleared;

* Bits 4:3 are cleared even when they previously indicated SID based on the bit pattern in the payload portion of the buffer, even when that SID-encoding payload is still there.

In the standard TCH DL signal processing chain, GSM 05.03 channel decoding is followed by the Rx DTX handler of GSM 06.31 or 06.81 for TCH/FS or TCH/EFS, respectively. It appears that the Rx DTX handler implemented in TI's DSP is driven by this status word 0 at the head of the buffer, and we can only guess as to its exact logic.
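
The known bits of status word 0 can be picked apart with a few masks and shifts. The Python sketch below is a host-side illustration, not code from any firmware: B_BLUD, B_AF and B_ECRC are the names used above, while the class, field and function names are hypothetical. The OR of bits 2 and 9 for TCH/EFS encodes the apparent behavior described above, which is an observation rather than a confirmed rule.

```python
from dataclasses import dataclass


@dataclass
class TchDlStatus0:
    """Known fields of a_dd_0[0] / a_dd_1[0]; field names are hypothetical."""
    blud: bool      # bit 15, B_BLUD: buffer filled / data present
    af: bool        # bit 14, B_AF: block was channel-decoded as speech
    ecrc: bool      # bit 9, B_ECRC: EFR CRC-8 bad (meaningless for TCH/FS)
    bit7: bool      # bit 7: observed set when decoded as FACCH
    fire: int       # bits 6:5: FIRE decoding result (FACCH only)
    sid: int        # bits 4:3: ternary SID flag (speech blocks only)
    bfi_raw: bool   # bit 2: BFI as written by the channel decoder


def decode_status0(word):
    """Extract the bits whose meaning is known from status word 0."""
    return TchDlStatus0(
        blud=bool(word & 0x8000),
        af=bool(word & 0x4000),
        ecrc=bool(word & 0x0200),
        bit7=bool(word & 0x0080),
        fire=(word >> 5) & 0x3,
        sid=(word >> 3) & 0x3,
        bfi_raw=bool(word & 0x0004),
    )


def effective_bfi(status, mode):
    """BFI as a speech decoder would apparently need it (assumption).

    For TCH/EFS the CRC-8 result in bit 9 appears not to be folded into
    bit 2, so the two are ORed here.  For TCH/FS bit 9 is ignored.
    """
    if mode == "EFS":
        return status.bfi_raw or status.ecrc
    return status.bfi_raw
```

For example, a status word of 0xC200 would decode as a speech block with B_ECRC set; under the assumption above it counts as a bad frame in TCH/EFS mode, while the same word in TCH/FS mode counts as a good frame because bit 9 is ignored there.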

At this point it bears reminding that the functions of the Rx DTX handler are not rigidly prescribed in the specs: in the case of EFR the bit-exact reference implementation is normative only in certain aspects (e.g., comfort noise generation after receiving SID), but is considered a non-normative example in some other key aspects (all GSM 06.61 functions, including what happens when a FACCH block was received when speech frames were expected), and in the case of FR1 there is no bit-exact reference implementation at all, only general guidance.

Having the curiosity of a cat, I (Mother Mychaela) naturally desire to know exactly how the Rx DTX handler (the bridge between the channel decoder and the speech decoder) works in TI's DSP. A full static reversing job on the DSP ROM would provide complete answers, but is a very daunting proposition, thus I am also looking at the idea of behavioral analysis: the output of the speech decoder can be captured from MCSI on FCDEV3B hardware, or from the VSP tap on FC Venus if we ever build that board, and if we combine that speech decoder output capture with the currently-discussed capture of TCH DL buffers, we may be able to glean some insight into the workings of the Rx DTX handler block: we could implement a candidate Rx DTX handler clone in software and compare the output (of this proposed handler followed by the spec-defined speech decoder) against the actual speech output from the DSP.

Back to our exposition of TCH DL buffer content:

Status word 1 (a_dd_0[1] or a_dd_1[1]) is some kind of DSP measurement or count which Calypso ARM fw does not need to look at, except when debugging - the only code which I (Mother Mychaela) could find that does anything with this DSP status word is the ancient play_diagnostics() code in the TSM30 version (obviously never included in any production fw); this code looks at the unknown word in question and calls it "D_MACC".
This play_diagnostics() code compares the D_MACC reading against a threshold, and if the per-block reading is below the threshold, an error message is printed. That's all we know!

Status word 2 (a_dd_0[2] or a_dd_1[2]) is a bit error count: the code in l1s_read_dedic_dl() reads this error count and uses it for RXQUAL computation for measurement reports.

If one's area of interest is in replicating Rx DTX handling and speech decoding that happens in the DSP, status words 1 and 2 can probably be ignored - instead the important parts are status word 0 (extensively covered above) and the payload portion of the buffer.

The payload portion of the buffer consists of some number of 16-bit words: 17 of them for TCH/FS and TCH/EFS, or 8 of them for TCH/HS. The DSP does not have any notion of 8-bit bytes, instead it operates on 16-bit words as its elementary data unit. The ordering of bits within these 16-bit words (in the payload portion of TCH buffers) is from the most-significant bit toward the least-significant bit, thus when these TCH buffers are transferred via octet-oriented interfaces, the upper byte of each word should be transferred first, even though this byte order is counter to the little-endian byte order of the Calypso ARM core.

In the case of TCH/FS and TCH/EFS, the fill order of bits in the payload words is as follows, starting with the most-significant bit of buffer word 3 (first word of the payload portion):

* 182 bits of class 1;
* 4 dummy bits (always observed as 0);
* 78 bits of class 2;
* the last 8 bits of a_dd_0[19] are unused.

In the case of TCH/HS, the fill order is similar, but modified as appropriate for TCH/HS:

* 95 bits of class 1;
* 4 dummy bits;
* 17 bits of class 2;
* the last 12 bits of a_dd_0[10] or a_dd_1[10] are unused.

Aside from the insertion of 4 extra dummy bits at the boundary between class 1 and class 2, the overall bit order is that of GSM 05.03 Figure 1 interface 1.
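
The fill order above can be made concrete with a short Python sketch that walks the payload words from the most-significant bit down and cuts the resulting bit string into class 1, dummy and class 2 parts. All function and table names are hypothetical, and the sketch assumes the caller has already separated the payload words (17 or 8 of them) from the 3 status words.

```python
# Sketch only: names are hypothetical; numbers are taken from the text.
# mode -> (payload words, class 1 bits, class 2 bits)
LAYOUT = {"FS": (17, 182, 78), "EFS": (17, 182, 78), "HS": (8, 95, 17)}
DUMMY_BITS = 4   # inserted between class 1 and class 2


def words_to_bits(words):
    """Expand 16-bit words into a list of bits, MSB of each word first."""
    bits = []
    for w in words:
        for shift in range(15, -1, -1):
            bits.append((w >> shift) & 1)
    return bits


def words_to_bytes(words):
    """Serialize words for an octet-oriented interface, upper byte first."""
    out = bytearray()
    for w in words:
        out.append((w >> 8) & 0xFF)
        out.append(w & 0xFF)
    return bytes(out)


def split_payload(payload_words, mode):
    """Return (class1, dummy, class2) bit lists for one TCH DL frame."""
    nwords, n1, n2 = LAYOUT[mode]
    if len(payload_words) != nwords:
        raise ValueError("expected %d payload words" % nwords)
    bits = words_to_bits(payload_words)
    class1 = bits[:n1]
    dummy = bits[n1:n1 + DUMMY_BITS]
    class2 = bits[n1 + DUMMY_BITS:n1 + DUMMY_BITS + n2]
    return class1, dummy, class2   # trailing bits of the last word unused
```

Concatenating class1 and class2 (that is, dropping the 4 dummy bits) should then give the 260-bit or 112-bit frame in the order of GSM 05.03 interface 1, per the statement above.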

In the case of TCH/EFS, the following additional considerations apply:

* Bits [65:73] in all received DL frames, where CRC-8 would go in the 260-bit frame of GSM 05.03 interface 1 for EFR, are always observed as 0, whether this CRC-8 was good (a_dd_0[0] bit 9 clear) or bad (a_dd_0[0] bit 9 set).

* The handling of repetition bits (4 bits of 244-bit EFR codec frame, each of which is triplicated in the channel encoding for transmission) is unclear.

Further detail regarding the repetition bits of TCH/EFS: distinct bit positions exist in the 260-bit frame of GSM 05.03 interface 1 (which is the frame format in the TCH buffers of TI's DSP) for each of the 3 copies of each of the 4 triplicated bits. It is obvious that correct decoding of these triplicated bits requires a majority-vote function just like the one implemented in TMR systems in space gear - but it is not absolutely and unquestionably obvious where this TMR voting function is implemented in the Rx processing chain of TI's DSP. It *appears* that this majority-vote function has already been performed by the DSP function that writes a_dd_0, and that the first bit position out of each group of 3 holds the output of this voting function, so that the subsequent speech decoder only needs to use those "cooked" bits - but there is this mystery:

* At certain times, particularly during the main part of a test call, TCH DL buffer readouts contain zeros in the "extra" repetition bit positions: for each group of 3 bits, the first will contain 0 or 1, but the other two will always be 0.

* At other times, seemingly in the beginning and ending parts of test calls, TCH DL buffer readouts contain matching bit values in all 3 positions: for each group of 3 bits, if the first bit is 0, the other two will also be 0, or if the first bit is 1, then the other two will also be 1.

One possibility is that the DSP applies the required majority-voting function, writes its output into the first bit position of each group of 3, but then sometimes (and not at other times) applies another function that writes the voting function output into the remaining bit positions, perhaps for loopback of TCH DL into TCH UL. More study is needed in this area.

FreeCalypso file format for TCH DL captures
===========================================

The file format written by fc-shell tch record command is ASCII hex, line-based, with one line for every captured 20 ms window. The new format as of 2022 is:

* Each line begins with an FR, HR or EFR keyword indicating which variant of TCH DL has been captured;

* This keyword is followed by 3 space-separated DSP status words, each written as 4 hex digits;

* The main body of the frame is written as 33 (TCH/FS & TCH/EFS) or 15 (TCH/HS) hex bytes, produced from the payload portion of the TCH DL buffer by turning each 16-bit word into 2 bytes (MSB first) and discarding the last byte that is unused (always 0);

* Each line ends with a frame number in decimal, specifically the value of fn_mod_104 variable in the l1s_read_dedic_dl() function when the DSP buffer was read.

The addition of the frame number field allows these TCH DL captures to be reconciled against the SACCH multiframe structure, which matters for the rules of DTX.

TCH UL substitution: open questions
===================================

Moving from the mostly-understood realm of TCH DL capture into the much more experimental realm of TCH UL substitution, we have some open questions: how does this DSP special mode really work?
Here is what we know: if we load externally sourced speech frames into otherwise-unused a_du_1 DSP buffer at the time of (fn_report_mod13_mod4 == 3), which is the same time when FACCH or CSD UL would be expected, and set B_PLAY_UL bit in DSP NDB API word d_tch_mode, the speech frame stream going to the other end of the call will be the one we feed into a_du_1 instead of the one produced from the microphone input by the internal speech encoder. But here are the parts we don't know:

* If one were to set B_PLAY_UL in d_tch_mode but not feed external UL input into a_du_1 buffer at the needed time, what will happen?

* Vice-versa, if one were to load a_du_1 and set its B_BLUD bit without setting B_PLAY_UL in d_tch_mode, what will happen?

* Can the frame stream fed into a_du_1 be encoded in DTX-enabled mode, including SID frames? If this possibility is allowed, what magic bits would need to be set where in order to get the correct behavior from the DSP's subsequent burst-by-burst DTX logic?

TCH UL substitution: implemented PoC
====================================

Back in 2016 we implemented a proof-of-concept TCH UL play feature in FreeCalypso (only for TCH/FS and TCH/EFS), and the same PoC was retained when the overall TCH tap facility was mainlined in late 2022. Having this highly experimental (not fit for production use) TCH UL play code present in our current production fw is deemed acceptable because this code will never be invoked unless the user sends TCH_ULBITS_REQ packets to the running fw via RVTMUX - and if you do send such packets (via tch play command in an fc-shell session or by any other means), you are leaving the realm of production-approved functionality and entering the realm of wild experimentation.

The PoC TCH UL play mechanism consists of a small buffer (holding up to 4 FR1 or EFR frames) implemented in the ARM firmware; this buffer is filled by arriving TCH_ULBITS_REQ packets and drained by the tchf_substitute_uplink() function called from l1s_ctrl_tchtf(). Specifically, a flag named tch_ul_play_mode is set when TCH_ULBITS_REQ input is received, telling l1s_ctrl_tchtf() to start calling tchf_substitute_uplink() when (fn_report_mod13_mod4 == 3); the called function drains an uplink frame from the ring buffer, writes it into the DSP's a_du_1 buffer, sets B_PLAY_UL in d_tch_mode and sends a TCH_ULBITS_CONF packet back to the host. If the ring buffer is empty, the function clears both B_PLAY_UL and the firmware's tch_ul_play_mode flag, ending the special TCH UL play mode.

This PoC mechanism is meant to be exercised with tch play command in an interactive fc-shell session: this command reads an ASCII line-based uplink data file and sends it to the firmware frame by frame, paced by TCH_ULBITS_CONF packets from the target. The input to this command is a line-based ASCII hex file similar to the format written by tch record, but simplified: each line is just the 33-byte frame to be sent (in TI DSP buffer format, following GSM 05.03 interface 1), without any flags or status words or frame numbers.
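
The relationship between the two file formats can be sketched in Python as a parser for one tch record line plus a helper that reduces it to a tch play line. This is a sketch under assumptions, not a description of what fc-shell does: the function names are hypothetical, the text above does not say whether the frame bytes are written contiguously or separated by spaces (the parser accepts either), and the choice to emit uppercase contiguous hex and to skip blocks that were not decoded as speech (B_AF clear, so the payload is stale) is made here purely for illustration.

```python
# Sketch only: function names and output formatting are assumptions.
FRAME_BYTES = {"FR": 33, "EFR": 33, "HR": 15}


def parse_record_line(line):
    """Parse one 2022-format 'tch record' line.

    Returns (keyword, (status0, status1, status2), frame_bytes, fn_mod_104).
    """
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("too few fields")
    keyword = fields[0]
    if keyword not in FRAME_BYTES:
        raise ValueError("unknown keyword %r" % keyword)
    status = tuple(int(f, 16) for f in fields[1:4])
    # Everything between the status words and the trailing frame number
    # is frame body, whether written as one token or as separate bytes.
    frame = bytes.fromhex("".join(fields[4:-1]))
    if len(frame) != FRAME_BYTES[keyword]:
        raise ValueError("wrong frame length for %s" % keyword)
    fn_mod_104 = int(fields[-1], 10)
    return keyword, status, frame, fn_mod_104


def record_to_play_line(line):
    """Reduce a record line to a bare 33-byte hex line, or None to skip."""
    keyword, status, frame, _fn = parse_record_line(line)
    if keyword == "HR":
        raise ValueError("UL play PoC covers FR1 and EFR only")
    speech_block = (status[0] & 0xC000) == 0xC000   # B_BLUD and B_AF set
    if not speech_block:
        return None   # payload would be left over from an earlier frame
    return frame.hex().upper()
```

Whether replaying captured downlink frames into the uplink is meaningful for a given experiment is a separate question; the sketch only shows how the fields of the two formats line up.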