diff TCH-tap-modes @ 95:8a45cd92e3c3

TCH-tap-modes: new article
author Mychaela Falconia <falcon@freecalypso.org>
date Mon, 19 Dec 2022 02:02:28 +0000
parents
children 28c1cb869d91
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/TCH-tap-modes	Mon Dec 19 02:02:28 2022 +0000
@@ -0,0 +1,416 @@
+It has been discovered that the DSP ROM in the Calypso GSM baseband processor
+makes it possible to "tap" into speech traffic on GSM traffic channels (TCH):
+
+1) In the downlink direction, the signal processing chain which every GSM MS
+   must implement includes a GSM 05.03 channel decoder, operating in one of
+   several variants as necessary for each supported TCH mode, followed by speech
+   decoders for each supported codec.  TI's DSP naturally implements this
+   required signal processing chain, and this implementation includes one nifty
+   feature: the bits that make up the internal interface from GSM 05.03 channel
+   decoder output to the input of speech decoders are written into the NDB API
+   RAM page that is also accessible to the ARM core, and these bits can be
+   externally read out.  The act of reading these bits is completely
+   non-invasive (we are only reading bits that are already there, not modifying
+   anything), thus we can sniff TCH downlink on any voice call in real time
+   without disrupting or impacting standard type-approved GSM MS operation in
+   any way.
+
+2) In the uplink direction, there is a reverse signal processing chain in which
+   the output of the internal speech encoder for the selected codec feeds into
+   the input of the corresponding GSM 05.03 channel encoder.  In this direction
+   there are two tapping possibilities:
+
+2a) There is a buffer in the NDB API RAM page from which one can read the bits
+    that pass from the speech encoder output to the channel encoder input -
+    let's call this form of TCH tap "uplink sniffing";
+
+2b) There is a special mode in which the output of the internal speech encoder
+    is effectively suppressed and the input to the channel encoder comes from
+    another NDB API RAM buffer that needs to be filled by ARM firmware - let's
+    call this form of TCH tap "uplink substitution".
+
+Sources of knowledge about these DSP functions
+==============================================
+
+For the functions of TCH DL sniffing (tap 1 in the above summary) and TCH UL
+substitution (tap 2b in the above summary), the primary source of knowledge is
+the defunct '#if TRACE_TYPE==3' code in TSM30 and LoCosto L1 sources.  I call
+this code defunct because the TRACE_TYPE preprocessor symbol is set to 4 (not 3)
+in both TCS211 and LoCosto versions, and appears to be set to 0 (all trace
+disabled) in the ancient TSM30 build.  This code appears to be some very old
+test mode, apparently sending some test bit patterns into TCH UL and expecting
+the same bit patterns back on TCH DL, presumably with a test instrument such as
+CMU200 providing a loopback from UL to DL on this test TCH, and has only
+survived in an incomplete form:
+
+* There are '#if TRACE_TYPE==3' stanzas in l1_cmplx.c, in both TSM30 and LoCosto
+  versions, that implement DSP buffer writing for TCH UL substitution (TCH/F
+  only) and timing control for TCH DL buffer reading (both TCH/F and TCH/H),
+  calling a function named play_trace() for the latter.
+
+* There is no play_trace() code in the LoCosto source, but there is an
+  hw_debug.c source module in the TSM30 code drop under MCU/Layer1/L1c/Src,
+  and it contains (presumed) TI-legacy play_trace() and play_diagnostics()
+  functions, once again under '#if (TRACE_TYPE==3)'.  play_trace() reads the
+  DSP's TCH DL buffer and saves the bits in an ARM firmware RAM buffer, and
+  then play_diagnostics() analyzes the captured booty - and studying the second
+  function is how we learn the apparent original intent of doing test bit
+  patterns on TCH.
+
+* The code that feeds "UL play" test bit patterns to the earlier-mentioned
+  '#if TRACE_TYPE==3' TCH UL substitution code in l1_cmplx.c (apparently once
+  hacked into dll_read_dcch() and tx_tch_data()) has not been found anywhere.
+
+For TCH tap 2a in our summary at the beginning of this article (non-invasive
+sniffing of TCH UL bits produced by the internal speech encoder) there does not
+exist any authoritative source of knowledge.  It naturally follows from
+otherwise-known Calypso DSP architecture that these internally produced TCH UL
+bits should reside in the "main" a_du_0 buffer (or in a_du_1 when TCH/H
+subchannel 1 is active), and I (Mother Mychaela) have heard an anecdotal report
+(from someone who once worked with Calypso in a non-community-based manner) that
+these UL bits could indeed be read out of this buffer - but in the absence of
+an authoritative source, we don't know the correct time at which to read
+this buffer.
+
+In our current state of knowledge, only TCH DL sniffing can be exercised safely:
+for UL sniffing we don't know the correct time when the buffer would need to be
+read, while active UL substitution is obviously an invasive hack involving a DSP
+debug or test feature that is never used in standard GSM MS operation.
+
+Support for different speech codecs
+===================================
+
+When it comes to passively sniffing TCH DL and/or UL, we are merely reading bits
+that are already there, and basic reasoning tells us that the DSP's DL and UL
+buffers involved in this exercise exist in all speech TCH modes supported by
+the DSP: FR1, HR1, EFR and AMR.  However:
+
+* The ancient '#if TRACE_TYPE==3' reference code exists only for FR1, HR1 and
+  EFR - it clearly predates the addition of AMR in the later Calypso DSP
+  versions.
+
+* FR1, HR1 and EFR are the only codecs for which we (FreeCalypso community) know
+  the format in which TCH DL bits appear in the DSP's a_dd_0 and a_dd_1 buffers.
+
+* I (Mother Mychaela) have heard an anecdotal report (from the same
+  non-community-based party mentioned earlier) that TCH DL bits could be read
+  out of a_dd_0 buffer in TCH/AFS (AMR) mode - but I never got any details.
+
+In contrast with passive sniffing, active TCH UL substitution requires explicit
+support from the DSP - and this explicit DSP support is known to exist for
+certain only for TCH/FS and TCH/EFS channel modes, i.e., for FR1 and EFR codecs
+only.  In the case of TCH/HS channel mode (HR1 codec), it *appears* that the DSP
+supports UL substitution in this mode too, but this combination has only been
+exercised by OsmocomBB people (the original '#if TRACE_TYPE==3' code for UL play
+only supports TCH/F), and FreeCalypso policy is to treat everything coming out
+of OBB as highly suspect.
+
+What about AMR?  The anecdotal report (from the same already-mentioned party) is
+that TCH UL substitution that works for FR1 and EFR appears to NOT work for AMR
+- that's all I know - but frankly speaking, given that it's a weird DSP debug
+mode that is never needed in standard GSM MS operation, I find it more
+surprising that it works for FR1 and EFR than that it doesn't work for AMR.
+
+FreeCalypso support for TCH tap functions
+=========================================
+
+TCH DL sniffing and UL substitution provisions were initially implemented in
+FreeCalypso back in 2016, but only in the Citrine version, which was deemed to
+be a dead end later that same year.  However, this functionality is now being
+resurrected, and it has been incorporated into our production FC Tourmaline
+firmware as of 2022-12-13.
+
+In order to activate the function of TCH DL sniffing and save the recording of a
+TCH DL session into a file, one needs to use the fc-shell utility from FC host
+tools, specifically the tch record command in an interactive fc-shell session.
+The format in which TCH DL tap traffic is passed over RVTMUX (an original
+FreeCalypso invention) has changed in a slight but incompatible way between the
+original hackish version from 2016 and the new production version as of 2022,
+and capturing TCH DL with new firmware requires the updated version of fc-shell
+that will be released as part of fc-host-tools-r18.  The current (late 2022)
+incarnation of FreeCalypso TCH DL sniffing feature supports FR1, HR1 and EFR
+codecs, although only FR1 and EFR have been tested so far.
+
+The function of TCH UL substitution is currently implemented in FC Tourmaline
+only for FR1 and EFR (no HR1, no AMR), and it likewise requires running an
+interactive fc-shell session in which you would invoke the tool's tch play
+command.  In the case of TCH UL play feature there has been NO change in the
+RVTMUX transport format between 2016 and 2022 versions.
+
+TCH DL DSP buffers and capture format
+=====================================
+
+The DSP's NDB API page has two buffers in which TCH DL bits appear: a_dd_0 and
+a_dd_1.  All TCH/F modes use a_dd_0, but TCH/H uses one buffer or the other
+depending on the subchannel: subchannel 0 uses a_dd_0, subchannel 1 uses a_dd_1.
+(It is certainly a strange design - the DSP won't be able to receive and decode
+the "wrong" subchannel because it doesn't know the ciphering key for the other
+MS - but perhaps the designers of this DSP architecture aeons ago found this
+design to somehow flow more naturally with their scheduling of DSP tasks.)  Each
+buffer consists of 22 16-bit words - they were originally 20 words, but were
+later extended to 22 words to support CSD 14.4 kbps mode.
+
+Each TCH buffer in the DSP's NDB API page consists of 3 status or header words
+followed by N words of payload, where N depends on TCH mode: 17 for TCH/FS and
+TCH/EFS, 8 for TCH/HS, and not-yet-studied for AMR and CSD.  Let's begin our
+analysis with the 3 status words that make up the buffer header:
+
+Status word 0 (a_dd_0[0] or a_dd_1[0]) is a word of flag bits.  We don't know
+the meaning of every bit in this word, but at least for TCH/FS and TCH/EFS (we
+haven't exercised TCH/HS at all) we know the following bits:
+
+* Bit 15 (B_BLUD) is a "buffer filled" or "data present" flag.  This flag is
+  observed as 1 in *almost* every 20 ms window in which a traffic frame is
+  expected (fn_report_mod13_mod4 == 0 in l1s_read_dedic_dl(), case TCHTF),
+  except for certain instances early in the call setup process which remain to
+  be studied.
+
+* Bit 14 (B_AF) will be set if the block of 8 half-bursts (block diagonal
+  interleaving of GSM 05.03) corresponding to this buffer was channel-decoded
+  as speech rather than as FACCH - see further analysis below.
+
+* Bit 9 (B_ECRC) has only ever been observed as 1 when B_AF is set, i.e., when
+  the speech-not-FACCH channel decoder was invoked.  In the case of TCH/EFS this
+  bit is set to 1 if the EFR-added CRC-8 was bad, and cleared if this CRC-8 was
+  good; in the case of TCH/FS this bit has always been observed as 1 and should
+  be ignored because there is no CRC-8 in TCH/FS.
+
+* Bit 7 has always been observed as 1 whenever B_BLUD is set but B_AF is
+  cleared, i.e., whenever the block was channel-decoded in FACCH rather than
+  speech mode.
+
+* Bits 6:5 indicate the result of FIRE decoding in the event that the FACCH
+  decoder was invoked.
+
+* Bits 4:3 carry the ternary SID flag encoded as in section 6.1.1 of GSM 06.31
+  and 06.81, but only when the speech-not-FACCH channel decoder was invoked as
+  indicated by B_AF.
+
+* Bit 2 is BFI as defined in section 6.1.1 of GSM 06.31 and 06.81.  Whenever
+  the block was decoded as FACCH (bit 14 clear, bit 7 set), bit 2 has always
+  been observed as set, agreeing with the stipulation in GSM 06.31 and 06.81
+  that BFI=1 whenever a FACCH frame has been received.  However, in the case of
+  TCH/EFS it appears that CRC-8 status (reported in bit 9) is NOT factored into
+  the logic that sets bit 2 - it appears that the subsequent speech decoding
+  logic is expected to OR bits 2 and 9 together to get the BFI flag for the Rx
+  DTX handler of GSM 06.81.
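+
+The bit assignments listed above can be summarized in a small host-side
+decoder.  This is only a sketch written in Python for illustration, not code
+from any FreeCalypso tool: decode_status0() and the B_BIT7 and B_BFI labels are
+hypothetical names, and the OR of bits 2 and 9 for TCH/EFS reflects the
+apparent (not confirmed) behavior described in the last bullet.
+
```python
# Status word 0 (a_dd_0[0] or a_dd_1[0]) bit masks, per the list above.
B_BLUD = 1 << 15  # buffer filled / data present
B_AF = 1 << 14    # block was channel-decoded as speech, not FACCH
B_ECRC = 1 << 9   # TCH/EFS: CRC-8 was bad; to be ignored on TCH/FS
B_BIT7 = 1 << 7   # hypothetical label: observed set on FACCH-decoded blocks
B_BFI = 1 << 2    # hypothetical label: BFI per GSM 06.31 / 06.81


def decode_status0(word, efr=False):
    """Split status word 0 into named fields (illustrative helper only)."""
    present = bool(word & B_BLUD)
    speech = bool(word & B_AF)
    bfi = bool(word & B_BFI)
    if efr and speech:
        # Assumption from observed behavior: the CRC-8 result is not folded
        # into bit 2, so the consumer ORs bits 2 and 9 together for EFR.
        bfi = bfi or bool(word & B_ECRC)
    return {
        "data_present": present,
        "speech": speech,
        "facch": present and not speech and bool(word & B_BIT7),
        "fire": (word >> 5) & 3,                     # only meaningful for FACCH
        "sid": ((word >> 3) & 3) if speech else None,
        "bfi": bfi,
    }
```
+
+For example, decode_status0(0xC200, efr=True) reports bfi as true because of
+the bad CRC-8, while the same word decoded with efr=False reports bfi as false.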
+
+In the case of 20 ms blocks (reassembled from 8 half-bursts) that were channel-
+decoded as speech rather than FACCH, the observed behavior is that bits 15 and
+14 are set, the payload portion of the buffer is filled with the output from the
+channel decoder, and bits 4:3 are set from this payload by the bit-counting rule
+of section 6.1.1 of GSM 06.31 and 06.81 irrespective of the good-or-bad status
+in bits 2 and 9.  However, when bit 14 is clear and bit 7 is set, indicating
+that the block (from 8 half-bursts) was channel-decoded in FACCH mode, the
+following additional behavior is observed:
+
+* The payload portion of the buffer remains unchanged from its previous content,
+  last written when a frame was channel-decoded in speech-not-FACCH mode;
+
+* Bit 2 is set, bit 9 is cleared;
+
+* Bits 4:3 are cleared even when they previously indicated SID based on the bit
+  pattern in the payload portion of the buffer, even when that SID-encoding
+  payload is still there.
+
+In the standard TCH DL signal processing chain, GSM 05.03 channel decoding is
+followed by the Rx DTX handler of GSM 06.31 or 06.81 for TCH/FS or TCH/EFS,
+respectively.  It appears that the Rx DTX handler implemented in TI's DSP is
+driven by this status word 0 at the head of the buffer, and we can only guess
+as to its exact logic.  At this point it bears reminding that the functions of
+the Rx DTX handler are not rigidly prescribed in the specs: in the case of EFR
+the bit-exact reference implementation is normative only in certain aspects
+(e.g., comfort noise generation after receiving SID), but is considered a non-
+normative example in some other key aspects (all GSM 06.61 functions, including
+what happens when a FACCH block was received when speech frames were expected),
+and in the case of FR1 there is no bit-exact reference implementation at all,
+only general guidance.
+
+Having the curiosity of a cat, I (Mother Mychaela) naturally desire to know
+exactly how the Rx DTX handler (the bridge between the channel decoder and the
+speech decoder) works in TI's DSP.  A full static reversing job on the DSP ROM
+would provide complete answers, but is a very daunting proposition, thus I am
+also looking at the idea of behavioral analysis: the output of the speech
+decoder can be captured from MCSI on FCDEV3B hardware, or from the VSP tap on
+FC Venus if we ever build that board, and if we combine that speech decoder
+output capture with the currently-discussed capture of TCH DL buffers, we may
+be able to glean some insight into the workings of the Rx DTX handler block: we
+could implement a candidate Rx DTX handler clone in software and compare the
+output (of this proposed handler followed by the spec-defined speech decoder)
+against the actual speech output from the DSP.
+
+Back to our exposition of TCH DL buffer content:
+
+Status word 1 (a_dd_0[1] or a_dd_1[1]) is some kind of DSP measurement or count
+which Calypso ARM fw does not need to look at, except when debugging - the only
+code which I (Mother Mychaela) could find that does anything with this DSP
+status word is the ancient play_diagnostics() code in the TSM30 version
+(obviously never included in any production fw); this code looks at the unknown
+word in question and calls it "D_MACC".  This play_diagnostics() code compares
+the D_MACC reading against a threshold, and if the per-block reading is below
+the threshold, an error message is printed.  That's all we know!
+
+Status word 2 (a_dd_0[2] or a_dd_1[2]) is a bit error count: the code in
+l1s_read_dedic_dl() reads this error count and uses it for RXQUAL computation
+for measurement reports.
+
+If one's area of interest is in replicating Rx DTX handling and speech decoding
+that happens in the DSP, status words 1 and 2 can probably be ignored - instead
+the important parts are status word 0 (extensively covered above) and the
+payload portion of the buffer.
+
+The payload portion of the buffer consists of some number of 16-bit words: 17
+of them for TCH/FS and TCH/EFS, or 8 of them for TCH/HS.  The DSP does not have
+any notion of 8-bit bytes, instead it operates on 16-bit words as its elementary
+data unit.  The ordering of bits within these 16-bit words (in the payload
+portion of TCH buffers) is from the most-significant bit toward the least-
+significant bit, thus when these TCH buffers are transferred via octet-oriented
+interfaces, the upper byte of each word should be transferred first, even though
+this byte order is counter to the little-endian byte order of the Calypso ARM
+core.
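+
+A minimal sketch of this byte ordering, in Python for illustration
+(words_to_bytes() is a hypothetical helper name, not a function from the
+firmware or the host tools):
+
```python
def words_to_bytes(words):
    """Serialize 16-bit DSP words into octets, upper byte of each word first."""
    out = bytearray()
    for w in words:
        out.append((w >> 8) & 0xFF)  # most-significant byte goes out first
        out.append(w & 0xFF)
    return bytes(out)
```
+
+With this ordering the first bit of the payload ends up as the most-significant
+bit of the first octet, so the octet stream preserves the DSP's bit order.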
+
+In the case of TCH/FS and TCH/EFS, the fill order of bits in the payload words
+is as follows, starting with the most-significant bit of buffer word 3 (first
+word of the payload portion):
+
+* 182 bits of class 1;
+
+* 4 dummy bits (always observed as 0);
+
+* 78 bits of class 2;
+
+* the last 8 bits of a_dd_0[19] are unused.
+
+In the case of TCH/HS, the fill order is similar, but modified as appropriate
+for TCH/HS:
+
+* 95 bits of class 1;
+
+* 4 dummy bits;
+
+* 17 bits of class 2;
+
+* the last 12 bits of a_dd_0[10] or a_dd_1[10] are unused.
+
+Aside from the insertion of 4 extra dummy bits at the boundary between class 1
+and class 2, the overall bit order is that of GSM 05.03 Figure 1 interface 1.
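+
+The two layouts above can be expressed as a sketch that unpacks the payload
+words MSB-first and slices out each field.  The names LAYOUTS, payload_bits()
+and split_payload() are hypothetical and exist only for this illustration; the
+bit counts are the ones given in the lists above, with TCH/EFS sharing the
+TCH/FS layout.
+
```python
# mode: (class 1 bits, dummy bits, class 2 bits, payload words)
LAYOUTS = {
    "FS": (182, 4, 78, 17),  # also applies to TCH/EFS
    "HS": (95, 4, 17, 8),
}


def payload_bits(words):
    """Unpack 16-bit words into a flat list of bits, MSB of each word first."""
    return [(w >> shift) & 1 for w in words for shift in range(15, -1, -1)]


def split_payload(words, mode):
    """Return (class1, dummy, class2, unused) bit lists for one TCH DL payload.

    `words` is the payload portion only, i.e. buffer words 3 onward.
    """
    n1, ndummy, n2, nwords = LAYOUTS[mode]
    if len(words) != nwords:
        raise ValueError("expected %d payload words" % nwords)
    bits = payload_bits(words)
    end1 = n1
    end_dummy = end1 + ndummy
    end2 = end_dummy + n2
    return bits[:end1], bits[end1:end_dummy], bits[end_dummy:end2], bits[end2:]
```
+
+The length of the last returned list is 8 for "FS" and 12 for "HS", matching
+the unused tail bits noted above.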
+
+In the case of TCH/EFS, the following additional considerations apply:
+
+* Bits [65:73] in all received DL frames, where CRC-8 would go in the 260-bit
+  frame of GSM 05.03 interface 1 for EFR, are always observed as 0, whether
+  this CRC-8 was good (a_dd_0[0] bit 9 clear) or bad (a_dd_0[0] bit 9 set).
+
+* The handling of repetition bits (4 bits of 244-bit EFR codec frame, each of
+  which is triplicated in the channel encoding for transmission) is unclear.
+
+Further detail regarding the repetition bits of TCH/EFS: distinct bit positions
+exist in the 260-bit frame of GSM 05.03 interface 1 (which is the frame format
+in the TCH buffers of TI's DSP) for each of the 3 copies of each of the 4
+triplicated bits.  It is obvious that correct decoding of these triplicated bits
+requires a majority-vote function just like the one implemented in TMR systems
+in space gear - but it is not absolutely and unquestionably obvious where this
+TMR voting function is implemented in the Rx processing chain of TI's DSP.  It
+*appears* that this majority-vote function has already been performed by the DSP
+function that writes a_dd_0, and that the first bit position out of each group
+of 3 holds the output of this voting function, so that the subsequent speech
+decoder only needs to use those "cooked" bits - but there is this mystery:
+
+* At certain times, particularly during the main part of a test call, TCH DL
+  buffer readouts contain zeros in the "extra" repetition bit positions: for
+  each group of 3 bits, the first will contain 0 or 1, but the other two will
+  always be 0.
+
+* At other times, seemingly in the beginning and ending parts of test calls,
+  TCH DL buffer readouts contain matching bit values in all 3 positions: for
+  each group of 3 bits, if the first bit is 0, the other two will also be 0, or
+  if the first bit is 1, then the other two will also be 1.
+
+One possibility is that the DSP applies the required majority-voting function,
+writes its output into the first bit position of each group of 3, but then
+sometimes (and not at other times) applies another function that writes the
+voting function output into the remaining bit positions, perhaps for loopback
+of TCH DL into TCH UL.  More study is needed in this area.
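+
+To make the two observed patterns concrete, here is a small Python sketch
+(majority() and classify_group() are hypothetical names; the positions of the
+repetition bits within the 260-bit frame are deliberately not encoded here).
+It shows why the question matters: a decoder that blindly re-applies majority
+voting to a group of the first kind would turn a "cooked" 1 into 0.
+
```python
def majority(a, b, c):
    """Plain 2-out-of-3 vote over three bits."""
    return 1 if a + b + c >= 2 else 0


def classify_group(group):
    """Classify one group of 3 repetition bits against the observed patterns."""
    first, second, third = group
    if first == second == third:
        # (0,0,0) fits both observed patterns, so it cannot tell them apart.
        return "replicated" if first else "ambiguous"
    if second == 0 and third == 0:
        return "first-only"   # result only in the first position
    return "raw"              # neither observed pattern
```
+
+For the group (1, 0, 0), classify_group() returns "first-only", yet majority()
+returns 0 - so a speech decoder fed from these buffers should take the first
+bit of each group rather than vote again, if the hypothesis above is right.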
+
+FreeCalypso file format for TCH DL captures
+===========================================
+
+The file format written by fc-shell tch record command is ASCII hex, line-based,
+with one line for every captured 20 ms window.  The new format as of 2022 is:
+
+* Each line begins with an FR, HR or EFR keyword indicating which variant of
+  TCH DL has been captured;
+
+* This keyword is followed by 3 space-separated DSP status words, each written
+  as 4 hex digits;
+
+* The main body of the frame is written as 33 (TCH/FS & TCH/EFS) or 15 (TCH/HS)
+  hex bytes, produced from the payload portion of the TCH DL buffer by turning
+  each 16-bit word into 2 bytes (MSB first) and discarding the last byte that
+  is unused (always 0);
+
+* Each line ends with a frame number in decimal, specifically the value of
+  fn_mod_104 variable in the l1s_read_dedic_dl() function when the DSP buffer
+  was read.
+
+The addition of the frame number field allows these TCH DL captures to be
+reconciled against the SACCH multiframe structure, which matters for the rules
+of DTX.
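+
+A line in this format could be parsed along the following lines.  This is a
+sketch in Python under stated assumptions, not the fc-host-tools code: the
+names TchDlRecord and parse_record_line() are hypothetical, fields are assumed
+to be whitespace-separated, and because the description above does not say
+whether the body bytes are written contiguously or separated, the parser
+accepts both.
+
```python
from typing import NamedTuple

BODY_LEN = {"FR": 33, "EFR": 33, "HR": 15}  # payload bytes per keyword


class TchDlRecord(NamedTuple):
    codec: str
    status: tuple      # the 3 DSP status words
    payload: bytes
    fn_mod_104: int


def parse_record_line(line):
    """Parse one captured 20 ms window from a tch record file."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("too few fields")
    codec = fields[0]
    if codec not in BODY_LEN:
        raise ValueError("unknown keyword: " + codec)
    status_fields = fields[1:4]
    if any(len(s) != 4 for s in status_fields):
        raise ValueError("status words must be 4 hex digits each")
    status = tuple(int(s, 16) for s in status_fields)
    # Everything between the status words and the trailing frame number is
    # treated as the hex body, whether written as one token or many.
    payload = bytes.fromhex("".join(fields[4:-1]))
    if len(payload) != BODY_LEN[codec]:
        raise ValueError("wrong payload length for " + codec)
    return TchDlRecord(codec, status, payload, int(fields[-1]))
```
+
+The status[0] member of the result is the flag word analyzed earlier in this
+article, and fn_mod_104 is the trailing decimal frame number.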
+
+TCH UL substitution: open questions
+===================================
+
+Moving from the mostly-understood realm of TCH DL capture into the much more
+experimental realm of TCH UL substitution, we have some open questions: how does
+this DSP special mode really work?  Here is what we know: if we load externally
+sourced speech frames into otherwise-unused a_du_1 DSP buffer at the time of
+(fn_report_mod13_mod4 == 3), which is the same time when FACCH or CSD UL would
+be expected, and set B_PLAY_UL bit in DSP NDB API word d_tch_mode, the speech
+frame stream going to the other end of the call will be the one we feed into
+a_du_1 instead of the one produced from the microphone input by the internal
+speech encoder.  But here are the parts we don't know:
+
+* If one were to set B_PLAY_UL in d_tch_mode but not feed external UL input
+  into a_du_1 buffer at the needed time, what will happen?
+
+* Vice-versa, if one were to load a_du_1 and set its B_BLUD bit without setting
+  B_PLAY_UL in d_tch_mode, what will happen?
+
+* Can the frame stream fed into a_du_1 be encoded in DTX-enabled mode, including
+  SID frames?  If this possibility is allowed, what magic bits would need to be
+  set where in order to get the correct behavior from the DSP's subsequent
+  burst-by-burst DTX logic?
+
+TCH UL substitution: implemented PoC
+====================================
+
+Back in 2016 we implemented a proof-of-concept TCH UL play feature in
+FreeCalypso (only for TCH/FS and TCH/EFS), and the same PoC was retained
+when the overall TCH tap facility was mainlined in late 2022.  Having this
+highly experimental (not fit for production use) TCH UL play code present in our
+current production fw is deemed acceptable because this code will never be
+invoked unless the user sends TCH_ULBITS_REQ packets to the running fw via
+RVTMUX - and if you do send such packets (via tch play command in an fc-shell
+session or by any other means), you are leaving the realm of production-approved
+functionality and entering the realm of wild experimentation.
+
+The PoC TCH UL play mechanism consists of a small buffer (holding up to 4 FR1 or
+EFR frames) implemented in the ARM firmware; this buffer is filled by arriving
+TCH_ULBITS_REQ packets and drained by the tchf_substitute_uplink() function
+called from l1s_ctrl_tchtf().  Specifically, a flag named tch_ul_play_mode is
+set when TCH_ULBITS_REQ input is received, telling l1s_ctrl_tchtf() to start
+calling tchf_substitute_uplink() when (fn_report_mod13_mod4 == 3); the called
+function drains an uplink frame from the ring buffer, writes it into the DSP's
+a_du_1 buffer, sets B_PLAY_UL in d_tch_mode and sends a TCH_ULBITS_CONF packet
+back to the host.  If the ring buffer is empty, the function clears both
+B_PLAY_UL and the firmware's tch_ul_play_mode flag, ending the special TCH UL
+play mode.
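+
+The control flow just described can be modeled in a few lines.  The following
+Python sketch is a behavioral model for illustration only, not the firmware
+code (which is C): the class and method names are hypothetical, the B_PLAY_UL
+bit position is a placeholder rather than a value taken from TI's headers, the
+a_du_1 buffer is reduced to a single bytes object, and the handling of a full
+buffer is an assumption.
+
```python
from collections import deque

B_PLAY_UL = 1 << 3  # placeholder bit position, NOT taken from TI's headers


class UlPlayModel:
    """Toy model of the PoC TCH UL play mechanism."""

    CAPACITY = 4  # up to 4 FR1 or EFR frames

    def __init__(self):
        self.frames = deque()
        self.tch_ul_play_mode = False
        self.d_tch_mode = 0
        self.a_du_1 = None
        self.confs_sent = 0

    def on_tch_ulbits_req(self, frame):
        """Host sent a TCH_ULBITS_REQ packet carrying one 33-byte frame."""
        if len(frame) != 33:
            raise ValueError("expected a 33-byte frame")
        if len(self.frames) >= self.CAPACITY:
            return False  # assumption: a full buffer rejects the frame
        self.frames.append(frame)
        self.tch_ul_play_mode = True
        return True

    def on_frame_tick(self, fn_report_mod13_mod4):
        """Stand-in for the l1s_ctrl_tchtf() hook on every TDMA frame."""
        if not self.tch_ul_play_mode or fn_report_mod13_mod4 != 3:
            return
        if not self.frames:
            # Buffer ran dry: leave the special TCH UL play mode.
            self.d_tch_mode &= ~B_PLAY_UL
            self.tch_ul_play_mode = False
            return
        self.a_du_1 = self.frames.popleft()
        self.d_tch_mode |= B_PLAY_UL
        self.confs_sent += 1  # stands for one TCH_ULBITS_CONF to the host
```
+
+In this model, queueing one frame and then ticking twice with
+fn_report_mod13_mod4 equal to 3 plays the frame on the first tick and leaves
+play mode on the second.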
+
+This PoC mechanism is meant to be exercised with tch play command in an
+interactive fc-shell session: this command reads an ASCII line-based uplink data
+file and sends it to the firmware frame by frame, paced by TCH_ULBITS_CONF
+packets from the target.  The input to this command is a line-based ASCII hex
+file similar to the format written by tch record, but simplified: each line is
+just the 33-byte frame to be sent (in TI DSP buffer format, following GSM 05.03
+interface 1), without any flags or status words or frame numbers.
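+
+Since the play format is a stripped-down version of the record format, a
+capture line can in principle be reduced to a play line by dropping the
+keyword, the status words and the frame number.  The Python sketch below
+illustrates this under assumptions: record_line_to_play_line() is a
+hypothetical helper, fields are assumed to be whitespace-separated, and the
+play line is assumed to be one contiguous hex string.  It does not look at
+status word 0, so frames whose payload is stale (FACCH blocks) pass through
+unchanged.
+
```python
def record_line_to_play_line(line):
    """Reduce one tch record line to a tch play line, or None if unsuitable."""
    fields = line.split()
    if len(fields) < 6 or fields[0] not in ("FR", "EFR"):
        return None  # UL play is implemented only for FR1 and EFR
    body = "".join(fields[4:-1])
    if len(bytes.fromhex(body)) != 33:
        raise ValueError("expected a 33-byte frame")
    return body
```
+
+Feeding a downlink capture back into the uplink this way is an experiment, not
+a supported use, in keeping with the status of the UL play feature itself.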