view TCH-tap-modes @ 100:48ea323c1c47

Linux-DTR-RTS-flaw: import from freecalypso-hwlab
author Mychaela Falconia <falcon@freecalypso.org>
date Mon, 11 Sep 2023 06:22:47 +0000

It has been discovered that the DSP ROM in the Calypso GSM baseband processor
makes it possible to "tap" into speech traffic on GSM traffic channels (TCH):

1) In the downlink direction, the signal processing chain which every GSM MS
   must implement includes a GSM 05.03 channel decoder, operating in one of
   several variants as necessary for each supported TCH mode, followed by speech
   decoders for each supported codec.  TI's DSP naturally implements this
   required signal processing chain, and this implementation includes one nifty
   feature: the bits that make up the internal interface from GSM 05.03 channel
   decoder output to the input of speech decoders are written into the NDB API
   RAM page that is also accessible to the ARM core, and these bits can be
   externally read out.  The act of reading these bits is completely
   non-invasive (we are only reading bits that are already there, not modifying
   anything), thus we can sniff TCH downlink on any voice call in real time
   without disrupting or impacting standard type-approved GSM MS operation in
   any way.

2) In the uplink direction, there is a reverse signal processing chain in which
   the output of the internal speech encoder for the selected codec feeds into
   the input of the corresponding GSM 05.03 channel encoder.  In this direction
   there are two tapping possibilities:

2a) There is a buffer in the NDB API RAM page from which one can read the bits
    that pass from the speech encoder output to the channel encoder input -
    let's call this form of TCH tap "uplink sniffing";

2b) There is a special mode in which the output of the internal speech encoder
    is effectively suppressed and the input to the channel encoder comes from
    another NDB API RAM buffer that needs to be filled by ARM firmware - let's
    call this form of TCH tap "uplink substitution".

Sources of knowledge about these DSP functions
==============================================

For the functions of TCH DL sniffing (tap 1 in the above summary) and TCH UL
substitution (tap 2b in the above summary), the primary source of knowledge is
the defunct '#if TRACE_TYPE==3' code in TSM30 and LoCosto L1 sources.  I call
this code defunct because the TRACE_TYPE preprocessor symbol is set to 4 (not 3)
in both TCS211 and LoCosto versions, and appears to be set to 0 (all trace
disabled) in the ancient TSM30 build.  This code appears to be some very old
test mode, apparently sending some test bit patterns into TCH UL and expecting
the same bit patterns back on TCH DL, presumably with a test instrument such as
CMU200 providing a loopback from UL to DL on this test TCH, and has only
survived in an incomplete form:

* There are '#if TRACE_TYPE==3' stanzas in l1_cmplx.c, in both TSM30 and LoCosto
  versions, that implement DSP buffer writing for TCH UL substitution (TCH/F
  only) and timing control for TCH DL buffer reading (both TCH/F and TCH/H),
  calling a function named play_trace() for the latter.

* There is no play_trace() code in the LoCosto source, but there is an
  hw_debug.c source module in the TSM30 code drop under MCU/Layer1/L1c/Src,
  and it contains (presumed) TI-legacy play_trace() and play_diagnostics()
  functions, once again under '#if (TRACE_TYPE==3)'.  play_trace() reads the
  DSP's TCH DL buffer and saves the bits in an ARM firmware RAM buffer, and
  then play_diagnostics() analyzes the captured booty - and studying the second
  function is how we learn the apparent original intent of doing test bit
  patterns on TCH.

* The code that feeds "UL play" test bit patterns to the earlier-mentioned
  '#if TRACE_TYPE==3' TCH UL substitution code in l1_cmplx.c (apparently once
  hacked into dll_read_dcch() and tx_tch_data()) has not been found anywhere.

For TCH tap 2a in our summary at the beginning of this article (non-invasive
sniffing of TCH UL bits produced by the internal speech encoder) there does not
exist any authoritative source of knowledge.  It naturally follows from
otherwise-known Calypso DSP architecture that these internally produced TCH UL
bits should reside in the "main" a_du_0 buffer (or in a_du_1 when TCH/H
subchannel 1 is active), and I (Mother Mychaela) have heard an anecdotal report
(from someone who once worked with Calypso in a non-community-based manner) that
these UL bits could indeed be read out of this buffer - but in the absence of
an authoritative source, we don't know the correct time at which to read this
buffer.

In our current state of knowledge, only TCH DL sniffing can be exercised safely:
for UL sniffing we don't know the correct time when the buffer would need to be
read, while active UL substitution is obviously an invasive hack involving a DSP
debug or test feature that is never used in standard GSM MS operation.

Support for different speech codecs
===================================

When it comes to passively sniffing TCH DL and/or UL, we are merely reading bits
that are already there, and basic reasoning tells us that the DSP's DL and UL
buffers involved in this exercise exist in all speech TCH modes supported by
the DSP: FR1, HR1, EFR and AMR.  However:

* The ancient '#if TRACE_TYPE==3' reference code exists only for FR1, HR1 and
  EFR - it clearly predates the addition of AMR in the later Calypso DSP
  versions.

* FR1, HR1 and EFR are the only codecs for which we (FreeCalypso community) know
  the format in which TCH DL bits appear in the DSP's a_dd_0 and a_dd_1 buffers.

* I (Mother Mychaela) have heard an anecdotal report (from the same
  non-community-based party mentioned earlier) that TCH DL bits could be read
  out of a_dd_0 buffer in TCH/AFS (AMR) mode - but I never got any details.

In contrast with passive sniffing, active TCH UL substitution requires explicit
support from the DSP - and this explicit DSP support is known with certainty to
exist only for TCH/FS and TCH/EFS channel modes, i.e., for FR1 and EFR codecs.
In the case of TCH/HS channel mode (HR1 codec), it *appears* that the DSP
supports UL substitution in this mode too, but this combination has only been
exercised by OsmocomBB people (the original '#if TRACE_TYPE==3' code for UL play
only supports TCH/F), and FreeCalypso policy is to treat everything coming out
of OBB as highly suspect.

What about AMR?  The anecdotal report (from the same already-mentioned party) is
that TCH UL substitution that works for FR1 and EFR appears to NOT work for AMR
- that's all I know - but frankly speaking, given that it's a weird DSP debug
mode that is never needed in standard GSM MS operation, I find it more
surprising that it works for FR1 and EFR than that it doesn't work for AMR.

FreeCalypso support for TCH tap functions
=========================================

TCH DL sniffing and UL substitution provisions were initially implemented in
FreeCalypso back in 2016, but only in the Citrine version, which was deemed to
be a dead end later that same year.  However, this functionality is now being
resurrected, and it has been incorporated into our production FC Tourmaline
firmware as of 2022-12-13.

In order to activate the function of TCH DL sniffing and save the recording of a
TCH DL session into a file, one needs to use the fc-shell utility from FC host
tools, specifically the tch record command in an interactive fc-shell session.
The format in which TCH DL tap traffic is passed over RVTMUX (an original
FreeCalypso invention) has changed in a slight but incompatible way between the
original hackish version from 2016 and the new production version as of 2022,
and capturing TCH DL with new firmware requires the updated version of fc-shell
that will be released as part of fc-host-tools-r18.  The current (late 2022)
incarnation of FreeCalypso TCH DL sniffing feature supports FR1, HR1 and EFR
codecs, although only FR1 and EFR have been tested so far.

The function of TCH UL substitution is currently implemented in FC Tourmaline
only for FR1 and EFR (no HR1, no AMR), and it likewise requires running an
interactive fc-shell session in which you would invoke the tool's tch play
command.  In the case of TCH UL play feature there has been NO change in the
RVTMUX transport format between 2016 and 2022 versions.

TCH DL DSP buffers and capture format
=====================================

The DSP's NDB API page has two buffers in which TCH DL bits appear: a_dd_0 and
a_dd_1.  All TCH/F modes use a_dd_0, but TCH/H uses one buffer or the other
depending on the subchannel: subchannel 0 uses a_dd_0, subchannel 1 uses a_dd_1.
(It is certainly a strange design - the DSP won't be able to receive and decode
the "wrong" subchannel because it doesn't know the ciphering key for the other
MS - but perhaps the designers of this DSP architecture aeons ago found this
design to somehow flow more naturally with their scheduling of DSP tasks.)  Each
buffer consists of 22 16-bit words: the buffers were originally 20 words long,
but were later extended to 22 words to support CSD 14.4 kbps mode.

Each TCH buffer in the DSP's NDB API page consists of 3 status or header words
followed by N words of payload, where N depends on TCH mode: 17 for TCH/FS and
TCH/EFS, 8 for TCH/HS, and not-yet-studied for AMR and CSD.  Let's begin our
analysis with the 3 status words that make up the buffer header:

Status word 0 (a_dd_0[0] or a_dd_1[0]) is a word of flag bits.  We don't know
the meaning of every bit in this word, but at least for TCH/FS and TCH/EFS (we
haven't exercised TCH/HS at all) we know the following bits:

* Bit 15 (B_BLUD) is a "buffer filled" or "data present" flag.  This flag is
  observed as 1 in *almost* every 20 ms window in which a traffic frame is
  expected (fn_report_mod13_mod4 == 0 in l1s_read_dedic_dl(), case TCHTF),
  except for certain instances early in the call setup process which remain to
  be studied.

* Bit 14 (B_AF) will be set if the block of 8 half-bursts (block diagonal
  interleaving of GSM 05.03) corresponding to this buffer was channel-decoded
  as speech rather than as FACCH - see further analysis below.

* Bit 9 (B_ECRC) has only ever been observed as 1 when B_AF is set, i.e., when
  the speech-not-FACCH channel decoder was invoked.  In the case of TCH/EFS this
  bit is set to 1 if the EFR-added CRC-8 was bad, and cleared if this CRC-8 was
  good; in the case of TCH/FS this bit has always been observed as 1 and should
  be ignored because there is no CRC-8 in TCH/FS.

* Bit 7 has always been observed as 1 whenever B_BLUD is set but B_AF is
  cleared, i.e., whenever the block was channel-decoded in FACCH rather than
  speech mode.

* Bits 6:5 indicate the result of FIRE decoding in the event that the FACCH
  decoder was invoked.

* Bits 4:3 carry the ternary SID flag encoded as in section 6.1.1 of GSM 06.31
  and 06.81, but only when the speech-not-FACCH channel decoder was invoked as
  indicated by B_AF.

* Bit 2 is BFI as defined in section 6.1.1 of GSM 06.31 and 06.81.  Whenever
  the block was decoded as FACCH (bit 14 clear, bit 7 set), bit 2 has always
  been observed as set, agreeing with the stipulation in GSM 06.31 and 06.81
  that BFI=1 whenever a FACCH frame has been received.  However, in the case of
  TCH/EFS it appears that CRC-8 status (reported in bit 9) is NOT factored into
  the logic that sets bit 2 - it appears that the subsequent speech decoding
  logic is expected to OR bits 2 and 9 together to get the BFI flag for the Rx
  DTX handler of GSM 06.81.
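
As an illustration only, the known bits of status word 0 could be decoded on
the host side roughly as follows.  The names B_BLUD, B_AF and B_ECRC come from
the list above; the other macro names, the struct and the function are made up
for this sketch, and the OR of bits 2 and 9 for TCH/EFS reflects the apparent
behavior described above, not a confirmed DSP rule.

```c
#include <stdbool.h>
#include <stdint.h>

#define B_BLUD   15  /* buffer filled / data present */
#define B_AF     14  /* block was channel-decoded as speech, not FACCH */
#define B_ECRC    9  /* TCH/EFS: CRC-8 bad; to be ignored on TCH/FS */
#define B_FACCH   7  /* hypothetical name: observed set on FACCH blocks */
#define B_BFI     2  /* hypothetical name: BFI before CRC-8 is factored in */

struct tch_dl_flags {          /* hypothetical host-side view */
    bool     data_present;
    bool     speech;
    bool     facch;
    unsigned fire;             /* bits 6:5, value meanings not known here */
    unsigned sid;              /* bits 4:3, ternary SID flag (0..2) */
    bool     bfi;              /* effective BFI for the Rx DTX handler */
};

static struct tch_dl_flags decode_status0(uint16_t w, bool is_efr)
{
    struct tch_dl_flags f;

    f.data_present = (w >> B_BLUD) & 1;
    f.speech = (w >> B_AF) & 1;
    f.facch = f.data_present && !f.speech && ((w >> B_FACCH) & 1);
    f.fire = (w >> 5) & 3;
    /* SID bits are only meaningful when the speech decoder was invoked */
    f.sid = f.speech ? (w >> 3) & 3 : 0;
    f.bfi = (w >> B_BFI) & 1;
    /* Assumption: for EFR a bad CRC-8 must be ORed into BFI by the reader */
    if (is_efr && f.speech && ((w >> B_ECRC) & 1))
        f.bfi = true;
    return f;
}
```

With this sketch, a status word of 0xC200 yields BFI=1 when treated as TCH/EFS
but BFI=0 when treated as TCH/FS, where bit 9 carries no information.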

In the case of 20 ms blocks (reassembled from 8 half-bursts) that were channel-
decoded as speech rather than FACCH, the observed behavior is that bits 15 and
14 are set, the payload portion of the buffer is filled with the output from the
channel decoder, and bits 4:3 are set from this payload by the bit-counting rule
of section 6.1.1 of GSM 06.31 and 06.81 irrespective of the good-or-bad status
in bits 2 and 9.  However, when bit 14 is clear and bit 7 is set, indicating
that the block (from 8 half-bursts) was channel-decoded in FACCH mode, the
following additional behavior is observed:

* The payload portion of the buffer remains unchanged from its previous content,
  last written when a frame was channel-decoded in speech-not-FACCH mode;

* Bit 2 is set, bit 9 is cleared;

* Bits 4:3 are cleared even when they previously indicated SID based on the bit
  pattern in the payload portion of the buffer, even when that SID-encoding
  payload is still there.

In the standard TCH DL signal processing chain, GSM 05.03 channel decoding is
followed by the Rx DTX handler of GSM 06.31 or 06.81 for TCH/FS or TCH/EFS,
respectively.  It appears that the Rx DTX handler implemented in TI's DSP is
driven by this status word 0 at the head of the buffer, and we can only guess
as to its exact logic.  At this point it bears reminding that the functions of
the Rx DTX handler are not rigidly prescribed in the specs: in the case of EFR
the bit-exact reference implementation is normative only in certain aspects
(e.g., comfort noise generation after receiving SID), but is considered a non-
normative example in some other key aspects (all GSM 06.61 functions, including
what happens when a FACCH block was received when speech frames were expected),
and in the case of FR1 there is no bit-exact reference implementation at all,
only general guidance.

Having the curiosity of a cat, I (Mother Mychaela) naturally desire to know
exactly how the Rx DTX handler (the bridge between the channel decoder and the
speech decoder) works in TI's DSP.  A full static reversing job on the DSP ROM
would provide complete answers, but is a very daunting proposition, thus I am
also looking at the idea of behavioral analysis: the output of the speech
decoder can be captured from MCSI on FCDEV3B hardware, or from the VSP tap on
FC Venus if we ever build that board, and if we combine that speech decoder
output capture with the currently-discussed capture of TCH DL buffers, we may
be able to glean some insight into the workings of the Rx DTX handler block: we
could implement a candidate Rx DTX handler clone in software and compare the
output (of this proposed handler followed by the spec-defined speech decoder)
against the actual speech output from the DSP.

Back to our exposition of TCH DL buffer content:

Status word 1 (a_dd_0[1] or a_dd_1[1]) is some kind of DSP measurement or count
which Calypso ARM fw does not need to look at, except when debugging - the only
code which I (Mother Mychaela) could find that does anything with this DSP
status word is the ancient play_diagnostics() code in the TSM30 version
(obviously never included in any production fw); this code looks at the unknown
word in question and calls it "D_MACC".  This play_diagnostics() code compares
the D_MACC reading against a threshold, and if the per-block reading is below
the threshold, an error message is printed.  That's all we know!

Status word 2 (a_dd_0[2] or a_dd_1[2]) is a bit error count: the code in
l1s_read_dedic_dl() reads this error count and uses it for RXQUAL computation
for measurement reports.

If one's area of interest is in replicating Rx DTX handling and speech decoding
that happens in the DSP, status words 1 and 2 can probably be ignored - instead
the important parts are status word 0 (extensively covered above) and the
payload portion of the buffer.
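
The buffer layout described so far (22 words per buffer, 3 status words, then
17 or 8 payload words, with a_dd_1 used only for TCH/H subchannel 1) can be
captured in a few helper functions.  This is a minimal sketch: the enum and
all function names are hypothetical, and AMR and CSD are left out because
their payload sizes have not been studied.

```c
#include <stddef.h>
#include <stdint.h>

#define TCH_BUF_WORDS  22   /* size of a_dd_0 and a_dd_1 in 16-bit words */
#define TCH_HDR_WORDS   3   /* status words 0..2 */

enum tch_mode { TCH_FS, TCH_EFS, TCH_HS };   /* hypothetical enum */

/* Payload length in 16-bit words for the modes covered in this article. */
static size_t tch_payload_words(enum tch_mode mode)
{
    switch (mode) {
    case TCH_FS:
    case TCH_EFS:
        return 17;
    case TCH_HS:
        return 8;
    }
    return 0;
}

/* TCH/F always uses a_dd_0; TCH/H uses a_dd_0 or a_dd_1 by subchannel. */
static const uint16_t *tch_dl_buffer(const uint16_t *a_dd_0,
                                     const uint16_t *a_dd_1,
                                     enum tch_mode mode, unsigned subchan)
{
    return (mode == TCH_HS && subchan == 1) ? a_dd_1 : a_dd_0;
}

/* The payload starts right after the 3 status words. */
static const uint16_t *tch_payload(const uint16_t *buf)
{
    return buf + TCH_HDR_WORDS;
}
```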

The payload portion of the buffer consists of some number of 16-bit words: 17
of them for TCH/FS and TCH/EFS, or 8 of them for TCH/HS.  The DSP does not have
any notion of 8-bit bytes, instead it operates on 16-bit words as its elementary
data unit.  The ordering of bits within these 16-bit words (in the payload
portion of TCH buffers) is from the most-significant bit toward the least-
significant bit, thus when these TCH buffers are transferred via octet-oriented
interfaces, the upper byte of each word should be transferred first, even though
this byte order is counter to the little-endian byte order of the Calypso ARM
core.
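
The upper-byte-first rule can be shown with a short sketch; the function name
is hypothetical and the code is not taken from any FreeCalypso source.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Serialize DSP words into octets, upper byte first, so that the MSB-first
 * bit order of the TCH payload is preserved in the byte stream.
 * out[] must have room for 2 * nwords bytes.
 */
static void tch_words_to_bytes(const uint16_t *words, size_t nwords,
                               uint8_t *out)
{
    for (size_t i = 0; i < nwords; i++) {
        out[2 * i]     = (uint8_t)(words[i] >> 8);
        out[2 * i + 1] = (uint8_t)(words[i] & 0xFF);
    }
}
```

A plain memcpy() of the word array on the little-endian ARM core would emit the
lower byte of each word first, which is why an explicit loop is used here.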

In the case of TCH/FS and TCH/EFS, the fill order of bits in the payload words
is as follows, starting with the most-significant bit of buffer word 3 (first
word of the payload portion):

* 182 bits of class 1;

* 4 dummy bits (always observed as 0);

* 78 bits of class 2;

* the last 8 bits of a_dd_0[19] are unused.

In the case of TCH/HS, the fill order is similar, but modified as appropriate
for TCH/HS:

* 95 bits of class 1;

* 4 dummy bits;

* 17 bits of class 2;

* the last 12 bits of a_dd_0[10] or a_dd_1[10] are unused.

Aside from the insertion of 4 extra dummy bits at the boundary between class 1
and class 2, the overall bit order is that of GSM 05.03 Figure 1 interface 1.
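
To make the fill order concrete, here is a minimal sketch that pulls the class
1 and class 2 bits out of the payload words and drops the 4 dummy bits in
between.  Function names are hypothetical; the caller passes 182 and 78 for
TCH/FS and TCH/EFS, or 95 and 17 for TCH/HS, as listed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit n of the payload, counting from the MSB of the first payload word. */
static unsigned tch_payload_bit(const uint16_t *payload, size_t n)
{
    return (payload[n / 16] >> (15 - n % 16)) & 1;
}

/*
 * Unpack class 1 and class 2 bits into out[], one bit per byte, skipping
 * the 4 dummy bits that sit between the two classes.
 * Returns the number of bits written (class1 + class2).
 */
static size_t tch_unpack(const uint16_t *payload, size_t class1,
                         size_t class2, uint8_t *out)
{
    size_t k = 0;

    for (size_t n = 0; n < class1; n++)
        out[k++] = (uint8_t)tch_payload_bit(payload, n);
    for (size_t n = 0; n < class2; n++)
        out[k++] = (uint8_t)tch_payload_bit(payload, class1 + 4 + n);
    return k;
}
```

For TCH/FS and TCH/EFS the result is the 260-bit frame of GSM 05.03 interface
1; for TCH/HS it is 112 bits long.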

In the case of TCH/EFS, the following additional considerations apply:

* Bits [65:73] in all received DL frames, where CRC-8 would go in the 260-bit
  frame of GSM 05.03 interface 1 for EFR, are always observed as 0, whether
  this CRC-8 was good (a_dd_0[0] bit 9 clear) or bad (a_dd_0[0] bit 9 set).

* The handling of repetition bits (4 bits of 244-bit EFR codec frame, each of
  which is triplicated in the channel encoding for transmission) is unclear.

Further detail regarding the repetition bits of TCH/EFS: distinct bit positions
exist in the 260-bit frame of GSM 05.03 interface 1 (which is the frame format
in the TCH buffers of TI's DSP) for each of the 3 copies of each of the 4
triplicated bits.  It is obvious that correct decoding of these triplicated bits
requires a majority-vote function just like the one implemented in TMR systems
in space gear - but it is not absolutely and unquestionably obvious where this
TMR voting function is implemented in the Rx processing chain of TI's DSP.  It
*appears* that this majority-vote function has already been performed by the DSP
function that writes a_dd_0, and that the first bit position out of each group
of 3 holds the output of this voting function, so that the subsequent speech
decoder only needs to use those "cooked" bits - but there is this mystery:

* At certain times, particularly during the main part of a test call, TCH DL
  buffer readouts contain zeros in the "extra" repetition bit positions: for
  each group of 3 bits, the first will contain 0 or 1, but the other two will
  always be 0.

* At other times, seemingly in the beginning and ending parts of test calls,
  TCH DL buffer readouts contain matching bit values in all 3 positions: for
  each group of 3 bits, if the first bit is 0, the other two will also be 0, or
  if the first bit is 1, then the other two will also be 1.

One possibility is that the DSP applies the required majority-voting function,
writes its output into the first bit position of each group of 3, but then
sometimes (and not at other times) applies another function that writes the
voting function output into the remaining bit positions, perhaps for loopback
of TCH DL into TCH UL.  More study is needed in this area.
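
When studying captures for this mystery, it may help to classify each group of
3 repetition bit positions mechanically.  The sketch below is only an analysis
aid under stated assumptions: all names are hypothetical, and the caller has to
supply the 3 bit values of a group, because the bit positions themselves are
not spelled out in this article.

```c
#include <stdint.h>

/* Plain 2-out-of-3 majority vote on single-bit inputs. */
static unsigned majority3(unsigned a, unsigned b, unsigned c)
{
    return (a & b) | (a & c) | (b & c);
}

enum rep_pattern {
    REP_AMBIGUOUS,   /* 0,0,0: fits both observed behaviors */
    REP_FIRST_ONLY,  /* 1,0,0: only the first position carries the bit */
    REP_ALL_EQUAL,   /* 1,1,1: the bit is copied into all 3 positions */
    REP_OTHER        /* anything else, e.g. raw unvoted bits */
};

/* Classify one group of 3 repetition bit positions read from a_dd_0. */
static enum rep_pattern classify_rep_group(unsigned b0, unsigned b1,
                                           unsigned b2)
{
    if (b0 == 0 && b1 == 0 && b2 == 0)
        return REP_AMBIGUOUS;
    if (b0 == 1 && b1 == 0 && b2 == 0)
        return REP_FIRST_ONLY;
    if (b0 == 1 && b1 == 1 && b2 == 1)
        return REP_ALL_EQUAL;
    return REP_OTHER;
}
```

A capture in which REP_OTHER never occurs would be consistent with the voting
having already been done by the DSP before a_dd_0 is written.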

FreeCalypso file format for TCH DL captures
===========================================

The file format written by fc-shell tch record command is ASCII hex, line-based,
with one line for every captured 20 ms window.  The new format as of 2022 is:

* Each line begins with an FR, HR or EFR keyword indicating which variant of
  TCH DL has been captured;

* This keyword is followed by 3 space-separated DSP status words, each written
  as 4 hex digits;

* The main body of the frame is written as 33 (TCH/FS & TCH/EFS) or 15 (TCH/HS)
  hex bytes, produced from the payload portion of the TCH DL buffer by turning
  each 16-bit word into 2 bytes (MSB first) and discarding the last byte that
  is unused (always 0);

* Each line ends with a frame number in decimal, specifically the value of
  fn_mod_104 variable in the l1s_read_dedic_dl() function when the DSP buffer
  was read.

The addition of the frame number field allows these TCH DL captures to be
reconciled against the SACCH multiframe structure, which matters for the rules
of DTX.
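
A reader for this line format could look roughly like the following sketch.
It is not code from fc-host-tools: the struct and function names are
hypothetical, and because the description above does not say whether the hex
bytes of the frame body are separated by spaces, the parser accepts both forms.
It assumes the frame number is separated from the body by whitespace.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct tch_dl_rec {            /* hypothetical in-memory record */
    char     kind[4];          /* "FR", "HR" or "EFR" */
    uint16_t status[3];        /* DSP status words 0..2 */
    uint8_t  payload[33];
    size_t   payload_len;      /* 33 for FR and EFR, 15 for HR */
    unsigned fn_mod_104;
};

static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Parse one capture line; returns 0 on success, -1 if malformed. */
static int tch_parse_line(const char *line, struct tch_dl_rec *r)
{
    unsigned s0, s1, s2;
    int used = 0;

    if (sscanf(line, "%3s %4x %4x %4x%n", r->kind, &s0, &s1, &s2, &used) != 4)
        return -1;
    if (!strcmp(r->kind, "FR") || !strcmp(r->kind, "EFR"))
        r->payload_len = 33;
    else if (!strcmp(r->kind, "HR"))
        r->payload_len = 15;
    else
        return -1;
    r->status[0] = (uint16_t)s0;
    r->status[1] = (uint16_t)s1;
    r->status[2] = (uint16_t)s2;

    const char *p = line + used;
    for (size_t i = 0; i < r->payload_len; i++) {
        while (*p == ' ')
            p++;
        int hi = hexval((unsigned char)p[0]);
        int lo = hi < 0 ? -1 : hexval((unsigned char)p[1]);
        if (lo < 0)
            return -1;
        r->payload[i] = (uint8_t)(hi << 4 | lo);
        p += 2;
    }

    char *end;
    unsigned long fn = strtoul(p, &end, 10);
    if (end == p)
        return -1;
    r->fn_mod_104 = (unsigned)fn;
    return 0;
}
```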

TCH UL substitution: open questions
===================================

Moving from the mostly-understood realm of TCH DL capture into the much more
experimental realm of TCH UL substitution, we have some open questions: how does
this DSP special mode really work?  Here is what we know: if we load externally
sourced speech frames into otherwise-unused a_du_1 DSP buffer at the time of
(fn_report_mod13_mod4 == 3), which is the same time when FACCH or CSD UL would
be expected, and set B_PLAY_UL bit in DSP NDB API word d_tch_mode, the speech
frame stream going to the other end of the call will be the one we feed into
a_du_1 instead of the one produced from the microphone input by the internal
speech encoder.  But here are the parts we don't know:

* If one were to set B_PLAY_UL in d_tch_mode but not feed external UL input
  into a_du_1 buffer at the needed time, what will happen?

* Vice-versa, if one were to load a_du_1 and set its B_BLUD bit without setting
  B_PLAY_UL in d_tch_mode, what will happen?

* Can the frame stream fed into a_du_1 be encoded in DTX-enabled mode, including
  SID frames?  If this possibility is allowed, what magic bits would need to be
  set where in order to get the correct behavior from the DSP's subsequent
  burst-by-burst DTX logic?

TCH UL substitution: implemented PoC
====================================

Back in 2016 we implemented a proof-of-concept TCH UL play feature in
FreeCalypso (only for TCH/FS and TCH/EFS), and the same PoC was retained
when the overall TCH tap facility was mainlined in late 2022.  Having this
highly experimental (not fit for production use) TCH UL play code present in our
current production fw is deemed acceptable because this code will never be
invoked unless the user sends TCH_ULBITS_REQ packets to the running fw via
RVTMUX - and if you do send such packets (via tch play command in an fc-shell
session or by any other means), you are leaving the realm of production-approved
functionality and entering the realm of wild experimentation.

The PoC TCH UL play mechanism consists of a small buffer (holding up to 4 FR1 or
EFR frames) implemented in the ARM firmware; this buffer is filled by arriving
TCH_ULBITS_REQ packets and drained by the tchf_substitute_uplink() function
called from l1s_ctrl_tchtf().  Specifically, a flag named tch_ul_play_mode is
set when TCH_ULBITS_REQ input is received, telling l1s_ctrl_tchtf() to start
calling tchf_substitute_uplink() when (fn_report_mod13_mod4 == 3); the called
function drains an uplink frame from the ring buffer, writes it into the DSP's
a_du_1 buffer, sets B_PLAY_UL in d_tch_mode and sends a TCH_ULBITS_CONF packet
back to the host.  If the ring buffer is empty, the function clears both
B_PLAY_UL and the firmware's tch_ul_play_mode flag, ending the special TCH UL
play mode.
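
The fill-and-drain logic just described can be modeled on a host as shown
below.  This is a simplified model, not the Tourmaline code: all names are
hypothetical, the DSP's a_du_1 buffer is stood in for by a plain byte array,
sending TCH_ULBITS_CONF is reduced to a return value, and rejecting input when
the ring is full is an assumption, as the text above does not say what the
firmware does in that case.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UL_FRAME_BYTES 33
#define UL_RING_SLOTS   4

struct ul_play {                       /* hypothetical model state */
    uint8_t  ring[UL_RING_SLOTS][UL_FRAME_BYTES];
    unsigned head;                     /* next slot to drain */
    unsigned count;                    /* frames currently queued */
    bool     play_mode;                /* models tch_ul_play_mode */
    bool     b_play_ul;                /* models B_PLAY_UL in d_tch_mode */
};

/* Models the arrival of a TCH_ULBITS_REQ packet; false if the ring is full. */
static bool ul_enqueue(struct ul_play *s, const uint8_t *frame)
{
    if (s->count == UL_RING_SLOTS)
        return false;
    memcpy(s->ring[(s->head + s->count) % UL_RING_SLOTS], frame,
           UL_FRAME_BYTES);
    s->count++;
    s->play_mode = true;
    return true;
}

/*
 * Models the call made when fn_report_mod13_mod4 == 3.  Returns true when
 * a frame was written to dsp_buf, which is the point where the firmware
 * would send TCH_ULBITS_CONF back to the host.
 */
static bool ul_substitute(struct ul_play *s, uint8_t *dsp_buf)
{
    if (!s->play_mode)
        return false;
    if (s->count == 0) {
        /* ring ran dry: leave the special UL play mode */
        s->b_play_ul = false;
        s->play_mode = false;
        return false;
    }
    memcpy(dsp_buf, s->ring[s->head], UL_FRAME_BYTES);
    s->head = (s->head + 1) % UL_RING_SLOTS;
    s->count--;
    s->b_play_ul = true;
    return true;
}
```

In this model the host keeps the stream alive by enqueueing one new frame for
each confirmation it receives, so that the ring never runs empty mid-stream.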

This PoC mechanism is meant to be exercised with tch play command in an
interactive fc-shell session: this command reads an ASCII line-based uplink data
file and sends it to the firmware frame by frame, paced by TCH_ULBITS_CONF
packets from the target.  The input to this command is a line-based ASCII hex
file similar to the format written by tch record, but simplified: each line is
just the 33-byte frame to be sent (in TI DSP buffer format, following GSM 05.03
interface 1), without any flags or status words or frame numbers.
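
For completeness, here is a minimal sketch of producing one line of such an
input file from a 33-byte frame.  The function name is hypothetical, and
writing the bytes as contiguous uppercase hex digits is an assumption: the
exact spacing and letter case accepted by the tch play command should be
checked against the fc-host-tools documentation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 33-byte UL frame as 66 hex digits plus a terminating NUL.
 * out[] must have room for at least 67 characters.
 */
static void tch_play_format_line(const uint8_t frame[33], char out[67])
{
    for (size_t i = 0; i < 33; i++)
        snprintf(out + 2 * i, 3, "%02X", frame[i]);
}
```

The caller would then write the resulting string followed by a newline, one
line per 20 ms frame.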