diff doc/RTP-TRAUlike-format @ 207:185225722714

doc: new extended RTP format
author Mychaela Falconia <falcon@freecalypso.org>
date Thu, 06 Apr 2023 21:30:33 -0800
parents
children f0b90591f67c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/RTP-TRAUlike-format	Thu Apr 06 21:30:33 2023 -0800
@@ -0,0 +1,417 @@
+TRAU-UL-like RTP transport format for FR & EFR codecs
+=====================================================
+
+The generally accepted industry standard format for RTP transport of FR and EFR
+codec frames in an IP-based GSM RAN is given in ETSI TS 101 318; the same format
+is also codified in IETF RFC 3551.  However, when compared to the classic
+TRAU-UL format of 3GPP TS 48.060, the standard RTP format of RFC 3551 exhibits
+the following two shortcomings:
+
+1) no way to indicate a BFI condition and still send frame data bits;
+2) no way to transport the Time Alignment Flag (TAF).
+
+Both of these shortcomings are explained in detail later in this document;
+however, the primary purpose of this document is to propose a new, regrettably
+non-standard, RTP transport format for FR & EFR codecs, for use only within a
+GSM RAN and the immediately attached CN transcoder ("soft TRAU"), that provides
+the same functionality as the classic TRAU-UL format of TS 48.060 but is
+carried over RTP in IP rather than in a 16 kbps TDM subchannel.
+
+The non-standard RTP transport format presented in this document is implemented
+in OsmoBTS on a private feature branch:
+
+https://cgit.osmocom.org/osmo-bts/log/?h=falconia/rtp_traulike
+
+OsmoBTS versions that include this code always accept TRAUlike FR/EFR packets
+on their RTP input, following the principle of being liberal in what you accept
+while being conservative in what you send, but emit such packets on their RTP
+output only when this non-default vty config option is given:
+
+rtp fr-efr-traulike
+
+The recently added (mainline) "rtp continuous-streaming" vty config option also
+needs to be enabled.
+
+The present document serves as the formal specification for the TRAUlike RTP
+transport format for FR and EFR.
+
+Detailed description of shortcomings of standard RTP transport for FR & EFR
+===========================================================================
+
+These shortcomings are solved in the TRAUlike RTP transport format defined in
+this document; understanding these shortcomings provides the essential rationale
+for TRAU-like RTP.
+
+Indicating BFI along with data bits
+-----------------------------------
+
+The only way to indicate a BFI condition in standard RTP (for FR/EFR) is to
+either send no packet at all in the 20 ms window in question (industry standard
+behavior and OsmoBTS default) or send an RTP packet with a zero-length payload
+("rtp continuous-streaming" option in OsmoBTS).  The latter option provides a
+timing tick for a CN-attached transcoder relying on the BTS-originating RTP
+stream as its timing source, but there is still no way to send a frame of
+marked-erroneous data bits.  In contrast, in the TRAU-UL format of TS 48.060
+the Dn bits carrying FR or EFR frame bits and the C12 bit carrying BFI are
+orthogonal: the frame bits are sent regardless of the BFI value.
+
+Why would one care about known-bad or deemed-to-be-bad frame data bits?  They
+do matter at least in the case of EFR: the official reference C-source EFR
+decoder from ETSI makes use of the "fixed codebook excitation pulses" portion
+of its EFR frame bits input (140 bits out of 244) even when BFI=1.  This
+portion of reference C-source behavior is declared by the text of the GSM 06.61
+spec to be a non-normative example, so there may be other compliant EFR decoder
+implementations that never look at marked-erroneous data bits.  However, given
+the ease of simply using the C code from ETSI as-is, or of recoding it more
+efficiently while keeping all bit-exact algorithms (including non-normative
+ones) unchanged, we should expect the behavior of the ETSI reference code to be
+retained in many production implementations and deployments.
+
+Consider the case where a traditional E1-based BTS with a classic TRAU interface
+is attached to an IP-based Osmocom RAN by way of OsmoMGW, and the resulting RTP
+stream then (after passing through another OsmoMGW instance at the MSC) goes to
+a "soft TRAU" transcoder (TC) in the CN.  The TC will feed its RTP input to FR
+and EFR decoders, and at least the EFR decoder makes use of "fixed codebook
+excitation pulses" bits from erroneous frames.  Furthermore, the TC may
+implement in-band TFO (3GPP TS 28.062) inside its G.711 RTP output, in which
+case it will need to insert a slightly modified TRAU-UL frame into that output.
+The bits that would ideally be fed to the ETSI EFR decoder and emitted to the
+outside world in TFO frames already exist at the output of the E1-based BTS,
+but they get lost in the RTP transport when the industry standard RTP payload
+format is used.
+
+Consider another case where OsmoBTS does have an FR/EFR traffic frame that
+could potentially be sent out, but it is suppressed by the
+(tch_ind->lqual_cb >= bts->min_qual_norm) check in l1sap_tch_ind() in
+src/common/l1sap.c.  In this case it would be ideal to send out that frame
+along with a BFI=1 indication, if the RTP transport format were to allow such a
+representation.
+
+Lack of TAF bit in standard RTP transport
+-----------------------------------------
+
+The TRAU-UL frame format of TS 48.060 for FR and EFR includes a bit called TAF,
+for Time Alignment Flag.  Per the specs (TS 48.060 refers to TS 46.031 for
+definition and coding of frame indicators) this bit shall be set to 1 in one
+particular position in the 480 ms SACCH multiframe (the particular 20 ms frame
+position in which a valid frame is always transmitted, even during DTX pauses)
+and set to 0 in all other frames.  This flag factors into the Rx DTX handler
+logic prescribed in GSM 06.31 and 06.81 specs for FR and EFR, respectively, and
+there exist production decoders for these codecs that implement their Rx DTX
+handler function exactly to the letter of the specs, including the use of TAF
+bit when deciding what to do with a BFI=1 frame received in the comfort noise
+generation state.  (These spec-compliant decoders include the reference ETSI
+C-source decoder for EFR and Themyscira libgsmfrp for FR.)
+
+This TAF bit does not exist in the standard RTP transport for FR & EFR.  The
+lack of this TAF bit causes the following problems for the CN-attached "soft
+TRAU" transcoder:
+
+1) The ability to handle the requirement of section 5.4 of GSM 06.11 or 06.61
+   (same section in both specs) in a spec-compliant manner is lost;
+
+2) The TC won't know when to set the TAF bit in its outgoing TFO frames, if it
+   implements in-band TFO per 3GPP TS 28.062.
+
+The TFO problem is particularly concerning because these TFO frames are emitted
+to the outside world, outside of administrative and technical control of the
+party implementing the Osmocom-based GSM network and the TC at its edge.  The
+resulting G.711 octet stream with TFO frames embedded inside can be carried
+half-way around the world by the international toll telephone network, and there
+is no telling what kind of implementation may be receiving and decoding these
+bits on the other end.  For this reason, "poor man's" workarounds in the
+RTP-fed, TFO-generating TC are very unattractive:
+
+* If the TC were to set TAF=0 in all TFO frames it generates, the receiver's
+  expectation of seeing TAF=1 in every 24th frame will be violated.
+
+* If the TC were to arbitrarily set TAF=1 in every 24th frame by its own free-
+  running count, without knowledge of the actual SACCH alignment in the original
+  GSM call leg, these TAF-marked frames won't coincide with those frame
+  positions where the MS sends its SID frames, and the resulting TFO frame
+  stream will be invalid to the receiving Rx DTX handler on the far end.
+
+The knowledge of which frames need to be marked with TAF=1 exists inside the
+entity that generates the FR/EFR RTP stream: if this entity is a converter from
+E1-based Abis to RTP, the TRAU-UL frames from the BTS contain this TAF bit, and
+if the RTP-generating entity is a native IP BTS, it knows the frame number for
+which it generates each RTP packet.  The only problem is that there is no place
+to insert this TAF bit in the standard RTP transport format of TS 101 318.
+
+Why TRAU-UL and not TRAU-DL
+===========================
+
+The present document argues the case that the industry standard RTP transport
+format for FR & EFR is functionally crippled compared to the TRAU-UL transport
+format of 3GPP TS 48.060, and defines an alternative RTP transport format that
+can be used by those who desire TRAU-UL-like functionality badly enough to
+accept the price of going totally non-standard in their IP RAN transport.  The
+new RTP transport format defined in this document explicitly mimics the
+functionality and semantics of TS 48.060 TRAU-UL for FR and EFR.
+
+At this point a reader may reasonably ask: why TRAU-UL and not TRAU-DL?  The
+answer is TFO: 3GPP TS 28.062 and its predecessor GSM 08.62 define the TFO frame
+format as being based on TRAU-UL frames with only a few bits changed, and no
+change in semantics of any of the frame indicator bits of TRAU-UL (C12 through
+C17).  Whereas the Abis interface is inherently asymmetric (TRAU-UL frames in
+one direction, TRAU-DL frames in the other direction), end-to-end TFO is
+directionally symmetric.  If we imagine a TFO call between Alice in America and
+Bob in Britain, there will be TRAU-UL frames flowing in both directions of the
+trans-oceanic G.711 toll connection, one set coming almost unchanged from
+Alice's BTS CCU and the other coming almost unchanged from Bob's BTS CCU.  Of
+course each party's GSM call DL will require TRAU-DL frames to be fed to it,
+not TRAU-UL, but the necessary UL-to-DL conversion is the responsibility of the
+TFO receiver on each end.
+
+The general rules for turning a TRAU-UL frame into one for TRAU-DL are specified
+in TS 28.062 section C.3.2.1.1; it should be noted that this section spells out
+the requirements of what the UL-to-DL converter must do, but does not specify
+exactly how to do it algorithmically - the wording it uses is "subject to
+manufacturer dependent future improvements and is not part of this
+recommendation."  Implementing all of these section C.3.2.1.1 rules (hereafter
+called C3211 rules for short) exactly to the letter is quite easy for the FR
+codec (Themyscira libgsmfrp does everything that is needed, and is a simple and
+lightweight FLOSS function library), but much harder for EFR.  At the present
+time it is unclear to the author of this document whether real historical T1/E1
+TRAU implementations for which GSM 08.62 TFO was originally specified really did
+implement C3211 rules to the letter, particularly for EFR, or if they cut some
+corners.
+
+Because the TRAUlike RTP transport format defined in this document is
+semantically equivalent to TRAU-UL, any entity that receives such RTP packets
+but internally needs to generate either TRAU-DL or some private functional
+equivalent thereof will need to perform the same UL-to-DL conversion as called
+for in TFO.  The lack of a readily available function library that implements
+the onerous rules of C3211 for EFR is certainly an obstacle, but it is also
+possible to "cut corners" by doing the following:
+
+1) Ignore Table C.3.2.1-1 case 1 and treat it like case 2, at least for EFR:
+   whenever SID frames are received on the incoming TRAU-UL or TRAUlike RTP
+   interface, forward them to call leg B even when that destination call leg
+   has no DTXd.  Given that DTX and SID support has been an integral part of
+   the EFR codec from the beginning, as opposed to an after-addition in the
+   case of FR, every GSM MS that supports EFR can be expected to understand
+   SID frames on the downlink.
+
+2) During speech pauses following transmission of a SID frame on call leg B DL,
+   if real DTXd (turning off Tx) is not allowed, do "fake DTXd" by transmitting
+   dummy FACCH with an L2 fill frame in the same 20 ms traffic frame windows in
+   which real DTXd would have been exercised if it were allowed.
+
+3) Whenever a BFI condition is encountered in the incoming TRAU-UL or TRAUlike
+   RTP frame stream outside of SID, i.e., the case described in the first
+   paragraph of section C.3.2.1.1, induce an intentional BFI condition in the
+   receiving GSM MS by transmitting a dummy FACCH frame as above, instead of
+   trying to devise a parameter-level ECU for EFR.
+
+It should be noted that the just-outlined "cut corners" method is exactly what
+OsmoBTS (and a "pure" Osmocom network in general) does currently, hence nothing
+is lost and no regression is introduced by continuing to do the same.
+
+Seen another way, by making our RTP transport semantically equivalent to
+TRAU-UL, we achieve harmonization between TFO and TrFO.  TrFO (Transcoder-Free
+Operation) is a scenario in which the RTP output from one IP BTS for call leg A
+goes directly to the RTP input of another IP BTS for call leg B, possibly
+passing through simple RTP forwarders like OsmoMGW, but never passing through
+any transcoder.  TrFO is what happens in a self-contained Osmocom network
+without any external MNCC connected to OsmoMSC.  The principal rules of what
+transformations are inherently necessary in order to produce a fully proper DL
+for call leg B from the UL of call leg A remain the same whether the transport
+in between is old-fashioned TFO or modern TrFO, hence the same conversions that
+are codified in TS 28.062 section C.3.2.1.1 are still needed - the only question
+is where in the network are they to be performed.  The original TDM-based GSM
+designers at ETSI gave us a superb architecture end to end; by employing an RTP
+transport that is semantically equivalent to TRAU-UL, we can preserve that whole
+architecture fully intact in an all-IP implementation.
+
+Specification for TRAUlike RTP payload format for FR and EFR
+============================================================
+
+The modified RTP payload format shall consist of a single octet called TRAUlike
+Extension Header (TEH), followed (unless the No_Data flag defined below is set)
+by the standard (same as in RFC 3551) 33 octets for FR or 31 octets for EFR.
+The TEH octet has the following structure:
+
+         +----+----+----+----+----+----+----+----+
+Hex mask |       0xF0        |0x08|0x04|0x02|0x01|
+         +----+----+----+----+----+----+----+----+
+Meaning  |     signature     |DTXd|NDF |BFI |TAF |
+         +----+----+----+----+----+----+----+----+
+
+(Bit numbers are identified by hex masks in order to avoid getting into an
+ argument over which bit numbering convention should be used.)
+
+The following bit fields are defined within the TEH octet:
+
+signature: the upper nibble of the TEH octet shall be set to 0xE.  This
+signature allows RTP packet receivers to identify the payload format by the
+upper nibble of the first octet: if it equals 0xC, the format is EFR without
+TEH, if it equals 0xD, the format is FR without TEH, and if it equals 0xE, then
+the first octet is TEH.
+
+DTXd: this bit is strictly identical with TRAU-UL frame bit C17.
+
+No_Data flag (NDF): this bit shall be set to 1 if the TRAUlike payload consists
+solely of TEH, with the standard 33-octet FR frame or 31-octet EFR frame
+entirely omitted, and shall be 0 otherwise.
+
+BFI: this bit is strictly identical with TRAU-UL frame bit C12.
+
+TAF: this bit is strictly identical with TRAU-UL frame bit C15.
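+The TEH layout can be sketched in C as follows.  This is only an illustration:
+the macro names, struct teh_fields and the teh_pack()/teh_unpack() helpers are
+hypothetical and are not taken from the OsmoBTS feature branch.
+
```c
#include <stdbool.h>
#include <stdint.h>

/* TEH bit masks (hypothetical names) */
#define TEH_SIG_MASK	0xF0
#define TEH_SIG		0xE0	/* upper nibble 0xE marks a TEH */
#define TEH_DTXD	0x08	/* identical to TRAU-UL C17 */
#define TEH_NDF		0x04	/* No_Data flag */
#define TEH_BFI		0x02	/* identical to TRAU-UL C12 */
#define TEH_TAF		0x01	/* identical to TRAU-UL C15 */

struct teh_fields {
	bool dtxd, ndf, bfi, taf;
};

/* Build a TEH octet from its individual fields. */
static uint8_t teh_pack(const struct teh_fields *f)
{
	uint8_t teh = TEH_SIG;

	if (f->dtxd)
		teh |= TEH_DTXD;
	if (f->ndf)
		teh |= TEH_NDF;
	if (f->bfi)
		teh |= TEH_BFI;
	if (f->taf)
		teh |= TEH_TAF;
	return teh;
}

/* Split a TEH octet into fields; returns false if the upper nibble is not
 * the 0xE signature, i.e. the octet is not a TEH at all. */
static bool teh_unpack(uint8_t teh, struct teh_fields *f)
{
	if ((teh & TEH_SIG_MASK) != TEH_SIG)
		return false;
	f->dtxd = (teh & TEH_DTXD) != 0;
	f->ndf = (teh & TEH_NDF) != 0;
	f->bfi = (teh & TEH_BFI) != 0;
	f->taf = (teh & TEH_TAF) != 0;
	return true;
}
```
+
+For example, a No_Data BFI packet with TAF=1 packs to the single octet 0xE7.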
+
+There are two possibilities for full composition of a TRAUlike RTP payload:
+
+Possibility 1: TEH with NDF=0 is followed by a standard 33-octet FR frame or a
+standard 31-octet EFR frame.  The signature in the upper nibble of the octet
+immediately following TEH shall be correct: 0xD for FR or 0xC for EFR.
+
+Possibility 2: TEH with NDF=1 constitutes the entirety of the RTP payload for
+the 20 ms time window in question.
+
+If the No_Data flag is set, BFI must also be set: the combination of NDF=1 and
+BFI=0 is invalid.
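+As an illustrative sketch (the function name is hypothetical and not taken from
+OsmoBTS), a receiver might check the two permitted compositions and the NDF/BFI
+constraint like this:
+
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FR_FRAME_LEN	33	/* RFC 3551 FR frame, signature 0xD */
#define EFR_FRAME_LEN	31	/* RFC 3551 EFR frame, signature 0xC */

/* Check that a payload is one of the two permitted TRAUlike compositions
 * for the given codec. */
static bool traulike_valid(const uint8_t *pl, size_t len, bool is_efr)
{
	if (len < 1 || (pl[0] & 0xF0) != 0xE0)
		return false;		/* first octet is not a TEH */
	if (pl[0] & 0x04) {
		/* Possibility 2: NDF=1, TEH alone; NDF=1 with BFI=0 is invalid */
		return len == 1 && (pl[0] & 0x02) != 0;
	}
	/* Possibility 1: NDF=0, TEH followed by a complete standard frame */
	if (len != 1 + (size_t)(is_efr ? EFR_FRAME_LEN : FR_FRAME_LEN))
		return false;
	/* the octet after TEH must carry the correct codec signature */
	return (pl[1] >> 4) == (is_efr ? 0xC : 0xD);
}
```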
+
+Per this specification, the sender of a BFI packet has the choice of sending it
+in one of two forms: with or without presumed-erroneous frame bits.  If the
+TRAUlike RTP packet is generated from bits received in an actual TRAU-UL frame
+(E1 Abis or TFO), erroneous frame bits shall be included, unchanged from the
+TRAU-UL source.  However, if the entity generating the TRAUlike RTP packet is
+the ultimate point of origin (e.g., a native IP BTS), then it shall choose one
+form or the other based on the situation at hand:
+
+a) if the sender does have an FR or EFR frame "on hand" but that frame is
+   considered to be erroneous (for example, because it failed the link quality
+   check in l1sap_tch_ind() in OsmoBTS), the long form of BFI shall be sent,
+   with the presumed-erroneous frame bits included.
+
+b) if the sender does not have any FR or EFR frame at all that could be sent
+   (for example, if the BFI condition arises because FACCH was successfully
+   received and decoded instead of a traffic frame), then the No_Data form of
+   BFI shall be sent.
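+A sender following rules a) and b) above might compose its payload roughly as
+follows.  This is a hypothetical sketch, not the OsmoBTS code.  Passing a NULL
+frame pointer selects the No_Data form, which always forces BFI=1.
+
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compose a TRAUlike payload into out[] (room for 1 + frame_len octets)
 * and return its length.  frame == NULL means no frame is on hand. */
static size_t traulike_compose(uint8_t *out, const uint8_t *frame,
			       size_t frame_len, bool bfi, bool taf, bool dtxd)
{
	uint8_t teh = 0xE0;		/* TEH signature */

	if (dtxd)
		teh |= 0x08;
	if (taf)
		teh |= 0x01;
	if (frame == NULL) {
		/* case b): No_Data form; NDF=1 always implies BFI=1 */
		out[0] = teh | 0x04 | 0x02;
		return 1;
	}
	/* good frame, or case a): long-form BFI with presumed-erroneous bits */
	if (bfi)
		teh |= 0x02;
	out[0] = teh;
	memcpy(out + 1, frame, frame_len);
	return 1 + frame_len;
}
```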
+
+The option of No_Data BFI is provided in this RTP transport format specification
+because if this option were disallowed, senders would be tasked with an
+additional burden of having to artificially generate dummy or "garbage" frame
+bits.  This task is slightly complicated, as explained later in this document,
+and the present design moves that task from all senders to only those receivers
+that need it.
+
+Lack of SID classification bits matching TRAU-UL C13 & C14
+----------------------------------------------------------
+
+TRAU-UL frame format includes two bits C13 & C14 that carry the ternary SID flag
+(0, 1 or 2) as defined in GSM 06.31 and 06.81 section 6.1.1 (same section in
+both specs).  No equivalent bits are included in the TRAUlike RTP transport
+format as defined by this specification - however, these bits are redundant.
+The rules of section 6.1.1 in GSM 06.31 and 06.81, hereafter called S611 rules,
+specify a strictly deterministic, unambiguous formula by which these C13 & C14
+bits derive their values from the bit content of the FR/EFR frame payload -
+thus if a TRAU-UL frame is received in which these C13 & C14 bits fail to match
+the S611 value derived from the contained payload, then that TRAU-UL frame is
+defective.  There is no need to include such redundant bits in our TRAUlike RTP
+format; doing so would only create confusion for receivers as to which source
+of S611 SID classification they should use.
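+To illustrate why C13 & C14 are redundant, S611 classification amounts to
+counting zeros in the SID codeword bit positions of the payload.  The sketch
+below uses thresholds as we understand section 6.1.1: at most 1 zero gives
+SID=2, 2 to 15 zeros give SID=1, and 16 or more give SID=0.  The actual codeword
+bit positions differ between FR and EFR and must be taken from the specs, so
+they are left to a caller-supplied table.  sid_classify() is a hypothetical
+name, not the libosmocore or Themyscira function.
+
```c
#include <stddef.h>
#include <stdint.h>

/* Value of payload bit n, counting from the MSB of octet 0 */
static int payload_bit(const uint8_t *pl, unsigned n)
{
	return (pl[n / 8] >> (7 - n % 8)) & 1;
}

/* Ternary SID classification: count zeros among the SID codeword bits.
 * sid_pos[] holds payload bit positions of the codeword (hypothetical
 * parameter; real position tables come from the FR and EFR specs). */
static int sid_classify(const uint8_t *pl, const uint16_t *sid_pos, size_t n_pos)
{
	unsigned zeros = 0;

	for (size_t i = 0; i < n_pos; i++) {
		if (!payload_bit(pl, sid_pos[i]))
			zeros++;
	}
	if (zeros < 2)
		return 2;	/* valid SID frame */
	if (zeros < 16)
		return 1;	/* invalid SID frame */
	return 0;		/* speech frame */
}
```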
+
+Feeding received TRAUlike BFI frames to an EFR decoder
+======================================================
+
+If an EFR decoder implementation is based on the reference C source from ETSI,
+this decoder requires that _some_ frame bits input be fed to it at all times,
+even when BFI=1.  But what if the BFI packet came in as No_Data?  In that case
+the receiver must synthesize its own fake "bad data" bits to feed to the
+standard decoder.  When synthesizing "bad data" bits in this manner, the
+following rules should be observed:
+
+* The 140 bits corresponding to "fixed codebook excitation pulses" (35 bits in
+  each of the 4 subframes) shall be filled using a PRNG.  These bits are the
+  ones used by the standard decoder when its internal state, based on previous
+  good frames, puts it in GSM 06.61 substitution/muting mode as opposed to
+  GSM 06.62 comfort noise generation mode.
+
+* The remaining 104 bits of the EFR frame shall be set to 0.  These bits are
+  never used by the standard decoder under the condition of BFI=1, and setting
+  them to 0 prevents the possibility of S611 rules classifying the frame as SID
+  even if the PRNG output in the other 140 bits happens to be all 1s in those
+  SID codeword bit positions (70 out of 140) that fall within the "fixed
+  codebook excitation pulses" portion.
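+A minimal sketch of the No_Data filler described above follows.  It assumes
+that the RFC 3551 EFR payload carries the 244 frame bits after the 0xC
+signature nibble in GSM 06.60 parameter order: 38 LSF bits, then per subframe
+an LTP lag of 9 or 6 bits, LTP gain 4, fixed codebook pulses 35, and codebook
+gain 5.  These offsets should be checked against TS 101 318 before use.  The
+PRNG (xorshift32) and all names are arbitrary choices.
+
```c
#include <stdint.h>
#include <string.h>

#define EFR_RTP_LEN	31	/* 4-bit 0xC signature + 244 frame bits */

/* Frame-bit offsets (0..243) of the four 35-bit fixed codebook pulse
 * fields, ASSUMING GSM 06.60 parameter order: 38 LSF bits, then per
 * subframe LTP lag (9/6/9/6), LTP gain (4), pulses (35), gain (5). */
static const unsigned efr_pulse_start[4] = { 51, 101, 154, 204 };

/* xorshift32 PRNG: an arbitrary choice; *state must be nonzero */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Synthesize "bad data" for a No_Data BFI: random bits in the 140 pulse
 * positions, all other 104 frame bits zero (which, per the text above,
 * keeps S611 from classifying the result as SID). */
static void efr_nodata_fill(uint8_t pl[EFR_RTP_LEN], uint32_t *state)
{
	memset(pl, 0, EFR_RTP_LEN);
	pl[0] = 0xC0;			/* RFC 3551 EFR signature */
	for (int sf = 0; sf < 4; sf++) {
		for (unsigned i = 0; i < 35; i++) {
			/* +4 skips the signature nibble */
			unsigned n = 4 + efr_pulse_start[sf] + i;

			if (xorshift32(state) >> 31)
				pl[n / 8] |= 0x80 >> (n % 8);
		}
	}
}
```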
+
+Converting from TRAU-UL to TRAUlike RTP
+=======================================
+
+There will be a need to convert from standard TS 48.060 TRAU-UL frames to our
+TRAUlike RTP format in the following two scenarios:
+
+1) When interfacing an E1 BTS to Osmocom RAN, when and if such support is to be
+   added to OsmoMGW;
+
+2) In the CN transcoder operating in TFO mode, when forwarding received TFO
+   frames to the local RAN.
+
+In both cases the conversion is straightforward:
+
+* Always generate full-length TRAUlike RTP payloads; never generate No_Data for
+  a properly received TRAU-UL speech (not idle) frame.
+
+* Forward the payload bits directly from TRAU-UL to TRAUlike RTP, for both good
+  and bad frames.
+
+* Directly forward BFI, TAF and DTXd frame indicator bits from TRAU-UL C-bits
+  to TEH octet bits.
+
+* Ignore TRAU-UL C13 & C14 bits.
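+The TRAU-UL-to-TRAUlike rules above reduce to very little code.  In this sketch,
+struct trau_ul_decoded is a hypothetical representation of a TRAU-UL frame whose
+Dn bits have already been repacked into RFC 3551 octet layout.  That repacking
+step is not shown.
+
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded TRAU-UL frame; frame[] already holds the Dn bits
 * repacked into RFC 3551 layout (33 octets FR, 31 octets EFR). */
struct trau_ul_decoded {
	bool c12_bfi, c13, c14, c15_taf, c17_dtxd;
	bool is_efr;
	uint8_t frame[33];
};

/* Convert to a full-length TRAUlike payload; out[] needs 34 octets. */
static size_t trau_ul_to_traulike(const struct trau_ul_decoded *tu, uint8_t *out)
{
	size_t flen = tu->is_efr ? 31 : 33;
	uint8_t teh = 0xE0;		/* signature, NDF=0: never No_Data */

	if (tu->c17_dtxd)
		teh |= 0x08;
	if (tu->c12_bfi)
		teh |= 0x02;
	if (tu->c15_taf)
		teh |= 0x01;
	/* C13 & C14 deliberately ignored: S611 derives them from the bits */
	out[0] = teh;
	memcpy(out + 1, tu->frame, flen);	/* good and bad frames alike */
	return 1 + flen;
}
```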
+
+Converting from TRAUlike RTP to TRAU-UL
+=======================================
+
+This direction of conversion will need to be performed in the CN transcoder when
+emitting TFO frames toward the outside world.  The following rules will need to
+be applied:
+
+* If the incoming TRAUlike RTP payload is full-length, as opposed to No_Data,
+  simply copy the payload bits into the constructed TRAU-UL frame, for both
+  good (BFI=0) and bad (BFI=1) frames.
+
+* If the incoming TRAUlike RTP payload is No_Data, put the following filler in
+  the data bits portion of the TRAU-UL frame:
+
+  - For FR codec, use the silence frame of 3GPP TS 46.011 Table 1 as the filler.
+
+  - For EFR codec, perform the same PRNG procedure as detailed earlier in this
+    document for the case of feeding a No_Data BFI packet to the standard ETSI
+    decoder for EFR.  Given that a TFO-frame-emitting transcoder still needs to
+    run its regular speech decoder in order to fill the upper 6 bits of each
+    outgoing G.711 sample octet, the same No_Data PRNG handler will typically
+    be run just once for both internal decoding and TFO frame output.
+
+* Algorithmically set C13 & C14 bits in the generated TRAU-UL frame per the
+  rules of S611.  This step can be done using osmo_{fr,efr}_sid_classify()
+  functions proposed in this Gerrit patch submission:
+
+  https://gerrit.osmocom.org/c/libosmocore/+/32183
+
+  or using equivalent functions in Themyscira libgsmefr and libgsmfrp.
+
+* Directly forward BFI, TAF and DTXd frame indicator bits from TEH octet bits
+  to TRAU-UL C12, C15 and C17, respectively.
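+The reverse direction can be sketched the same way, again with hypothetical
+names.  The function only maps the TEH bits and copies the frame bits.  It
+reports No_Data to its caller, which is then expected to supply the FR silence
+frame or the EFR PRNG filler and to derive C13 & C14 with an S611 classifier.
+
```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TRAU-UL frame under construction */
struct trau_ul_build {
	bool c12, c15, c17;	/* C13 & C14 are set later via S611 */
	uint8_t frame[33];	/* frame bits in RFC 3551 layout */
};

/* Map a (previously validated) TRAUlike payload onto TRAU-UL fields.
 * Returns true if it was No_Data, in which case the caller must put the
 * filler (FR silence frame or EFR PRNG fill) into tu->frame. */
static bool traulike_to_trau_ul(const uint8_t *pl, bool is_efr,
				struct trau_ul_build *tu)
{
	uint8_t teh = pl[0];

	tu->c12 = (teh & 0x02) != 0;	/* BFI  -> C12 */
	tu->c15 = (teh & 0x01) != 0;	/* TAF  -> C15 */
	tu->c17 = (teh & 0x08) != 0;	/* DTXd -> C17 */
	if (teh & 0x04)
		return true;		/* No_Data: filler needed */
	/* full-length payload: copy bits for good and bad frames alike */
	memcpy(tu->frame, pl + 1, is_efr ? 31 : 33);
	return false;
}
```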
+
+Mixing standard RFC 3551 and TRAUlike RTP payloads
+==================================================
+
+An RTP stream receiver for FR/EFR codecs that supports the present non-standard
+extension to the RTP payload format shall behave gracefully when it receives a
+mixture of standard RFC 3551 payloads and TRAUlike payloads in the same RTP
+stream.  A receiver that has no interest in the additional information carried
+in the TRAUlike Extension Header shall simply strip the TEH octet when one is
+received, reducing the received payload to standard RFC 3551, and shall treat a
+received BFI or No_Data payload the same as if nothing at all had been received.
+A receiver that is interested in the TRAUlike Extension Header but receives an
+FR/EFR payload without one should behave as if it had received a TEH with BFI=0
+and TAF=0, and should treat a received zero-length RTP payload the same as a
+No_Data TRAUlike payload with TAF=0.
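+A receiver that wants TEH semantics for every packet could normalize all
+incoming payloads into an actual or implied TEH plus optional frame bits, as in
+this hypothetical sketch.  Treating DTXd and NDF as 0 for plain RFC 3551
+payloads is our assumption; the text above only specifies BFI=0 and TAF=0.
+Payload length checks are omitted for brevity.
+
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Normalized view of a received FR/EFR payload (hypothetical) */
struct rx_frame {
	uint8_t teh;		/* actual or implied TEH octet */
	const uint8_t *frame;	/* RFC 3551 frame bits, NULL if none */
	size_t frame_len;
};

/* Returns false if the payload is neither RFC 3551 nor TRAUlike.
 * Length checks are omitted for brevity. */
static bool rx_normalize(const uint8_t *pl, size_t len, struct rx_frame *rx)
{
	rx->frame = NULL;
	rx->frame_len = 0;
	if (len == 0) {
		/* zero-length payload: same as No_Data with TAF=0 */
		rx->teh = 0xE0 | 0x04 | 0x02;
		return true;
	}
	switch (pl[0] >> 4) {
	case 0xC:			/* plain RFC 3551 EFR */
	case 0xD:			/* plain RFC 3551 FR */
		rx->teh = 0xE0;		/* implied BFI=0, TAF=0 (DTXd=0 assumed) */
		rx->frame = pl;
		rx->frame_len = len;
		return true;
	case 0xE:			/* TRAUlike: TEH present */
		rx->teh = pl[0];
		if (!(pl[0] & 0x04)) {	/* NDF=0: frame follows TEH */
			rx->frame = pl + 1;
			rx->frame_len = len - 1;
		}
		return true;
	default:
		return false;
	}
}
```
+
+A receiver with no interest in the TEH could use the same helper and simply
+drop any packet whose normalized TEH has BFI set, passing rx->frame on as a
+plain RFC 3551 payload otherwise.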
+
+There may even be cases in which an RTP sender alternates between sending
+standard RFC 3551 payloads and TRAUlike payloads in the same session: for
+example, a TFO-supporting CN transcoder may emit "plain" RFC 3551 payloads when
+supplying the output of its free-running speech encoder, but switch to sending
+TRAUlike payloads when it switches to forwarding bits received in TFO frames
+from the far end.