changeset 549:d9f6b3125259

document TW-TS-005 utilities
author Mychaela Falconia <falcon@freecalypso.org>
date Sat, 05 Oct 2024 00:58:01 +0000
parents 583dc4cbee95
children de333989a12b
files doc/TW-TS-005 doc/Utils-overview
diffstat 2 files changed, 86 insertions(+), 0 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/doc/TW-TS-005	Sat Oct 05 00:58:01 2024 +0000
@@ -0,0 +1,81 @@
+The original set of Themyscira Wireless utilities for FR and EFR codecs uses an
+ad hoc binary file format to represent streams of FR or EFR codec frames - see
+the Binary-file-format article.  However, a newer hexadecimal format has now
+been standardized as Themyscira Wireless Technical Specification TW-TS-005:
+
+https://www.freecalypso.org/specs/tw-ts-005-v010003.txt
+
+The standard has two annexes intended for practical use:
+
+* TW-TS-005 Annex A defines a representation format for FR and EFR codecs;
+* TW-TS-005 Annex B defines a representation format for the HR codec.
+
+The present version of the ThemWi GSM codec libraries & utilities suite
+includes some utilities that operate on TW-TS-005 Annex A hex files; support
+for Annex B will appear in a future version when our work on GSM-HR codec
+integration progresses further.
+
+TW-TS-005 Annex A vs gsmx binary format
+=======================================
+
+For working with FR and EFR codecs, our original binary file format has one
+major defect: it cannot represent bad traffic frames (as defined in GSM 06.31 &
+06.81, i.e., BFI=1) that have payload data bits included, as happens in
+well-designed GSM networks that use GSM 08.60 TRAU-UL frames or TW-TS-001
+enhanced RTP transport.  This file format deficiency leads to the following
+downstream defects:
+
+* The combination of "bad traffic frame" and "accepted SID frame" (again,
+  GSM 06.31 & 06.81 terminology) gets incorrectly treated as "unusable frame"
+  rather than "invalid SID frame" as the specs decree.
+
+* In the case of EFR, the reference decoder C code that forms the basis for
+  Themyscira libgsmefr makes use of the "fixed codebook excitation pulses"
+  portion of bad frames during speech (as opposed to comfort noise) state - but
+  the gsmx format has no way to store these bits, so they are lost.
+
+The new hexadecimal format of TW-TS-005 Annex A solves this shortcoming: each
+frame is stored as a hex line that directly corresponds to a single RTP payload,
+hence the full capabilities of TW-TS-001 extended RTP format are made available
+in a file at rest.
+
+Because we have so many existing utilities that read and write gsmx binary
+files, and this binary format is so entrenched in the Themyscira development
+environment, we are not doing a "forklift" migration of all of our tools to the
+new format.  Instead we are taking a more tempered approach:
+
+* For the decoding operation (taking a frame stream from an Rx Radio Subsystem
+  and producing linear PCM output) that is most affected by the shortcomings of
+  gsmx format, we have new utilities that read TW-TS-005 Annex A input, while
+  the old gsmx-reading utilities are still preserved and maintained;
+
+* For most other workflows (for example, encoding of new speech), conversion
+  utilities between the two formats (described below) are deemed sufficient;
+
+* New developments such as TFO transform use TW-TS-005 Annex A format natively.
+
+Human-readable dump decoding of TW-TS-005 hex files
+===================================================
+
+A line-based hexadecimal file format with one line per stored codec frame is
+inherently more human-readable than a binary file, but we also desire a more
+complete decoding such as that produced by gsmrec-dump, showing all codec
+parameters and frame metadata flags.  tw5a-dump produces such decoding for
+TW-TS-005 Annex A hex files; there will also be a corresponding tw5b-dump
+utility for TW-TS-005 Annex B when we finish integrating GSM-HR codec support.
+
+Conversion utilities (FR and EFR codecs)
+========================================
+
+The gsmx-to-tw5a and tw5a-to-gsmx utilities do what their names suggest:
+convert FR/EFR speech recordings or test sequences between gsmx (binary) and
+TW-TS-005 Annex A (hex) formats.  Important semantic notes:
+
+* gsmx-to-tw5a emits basic RTP format (no TEH) for all good frames, while each
+  BFI marker record is converted to a TEH-only No_Data frame.
+
+* tw5a-to-gsmx is the lossy conversion: distinction between basic and extended
+  RTP formats is lost, as is TAF without BFI, and all BFIs become BFI-no-data.
+
+A conversion from gsmx to tw5a and back to gsmx is lossless, but a conversion
+from tw5a to gsmx and back is not.
--- a/doc/Utils-overview	Fri Oct 04 20:40:42 2024 +0000
+++ b/doc/Utils-overview	Sat Oct 05 00:58:01 2024 +0000
@@ -68,6 +68,8 @@
 
 gsmrec-dump		See Binary-file-format article.
 
+gsmx-to-tw5a		See TW-TS-005 article.
+
 pcm16-check13		This program reads a 16-bit linear PCM recording file
 			(raw BE by default, or raw LE with -l option) and checks
 			if the 3 least significant bits of every sample are all
@@ -84,6 +86,9 @@
 pcm16-to-ulaw
 pcm8-to-pcm16
 
+tw5a-dump		See TW-TS-005 article.
+tw5a-to-gsmx
+
 twamr-decode		See Codec-utils article.
 twamr-decode-r
 twamr-encode
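
Reviewer's illustration (not part of the changeset): the round-trip claim in the new doc/TW-TS-005 article can be sketched with a toy model. This is only a sketch under stated assumptions. The class and field names below are hypothetical, the model tracks just the properties the article mentions (payload bits, TEH presence, BFI, TAF), and it does not reproduce the actual byte layout of either format or the real gsmx-to-tw5a and tw5a-to-gsmx code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GsmxRecord:
    """Toy model of one gsmx record (hypothetical field names)."""
    payload: Optional[bytes]  # codec frame bits; None means a BFI marker record
    taf: bool = False         # modeled only on BFI marker records


@dataclass(frozen=True)
class Tw5aFrame:
    """Toy model of one TW-TS-005 Annex A line, i.e. one RTP payload."""
    payload: Optional[bytes]  # codec frame bits; None means No_Data
    teh: bool                 # is the TW-TS-001 extension header present?
    bfi: bool = False
    taf: bool = False


def gsmx_to_tw5a(rec: GsmxRecord) -> Tw5aFrame:
    if rec.payload is None:
        # BFI marker record becomes a TEH-only No_Data frame.
        return Tw5aFrame(payload=None, teh=True, bfi=True, taf=rec.taf)
    # Good frame becomes basic RTP format (no TEH).
    return Tw5aFrame(payload=rec.payload, teh=False)


def tw5a_to_gsmx(frame: Tw5aFrame) -> GsmxRecord:
    if frame.bfi or frame.payload is None:
        # Every BFI collapses to BFI-no-data: payload bits are dropped.
        return GsmxRecord(payload=None, taf=frame.taf)
    # Good frame: TEH presence and TAF-without-BFI are dropped.
    return GsmxRecord(payload=frame.payload)


if __name__ == "__main__":
    # gsmx -> tw5a -> gsmx returns the record we started with.
    marker = GsmxRecord(payload=None, taf=True)
    print(tw5a_to_gsmx(gsmx_to_tw5a(marker)) == marker)

    # tw5a -> gsmx -> tw5a loses the payload bits of a bad frame.
    bad_with_bits = Tw5aFrame(payload=b"bits", teh=True, bfi=True)
    print(gsmx_to_tw5a(tw5a_to_gsmx(bad_with_bits)) == bad_with_bits)
```

In this model the first comparison holds and the second does not, which matches the article's statement that only the gsmx to tw5a to gsmx direction is lossless.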