comparison Voice-memo-feature @ 96:69061d044f05

Voice-memo-feature: new article
author Mychaela Falconia <falcon@freecalypso.org>
date Tue, 27 Dec 2022 20:23:55 +0000
1 The full Calypso hw+fw solution as delivered by TI (the relevant components here
2 are the DSP, the official L1 code and RiViera Audio Service) implements an
3 interesting feature called voice memos. It is actually two paired features:
4
5 * Voice memo recording: in almost all states of the MS (no GSM network at all,
6 or idle mode, or in an active call) it is possible to activate an extra
7 instance of the GSM 06.10 encoder that takes input from the microphone (and also
8 from the active call downlink if invoked during a speech call) and writes its
9 output into an otherwise-unused DSP buffer. The combination of L1 and RiViera
10 Audio Service then writes this speech recording into a file in FFS.
11
12 * Voice memo playback: voice memo files recorded with the just-described VM
13 record feature can be played into the phone's speaker output. The operation
14 of playing a previously recorded voice memo is conceptually no different from
15 playing tones or melodies, and can likewise be done in any state: with no GSM
16 network at all, in idle mode, or in an active call.
17
18 VM recording and VM playback cannot be active at the same time: they use the
19 same DSP buffer, and likely other mutually exclusive DSP resources too.
20 Furthermore, the same DSP buffer that is used for these VM features is also
21 used for the TCH UL substitution debug/test feature described in the TCH-tap-modes
22 article - therefore, all 3 features (VM record, VM play and TCH UL play) need
23 to be treated as mutually exclusive in time. However, aside from this mutual
24 exclusion, it is remarkable that VM recording or VM playback can be invoked
25 during an active speech call (which can use any codec!), and the extra instance
26 of the FR1 encoder or decoder (always FR1) invoked by VM features is essentially
27 independent from the main TCH encoder and the main TCH decoder, all of which
28 run simultaneously. It is worth noting that all newer GSM speech codecs (HR1,
29 EFR and AMR) are much more computationally intensive than FR1, thus given that
30 the DSP has the necessary horsepower to run any one of those "heavy" codecs, it
31 probably isn't too much extra work to also run a simultaneous instance of
32 unidirectional (encoder only or decoder only) FR1.
33
34 The entire voice memo facility was already fully implemented in the TCS211 code
35 delivery from TI, but prior to FreeCalypso there was no way to exercise it. In
36 order to exercise VM functionality in TCS211, one needs to invoke these RiViera
37 Audio Service API functions:
38
39 audio_vm_record_start()
40 audio_vm_record_stop()
41 audio_vm_play_start()
42 audio_vm_play_stop()
43
44 In FreeCalypso we've added some simple AT commands that call the just-listed API
45 functions, and the facility that has been there all along is now accessible to
46 play with - it is the same situation as with Melody E1.
47
48 FreeCalypso AT commands for voice memo testing
49 ==============================================
50
51 AT@VMR="/pathname",dur,dtx
52
53 This command initiates VM recording. The FFS pathname into which the recording
54 should be written must be given as a quoted string (and as a reminder, all FFS
55 pathnames must be absolute - there are no current directories in the firmware
56 architecture), and there is a second required argument (dur) that sets the maximum
57 size of the recording. This duration argument is a decimal integer, and it is
58 reckoned in 1000-word units: if you specify duration as 1, the maximum recording
59 size is 1000 words (2000 bytes), if you specify duration as 2, the maximum
60 recording size is 2000 words (4000 bytes), and so forth. If you record with DTX
61 disabled, each block of 1000 words corresponds to 1 second in time (every 20 ms
62 frame turns into a block of 20 words), thus with DTX disabled the duration
63 argument becomes the actual duration in seconds. However, if you record with
64 DTX enabled, then periods of silence will be written in a compressed format
65 described later in this article, and the time duration of the recording will
66 depend on how much silence there is.
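
The size arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers given in the text: the function and constant names are made up for this example, and it does not model whether the end-marker word counts toward the limit.

```python
# Illustrative arithmetic only; names are hypothetical, not from TI or FC code.
WORDS_PER_UNIT = 1000   # the dur argument is reckoned in 1000-word units
BYTES_PER_WORD = 2      # 16-bit words
WORDS_PER_FRAME = 20    # one fully written 20 ms frame
FRAME_MS = 20

def vm_record_limit(dur):
    """Return (max_words, max_bytes, seconds_if_dtx_disabled) for a dur value."""
    max_words = dur * WORDS_PER_UNIT
    max_bytes = max_words * BYTES_PER_WORD
    frames = max_words // WORDS_PER_FRAME
    seconds = frames * FRAME_MS / 1000
    return max_words, max_bytes, seconds

for dur in (1, 2, 10):
    print(dur, vm_record_limit(dur))
```

With DTX enabled the byte limit is the same, but the third number becomes only a lower bound on the recorded time, since silent periods take up less space.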
67
68 The optional dtx argument is 1 to enable DTX or 0 to disable it; the default is
69 DTX disabled. The employed FR1 DTX algorithm appears to be the same as would be
70 used for TCH/FS uplink, except that an "artificial" (there is no SACCH with
71 independent-of-GSM voice memos) TAF position is generated on every 16th audio
72 frame, i.e., every 320 ms. (Note the shortening of this SID interval compared
73 to official TCH, where it is 24 frames or 480 ms.)
74
75 AT@VMRS
76
77 This command stops any VM recording in progress, but it is rarely needed - the
78 recording will stop automatically when the size limit is reached.
79
80 AT@VMP="/pathname"
81
82 This command initiates playback of the VM recording contained in the named file
83 in FFS. The FFS pathname is the only argument.
84
85 AT@VMPS
86
87 This command stops any VM playback in progress, but it is rarely needed - the
88 playback will stop automatically when the end-marker is read from the file.
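
For scripting test sessions, the command strings described above could be assembled with a small helper like the following sketch. The helper names and the /vm/test1 pathname are hypothetical; only the AT command syntax itself comes from this article, and how the strings get sent to the target is left out.

```python
# Hypothetical helpers that only build command strings per the syntax above.
def _check_path(pathname):
    # All FFS pathnames must be absolute.
    if not pathname.startswith("/"):
        raise ValueError("FFS pathname must be absolute: %r" % pathname)

def at_vmr(pathname, dur, dtx=None):
    """Build AT@VMR; dtx=None omits the argument (DTX disabled by default)."""
    _check_path(pathname)
    cmd = 'AT@VMR="%s",%d' % (pathname, dur)
    if dtx is not None:
        cmd += ",%d" % (1 if dtx else 0)
    return cmd

def at_vmp(pathname):
    """Build AT@VMP for playback of the named FFS file."""
    _check_path(pathname)
    return 'AT@VMP="%s"' % pathname

print(at_vmr("/vm/test1", 10))            # 10 s limit if DTX stays disabled
print(at_vmr("/vm/test1", 10, dtx=True))  # same size limit, DTX enabled
print(at_vmp("/vm/test1"))
```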
89
90 Voice memo file format
91 ======================
92
93 Using fc-fsio, you can read out voice memo files written by the VM record
94 facility, and you can likewise construct your own memo files externally, upload
95 them into FC device FFS and then play them via the VM play facility. The format
96 of these files is determined by TI's firmware stack (RV Audio Service on top of
97 L1 on top of the DSP), but is fundamentally based on a DSP buffer that is just
98 like those used for TCH. The companion TCH-tap-modes article describes the
99 format of the DSP buffer from which TCH DL bits can be read out; in the present
100 article we are going to cover the differences specific to the voice memo
101 facility.
102
103 When VM recording is done with DTX disabled, every 20 ms speech frame turns into
104 a block of 40 bytes in the memo file. This block of 40 bytes is produced from
105 20 16-bit words in the DSP buffer, each word turned into two bytes in LE order
106 by the ARM part of Calypso. The DSP buffer used for the VM facility has the
107 same overall format as the one used for TCH DL, described in the TCH-tap-modes
108 article - 3 status or header words followed by 17 words of payload, with the
109 latter words carrying a 260-bit FR1 codec frame in the bit order of GSM 05.03
110 interface 1. As explained in the TCH-tap-modes article, speech codec payload
111 words are filled in the msb-to-lsb direction by the DSP, thus the natural byte-
112 oriented representation would be big-endian - but because the little-endian ARM
113 core sits in between the DSP and the on-media file format, the byte order in
114 voice memo files comes out "wrong". Oh well - it is what it is.
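
As a minimal sketch of the byte-order point, the following Python function takes one fully written 40-byte block from a memo file, recovers the 20 DSP words, and re-serializes the payload words in big-endian order so that the codec bits appear msb-first. The function name is made up for this example; taking the first 33 bytes assumes that the 260 bits are packed contiguously starting at the msb of the first payload word.

```python
import struct

def split_vm_frame(block):
    """Split one 40-byte VM frame into (status_words, payload_bytes)."""
    if len(block) != 40:
        raise ValueError("a fully written frame is exactly 40 bytes")
    # The ARM side wrote each 16-bit DSP word in little-endian byte order.
    words = struct.unpack("<20H", block)
    status = words[:3]          # 3 status/header words
    payload_words = words[3:]   # 17 payload words
    # Big-endian repacking restores the DSP's msb-to-lsb bit order.
    payload = struct.pack(">17H", *payload_words)
    # 260 bits need 32.5 bytes, so the first 33 bytes carry the whole frame.
    return status, payload[:33]
```

The 33 payload bytes obtained this way correspond to 66 hex digits per frame.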
115
116 Of the 3 header words that precede every 20 ms speech frame, words 1 and 2
117 appear to be dummies - they have meaning related to the channel decoder block
118 in the case of TCH DL, but in the case of isolated-from-GSM voice memos, there
119 does not seem to be any meaning. Header or status word 0, consisting of bit
120 flags, is still important, however, and its bit flags for the VM facility are
121 different from those of TCH DL.
122
123 When VM recording is done with DTX disabled, status word 0 is observed to always
124 equal 0xC400 (bits 15, 14 and 10 set) on every frame. When DTX is enabled, the
125 following bits are seen in status word 0:
126
127 * Bit 15 will be set if this frame needs to be saved in its entirety, or cleared
128 if it is to be skipped. When VM recording code in L1S sees that the DSP has
129 delivered a frame with this status bit cleared, it will save only this status
130 word 0, i.e., 2 bytes will be written into the memo file instead of 40 bytes
131 for this 20 ms frame. On VM playback, the code likewise checks this bit to
132 see how many words need to be read from the file, so synchronization is
133 maintained.
134
135 * Bit 14 appears to be the SP flag of GSM 06.31 section 5.1: set when a speech
136 frame has been generated, or cleared when a SID frame has been generated
137 instead.
138
139 * Bit 11 is a TAF-like flag: when DTX is enabled, this bit is set in every 16th
140 frame generated by the DSP in the VM recording session, otherwise it is
141 cleared.
142
143 * Bit 10 will always be set in every status word 0 that gets written to voice
144 memo files: this bit is set by the DSP when it has finished encoding a 20 ms
145 audio frame and is checked by L1S on every TDMA frame, serving as a
146 synchronization mechanism telling L1S when it needs to copy a speech frame
147 from the DSP to the memo file.
148
149 When VM recording is done with DTX enabled, the recorded memo file will consist
150 of speech frames (header word 0xC400 or 0xCC00), SID frames (header word 0x8400
151 or 0x8C00) and skipped frames consisting of only the header word 0x0400, with
152 the remaining words omitted. There will always be a present (not skipped) frame
153 in every 16th position (0xCC00 or 0x8C00), thus no 0x0C00 frames are ever seen.
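
The bit assignments and header word values listed above can be cross-checked with a short Python sketch. The constant and function names are invented for this illustration and do not come from TI's code.

```python
# Hypothetical names for the status word 0 bits described above.
VM_SAVE = 1 << 15    # frame is written in full (40 bytes) rather than skipped
VM_SP = 1 << 14      # SP flag: speech frame if set, SID frame if cleared
VM_TAF = 1 << 11     # TAF-like flag, set on every 16th frame with DTX enabled
VM_READY = 1 << 10   # set by the DSP whenever a frame has been encoded

def classify_status(word0):
    """Return (kind, taf) where kind is 'speech', 'SID' or 'skipped'."""
    if not word0 & VM_READY:
        raise ValueError("bit 10 clear: not a frame status word")
    taf = bool(word0 & VM_TAF)
    if not word0 & VM_SAVE:
        return "skipped", taf
    return ("speech" if word0 & VM_SP else "SID"), taf

for w in (0xC400, 0xCC00, 0x8400, 0x8C00, 0x0400):
    print("0x%04X" % w, classify_status(w))
```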
154
155 Every voice memo binary file ends with a 0xFBFF end-marker word; this end-marker
156 is needed because the TCS211 fw architecture exhibits a separation between the
157 actual data reading and writing processes in L1S and the FFS read/write
158 interface provided by RiViera Audio Service, and because of this separation the
159 operational code in L1S can't "see" an EOF condition at the file system level.
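
Putting the pieces together, a memo file can be walked frame by frame as in the following Python sketch. It is not how L1S or the FC tools are written; the names are hypothetical, and it assumes that the end-marker is stored in the same little-endian byte order as every other word.

```python
import struct

END_MARKER = 0xFBFF
FULL_FRAME_BYTES = 40   # status word 0 plus 19 more words
FRAME_MS = 20

def iter_vm_frames(data):
    """Yield (status_word_0, raw_bytes) for each frame up to the end-marker."""
    pos = 0
    while True:
        if pos + 2 > len(data):
            raise ValueError("ran out of data before the end-marker")
        (word0,) = struct.unpack_from("<H", data, pos)
        # The end-marker has bit 15 set, so it must be tested first.
        if word0 == END_MARKER:
            return
        # Bit 15 tells how much was written: the full frame or just word 0.
        size = FULL_FRAME_BYTES if word0 & 0x8000 else 2
        if pos + size > len(data):
            raise ValueError("truncated frame at offset %d" % pos)
        yield word0, data[pos:pos + size]
        pos += size

def vm_duration_ms(data):
    """Every frame, skipped or not, stands for 20 ms of audio."""
    return sum(FRAME_MS for _ in iter_vm_frames(data))
```

Counting frames this way gives the true time duration of a DTX-enabled recording, which cannot be derived from the file size alone.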
160
161 FreeCalypso tools for decoding voice memo files
162 ===============================================
163
164 If you have recorded a voice memo with AT@VMR and then read it out with fc-fsio,
165 you can use additional FC tools to analyze it. The following tools are
166 available, split between FC host tools and GSM codec libs & utilities packages:
167
168 * fc-vm2hex converts a binary VM recording into ASCII hex format, similar to
169 the old (2016) TCH DL recording format before it was extended in late 2022.
170 Every fully-written frame is emitted in the hex output as 3 space-separated
171 hex status words followed by a block of 66 hex digits giving the FR1 codec
172 frame in the unchanged bit order of TI's DSP, and every skipped frame (one
173 for which only status word 0 was written into the memo file) is emitted in
174 the hex output as just that one word.
175
176 * gsmfr-dlcap-parse utility, originally written for parsing TCH DL capture
177 files, accepts TCH DL recording files in both old and new formats, and it also
178 accepts the output from fc-vm2hex as its input. The combination of fc-vm2hex
179 and gsmfr-dlcap-parse allows a developer or tinkerer to do thorough human
180 analysis of TCS211 VM recordings in both DTX-disabled and DTX-enabled modes.
181
182 * There will soon be a new fc-vm2gsmx utility that will read binary VM recording
183 files (as you would read out with fc-fsio) and convert them into extended-
184 libgsm (gsmx) format defined in our GSM codec libraries & utilities package.
185 This gsmx format is an extension of the classic libgsm (GSM 06.10) format,
186 adding the possibility of SID frames and BFI markers (frame gaps) in addition
187 to regular speech frames, thus it can represent the content of a voice memo
188 recording made in DTX mode. These gsmx files can then be decoded into
189 playable WAV with our gsmfr-decode utility.
190
191 FreeCalypso tools for external generation of voice memo files
192 =============================================================
193
194 Using FreeCalypso tools, you can produce an external speech recording in GSM
195 06.10 FR1 codec format, convert it into TCS211 VM format, upload it into FC
196 device FFS with fc-fsio, and then play these externally-produced voice memos
197 with AT@VMP. The steps are as follows:
198
199 1) You can use gsmfr-encode to FR1-encode a speech sample from WAV into classic
200 .gsm format, or gsmfr-encode-r if the source is raw BE instead of WAV.
201 Alternatively, you can use any other off-the-shelf software that can encode
202 FR1 and write libgsm format; SoX shipped with Slackware includes the
203 necessary support.
204
205 2) fc-gsm2vm converts a .gsm recording into non-DTX TCS211 VM format.
206
207 At the present time we don't have any tools for producing external DTX-enabled
208 VM recordings: the main limitation is that at least to this Mother's knowledge,
209 the published source software community does not currently possess a GSM 06.10
210 encoding library that has been extended with VAD and DTX functions. There is
211 classic libgsm from the 1990s, used by everyone in the FOSS community who needs a
212 GSM 06.10 encoder or decoder, but it doesn't do DTX; we (FreeCalypso and
213 Themyscira Wireless) have produced our own libgsmfrp front-end that implements
214 Rx DTX handler functions (that's how we can properly decode FR1 streams that
215 contain SIDs and/or missing frames), but it doesn't help with DTX encoding.
216 Therefore, our ability to produce TCS211-compatible VM recordings externally is
217 currently limited to non-DTX mode.