comparison doc/AMR-library-API @ 476:c84bf526c7eb

beginning of libtwamr documentation
author Mychaela Falconia <falcon@freecalypso.org>
date Sat, 18 May 2024 21:22:07 +0000
Libtwamr general usage
======================

The external public interface to Themyscira libtwamr consists of a single
header file <tw_amr.h>; it should be installed in some system include
directory.

The dialect of C used by all Themyscira GSM codec libraries is ANSI C (function
prototypes); the const qualifier is used where appropriate, and the interface
is defined in terms of <stdint.h> types. <tw_amr.h> itself includes <stdint.h>.

Public #define constant definitions
===================================

Libtwamr public API header file <tw_amr.h> defines these constants:

#define AMR_MAX_PRM       57    /* max. num. of params */
#define AMR_IETF_MAX_PL   32    /* max bytes in RFC 4867 frame */
#define AMR_IETF_HDR_LEN  6     /* .amr file header bytes */
#define AMR_COD_WORDS     250   /* # of words in 3GPP test seq format */

Explanation:

* AMR_MAX_PRM is the maximum number of broken-down speech parameters in the
  highest 12k2 mode of AMR; this definition is needed for struct amr_param_frame
  covered later in this document.

* AMR_IETF_MAX_PL is the size of the output buffer that must be provided for
  amr_frame_to_ietf(), and also most commonly the size of the staging buffer
  which most applications will likely use for gathering the input to
  amr_frame_from_ietf().

* AMR_IETF_HDR_LEN is the size of amr_file_header_magic[] public const datum
  covered later in this document, and this constant will also be needed by any
  application that needs to read or write the fixed header at the beginning of
  .amr files.

* AMR_COD_WORDS is the number of 16-bit words in one encoded frame in 3GPP test
  sequence format (.cod); the public definition is needed for sizing the arrays
  used with amr_frame_to_tseq() and amr_frame_from_tseq() API functions.
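
As an illustration of how these constants are typically used, here is a minimal
sketch that sizes the two frame buffers and checks the fixed header of an .amr
file. The constants are repeated locally only so that the sketch compiles on
its own; real code gets them from <tw_amr.h>. The names example_amr_magic,
example_is_amr_header and struct example_buffers are hypothetical, and the
sketch assumes that the library's amr_file_header_magic[] holds the
single-channel magic "#!AMR\n" defined in RFC 4867.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Repeated from the listing above; real code includes <tw_amr.h> instead. */
#define AMR_IETF_MAX_PL   32
#define AMR_IETF_HDR_LEN  6
#define AMR_COD_WORDS     250

/*
 * Hypothetical local stand-in for amr_file_header_magic[].
 * Assumption: the library datum is the RFC 4867 single-channel magic.
 */
static const uint8_t example_amr_magic[AMR_IETF_HDR_LEN] = {
    '#', '!', 'A', 'M', 'R', '\n'
};

/* Returns 1 if buf begins with the .amr file magic, 0 otherwise. */
static int example_is_amr_header(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < AMR_IETF_HDR_LEN)
        return 0;
    return memcmp(buf, example_amr_magic, AMR_IETF_HDR_LEN) == 0;
}

/* Buffers an application would typically declare with these constants. */
struct example_buffers {
    uint8_t  ietf_frame[AMR_IETF_MAX_PL];  /* one RFC 4867 frame */
    uint16_t cod_frame[AMR_COD_WORDS];     /* one 3GPP test sequence frame */
};
```

An application reading an .amr file would read AMR_IETF_HDR_LEN bytes first,
run a check like the one above, and only then start reading frames.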

Libtwamr enumerated types
=========================

Libtwamr public API header file <tw_amr.h> defines these 3 enums:

enum RXFrameType {
        RX_SPEECH_GOOD = 0,
        RX_SPEECH_DEGRADED,
        RX_ONSET,
        RX_SPEECH_BAD,
        RX_SID_FIRST,
        RX_SID_UPDATE,
        RX_SID_BAD,
        RX_NO_DATA,
        RX_N_FRAMETYPES         /* number of frame types */
};

enum TXFrameType {
        TX_SPEECH_GOOD = 0,
        TX_SID_FIRST,
        TX_SID_UPDATE,
        TX_NO_DATA,
        TX_SPEECH_DEGRADED,
        TX_SPEECH_BAD,
        TX_SID_BAD,
        TX_ONSET,
        TX_N_FRAMETYPES         /* number of frame types */
};

enum Mode {
        MR475 = 0,
        MR515,
        MR59,
        MR67,
        MR74,
        MR795,
        MR102,
        MR122,
        MRDTX
};

Rx and Tx frame types are as defined by 3GPP, and the numeric values assigned
to each type are the same as those used by the official TS 26.073 encoder and
decoder programs. Note that the Rx and Tx enums number their frame types
differently: for example, TX_SID_FIRST is 1 whereas RX_SID_FIRST is 4, so a
value of one enum must never be used where the other is expected.
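
Because the two enums are numbered differently, code that feeds encoder output
back into a decoder needs an explicit mapping between them. The following is a
sketch of such a mapping; example_tx_to_rx() is a hypothetical helper written
for this document, not a libtwamr function, and the enum definitions are
repeated only so that the sketch compiles on its own.

```c
#include <assert.h>

/* Repeated from the listing above; real code includes <tw_amr.h> instead. */
enum RXFrameType {
    RX_SPEECH_GOOD = 0, RX_SPEECH_DEGRADED, RX_ONSET, RX_SPEECH_BAD,
    RX_SID_FIRST, RX_SID_UPDATE, RX_SID_BAD, RX_NO_DATA,
    RX_N_FRAMETYPES
};

enum TXFrameType {
    TX_SPEECH_GOOD = 0, TX_SID_FIRST, TX_SID_UPDATE, TX_NO_DATA,
    TX_SPEECH_DEGRADED, TX_SPEECH_BAD, TX_SID_BAD, TX_ONSET,
    TX_N_FRAMETYPES
};

/*
 * Hypothetical helper: map a Tx frame type to the Rx frame type of the
 * same name.  Returns RX_N_FRAMETYPES for an out-of-range input.
 */
static enum RXFrameType example_tx_to_rx(enum TXFrameType tx)
{
    switch (tx) {
    case TX_SPEECH_GOOD:     return RX_SPEECH_GOOD;
    case TX_SPEECH_DEGRADED: return RX_SPEECH_DEGRADED;
    case TX_SPEECH_BAD:      return RX_SPEECH_BAD;
    case TX_SID_FIRST:       return RX_SID_FIRST;
    case TX_SID_UPDATE:      return RX_SID_UPDATE;
    case TX_SID_BAD:         return RX_SID_BAD;
    case TX_ONSET:           return RX_ONSET;
    case TX_NO_DATA:         return RX_NO_DATA;
    default:                 return RX_N_FRAMETYPES;
    }
}
```

A plain cast from one enum to the other would silently turn TX_SID_FIRST into
RX_SPEECH_DEGRADED, which is why the mapping is spelled out case by case.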

enum Mode should be self-explanatory: it covers the 8 possible codec modes of
AMR, plus the pseudo-mode of MRDTX used for packing and format manipulation of
SID frames.
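
For readers who do not have the mode names memorized, the sketch below maps
each speech mode to its bit rate. The bits-per-frame figures are the standard
AMR ones (95 to 244 speech bits per 20 ms frame) and are not taken from
libtwamr itself; example_mode_bitrate() is a hypothetical helper, and the enum
is repeated only so that the sketch compiles on its own.

```c
#include <assert.h>

/* Repeated from the listing above; real code includes <tw_amr.h> instead. */
enum Mode { MR475 = 0, MR515, MR59, MR67, MR74, MR795, MR102, MR122, MRDTX };

/* Speech bits per 20 ms frame for MR475 through MR122 (standard AMR figures). */
static const unsigned example_bits_per_frame[8] = {
    95, 103, 118, 134, 148, 159, 204, 244
};

/*
 * Hypothetical helper: bit rate in bits per second for a speech mode,
 * or 0 for MRDTX and anything else that is not a speech mode.
 */
static unsigned example_mode_bitrate(enum Mode mode)
{
    if ((int) mode < (int) MR475 || (int) mode > (int) MR122)
        return 0;
    /* 50 frames of 20 ms each per second */
    return example_bits_per_frame[mode] * 50;
}
```

With this table MR475 comes out as 4750 bps and MR122 as 12200 bps, which is
where the enum names come from.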

State allocation and freeing
============================

In order to use the AMR encoder, you will need to allocate an encoder state
structure, and to use the AMR decoder, you will need to allocate a decoder state
structure. The necessary state allocation functions are:

struct amr_encoder_state *amr_encoder_create(int dtx, int use_vad2);
struct amr_decoder_state *amr_decoder_create(void);

struct amr_encoder_state and struct amr_decoder_state are opaque structures to
library users: you only get pointers which you remember and pass around, but
<tw_amr.h> does not give you full definitions of these structs. As a library
user, you don't even get to know the size of these structs, hence the necessary
malloc() operation happens inside amr_encoder_create() and amr_decoder_create().
However, each structure is malloc'ed as a single chunk, hence when you are done
with it, simply call free() to relinquish each encoder or decoder state
instance.

amr_encoder_create() and amr_decoder_create() functions can fail if the malloc()
call inside fails, in which case the two libtwamr functions in question return
NULL.

The dtx argument to amr_encoder_create() is a Boolean flag represented as an
int; it tells the AMR encoder whether it should operate with DTX enabled or
disabled. (Note that DTX is also called SCR for Source-Controlled Rate in some
AMR specs.) The use_vad2 argument is another Boolean flag, also represented as
an int; it tells the AMR encoder to use the VAD2 algorithm instead of VAD1. A
novel feature of libtwamr is that both VAD versions are included and
selectable at run time; see AMR-library-desc article for the details.

State reset functions
---------------------

The state of an already-allocated AMR encoder or AMR decoder can be reset at
any time with these functions:

void amr_encoder_reset(struct amr_encoder_state *st, int dtx, int use_vad2);
void amr_decoder_reset(struct amr_decoder_state *st);

Note that the two extra arguments to amr_encoder_reset() are the same as the
arguments to amr_encoder_create(): the reset operation is complete, and it
reinitializes the dtx and use_vad2 settings along with everything else.
amr_encoder_create() is a wrapper around malloc() followed by
amr_encoder_reset(), and amr_decoder_create() is a wrapper around malloc()
followed by amr_decoder_reset().
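
The create-as-wrapper arrangement just described can be sketched with a toy
state structure. Everything named toy_* below is hypothetical and exists only
to show the pattern; libtwamr's real state structures are opaque and their
contents are not described in this document.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy state; NOT the layout of struct amr_encoder_state. */
struct toy_encoder_state {
    int  dtx;
    int  use_vad2;
    long frames_done;
};

/* Complete reset: wipe everything, then record the two settings. */
static void toy_encoder_reset(struct toy_encoder_state *st,
                              int dtx, int use_vad2)
{
    assert(st != NULL);
    memset(st, 0, sizeof *st);
    st->dtx = dtx;
    st->use_vad2 = use_vad2;
}

/* Create = malloc() of a single chunk followed by reset; NULL on failure. */
static struct toy_encoder_state *toy_encoder_create(int dtx, int use_vad2)
{
    struct toy_encoder_state *st = malloc(sizeof *st);

    if (st == NULL)
        return NULL;
    toy_encoder_reset(st, dtx, use_vad2);
    return st;
}
```

Because the state is one malloc'ed chunk, the caller's side stays simple:
check the create function's return value for NULL, use the state, and hand it
to free() when done. The same three steps apply to the real
amr_encoder_create() and amr_decoder_create().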

Using the AMR encoder
=====================

To encode one 20 ms audio frame with AMR, call amr_encode_frame():

void amr_encode_frame(struct amr_encoder_state *st, enum Mode mode,
                      const int16_t *pcm, struct amr_param_frame *frame);

You need to provide an encoder state structure allocated earlier with
amr_encoder_create(), the selection of which codec mode to use, and a block of
160 linear PCM samples. Only modes MR475 through MR122 are valid for the 'mode'
argument to amr_encode_frame(); MRDTX is not allowed in this context.
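
Since the encoder consumes exactly 160 samples per call, an application with a
longer PCM buffer has to walk it in 160-sample steps. The sketch below shows
one way to do that with a per-frame callback. All example_* names are
hypothetical; in a real program the callback would be the place to call
amr_encode_frame() and convert its output, whereas the callback shown here
merely counts frames so that the sketch stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_FRAME_SAMPLES 160   /* one 20 ms AMR frame */

/* Hypothetical per-frame callback type: pcm points at 160 samples. */
typedef void (*example_frame_fn)(const int16_t *pcm, void *ctx);

/*
 * Calls fn once for each complete 160-sample block in pcm[0..nsamples-1].
 * Returns the number of leftover samples that did not fill a whole frame.
 */
static size_t example_for_each_frame(const int16_t *pcm, size_t nsamples,
                                     example_frame_fn fn, void *ctx)
{
    size_t pos;

    assert(pcm != NULL && fn != NULL);
    for (pos = 0; nsamples - pos >= EXAMPLE_FRAME_SAMPLES;
         pos += EXAMPLE_FRAME_SAMPLES)
        fn(pcm + pos, ctx);
    return nsamples - pos;
}

/* Trivial callback for demonstration: counts frames in *(unsigned *)ctx. */
static void example_count_frame(const int16_t *pcm, void *ctx)
{
    (void) pcm;
    ++*(unsigned *) ctx;
}
```

What to do with the leftover samples (drop them, or pad the last frame with
silence) is an application decision that this sketch leaves open.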

The output from amr_encode_frame() is written into this structure:

struct amr_param_frame {
        uint8_t type;
        uint8_t mode;
        int16_t param[AMR_MAX_PRM];
};

This structure is public, but it is defined by libtwamr (not by any external
standard), and it is generally intended to be an intermediate stage before
output encoding. Library functions exist for generating 3 output formats: 3GPP
AMR test sequence format, IETF RFC 4867 format, and AMR-EFR hybrid.

Native encoder output
---------------------

The output structure is filled as follows:

type:   Set to one of TX_SPEECH_GOOD, TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA,
        as defined by 3GPP. The last 3 are possible only when the encoder
        operates with DTX enabled.

mode:   One of MR475 through MR122, same as the 'mode' argument to
        amr_encode_frame().

param:  Array of codec parameters, from 17 to 57 of them for modes MR475 through
        MR122 in the case of TX_SPEECH_GOOD output, or 5 parameters for MRDTX
        in the case of TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA DTX output.
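
To make the "17 to 57" range concrete, here is a sketch that returns how many
param[] entries are meaningful for a given type and mode. The per-mode counts
in the table are those of the 3GPP TS 26.073 reference code; that libtwamr
fills param[] with exactly these counts for the intermediate modes is an
assumption of this sketch, as the text above only states the two end points
and the MRDTX figure. example_param_count() is a hypothetical helper.

```c
#include <assert.h>

/* Repeated from the listings above; real code includes <tw_amr.h> instead. */
enum TXFrameType {
    TX_SPEECH_GOOD = 0, TX_SID_FIRST, TX_SID_UPDATE, TX_NO_DATA,
    TX_SPEECH_DEGRADED, TX_SPEECH_BAD, TX_SID_BAD, TX_ONSET,
    TX_N_FRAMETYPES
};
enum Mode { MR475 = 0, MR515, MR59, MR67, MR74, MR795, MR102, MR122, MRDTX };

/* Parameter counts per mode, MR475..MR122 then MRDTX (TS 26.073 figures). */
static const unsigned char example_prm_count[9] = {
    17, 19, 19, 19, 19, 23, 39, 57, 5
};

/*
 * Hypothetical helper: number of valid param[] entries for the given
 * type and mode fields, or 0 for combinations the encoder never emits.
 */
static unsigned example_param_count(unsigned type, unsigned mode)
{
    if (type == TX_SPEECH_GOOD)
        return mode <= (unsigned) MR122 ? example_prm_count[mode] : 0;
    if (type == TX_SID_FIRST || type == TX_SID_UPDATE || type == TX_NO_DATA)
        return example_prm_count[MRDTX];
    return 0;
}
```

A helper like this is mostly useful for debug dumps of struct amr_param_frame;
the output conversion functions do not need it.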

3GPP AMR test sequence output
-----------------------------

The following function exists to convert the above encoder output into the test
sequence format which 3GPP defined for AMR, the insanely inefficient one with
250 (AMR_COD_WORDS) 16-bit words per frame:

void amr_frame_to_tseq(const struct amr_param_frame *frame, uint16_t *cod);

This function allows libtwamr encoder to be tested for correctness against the
set of test sequences in 3GPP TS 26.074. The output is in the local machine's
native byte order.
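
Because the words come out in native byte order, a program that writes .cod
files for comparison against reference files may need to fix the byte order
itself. The sketch below serializes one frame least significant byte first
regardless of the host CPU. It assumes that the reference files you compare
against are little-endian; verify that against the test sequence
documentation before relying on it. example_cod_to_le_bytes() is a
hypothetical helper, not part of libtwamr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Repeated from the listing above; real code includes <tw_amr.h> instead. */
#define AMR_COD_WORDS 250

/*
 * Hypothetical helper: write AMR_COD_WORDS 16-bit words from cod[] into
 * out[] as 2 * AMR_COD_WORDS bytes, low byte first, independent of the
 * host byte order.
 */
static void example_cod_to_le_bytes(const uint16_t *cod, uint8_t *out)
{
    size_t i;

    assert(cod != NULL && out != NULL);
    for (i = 0; i < AMR_COD_WORDS; i++) {
        out[2 * i]     = (uint8_t) (cod[i] & 0xFF);
        out[2 * i + 1] = (uint8_t) (cod[i] >> 8);
    }
}
```

The cod[] array would be filled by amr_frame_to_tseq() and the resulting 500
bytes written to the output file with a single fwrite() call.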

RFC 4867 output
---------------

To turn libtwamr encoder output into an octet-aligned RFC 4867 single-frame
payload or storage-format frame (ToC octet followed by speech or SID data, but
no CMR payload header), call this function:

unsigned amr_frame_to_ietf(const struct amr_param_frame *frame, uint8_t *bytes);

The output buffer must have room for up to 32 bytes (AMR_IETF_MAX_PL); the
return value is the actual number of bytes used. The shortest possible output
is 1 byte in the case of TX_NO_DATA; the longest possible output is 32 bytes in
the case of TX_SPEECH_GOOD, mode MR122.
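
The frame length can also be recovered from the ToC octet alone, which is what
a reader of .amr files needs in order to know how many more bytes to fetch
into its staging buffer. The sketch below does this using the octet-aligned
frame sizes from RFC 4867 (frame type in bits 6..3 of the ToC octet, quality
bit in bit 2). The example_* names are hypothetical helpers, not libtwamr
functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Total frame length in bytes, ToC octet included, indexed by the 4-bit
 * frame type (FT) field.  0 marks frame types that this sketch treats as
 * invalid for AMR (FT 9 through 14).
 */
static const uint8_t example_ietf_len[16] = {
    13, 14, 16, 18, 20, 21, 27, 32,   /* FT 0-7: MR475 through MR122 */
    6,                                /* FT 8: SID */
    0, 0, 0, 0, 0, 0,                 /* FT 9-14: not used for AMR */
    1                                 /* FT 15: NO_DATA, ToC octet only */
};

/* Hypothetical helper: frame length implied by a ToC octet, 0 if invalid. */
static unsigned example_ietf_frame_len(uint8_t toc)
{
    return example_ietf_len[(toc >> 3) & 0x0F];
}

/* Hypothetical helper: the Q (frame quality) bit of a ToC octet. */
static int example_ietf_quality_bit(uint8_t toc)
{
    return (toc >> 2) & 1;
}
```

A file reader would fetch one byte, look up the length, reject a result of 0,
and then read length minus 1 further bytes into the same buffer. The largest
value in the table equals AMR_IETF_MAX_PL, so a buffer of that size always
suffices.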

Additional notes regarding output conversion functions
------------------------------------------------------

The struct amr_param_frame that is input to amr_frame_to_ietf() or
amr_frame_to_tseq() is expected to be a valid output from amr_encode_frame().
These output conversion functions contain no guards against invalid input
(anything that cannot occur in the output from amr_encode_frame()), and are
thus allowed to segfault, corrupt memory etc. if fed such invalid input.

This lack of guard is justified in the present instance because struct
amr_param_frame is not intended to ever function as an external interface to
untrusted entities; instead this struct is intended to be only an intermediate
staging buffer between the call to amr_encode_frame() and an immediately
following call to one of the provided output conversion functions.

AMR-EFR hybrid encoder
======================

To use libtwamr as an AMR-EFR hybrid encoder, follow these constraints:

* 'dtx' argument must be 0 (no DTX) on the call to amr_encoder_create() or
  amr_encoder_reset() that establishes the state for the encoder session.

* 'mode' argument to amr_encode_frame() must be MR122 on every frame.

After getting struct amr_param_frame out of amr_encode_frame(), call one of
these functions to generate the correct EFR DHF under the right conditions:

void amr_dhf_subst_efr(struct amr_param_frame *frame);
void amr_dhf_subst_efr2(struct amr_param_frame *frame, const int16_t *pcm);

Both functions check if the encoded frame is MR122 DHF (type equals
TX_SPEECH_GOOD, mode equals MR122, param array equals the fixed bit pattern of
MR122 DHF), and if so, overwrite param[] array in the structure with the
different bit pattern of EFR DHF. The difference between the two functions is
that amr_dhf_subst_efr() performs the just-described substitution
unconditionally, whereas amr_dhf_subst_efr2() applies this substitution only if
the PCM input is EHF. The latter function matches the observed behavior of
T-Mobile USA, but perhaps some others implemented the simpler logic equivalent
to our first function.

After this transformation, call EFR_params2frame() from libgsmefr (see
EFR-library-API) with param[] array in struct amr_param_frame as input.

Using the AMR decoder
=====================
