doc/AMR-library-API @ 476:c84bf526c7eb
beginning of libtwamr documentation
author: Mychaela Falconia <falcon@freecalypso.org>
date: Sat, 18 May 2024 21:22:07 +0000

Libtwamr general usage
======================

The external public interface to Themyscira libtwamr consists of a single
header file, <tw_amr.h>, which should be installed in a system include
directory.

The dialect of C used by all Themyscira GSM codec libraries is ANSI C (with
function prototypes); the const qualifier is used where appropriate, and the
interface is defined in terms of <stdint.h> types. <tw_amr.h> includes
<stdint.h> itself.

Public #define constant definitions
===================================

The libtwamr public API header file <tw_amr.h> defines these constants:

    #define AMR_MAX_PRM       57   /* max. num. of params */
    #define AMR_IETF_MAX_PL   32   /* max bytes in RFC 4867 frame */
    #define AMR_IETF_HDR_LEN  6    /* .amr file header bytes */
    #define AMR_COD_WORDS     250  /* # of words in 3GPP test seq format */

Explanation:

* AMR_MAX_PRM is the maximum number of broken-down speech parameters, reached
  in the highest mode of AMR (12k2). This definition is needed for struct
  amr_param_frame, which is covered later in this document.

* AMR_IETF_MAX_PL is the size of the output buffer that must be provided for
  amr_frame_to_ietf(). It is usually also the size of the staging buffer that
  applications will use for gathering the input to amr_frame_from_ietf().

* AMR_IETF_HDR_LEN is the size of the amr_file_header_magic[] public const
  datum covered later in this document. Any application that reads or writes
  the fixed header at the beginning of .amr files will also need this constant.

* AMR_COD_WORDS is the number of 16-bit words in one encoded frame in the 3GPP
  test sequence format (.cod). The public definition is needed for sizing the
  arrays used with the amr_frame_to_tseq() and amr_frame_from_tseq() API
  functions.
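A program that gathers RFC 4867 storage-format frames from an .amr file into a staging buffer of AMR_IETF_MAX_PL bytes must know each frame's length before handing it to amr_frame_from_ietf(). The sketch below derives that length from the ToC octet. The function amr_ietf_frame_len() is hypothetical and not part of libtwamr. The byte counts follow the RFC 4867 octet-aligned layout: the speech bits for each mode, or the SID bits, rounded up to whole octets, plus 1 ToC octet. Treating frame types 9 through 14 as invalid is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical helper, not part of libtwamr: total length in bytes
 * (ToC octet included) of one octet-aligned RFC 4867 storage-format
 * frame, derived from its ToC octet. Returns 0 for frame types that
 * this sketch treats as invalid (FT 9..14, an assumption). */
static unsigned amr_ietf_frame_len(uint8_t toc)
{
    /* Bytes following the ToC octet for FT 0..8: speech bits per mode
     * (95, 103, 118, 134, 148, 159, 204, 244) and SID bits (39),
     * each rounded up to whole octets. */
    static const uint8_t data_bytes[9] = {
        12, 13, 15, 17, 19, 20, 26, 31, 5
    };
    unsigned ft = (toc >> 3) & 0x0F;   /* FT field occupies bits 6..3 */

    if (ft <= 8)
        return 1 + data_bytes[ft];
    if (ft == 15)                      /* NO_DATA: ToC octet only */
        return 1;
    return 0;
}
```

A reader would typically fread() the ToC octet, look up the length and then read the remaining length - 1 bytes into the same buffer. The largest result is 32 bytes (MR122), which is why AMR_IETF_MAX_PL suffices.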

Libtwamr enumerated types
=========================

The libtwamr public API header file <tw_amr.h> defines these 3 enums:

    enum RXFrameType {
        RX_SPEECH_GOOD = 0,
        RX_SPEECH_DEGRADED,
        RX_ONSET,
        RX_SPEECH_BAD,
        RX_SID_FIRST,
        RX_SID_UPDATE,
        RX_SID_BAD,
        RX_NO_DATA,
        RX_N_FRAMETYPES     /* number of frame types */
    };

    enum TXFrameType {
        TX_SPEECH_GOOD = 0,
        TX_SID_FIRST,
        TX_SID_UPDATE,
        TX_NO_DATA,
        TX_SPEECH_DEGRADED,
        TX_SPEECH_BAD,
        TX_SID_BAD,
        TX_ONSET,
        TX_N_FRAMETYPES     /* number of frame types */
    };

    enum Mode {
        MR475 = 0,
        MR515,
        MR59,
        MR67,
        MR74,
        MR795,
        MR102,
        MR122,
        MRDTX
    };

Rx and Tx frame types are as defined by 3GPP. The numeric values assigned to
each type are the same as those used by the official TS 26.073 encoder and
decoder programs. Note that the Rx and Tx frame type enums are NOT numerically
equal: identically named types have different values in the two enums. For
example, RX_SID_FIRST is 4, whereas TX_SID_FIRST is 1.

enum Mode should be self-explanatory: it covers the 8 possible codec modes of
AMR, plus the pseudo-mode MRDTX used for packing and format manipulation of
SID frames.
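The two frame type enums assign different numbers to identically named types. Code that passes encoder output straight to a decoder, for example a loopback test over an error-free channel, must therefore translate by name rather than by value. The sketch below illustrates this with two hypothetical helpers, tx_to_rx_type() and is_speech_mode(), which are not libtwamr functions. The enum declarations are copied from the listing above only so that the block compiles without <tw_amr.h>; a real program would include <tw_amr.h> instead.

```c
/* Copies of the <tw_amr.h> enums, reproduced only so that this sketch
 * compiles on its own; a real program includes <tw_amr.h> instead. */
enum RXFrameType {
    RX_SPEECH_GOOD = 0, RX_SPEECH_DEGRADED, RX_ONSET, RX_SPEECH_BAD,
    RX_SID_FIRST, RX_SID_UPDATE, RX_SID_BAD, RX_NO_DATA,
    RX_N_FRAMETYPES
};
enum TXFrameType {
    TX_SPEECH_GOOD = 0, TX_SID_FIRST, TX_SID_UPDATE, TX_NO_DATA,
    TX_SPEECH_DEGRADED, TX_SPEECH_BAD, TX_SID_BAD, TX_ONSET,
    TX_N_FRAMETYPES
};
enum Mode { MR475 = 0, MR515, MR59, MR67, MR74, MR795, MR102, MR122, MRDTX };

/* Hypothetical helper: map a Tx frame type to the identically named Rx
 * frame type, i.e. what a decoder sees after an error-free channel.
 * Returns RX_N_FRAMETYPES for an out-of-range input. */
static enum RXFrameType tx_to_rx_type(enum TXFrameType tx)
{
    switch (tx) {
    case TX_SPEECH_GOOD:     return RX_SPEECH_GOOD;
    case TX_SID_FIRST:       return RX_SID_FIRST;
    case TX_SID_UPDATE:      return RX_SID_UPDATE;
    case TX_NO_DATA:         return RX_NO_DATA;
    case TX_SPEECH_DEGRADED: return RX_SPEECH_DEGRADED;
    case TX_SPEECH_BAD:      return RX_SPEECH_BAD;
    case TX_SID_BAD:         return RX_SID_BAD;
    case TX_ONSET:           return RX_ONSET;
    default:                 return RX_N_FRAMETYPES;
    }
}

/* Hypothetical helper: nonzero if 'mode' is one of the 8 real codec
 * modes (MR475..MR122); MRDTX is a pseudo-mode and yields 0. */
static int is_speech_mode(enum Mode mode)
{
    return (unsigned) mode <= MR122;
}
```

A switch is used instead of arithmetic on the enum values because the two numberings are unrelated. Spelling out each case makes the correspondence explicit.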

State allocation and freeing
============================

To use the AMR encoder, you need to allocate an encoder state structure; to
use the AMR decoder, you need to allocate a decoder state structure. The
necessary state allocation functions are:

    struct amr_encoder_state *amr_encoder_create(int dtx, int use_vad2);
    struct amr_decoder_state *amr_decoder_create(void);

struct amr_encoder_state and struct amr_decoder_state are opaque structures to
library users: you only get pointers which you remember and pass around, but
<tw_amr.h> does not give you full definitions of these structs. As a library
user, you don't even get to know the size of these structs, hence the necessary
malloc() operation happens inside amr_encoder_create() and amr_decoder_create().
However, each structure is malloc'ed as a single chunk. When you are done with
an encoder or decoder state instance, simply call free() on it.

amr_encoder_create() and amr_decoder_create() can fail if the malloc() call
inside them fails, in which case they return NULL.

The dtx argument to amr_encoder_create() is a Boolean flag represented as an
int; it tells the AMR encoder whether it should operate with DTX enabled or
disabled. (Note that DTX is also called SCR, for Source-Controlled Rate, in some
AMR specs.) The use_vad2 argument is another Boolean flag, also represented as
an int; it tells the AMR encoder to use the VAD2 algorithm instead of VAD1.
Libtwamr is novel in that both VAD versions are included and selectable at run
time; see the AMR-library-desc article for the details.

State reset functions
---------------------

The state of an already-allocated AMR encoder or AMR decoder can be reset at
any time with these functions:

    void amr_encoder_reset(struct amr_encoder_state *st, int dtx, int use_vad2);
    void amr_decoder_reset(struct amr_decoder_state *st);

Note that the two extra arguments to amr_encoder_reset() are the same as the
arguments to amr_encoder_create(). The reset operation is complete: it leaves
the state exactly as a fresh create call with the same arguments would. In
fact, amr_encoder_create() is a wrapper around malloc() followed by
amr_encoder_reset(), and amr_decoder_create() is a wrapper around malloc()
followed by amr_decoder_reset().
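The create/reset/free split described above is a common C pattern for opaque state. The sketch below illustrates it with a made-up struct demo_state and hypothetical functions demo_create() and demo_reset(). None of this is libtwamr's code, and the struct contents are placeholders. It shows why create() can be just malloc() followed by reset(), and why a state allocated as one chunk can be released with plain free().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration, not libtwamr's code. In a real library the
 * public header would only say "struct demo_state;" and this full
 * definition would stay private to the library's .c file. */
struct demo_state {
    int dtx;
    int use_vad2;
    short history[160];    /* placeholder for real codec memory */
};

/* Hypothetical reset: reinitialize every field, so that a reset state
 * is indistinguishable from a freshly created one. */
void demo_reset(struct demo_state *st, int dtx, int use_vad2)
{
    memset(st, 0, sizeof(struct demo_state));
    st->dtx = dtx;
    st->use_vad2 = use_vad2;
}

/* Hypothetical create: a single malloc() for the whole state, followed
 * by a reset. Returns NULL if malloc() fails. */
struct demo_state *demo_create(int dtx, int use_vad2)
{
    struct demo_state *st = malloc(sizeof(struct demo_state));

    if (!st)
        return NULL;
    demo_reset(st, dtx, use_vad2);
    return st;
}
```

The state is one allocation with no internal pointers to other heap blocks, so the caller releases it with free(st). The text above says to do the same with libtwamr encoder and decoder states.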

Using the AMR encoder
=====================

To encode one 20 ms audio frame with AMR, call amr_encode_frame():

    void amr_encode_frame(struct amr_encoder_state *st, enum Mode mode,
                          const int16_t *pcm, struct amr_param_frame *frame);

You need to provide an encoder state structure allocated earlier with
amr_encoder_create(), the selection of which codec mode to use, and a block of
160 linear PCM samples (20 ms at 8 kHz). Only modes MR475 through MR122 are
valid for the 'mode' argument to amr_encode_frame(); MRDTX is not allowed in
this context.
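amr_encode_frame() consumes exactly 160 samples per call. A program encoding a longer recording therefore has to cut it into 160-sample blocks. The hypothetical helpers below, which are not part of libtwamr, count the blocks and copy block number n out of a longer buffer. Zero-padding the final partial block with silence is a choice made for this sketch, not something the library requires.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCM_FRAME_SAMPLES 160   /* 20 ms at 8 kHz */

/* Hypothetical helper: number of 160-sample blocks needed to cover
 * nsamples; the last block may be partial. */
size_t pcm_frame_count(size_t nsamples)
{
    return (nsamples + PCM_FRAME_SAMPLES - 1) / PCM_FRAME_SAMPLES;
}

/* Hypothetical helper: copy block number 'n' of the nsamples-long
 * buffer 'pcm' into 'out', zero-padding past the end of the input. */
void pcm_get_frame(const int16_t *pcm, size_t nsamples, size_t n,
                   int16_t out[PCM_FRAME_SAMPLES])
{
    size_t start = n * PCM_FRAME_SAMPLES;
    size_t avail = 0;

    if (start < nsamples) {
        avail = nsamples - start;
        if (avail > PCM_FRAME_SAMPLES)
            avail = PCM_FRAME_SAMPLES;
        memcpy(out, pcm + start, avail * sizeof(int16_t));
    }
    if (avail < PCM_FRAME_SAMPLES)
        memset(out + avail, 0, (PCM_FRAME_SAMPLES - avail) * sizeof(int16_t));
}
```

Each filled block would then be passed as the pcm argument to amr_encode_frame(), one call per block, in order.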

The output from amr_encode_frame() is written into this structure:

    struct amr_param_frame {
        uint8_t type;
        uint8_t mode;
        int16_t param[AMR_MAX_PRM];
    };

This structure is public, but it is defined by libtwamr (not by any external
standard), and it is intended to be an intermediate stage before output
encoding. Library functions exist for generating 3 output formats: 3GPP AMR test
sequence format, IETF RFC 4867 format, and AMR-EFR hybrid.

Native encoder output
---------------------

The output structure is filled as follows:

type:  Set to one of TX_SPEECH_GOOD, TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA,
       as defined by 3GPP. The last 3 are possible only when the encoder
       operates with DTX enabled.

mode:  One of MR475 through MR122, same as the 'mode' argument to
       amr_encode_frame().

param: Array of codec parameters: from 17 to 57 of them for modes MR475
       through MR122 in the case of TX_SPEECH_GOOD output, or 5 parameters for
       MRDTX in the case of TX_SID_FIRST, TX_SID_UPDATE or TX_NO_DATA output.
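Code that copies or dumps struct amr_param_frame needs to know how many entries of param[] are meaningful. The hypothetical helper amr_param_count() below, which is not a libtwamr function, derives that count from the type and mode fields. The per-mode counts are an assumption of this sketch, taken from the prmno[] table in the 3GPP TS 26.073 reference code: 17, 19, 19, 19, 19, 23, 39 and 57 for MR475 through MR122. They agree with the 17 to 57 range and with AMR_MAX_PRM stated above. The struct and constant are copied from <tw_amr.h> only so that the block compiles on its own.

```c
#include <stdint.h>

#define AMR_MAX_PRM 57   /* copied from <tw_amr.h> for this standalone sketch */

/* Mirrors the public struct from <tw_amr.h>. */
struct amr_param_frame {
    uint8_t type;
    uint8_t mode;
    int16_t param[AMR_MAX_PRM];
};

/* Hypothetical helper: number of meaningful entries in frame->param[]
 * for native amr_encode_frame() output. Type value 0 is TX_SPEECH_GOOD
 * and mode values 0..7 are MR475..MR122, as in the enums above. */
unsigned amr_param_count(const struct amr_param_frame *frame)
{
    /* Assumed per-mode counts, from the 3GPP reference prmno[] table. */
    static const uint8_t speech_prm[8] = {17, 19, 19, 19, 19, 23, 39, 57};

    if (frame->type == 0)          /* TX_SPEECH_GOOD */
        return speech_prm[frame->mode];
    return 5;                      /* SID_FIRST, SID_UPDATE, NO_DATA: MRDTX */
}
```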

3GPP AMR test sequence output
-----------------------------

The following function converts the above encoder output into the test sequence
format which 3GPP defined for AMR, the insanely inefficient one with 250
(AMR_COD_WORDS) 16-bit words per frame:

    void amr_frame_to_tseq(const struct amr_param_frame *frame, uint16_t *cod);

This function allows the libtwamr encoder to be tested for correctness against
the set of test sequences in 3GPP TS 26.074. The output is in the local
machine's native byte order.
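A test harness checking encoder output against a TS 26.074 .cod file can read the reference one frame at a time and compare it word for word with the output of amr_frame_to_tseq(). The two helpers below, read_cod_frame() and cod_frames_equal(), are hypothetical and not part of libtwamr. fread() into uint16_t words yields native byte order, so the comparison is only meaningful if the .cod file's byte order matches the host. This sketch does not check that.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define AMR_COD_WORDS 250   /* copied from <tw_amr.h> for this standalone sketch */

/* Hypothetical helper: read the next test-sequence frame from 'f'.
 * Returns 1 on success, 0 if nothing could be read (end of file or
 * error), -1 on a short read (truncated frame). */
int read_cod_frame(FILE *f, uint16_t cod[AMR_COD_WORDS])
{
    size_t n = fread(cod, sizeof(uint16_t), AMR_COD_WORDS, f);

    if (n == AMR_COD_WORDS)
        return 1;
    return n == 0 ? 0 : -1;
}

/* Hypothetical helper: nonzero if two frames are identical word for word. */
int cod_frames_equal(const uint16_t *a, const uint16_t *b)
{
    return memcmp(a, b, AMR_COD_WORDS * sizeof(uint16_t)) == 0;
}
```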

RFC 4867 output
---------------

To turn libtwamr encoder output into an octet-aligned RFC 4867 single-frame
payload or storage-format frame (ToC octet followed by speech or SID data, but
no CMR payload header), call this function:

    unsigned amr_frame_to_ietf(const struct amr_param_frame *frame, uint8_t *bytes);

The output buffer must have room for up to 32 bytes (AMR_IETF_MAX_PL); the
return value is the actual number of bytes used. The shortest possible output
is 1 byte in the case of TX_NO_DATA; the longest possible output is 32 bytes in
the case of TX_SPEECH_GOOD, mode MR122.
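In the RFC 4867 single-channel storage format, an .amr file consists of the 6-byte magic string "#!AMR\n" (AMR_IETF_HDR_LEN bytes) followed by frames exactly as produced by amr_frame_to_ietf(). The hypothetical writer functions below keep their own copy of the magic so that they compile without the library. Real code would use libtwamr's amr_file_header_magic[] instead, and the bytes/len pair would come from amr_frame_to_ietf().

```c
#include <stdint.h>
#include <stdio.h>

#define AMR_IETF_HDR_LEN 6   /* copied from <tw_amr.h> for this standalone sketch */

/* RFC 4867 single-channel AMR storage magic (no terminating NUL written).
 * Hypothetical local copy; libtwamr exports amr_file_header_magic[]. */
static const uint8_t demo_amr_magic[AMR_IETF_HDR_LEN] =
    {'#', '!', 'A', 'M', 'R', '\n'};

/* Hypothetical helper: write the file header. 0 on success, -1 on error. */
int write_amr_header(FILE *f)
{
    return fwrite(demo_amr_magic, 1, AMR_IETF_HDR_LEN, f) == AMR_IETF_HDR_LEN
           ? 0 : -1;
}

/* Hypothetical helper: 'bytes' and 'len' are meant to be the buffer and
 * return value of amr_frame_to_ietf(); the frame is written verbatim,
 * ToC octet first. 0 on success, -1 on error. */
int write_amr_frame(FILE *f, const uint8_t *bytes, unsigned len)
{
    return fwrite(bytes, 1, len, f) == len ? 0 : -1;
}
```

Storage-format frames carry no separate length field. A reader finds each frame boundary from the frame type in the ToC octet.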

Additional notes regarding output conversion functions
------------------------------------------------------

The struct amr_param_frame that is input to amr_frame_to_ietf() or
amr_frame_to_tseq() is expected to be a valid output from amr_encode_frame().
These output conversion functions contain no guards against invalid input
(anything that cannot occur in the output from amr_encode_frame()). They are
thus allowed to segfault, corrupt memory, etc. if fed such invalid input.

This lack of guards is justified in the present instance because struct
amr_param_frame is never intended to serve as an external interface to
untrusted entities. Instead, this struct is intended to be only an
intermediate staging buffer between the call to amr_encode_frame() and an
immediately following call to one of the provided output conversion functions.

AMR-EFR hybrid encoder
======================

To use libtwamr as an AMR-EFR hybrid encoder, follow these constraints:

* The 'dtx' argument must be 0 (no DTX) on the call to amr_encoder_create() or
  amr_encoder_reset() that establishes the state for the encoder session.

* The 'mode' argument to amr_encode_frame() must be MR122 on every frame.

After getting struct amr_param_frame out of amr_encode_frame(), call one of
these functions to generate the correct EFR DHF under the right conditions:

    void amr_dhf_subst_efr(struct amr_param_frame *frame);
    void amr_dhf_subst_efr2(struct amr_param_frame *frame, const int16_t *pcm);

Both functions check whether the encoded frame is the MR122 DHF (decoder homing
frame). That means type equals TX_SPEECH_GOOD, mode equals MR122, and the
param array equals the fixed bit pattern of the MR122 DHF. If so, they
overwrite the param[] array in the structure with the different bit pattern of
the EFR DHF. The two functions differ in when they substitute.
amr_dhf_subst_efr() does so whenever the encoder output is the MR122 DHF,
without looking at the PCM input. amr_dhf_subst_efr2() does so only if the PCM
input passed as its pcm argument is the EHF (encoder homing frame). The latter
function matches the observed behavior of T-Mobile USA, but perhaps some other
implementations use the simpler logic equivalent to the first function.

After this transformation, call EFR_params2frame() from libgsmefr (see
EFR-library-API) with the param[] array in struct amr_param_frame as input.
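amr_dhf_subst_efr2() adds one condition: whether the 160-sample input block is the encoder homing frame (EHF). For the GSM/3GPP speech codecs, the EHF consists of 160 samples that all equal 0x0008. The sketch below shows what such a test might look like. is_encoder_homing_frame() is hypothetical, not the libtwamr implementation.

```c
#include <stdint.h>

#define EHF_SAMPLE 0x0008   /* value of every sample in the encoder homing frame */

/* Hypothetical helper: nonzero if the 160 samples in 'pcm' form the EHF. */
int is_encoder_homing_frame(const int16_t *pcm)
{
    unsigned i;

    for (i = 0; i < 160; i++) {
        if (pcm[i] != EHF_SAMPLE)
            return 0;
    }
    return 1;
}
```

In a hybrid encoder loop, the same pcm pointer given to amr_encode_frame() would simply be passed on to amr_dhf_subst_efr2(), which makes this determination itself. The helper above only illustrates the condition.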

Using the AMR decoder
=====================
252 |