comparison doc/User-phone-tools @ 805:a43c5dc251dc

doc/User-phone-tools: new sms-pdu-decode backslash escapes
author Mychaela Falconia <falcon@freecalypso.org>
date Thu, 25 Mar 2021 05:10:43 +0000
parents b5235f8240b9
children 8cf7d41f2821
--- doc/User-phone-tools  804:30fbaa652ea5
+++ doc/User-phone-tools  805:a43c5dc251dc
@@ -198,20 +198,52 @@
 Character sets and encodings
 ----------------------------
 
 By default, sms-pdu-decode only emits 7-bit ASCII characters in its output; any
 GSM7 or UCS-2 characters which fall outside of this plain ASCII repertoire are
-displayed as the '?' error character and the presence of such decoding errors
-is indicated in the Length: header. This conservative default behaviour can be
+converted into backslash escapes. This conservative default behaviour can be
 modified as follows:
 
 -e option extends the potential output character repertoire from 7-bit ASCII to
 8-bit ISO 8859-1. Any 8859-1 high characters are emitted as single bytes,
 i.e., are NOT encoded in UTF-8 - this option is intended for non-UTF-8
 environments.
 
--u option extends the potential output character repertoire to the entire Basic
-Multilingual Plane of Unicode, and changes the output encoding to UTF-8.
+-u option extends the potential output character repertoire to all of Unicode,
+and changes the output encoding to UTF-8.
+
+Regardless of whether the source message character set is GSM7 or UCS-2 and
+irrespective of -e or -u options, any backslash characters are always escaped
+as \\, and any CR characters are represented as \r. Additional backslash
+escape encodings depend on the source message character set:
+
+* If the source message character set is GSM7, the following additional
+  backslash escapes can be emitted:
+
+  - In the absence of -u option, the Euro currency symbol is converted to \E;
+
+  - Any GSM7 escape characters (0x1B) that aren't part of a valid escape
+    sequence for [\]^ or {|}~ or \E are represented as \e;
+
+  - Any GSM7 characters that either can't be represented in the output character
+    set (ASCII or ISO 8859-1) or are outright invalid per GSM 03.38 are
+    represented as \xX, where xX is the original GSM7 code point in 2-digit
+    hexadecimal form between 00 and 7F;
+
+  - Invalid GSM7 escape sequences are emitted as \e\xX.
+
+* If the source message character set is UCS-2, the following additional
+  backslash escapes can be emitted:
+
+  - Invalid UCS-2 characters falling onto control character code points are
+    emitted as \u00XX;
+
+  - UCS-2 characters that can't be represented in ASCII or ISO 8859-1 (when
+    running without -u option) are emitted as \uXXXX;
+
+  - If UTF-16 surrogate pairs are detected in the input, the encoded high-plane
+    Unicode character is reconstructed and emitted as \UXXXXXX in the absence
+    of -u option, or as the appropriate UTF-8 byte sequence with -u.
 
 -h option causes the user data portion of every message to be displayed as a
 raw hex dump; in the case of GSM7-encoded messages, this hex dump shows the
 unpacked septets.
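
Illustration (not part of the changeset): the UCS-2 escape rules added in this
revision can be sketched in Python roughly as follows. This is not the actual
sms-pdu-decode code, which is not shown here. The function name escape_ucs2 and
its mode argument are hypothetical: "ascii" stands for the default, "latin1"
for -e and "utf8" for -u. Several details are assumptions the text does not
settle: that hex digits are upper case, that LF passes through unescaped, that
the control code points are 0x00-0x1F and 0x7F-0x9F, and that an unpaired
surrogate is emitted as \uXXXX.

```python
def escape_ucs2(data: bytes, mode: str = "ascii") -> bytes:
    """Render big-endian UCS-2 user data using the documented escapes.

    mode is hypothetical: "ascii" (default), "latin1" (-e), "utf8" (-u).
    """
    if mode not in ("ascii", "latin1", "utf8"):
        raise ValueError("unknown mode")
    if len(data) % 2:
        raise ValueError("UCS-2 user data must have an even length")
    units = [int.from_bytes(data[i:i + 2], "big")
             for i in range(0, len(data), 2)]
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        i += 1
        if u == 0x5C:                      # backslash is always escaped
            out += b"\\\\"
        elif u == 0x0D:                    # CR is always shown as \r
            out += b"\\r"
        elif u == 0x0A:                    # assumption: LF passes through
            out += b"\n"
        elif u < 0x20 or 0x7F <= u <= 0x9F:
            out += b"\\u%04X" % u          # control code point (assumed range)
        elif (0xD800 <= u <= 0xDBFF and i < len(units)
              and 0xDC00 <= units[i] <= 0xDFFF):
            # Valid surrogate pair: rebuild the high-plane code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
            if mode == "utf8":
                out += chr(cp).encode("utf-8")
            else:
                out += b"\\U%06X" % cp
        elif 0xD800 <= u <= 0xDFFF:
            out += b"\\u%04X" % u          # assumption: unpaired surrogate
        elif u < 0x80:
            out.append(u)                  # plain ASCII
        elif mode == "utf8":
            out += chr(u).encode("utf-8")
        elif mode == "latin1" and u <= 0xFF:
            out.append(u)                  # single ISO 8859-1 byte
        else:
            out += b"\\u%04X" % u          # not representable in this mode
    return bytes(out)
```

For example, escape_ucs2("é".encode("utf-16-be")) yields the six ASCII bytes
\u00E9 in the default mode, the single byte 0xE9 with mode="latin1", and the
two-byte UTF-8 sequence with mode="utf8".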