comparison doc/User-phone-tools @ 805:a43c5dc251dc
doc/User-phone-tools: new sms-pdu-decode backslash escapes
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Thu, 25 Mar 2021 05:10:43 +0000 |
parents | b5235f8240b9 |
children | 8cf7d41f2821 |
804:30fbaa652ea5 | 805:a43c5dc251dc |
---|---|
198 Character sets and encodings | 198 Character sets and encodings |
199 ---------------------------- | 199 ---------------------------- |
200 | 200 |
201 By default, sms-pdu-decode only emits 7-bit ASCII characters in its output; any | 201 By default, sms-pdu-decode only emits 7-bit ASCII characters in its output; any |
202 GSM7 or UCS-2 characters which fall outside of this plain ASCII repertoire are | 202 GSM7 or UCS-2 characters which fall outside of this plain ASCII repertoire are |
203 displayed as the '?' error character and the presence of such decoding errors | 203 converted into backslash escapes. This conservative default behaviour can be |
204 is indicated in the Length: header. This conservative default behaviour can be | |
205 modified as follows: | 204 modified as follows: |
206 | 205 |
207 -e option extends the potential output character repertoire from 7-bit ASCII to | 206 -e option extends the potential output character repertoire from 7-bit ASCII to |
208 8-bit ISO 8859-1. Any 8859-1 high characters are emitted as single bytes, | 207 8-bit ISO 8859-1. Any 8859-1 high characters are emitted as single bytes, |
209 i.e., are NOT encoded in UTF-8 - this option is intended for non-UTF-8 | 208 i.e., are NOT encoded in UTF-8 - this option is intended for non-UTF-8 |
210 environments. | 209 environments. |
211 | 210 |
212 -u option extends the potential output character repertoire to the entire Basic | 211 -u option extends the potential output character repertoire to all of Unicode, |
213 Multilingual Plane of Unicode, and changes the output encoding to UTF-8. | 212 and changes the output encoding to UTF-8. |
213 | |
214 Regardless of whether the source message character set is GSM7 or UCS-2 and | |
215 irrespective of -e or -u options, any backslash characters are always escaped | |
216 as \\, and any CR characters are represented as \r. Additional backslash | |
217 escape encodings depend on the source message character set: | |
218 | |
219 * If the source message character set is GSM7, the following additional | |
220 backslash escapes can be emitted: | |
221 | |
222 - In the absence of -u option, the Euro currency symbol is converted to \E; | |
223 | |
224 - Any GSM7 escape characters (0x1B) that aren't part of a valid escape | |
225 sequence for [\]^ or {|}~ or \E are represented as \e; | |
226 | |
227 - Any GSM7 characters that either can't be represented in the output character | |
228 set (ASCII or ISO 8859-1) or are outright invalid per GSM 03.38 are | |
229 represented as \xX, where xX is the original GSM7 code point in 2-digit | |
230 hexadecimal form between 00 and 7F; | |
231 | |
232 - Invalid GSM7 escape sequences are emitted as \e\xX. | |
233 | |
234 * If the source message character set is UCS-2, the following additional | |
235 backslash escapes can be emitted: | |
236 | |
237 - Invalid UCS-2 characters falling onto control character code points are | |
238 emitted as \u00XX; | |
239 | |
240 - UCS-2 characters that can't be represented in ASCII or ISO 8859-1 (when | |
241 running without -u option) are emitted as \uXXXX; | |
242 | |
243 - If UTF-16 surrogate pairs are detected in the input, the encoded high-plane | |
244 Unicode character is reconstructed and emitted as \UXXXXXX in the absence | |
245 of -u option, or as the appropriate UTF-8 byte sequence with -u. | |
214 | 246 |
215 -h option causes the user data portion of every message to be displayed as a | 247 -h option causes the user data portion of every message to be displayed as a |
216 raw hex dump; in the case of GSM7-encoded messages, this hex dump shows the | 248 raw hex dump; in the case of GSM7-encoded messages, this hex dump shows the |
217 unpacked septets. | 249 unpacked septets. |
218 | 250 |
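
The UCS-2 escaping rules added in revision 805 can be illustrated with a short Python sketch. This is not the sms-pdu-decode implementation (freecalypso-tools is not written in Python); the function name `escape_ucs2` and its `mode` parameter are hypothetical, with "ascii" standing for the default, "latin1" for -e and "utf8" for -u. Several details are assumptions that the text above does not settle: that a plain LF passes through unescaped, that "control character code points" means C0 controls other than CR and LF plus 0x7F-0x9F, that unpaired surrogates are emitted as \uXXXX, and that hex digits are uppercase.

```python
def escape_ucs2(units, mode="ascii"):
    """Render a list of 16-bit UCS-2 code units as output bytes.

    mode: "ascii" (default), "latin1" (like -e) or "utf8" (like -u).
    Hypothetical helper for illustration only.
    """
    # Highest code point that can be emitted literally in each mode.
    limit = {"ascii": 0x7F, "latin1": 0xFF, "utf8": 0x10FFFF}[mode]
    codec = "utf-8" if mode == "utf8" else "latin-1"
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        i += 1
        if u == 0x5C:
            out += b"\\\\"            # backslash is always escaped as \\
        elif u == 0x0D:
            out += b"\\r"             # CR is always shown as \r
        elif u == 0x0A:
            out += b"\n"              # assumption: LF passes through
        elif u < 0x20 or 0x7F <= u <= 0x9F:
            out += b"\\u%04X" % u     # control code points: \u00XX
        elif (0xD800 <= u <= 0xDBFF and i < len(units)
              and 0xDC00 <= units[i] <= 0xDFFF):
            # Surrogate pair: reconstruct the high-plane code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
            if mode == "utf8":
                out += chr(cp).encode("utf-8")
            else:
                out += b"\\U%06X" % cp
        elif 0xD800 <= u <= 0xDFFF:
            out += b"\\u%04X" % u     # assumption: unpaired surrogate
        elif u <= limit:
            out += chr(u).encode(codec)
        else:
            out += b"\\u%04X" % u     # not representable: \uXXXX
    return bytes(out)
```

Under these assumptions, the Euro sign (U+20AC) in a UCS-2 message comes out as \u20AC by default and with -e, and as its 3-byte UTF-8 encoding with -u.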