comparison SIM-data-formats @ 38:ec184dad4877
SIM-data-formats article written
author | Mychaela Falconia <falcon@freecalypso.org> |
---|---|
date | Fri, 12 Feb 2021 08:42:11 +0000 |
parents | |
children | ce044aa49baf |
37:ac33ec9a07d9 | 38:ec184dad4877 |
---|---|
1 FreeCalypso is developing a family of several different tools that operate on | |
2 SIM cards and user data (primarily phonebooks) stored in them, accessing the | |
3 same underlying data through various mechanisms: | |
4 | |
5 * Our current fc-simtool utility operates on SIM cards inserted into a smart | |
6 card "reader" device, without going through any kind of phone or other GSM | |
7 device; this is the most direct way to manipulate SIM user data content. | |
8 | |
9 * We have plans to develop a companion utility (tentatively named fc-simint) | |
10 that will operate on SIM cards inserted into Calypso phones or FC modem | |
11 boards, working on the same principle as fc-loadtool (suspending and bypassing | |
12 the Calypso device's regular operational firmware), but operating on the | |
13 device's SIM interface rather than its flash. This companion utility is | |
14 planned to replicate the end-user-oriented functionality of fc-simtool. | |
15 | |
16 * We have a FreeCalypso User Phone Tools suite that communicates with FC modem | |
17 boards and the future FC phone handset via AT commands. We have plans to add | |
18 phonebook manipulation commands to this suite (based on AT+CPBR and AT+CPBW), | |
19 reading and writing phonebook data files in the same format as fc-simtool. | |
20 | |
21 Because we have several different tools (some already written, others only | |
22 planned) that will need to read and write exactly the same data formats, and | |
23 because these tools will have to live in different source repositories (totally | |
24 different underlying hardware and system library requirements), the data format | |
25 specification needs to be global and independent of any particular | |
26 hardware-specific tool; the present document is that specification. | |
27 | |
28 GSM 03.38 / 23.038 string representation | |
29 ======================================== | |
30 | |
31 The world of GSM does not use ASCII - in all places where ASCII strings would | |
32 appear in the world of ordinary computing, GSM uses its own different 7-bit | |
33 character set instead, defined in GSM TS 03.38 or 3GPP TS 23.038. Many SIM card | |
34 data files (including phonebooks) contain so-called alpha fields in which GSM | |
35 03.38 (not ASCII!) characters are packed into 8-bit bytes, with the high bit | |
36 zeroed. (These alpha fields also allow alternative UCS-2 encodings, | |
37 distinguished by the high bit being set - but we handle this case separately.) | |
38 Some other SIM card data files (EF_PNN for example) contain GSM 03.38 7-bit text | |
39 strings packed into bytes like in SMS. | |
40 | |
41 However, when we store text strings (such as phonebook contact names) that have | |
42 been read out of a SIM (or are intended to be written to a SIM) in UNIX text | |
43 files, or pass them around in command line arguments, we need an ASCII-based | |
44 representation of these text strings that are encoded in GSM7 in the actual | |
45 GSM/SIM world. Furthermore, our ASCII representation needs to be 100% lossless | |
46 and well-defined. | |
47 | |
48 Our function for lossless conversion of GSM 03.38 strings to ASCII operates as | |
49 follows: | |
50 | |
51 * The output is always enclosed in double-quote characters, as in "text string". | |
52 | |
53 * All GSM7 code points that map to characters that are also present in ASCII | |
54 translate to these ASCII characters: for example, GSM7 code 0x00 becomes '@', | |
55 and GSM7 code 0x02 becomes '$'. | |
56 | |
57 * Any double-quote characters in the data are escaped with a backslash, | |
58 becoming \" | |
59 | |
60 * GSM7 escape sequences for ASCII characters [\]^ and {|}~ are recognized and | |
61 converted to these ASCII characters; \ is then escaped in the output as \\ | |
62 | |
63 * GSM7 code points corresponding to CR and LF are represented as \r and \n | |
64 | |
65 * GSM7 escape characters that are not part of a valid sequence for [\]^ or {|}~ | |
66 are represented as \e | |
67 | |
68 * All other GSM7 characters that cannot be represented in ASCII in any other | |
69 way are represented as \XX escapes, where XX is a two-digit hexadecimal number | |
70 in the range between 00 and 7F, inclusive. | |
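
The output rules listed above can be sketched in Python as follows. This is only an illustration written for this document, not the code of any FreeCalypso tool: the names GSM7_TO_ASCII, GSM7_EXT_TO_ASCII and gsm7_to_ascii are hypothetical, the use of uppercase hex digits in escapes is an assumption, and the input is assumed to be one GSM7 code point per byte with the high bit zeroed (no 0xFF padding).

```python
# Illustrative sketch only; names and hex-digit case are assumptions.

# GSM7 code points whose character also exists in ASCII.
GSM7_TO_ASCII = {0x00: "@", 0x02: "$", 0x11: "_"}
for code in range(0x20, 0x7B):
    # 0x20-0x3F match ASCII except 0x24 (currency sign); letters match too.
    if (0x20 <= code <= 0x3F and code != 0x24) or chr(code).isalpha():
        GSM7_TO_ASCII[code] = chr(code)

# Second byte of GSM7 escape sequences that yield ASCII characters.
GSM7_EXT_TO_ASCII = {0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
                     0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|"}


def gsm7_to_ascii(data: bytes) -> str:
    """Convert unpacked GSM7 code points to the quoted ASCII notation."""
    out = ['"']
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if c > 0x7F:
            raise ValueError("not a GSM7 code point: 0x%02X" % c)
        if c == 0x1B:
            nxt = data[i] if i < len(data) else None
            if nxt in GSM7_EXT_TO_ASCII:
                ch = GSM7_EXT_TO_ASCII[nxt]
                out.append("\\\\" if ch == "\\" else ch)
                i += 1
            else:
                out.append("\\e")      # escape not part of a known sequence
        elif c == 0x0A:
            out.append("\\n")
        elif c == 0x0D:
            out.append("\\r")
        elif c == 0x22:
            out.append('\\"')
        elif c in GSM7_TO_ASCII:
            out.append(GSM7_TO_ASCII[c])
        else:
            out.append("\\%02X" % c)   # two-digit hex escape
    out.append('"')
    return "".join(out)
```

With this sketch, the GSM7 bytes 4D 69 63 68 04 6C 65 come out as "Mich\04le", and the two bytes 00 02 come out as "@$".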
71 | |
72 The result of these rules is as follows: | |
73 | |
74 * If the text item consists entirely of characters that exist in ASCII (the most | |
75 common use case), it will appear naturally in ASCII, even if it contains | |
76 characters like '@' and '$' that have different code points in GSM7, or | |
77 characters in the [\]^ and {|}~ sets that require escaping in GSM7. | |
78 | |
79 * Any text item containing weird characters will still be converted losslessly, | |
80 so it can be written back into the SIM or decoded manually by a GSM7-knowing | |
81 user, and the representation in data files and command output is always | |
82 printable ASCII, nothing else. | |
83 | |
84 * In cases where an occasional weird character appears in an otherwise ASCII- | |
85 dominated string, it is easy to both mentally decode and manually enter such | |
86 characters when necessary. For example, if one of your SIM contacts is a lady | |
87 named Michele who spells her name in the French way, with an accent grave on | |
88 the first 'e' (non-ASCII character U+00E8), her name shall be entered as | |
89 "Mich\04le", nicely preserving the needed non-ASCII character whose GSM 03.38 | |
90 code point is 0x04. | |
91 | |
92 When a string argument that is destined for conversion to GSM7 is parsed, our | |
93 input parser always interprets any backslash (\) characters as escapes; it | |
94 understands all of the same escape sequences that we emit in output: | |
95 | |
96 \" literal " | |
97 \\ literal \ (encoded in GSM 03.38 as another form of escape) | |
98 \e GSM 03.38 escape character 0x1B | |
99 \n GSM 03.38 LF character 0x0A | |
100 \r GSM 03.38 CR character 0x0D | |
101 \XX GSM 03.38 code point XX, passed through literally | |
102 | |
103 If the input contains ASCII characters which do not exist in GSM7 (` and all | |
104 control characters except \n and \r), it is an error. | |
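
The input direction can be sketched the same way. Again this is a hypothetical illustration rather than the actual parser: the function and table names are invented here, the input is taken to be the string body without its enclosing quotes, and the rule used to tell a hex escape from \e (first digit 0-7, second any hex digit) is an assumption derived from the 00-7F range stated earlier. Only 7-bit ASCII input is handled in this sketch.

```python
# Illustrative sketch only; names and the hex-escape rule are assumptions.

ASCII_TO_GSM7_SPECIAL = {"@": 0x00, "$": 0x02, "_": 0x11}
# ASCII characters encoded as GSM7 escape (0x1B) plus an extension code.
ASCII_TO_GSM7_EXT = {"^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
                     "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40}
SIMPLE_ESCAPES = {"n": 0x0A, "r": 0x0D, "e": 0x1B, '"': 0x22}
HEX_DIGITS = "0123456789abcdefABCDEF"


def encode_char(ch: str) -> bytes:
    """Encode one unescaped ASCII character as GSM7."""
    if ch in ASCII_TO_GSM7_SPECIAL:
        return bytes([ASCII_TO_GSM7_SPECIAL[ch]])
    if ch in ASCII_TO_GSM7_EXT:
        return bytes([0x1B, ASCII_TO_GSM7_EXT[ch]])
    if ch in "\n\r":
        return bytes([ord(ch)])
    code = ord(ch)
    if ch == "`" or code < 0x20 or code > 0x7E:
        raise ValueError("character %r does not exist in GSM7" % ch)
    return bytes([code])


def ascii_to_gsm7(text: str) -> bytes:
    """Parse the ASCII notation (without enclosing quotes) into GSM7."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch != "\\":
            out += encode_char(ch)
            continue
        if i >= len(text):
            raise ValueError("dangling backslash at end of string")
        esc = text[i]
        i += 1
        if esc in "01234567" and i < len(text) and text[i] in HEX_DIGITS:
            out.append(int(esc + text[i], 16))   # code point passed through
            i += 1
        elif esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
        elif esc == "\\":
            out += bytes([0x1B, 0x2F])           # backslash needs GSM7 escape
        else:
            raise ValueError("unknown escape \\%s" % esc)
    return bytes(out)
```

For example, ascii_to_gsm7 applied to the string Mich\04le yields the bytes 4D 69 63 68 04 6C 65, while a string containing a backquote is rejected with an error.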
105 | |
106 If our ASCII-to-GSM7 conversion functions are given 8-bit input, such input is | |
107 interpreted as ISO 8859-1: any 8859-1 high characters that have GSM7 | |
108 counterparts will be translated accordingly. (Non-GSM7-mappable high characters | |
109 are an error just like non-GSM7-mappable ASCII chars.) However, our output is | |
110 always 7-bit ASCII only, using \XX escapes for GSM 03.38 characters that fall | |
111 outside of ASCII. | |
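
To illustrate the ISO 8859-1 handling, the sketch below lists the 8859-1 high characters that have counterparts in the GSM 03.38 default alphabet. The table and function names are hypothetical, and whether the actual tools map every one of these characters is not stated in this document; the code point pairs themselves follow the two character set definitions.

```python
# Illustrative sketch only; names are assumptions made for this example.

# ISO 8859-1 byte -> GSM 03.38 default alphabet code point.
LATIN1_TO_GSM7 = {
    0xA1: 0x40,  # inverted exclamation mark
    0xA3: 0x01,  # pound sign
    0xA4: 0x24,  # currency sign
    0xA5: 0x03,  # yen sign
    0xA7: 0x5F,  # section sign
    0xBF: 0x60,  # inverted question mark
    0xC4: 0x5B,  # A with diaeresis
    0xC5: 0x0E,  # A with ring
    0xC6: 0x1C,  # AE ligature
    0xC7: 0x09,  # C with cedilla
    0xC9: 0x1F,  # E with acute
    0xD1: 0x5D,  # N with tilde
    0xD6: 0x5C,  # O with diaeresis
    0xD8: 0x0B,  # O with stroke
    0xDC: 0x5E,  # U with diaeresis
    0xDF: 0x1E,  # sharp s
    0xE0: 0x7F,  # a with grave
    0xE4: 0x7B,  # a with diaeresis
    0xE5: 0x0F,  # a with ring
    0xE6: 0x1D,  # ae ligature
    0xE8: 0x04,  # e with grave
    0xE9: 0x05,  # e with acute
    0xEC: 0x07,  # i with grave
    0xF1: 0x7D,  # n with tilde
    0xF2: 0x08,  # o with grave
    0xF6: 0x7C,  # o with diaeresis
    0xF8: 0x0C,  # o with stroke
    0xF9: 0x06,  # u with grave
    0xFC: 0x7E,  # u with diaeresis
}


def latin1_high_to_gsm7(byte: int) -> int:
    """Map one ISO 8859-1 high byte (0x80-0xFF) to its GSM7 code point."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("not an ISO 8859-1 high character: %r" % byte)
    try:
        return LATIN1_TO_GSM7[byte]
    except KeyError:
        raise ValueError("0x%02X has no GSM7 counterpart" % byte) from None
```

Under this mapping, the 8859-1 byte 0xE8 (e with grave accent) becomes GSM7 code point 0x04, the same value that the \04 escape denotes.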
112 | |
113 Phonebook file format | |
114 ===================== | |
115 | |
116 The fc-simtool pb-dump command displays SIM phonebook content on the terminal | |
117 or saves it in a file in the format defined here, and other tools such as the | |
118 fc-simtool pb-update command need to be able to read back the same format | |
119 losslessly. The phonebook file format is shown here by way of example: | |
120 | |
121 #1: #646#,0x81 "Check Minutes" | |
122 #2: #674#,0x81 "Check Text Usage" | |
123 #3: #225#,0x81 "Check Balance" | |
124 #4: 8675309,0x81 "Jenny" | |
125 #5: 88211016401,0x91 "sysmoUSIM-SJS1 MSISDN" | |
126 #6: 44444,0x81 HEX 810B0893BEC03ABEBC209A9FA1A1 | |
127 #7: *123#,0x81 "" | |
128 #8: 5551234,0x81 "HEX magic spells by Mich\04le" | |
129 | |
130 The rules are as follows: | |
131 | |
132 * Each line in the file format represents one phonebook record. | |
133 | |
134 * The decimal number between the initial '#' and the following ':' is the | |
135 record number in the phonebook, between 1 and 255 as in the SIM protocol | |
136 READ RECORD and UPDATE RECORD commands. | |
137 | |
138 * The phone number is always given without quotes, and consists only of digits | |
139 and '*' and '#' characters - no '+' international symbol is allowed in this | |
140 file format. | |
141 | |
142 * The TON/NPI byte is required, is always given in hex as 0xXX (no other form | |
143 allowed in this file format), and is separated from the phone number digit | |
144 string by a comma. Note how this byte usually equals 0x91 for international | |
145 numbers (those entered with a '+' in typical UIs) or 0x81 otherwise. | |
146 | |
147 * Either a quoted-string or a hex-string is always present at the end of each | |
148 record, giving the alpha tag for the phonebook entry. This field is | |
149 mandatory in the file format; if there is no alpha tag (really meaning empty | |
150 alpha tag), the line ends with empty quoted-string "". | |
151 | |
152 * Quoted-strings for the alpha tag are used for either empty/null or | |
153 GSM7-encoded alpha tags; hex-strings are used for UCS2-encoded alpha tags. | |
154 | |
155 * The format of hex-string alpha tags is as shown in entry #6 in the example | |
156 above - this example gives a contact name in Russian. (Full decoding of this | |
157 contact name is left as an exercise for adventurous readers - see | |
158 ETSI TS 102 221 Annex A and the Cyrillic block of Unicode.) | |
159 | |
160 * Hex-strings can be used for any arbitrary bytes in the alpha tag, but are only | |
161 needed for UCS-2 encodings. Every possible GSM7 string can be represented in | |
162 our quoted-string notation. | |
163 | |
164 * The quoted-string (GSM 03.38) form of the alpha tag must always be quoted, | |
165 even if quotes seem optional like in the "Jenny" example above (record #4). | |
166 The absence of quotes is what allows the HEX keyword to be distinguished: | |
167 compare and contrast records #6 and #8 in the example. | |
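
A reader for the basic line format described by the rules above might look like the following sketch. It is not the parser used by any FreeCalypso tool: PbRecord, PB_LINE_RE and parse_pb_line are names invented for this example, the exact whitespace between fields is an assumption (any run of spaces or tabs is accepted), the phone number is assumed to be non-empty, and the optional CCP= and EXT= fields are not handled. The quoted-string body is returned with its escapes still undecoded.

```python
# Illustrative sketch only; names and whitespace handling are assumptions.
import re
from typing import NamedTuple, Optional


class PbRecord(NamedTuple):
    recno: int
    number: str
    ton_npi: int
    alpha_quoted: Optional[str]   # quoted-string body, escapes not decoded
    alpha_hex: Optional[bytes]    # raw alpha tag bytes from a HEX string


PB_LINE_RE = re.compile(
    r'#(\d+):\s*([0-9*#]+),0x([0-9A-Fa-f]{2})\s+'
    r'(?:"((?:[^"\\]|\\.)*)"|HEX\s+((?:[0-9A-Fa-f]{2})+))\s*$'
)


def parse_pb_line(line: str) -> PbRecord:
    """Parse one phonebook line without CCP= or EXT= fields."""
    m = PB_LINE_RE.match(line)
    if m is None:
        raise ValueError("malformed phonebook line: %r" % line)
    recno = int(m.group(1))
    if not 1 <= recno <= 255:
        raise ValueError("record number out of range: %d" % recno)
    hex_part = m.group(5)
    return PbRecord(
        recno=recno,
        number=m.group(2),
        ton_npi=int(m.group(3), 16),
        alpha_quoted=m.group(4),
        alpha_hex=bytes.fromhex(hex_part) if hex_part is not None else None,
    )
```

Because the quoted alternative is tried first and requires the opening double-quote, record #8 from the example parses as a quoted-string whose text happens to begin with HEX, while record #6 parses as a hex-string.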
168 | |
169 The above format applies when the almost-never-used CCP and EXT bytes in the | |
170 phonebook record both equal 0xFF, meaning not used. In the unlikely case when | |
171 these fields are used, the following extra fields are added to the line-based | |
172 representation: | |
173 | |
174 * If CCP != 0xFF, a "CCP=%u " field is inserted between the phone number and | |
175 the alpha tag. | |
176 | |
177 * If EXT != 0xFF, an "EXT=%u " field is inserted between the phone number and | |
178 the alpha tag. | |
179 | |
180 * If both CCP and EXT are present, the CCP= field appears before the EXT= field, | |
181 same order as in the SIM binary record. |
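
The placement of the optional fields can be illustrated with a small formatting sketch. The function name format_pb_line is hypothetical, the single-space separators mirror the example listing but are otherwise an assumption, and the alpha tag is assumed to be passed in already rendered (either a quoted-string including its quotes or a HEX string). The sketch places CCP= and EXT= after the TON/NPI byte, which is one reading of "between the phone number and the alpha tag".

```python
# Illustrative sketch only; name, separators and placement are assumptions.

def format_pb_line(recno: int, number: str, ton_npi: int, alpha: str,
                   ccp: int = 0xFF, ext: int = 0xFF) -> str:
    """Render one phonebook record as a text line.

    alpha is the already-rendered alpha tag: a quoted-string including its
    double quotes, or the keyword HEX followed by hex digits.
    """
    line = "#%d: %s,0x%02X " % (recno, number, ton_npi)
    if ccp != 0xFF:
        line += "CCP=%d " % ccp    # emitted only when the CCP byte is used
    if ext != 0xFF:
        line += "EXT=%d " % ext    # always after CCP=, as in the SIM record
    return line + alpha
```

With both bytes left at 0xFF, format_pb_line(4, "8675309", 0x81, '"Jenny"') reproduces record #4 of the example listing exactly.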