comparison SIM-data-formats @ 38:ec184dad4877

SIM-data-formats article written
author Mychaela Falconia <falcon@freecalypso.org>
date Fri, 12 Feb 2021 08:42:11 +0000
1 FreeCalypso is developing a family of several different tools that operate on
2 SIM cards and user data (primarily phonebooks) stored in them, accessing the
3 same underlying data through various mechanisms:
4
5 * Our current fc-simtool utility operates on SIM cards inserted into a smart
6 card "reader" device, without going through any kind of phone or other GSM
7 device - the most direct way to manipulate SIM user data content.
8
9 * We have plans to develop a companion utility (tentatively named fc-simint)
10 that will operate on SIM cards inserted into Calypso phones or FC modem
11 boards, working on the same principle as fc-loadtool (suspending and bypassing
12 the Calypso device's regular operational firmware), but operating on the
13 device's SIM interface rather than its flash. This companion utility is
14 planned to replicate the end-user-oriented functionality of fc-simtool.
15
16 * We have a FreeCalypso User Phone Tools suite that communicates with FC modem
17 boards and the future FC phone handset via AT commands. We have plans to add
18 phonebook manipulation commands to this suite (based on AT+CPBR and AT+CPBW),
19 reading and writing phonebook data files in the same format as fc-simtool.
20
21 Because we have several different tools (some already written, others only
22 planned) that will need to read and write exactly the same data formats, and
23 because these tools will have to live in different source repositories (totally
24 different underlying hardware and system library requirements), the data format
25 specification needs to be global and independent of particular hw tools; the
26 present document is that specification.
27
28 GSM 03.38 / 23.038 string representation
29 ========================================
30
31 The world of GSM does not use ASCII - in all places where ASCII strings would
32 appear in the world of ordinary computing, GSM uses its own different 7-bit
33 character set instead, defined in GSM TS 03.38 or 3GPP TS 23.038. Many SIM card
34 data files (including phonebooks) contain so-called alpha fields in which GSM
35 03.38 (not ASCII!) characters are packed into 8-bit bytes, with the high bit
36 zeroed. (These alpha fields also allow alternative UCS-2 encodings,
37 distinguished by the high bit being set - but we handle this case separately.)
38 Some other SIM card data files (EF_PNN for example) contain GSM 03.38 7-bit text
39 strings packed into bytes like in SMS.
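
To illustrate the difference between the two storage forms: in an alpha field each
byte simply holds one GSM7 code point, whereas SMS-style packing squeezes 7-bit
septets together across byte boundaries, least significant bits first. The
following is a minimal sketch of unpacking the latter form; the function name and
the explicit septet count parameter are assumptions made for this example, and
file-specific framing (such as the header bytes of an EF_PNN name) is not handled.

```python
def unpack_septets(packed: bytes, count: int) -> bytes:
    """Unpack SMS-style packed 7-bit data into one GSM7 code point per byte."""
    out = bytearray()
    acc = 0      # bit accumulator, new bytes are added above the leftover bits
    nbits = 0    # number of valid bits currently held in acc
    for byte in packed:
        acc |= byte << nbits
        nbits += 8
        while nbits >= 7 and len(out) < count:
            out.append(acc & 0x7F)   # lowest 7 bits form the next septet
            acc >>= 7
            nbits -= 7
    return bytes(out)
```

The result is a string of unpacked GSM7 code points, i.e. the same form that an
alpha field stores directly.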
40
41 However, when we store text strings (such as phonebook contact names) that have
42 been read out of a SIM (or are intended to be written to a SIM) in UNIX text
43 files, or pass them around in command line arguments, we need an ASCII-based
44 representation of these text strings that are encoded in GSM7 in the actual
45 GSM/SIM world. Furthermore, our ASCII representation needs to be 100% lossless
46 and well-defined.
47
48 Our function for lossless conversion of GSM 03.38 strings to ASCII operates as
49 follows:
50
51 * The output is always enclosed in double-quote characters, as in "text string".
52
53 * All GSM7 code points that map to characters that are also present in ASCII
54 translate to these ASCII characters: for example, GSM7 code 0x00 becomes '@',
55 and GSM7 code 0x02 becomes '$'.
56
57 * Any double-quote characters in the data are escaped with a backslash,
58 becoming \"
59
60 * GSM7 escape sequences for ASCII characters [\]^ and {|}~ are recognized and
61 converted to these ASCII characters; \ is then escaped in the output as \\
62
63 * GSM7 code points corresponding to CR and LF are represented as \r and \n
64
65 * GSM7 escape characters that are not part of a valid sequence for [\]^ or {|}~
66 are represented as \e
67
68 * All other GSM7 characters that cannot be represented in ASCII in any other
69 way are represented as \XX escapes, where XX is a two-digit hexadecimal number
70 in the range between 00 and 7F, inclusive.
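
The rules above can be sketched as follows. This is only an illustration, not the
actual FreeCalypso code: the function and table names are made up for this example,
the input is assumed to hold one GSM7 code point per byte, and the use of uppercase
hex digits in the numeric escapes is an assumption.

```python
# GSM7 code points whose characters also exist in ASCII, per the GSM 03.38
# default alphabet table; all other code points need some form of escape.
GSM7_TO_ASCII = {0x00: "@", 0x02: "$", 0x11: "_"}
for code in range(0x20, 0x7B):
    # GSM7 and ASCII agree here, except at 0x24, 0x40 and 0x5B-0x60.
    if code not in (0x24, 0x40) and not 0x5B <= code <= 0x60:
        GSM7_TO_ASCII[code] = chr(code)

# Second byte of the two-byte GSM7 escape sequences for [\]^ and {|}~
GSM7_EXT_TO_ASCII = {0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
                     0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|"}

def gsm7_to_quoted(data: bytes) -> str:
    """Convert unpacked GSM7 code points to the quoted ASCII representation."""
    out = []
    i = 0
    while i < len(data):
        code = data[i]
        i += 1
        if code > 0x7F:
            raise ValueError("not a GSM7 code point: 0x%02X" % code)
        if code == 0x1B:
            if i < len(data) and data[i] in GSM7_EXT_TO_ASCII:
                ch = GSM7_EXT_TO_ASCII[data[i]]
                i += 1
                out.append("\\\\" if ch == "\\" else ch)
            else:
                out.append("\\e")        # escape byte without a valid sequence
        elif code == 0x0A:
            out.append("\\n")
        elif code == 0x0D:
            out.append("\\r")
        elif code in GSM7_TO_ASCII:
            ch = GSM7_TO_ASCII[code]
            out.append('\\"' if ch == '"' else ch)
        else:
            out.append("\\%02X" % code)  # numeric escape, two hex digits
    return '"' + "".join(out) + '"'
```

With this sketch, the byte string 4D 69 63 68 04 6C 65 comes out as "Mich\04le",
and GSM7 codes 0x00 and 0x02 come out as plain '@' and '$'.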
71
72 The result of these rules is as follows:
73
74 * If the text item consists entirely of characters that exist in ASCII (the most
75 common use case), it will appear naturally in ASCII, even if it contains
76 characters like '@' and '$' that have different code points in GSM7, or
77 characters in the [\]^ and {|}~ sets that require escaping in GSM7.
78
79 * Any text item containing weird characters will still be converted losslessly,
80 so it can be written back into the SIM or decoded manually by a GSM7-knowing
81 user, and the representation in data files and command output is always
82 printable ASCII, nothing else.
83
84 * In cases where an occasional weird character appears in an otherwise ASCII-
85 dominated string, it is easy to both mentally decode and manually enter such
86 characters when necessary. For example, if one of your SIM contacts is a lady
87 named Michele who spells her name in the French way, with an accent grave on
88 the first 'e' (non-ASCII character U+00E8), her name shall be entered as
89 "Mich\04le", nicely preserving the needed non-ASCII character whose GSM 03.38
90 code point is 0x04.
91
92 When a string argument that is destined for conversion to GSM7 is parsed, our
93 input parser always interprets any backslash (\) characters as escapes; it
94 understands all of the same escape sequences that we emit in output:
95
96 \" literal "
97 \\ literal \ (encoded in GSM 03.38 as another form of escape)
98 \e GSM 03.38 escape character 0x1B
99 \n GSM 03.38 LF character 0x0A
100 \r GSM 03.38 CR character 0x0D
101 \XX GSM 03.38 code point XX (two hex digits), passed through literally
102
103 If the input contains ASCII characters which do not exist in GSM7 (` and all
104 control characters except \n and \r), it is an error.
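
A minimal sketch of such an input parser is given below, under a few stated
assumptions: the names are hypothetical, the function receives the text between the
enclosing double quotes, and the named escapes are checked before the numeric form.
The last point is harmless because a valid numeric escape (00 through 7F) always
starts with a digit between 0 and 7, so it can never begin with 'e'.

```python
import string

# ASCII characters that can appear unescaped, mapped to their GSM7 encoding.
ASCII_TO_GSM7 = {"@": b"\x00", "$": b"\x02", "_": b"\x11",
                 "\n": b"\x0a", "\r": b"\x0d"}
for code in range(0x20, 0x7B):
    # GSM7 and ASCII agree here, except at 0x24, 0x40 and 0x5B-0x60.
    if code not in (0x24, 0x40) and not 0x5B <= code <= 0x60:
        ASCII_TO_GSM7[chr(code)] = bytes([code])
# ASCII characters that need a two-byte GSM7 escape sequence.
for ch, ext in zip("^{}[~]|", (0x14, 0x28, 0x29, 0x3C, 0x3D, 0x3E, 0x40)):
    ASCII_TO_GSM7[ch] = bytes([0x1B, ext])

# Backslash escapes other than the numeric form.
NAMED_ESCAPES = {'"': b"\x22", "\\": b"\x1b\x2f", "e": b"\x1b",
                 "n": b"\x0a", "r": b"\x0d"}

def ascii_to_gsm7(text: str) -> bytes:
    """Convert the body of a quoted string to unpacked GSM7 code points."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch != "\\":
            if ch not in ASCII_TO_GSM7:
                raise ValueError("character %r does not exist in GSM7" % ch)
            out += ASCII_TO_GSM7[ch]
            continue
        if i < len(text) and text[i] in NAMED_ESCAPES:
            out += NAMED_ESCAPES[text[i]]
            i += 1
            continue
        digits = text[i:i + 2]
        if (len(digits) == 2 and all(d in string.hexdigits for d in digits)
                and int(digits, 16) <= 0x7F):
            out.append(int(digits, 16))   # code point passed through literally
            i += 2
            continue
        raise ValueError("invalid backslash escape")
    return bytes(out)
```

Feeding the output of the ASCII conversion back through a parser of this kind is
what makes the representation lossless in both directions.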
105
106 If our ASCII-to-GSM7 conversion functions are given 8-bit input, such input is
107 interpreted as ISO 8859-1: any 8859-1 high characters that have GSM7
108 counterparts will be translated accordingly. (Non-GSM7-mappable high characters
109 are an error just like non-GSM7-mappable ASCII chars.) However, our output is
110 always 7-bit ASCII only, using \XX escapes for GSM 03.38 characters that fall
111 outside of ASCII.
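
For illustration, the ISO 8859-1 high characters that have counterparts in the
GSM 03.38 default alphabet can be tabulated as in the sketch below. The table is
derived from the two character set definitions; whether the actual tools accept
exactly this set is an assumption, and the names are made up for this example.

```python
# ISO 8859-1 byte value -> GSM 03.38 default alphabet code point
LATIN1_HIGH_TO_GSM7 = {
    0xA3: 0x01,  # pound sign
    0xA5: 0x03,  # yen sign
    0xE8: 0x04,  # e grave
    0xE9: 0x05,  # e acute
    0xF9: 0x06,  # u grave
    0xEC: 0x07,  # i grave
    0xF2: 0x08,  # o grave
    0xC7: 0x09,  # capital C cedilla
    0xD8: 0x0B,  # capital O slash
    0xF8: 0x0C,  # small o slash
    0xC5: 0x0E,  # capital A ring
    0xE5: 0x0F,  # small a ring
    0xC6: 0x1C,  # capital AE
    0xE6: 0x1D,  # small ae
    0xDF: 0x1E,  # sharp s
    0xC9: 0x1F,  # capital E acute
    0xA4: 0x24,  # currency sign
    0xA1: 0x40,  # inverted exclamation mark
    0xC4: 0x5B,  # capital A diaeresis
    0xD6: 0x5C,  # capital O diaeresis
    0xD1: 0x5D,  # capital N tilde
    0xDC: 0x5E,  # capital U diaeresis
    0xA7: 0x5F,  # section sign
    0xBF: 0x60,  # inverted question mark
    0xE4: 0x7B,  # small a diaeresis
    0xF6: 0x7C,  # small o diaeresis
    0xF1: 0x7D,  # small n tilde
    0xFC: 0x7E,  # small u diaeresis
    0xE0: 0x7F,  # small a grave
}

def latin1_high_to_gsm7(byte: int) -> int:
    """Translate one ISO 8859-1 high byte, or raise if GSM7 has no such char."""
    try:
        return LATIN1_HIGH_TO_GSM7[byte]
    except KeyError:
        raise ValueError("0x%02X has no GSM7 counterpart" % byte) from None
```

Under this mapping, Michele's name entered as 8859-1 bytes with 0xE8 for the
accented letter would yield the same GSM7 string as "Mich\04le".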
112
113 Phonebook file format
114 =====================
115
116 The fc-simtool pb-dump command displays SIM phonebook content on the terminal or
117 saves it in a file in the format defined here, and other tools such as the
118 fc-simtool pb-update command need to be able to read back the same format
119 losslessly. The phonebook file format is hereby shown by way of example:
120
121 #1: #646#,0x81 "Check Minutes"
122 #2: #674#,0x81 "Check Text Usage"
123 #3: #225#,0x81 "Check Balance"
124 #4: 8675309,0x81 "Jenny"
125 #5: 88211016401,0x91 "sysmoUSIM-SJS1 MSISDN"
126 #6: 44444,0x81 HEX 810B0893BEC03ABEBC209A9FA1A1
127 #7: *123#,0x81 ""
128 #8: 5551234,0x81 "HEX magic spells by Mich\04le"
129
130 The rules are as follows:
131
132 * Each line in the file format represents one phonebook record.
133
134 * The decimal number between the initial '#' and the following ':' is the
135 record number in the phonebook, between 1 and 255 as in the SIM protocol
136 READ RECORD and UPDATE RECORD commands.
137
138 * The phone number is always given without quotes, and consists only of digits
139 and '*' and '#' characters - no '+' international symbol is allowed in this
140 file format.
141
142 * The TON/NPI byte is required, is always given in hex as 0xXX (no other form
143 allowed in this file format), and is separated from the phone number digit
144 string by a comma. Note how this byte usually equals 0x91 for international
145 numbers (those entered with a '+' in typical UIs) or 0x81 otherwise.
146
147 * Either a quoted-string or a hex-string is always present at the end of each
148 record, giving the alpha tag for the phonebook entry. This field is
149 mandatory in the file format; if there is no alpha tag (really meaning empty
150 alpha tag), the line ends with empty quoted-string "".
151
152 * Quoted-strings for the alpha tag are used for either empty/null or
153 GSM7-encoded alpha tags; hex-strings are used for UCS-2-encoded alpha tags.
154
155 * The format of hex-string alpha tags is as shown in entry #6 in the example
156 above - this example gives a contact name in Russian. (Full decoding of this
157 contact name is left as an exercise for adventurous readers - see
158 ETSI TS 102 221 Annex A and the Cyrillic block of Unicode.)
159
160 * Hex-strings can be used for any arbitrary bytes in the alpha tag, but are only
161 needed for UCS-2 encodings. Every possible GSM7 string can be represented in
162 our quoted-string notation.
163
164 * The quoted-string (GSM 03.38) form of the alpha tag must always be quoted,
165 even if quotes seem optional like in the "Jenny" example above (record #4).
166 The absence of quotes is what allows the HEX keyword to be distinguished:
167 compare and contrast records #6 and #8 in the example.
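
The rules above can be turned into a line parser along the following lines. This
is a hedged sketch rather than the parser used by fc-simtool: the names and the
returned dictionary layout are invented for this example, whitespace between fields
is accepted liberally, at least one phone number digit is assumed, and the optional
CCP= and EXT= fields are not handled.

```python
import re

# record number, phone number, TON/NPI byte, then quoted-string or HEX string
PB_LINE_RE = re.compile(
    r'#(\d+):\s+([0-9*#]+),0x([0-9A-Fa-f]{2})\s+'
    r'(?:"((?:[^"\\]|\\.)*)"|HEX\s+((?:[0-9A-Fa-f]{2})+))'
)

def parse_pb_line(line: str) -> dict:
    """Split one phonebook line into its fields without decoding the alpha tag."""
    m = PB_LINE_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError("malformed phonebook line")
    recno = int(m.group(1))
    if not 1 <= recno <= 255:
        raise ValueError("record number out of range")
    rec = {"recno": recno, "number": m.group(2),
           "ton_npi": int(m.group(3), 16)}
    if m.group(4) is not None:
        # Quoted-string body, still carrying its backslash escapes.
        rec["alpha_quoted"] = m.group(4)
    else:
        # Hex-string alpha tag, converted to raw bytes.
        rec["alpha_hex"] = bytes.fromhex(m.group(5))
    return rec
```

Because the opening double quote is tested first, record #8 from the example
parses as a quoted-string whose text happens to begin with HEX, while record #6
parses as a hex-string.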
168
169 The above format applies when the almost-never-used CCP and EXT bytes in the
170 phonebook record both equal 0xFF, meaning not used. In the unlikely case when
171 these fields are used, the following extra fields are added to the line-based
172 representation:
173
174 * If CCP != 0xFF, a "CCP=%u " field is inserted between the phone number and
175 the alpha tag.
176
177 * If EXT != 0xFF, an "EXT=%u " field is inserted between the phone number and
178 the alpha tag.
179
180 * If both CCP and EXT are present, the CCP= field appears before the EXT= field,
181 same order as in the SIM binary record.
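
As an illustration of the output side, a formatter that follows these rules could
look like the sketch below. The function name and parameters are hypothetical, the
alpha argument is assumed to be already rendered (either a quoted-string or the HEX
keyword followed by hex digits), and single spaces are assumed as field separators.

```python
def format_pb_line(recno: int, number: str, ton_npi: int, alpha: str,
                   ccp: int = 0xFF, ext: int = 0xFF) -> str:
    """Render one phonebook record as a line of the phonebook file format."""
    line = "#%u: %s,0x%02X " % (recno, number, ton_npi)
    if ccp != 0xFF:
        line += "CCP=%u " % ccp   # only emitted when the CCP byte is in use
    if ext != 0xFF:
        line += "EXT=%u " % ext   # always follows CCP=, as in the binary record
    return line + alpha
```

For the common case of both bytes being 0xFF, the output is identical to the
basic format shown in the example listing.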