comparison SIM-data-formats @ 38:ec184dad4877
SIM-data-formats article written
| author | Mychaela Falconia <falcon@freecalypso.org> |
|---|---|
| date | Fri, 12 Feb 2021 08:42:11 +0000 |
| parents | |
| children | ce044aa49baf |
| 37:ac33ec9a07d9 | 38:ec184dad4877 |
|---|---|
| 1 FreeCalypso is developing a family of several different tools that operate on | |
| 2 SIM cards and user data (primarily phonebooks) stored in them, accessing the | |
| 3 same underlying data through various mechanisms: | |
| 4 | |
| 5 * Our current fc-simtool utility operates on SIM cards inserted into a smart | |
| 6 card "reader" device, without going through any kind of phone or other GSM | |
| 7 device; this is the most direct way to manipulate SIM user data content. | |
| 8 | |
| 9 * We have plans to develop a companion utility (tentatively named fc-simint) | |
| 10 that will operate on SIM cards inserted into Calypso phones or FC modem | |
| 11 boards, working on the same principle as fc-loadtool (suspending and bypassing | |
| 12 the Calypso device's regular operational firmware), but operating on the | |
| 13 device's SIM interface rather than its flash. This companion utility is | |
| 14 planned to replicate the end-user-oriented functionality of fc-simtool. | |
| 15 | |
| 16 * We have a FreeCalypso User Phone Tools suite that communicates with FC modem | |
| 17 boards and the future FC phone handset via AT commands. We have plans to add | |
| 18 phonebook manipulation commands to this suite (based on AT+CPBR and AT+CPBW), | |
| 19 reading and writing phonebook data files in the same format as fc-simtool. | |
| 20 | |
| 21 Because we have several different tools (some already written, others only | |
| 22 planned) that will need to read and write exactly the same data formats, and | |
| 23 because these tools will have to live in different source repositories (totally | |
| 24 different underlying hardware and system library requirements), the data format | |
| 25 specification needs to be global and independent of any particular hardware | |
| 26 tool. The present document is that specification. | |
| 27 | |
| 28 GSM 03.38 / 23.038 string representation | |
| 29 ======================================== | |
| 30 | |
| 31 The world of GSM does not use ASCII - in all places where ASCII strings would | |
| 32 appear in the world of ordinary computing, GSM uses its own different 7-bit | |
| 33 character set instead, defined in GSM TS 03.38 or 3GPP TS 23.038. Many SIM card | |
| 34 data files (including phonebooks) contain so-called alpha fields in which GSM | |
| 35 03.38 (not ASCII!) characters are packed into 8-bit bytes, with the high bit | |
| 36 zeroed. (These alpha fields also allow alternative UCS-2 encodings, | |
| 37 distinguished by the high bit being set - but we handle this case separately.) | |
| 38 Some other SIM card data files (EF_PNN for example) contain GSM 03.38 7-bit text | |
| 39 strings packed into bytes like in SMS. | |
| 40 | |
| 41 However, when we store text strings (such as phonebook contact names) that have | |
| 42 been read out of a SIM (or are intended to be written to a SIM) in UNIX text | |
| 43 files, or pass them around in command line arguments, we need an ASCII-based | |
| 44 representation of these text strings, which are encoded in GSM7 in the actual | |
| 45 GSM/SIM world. Furthermore, our ASCII representation needs to be 100% lossless | |
| 46 and well-defined. | |
| 47 | |
| 48 Our function for lossless conversion of GSM 03.38 strings to ASCII operates as | |
| 49 follows: | |
| 50 | |
| 51 * The output is always enclosed in double-quote characters, as in "text string". | |
| 52 | |
| 53 * All GSM7 code points that map to characters that are also present in ASCII | |
| 54 translate to these ASCII characters: for example, GSM7 code 0x00 becomes '@', | |
| 55 and GSM7 code 0x02 becomes '$'. | |
| 56 | |
| 57 * Any double-quote characters in the data are escaped with a backslash, | |
| 58 becoming \" | |
| 59 | |
| 60 * GSM7 escape sequences for ASCII characters [\]^ and {|}~ are recognized and | |
| 61 converted to these ASCII characters; \ is then escaped in the output as \\ | |
| 62 | |
| 63 * GSM7 code points corresponding to CR and LF are represented as \r and \n | |
| 64 | |
| 65 * GSM7 escape characters that are not part of a valid sequence for [\]^ or {|}~ | |
| 66 are represented as \e | |
| 67 | |
| 68 * All other GSM7 characters that cannot be represented in ASCII in any other | |
| 69 way are represented as \xX escapes, where xX stands for two hexadecimal digits | |
| 70 in the range between 00 and 7F, inclusive, written directly after the backslash (as in \04). | |
| 71 | |
| 72 The result of these rules is as follows: | |
| 73 | |
| 74 * If the text item consists entirely of characters that exist in ASCII (the most | |
| 75 common use case), it will appear naturally in ASCII, even if it contains | |
| 76 characters like '@' and '$' that have different code points in GSM7, or | |
| 77 characters in the [\]^ and {|}~ sets that require escaping in GSM7. | |
| 78 | |
| 79 * Any text item containing weird characters will still be converted losslessly, | |
| 80 so it can be written back into the SIM or decoded manually by a GSM7-knowing | |
| 81 user, and the representation in data files and command output is always | |
| 82 printable ASCII, nothing else. | |
| 83 | |
| 84 * In cases where an occasional weird character appears in an otherwise ASCII- | |
| 85 dominated string, it is easy to both mentally decode and manually enter such | |
| 86 characters when necessary. For example, if one of your SIM contacts is a lady | |
| 87 named Michele who spells her name in the French way, with an accent grave on | |
| 88 the first 'e' (non-ASCII character U+00E8), her name shall be entered as | |
| 89 "Mich\04le", nicely preserving the needed non-ASCII character whose GSM 03.38 | |
| 90 code point is 0x04. | |
| 91 | |
| 92 When a string argument that is destined for conversion to GSM7 is parsed, our | |
| 93 input parser always interprets any backslash (\) characters as escapes; it | |
| 94 understands all of the same escape sequences that we emit in output: | |
| 95 | |
| 96 \" literal " | |
| 97 \\ literal \ (encoded in GSM 03.38 as another form of escape) | |
| 98 \e GSM 03.38 escape character 0x1B | |
| 99 \n GSM 03.38 LF character 0x0A | |
| 100 \r GSM 03.38 CR character 0x0D | |
| 101 \xX GSM 03.38 code point xX, passed through literally | |
| 102 | |
| 103 If the input contains ASCII characters which do not exist in GSM7 (` and all | |
| 104 control characters except \n and \r), it is an error. | |
| 105 | |
| 106 If our ASCII-to-GSM7 conversion functions are given 8-bit input, such input is | |
| 107 interpreted as ISO 8859-1: any 8859-1 high characters that have GSM7 | |
| 108 counterparts will be translated accordingly. (Non-GSM7-mappable high characters | |
| 109 are an error just like non-GSM7-mappable ASCII chars.) However, our output is | |
| 110 always 7-bit ASCII only, using \xX escapes for GSM 03.38 characters that fall | |
| 111 outside of ASCII. | |
| 112 | |
| 113 Phonebook file format | |
| 114 ===================== | |
| 115 | |
| 116 The fc-simtool pb-dump command displays SIM phonebook content on the terminal or | |
| 117 saves it in a file in the format defined here, and other tools such as the | |
| 118 fc-simtool pb-update command need to be able to read back the same format | |
| 119 losslessly. The phonebook file format is hereby shown by way of example: | |
| 120 | |
| 121 #1: #646#,0x81 "Check Minutes" | |
| 122 #2: #674#,0x81 "Check Text Usage" | |
| 123 #3: #225#,0x81 "Check Balance" | |
| 124 #4: 8675309,0x81 "Jenny" | |
| 125 #5: 88211016401,0x91 "sysmoUSIM-SJS1 MSISDN" | |
| 126 #6: 44444,0x81 HEX 810B0893BEC03ABEBC209A9FA1A1 | |
| 127 #7: *123#,0x81 "" | |
| 128 #8: 5551234,0x81 "HEX magic spells by Mich\04le" | |
| 129 | |
| 130 The rules are as follows: | |
| 131 | |
| 132 * Each line in the file format represents one phonebook record. | |
| 133 | |
| 134 * The decimal number between the initial '#' and the following ':' is the | |
| 135 record number in the phonebook, between 1 and 255 as in the SIM protocol | |
| 136 READ RECORD and UPDATE RECORD commands. | |
| 137 | |
| 138 * The phone number is always given without quotes, and consists only of digits | |
| 139 and '*' and '#' characters - no '+' international symbol is allowed in this | |
| 140 file format. | |
| 141 | |
| 142 * The TON/NPI byte is required, is always given in hex as 0xXX (no other form | |
| 143 allowed in this file format), and is separated from the phone number digit | |
| 144 string by a comma. Note how this byte usually equals 0x91 for international | |
| 145 numbers (those entered with a '+' in typical UIs) or 0x81 otherwise. | |
| 146 | |
| 147 * Either a quoted-string or a hex-string is always present at the end of each | |
| 148 record, giving the alpha tag for the phonebook entry. This field is | |
| 149 mandatory in the file format; if there is no alpha tag (really meaning empty | |
| 150 alpha tag), the line ends with the empty quoted-string "". | |
| 151 | |
| 152 * Quoted-strings for the alpha tag are used for either empty/null or | |
| 153 GSM7-encoded alpha tags; hex-strings are used for UCS-2-encoded alpha tags. | |
| 154 | |
| 155 * The format of hex-string alpha tags is as shown in entry #6 in the example | |
| 156 above - this example gives a contact name in Russian. (Full decoding of this | |
| 157 contact name is left as an exercise for adventurous readers - see | |
| 158 ETSI TS 102 221 Annex A and the Cyrillic block of Unicode.) | |
| 159 | |
| 160 * Hex-strings can be used for any arbitrary bytes in the alpha tag, but are only | |
| 161 needed for UCS-2 encodings. Every possible GSM7 string can be represented in | |
| 162 our quoted-string notation. | |
| 163 | |
| 164 * The quoted-string (GSM 03.38) form of the alpha tag must always be quoted, | |
| 165 even if quotes seem optional like in the "Jenny" example above (record #4). | |
| 166 The absence of quotes is what allows the HEX keyword to be distinguished: | |
| 167 compare and contrast records #6 and #8 in the example. | |
| 168 | |
| 169 The above format applies when the almost-never-used CCP and EXT bytes in the | |
| 170 phonebook record both equal 0xFF, meaning not used. In the unlikely case when | |
| 171 these fields are used, the following extra fields are added to the line-based | |
| 172 representation: | |
| 173 | |
| 174 * If CCP != 0xFF, a "CCP=%u " field is inserted between the phone number and | |
| 175 the alpha tag. | |
| 176 | |
| 177 * If EXT != 0xFF, an "EXT=%u " field is inserted between the phone number and | |
| 178 the alpha tag. | |
| 179 | |
| 180 * If both CCP and EXT are present, the CCP= field appears before the EXT= field, | |
| 181 same order as in the SIM binary record. |
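
The rules above for quoted-strings and phonebook lines can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the FreeCalypso implementation: the names quoted_to_gsm7 and parse_pb_line are hypothetical, the parser accepts any run of whitespace between fields (the example above shows single spaces), and 8-bit ISO 8859-1 input is not handled. The result is the alpha tag as unpacked GSM7 code points, one per byte, for quoted-strings, or the raw bytes for HEX strings.

```python
import re
import string

# ASCII characters whose GSM7 code point equals their ASCII code.
_SAME = set(string.ascii_letters + string.digits + " !\"#%&'()*+,-./:;<=>?")
# ASCII characters that exist in GSM7 at a different code point.
_MOVED = {"@": 0x00, "$": 0x02, "_": 0x11}
# ASCII characters reached through the GSM7 escape (0x1B) extension table.
_EXT = {"^": 0x14, "{": 0x28, "}": 0x29, "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40}
# Backslash escapes of the quoted-string notation and the GSM7 bytes they yield.
_SIMPLE_ESC = {'"': b"\x22", "\\": b"\x1b\x2f", "e": b"\x1b",
               "n": b"\x0a", "r": b"\x0d"}

def quoted_to_gsm7(body: str) -> bytes:
    """Convert the inside of a quoted-string to unpacked GSM7 code points."""
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        i += 1
        if c == "\\":
            rest = body[i:i + 2]
            if rest[:1] in _SIMPLE_ESC:
                out += _SIMPLE_ESC[rest[:1]]
                i += 1
            elif re.fullmatch(r"[0-7][0-9A-Fa-f]", rest):
                out.append(int(rest, 16))  # \xX: code point passed through
                i += 2
            else:
                raise ValueError("bad backslash escape")
        elif c in _SAME:
            out.append(ord(c))
        elif c in _MOVED:
            out.append(_MOVED[c])
        elif c in _EXT:
            out += bytes([0x1B, _EXT[c]])
        else:
            raise ValueError("character %r does not exist in GSM7" % c)
    return bytes(out)

_LINE = re.compile(
    r'#(?P<rec>\d+):\s+(?P<num>[0-9*#]+),0x(?P<ton>[0-9A-Fa-f]{2})\s+'
    r'(?:CCP=(?P<ccp>\d+)\s+)?(?:EXT=(?P<ext>\d+)\s+)?'
    r'(?:"(?P<quoted>(?:[^"\\]|\\.)*)"|HEX\s+(?P<hex>(?:[0-9A-Fa-f]{2})+))\s*$'
)

def parse_pb_line(line: str) -> dict:
    """Parse one phonebook line into its fields (hypothetical helper)."""
    m = _LINE.match(line)
    if m is None:
        raise ValueError("malformed phonebook line")
    rec = int(m["rec"])
    if not 1 <= rec <= 255:
        raise ValueError("record number out of range")
    if m["quoted"] is not None:
        alpha = quoted_to_gsm7(m["quoted"])
    else:
        alpha = bytes.fromhex(m["hex"])
    return {
        "record": rec,
        "number": m["num"],
        "ton_npi": int(m["ton"], 16),
        "ccp": int(m["ccp"]) if m["ccp"] else None,  # None means 0xFF, not used
        "ext": int(m["ext"]) if m["ext"] else None,
        "alpha": alpha,
    }
```

For example, parse_pb_line applied to record #8 from the example file yields an alpha tag ending in the bytes 4D 69 63 68 04 6C 65, while record #6 yields the 14 raw bytes of the HEX string unchanged.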
