comparison mpffs/Description @ 27:343b6b2f178b

beginning of Mokopir-FFS verbal description
author Michael Spacefalcon <msokolov@ivan.Harhan.ORG>
date Sun, 30 Jun 2013 01:17:30 +0000
parents
children c9f7a4afccc9
26:d19b4e20ff9f 27:343b6b2f178b
1 This is a description, based on reverse engineering, of the flash file system
2 (FFS) implemented in Pirelli's original firmware for the DP-L10 GSM/WiFi dual
3 mode mobile phone, and in the Closedmoko GTA0x modem firmware. Not knowing the
4 "proper" name for this FFS, and needing _some_ identifier to refer to it, I
5 have named it Mokopir-FFS, from "Moko" and "Pirelli" - sometimes abbreviated
6 further to MPFFS.
7
8 (I have previously called the FFS in question MysteryFFS; but now that I've
9 successfully reverse-engineered it, it isn't such a mystery any more :-)
10
11 At a high functional level, Mokopir-FFS presents the following features:
12
13 * Has a directory tree structure like UNIX file systems;
14
15 * The file system API that must be implemented inside the proprietary firmware
16 appears to use UNIX-style pathnames; doing strings on firmware images reveals
17 pathname strings like these:
18
19 /var/dbg/dar
20 /gsm/l3/rr_white_list
21 /gsm/l3/rr_medium_rxlev_thr
22 /gsm/l3/rr_upper_rxlev_thr
23 /gsm/l3/shield
24
25 Parsing the corresponding FFS image with tools included in the present
26 package has confirmed that the directory structure implied by these pathnames
27 does indeed exist in the FFS.
28
29 * Absolutely no DOS-ish semantics seen anywhere: no 8.3 filenames and no
30 colon-separated device names (seen in the TSM30 file system source, for
31 example) are visible in the Closedmoko/Pirelli FFS.
32
33 * File contents are stored uncompressed, but not necessarily contiguous: one
34 could probably store a file in FFS which is bigger than the flash sector
35 size, in which case it can never be contiguous in a writable FFS (see below),
36 and the firmware implementation seems to limit chunk sizes to a fairly small
37 number: on the Pirelli phones all largish files are divided into chunks of
38 8 KiB each, and on my GTA02 the largest observed chunk size is only 2 KiB.
39
40 The smaller files, like the IMEI and the firmware ID strings in my GTA02 FFS,
41 are contiguous.
42
43 * The FFS structure is such that the length of "user" payload data stored in
44 each chunk (and consequently, in each file) can be known exactly in bytes,
45 with the files/chunks able to contain arbitrary binary data. (This property
46 may seem obvious or trivial, as all familiar UNIX and DOS file systems have
47 it, but contrast with RT-11 for example.)
48
49 * The flash file system is a writable one: the running firmware can create,
50 delete and overwrite files (and possibly directories too) in the live FFS;
51 thus the FFS design is such that it allows these operations to be performed
52 within the physical constraints of NOR flash write operations.
53
54 I have reverse-engineered this Mokopir-FFS on a read-only level. What it means
55 is that I, or anyone else who can read this document and the accompanying
56 source for the listing/extraction utilities, can take a Mokopir-FFS image read
57 out of a device and see/extract its full content: the complete directory tree
58 and the exact binary byte content of all files contained therein.
59
60 However, the knowledge possessed by the present hacker (and conveyed in this
61 document and the accompanying source code) is NOT sufficient for constructing a
62 valid Mokopir-FFS image "in vitro" given a tree of directories and files, or
63 for making modifications to the file or directory content on an existing image
64 and producing a content-modified image that is also valid; valid as in suitable
65 for the original proprietary firmware to make its normal read and write
66 operations without noticing anything amiss.
67
68 Constructing "de novo" Mokopir-FFS images or modifying existing images in such
69 a way that they remain 100% valid for all read and write operations of the
70 original proprietary firmware would, at the very minimum, require an
71 understanding of the meaning of *all* fields in the on-media FFS format. Some
72 of these fields are still left as "non-understood" for now though: a read-only
73 implementation can get away with simply ignoring them, but a writer/generator
74 would have to put *something* in those fields.
75
76 As you read the "read-only" description of the Mokopir-FFS on-media format in
77 the remainder of this document, it should become fairly obvious which pieces
78 are missing before our understanding of this FFS can be elevated to a
79 "writable" level.
80
81 However, when it comes to writing new code to run on the two Calypso phones in
82 question (Closedmoko and Pirelli), it seems, at least to the present hacker,
83 that a read-only understanding of Mokopir-FFS should be sufficient:
84
85 * In the case of Closedmoko GTA0x modems, the FFS is seen to contain the IMEI
86 and the RF calibration data. The format of the former is obvious; the latter
87 not so much - but in any case, the information of interest is clearly of a
88 read-only nature. It's difficult to tell (or rather, I haven't bothered to
89 experiment enough) whether the Closedmoko firmware does any writes to FFS or
90 if the FFS is treated as read-only outside of the production line environment,
91 but in any case, it seems to me that for any 3rd party replacement firmware,
92 the best strategy would be to treat the FFS as a read-only source of IMEI and
93 RF calibration data, and nothing more.
94
95 * In the case of Pirelli phones, the FFS is used to store user data: sent and
96 received SMS (and MMS/email/whatever), call history, UI settings, pictures
97 taken with the camera, and whatever else. It also stores a ton of files
98 which I can only presume were meant to be immutable except at the time of
99 firmware updates: graphics for the UI, ringtones, i18n UI strings, and even
100 "helper" firmware images for the WiFi and VoIP processors. However, no IMEI
101 or RF calibration data are anywhere to be found in the FFS - instead this
102 information appears to be stored in the "factory block" at the end of the
103 flash (in its own sector) outside of the FFS.
104
105 Being able to parse FFS images extracted out of Pirelli phones "in vitro"
106 allows us to steal some of these helper files (UI artwork, ringtones,
107 WiFi/VoIP helpers), and some of these might even come in useful to firmware
108 replacement projects, but it seems to me that a replacement firmware would
109 be better off using its own FFS design for storing user data, and as to
110 retrieving the original IMEI and RF calibration data, the original FFS isn't
111 of any use for that anyway.
112
113 =======================
114 Moko/Pirelli FFS format
115 =======================
116
117 OK, now that I'm done with the introduction, we can get to the actual
118 Mokopir-FFS format.
119
120 * On the GTA0x modem (or at least on my GTA02; my sample size is 1) the FFS
121 occupies 7 flash sectors of 64 KiB each at offsets 0x380000 through 0x3E0000,
122 inclusive.
123
124 (The 4 MiB NOR flash chip used by Closedmoko has an independent R/W bank
125 division between the first 3 MiB and the last 1 MiB. The first 3 MiB are used
126 to hold the field-flashable closed firmware images distributed as *.m0 files;
127 the independent last megabyte holds the FFS, and thus the FW could be
128 implemented to do FFS writes while running from flash in the main bank.
129 Less than half of that last megabyte appears to be used for the FFS though;
130 the rest appears to be unused - blank flash observed.)
131
132 * On the Pirelli the FFS occupies 18 sectors of 256 KiB each at offsets 0
133 through 0x440000 (inclusive) of the 2nd flash chip select, the one wired to
134 nCS3 on the Calypso.
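
The sector arithmetic for both devices can be checked with a minimal Python
sketch. The base offsets, sector sizes and sector counts are the figures quoted
above; the dictionary and function names are made up for illustration and are
not taken from the accompanying utilities. The Pirelli offsets are relative to
the start of the second flash chip select.

```python
# Geometry figures come from the text; all names here are hypothetical.
GTA02_FFS = {"base": 0x380000, "sector_size": 0x10000, "nsectors": 7}
PIRELLI_FFS = {"base": 0x000000, "sector_size": 0x40000, "nsectors": 18}


def sector_offsets(geom):
    """Return the flash offset of every FFS sector, in ascending order."""
    return [geom["base"] + i * geom["sector_size"]
            for i in range(geom["nsectors"])]


gta02 = sector_offsets(GTA02_FFS)
pirelli = sector_offsets(PIRELLI_FFS)
print(hex(gta02[0]), hex(gta02[-1]))      # first and last GTA02 FFS sector
print(hex(pirelli[0]), hex(pirelli[-1]))  # first and last Pirelli FFS sector
```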
135
136 Each flash sector allocated to FFS begins with the following signature:
137
138 00000000: 46 66 73 23 10 02 xx yy zz FF FF FF FF FF FF FF Ffs#............
139
140 The bytes shown as xx and yy above serve a non-understood purpose; as a guess,
141 they may hold some info for the flash wear leveling algorithm: in a "virgin"
142 FFS image like that found in my GTA02 (which never had a SIM card in it and
143 never made or received a call) or read out of a "virgin" Pirelli phone that
144 hasn't seen any active use yet, both of these bytes are FFs, but when I look at
145 FFS images read out of the Pirelli which I currently use as my everyday-use
146 cellphone, I see other values in sectors which must have been erased and
147 rewritten. A read-only implementation can ignore these bytes, as mine does.
148
149 The byte shown as zz is more important though, even to a read-only
150 implementation. The 3 values I've encountered in this byte so far are AB, BD
151 and BF. Per my current understanding, in a "healthy" FFS exactly one sector
152 will have AB in its header, exactly one will have BF, and the rest will have
153 BD. The meanings are (or appear to be):
154
155 AB: the sector holds a vital data structure which I have called the active
156 index block;
157 BD: the sector holds regular data;
158 BF: the sector is blank except for the header, can be turned into a new AB or
159 BD.
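
A minimal Python sketch of the sector header check might look as follows. It
assumes each sector is available as a bytes object; the function names and the
state labels are made up for illustration. The "exactly one AB sector" rule is
only the current understanding stated above, so the sketch reports a violation
rather than guessing.

```python
SIGNATURE = bytes.fromhex("466673231002")  # 'Ffs#' followed by 10 02

# Observed values of the zz byte (offset 8); the labels are hypothetical.
SECTOR_STATES = {0xAB: "active index", 0xBD: "data", 0xBF: "blank"}


def classify_sector(sector):
    """Return the state label for one FFS sector, or None if the sector
    lacks the signature or carries a state byte not listed above."""
    if sector[:6] != SIGNATURE:
        return None
    # Bytes 6 and 7 (xx and yy) are not understood and are ignored here.
    return SECTOR_STATES.get(sector[8])


def find_active_index(sectors):
    """Return the position of the single AB sector in a list of sectors.

    Raises ValueError unless exactly one sector is marked AB."""
    hits = [i for i, s in enumerate(sectors)
            if classify_sector(s) == "active index"]
    if len(hits) != 1:
        raise ValueError("expected exactly one AB sector, found %d" % len(hits))
    return hits[0]
```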
160
161 (Note that a flash program operation, which can turn 1s into 0s but not the
162 other way around, can turn BF into either AB or BD - but neither AB nor BD can
163 be turned into any other valid value.)
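
The bit-level claim in the note above is easy to verify. The following Python
sketch models a NOR program operation as one that can only clear bits; the
function name is made up for illustration.

```python
def can_program(old, new):
    """True if programming alone (clearing bits, never setting them)
    can change the byte value old into the byte value new."""
    return (old & new) == new


# Print every transition between the three observed state bytes.
for old in (0xBF, 0xAB, 0xBD):
    for new in (0xAB, 0xBD, 0xBF):
        if old != new:
            print("%02X -> %02X: %s" % (old, new, can_program(old, new)))
```

Only the two transitions starting from BF come out as possible; every other
change between these three values would require a sector erase.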
164
165 In a "virgin" FFS image (as explained above) the first FFS sector is AB, the
166 last one is BF, and the ones in between are BDs.
167
168 An FFS read operation (a search for a given pathname, or a listing of all
169 present directories and files) needs to start with locating the active index
170 block - the FFS sector with AB in the header. Following this header, which is
171 treated as being 16 bytes long (almost everything in Mokopir-FFS is aligned on
172 16-byte boundaries), the active index block contains a linear array of 16-byte
173 records, each record describing an FFS object: directory, file or file
174 continuation chunk.
175
176 Here is my current understanding of the 16-byte index block record structure:
177
178 2 bytes: Length of the described chunk in bytes
179 1 byte: Purpose/meaning not understood, ignored by my current code
180 1 byte: Object type
181 2 bytes: Descendant pointer
182 2 bytes: Sibling pointer
183 4 bytes: Data pointer
184 4 bytes: Purpose/meaning not understood, ignored by my current code
185
186 (On the Calypso phones of interest, all multibyte fields are in the native
187 little-endian byte order of the ARM7TDMI processor.)
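
The record layout above maps directly onto a little-endian struct format. The
following Python sketch unpacks one record; the field names are made up for
illustration (unknown1 and unknown2 stand for the two fields whose purpose is
not understood), and the sample record is fabricated for the example, not
copied from a real image.

```python
import struct
from collections import namedtuple

# Hypothetical field names, in the order given in the table above.
IndexRecord = namedtuple(
    "IndexRecord",
    "length unknown1 objtype descendant sibling dataptr unknown2")

RECORD_FMT = "<HBBHHII"  # little-endian: 2 + 1 + 1 + 2 + 2 + 4 + 4 = 16 bytes


def parse_record(raw):
    """Unpack one 16-byte index block record into an IndexRecord."""
    if len(raw) != struct.calcsize(RECORD_FMT):
        raise ValueError("index records are exactly 16 bytes long")
    return IndexRecord(*struct.unpack(RECORD_FMT, raw))


# Fabricated sample: a directory (F2) with a 16-byte chunk.
sample = bytes.fromhex("1000fff20200ffff00100000ffffffff")
rec = parse_record(sample)
print(rec)
```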
188
189 The active index block gets filled with these records as objects are created;
190 the first record goes right after the 'Ffs#'...AB header (padded to 16 bytes);
191 the last record (at any given moment) is followed by blank flash for the
192 remainder of the sector. Records thus appear in the order in which they are
193 created, which bears no direct relation to the directory tree structure.
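
Walking the index block in creation order can be sketched in Python as below.
The sketch assumes that "blank flash" means a record slot consisting entirely
of FF bytes, and it numbers the slots so that the first record after the header
is #1; the function name is made up for illustration.

```python
RECORD_SIZE = 16
BLANK_RECORD = b"\xff" * RECORD_SIZE  # assumption: erased flash reads as FF


def iter_raw_records(index_sector):
    """Yield (index, raw_bytes) for each record in an active index block.

    Slot 0 is the sector header, so the first record is index 1.
    Iteration stops at the first all-FF slot or at the end of the sector."""
    last_start = len(index_sector) - RECORD_SIZE
    for offset in range(RECORD_SIZE, last_start + 1, RECORD_SIZE):
        raw = index_sector[offset:offset + RECORD_SIZE]
        if raw == BLANK_RECORD:
            break
        yield offset // RECORD_SIZE, raw
```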
194
195 The objects, each described by a record in the index block, are organized into
196 a tree structure by the descendant and sibling pointers, plus the object type
197 indicator byte. Let's start with the latter; the following objtype byte values
198 have been observed:
199
200 00: deleted object - a read-only implementation should ignore everything except
201 the descendant and sibling pointers. (A write-capable implementation would
202 need more care - it would need a way of reclaiming dirty flash space taken
203 up by deleted/overwritten files.)
204
205 E1: a special file - see the description of the /.journal file further down
206 F1: a regular file (head chunk thereof)
207 F2: a directory
208 F4: file continuation chunk (explained below)
209
210 Each record in the index block has an associated chunk in one of the data
211 sectors; the index record contains fields giving the address and length of this
212 chunk. The length of a chunk is always a nonzero multiple of 16 bytes, and is
213 stored (as a number in bytes) in the first 16-bit field of the 16-byte index
214 entry. The address of each chunk is given by the data pointer field of the
215 index record, and it is reckoned in 16-byte units (thereby 16-byte alignment is
216 required) from the beginning of the FFS sector group in the flash address space.
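
The address arithmetic can be sketched in Python as follows. The sketch assumes
the whole FFS sector group has been read into one bytes object starting at the
first FFS sector; the function names are made up for illustration.

```python
def chunk_span(length, dataptr):
    """Return (start, end) byte offsets of a chunk, relative to the
    beginning of the FFS sector group.

    length  -- the 16-bit length field of the index record, in bytes
    dataptr -- the 32-bit data pointer field, in 16-byte units"""
    if length == 0 or length % 16 != 0:
        raise ValueError("chunk length must be a nonzero multiple of 16")
    start = dataptr * 16
    return start, start + length


def read_chunk(ffs_image, length, dataptr):
    """Return the raw bytes of the chunk described by an index record."""
    start, end = chunk_span(length, dataptr)
    if end > len(ffs_image):
        raise ValueError("chunk extends past the end of the FFS image")
    return ffs_image[start:end]


# A data pointer of 0x1000 corresponds to byte offset 0x10000.
print([hex(x) for x in chunk_span(16, 0x1000)])
```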
217
218 For objects of type F1 and F2 (regular files and directories) the just-described
219 chunk begins with the name of the file or subdirectory as a NUL-terminated ASCII
220 string. This name is just for the current level of the directory tree, just
221 like in UNIX directories, thus one will have chunk names like gsm, l3, eplmn
222 etc, rather than /gsm/l3/eplmn. One practical effect is that one can't readily
223 see pathnames or any of the directory structure by looking at an FFS image as a
224 raw hex dump; the structure is only revealed when one uses a parsing program
225 like those which accompany this document.
226
227 In the case of directories, the "chunk" part of the object contains only the
228 name of the directory itself, padded with FFs to a 16-byte boundary. For
229 example, an FFS directory named /gsm would be represented by an object
230 consisting of two flash writes: a 16-byte entry in the active index block, with
231 the object type byte set to F2, and a corresponding 16-byte chunk in one of the
232 data sectors, with the 16 bytes containing "gsm", a terminating NUL byte, and
233 12 FF bytes to pad up to 16. In the case of files, this name may be followed
234 by the first chunk of file data content, as explained further down.
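
The name chunk layout for the /gsm example can be reproduced with a short
Python sketch. Both function names are made up for illustration, and
make_dir_chunk only illustrates the byte layout described above; it is not a
step toward a write-capable implementation.

```python
def chunk_name(chunk):
    """Return the NUL-terminated ASCII name at the start of an F1 or F2
    chunk. Whatever follows the NUL (padding or file data) is ignored."""
    end = chunk.find(b"\x00")
    if end < 0:
        raise ValueError("no NUL terminator found in chunk")
    return chunk[:end].decode("ascii")


def make_dir_chunk(name):
    """Build a directory name chunk: the name, a NUL byte, then FF bytes
    up to the next 16-byte boundary."""
    raw = name.encode("ascii") + b"\x00"
    return raw + b"\xff" * (-len(raw) % 16)


gsm = make_dir_chunk("gsm")
print(len(gsm), gsm.hex())
print(chunk_name(gsm))
```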
235
236 In order to parse the FFS directory tree (whether the objective is to dump the
237 whole thing recursively or to find a specific file given a pathname), one needs
238 to first (well, after finding the active AB block) find the root directory node.
239 The root directory object is similar to other directory objects: it has a type
240 of F2, and an associated chunk of 16 bytes in one of the data sectors. The
241 latter contains the name of the root node: on the Pirelli it is "/", whereas on
242 my GTA02 it is "/ffs-root".
243
244 The astute reader should notice that it really makes no sense to store a name
245 for the root node, and indeed, this name plays no part in the traversal of the
246 directory tree given an absolute pathname. But instead this name, or rather
247 its first character, appears to be used for the purpose of locating the root
248 node itself. At first I had assumed that the index record for the root node is
249 always the first record in the active index block right after the signature
250 header - that is how it is in "virgin" FFS images, and also in some quite non-
251 virgin ones I have pulled from my daily-use Pirelli. Naturally my first version
252 of the Mokopir-FFS (then called MysteryFFS) extraction utility expected the root
253 node to always be at index #1. But then I got some additional Pirelli phones,
254 and discovered that in certain cases, index record #1 is a deleted object (the
255 original root node which has been deleted), and the new active root node is
256 somewhere in the middle of the index!
257
258 Thus it appears that in order to find the active root node, one needs to scan
259 the active index block linearly from the beginning (disregarding the tree
260 structure pointers in this initial pass), looking for a non-deleted object of
261 type F2 (a directory) whose corresponding name chunk sports a name beginning
262 with the '/' character. (Anyone who's been raised on UNIX will immediately
263 know that the path separator character '/' is the only character other than NUL
264 that's absolutely forbidden in the individual filenames - so this special
265 "root node name" is the only case of a '/' character appearing in what would
266 otherwise be a regular filename.)
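
The root node search just described can be sketched in Python as below. The
sketch assumes the whole FFS sector group is held in one bytes object, that the
offset of the active index block within it is already known, and that an all-FF
record slot marks the end of the index; the function and constant names are
made up for illustration.

```python
import struct

RECORD_SIZE = 16
RECORD_FMT = "<HBBHHII"  # length, unknown, objtype, descendant, sibling,
                         # data pointer, unknown
OBJTYPE_DIR = 0xF2


def find_root(ffs_image, index_offset, sector_size):
    """Return the index number of the active root directory record,
    or None if no such record is found.

    The tree pointers are deliberately not followed; this is a plain
    linear scan of the active index block."""
    sector = ffs_image[index_offset:index_offset + sector_size]
    last_start = len(sector) - RECORD_SIZE
    for off in range(RECORD_SIZE, last_start + 1, RECORD_SIZE):
        raw = sector[off:off + RECORD_SIZE]
        if raw == b"\xff" * RECORD_SIZE:
            break  # assumed end of the record array (blank flash)
        fields = struct.unpack(RECORD_FMT, raw)
        objtype, dataptr = fields[2], fields[5]
        if objtype != OBJTYPE_DIR:
            continue  # skips deleted objects (00) and all non-directories
        start = dataptr * 16  # data pointer is in 16-byte units
        if ffs_image[start:start + 1] == b"/":
            return off // RECORD_SIZE
    return None
```

Because the object type is tested first, a deleted former root node (type 00)
is skipped even though its name chunk still begins with '/'.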
267
268 [What causes the root node to be somewhere other than at index #1? I assume it
269 has to do with the dirty space reclamation / data movement algorithm. In a
270 "virgin" FFS image the very first sector is the active index block, and the
271 following sector is the first to hold chunks, beginning with the name chunk of
272 the root node. Now what happens if all data in that sector aside from the
273 root node name and some other mostly-static directory names becomes dirty,
274 i.e., belonging to deleted or overwritten files? How would that flash space
275 get reclaimed? I assume that the FFS firmware algorithm moves all still-active
276 chunks to a new flash sector, invalidating the old copies - turning the latter
277 into deleted objects. The root node will be among them. Then at some point
278 the active index block is going to fill up too, and will need to be rewritten
279 into a new sector - at which point the previously-deleted index entries are
280 omitted and the root node becomes #1 again...]