comparison doc/TIFFS-old-description @ 254:4eeab025b502

doc/TIFFS-old-description: the original description from SE 52 Mes 16
author Michael Spacefalcon <msokolov@ivan.Harhan.ORG>
date Sun, 02 Feb 2014 00:02:26 +0000
parents
children
comparison
equal deleted inserted replaced
253:475887e6b396 254:4eeab025b502
1 The description of TIFFS that follows was originally written in the summer of
2 SE52 (A.D. 2013), before the major TI source discoveries which happened later
3 that year. The text following the dividing line below has not been edited in
4 content since it was written; for a newer write-up based on the current source-
5 enabled understanding and reflecting the current FreeCalypso plans with respect
6 to this FFS, see TIFFS-Overview.
7
8 -------------------------------------------------------------------------------
9
10 This is a description, based on reverse engineering, of the flash file system
11 (FFS) implemented in Pirelli's original firmware for the DP-L10 GSM/WiFi dual
12 mode mobile phone, and in the Closedmoko GTA0x modem firmware. Not knowing the
13 "proper" name for this FFS, and needing _some_ identifier to refer to it, I
14 have named it Mokopir-FFS, from "Moko" and "Pirelli" - sometimes abbreviated
15 further to MPFFS.
16
17 (I have previously called the FFS in question MysteryFFS; but now that I've
18 successfully reverse-engineered it, it isn't as much of a mystery any more :-)
19
20 At a high functional level, Mokopir-FFS presents the following features:
21
22 * Has a directory tree structure like UNIX file systems;
23
24 * The file system API that must be implemented inside the proprietary firmware
25 appears to use UNIX-style pathnames; doing strings on firmware images reveals
26 pathname strings like these:
27
28 /var/dbg/dar
29 /gsm/l3/rr_white_list
30 /gsm/l3/rr_medium_rxlev_thr
31 /gsm/l3/rr_upper_rxlev_thr
32 /gsm/l3/shield
33
34 Parsing the corresponding FFS image with tools included in the present
35 package has confirmed that the directory structure implied by these pathnames
36 does indeed exist in the FFS.
37
38 * Absolutely no DOS-ish semantics seen anywhere: no 8.3 filenames and no
39 colon-separated device names (seen in the TSM30 file system source, for
40 example) are visible in the Closedmoko/Pirelli FFS.
41
42 * File contents are stored uncompressed, but not necessarily contiguous: one
43 could probably store a file in FFS which is bigger than the flash sector
44 size, it which case it can never be contiguous in a writable FFS (see below),
45 and the firmware implementation seems to limit chunk sizes to a fairly small
46 number: on the Pirelli phones all largish files are divided into chunks of
47 8 KiB each, and on my GTA02 the largest observed chunk size is only 2 KiB.
48
49 The smaller files, like the IMEI and the firmware ID strings in my GTA02 FFS,
50 are contiguous.
51
52 * The FFS structure is such that the length of "user" payload data stored in
53 each chunk (and consequently, in each file) can be known exactly in bytes,
54 with the files/chunks able to contain arbitrary binary data. (This property
55 may seem obvious or trivial, as all familiar UNIX and DOS file systems have
56 it, but contrast with RT-11 for example.)
57
58 * The flash file system is a writable one: the running firmware can create,
59 delete and overwrite files (and possibly directories too) in the live FFS;
60 thus the FFS design is such that allows these operations to be performed
61 within the physical constraints of NOR flash write operations.
62
63 I have reverse-engineered this Mokopir-FFS on a read-only level. What it means
64 is that I, or anyone else who can read this document and the accompanying
65 source for the listing/extraction utilities, can take a Mokopir-FFS image read
66 out of a device and see/extract its full content: the complete directory tree
67 and the exact binary byte content of all files contained therein.
68
69 However, the knowledge possessed by the present hacker (and conveyed in this
70 document and the accompanying source code) is NOT sufficient for constructing a
71 valid Mokopir-FFS image "in vitro" given a tree of directories and files, or
72 for making modifications to the file or directory content of an existing image
73 and producing a content-modified image that is also valid; valid as in suitable
74 for the original proprietary firmware to make its normal read and write
75 operations without noticing anything amiss.
76
77 Constructing "de novo" Mokopir-FFS images or modifying existing images in such
78 a way that they remain 100% valid for all read and write operations of the
79 original proprietary firmware would, at the very minimum, require an
80 understanding of the meaning of *all* fields of the on-media FFS format. Some
81 of these fields are still left as "non-understood" for now though: a read-only
82 implementation can get away with simply ignoring them, but a writer/generator
83 would have to put *something* in those fields.
84
85 As you read the "read-only" description of the Mokopir-FFS on-media format in
86 the remainder of this document, it should become fairly obvious which pieces
87 are missing before our understanding of this FFS can be elevated to a
88 "writable" level.
89
90 However, when it comes to writing new code to run on the two Calypso phones in
91 question (Closedmoko and Pirelli), it seems, at least to the present hacker,
92 that a read-only understanding of Mokopir-FFS should be sufficient:
93
94 * In the case of Closedmoko GTA0x modems, the FFS is seen to contain the IMEI
95 and the RF calibration data. The format of the former is obvious; the latter
96 not so much - but in any case, the information of interest is clearly of a
97 read-only nature. It's difficult to tell (or rather, I haven't bothered to
98 experiment enough) whether the Closedmoko firmware does any writes to FFS or
99 if the FFS is treated as read-only outside of the production line environment,
100 but in any case, it seems to me that for any 3rd party replacement firmware,
101 the best strategy would be to treat the FFS as a read-only source of IMEI and
102 RF calibration data, and nothing more.
103
104 * In the case of Pirelli phones, the FFS is used to store user data: sent and
105 received SMS (and MMS/email/whatever), call history, UI settings, pictures
106 taken with the camera, and whatever else. It also stores a ton of files
107 which I can only presume were meant to be immutable except at the time of
108 firmware updates: graphics for the UI, ringtones, i18n UI strings, and even
109 "helper" firmware images for the WiFi and VoIP processors. However, no IMEI
110 or RF calibration data are anywhere to be found in the FFS - instead this
111 information appears to be stored in the "factory block" at the end of the
112 flash (in its own sector) outside of the FFS.
113
114 Being able to parse FFS images extracted out of Pirelli phones "in vitro"
115 allows us to steal some of these helper files (UI artwork, ringtones,
116 WiFi/VoIP helpers), and some of these might even come useful to firmware
117 replacement projects, but it seems to me that a replacement firmware would
118 be better off using its own FFS design for storing user data, and as to
119 retrieving the original IMEI and RF calibration data, the original FFS isn't
120 of any use for that anyway.
121
122 =======================
123 Moko/Pirelli FFS format
124 =======================
125
126 OK, now that I'm done with the introduction, we can get to the actual
127 Mokopir-FFS format.
128
129 * On the GTA0x modem (or at least on my GTA02; my sample size is 1) the FFS
130 occupies 7 flash sectors of 64 KiB each at offsets 0x380000 through 0x3E0000,
131 inclusive.
132
133 (The 4 MiB NOR flash chip used by Closedmoko has an independent R/W bank
134 division between the first 3 MiB and the last 1 MiB. The first 3 MiB are used
135 to hold the field-flashable closed firmware images distributed as *.m0 files;
136 the independent last megabyte holds the FFS, and thus the FW could be
137 implemented to do FFS writes while running from flash in the main bank.
138 Less than half of that last megabyte appears to be used for the FFS though;
139 the rest appears to be unused - blank flash observed.)
140
141 * On the Pirelli the FFS occupies 18 sectors of 256 KiB each at offsets 0
142 through 0x440000 (inclusive) of the 2nd flash chip select, the one wired to
143 nCS3 on the Calypso.
144
145 Each flash sector allocated to FFS begins with the following signature:
146
147 00000000: 46 66 73 23 10 02 xx yy zz FF FF FF FF FF FF FF Ffs#............
148
149 The bytes shown as xx and yy above serve a non-understood purpose; as a guess,
150 they may hold some info for the flash wear leveling algorithm: in a "virgin"
151 FFS image like that found in my GTA02 (which never had a SIM card in it and
152 never made or received a call) or read out of a "virgin" Pirelli phone that
153 hasn't seen any active use yet, both of these bytes are FFs, but when I look at
154 FFS images read out of the Pirelli which I currently use as my everyday-use
155 cellphone, I see other values in sectors which must have been erased and
156 rewritten. A read-only implementation can ignore these bytes, as mine does.
157
158 The byte shown as zz is more important though, even to a read-only
159 implementation. The 3 values I've encountered in this byte so far are AB, BD
160 and BF. Per my current understanding, in a "healthy" FFS exactly one sector
161 will have AB in its header, exactly one will have BF, and the rest will have
162 BD. The meanings are (or appear to be):
163
164 AB: the sector holds a vital data structure which I have called the active
165 index block;
166 BD: the sector holds regular data;
167 BF: the sector is blank except for the header, can be turned into a new AB or
168 BD.
169
170 (Note that a flash program operation, which can turn 1s into 0s but not the
171 other way around, can turn BF into either AB or BD - but neither AB nor BD can
172 be turned into any other valid value.)
173
174 In a "virgin" FFS image (as explained above) the first FFS sector is AB, the
175 last one is BF, and the ones in between are BDs.
176
177 An FFS read operation (a search for a given pathname, or a listing of all
178 present directories and files) needs to start with locating the active index
179 block - the FFS sector with AB in the header. Following this header, which is
180 treated as being 16 bytes long (almost everything in Mokopir-FFS is aligned on
181 16-byte boundaries), the active index block contains a linear array of 16-byte
182 records, each record describing an FFS object: directory, file or file
183 continuation chunk.
184
185 Here is my current understanding of the 16-byte index block record structure:
186
187 2 bytes: Length of the described chunk in bytes
188 1 byte: Purpose/meaning not understood, ignored by my current code
189 1 byte: Object type
190 2 bytes: Descendant pointer
191 2 bytes: Sibling pointer
192 4 bytes: Data pointer
193 4 bytes: Purpose/meaning not understood, ignored by my current code
194
195 (On the Calypso phones of interest, all multibyte fields are in the native
196 little-endian byte order of the ARM7TDMI processor.)
197
198 The active index block gets filled with these records as objects are created;
199 the first record goes right after the 'Ffs#'...AB header (padded to 16 bytes);
200 the last record (at any given moment) is followed by blank flash for the
201 remainder of the sector. Records thus appear in the order in which they are
202 created, which bears no direct relation to the directory tree structure.
203
204 The objects, each described by a record in the index block, are organized into
205 a tree structure by the descendant and sibling pointers, plus the object type
206 indicator byte. Let's start with the latter; the following objtype byte values
207 have been observed:
208
209 00: deleted object - a read-only implementation should ignore everything except
210 the descendant and sibling pointers. (A write-capable implementation would
211 need more care - it would need a way of reclaiming dirty flash space taken
212 up by deleted/overwritten files.)
213
214 E1: a special file - see the description of the /.journal file further down
215 F1: a regular file (head chunk thereof)
216 F2: a directory
217 F4: file continuation chunk (explained below)
218
219 Each record in the index block has an associated chunk in one of the data
220 sectors; the index record contains fields giving the address and length of this
221 chunk. The length of a chunk is always a nonzero multiple of 16 bytes, and is
222 stored (as a number in bytes) in the first 16-bit field of the 16-byte index
223 entry. The address of each chunk is given by the data pointer field of the
224 index record, and it is reckoned in 16-byte units (thereby 16-byte alignment is
225 required) from the beginning of the FFS sector group in the flash address space.
226
227 For objects of type F1 and F2 (regular files and directories) the just-described
228 chunk begins with the name of the file or subdirectory as a NUL-terminated ASCII
229 string. This name is just for the current level of the directory tree, just
230 like in UNIX directories, thus one will have chunk names like gsm, l3, eplmn
231 etc, rather than /gsm/l3/eplmn. One practical effect is that one can't readily
232 see pathnames or any of the directory structure by looking at an FFS image as a
233 raw hex dump; the structure is only revealed when one uses a parsing program
234 like those which accompany this document.
235
236 In the case of directories, the "chunk" part of the object contains only the
237 name of the directory itself, padded with FFs to a 16-byte boundary. For
238 example, an FFS directory named /gsm would be represented by an object
239 consisting of two flash writes: a 16-byte entry in the active index block, with
240 the object type byte set to F2, and a corresponding 16-byte chunk in one of the
241 data sectors, with the 16 bytes containing "gsm", a terminating NUL byte, and
242 12 FF bytes to pad up to 16. In the case of files, this name may be followed
243 by the first chunk of file data content, as explained further down.
244
245 In order to parse the FFS directory tree (whether the objective is to dump the
246 whole thing recursively or to find a specific file given a pathname), one needs
247 to first (well, after finding the active AB block) find the root directory node.
248 The root directory object is similar to other directory objects: it has a type
249 of F2, and an associated chunk of 16 bytes in one of the data sectors. The
250 latter contains the name of the root node: on the Pirelli it is "/", whereas on
251 my GTA02 it is "/ffs-root".
252
253 The astute reader should notice that it really makes no sense to store a name
254 for the root node, and indeed, this name plays no part in the traversal of the
255 directory tree given an absolute pathname. But instead this name, or rather
256 its first character, appears to be used for the purpose of locating the root
257 node itself. At first I had assumed that the index record for the root node is
258 always the first record in the active index block right after the signature
259 header - that is how it is in "virgin" FFS images, and also in some quite non-
260 virgin ones I have pulled from my daily-use Pirelli. Naturally my first version
261 of the Mokopir-FFS (then called MysteryFFS) extraction utility expected the root
262 node to always be at index #1. But then I got some additional Pirelli phones,
263 and discovered that in certain cases, index record #1 is a deleted object (the
264 original root node which has been deleted), and the new active root node is
265 somewhere in the middle of the index!
266
267 Thus it appears that in order to find the active root node, one needs to scan
268 the active index block linearly from the beginning (disregarding the tree
269 structure pointers in this initial pass), looking for a non-deleted object of
270 type F2 (a directory) whose corresponding name chunk sports a name beginning
271 with the '/' character. (Anyone who's been raised in UNIX will immediately
272 know that the path separator character '/' is the only character other than NUL
273 that's absolutely forbidden in the individual filenames - so this special
274 "root node name" is the only case of a '/' character appearing in what would
275 otherwise be a regular filename.)
276
277 [What causes the root node to be somewhere other than at index #1? I assume it
278 has to do with the dirty space reclamation / data movement algorithm. In a
279 "virgin" FFS image the very first sector is the active index block, and the
280 following sector is the first to hold chunks, beginning with the name chunk of
281 the root node. Now what happens if all data in that sector aside from the
282 root node name and some other mostly-static directory names becomes dirty,
283 i.e., belonging to deleted or overwritten files? How would that flash space
284 get reclaimed? I assume that the FFS firmware algorithm moves all still-active
285 chunks to a new flash sector, invalidating the old copies - turning the latter
286 into deleted objects. The root node will be among them. Then at some point
287 the active index block is going to fill up too, and will need to be rewritten
288 into a new sector - at which point the previously-deleted index entries are
289 omitted and the root node becomes #1 again...]
290
291 Tree structure
292
293 Once the root node has been found, the descendant and sibling pointers are used
294 to traverse the tree structure. For each directory object, including the root
295 node, the descendant pointer points to the first child object of this directory:
296 the first file or subdirectory contained therein. (Descendant and sibling
297 pointers take the form of index numbers in the active index block. A "nil"
298 pointer is indicated by all 1s (FFFF) - the usual all-0s NULL pointer convention
299 couldn't be used because it's flash, where the blank state is all 1s.) If the
300 descendant pointer of a directory object is nil, that means an empty directory.
301 The sibling pointer of each file or directory points to its next sibling, i.e.,
302 the next member of the same parent directory. The sibling pointer of the root
303 node is nil.
304
305 Data content of files
306
307 Objects of type F1 are the head chunks of files. Each file has a head chunk,
308 and may or may not have continuation chunks. More precisely, the head chunk
309 may contain only the name (or viewed alternatively, 0 bytes of data), or it may
310 contain a nonzero number of payload bytes; orthogonally to this variability,
311 there may or may not be continuation chunk(s) present.
312
313 Continuation chunks
314
315 The descendant pointer of each file head object (the object of type F1, the one
316 reached by traversing the directory tree) indicates whether or not there are
317 any continuation chunks present. If this descendant pointer is nil, there are
318 no continuation chunks; otherwise it points to the first continuation chunk
319 object. File continuation objects have type F4, don't have any siblings (the
320 sibling pointer is nil - but see below regarding relocated chunks), and the
321 descendant pointer of each continuation object points to the next continuation
322 object, if there is one - nil otherwise.
323
324 Payload data delineation
325
326 Each chunk, whether head or continuation, always has a length that is a nonzero
327 multiple of 16 bytes. The length of the chunk here means the amount of flash
328 space it occupies in its data sector - which is NOT equal to the payload data
329 length.
330
331 The head chunk of each file begins with the filename, terminated by a NUL byte.
332 If there are any payload data bytes present in this head chunk (I'll explain
333 momentarily how you would tell), the byte immediately after the NUL that
334 terminates the filename is the first byte of the payload. In the case of a
335 continuation chunk, there is no filename and the first byte of the chunk is the
336 first byte of that chunk's portion of the user data payload.
337
338 Each data-containing chunk (head or continuation) has the following termination
339 after the last byte of that chunk's payload data: one byte of 00, followed by
340 however many bytes are needed ([0,15] range) of FFs to pad to a 16-byte
341 boundary. A file head chunk that has no payload data has the same format as a
342 directory name chunk: filename followed by its terminating NUL followed by
343 [0,15] bytes of FFs to pad to the next 16-byte boundary.
344
345 When working with a head chunk, find the beginning of possible payload data (1
346 byte after the filename terminating NUL) and find the end per the standard
347 termination logic: scanning from the end of the chunk, skip FFs until 00 is
348 found (encountering anything else is an error). If the head chunk has no data,
349 the effective data length (end_pointer - start_pointer) will be 0 or -1. (The
350 latter possibility is the most likely, as there will normally be a "shared" 00
351 byte, serving as both the filename terminator and the 00 before the padding
352 FF bytes.)
353
354 Relocated chunks
355
356 Let's go back to the scenario in which a particular data sector is full (no more
357 usable free space left) and contains a mixture of active and dirty (deleted or
358 invalidated) data. How does the dirty flash space get reclaimed, so that the
359 amount of available space (blank flash ready to hold new data) becomes equal to
360 the total FFS size minus the total size of active files and overhead? It can
361 only be done by relocating the still-active objects from the full sector to a
362 new one, invalidating the old copies, and once the old sector consists of
363 nothing but invalidated data, subjecting it to flash erasure.
364
365 So how do the active FFS objects get relocated from a "condemned" sector to a
366 new one? If the object is a directory, a new index entry is created, pointing
367 to the newly relocated name chunk, but it is then made to fit into the old tree
368 structure without disrupting the latter: the new index entry is added at the
369 tail of the sibling-chain of the parent directory's descendants, the old index
370 entry for the same directory is invalidated (as if the directory were rmdir'ed),
371 and the descendant pointer of the newly written index entry is set to a copy of
372 the descendant pointer from the old index entry for the same directory. The
373 same approach is used when the head chunk of a file needs to be relocated; in
374 both cases a read-only FFS implementation doesn't need to do anything special to
375 support reading file and directory objects that have been relocated in this
376 manner.
377
378 However, if the relocated object is a file continuation chunk, then the manner
379 in which such objects get relocated does affect file reading code. What if a
380 chunk in the middle of a chain linked by "descend" pointers needs to be moved?
381 What happens in this case is that the old copy of the chunk gets invalidated
382 (the object type byte turned to 00) like in the other object relocating cases,
383 and the sibling pointer of that old index entry (which was originally FFFF as
384 continuation objects have no siblings) is set to point to the new index entry
385 for the same chunk. The "descend" pointer in the new index entry is a copy of
386 that pointer from the old index entry.
387
388 The manner of chunk relocation just described has been observed in the FFS
389 images read out of my most recent batch of Pirelli phones - the same ones in
390 which the root directory object is not at index #1. Thinking about it as I
391 write this, I've realized that the way in which continuation objects get
392 relocated is exactly the same as for other object types - thus the compaction
393 code in the firmware doesn't need to examine what object type it is moving.
394 However, the case of continuation chunk relocation deserves special attention
395 because it affects a read-only implementation like ours - the utilities whose
396 source accompanies this document used to fail on these FFS images until I
397 implemented the following additional handling:
398
399 When following the chunk chain of a file, normally the only object type that's
400 expected is F4 - any other object type is an error. However, as a result of
401 chunk relocation, one can also encounter deleted objects, i.e., type == 00.
402 If such a deleted object is encountered, follow its sibling pointer, which must
403 be non-nil.
404
405 Journal file
406
407 Every Mokopir-FFS image I've seen so far contains a special file named
408 /.journal; this file is special in the following ways:
409
410 * The object type byte is E1 instead of F1;
411 * Unlike regular files, this special file is internally-writable.
412
413 What I mean by the above is that regular files are mostly immutable: once a
414 file has been created with some data content in the head chunk, it can only be
415 either appended to (one or more continuation chunks added), or overwritten by
416 creating a new file with the same name at the same level in the tree hierarchy
417 and invalidating the old one. But the special /.journal file is different: I
418 have never observed it to consist of more than the head chunk, and this head
419 chunk is pre-allocated with some largish and apparently fixed length (4 KiB on
420 my GTA02, 16 KiB on the Pirelli). This pre-allocated chunk contains what look
421 like 16-byte records at the beginning (on the first 4-byte boundary after the
422 NUL terminating the ".journal" name), followed by blank flash for the remainder
423 of the pre-allocated chunk - so it surely looks like new flash writes happen
424 within this chunk.
425
426 I do not currently know the purpose of this /.journal file or the meaning of the
427 records it seems to contain. This understanding would surely be needed if one
428 wanted to create FFS images from scratch or to implement FFS write operations,
429 but I reason that a read-only implementation can get away with simply ignoring
430 this file. I reason that this file can't be necessary in order to parse an FFS
431 image for reading because one needs to parse the tree structure first in order
432 to locate this journal file itself.
433
434 -------------------------------------------------------------------------------
435
436 That's all I can think of right now. If anything is unclear, see the
437 accompanying source code for the listing/extraction utilities: with the general
438 explanation given by this document, it should be clear what my code does and
439 why. And if a given piece of knowledge is found neither in this document nor
440 in my source code, then I don't know it myself either, and my read-only
441 Mokopir-FFS implementation makes do without it.
442
443 All knowledge contained herein has been recovered by reverse engineering.
444 Believe it or not, I have figured it out by staring at the hex dump of FFS
445 sectors, reasoning about how one could possibly implement an FFS given the
446 requirement of dynamic writability and the physical constraints of flash memory,
447 and writing listing/extraction test code iteratively until I got something that
448 appears to correctly parse all FFS images available to me - the result is the
449 code in this package.
450
451 I never got as far as attempting to locate the FFS implementation routines
452 within the proprietary firmware binary code images, and I haven't found an
453 implementation of this particular FFS in any of the leaked sources yet either.
454 The TSM30 code doesn't seem to be of any use as its FFS appears to be totally
455 different. As to the more recently found LoCosto code leak, I found that one a
456 few days *after* I got the Moko/Pirelli "MysteryFFS" reverse-engineered on my
457 own, and when I did look at the FFS in the LoCosto code later, I saw what seems
458 to be a different FFS as well.
459
460 Michael Spacefalcon
461 SE 52 Mes 16