comparison doc/Loadtools-performance @ 680:89ed8b374bc0

doc/Loadtools-performance: finished updates for fc-host-tools-r13
author Mychaela Falconia <falcon@freecalypso.org>
date Mon, 09 Mar 2020 04:40:13 +0000
parents f2a023c20653
children 0815661d6e3e
comparison of 679:be641fa7b68d with 680:89ed8b374bc0
@@ -79,54 +79,74 @@
 flashing.
 
 Notice the difference in flash programming times between GTA02 and FCDEV3B: the
 fw image size is almost exactly the same, any difference in latency between
 CP2102 and FT2232D is less likely to produce such a significant time difference
-given our current 2048 byte transfer block size, thus the difference in physical
-flash program operation times between K5A3281CTM and S71PL129N flash chips seems
-to be the most likely explanation.
+given our current 2048 byte transfer block size (in fact fc-xram transfer times
+suggest that FT2232D is faster), thus the difference in physical flash program
+operation times between K5A3281CTM and S71PL129N flash chips seems to be the
+most likely explanation.
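An illustrative aside on this reasoning (the figures below are assumptions added for this note, not FreeCalypso measurements): at a 2048 byte block size, even a sizable per-block latency difference between the two USB-serial chips adds up to very little total time.

/* Bounding the per-block turnaround contribution at the 2048-byte
 * transfer block size.  Both the 2.4 MB image size (roughly 78875
 * S-records times 30 payload bytes, per the fc-xram section below) and
 * the 5 ms of extra latency per block are assumptions for illustration. */
#include <stdio.h>

int main(void)
{
	double image_bytes = 2.4e6;	/* assumed fw image payload size */
	double block_bytes = 2048;	/* loadtools transfer block size */
	double extra_latency = 0.005;	/* assumed extra turnaround, seconds */
	double nblocks = image_bytes / block_bytes;

	printf("blocks: %.0f, added time: %.1f s\n",
	       nblocks, nblocks * extra_latency);	/* ~1172 blocks, ~5.9 s */
	return 0;
}

Even a generous 5 ms of extra turnaround per block adds only about 6 s over the whole image, which supports the flash-chip explanation above.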
+
+It also needs to be noted that in the current version of fc-loadtool there is
+no difference in performance between flash program-bin, program-m0 and
+program-srec operations: they all use the same binary protocol with 2048 byte
+transfer block size. There is no coupling between source S-records and flash
+programming operation blocks (2048-byte units) in the case of flash program-m0
+and program-srec: the new implementation of these commands prereads the entire
+S-record image as a separate preparatory step on the host side, the bits to be
+programmed are saved in a temporary binary file (automatically deleted
+afterward), and the actual flash programming operation proceeds from this
+internal binary source - but it knows about any discontiguous program regions
+and skips the gaps properly.
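For readers who want to picture the preread step just described, here is a rough sketch of the idea, written for this note. It is NOT actual fc-loadtool source; the names, the fixed-size region array and the thin error checking are artifacts of the sketch.

/* Sketch of the preread scheme described above: every S3 record is
 * parsed on the host, its payload bytes are appended to a temporary
 * binary file, and a list of contiguous regions is recorded so that
 * the flash programming pass can skip the gaps between regions.
 * Hypothetical reconstruction for explanation, NOT fc-loadtool source. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

struct region {
	uint32_t flash_addr;	/* where this region goes in flash */
	uint32_t nbytes;	/* contiguous payload byte count */
};

static unsigned hexbyte(const char *s)
{
	unsigned v = 0;
	sscanf(s, "%2x", &v);
	return v;
}

int main(int argc, char **argv)
{
	FILE *in, *bin;
	struct region reg[64];	/* sketch limit; real code would grow it */
	int nreg = 0;
	char line[256];

	if (argc != 2) {
		fprintf(stderr, "usage: %s image.srec\n", argv[0]);
		return 1;
	}
	in = fopen(argv[1], "r");
	if (!in) {
		perror(argv[1]);
		return 1;
	}
	bin = tmpfile();	/* auto-deleted temporary binary file */
	while (fgets(line, sizeof line, in)) {
		uint32_t addr = 0;
		unsigned count, i;

		if (strncmp(line, "S3", 2))
			continue;	/* only S3 records carry payload */
		count = hexbyte(line + 2);	/* covers addr+data+checksum */
		for (i = 0; i < 4; i++)		/* 32-bit big-endian address */
			addr = (addr << 8) | hexbyte(line + 4 + 2 * i);
		if (!nreg || addr != reg[nreg-1].flash_addr + reg[nreg-1].nbytes) {
			if (nreg == 64)
				break;
			reg[nreg].flash_addr = addr;	/* new region after a gap */
			reg[nreg++].nbytes = 0;
		}
		for (i = 0; i < count - 5; i++)	/* count minus addr & checksum */
			fputc(hexbyte(line + 12 + 2 * i), bin);
		reg[nreg-1].nbytes += count - 5;
	}
	fclose(in);
	/* Second pass (not shown): rewind bin and, for each region, send
	 * its bytes to loadagent in 2048-byte blocks using the binary
	 * flash programming protocol; gaps between regions are skipped. */
	return 0;
}

Storing the regions back to back in the temporary file keeps the programming pass a simple sequential read, while the region list carries the flash addresses and the gap information.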
 
 XRAM loading via fc-xram
 ========================
 
-Our current fc-xram implementation is similar to the old 2013 implementation of
-flash program-m0 and program-srec commands in that fc-xram sends a separate ML
-command to loadagent for each S-record, thus the total XRAM image loading time
-is not only the serial bit transfer time, but also the overhead of command-
-response exchanges between fc-xram and loadagent. The flash programming times
-listed above include flashing an FC Magnetite fw image into an FCDEV3B, which
-took 2m11s; doing an fc-xram load of the same FC Magnetite fw image (built as
-ramimage.srec) into the same FCDEV3B via the same FT2232D adapter at 812500
-baud takes 2m54s.
+The new version of fc-xram as of fc-host-tools-r13 is dramatically faster than
+the original implementation from 2013, using a new binary transfer protocol.
+The speed increase comes from not only switching from hex to binary, but even
+more so from eliminating the command-response turnaround time on every S3
+record. The new XRAM loading times obtained on the Mother's Slackware 14.2
+host system are:
 
-Why does XRAM loading take longer than flashing? Shouldn't it be faster because
-the flash programming step on the target is replaced with a simple memcpy()?
-Answer: fc-xram is currently slower than flash program operations because the
-latter send 256 bytes at a time to loadagent, whereas fc-xram sends one
-S-record at a time; the division of the image into S-records is determined by
-the tool that generates the SREC image, but TI's hex470 post-linker generates
-images with 30 bytes of payload per S-record. Having the operation proceed in
-smaller chunks increases the overhead of command-response exchanges and thus
-increases the overall time.
+Pirelli DP-L10 with built-in CP2102 USB-serial chip, 812500 baud, loading
+hybrid-vpm fw build, 49969 S3 records: 0m27s
+
+FCDEV3B interfaced via FT2232D adapter, 812500 baud, loading hybrid fw build,
+78875 S3 records: 0m35s
+
+With the previous version of fc-xram these two loads took 1m40s and 2m54s,
+respectively. With the current version of loadtools XRAM loading is faster
+than flash programming for the same fw image, as one would naturally expect
+(the flash programming step on the target is replaced with a simple memcpy()
+operation), but in the previous version XRAM loading was slower because of
+massive command-response exchange overhead: there was a command-response
+turnaround time incurred for every S3 record, typically carrying only 30 bytes
+of payload.
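These numbers agree well with a simple wire-time model. The sketch below is a back-of-the-envelope calculation added for this note, not from the original document; in particular the 1.3 ms per-record turnaround is an assumed figure chosen for illustration.

/* Wire-time model for the FCDEV3B load above: 78875 S3 records of 30
 * payload bytes each at 812500 baud, 8N1 framing (10 bits per byte on
 * the wire).  An S3 record with 30 data bytes is 74 ASCII characters;
 * the 1.3 ms per-record turnaround is an assumption for illustration. */
#include <stdio.h>

int main(void)
{
	double records = 78875, payload = 30, baud = 812500;
	double bits_per_char = 10;	/* start + 8 data + stop */

	/* new protocol: payload crosses the wire as raw binary */
	double t_new = records * payload * bits_per_char / baud;
	/* old protocol: ~74 hex characters per record, plus an assumed
	 * command-response turnaround on every record */
	double t_old = records * (74 * bits_per_char / baud + 0.0013);

	printf("new binary protocol, bit transfer only: %.0f s\n", t_new);
	printf("old per-record protocol, rough model:   %.0f s\n", t_old);
	return 0;
}

The model gives about 29 s of pure bit transfer time for the new protocol (observed: 0m35s, the remainder being block-level overhead) and about 174 s for the old one (observed: 2m54s), consistent with the turnaround-overhead explanation above.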
 
 Additional complication with FTDI adapters and newer Linux kernel versions
 ==========================================================================
 
 If you are using an FTDI adapter and a Linux kernel version newer than early
 2017 (the change was introduced between 4.10 and 4.11), then you have one
 additional complication: a change was made to the ftdi_sio driver in the Linux
-kernel that makes many loadtools operations (basically everything other than
-flash dumps which are entirely target-driven) unbearably slow (much slower than
-the Slackware 14.2 reference times given above) unless you execute a special
-setserial command first. After you plug in your FTDI-based USB-serial cable or
-connect the USB cable between your PC or laptop and your FTDI adapter board,
-causing the corresponding ttyUSBx device to appear, execute the following
-command:
+kernel that made many loadtools operations (basically everything other than
+flash dumps which are entirely target-driven) unbearably slow, at least with
+previous versions of loadtools that made many more command-response exchanges
+with loadagent for smaller transfer units and thus were much more sensitive to
+host system latency on these exchanges. We do not yet know if this FTDI
+latency timer issue still has a significant negative impact or not with current
+loadtools, but if it does, the solution is to run a special setserial command.
+After you plug in your FTDI-based USB-serial cable or connect the USB cable
+between your PC or laptop and your FTDI adapter board, causing the
+corresponding ttyUSBx device to appear, execute the following command:
 
 setserial /dev/ttyUSBx low_latency
 
 (Obviously change ttyUSBx to your actual ttyUSB number.) Execute this
 setserial command before running fc-loadtool or fc-xram, and then hopefully you
 should get performance that is comparable to what I get on classic Slackware.
 I say "hopefully" because I am not able to test it myself - I refuse to run any
 OS that can be categorized as "modern" - but field reports of performance on
-non-Slackware systems running newer Linux kernels (4.11 or later) are welcome.
+non-Slackware systems running newer Linux kernels (4.11 or later) are welcome,
+both with and without the low_latency setting. Please be sure to include your
+Linux kernel version and your USB-serial adapter type in your report!
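For completeness, the same low_latency flag can also be set from program code: the sketch below is the ioctl-level equivalent of the setserial command above, using the standard Linux TIOCGSERIAL/TIOCSSERIAL interface. It is an illustration added for this note, not something current loadtools does.

/* Programmatic equivalent of "setserial /dev/ttyUSBx low_latency",
 * using the standard Linux TIOCGSERIAL/TIOCSSERIAL ioctls.  Setting
 * ASYNC_LOW_LATENCY is exactly what setserial does; with the ftdi_sio
 * driver this flag requests the shortest latency timer setting.
 * Illustrative sketch only. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/serial.h>

int main(int argc, char **argv)
{
	struct serial_struct ss;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s /dev/ttyUSBx\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDWR | O_NONBLOCK);
	if (fd < 0 || ioctl(fd, TIOCGSERIAL, &ss) < 0) {
		perror(argv[1]);
		return 1;
	}
	ss.flags |= ASYNC_LOW_LATENCY;	/* the same bit setserial sets */
	if (ioctl(fd, TIOCSSERIAL, &ss) < 0) {
		perror("TIOCSSERIAL");
		return 1;
	}
	close(fd);
	return 0;
}

One would run it as, e.g., ./lowlatency /dev/ttyUSB0 (the program name is hypothetical), after the ttyUSBx device appears and before starting fc-loadtool or fc-xram, just like the setserial command above.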