Some Raspberry Pi 2 Benchmarks

This article presents a few informal benchmarks comparing the Raspberry Pi to the Raspberry Pi 2. The original Pi has a single-core ARMv6 processor. The Pi 2 is quad-core, ARMv7, and clocked faster than the Pi 1. But is it really six times as fast, as the makers claim? Short answer: yes it is. And then some.

Single Core Test

Gzipping a file is a CPU intensive task. Let’s see how both Pis perform when asked to gzip a large file of 237 MB. The Pi1 will go first:

pi1 $ ls -lh
-rwxr-xr-x 1 root root 237M Feb  7 19:56 bigfile.mp4
pi1 $ time gzip bigfile.mp4
real	4m25.768s
user	4m2.430s
sys	0m11.770s

266 seconds on the Pi1.

Now the Pi2:

pi2 $ ls -lh
-rwxr-xr-x 1 root root 237M Feb  7 19:56 bigfile.mp4
pi2 $ time gzip bigfile.mp4
real	1m31.396s
user	1m24.870s
sys	0m3.520s

91 seconds on the Pi2.

A speed improvement of around 2.9x (266/91). Repeating the test gave speed ratios of around 3, sometimes just over, sometimes just under.
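The ratio can be reproduced from the two "real" figures reported by time. The snippet below is a minimal sketch, assuming a POSIX shell with awk available. The helper names to_seconds and speedup are invented for illustration and are not part of any standard tool.

```shell
#!/bin/sh
# Convert a time-style value such as "4m25.768s" into plain seconds.
# to_seconds is a hypothetical helper name, not a standard command.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Divide the slower time by the faster one, rounded to two decimals.
speedup() {
    awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", slow / fast }'
}

# Pi1 real time first, Pi2 real time second.
speedup 4m25.768s 1m31.396s
```

With the gzip timings shown above this prints 2.91, in line with the "around 3" seen across repeated runs.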

Yes, the Pi2 is 3 times as fast as the Pi1. Just as Eben Upton said it was, in his interview with The Register in February 2015.

Multi Core Test

Hang on though, doesn’t the Pi Foundation claim a performance improvement of “at least 6x” for the new Pi2, as mentioned in the same Register article? Ah, well, the Pi2 has four CPU cores, and only one of them was used in the test above. Fine, let’s get all four cores running, and see what happens.

Split that big file into four equal chunks as follows.

pi2 $ split -n 4 bigfile.mp4
pi2 $ ls -lh
-rwxr-xr-x 1 root root 237M Feb  7 19:56 bigfile.mp4
-rwxr-xr-x 1 root root  60M Feb 22 16:53 xaa
-rwxr-xr-x 1 root root  60M Feb 22 16:53 xab
-rwxr-xr-x 1 root root  60M Feb 22 16:53 xac
-rwxr-xr-x 1 root root  60M Feb 22 16:53 xad

The big 237MB file has been split into 4 pieces of 60MB each, called xaa, xab, xac, xad. The following command will create 4 processes, gzipping all four at the same time:

pi2 $ time ls x* | xargs -t -n 1 -P 4 gzip
real	0m32.472s
user	1m56.110s
sys	0m5.790s

By using four cores in parallel, the Pi2 has zipped all the data in 32 seconds, improving on the Pi1’s performance by a factor of 8.3. That, as they say, is a spicy meat ball. And it certainly backs up claims of an “at least 6x” performance improvement.
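The same pattern can be written without piping ls output into xargs. This is only a sketch, assuming GNU findutils and coreutils (for find -print0, xargs -0 -P and nproc). It runs in a scratch directory on small dummy files, which are placeholders for the real chunks, so it is not a benchmark in itself.

```shell
#!/bin/sh
# Work in a throwaway directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create four small dummy chunks (placeholders for xaa..xad).
for name in xaa xab xac xad; do
    head -c 100000 /dev/urandom > "$name"
done

# Run one gzip per file, with at most one process per available core.
# -print0 and -0 keep file names containing spaces intact.
find . -maxdepth 1 -type f -name 'x*' -print0 |
    xargs -0 -n 1 -P "$(nproc)" gzip

# List the resulting .gz files.
ls
```

Using nproc instead of a hard-coded 4 means the same line also works on a single-core Pi1, where it simply runs the jobs one after another.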

Note: Here is an example of top output captured during the gzip test. Each of the Pi2’s four CPUs is running at nearly 100%.

PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
2398 root     20   0  2292 1328 1044 R 100.0  0.2   0:28.84 gzip
2400 root     20   0  2292 1320 1032 R 100.0  0.2   0:29.02 gzip
2401 root     20   0  2292 1280  996 R 100.0  0.2   0:29.02 gzip
2399 root     20   0  2292 1280  988 R  92.1  0.2   0:27.39 gzip

More CPU Intensive

The Pi2 is outperforming the Pi1 by a factor of 8.3. But can it do more? In order to flex those Pi2 muscles properly, let’s take an even more CPU-bound task. Calculating a checksum on that big file will get those CPU cores really humming. Check out the Pi1:

pi1 $ time sha512sum bigfile.mp4
real	5m37.850s
user	5m36.140s
sys	0m1.270s

and the Pi2:

pi2 $ time sha512sum bigfile.mp4
real	1m28.688s
user	1m28.120s
sys	0m0.560s

With a more CPU-intensive task, the Pi2, using only one core, outperformed the Pi1 by a factor of 337/88 = 3.8.

And when we let all 4 cores in on the act…

pi2 $ time ls x* | xargs -t -n 1 -P 4 sha512sum 
real	0m23.397s
user	1m29.300s
sys	0m0.500s

The Pi2 outperformed the Pi1 by a factor of 14.4 (337.9/23.4). The Pi2 has also exceeded its own single-core performance by a factor of 88.7/23.4 = 3.79, close to the factor of 4 that the number of cores would naturally suggest.

Note: The “time” command records 1 minute and 29 seconds of “user” time, even though only 23.397 seconds passed in real time. This is not an error: “user” is CPU time summed over all the processes being timed, so with four cores busy it can approach four times the elapsed time.
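Dividing CPU time (user plus sys) by elapsed time gives a rough figure for how many cores were busy on average. The sketch below assumes awk is available and uses the timings from the two four-way runs above, converted to seconds by hand. The name busy_cores is made up for this example.

```shell
#!/bin/sh
# Average number of busy cores = (user + sys) / real, all in seconds.
# busy_cores is a hypothetical helper name used only for this sketch.
busy_cores() {
    awk -v real="$1" -v user="$2" -v sys="$3" \
        'BEGIN { printf "%.2f\n", (user + sys) / real }'
}

# Four-way sha512sum run: real 23.397s, user 1m29.300s, sys 0.500s.
busy_cores 23.397 89.300 0.500

# Four-way gzip run: real 32.472s, user 1m56.110s, sys 5.790s.
busy_cores 32.472 116.110 5.790
```

This prints 3.84 for the checksum run and 3.75 for the gzip run, so in both cases nearly all four cores were kept busy for the whole test.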

Conclusion

For single-threaded, CPU-intensive processes, the Pi2 is typically 3 times faster than the Pi1, rising to 3.8 or so for very CPU-bound processes. For workloads that can use all four cores, whether multi-threaded or split across several processes, the improvement is 6x or more. For example, it is up to 8.3x when compressing a file, and about 14x for pure CPU-bound activities such as checksumming data. This is easily demonstrated on the command line and confirms the Pi Foundation’s claims.

The performance jump offered by the Pi2 will also be obvious to anyone using the LXDE GUI, the Epiphany web browser or any other large, multi-threaded application. To check if an app is multi-threaded, use ps -eLfP. For example, Epiphany is now running on my Pi2 with 11 threads, scattered like snow across the 4 processors. The PSR column identifies the core:

pi2 $ ps -eLfP | grep epip
UID        PID  PPID  LWP  PSR  C NLWP STIME TTY      TIME     CMD
pi        2336     1  2336   2  6   11 23:03 tty1     00:00:09 epiphany-browser
pi        2336     1  2337   2  0   11 23:03 tty1     00:00:00 epiphany-browser
pi        2336     1  2338   3  0   11 23:03 tty1     00:00:00 epiphany-browser
pi        2336     1  2342   2  0   11 23:03 tty1     00:00:00 epiphany-browser
pi        2336     1  2348   3  0   11 23:03 tty1     00:00:00 epiphany-browser
pi        2336     1  2350   1  0   11 23:03 tty1     00:00:00 epiphany-browser
pi        2336     1  2353   0  0   11 23:03 tty1     00:00:00 epiphany-browser
pi        2336     1  2354   1  0   11 23:03 tty1     00:00:00 epiphany-browser
pi        2336     1  2355   0  0   11 23:03 tty1     00:00:00 epiphany-browser
pi        2336     1  2356   2  0   11 23:03 tty1     00:00:00 epiphany-browser
pi        2336     1  2377   0  0   11 23:04 tty1     00:00:00 epiphany-browser

The same command (without the grep) will show many LXDE background processes and threads running across all four cores.

Background Information

Question: In the test on the Pi2 above, how do we know that there is a gzip process running on each CPU core? Might they not all be running on one core?

Answer: Take a look at the top output above. Four gzip processes are each taking over 90% CPU time, for a total of about 392%. That is only possible if each gzip is running on a separate CPU core. Also, note the “R” in the Status (“S”) column. R means running, indicating that all 4 processes are on the CPU (in a core) at the same time.

We could also have typed ps -elfP while the test was running:

pi2 $ ps -elfP | grep gzip
F S UID        PID  PPID PSR  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 R root      2935  2934   2 91  80   0 -   573 ?      20:29 pts/0    00:00:11 gzip xaa
0 R root      2936  2934   3 91  80   0 -   573 ?      20:29 pts/0    00:00:11 gzip xab
0 R root      2937  2934   1 90  80   0 -   573 ?      20:29 pts/0    00:00:11 gzip xac
0 R root      2938  2934   0 87  80   0 -   573 ?      20:29 pts/0    00:00:11 gzip xad

The PSR column shows the ID of the CPU running each process. It can be seen that each gzip command is indeed on a different CPU (i.e. core). And again, the S (“Status”) column shows every gzip is running.
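When only the core assignment matters, ps can be asked for just the relevant columns. This is a small sketch, assuming the procps version of ps found on Raspbian and most Linux distributions. A sleep process stands in for gzip so that the example runs anywhere.

```shell
#!/bin/sh
# Start a short-lived background job to inspect (sleep stands in for gzip).
sleep 5 &
pid=$!

# Show only the PID, the processor (PSR), the state and the command name.
ps -o pid,psr,stat,comm -p "$pid"

# Clean up the stand-in process.
kill "$pid"
```

An idle sleep will normally show state S rather than R, since it is waiting and not on the CPU. A busy gzip would show R, as in the listing above.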

Technical Information

All tests were performed using the same external USB2 disk.

The timings shown in the article are averaged, calculated from 3 or 4 runs of each test.
