Profiling JVM applications in Android with simpleperf
7 mins read · January 26, 2026
perf
simpleperf
arm
benchmark
android
adb
aarch64
Android platform-tools (shipped with Android Studio) include a binary named simpleperf, which is very similar to perf; some devices, like mine, already have it at /system/bin. Running a Java benchmark with simpleperf is really fun (cough, painful) due to Android’s nuances. Here is what works and what doesn’t.
Experiment 1: Running via termux
I thought it would be as simple as /system/bin/simpleperf stat ..., but no. I couldn’t use it from Termux because simpleperf is coded to refuse to run when invoked by an app user (source).
App user: Android maintains a separate user per application.
So I came up with this workaround (which I will run from adb)
$ simpleperf stat \
/system/bin/run-as com.termux \
files/usr/bin/java -jar tensor.benchmark-main/target/benchmarks.jar
Idea behind this approach
- run-as is an Android debug utility that helps in running programs in the context of other users.
- But run-as requires the app to be marked as debuggable. For this, I got the source code of Termux, made it debuggable (change below), compiled it and installed it on my phone.
  NOTE: if you already have Termux installed, you will lose the data of that app.
  <!-- changes I made to app/src/main/AndroidManifest.xml -->
  ...
  <application
      ...
      android:debuggable="true"
      tools:ignore="HardcodedDebugMode">
  ...
- Once the app is installed, I had to install the required packages again (see above) and compile my benchmark.
- Connect via adb as the shell user and run the benchmark with simpleperf.
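Before running the benchmark, it can help to confirm from the laptop that run-as accepts the rebuilt package. A minimal sketch, assuming adb is on the PATH and that adb shell passes the remote exit status back (which I believe recent platform-tools do); can_run_as and the ADB override variable are hypothetical helper names, not part of adb:

```shell
# Hypothetical helper: succeeds when `run-as <package>` can run a trivial
# command on the device, which should only work for debuggable builds.
# ADB may be set to a specific adb binary (an assumption for illustration).
can_run_as() {
  pkg="$1"
  "${ADB:-adb}" shell run-as "$pkg" id >/dev/null 2>&1
}

# Usage (needs a connected device):
#   can_run_as com.termux && echo "run-as works" || echo "run-as refused"
```

If the check fails, the installed build is most likely not the debuggable one.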
Stats from simpleperf
The stats were really weird. See the task-clock and page-faults. 0 PFs, really?
# count event_name # count / runtime
110,044 cpu-cycles # 1.064410 GHz
14,797 stalled-cycles-frontend # 143.125 M/sec
74,950 stalled-cycles-backend # 724.960 M/sec
25,018 instructions # 241.989 M/sec
585 branch-misses # 5.658 M/sec
0.108230(ms) task-clock # 0.000001 cpus used
0 context-switches # 0.000 /sec
0 page-faults # 0.000 /sec
To confirm whether I was getting real results, I did a small test
$ simpleperf stat -e cpu-cycles ls
...
Performance counter statistics:
# count event_name # count / runtime
18,805,994 cpu-cycles # 0.930426 GHz
Total test time: 0.022808 seconds.
$ simpleperf stat -e cpu-cycles /system/bin/run-as com.termux ls
...
Performance counter statistics:
# count event_name # count / runtime
85,303 cpu-cycles # 0.777114 GHz
Total test time: 0.038096 seconds.
Something is wrong when I profile a program invoked via run-as. It seems simpleperf doesn’t follow the process hierarchy. So this was a dead end.
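To put a number on the gap between the two ls runs, the counts can be pulled out of the output and divided. A small sketch for the laptop's shell, assuming a POSIX awk is available; event_count and counted_pct are hypothetical helper names:

```shell
# Print the raw count for one event from `simpleperf stat` output on stdin.
# Thousands separators and the "(ms)" suffix on task-clock are stripped.
event_count() {
  awk -v ev="$1" '$2 == ev { gsub(/,|\(ms\)/, "", $1); print $1 }'
}

# Percentage of a reference count that another run managed to record.
counted_pct() {
  awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.2f\n", 100 * part / whole }'
}
```

With the numbers above, `counted_pct 85303 18805994` prints 0.45, so the run-as variant recorded less than half a percent of the cycles seen for the plain ls.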
Experiment 2: Profile the termux app
I found that simpleperf has an option to profile an Android application, and I hoped it would follow the process hierarchy. This was my approach:
1. Start the profiler in adb shell.
   $ simpleperf stat --app com.termux
2. Run the benchmark in Termux (through SSH or the Termux app UI itself; you can type the command before step 1 and hit Enter once profiling has started).
   $ java -jar target/benchmarks.jar -wi 2 -i 2
3. Stop profiling (Ctrl+C) after the benchmark process finishes.
Results
Although the results had some huge numbers, the task-clock again revealed that they are useless: only about 1.3 s of CPU time was counted over a 94.75 s session (0.014 cpus used).
$ simpleperf stat --app com.termux
^CPerformance counter statistics:
# count event_name # count / runtime
2,326,910,321 cpu-cycles # 1.969961 GHz
565,468,992 stalled-cycles-frontend # 465.821 M/sec
849,928,935 stalled-cycles-backend # 704.037 M/sec
2,701,203,792 instructions # 2.243 G/sec
6,660,805 branch-misses # 5.658 M/sec
1331.798921(ms) task-clock # 0.014056 cpus used
1,955 context-switches # 1.468 K/sec
749 page-faults # 562.397 /sec
Total test time: 94.750378 seconds.
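The "cpus used" column appears to be the task-clock divided by the wall-clock time, which is why it is a quick sanity check for whether the benchmark was really being counted. A sketch of that arithmetic for the laptop's shell, assuming a POSIX awk; cpus_used is a hypothetical helper name:

```shell
# cpus used = CPU time the counters saw / wall-clock time of the session.
# $1: task-clock in milliseconds, $2: total test time in seconds.
cpus_used() {
  awk -v task_ms="$1" -v wall_s="$2" 'BEGIN { printf "%.6f\n", task_ms / (wall_s * 1000) }'
}

# The numbers from the run above.
cpus_used 1331.798921 94.750378   # prints 0.014056
```

A CPU-bound benchmark that is measured properly should land near 1 or higher, so a value this low suggests the benchmark's threads were mostly not counted.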
Experiment 3: Run everything in adb shell (THE WORKING ONE!!)
Idea
- I have the JDK and the C runtime library (required by Java) in the Termux app (/data/data/com.termux/files/usr directory), and since I have installed a debuggable instance of Termux on my phone, these files are accessible from Android Studio (file browser).
- I need to copy the files in the following order: Termux app directory -> laptop -> shell app directory. To do the second copy, I used adb.
  $ adb push usr /data/local/tmp/bins/
  $ adb push benchmarks.jar /data/local/tmp/
- Once the files are ready, we need to set up PATH and LD_LIBRARY_PATH, and we are good to go.
Execution
- The env setup
  LP=/system/lib64:/vendor/lib64 # android libs
  LP=$LP:/data/local/tmp/bins/lib # c library
  LP=$LP:/data/local/tmp/bins/lib/jvm/java-25-openjdk/lib # java libs
  export LD_LIBRARY_PATH=$LP
  export PATH=$PATH:/data/local/tmp/bins/bin
- The permission bits
  cd /data/local/tmp/bins
  chmod +x bin/*
  chmod +x lib/jvm/java-25-openjdk/lib/*
  chmod +x lib/jvm/java-25-openjdk/bin/*
- The run (for some reason java was creating temp files in the Termux app’s private directory, so I had to reset it to /tmp)
  java -Djava.io.tmpdir=/tmp -jar benchmarks.jar
  ...
  Benchmark                     Mode   Cnt  Score    Error  Units
  Tensor4DBenchmark.accessTest  thrpt       210.361         ops/s
  Performance counter statistics:
  # count event_name # count / runtime
  45,580,956,762 cpu-cycles # 2.358157 GHz
  2,195,865,743 stalled-cycles-frontend # 113.596 M/sec
  12,975,768,615 stalled-cycles-backend # 671.116 M/sec
  149,786,430,843 instructions # 7.742 G/sec
  60,366,459 branch-misses # 3.124 M/sec
  23738.725052(ms) task-clock # 1.063421 cpus used
  7,162 context-switches # 301.701 /sec
  45,111 page-faults # 1.900 K/sec
  Total test time: 22.322981 seconds.
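Since the environment has to be set up again in every new adb shell session, the setup above can be wrapped in a small function. A sketch, assuming the same layout under /data/local/tmp/bins and the java-25-openjdk directory name used above; jvm_env and its two optional arguments are hypothetical:

```shell
# Print the export lines for a JDK tree copied to the device.
# $1: install prefix (default /data/local/tmp/bins)
# $2: JDK directory name under lib/jvm (default java-25-openjdk)
jvm_env() {
  prefix="${1:-/data/local/tmp/bins}"
  jdk="${2:-java-25-openjdk}"
  lp="/system/lib64:/vendor/lib64"      # android libs
  lp="$lp:$prefix/lib"                  # c library
  lp="$lp:$prefix/lib/jvm/$jdk/lib"     # java libs
  printf 'export LD_LIBRARY_PATH=%s\n' "$lp"
  printf 'export PATH=$PATH:%s\n' "$prefix/bin"
}

# Usage inside adb shell:
#   eval "$(jvm_env)"
```

Printing the exports instead of applying them directly makes it easy to inspect the paths before committing to them with eval.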