Benchmarking Raspberry Pi Pico 2
I recently got my hands on the newly released Raspberry Pi Pico 2 board. The board is powered by the RP2350 microcontroller, which is based on the ARM Cortex-M33. As with any new microcontroller, the first question that comes to mind is how it stacks up against the existing offerings on the market. Benchmarking lets us compare different controllers. Benchmark performance often does not translate directly to real-world applications, but benchmarks are still valuable for understanding the relative performance of each board and provide a baseline for comparison.
I wrote a small Arduino sketch that performs addition, multiplication and division. I ran the same sketch on all the boards under consideration, with compiler optimisations disabled.
Benchmarked boards
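- Raspberry Pi Pico (RP2040, ARM Cortex-M0+)
- Raspberry Pi Pico 2 (RP2350, ARM Cortex-M33 and Hazard3 RISC-V)
- M5Stack Core2 (ESP32, Xtensa LX6)
- Wio Terminal (ATSAMD51, ARM Cortex-M4F)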
All boards run at their recommended frequencies (no overclocking) and are powered by the same power supply.
Both integer and float performance is captured in MOPS (million operations per second). The numbers don't represent the absolute performance of the operation under consideration: as the sketch shows, each operation (addition, multiplication, division) runs inside a double loop, so the loop overhead is included in the measured time.
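The conversion from elapsed time to MOPS can be sketched as a small standalone function. This is a minimal illustration in plain C++ rather than part of the benchmark sketch; the name `mops_from_millis` is made up here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (name chosen for illustration): converts a total
// operation count and an elapsed time in milliseconds into MOPS.
// ops / (ms * 1000) == (ops / 1e6) / (ms / 1e3), i.e. million operations per second.
double mops_from_millis(std::uint64_t total_ops, std::uint32_t elapsed_ms) {
    assert(elapsed_ms > 0 && "elapsed time must be non-zero");
    return static_cast<double>(total_ops) /
           (static_cast<double>(elapsed_ms) * 1000.0);
}
```

As an example with a made-up timing, 10 cycles of 1,000,000 iterations finishing in 2,000 ms would give `mops_from_millis(10000000, 2000)`, which is 5.0 MOPS.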
The results are mostly as expected: the RP2350 beats the RP2040 in all operations. There is a wide margin in floating-point performance between the RP2350 and the RP2040, which can be explained by the absence of hardware floating-point support in the RP2040. Within the RP2350, the Hazard3 RISC-V core has slightly better integer performance than the ARM Cortex-M33 core (for addition and multiplication).
The ESP32 results are a little surprising. I expected the ESP32 to outperform the RP2350, as it runs at a much higher clock. The LX6 results are very similar to the Cortex-M4F results, even though its CPU clock is twice that of the Cortex-M4F. The difference is much clearer if we scale all the results to 100 MHz.
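Scaling a result to 100 MHz is a simple proportion. The sketch below assumes that throughput scales linearly with CPU clock, which is only an approximation (flash wait states and caches can break it); the helper name `scale_to_100mhz` and the numbers in the note that follows are hypothetical, not measured values.

```cpp
#include <cassert>

// Hypothetical helper: rescales a measured MOPS figure to the value it would
// have at a 100 MHz clock, assuming throughput is proportional to frequency.
double scale_to_100mhz(double measured_mops, double clock_mhz) {
    assert(clock_mhz > 0.0 && "clock frequency must be positive");
    return measured_mops * 100.0 / clock_mhz;
}
```

With made-up numbers, two chips that both report 6.0 MOPS, one clocked at 240 MHz and one at 120 MHz, scale to 2.5 and 5.0 MOPS respectively, which makes the per-clock difference visible.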
Benchmarking sketch
#define ITERATIONS 1000000 // Number of iterations per cycle
#define CYCLES 10          // Number of cycles

#pragma GCC push_options
#pragma GCC optimize ("O0") // Disable optimizations so the loops are not removed

void setup() {
  Serial.begin(115200);
  delay(1000);
  while (!Serial); // Wait for the Serial Monitor to connect
  Serial.println("Starting benchmark");

  unsigned long start, end;
  float mops;
  // Total operations in thousands; dividing by elapsed milliseconds gives MOPS
  float t_ops = (CYCLES * ITERATIONS) / 1000.0f;
  uint32_t int_result;
  float float_result;

  // Random operands; the divisors start at 1 to avoid division by zero
  uint32_t int_seed_1 = random(0, 1000);
  uint32_t int_seed_2 = random(1, 1000);
  float float_seed_1 = random(0, 1000) * 0.1f;
  float float_seed_2 = random(1, 1000) * 0.5f;

  // Each test prints "label: result,MOPS"

  // Unsigned integer addition
  start = millis();
  for (uint32_t cycle = 0; cycle < CYCLES; cycle++) {
    for (uint32_t i = 0; i < ITERATIONS; i++) {
      int_result = int_seed_1 + int_seed_2;
    }
  }
  end = millis();
  Serial.print("Unsigned Int addition (result,MOPS): ");
  Serial.print(int_result);
  Serial.print(",");
  mops = t_ops / (end - start);
  Serial.println(mops);

  // Float addition
  start = millis();
  for (uint32_t cycle = 0; cycle < CYCLES; cycle++) {
    for (uint32_t i = 0; i < ITERATIONS; i++) {
      float_result = float_seed_1 + float_seed_2;
    }
  }
  end = millis();
  Serial.print("Float addition (result,MOPS): ");
  Serial.print(float_result);
  Serial.print(",");
  mops = t_ops / (end - start);
  Serial.println(mops);

  // Unsigned integer multiplication
  start = millis();
  for (uint32_t cycle = 0; cycle < CYCLES; cycle++) {
    for (uint32_t i = 0; i < ITERATIONS; i++) {
      int_result = int_seed_1 * int_seed_2;
    }
  }
  end = millis();
  Serial.print("Unsigned Int multiplication (result,MOPS): ");
  Serial.print(int_result);
  Serial.print(",");
  mops = t_ops / (end - start);
  Serial.println(mops);

  // Float multiplication
  start = millis();
  for (uint32_t cycle = 0; cycle < CYCLES; cycle++) {
    for (uint32_t i = 0; i < ITERATIONS; i++) {
      float_result = float_seed_1 * float_seed_2;
    }
  }
  end = millis();
  Serial.print("Float multiplication (result,MOPS): ");
  Serial.print(float_result);
  Serial.print(",");
  mops = t_ops / (end - start);
  Serial.println(mops);

  // Unsigned integer division
  start = millis();
  for (uint32_t cycle = 0; cycle < CYCLES; cycle++) {
    for (uint32_t i = 0; i < ITERATIONS; i++) {
      int_result = int_seed_1 / int_seed_2;
    }
  }
  end = millis();
  Serial.print("Unsigned Int division (result,MOPS): ");
  Serial.print(int_result);
  Serial.print(",");
  mops = t_ops / (end - start);
  Serial.println(mops);

  // Float division
  start = millis();
  for (uint32_t cycle = 0; cycle < CYCLES; cycle++) {
    for (uint32_t i = 0; i < ITERATIONS; i++) {
      float_result = float_seed_1 / float_seed_2;
    }
  }
  end = millis();
  Serial.print("Float division (result,MOPS): ");
  Serial.print(float_result);
  Serial.print(",");
  mops = t_ops / (end - start);
  Serial.println(mops);
}

#pragma GCC pop_options

// Nothing to do here; the benchmark runs once in setup()
void loop() {
}
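Because each timed region in the sketch above includes the loop counters and branches, one possible refinement is to time an empty double loop as a baseline and subtract it from each measurement. The original sketch does not do this; the following is only a sketch of the arithmetic in plain C++, with a hypothetical helper name `net_mops`, and it assumes the empty loop is actually executed (as it would be with optimizations disabled).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimates MOPS for the operation alone by subtracting
// the time of an empty baseline loop from the time of the loop that contains
// the operation. Both times are in milliseconds.
double net_mops(std::uint64_t total_ops, std::uint32_t loop_ms,
                std::uint32_t baseline_ms) {
    assert(loop_ms > baseline_ms && "operation loop must take longer than the baseline");
    const double net_ms = static_cast<double>(loop_ms - baseline_ms);
    return static_cast<double>(total_ops) / (net_ms * 1000.0);
}
```

With made-up timings of 2,500 ms for the operation loop and 500 ms for the empty loop, 10,000,000 operations would come out at 5.0 MOPS instead of the 4.0 MOPS that the uncorrected time gives.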