Performance

Work in Progress

Nexus is under active development. These benchmarks reflect the current state of the codebase and may change as optimizations are added.

All benchmarks run inside Docker on an Apple M4 Max (16 cores, 128 GB RAM) with PHP 8.5.3 and Swoole 6.0. The numbers come from the automated PHPUnit performance test suite (tests/Performance/).

Message throughput

How many messages per second a single actor can process end-to-end (tell -> mailbox -> behavior handler):

| Benchmark | Fiber | Swoole |
| --- | --- | --- |
| 100K messages to one actor | 1.16M msgs/sec | 929K msgs/sec |
| 50K message burst | 1.29M msgs/sec | 909K msgs/sec |
| 100K stateful transitions | 1.06M msgs/sec | 853K msgs/sec |
| Fan-out (100 actors x 100 msgs) | 1.06M msgs/sec | 659K msgs/sec |
| Multi-dispatch (50 x 100 rounds) | 998K msgs/sec | 574K msgs/sec |

Fiber is faster in single-process benchmarks because it avoids Swoole's coroutine scheduling overhead. Swoole's advantage is true async I/O and multi-process scaling — not single-process throughput.
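To make the msgs/sec unit concrete, here is a minimal sketch of how an end-to-end rate can be timed. It is not the code in tests/Performance/ and uses no Nexus classes: `measureThroughput` is a hypothetical helper, and an `SplQueue` stands in for the mailbox, so the figure it prints is not comparable to the table above.

```php
<?php
declare(strict_types=1);

/**
 * Times an enqueue-then-drain loop and returns messages per second.
 * SplQueue is only a stand-in for a mailbox; no Nexus classes are used.
 */
function measureThroughput(int $messageCount, callable $handler): float
{
    $mailbox = new SplQueue();
    $startNs = hrtime(true);

    for ($i = 0; $i < $messageCount; $i++) {
        $mailbox->enqueue((object)['seq' => $i]); // plays the role of tell()
    }
    while (!$mailbox->isEmpty()) {
        $handler($mailbox->dequeue());            // plays the role of the behavior handler
    }

    $elapsedNs = max(1, hrtime(true) - $startNs); // guard against a zero interval
    return $messageCount / ($elapsedNs / 1e9);
}

$processed = 0;
$rate = measureThroughput(100_000, function (object $message) use (&$processed): void {
    $processed++;
});
printf("%d messages at %.0f msgs/sec\n", $processed, $rate);
```

Timing the whole loop, rather than each message, keeps the clock overhead out of the measurement.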

Dispatch rate

Raw tell() throughput without waiting for processing:

| Runtime | Dispatch rate |
| --- | --- |
| Fiber | 5.14M tells/sec |
| Swoole | 995K tells/sec |

Actor lifecycle

| Operation | Fiber | Swoole |
| --- | --- | --- |
| Spawn 1,000 actors | 453K ops/sec (2.2 us/actor) | 471K ops/sec (2.1 us/actor) |
| Kill 500 actors (PoisonPill) | 165K ops/sec | 107K ops/sec |
| 500 spawn-kill cycles | 151K ops/sec | 98K ops/sec |

Ping-pong latency

Round-trip time for a message sent to an actor that replies immediately:

| Runtime | Latency | Throughput |
| --- | --- | --- |
| Fiber | 2.5 us per round trip | 399K ops/sec |
| Swoole | 2.5 us per round trip | 407K ops/sec |
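The latency and throughput columns describe the same measurement. Assuming the round trips run one after another, mean latency is the reciprocal of the rate. The sketch below checks that arithmetic; `microsecondsPerOp` is a hypothetical helper, not a Nexus function.

```php
<?php
declare(strict_types=1);

/** Converts a sequential operation rate into mean microseconds per operation. */
function microsecondsPerOp(float $opsPerSecond): float
{
    return 1_000_000 / $opsPerSecond;
}

printf("Fiber:  %.2f us\n", microsecondsPerOp(399_000));
printf("Swoole: %.2f us\n", microsecondsPerOp(407_000));
```

Both values round to the 2.5 us reported above, so the small throughput difference between the runtimes is below the precision of the latency column.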

Memory

| Runtime | Memory per actor |
| --- | --- |
| Fiber | 3,884 bytes |
| Swoole | 3,164 bytes |

At ~3-4 KB per actor, 100K actors consume roughly 300-400 MB.
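The projection is a straight multiplication of the per-actor figures. This sketch assumes memory grows linearly with actor count and uses decimal megabytes; `projectedMegabytes` is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/** Projects total memory in decimal megabytes, assuming linear growth per actor. */
function projectedMegabytes(int $bytesPerActor, int $actorCount): float
{
    return $bytesPerActor * $actorCount / 1_000_000;
}

printf("Fiber:  %.1f MB\n", projectedMegabytes(3_884, 100_000)); // 388.4 MB
printf("Swoole: %.1f MB\n", projectedMegabytes(3_164, 100_000)); // 316.4 MB
```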

Multi-process scaling (Swoole)

Cross-worker messaging through Unix domain sockets with CompactClusterSerializer:

| Metric | Result |
| --- | --- |
| Cross-worker throughput | 260K msgs/sec per worker pair |
| Cross-worker round-trip latency | 20 us per round trip |
| Serialization throughput | 1.18M serialize+deserialize cycles/sec |
| Fan-out (4 workers, 5K messages) | 188K msgs/sec aggregate |

Wire format

The CompactClusterSerializer sends actor paths as raw UTF-8 strings and only calls PHP serialize() on the message object:

[2B: target path length][target path bytes][2B: sender path length][sender path bytes][message bytes]

This is ~6x smaller than serializing the full Envelope object graph with PHP's native serialize().
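The layout above can be sketched with `pack()` and `unpack()`. This is an illustration of the format as described, not the CompactClusterSerializer source. `encodeFrame` and `decodeFrame` are hypothetical names, and the big-endian byte order of the 2-byte lengths is an assumption, since the page does not state it.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers, not part of Nexus.
// Assumption: the 2-byte length prefixes are unsigned big-endian ('n').
function encodeFrame(string $targetPath, string $senderPath, object $message): string
{
    foreach ([$targetPath, $senderPath] as $path) {
        if (strlen($path) > 0xFFFF) {
            throw new LengthException('Actor path does not fit a 2-byte length prefix');
        }
    }

    // Paths go out as raw bytes; only the message itself is passed to serialize().
    return pack('n', strlen($targetPath)) . $targetPath
        . pack('n', strlen($senderPath)) . $senderPath
        . serialize($message);
}

/** @return array{target: string, sender: string, message: mixed} */
function decodeFrame(string $frame): array
{
    $offset = 0;
    $readPath = function () use ($frame, &$offset): string {
        $length = unpack('n', $frame, $offset)[1]; // 2-byte length at the current offset
        $path = substr($frame, $offset + 2, $length);
        $offset += 2 + $length;
        return $path;
    };

    $target = $readPath();
    $sender = $readPath();

    return [
        'target'  => $target,
        'sender'  => $sender,
        'message' => unserialize(substr($frame, $offset)), // the rest is the message
    ];
}

$frame = encodeFrame('/user/orders', '/user/client', (object)['seq' => 1]);
$decoded = decodeFrame($frame);
printf("%d bytes, target=%s, seq=%d\n", strlen($frame), $decoded['target'], $decoded['message']->seq);
```

With this layout the overhead beyond the message is four bytes plus the two paths, which is where the saving over serializing a whole envelope object would come from.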

Read buffering

UnixSocketTransport receives up to 64 KB at a time and parses multiple length-prefixed frames from the buffer. This reduces read syscalls from 2 per message (header + payload) to roughly one per buffer-full, however many frames that buffer contains.
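A minimal sketch of the frame-splitting step follows. It is not the UnixSocketTransport implementation: `drainFrames` is a hypothetical function, and the 4-byte big-endian length prefix is an assumption, because the page only says the frames are length-prefixed.

```php
<?php
declare(strict_types=1);

/**
 * Splits one receive buffer into complete frames.
 * Assumption: each frame starts with a 4-byte big-endian length ('N').
 *
 * @return array{0: list<string>, 1: string} complete frames and the unconsumed tail
 */
function drainFrames(string $buffer): array
{
    $frames = [];
    $offset = 0;
    $available = strlen($buffer);

    while ($available - $offset >= 4) {
        $length = unpack('N', $buffer, $offset)[1];
        if ($available - $offset - 4 < $length) {
            break; // partial frame: keep it until the next read completes it
        }
        $frames[] = substr($buffer, $offset + 4, $length);
        $offset += 4 + $length;
    }

    return [$frames, substr($buffer, $offset)];
}

// Two complete frames followed by a truncated third one.
$chunk = pack('N', 3) . 'abc' . pack('N', 2) . 'de' . pack('N', 5) . 'fg';
[$frames, $rest] = drainFrames($chunk);
printf("%d complete frames, %d bytes carried over\n", count($frames), strlen($rest));
```

The caller would prepend the returned tail to the next chunk it reads, so a frame that straddles two reads is reassembled without an extra syscall.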

Running benchmarks

# All benchmarks (requires Swoole container)
docker compose exec php-swoole vendor/bin/phpunit --testsuite=performance

# Fiber-only benchmarks (no Swoole needed)
docker compose exec php vendor/bin/phpunit --testsuite=performance --filter=Fiber

# Cluster benchmarks only
docker compose exec php-swoole vendor/bin/phpunit --testsuite=performance --filter=Cluster

Interpreting the numbers

Fiber vs Swoole: Fiber is faster in isolated single-process benchmarks. This does not mean Fiber is "better" — Swoole provides true async I/O (database, HTTP, filesystem), multi-process scaling, and native coroutine support. Use Fiber for development and moderate workloads. Use Swoole for production with I/O-bound or multi-core workloads.

Docker overhead: Benchmarks run inside Docker containers. Native performance on the host machine is typically 10-20% faster.

Message size: All benchmarks use small messages ((object)['seq' => $i]). Larger messages will reduce throughput due to serialization and memory copy costs.
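For a rough sense of how payload size affects the serialized form, the sketch below compares the benchmark-style message with a larger one. The 4 KB `payload` field is a made-up example, and the comparison only matters on paths where messages are actually serialized, such as cross-worker delivery.

```php
<?php
declare(strict_types=1);

$small = (object)['seq' => 1];                                      // shape used by the benchmarks
$large = (object)['seq' => 1, 'payload' => str_repeat('x', 4096)];  // hypothetical larger message

$smallBytes = strlen(serialize($small));
$largeBytes = strlen(serialize($large));

printf("small: %d bytes, large: %d bytes\n", $smallBytes, $largeBytes);
```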