Morphling (DeviceEmulator) emulates distributed machine-learning training on heterogeneous edge devices. It runs unmodified training scripts on a unified backend — real or emulated — intercepts backend dispatch calls, fits stride-aware and thermal-aware performance models from real measurements, decouples memory demand from device count, and uses event-driven virtual time to preserve execution semantics. The result is large-scale what-if studies of edge training without a physical device fleet.
Morphling: Emulator for Distributed Machine Learning at the Edge.
Leyang Xue, Yufeng Xia, Eren Mendi, Ismaeel Bashir, Jiaxun Yang, Myungjin Lee, Mahesh K. Marina.
MobiSys Workshop '26 (EdgeSys '26), Cambridge, United Kingdom, June 2026.
DOI: 10.1145/3812836.3814779 · Citation · Figure inventory
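
To make the idea of event-driven virtual time concrete, here is a minimal sketch of a discrete-event clock. It is an illustration only and is not taken from Morphling's runtime: the `VirtualClock` class, the `simulate` helper, and the step durations are all hypothetical. The point it shows is that the clock jumps straight to the next scheduled event, so the order of events follows the modelled durations rather than the wall-clock speed of the host.

```python
import heapq
import itertools


class VirtualClock:
    """Toy event-driven virtual clock (hypothetical; not Morphling's implementation)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        # Monotonic counter used as a tie-breaker so that events with equal
        # timestamps fire in the order they were scheduled.
        self._seq = itertools.count()

    def schedule(self, delay, callback):
        """Register `callback` to fire `delay` virtual seconds from now."""
        if delay < 0:
            raise ValueError("delay must be non-negative")
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), callback))

    def run(self):
        """Pop events in timestamp order, advancing the clock to each one."""
        while self._queue:
            timestamp, _, callback = heapq.heappop(self._queue)
            self.now = timestamp
            callback(self)


def simulate():
    """Two emulated devices with different modelled step times (made-up values)."""
    clock = VirtualClock()
    log = []

    def slow_compute_done(c):
        log.append(("slow_compute_done", c.now))
        # The slow device then spends 0.5 virtual seconds on a transfer.
        c.schedule(0.5, lambda c2: log.append(("slow_transfer_done", c2.now)))

    clock.schedule(3.0, slow_compute_done)  # slow device: 3.0 s step
    clock.schedule(1.0, lambda c: log.append(("fast_compute_done", c.now)))  # fast device
    clock.run()
    return log
```

Calling `simulate()` returns the events in virtual-time order regardless of the order in which they were scheduled, which is the property an emulator relies on to preserve execution semantics.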
- Per-GEMM CUDA green-context switching: trace-driven SM partitioning
  routed through autograd hooks; see docs/green-context.md.
- Worker pool with pluggable scheduling policies (round-robin, greedy,
  load-balanced); GPU path via `XtGemmWorker` (cuBLASXt), CPU path via
  `CpuWorker` (MKL).
- Zero-copy scatter-gather buffers for inter-device data transfer over
  libevent (`evbuffer_add_reference` with `shared_ptr` cleanup).
- Pool-based memory management: pinned host pool and aligned buffer pool,
  bucketed by power-of-2 sizes and mlocked for stable latency.
- Virtual + physical device emulation under one runtime, with a single CLI surface.
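
The power-of-2 bucketing used by the memory pools can be sketched in a few lines. This is a simplified illustration, not the emulator's allocator: `bucket_size`, `BucketedPool`, and the 256-byte minimum bucket are assumptions made for the example, and plain `bytearray` objects stand in for the pinned, mlocked buffers the real pools manage.

```python
def bucket_size(nbytes, min_bucket=256):
    """Round a request up to the next power-of-2 bucket (min_bucket is hypothetical)."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    size = min_bucket
    while size < nbytes:
        size <<= 1
    return size


class BucketedPool:
    """Toy free-list pool keyed by bucket size (illustrative only)."""

    def __init__(self):
        self._free = {}  # bucket size -> list of idle buffers

    def acquire(self, nbytes):
        """Reuse an idle buffer from the matching bucket, or allocate a new one."""
        size = bucket_size(nbytes)
        idle = self._free.get(size)
        if idle:
            return idle.pop()
        return bytearray(size)

    def release(self, buf):
        """Return a buffer to the bucket that matches its length."""
        self._free.setdefault(len(buf), []).append(buf)
```

Because requests of 300 and 400 bytes both round up to the 512-byte bucket, a buffer released by one can be handed to the other without a fresh allocation. That reuse is what keeps allocation latency stable, at the cost of some internal fragmentation.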
- Linux host with Docker and the NVIDIA Container Toolkit (`--gpus all`).
- NVIDIA GPU supported by the runtime path (development uses L40S; green contexts require driver 12.5+).
- CUDA toolkit 12.x inside the image (provided by the Dockerfile).
The canonical environment is the Docker image defined in
Dockerfile. Rebuild after any code change.
docker build -t device-emulator:latest .
# or
make docker-build

For native (non-Docker) development on a configured host, see
docs/DEV_README.md and
docs/troubleshooting.md.
morphling_cmd save --model "facebook/opt-125m" --output <ckpt-path>
morphling_emulator --ckpt_path <ckpt-path>

`morphling_cmd save` converts a Hugging Face model into the emulator's
checkpoint layout. morphling_emulator then starts the proxy backend
server, which loads that checkpoint and listens for device connections
(default 0.0.0.0:39000, override with --listen_ip / --listen_port);
it runs until interrupted with Ctrl-C. Point one or more devices at it
to drive emulated training.
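
Before pointing devices at the server, it can help to confirm that something is accepting connections on the listen address. The sketch below is a generic reachability check and not part of the Morphling CLI: the `backend_is_listening` helper is hypothetical, and it assumes the backend accepts plain TCP connections on the documented default port 39000. It does not speak the emulator's protocol; it only opens and closes a socket.

```python
import socket


def backend_is_listening(host="127.0.0.1", port=39000, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

If the server was started with `--listen_ip` or `--listen_port`, pass the same values as `host` and `port`.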
Multi-device deployments (virtual fleet on one host, or physical edge
devices behind an Nginx stream proxy) are documented in
docs/deployment.md.
All tests run inside the Docker image (per CLAUDE.md §1–2).
make docker-test
# or
docker run --rm --gpus all --ulimit memlock=-1 device-emulator:latest \
python3 -m pytest tests -v

`--ulimit memlock=-1` is required: the proxy server's pinned-buffer pools
and the 4 MiB bandwidth probe (#55) exceed the default 8 MiB container
memlock budget. See issue #59
and docs/deployment.md.
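
One way to confirm that the memlock override actually reached the container is to read the limit from inside it. The following sketch uses Python's standard `resource` module (Unix only); the `memlock_is_unlimited` helper is a hypothetical name and is not part of the test suite.

```python
import resource


def memlock_is_unlimited(limits=None):
    """Return True if the soft RLIMIT_MEMLOCK is unlimited.

    `limits` is an optional (soft, hard) tuple, mainly for testing; by default
    the limit of the current process is queried.
    """
    if limits is None:
        limits = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    soft, _hard = limits
    return soft == resource.RLIM_INFINITY
```

Running this in a container started without `--ulimit memlock=-1` should return `False`, which is a quick hint that pinned-buffer allocation may fail later.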
The image builds all C++ test categories (unit, CUDA/cuBLAS,
XtGemm/worker, zerocopy, benchmarks). See
tests/cpp/README.md for the full catalogue.
We welcome bug reports, feature requests, documentation, and code. Start with
CONTRIBUTING.md — it covers the merge policy,
pre-commit setup, Docker-only test policy, and Angular-style commit format.
- `docs/paper.md`: EdgeSys '26 paper companion (abstract, authors, figure inventory, BibTeX).
- `docs/DEV_README.md`: native development notes.
- `docs/DOCKER.md`: Docker workflow deep-dive.
- `docs/green-context.md`: per-GEMM CUDA green context API.
- `docs/deployment.md`: virtual + physical device deployments.
- `docs/troubleshooting.md`: common errors and fixes.
- `docs/GEMM_ID_ISSUES.md`: performance log formats (VTIME, throughput, PROFILE_DELTA) and the `gemm_id` field.
- `docs/EARLIEST_vs_LATEST.md`: scheduling policy notes.
- `tests/cpp/README.md`: C++ test catalogue.
- `docs/cuda/README.md`: offline CUDA Driver/Runtime API reference.
If you use Morphling (DeviceEmulator) in your research, please cite our EdgeSys '26 workshop paper:
@inproceedings{DBLP:conf/mobisys/XueXMBYLM26,
author = {Leyang Xue and
Yufeng Xia and
Eren Mendi and
Ismaeel Bashir and
Jiaxun Yang and
Myungjin Lee and
Mahesh K. Marina},
title = {Morphling: Emulator for Distributed Machine Learning at the Edge},
booktitle = {The 24th Annual International Conference on Mobile Systems,
Applications and Services, MobiSys Workshop '26,
Cambridge, United Kingdom, June 21-25, 2026},
publisher = {{ACM}},
year = {2026},
url = {https://doi.org/10.1145/3812836.3814779},
doi = {10.1145/3812836.3814779}
}

A machine-readable citation is also available in
CITATION.cff; GitHub renders a "Cite this repository"
button from it.
Morphling is released under the Apache License 2.0. Bundled
and linked third-party components and their licenses are listed in
THIRD_PARTY_LICENSES.md; see also
NOTICE.
Morphling builds on excellent open-source work: