Ziqi Ye1, 2, ∗, Shuran Ma3, ∗, Jie Yang2, Xiaoyi Yang1, Yi Yang1, Ziyang Gong4, Xue Yang4, †, ‡, Haipeng Wang1, †
1 Fudan University, 2 Shanghai Innovation Institute, 3 Xidian University, 4 Shanghai Jiao Tong University
∗ Equal Contribution, † Corresponding Author, ‡ Project Lead
- We summarize four common failure modes in remote sensing image generation: control leakage, structural distortion, dense generation collapse, and feature-level mismatch. OF-Diff performs well in all four aspects.
- Comparison of OF-Diff with Mainstream Methods.
- An Overview of OF-Diff.
- Comparison of the Generation Results of OF-Diff with Other Methods.
- Diversity Results and Style Preference Results.
- Quantitative Comparison with Other Methods on DIOR and DOTA.
- **Trainability Comparison Results, and the Results on the Unknown Layout Dataset during Training.**
- t-SNE Visualization of Features of Images Generated by Different Methods.
conda env create -f environment.yaml
conda activate ofdiff

2.1 Datasets and structure
You need to download the datasets. Taking DIOR as an example, the dataset must be processed (see data_process.md) into the following structure.
DIOR-R-train
├── images
│   ├── 00001.jpg
│   ├── ...
│   └── 05862.jpg
├── labels
│   ├── 00001.jpg
│   ├── ...
│   └── 05862.jpg
└── prompt.json
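Before training, it can be worth sanity-checking that the folder matches the layout above. The sketch below is a minimal, illustrative checker (the `check_dataset` helper is not part of this repo): it assumes images and labels are paired by identical `.jpg` filenames and that `prompt.json` sits at the dataset root, as shown in the tree.

```python
from pathlib import Path

def check_dataset(root: str) -> list[str]:
    """Report basic layout problems in a DIOR-R-train style folder."""
    base = Path(root)
    problems = []
    # Images and labels are assumed to be paired by filename.
    images = {p.name for p in (base / "images").glob("*.jpg")}
    labels = {p.name for p in (base / "labels").glob("*.jpg")}
    for name in sorted(images - labels):
        problems.append(f"missing label for {name}")
    for name in sorted(labels - images):
        problems.append(f"missing image for {name}")
    # The caption/prompt file is expected at the dataset root.
    if not (base / "prompt.json").exists():
        problems.append("prompt.json not found")
    return problems
```

Running it on the dataset root returns an empty list when the layout is complete.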
2.2 Weights
Initialize the ControlNet model with the pretrained UNet encoder weights from Stable Diffusion, then merge these weights with the Stable Diffusion model weights and save the result as ./model/control_sd15_ini.ckpt. More pretrained weights will be uploaded to Hugging Face in the future.
python ./tools/add_control.py
python train.py
python ./tools/merge_weights.py ./path/to/checkpoints
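Conceptually, this initialization copies each Stable Diffusion UNet weight to the matching ControlNet-branch key so both branches start from the same encoder. The sketch below illustrates the idea on plain dictionaries; the key prefixes follow the usual ControlNet naming convention but are assumptions here, and this is not a drop-in replacement for `./tools/add_control.py`.

```python
def init_control_weights(sd_state: dict, control_prefix: str = "control_model.") -> dict:
    """Return a merged state dict where ControlNet keys are initialized
    from the corresponding Stable Diffusion UNet weights.

    Key prefixes are illustrative; real checkpoints use the ldm/ControlNet
    naming scheme.
    """
    unet_prefix = "model.diffusion_model."
    merged = dict(sd_state)  # keep all original SD weights
    for key, value in sd_state.items():
        if key.startswith(unet_prefix):
            # Shared initialization: copy the UNet weight under the
            # ControlNet branch's key.
            suffix = key[len(unet_prefix):]
            merged[control_prefix + suffix] = value
    return merged
```

The merged dictionary is what would then be saved as the initial checkpoint (e.g. `control_sd15_ini.ckpt`).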
python inference.py
- Release the paper on arXiv.
- Release the initial code.
- Release the complete code.
- Release the model and weights on Hugging Face.
- Release synthetic images by OF-Diff.
If you have any questions about the paper or code, feel free to email me at ye.ziqi19@foxmail.com so that I can notice and respond promptly. Thank you for your support, understanding, and patience regarding this work.
Our work builds on Stable Diffusion, ControlNet, and RemoteSAM; we appreciate their outstanding contributions. We are also extremely grateful to AeroGen and CC-Diff for their contributions to remote sensing image generation; their excellent experiments have advanced the field.
@misc{ye2025objectfidelitydiffusionremote,
title={Object Fidelity Diffusion for Remote Sensing Image Generation},
author={Ziqi Ye and Shuran Ma and Jie Yang and Xiaoyi Yang and Ziyang Gong and Xue Yang and Haipeng Wang},
year={2025},
eprint={2508.10801},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2508.10801},
}