SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystalline Symmetry Classification Benchmark
Published in ICLR, 2025
In this paper, we present SimXRD, the largest open-source database of simulated X-ray diffraction (XRD) patterns. SimXRD comprises 4,065,346 simulated powder XRD patterns representing 119,569 unique crystal structures under 33 simulated conditions that reflect real-world variations. We benchmark 21 sequence models in both in-library and out-of-library scenarios and analyze the impact of class imbalance in long-tailed crystal label distributions. We find that (1) current neural networks struggle to classify low-frequency crystals, particularly in out-of-library scenarios, and (2) models trained on SimXRD can generalize to real experimental data.
Recommended citation: Cao, Bin and Liu, Yang and Zheng, Zinan and Tan, Ruifeng and Li, Jia and Zhang, Tong-Yi. "SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystalline Symmetry Classification Benchmark", ICLR 2025.