In machine learning, handling symmetric data has long been a challenge: a conventional model may fail to recognize the same object once it has been rotated or otherwise transformed. A new study by MIT researchers proposes a method for machine learning with symmetric data that is provably efficient in both the computation and the amount of data it requires. It is the first such method to ensure model accuracy when handling symmetric data.

Symmetric data is common in the natural sciences and physics; examples include molecular structures and objects that change position within an image. Traditional machine learning models can lose accuracy on such data because they do not recognize its symmetry. Through theoretical analysis, the MIT research team explored the trade-off between statistics and computation in machine learning with symmetric data and designed an efficient algorithm. The algorithm draws on ideas from algebra and geometry, combining the two into an optimization problem and thereby simplifying the processing of symmetric data. Co-lead author of the study Behrooz Tahmasebi said: "We have now proven that machine learning with symmetric data is feasible." The algorithm not only reduces the number of data samples required for training, but also improves the model's accuracy and its ability to adapt to new applications.
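
To make the idea of symmetry concrete, here is a minimal sketch in Python with NumPy. It is not the algorithm from the study. It illustrates a simpler, well-known technique, averaging a model's output over a symmetry group, and assumes the only symmetry of interest is rotation by multiples of 90 degrees. The toy linear "model" and the names `rotations_c4`, `plain_score`, and `symmetrized_score` are hypothetical and chosen purely for illustration.

```python
import numpy as np


def rotations_c4(image):
    """Return the four 90-degree rotations of a 2-D array."""
    return [np.rot90(image, k) for k in range(4)]


def plain_score(image, weights):
    """A toy linear 'model'; in general its output changes when the input rotates."""
    return float(np.sum(image * weights))


def symmetrized_score(image, weights):
    """Average the toy model over all four rotations of the input."""
    return float(np.mean([plain_score(r, weights) for r in rotations_c4(image)]))


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))    # stand-in for an input image
weights = rng.normal(size=(8, 8))  # stand-in for learned parameters
rotated = np.rot90(image)          # the same "object", rotated by 90 degrees

# How much each model's output changes when the input is rotated.
plain_gap = abs(plain_score(image, weights) - plain_score(rotated, weights))
sym_gap = abs(symmetrized_score(image, weights) - symmetrized_score(rotated, weights))

print("plain model gap:", plain_gap)
print("symmetrized model gap:", sym_gap)
```

The plain model gives different scores to an image and its rotated copy, while the averaged model gives the same score up to floating-point rounding. Averaging over every group element becomes more expensive as the symmetry group grows, which is one way to see why the trade-off between computation and data matters here.
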
The study not only answers fundamental questions, but also opens new avenues for developing more powerful machine learning models. Such models could be applied in fields such as the discovery of new materials, the identification of astronomical anomalies, and the uncovering of complex climate patterns. Co-lead author Ashkan Soleymani added: "Once we better understand the principles of symmetric data processing, we can design more interpretable, more powerful, and more efficient neural network architectures." The study was presented at the International Conference on Machine Learning (ICML 2025), held in Vancouver from July 13 to 19, and is available on the arXiv preprint server.