arXiv's Historic Migration: Leading Scientific Repository Moves to Google Cloud

arXiv, the world's leading open-access repository for scientific preprints, is moving its entire infrastructure from servers at Cornell University to Google Cloud Platform (GCP).

This transition is part of a broader modernization initiative called "arXiv CE" (Cloud Edition), aimed at enhancing the platform's performance and reliability. With over 2.6 million papers and approximately five million monthly users, arXiv has outgrown its current technical infrastructure.

The migration comes as the platform faces growing demand on an aging codebase. The current system, which has served the scientific community since 1991, still relies on components written in Perl and PHP. Moving to GCP will let arXiv standardize its backend on Python and adopt modern containerization technologies.

Key improvements planned include:

  • Fully asynchronous article processing
  • Enhanced monitoring and logging capabilities
  • Automated code deployment systems
  • Improved scalability to handle growing user demands
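
To make the first item concrete, here is a minimal sketch of what asynchronous article processing can look like in Python, using only the standard library's asyncio module. It is an illustration under stated assumptions, not arXiv's actual design: the function names (process_article, worker, process_all), the article identifiers, and the concurrency setting are all hypothetical.

```python
import asyncio


async def process_article(article_id: str) -> str:
    # Hypothetical stand-in for real work (e.g. compiling a submission
    # or extracting metadata); it only yields control to the event loop.
    await asyncio.sleep(0)
    return f"{article_id}:processed"


async def worker(queue: asyncio.Queue, results: list) -> None:
    # Pull article IDs off the queue until the task is cancelled.
    while True:
        article_id = await queue.get()
        try:
            results.append(await process_article(article_id))
        finally:
            # Mark the item done even if processing raised.
            queue.task_done()


async def process_all(article_ids, concurrency: int = 2) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for article_id in article_ids:
        queue.put_nowait(article_id)

    results: list = []
    workers = [
        asyncio.create_task(worker(queue, results))
        for _ in range(concurrency)
    ]

    # Wait until every queued article has been handled, then stop workers.
    await queue.join()
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(process_all(["a1", "a2", "a3"])))
```

In a sketch like this, submissions are queued and handled by a small pool of workers, so one slow article does not block the rest; results may arrive in a different order than they were submitted.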

The project is supported by the Simons Foundation, with strategic guidance from Invest in Open Infrastructure. The announcement has nonetheless sparked discussion in the technical community, with some users voicing concerns about higher operating costs and vendor lock-in.

arXiv's decision aligns with its goals to expand into new subject areas, improve metadata collection, and enhance accessibility for its global research community. The platform, which started as a simple system running on a NeXT machine, has evolved into an indispensable resource for researchers, particularly in physics and mathematics.

No timeline for completing the transition to Google Cloud Platform has been publicly announced. The project nonetheless marks a major step in arXiv's evolution as it adapts to the growing needs of the scientific community.