Data Pipeline

Orion's ETL orchestrates the collection, enrichment and analysis of scientific documents. It retrieves documents from Microsoft Academic Graph, enriches them with third-party APIs and creates science-of-science indicators. Orion also produces document embeddings, which are used by its search engine and which you could use in other downstream tasks.
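
To make the three stages concrete, here is a minimal sketch of a collect, enrich and analyse flow in plain Python. Everything in it is an assumption made for illustration: the `Document` record, the function names, the sample data and the toy per-year indicator are hypothetical and are not taken from Orion's code base, and no real API is called.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical, heavily simplified record for one paper."""
    doc_id: str
    title: str
    year: int
    metadata: dict = field(default_factory=dict)


def collect():
    # Stand-in for retrieving documents from Microsoft Academic Graph.
    # The records below are made-up sample data.
    return [
        Document("d1", "Graph embeddings for science", 2018),
        Document("d2", "Measuring research diversity", 2019),
        Document("d3", "Open data pipelines", 2019),
    ]


def enrich(docs):
    # Stand-in for third-party API calls: attach one derived field per document.
    for doc in docs:
        doc.metadata["title_word_count"] = len(doc.title.split())
    return docs


def analyse(docs):
    # Toy indicator: number of documents per publication year.
    counts = {}
    for doc in docs:
        counts[doc.year] = counts.get(doc.year, 0) + 1
    return counts


def run_pipeline():
    # Each stage consumes the output of the previous one.
    return analyse(enrich(collect()))
```

Calling `run_pipeline()` chains the stages in order. In a real pipeline each stage would typically persist its results so that later stages can be re-run independently.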

Figure: Orion ETL diagram

Orion's ETL is based on Airflow, a platform to programmatically author, schedule and monitor workflows.
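
Airflow models a workflow as a directed acyclic graph (DAG) of tasks, where a task runs only after the tasks it depends on have finished. The sketch below illustrates that ordering idea with Python's standard-library `graphlib`; it does not use Airflow's API, and the task names are hypothetical rather than the names used in Orion's DAG.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each task maps to the set of tasks it depends on.
dependencies = {
    "collect_documents": set(),
    "enrich_documents": {"collect_documents"},
    "compute_indicators": {"enrich_documents"},
    "create_embeddings": {"enrich_documents"},
}


def execution_order(deps):
    # Return one valid order in which every task follows its dependencies.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for task in execution_order(dependencies):
        print(task)
```

Here `compute_indicators` and `create_embeddings` do not depend on each other, so a scheduler is free to run them in either order or in parallel once `enrich_documents` has completed.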

Figure: Orion DAG