For long-text queries, Orion finds semantically similar publications by comparing vector representations at scale. It works as follows:
- Infer a document vector for every paper abstract in the database using a sentence-level DistilBERT model.
- Build a Faiss index with the document vectors.
- For every new, long-text query:
  - Infer its vector representation with the same sentence-level DistilBERT model.
  - Use the pre-built Faiss index for similarity matching and return the most relevant results.
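
The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Orion's actual code: `embed` is a toy hashed bag-of-words function standing in for the sentence-level DistilBERT model, and `FlatIndex` is a hypothetical brute-force class standing in for the Faiss index (an exact inner-product index such as Faiss's `IndexFlatIP` plays the same role). The stand-ins are there so the example runs with the standard library alone; the sample abstracts are made up.

```python
import math
import zlib

DIM = 64  # toy dimensionality (assumption); real sentence embeddings are larger


def embed(text):
    """Stand-in for the DistilBERT encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class FlatIndex:
    """Stand-in for the Faiss index: exact (brute-force) inner-product search."""

    def __init__(self):
        self.vectors = []

    def add(self, vectors):
        self.vectors.extend(vectors)

    def search(self, query, k):
        # Score every stored vector against the query, highest score first.
        scored = [
            (sum(q * v for q, v in zip(query, vec)), i)
            for i, vec in enumerate(self.vectors)
        ]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return scored[:k]


# Steps 1 and 2: embed every abstract once, then build the index.
abstracts = [
    "Deep learning methods for protein structure prediction",
    "Graph neural networks for molecular property prediction",
    "A survey of monetary policy in emerging economies",
]
index = FlatIndex()
index.add([embed(a) for a in abstracts])


def find_similar(query, k=2):
    """Step 3: embed the query with the same function, then search the index."""
    return [(abstracts[i], score) for score, i in index.search(embed(query), k)]


for abstract, score in find_similar("neural networks for predicting protein structure"):
    print(f"{score:.3f}  {abstract}")
```

Because the vectors are L2-normalised, the inner product equals cosine similarity, so the ranking reflects the angle between the query and each abstract. The key design point carried over from the steps above is that queries and abstracts must pass through the same embedding function, otherwise their vectors are not comparable.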