Dataset Transparency

From 28GB of real source data to more than 100GB of clean training data

We start with real, verified public-domain and CC0 material, then use filtering and equation-guided synthetic generation to expand clean coverage. The result is a dataset stack built for training and retrieval, not just raw scraping.

Verified source: 28GB. A real, quality-verified dataset baseline.

Expanded output: 100GB+. Clean synthetic plus curated training data.

Rights profile: CC0 / public domain. No copyright ambiguity in delivery.

Licensing

Licenses we filter for

Our pipeline is strict by design: we gather and publish only CC0 and public-domain data. That means your team can train and deploy without stepping into rights uncertainty.
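As an illustration, a minimal allow-list filter in this spirit might look like the sketch below. This is not our production pipeline; the record schema and the "license" field name are assumptions.

    # Minimal sketch of a strict license allow-list filter.
    # The record schema and "license" field are hypothetical.
    ALLOWED_LICENSES = {"CC0-1.0", "public-domain"}

    def is_clean(record: dict) -> bool:
        """Keep a record only if its license is explicitly allow-listed."""
        # Unknown or missing license means reject; never assume clean.
        return record.get("license") in ALLOWED_LICENSES

    def filter_corpus(records: list) -> list:
        """Strict by design: anything not provably CC0/PD is dropped."""
        return [r for r in records if is_clean(r)]

The important property is the default: records with an unknown or missing license are rejected, never assumed clean.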

RAG system

Retrieval without copyright risk

Our RAG layer filters retrieval outputs to deliver only public-domain and CC0 material. Models can draw on internet-scale context while keeping the response path aligned with clean licensing.
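Conceptually, the retrieval path applies the same allow list at query time. A simplified sketch, assuming a generic search index and a "license" metadata field (both hypothetical stand-ins for the real components):

    ALLOWED = {"CC0-1.0", "public-domain"}

    def retrieve_clean(query: str, index, k: int = 5) -> list:
        """Retrieve top matches, then keep only clean-licensed documents."""
        # Over-fetch so that filtering still leaves k usable results.
        candidates = index.search(query, limit=4 * k)
        clean = [doc for doc in candidates if doc.get("license") in ALLOWED]
        return clean[:k]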

Bundles

Keyword bundles with low-friction access

Data is organized into focused bundles by keyword. Most bundles cost one credit, so it is practical to combine several in a single workflow. Bundles are continuously updated as source coverage improves.


What data?

Language strength plus STEM understanding

Text quality

We extract high-purity text from books, articles, and other long-form sources so models learn strong language behavior and produce clearer responses.

STEM depth

We pair language data with STEM content because the next generation of models must not only speak well but also understand the world they describe.

Synthetic expansion

Equation-driven augmentation expands useful coverage while preserving quality constraints and traceability back to verified sources.
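For a flavor of what equation-driven augmentation can mean in practice, here is a minimal, hypothetical sketch: sampling values for a known physics formula and emitting worked examples that carry provenance and license metadata. The formula choice and field names are illustrative only, not our actual generation code.

    import random

    def kinetic_energy_examples(n: int, source_id: str) -> list:
        """Emit n worked problems for E = 0.5 * m * v**2 with provenance."""
        examples = []
        for _ in range(n):
            m = round(random.uniform(0.5, 100.0), 2)  # mass in kg
            v = round(random.uniform(1.0, 50.0), 2)   # speed in m/s
            energy = 0.5 * m * v ** 2
            examples.append({
                "question": f"A {m} kg object moves at {v} m/s. "
                            "What is its kinetic energy?",
                "answer": f"E = 0.5 * {m} * {v}**2 = {energy:.2f} J",
                "source": source_id,   # traceability back to the seed source
                "license": "CC0-1.0",  # synthetic output keeps the clean profile
            })
        return examples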

Result

A clean, legally safe, and constantly improving data foundation for model training, evaluation, and retrieval.