
niyeo-n1-preview

System for training self-learning agents through natural language instructions.

During this preview, we are evaluating the performance of agents trained with n1-preview together with our academic partners and public researchers.

Capabilities

Known Limitations

Known limitations are still being actively investigated during the preview. We are studying the behaviours, capabilities, and limitations of n1-preview in order to produce a detailed report.

Use Cases

niyeo-n1-preview is intended for use in commercial applications and academic research. Any use that violates applicable laws or regulations (including trade compliance laws) is out of scope.

Safety

n1 is a foundational technology designed for a wide range of use cases and audiences. Through public testing and collaboration with academic partners, we are actively working on alignment and safety, and on reducing the categories of harm covered by industry-standard practice. Methods we are exploring include:

Responsible deployment of n1-preview is possible on the NiYEO platform, where agents operate in an environment with active safety monitoring.


Red Teaming

We run red teaming exercises to identify risks and improve safety. We work with experts in security, content moderation, and responsible AI to understand real-world harms. We also assess whether self-learning agents could help bad actors plan CBRNE (chemical, biological, radiological, nuclear, and explosive) attacks. We refine our approach based on community feedback to stay ahead of emerging risks.

Tests we are conducting in controlled environments include:

Preliminary Benchmarks

Evaluation of n1-preview against industry-standard benchmarks is currently in progress; the results in the table below are preliminary and not final.

Category     | Benchmark            | Metric                              | niyeo-n1-preview
             | MMLU Pro (CoT)       | macro_avg/acc                       | ~69
General      | MMLU (CoT)           | macro_avg/acc                       | ~80
Reasoning    | GPQA Diamond (CoT)   | acc                                 | ~50
Code         | HumanEval            | pass@1                              | ~89
Steerability | IFEval               |                                     | ~93
             | MBPP EvalPlus (base) | pass@1                              | ~86
Math         | MATH (CoT)           | sympy_intersection_score            | ~70
Tool Use     | BFCL v2              | overall_ast_summary/macro_avg/valid | ~78
Multilingual | MGSM                 | em                                  | ~92

Ethical Considerations

At Niyeo, our core values are accessibility, helpfulness, and alignment. Everything we do is driven by a commitment to serving humanity and the planet. Our mission is to make AI accessible to people everywhere, empowering innovation for everyone.

As with any new technology, there are risks associated with AI. While we conduct thorough testing, it is impossible to account for every scenario. Like the outputs of any AI system, the outputs of n1-preview cannot be fully predicted. Before deploying agents built with n1-preview, we strongly encourage developers to test and tune their agents for their specific use cases.