What is AI Safety?

AI Safety is a research field focused on ensuring that advanced artificial intelligence systems remain beneficial, aligned with human values, and under human control as they become more capable. It encompasses technical research areas like alignment, interpretability, and robustness, as well as governance considerations about how AI systems should be developed and deployed.

Why It Matters

As AI systems become more powerful and autonomous, they may act in unintended and potentially harmful ways if they are not carefully designed and controlled. The stakes are high: advanced AI could help solve humanity's greatest challenges, but it also poses significant risks if developed without adequate safety measures. The field aims to maximize these benefits while minimizing potential harms.

Key Risks & Challenges

  • Alignment Problem

    Ensuring AI systems pursue goals aligned with human values and intentions, even as they become more capable.

  • Interpretability

    Developing techniques to understand how AI systems make decisions and represent knowledge.

  • Robustness

    Creating systems that behave safely even when deployed in new environments or facing unexpected situations.

  • Power-seeking Behavior

    Preventing AI systems from developing instrumental goals that conflict with human welfare.

  • Coordination Challenges

    Ensuring that safety standards are maintained across all major AI development efforts globally.

Learn More About AI Safety

  • Alignment Forum (Technical)

    A forum dedicated to technical research in AI alignment, with papers and discussions from leading researchers.

  • LessWrong (Intermediate)

    A community blog focused on human rationality and the implications of artificial intelligence.

  • 80,000 Hours (Introductory)

    Career guidance for working on the world's most pressing problems, including AI safety.

  • Stampy's Wiki (Introductory)

    A collaborative wiki providing accessible explanations of AI alignment concepts.

Our Approach

Focus Areas

At BAISH (Buenos Aires AI Safety Hub), we focus on several key areas within AI safety research:

  • Mechanistic interpretability of neural networks
  • Alignment techniques for large language models
  • Robust training methodologies
  • Value learning and preference inference

Our Contribution

We contribute to the field through:

  • Supporting student research projects
  • Developing educational resources in Spanish
  • Building a regional community of AI safety researchers
  • Organizing workshops and training programs
  • Mentoring students interested in AI safety careers

Our Core Team

Eitan Sprejer

Co-founding Director

Luca De Leo

Co-founding Director

Lucas Vitali

Communications Director

Sergio Abriola, PhD

Advisor