🧠

Mech Interp Course

Starts June 2, 2025

The Mechanistic Interpretability Course is an intensive 1-month program focused on techniques for understanding the internal mechanisms of neural networks. This course combines theoretical learning with hands-on projects.

Mechanistic interpretability is a key area of AI safety research, aiming to make AI systems more transparent and understandable.

Curriculum Overview

  • Foundations of neural network architectures
  • Feature visualization techniques
  • Attribution methods for understanding network decisions
  • Advanced case studies from recent literature

Time Commitment

  • 2 lectures per week (2 hours each)
  • 1 practical session per week (3 hours)
  • Individual project work (5-10 hours per week)
  • Final project presentation

Prerequisites

  • Strong programming skills (Python)
  • Experience with deep learning frameworks (PyTorch preferred)
  • Familiarity with basic neural network architectures
  • Working knowledge of linear algebra and calculus

Course Details

  • Duration: 4 weeks
  • Start Date: June 2, 2025
  • End Date: June 27, 2025
  • Application Deadline: May 15, 2025
  • Location: Hybrid (In-person & Zoom)
  • Instructors: Dr. Laura Fernandez, Carlos Mendez
Express Interest
📚

AGI Safety Fundamentals Cohort

Currently Active

The AGI Safety Fundamentals cohort is a guided course covering the essential concepts in AI alignment and safety. Participants read selected materials and meet weekly to discuss the readings with a facilitator.

This program is based on the AGI Safety Fundamentals curriculum developed by BlueDot Impact and provides a structured introduction to the field of AI safety.

What to Expect

  • Weekly 2-hour discussion sessions
  • 1-3 hours of reading per week
  • Small groups of 6-12 participants
  • Experienced facilitators to guide discussions
  • Certificate of completion

Program Details

  • Duration: 10-12 weeks
  • Fellowship Period: August - December 2025
View Curriculum
💬

Weekly Discussion Group

Every Tuesday @ 18:00

Our Weekly Discussion Group provides a casual forum for discussing recent papers, concepts, and developments in AI safety. These sessions are open to anyone interested in the field, regardless of prior knowledge.

Each week features a different topic, announced in advance through our mailing list and Telegram group.

Format

  • 90-minute discussions led by a rotating facilitator
  • Short presentation of the week's topic (15-20 minutes)
  • Open discussion and Q&A
  • Optional pre-reading materials shared in advance

Participation

No registration is required. Simply show up! If you're attending for the first time, we recommend arriving 10 minutes early to meet the organizers.

Next Discussion

  • Date: March 25, 2025
  • Time: 18:00 - 19:30
  • Location: Pabellón 0+inf, Room 1604, Ciudad Universitaria
  • Topic: Interpretability Methods
  • Facilitator: Eitan Sprejer
Join Telegram for Updates
📝

Paper Reading Club

Every Friday @ 17:00

The Paper Reading Club conducts deep dives into foundational and recent papers in AI safety research. Unlike the more casual discussion group, this activity involves a thorough examination of specific research papers.

Participants are expected to read the selected paper in advance and come prepared to discuss its methods, results, and implications.

Paper Selection Criteria

  • Importance to the field of AI safety
  • Technical relevance to current research directions
  • Mix of foundational papers and recent publications
  • Accessibility to graduate and advanced undergraduate students

Session Format

  • Brief overview of the paper (5-10 minutes)
  • Section-by-section discussion
  • Examination of methods and results
  • Critical evaluation of claims and limitations
  • Discussion of potential follow-up research

Next Paper Session

  • Date: March 21, 2025
  • Time: 17:00 - 18:30
  • Location: Pabellón 0+inf, Room 1604, Ciudad Universitaria
  • Paper: "Mechanistic Interpretability for Language Models"
  • Discussion Lead: Eitan Sprejer
Access Reading List