<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ai | Math to Power Industry</title><link>https://m2pi.ca/keywords/ai/</link><atom:link href="https://m2pi.ca/keywords/ai/index.xml" rel="self" type="application/rss+xml"/><description>Ai</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>© 2025 Pacific Institute for the Mathematical Sciences</copyright><lastBuildDate>Tue, 24 Mar 2026 00:00:00 +0000</lastBuildDate><image><url>https://m2pi.ca/media/logo.svg</url><title>Ai</title><link>https://m2pi.ca/keywords/ai/</link></image><item><title>Hummingbird Bioscience</title><link>https://m2pi.ca/project/2026/hummingbird/</link><pubDate>Tue, 24 Mar 2026 00:00:00 +0000</pubDate><guid>https://m2pi.ca/project/2026/hummingbird/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="" srcset="
/project/2026/hummingbird/HummingBird_hu63188a33346aef26bb8c3c5517f9c1ba_21428_3a3ca834165371f007f8600db3103ad2.webp 400w,
/project/2026/hummingbird/HummingBird_hu63188a33346aef26bb8c3c5517f9c1ba_21428_513567795b78023d07d194ae6f526436.webp 760w,
/project/2026/hummingbird/HummingBird_hu63188a33346aef26bb8c3c5517f9c1ba_21428_1200x1200_fit_q90_h2_lanczos_3.webp 1200w"
src="https://m2pi.ca/project/2026/hummingbird/HummingBird_hu63188a33346aef26bb8c3c5517f9c1ba_21428_3a3ca834165371f007f8600db3103ad2.webp"
width="760"
height="305"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Recently, the quest for mathematical superintelligence has become a focal point in artificial intelligence. For example, Harmonic, the startup co-founded by Robinhood CEO Vlad Tenev, has reached a valuation exceeding $1 billion with its Aristotle theorem-proving tool. A key reason for this excitement is that mathematical reasoning differs fundamentally from other forms of reasoning: it is, by nature, airtight. AI systems were long expected to struggle in this domain, but recent advances suggest otherwise.&lt;/p>
&lt;p>Modern systems are increasingly capable of translating between informal mathematical language (as written in papers) and formal representations suitable for proof assistants such as Lean, Rocq, or Agda. A striking example is the recent overturning of a long-standing result in extended quantum field theory, a result cited hundreds of times over more than a decade. However, these systems demonstrate strong performance primarily on carefully selected, benchmark-style problems; their behavior outside these settings remains poorly understood.&lt;/p>
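&lt;p>As a sketch of what such a translation looks like (assuming Lean 4 with Mathlib; the statement and proof below are illustrative, not the output of any particular tool):&lt;/p>

```lean
import Mathlib

-- Informal statement, as it might appear in a paper:
--   "The sum of two even integers is even."
-- One faithful Lean 4 formalization, using Mathlib's `Even`:
theorem even_add_even (m n : ℤ) (hm : Even m) (hn : Even n) :
    Even (m + n) := by
  obtain ⟨a, ha⟩ := hm   -- ha : m = a + a
  obtain ⟨b, hb⟩ := hn   -- hb : n = b + b
  exact ⟨a + b, by rw [ha, hb]; ring⟩
```

&lt;p>Even this toy case leaves room for unfaithfulness: a translation might quantify over the naturals instead of the integers, or encode “even” with a nonequivalent definition.&lt;/p>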
&lt;p>In particular, while they can often verify closed-form results in isolation, they frequently struggle to correctly represent and validate the dependencies those results rely on. This creates a critical reliability gap: an output may appear correct locally while being globally inconsistent.&lt;/p>
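&lt;p>A minimal Lean 4 illustration of this local/global gap (hypothetical code, not a real tool's output): the final theorem type-checks step by step, yet the development as a whole establishes nothing, because a dependency was never proved.&lt;/p>

```lean
-- An unproved (and in fact false) dependency: `sorry` makes Lean
-- accept it, emitting only a warning.
theorem helper (n : ℕ) : n ≥ 1 := sorry

-- Locally this looks verified: every step type-checks against `helper`.
-- Globally it is unsound, since `0 ≥ 1` is false; `#print axioms bad`
-- would expose the hidden reliance on `sorryAx`.
theorem bad : 0 ≥ 1 := helper 0
```

&lt;p>Auto-formalization pipelines can introduce the same pattern less visibly, e.g. by citing a lemma whose formalized statement no longer matches the informal one it was meant to capture.&lt;/p>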
&lt;p>As an industry partner developing AI systems for mathematical reasoning, we are directly interested in understanding the limits of these auto-formalization tools. Without a systematic understanding of their failure modes, deploying such systems introduces substantial risk, and that risk grows as organizations begin to base significant financial and strategic decisions on their outputs.&lt;/p>
&lt;h3 id="project-objective">Project Objective&lt;/h3>
&lt;p>This project will investigate the robustness of auto-formalization systems by
identifying and characterizing their failure modes. Teams will:&lt;/p>
&lt;ul>
&lt;li>Explore how current systems translate informal mathematics into formal
representations&lt;/li>
&lt;li>Identify classes of problems where these systems perform well and where they
fail&lt;/li>
&lt;li>Develop strategies—such as adversarial search or evolutionary (genetic)
methods—to generate mathematical inputs that induce failure&lt;/li>
&lt;li>Analyze and categorize failure modes, with particular attention to dependency
structure and logical consistency&lt;/li>
&lt;/ul>
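&lt;p>The adversarial/evolutionary strategy in the third bullet can be sketched in a few lines of Python. Here &lt;code>formalizer_accepts&lt;/code> is a hypothetical stand-in for a real pipeline call (an LLM formalizer plus a proof-assistant check); its toy failure rule exists only to make the sketch self-contained.&lt;/p>

```python
import random

def formalizer_accepts(statement: str) -> bool:
    """Hypothetical stand-in for an auto-formalization pipeline. In practice
    this would formalize `statement` (e.g. into Lean) and check the result;
    here we pretend the tool mishandles deeply nested quantifiers."""
    return statement.count("forall") < 3

MUTATIONS = [
    lambda s: s.replace("<=", "<", 1),     # strengthen an inequality
    lambda s: "forall eps > 0, " + s,      # add a quantifier layer
    lambda s: s.replace("n", "n + 1", 1),  # shift an index
]

def evolve(seed: str, generations: int = 20, rng_seed: int = 0) -> list[str]:
    """Randomly mutate a seed statement; keep variants the tool fails on."""
    rng = random.Random(rng_seed)
    population, failures = [seed], []
    for _ in range(generations):
        child = rng.choice(MUTATIONS)(rng.choice(population))
        if formalizer_accepts(child):
            population.append(child)  # still handled: keep it as a parent
        else:
            failures.append(child)    # candidate failure case
    return failures

failures = evolve("forall n, n <= n + 1")
```

&lt;p>A real fitness function would be graded rather than binary, for instance the semantic distance between the informal statement and a back-translation of the formal output.&lt;/p>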
&lt;h3 id="deliverables">Deliverables&lt;/h3>
&lt;ul>
&lt;li>Challenging mathematical statements on which tools struggle or fail&lt;/li>
&lt;li>A taxonomy of observed failure modes&lt;/li>
&lt;li>Quantitative or qualitative metrics for evaluating system robustness&lt;/li>
&lt;li>Bonus: Recommendations for improving reliability in auto-formalization systems&lt;/li>
&lt;/ul>
&lt;h3 id="why-this-matters">Why This Matters&lt;/h3>
&lt;p>These systems can already produce convincing formal outputs. However, without
understanding when and how they fail, particularly in handling dependencies, their
use in research, verification, and high-stakes applications remains
fundamentally limited. This project aims to make that gap visible and measurable.&lt;/p>
&lt;h3 id="teams-may-consider-approaches-such-as">Teams may consider approaches such as:&lt;/h3>
&lt;ul>
&lt;li>Restricting to a specific domain (e.g., algebraic identities, inequalities,
combinatorics, measure theory, symplectic geometry, etc.)&lt;/li>
&lt;li>Designing perturbations of known theorems to test robustness&lt;/li>
&lt;li>Modeling the search space of candidate statements&lt;/li>
&lt;li>Using adversarial or evolutionary methods to discover failure cases&lt;/li>
&lt;/ul>
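&lt;p>The perturbation idea can be made concrete with a small generator. The string-encoded statement syntax and the rules below are illustrative assumptions, not a fixed format; the point is that each rule yields a near-miss of a known theorem that a faithful formalizer must translate as stated rather than silently “correct”.&lt;/p>

```python
# Each rule maps a theorem statement (informal, string-encoded for this toy
# example) to a nearby, and typically false, variant.
PERTURBATIONS = {
    "flip inequality": lambda s: s.replace("<=", ">="),
    "drop hypothesis": lambda s: s.split(" implies ", 1)[-1],
    "tweak constant":  lambda s: s.replace("/ 2", "/ 3"),
}

def perturb(statement: str) -> dict[str, str]:
    """Return every perturbed variant that actually differs from the input."""
    variants = {name: rule(statement) for name, rule in PERTURBATIONS.items()}
    return {name: v for name, v in variants.items() if v != statement}

# The AM-GM inequality, written informally:
am_gm = "0 <= a and 0 <= b implies sqrt(a * b) <= (a + b) / 2"
variants = perturb(am_gm)
```

&lt;p>A formalizer that, say, maps the flipped AM-GM back to the textbook direction is exhibiting exactly the failure mode this project targets.&lt;/p>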
&lt;p>Further, we will help teams bootstrap their experiments with both closed- and
open-source LLMs and auto-formalization tools, and will help set up tooling for
advanced search methods such as adversarial or genetic approaches.&lt;/p></description></item></channel></rss>