The discussion centered on the emergence of AI in offensive security, specifically through the lens of RunSybil, a company developing AI agents designed to automate "hacker intuition." Ari Herbert-Voss, formerly an early scientist at OpenAI, positioned the technology not as a wholesale replacement for human security professionals but as a bifurcation of capabilities: replacing routine penetration testing while augmenting mature red teams. The central thesis is that AI's primary advantage in offensive operations is not surpassing peak human ingenuity, but achieving a breadth and speed of coverage that human teams cannot physically match.
Core Value Proposition: Augmentation vs. Replacement
A critical distinction made early in the conversation was RunSybil's positioning in the market. Herbert-Voss argued that AI is best suited to replace standard penetration testing but should only serve to augment red teaming.
To illustrate this, he used an analogy:
Penetration Testing (The Pirates): Analogous to hiring pirates to see if they can steal your jewels. They test if doors are locked, try to drill through floors, and use brute force to find valuables.
Red Teaming (The Ninjas): Analogous to hiring ninjas to test if specific laser detection systems are working correctly. This is a mature function usually reserved for organizations that already have robust "blue teams" (defensive teams).
RunSybil views its AI agents as a replacement for the "pirate" work—the slow, monotonous, spreadsheet-based coverage testing that human hackers generally dislike. By automating this, they allow human red teams (the ninjas) to focus on high-creativity tasks rather than routine validation of authentication across every user role.
Technical Focus: The Prevalence of Auth Bugs
The dialogue highlighted RunSybil’s specific strength in discovering Authentication and Authorization (Auth) vulnerabilities, such as Insecure Direct Object References (IDOR). Herbert-Voss noted that while these are often the most critical severity bugs on bounty platforms, they are historically difficult for traditional scanners to find because they require understanding the "state" and context of an application.
While Large Language Models (LLMs) traditionally struggle with maintaining state, RunSybil has focused its engineering on allowing the agents to understand the context in which an asset exists (e.g., recognizing that a specific URL pattern implies a user should not have access to that asset). They prioritize these over older vulnerability classes like buffer overflows, which are less relevant to their modern, web-heavy customer base.
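The IDOR class described above can be made concrete with a minimal sketch (hypothetical code for illustration only, not RunSybil's implementation or a real customer's): the vulnerable handler returns any object a client-supplied ID points to, while the fixed handler adds the ownership check that traditional scanners struggle to know is missing, because knowing it requires application context.

```python
# Hypothetical illustration of an Insecure Direct Object Reference (IDOR).
# An in-memory store stands in for a real database.
DOCUMENTS = {
    101: {"owner": "alice", "body": "alice's tax return"},
    102: {"owner": "bob", "body": "bob's medical records"},
}

def get_document_vulnerable(requesting_user: str, doc_id: int) -> str:
    # BUG: any authenticated user can read any document by guessing its ID.
    # The handler trusts the client-supplied doc_id with no authorization check.
    return DOCUMENTS[doc_id]["body"]

def get_document_fixed(requesting_user: str, doc_id: int) -> str:
    # Fix: enforce an ownership (authorization) check before returning data.
    doc = DOCUMENTS[doc_id]
    if doc["owner"] != requesting_user:
        raise PermissionError("not authorized for this document")
    return doc["body"]
```

Spotting the bug requires understanding that document 102 "belongs" to bob, which is exactly the application-state context the conversation identifies as hard for stateless scanners.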
The Operational Impact: Remediation and Purple Teaming
The conversation touched on "Purple Teaming" (the integration of red and blue team feedback loops). RunSybil aims to close the loop on vulnerabilities quickly by handling the operational load of prioritization and providing clear reproduction steps.
Herbert-Voss noted a nuanced approach to remediation. Rather than offering prescriptive, automated fixes (which customers often reject due to lack of control), the AI provides context-dependent suggestions, allowing engineering teams to choose the best path forward. This approach positions the AI as a collaborative tool rather than just a vulnerability hose.
Competitive Advantage: Coverage as a Differentiator
Anton Chuvakin initially expressed skepticism regarding "yet another AI pen testing" tool. The turning point in his assessment came when discussing coverage. Herbert-Voss argued that the primary benefit of AI over humans is that humans cannot compete with AI on sheer breadth of coverage.
While a human red teamer might be highly skilled, they are time-constrained. AI can persistently test every facet of an attack surface 24/7. Herbert-Voss shared a case study involving a large AI infrastructure company with a mature bug bounty program. RunSybil’s agent discovered a Remote Code Execution (RCE) vulnerability that had been exposed to the internet for years, missed by human researchers. This validated the premise that AI’s persistence and thoroughness can uncover high-value bugs that humans simply miss due to fatigue or lack of bandwidth.
Future Outlook and Industry Advice
Looking forward, Herbert-Voss anticipates AI getting fast enough to compress security reviews that previously took a month into a week. This speed offers "pressure relief" for product security teams, allowing them to keep up with rapid development cycles without becoming a blocker.
In closing, Herbert-Voss advised organizations to only invest in red teaming (AI or human) after establishing a solid blue team baseline. There is zero value in paying expensive offensive talent to find vulnerabilities the organization already knows about but hasn't had time to fix.
Conversation Timeline
Introduction of AI in Security
Initial debate on whether AI should teach machines to hack humans.
Comparison of AI risks to historical technological advancements (e.g., transistor calculators).
The "Catch-22" of AI optimism versus pessimism regarding existential risks.
Defining RunSybil’s Mission
Introduction of Ari Herbert-Voss and his background at OpenAI.
RunSybil defined: Automating hacker intuition through three phases (Discovery, Testing, Remediation).
Differentiating Offensive Roles (Pirates vs. Ninjas)
Distinction drawn between Pen Testing (Pirates/broad value finding) and Red Teaming (Ninjas/specific control testing).
RunSybil positioned as a replacement for Pen Testing and an augmentation tool for Red Teaming.
Discussion on hacker psychology: preference for creative work over monotonous coverage tasks.
Technical Deep Dive: Auth Vulnerabilities
Focus on why AI excels at finding complex Authentication/Authorization bugs (e.g., IDOR) compared to traditional scanners.
Limitations of LLMs regarding "state" and how RunSybil addresses this for modern web assets.
Shift in focus away from legacy bugs (buffer overflows) toward modern infrastructure issues.
Operationalizing Findings (Blue/Purple Teaming)
The economics and rarity of true "Purple Teams" in the industry.
RunSybil’s approach to remediation: offering reproduction steps and flexible suggestions rather than prescriptive automated fixes.
The Strategic Value of AI: Speed and Coverage
The qualitative difference AI speed brings to development cycles (enabling reviews that were previously impossible due to time).
Key Customer Use Case: Pressure relief for overwhelmed product security teams.
Coverage identified as the primary unassailable advantage of AI over human testers.
Case Study Success
Example of RunSybil finding a long-standing RCE in a mature infrastructure that human bug bounty hunters missed for years.
Closing Thoughts
Advice on red team budgeting: Do not hire red teams until the blue team is mature enough to handle the findings.
Recommended reading: Countdown to Zero Day by Kim Zetter and Empire of AI by Karen Hao.