Enterprise AI Delivery: Built custom evaluation pipelines and model testing infrastructure for frontier AI labs, collaborating directly with product teams from scoping through deployment.
Mario Brajkovski
AI Research Engineer at HUD (Founding Team)
San Francisco / Munich · Remote
I build custom AI solutions for frontier labs, working directly with product and engineering teams to deliver production systems on tight timelines. Currently at HUD building evaluation infrastructure for computer-use agents. I've shipped full-stack AI products and delivered bespoke LLM evaluation pipelines for major AI companies.
Technical Highlights
Full-Stack Production: End-to-end AI applications with Next.js, TypeScript, PostgreSQL, and Stripe—320+ active users, sub-second latency, real-time WebRTC streaming.
Client Collaboration: Track record of on-time delivery while embedded with client engineering and PM teams on complex, deadline-driven AI projects.
RL & Evaluation Systems: Designed full-scale reinforcement learning environments for next-generation AI models, including reward hacking analysis and prevention.
Safety & Red-teaming: Identified failure modes, alignment issues, and adversarial vulnerabilities in autonomous agents through systematic red-teaming.
Now
Founding Engineer. Building evaluation infrastructure and RL environments for computer-use agents. Working embedded with frontier AI labs to deliver custom evaluation systems—scoping requirements directly with product managers, iterating on solutions, and shipping production-ready tooling on tight deadlines. YC-backed.
Client Engagements
Frontier Lab (2025): Designed and delivered custom LLM evaluation infrastructure for an unreleased model. Worked directly with product managers from requirements gathering through deployment. Delivered on schedule.
Frontier Lab (2025): Built full-scale RL environments for next-generation AI model training. Designed reward hacking analysis and prevention systems, red-teaming infrastructure, and alignment testing frameworks.
Projects
AI tutoring platform for the Macedonian curriculum serving 320+ students. Full-stack TypeScript with Next.js, PostgreSQL, Stripe integration, and custom LLM prompting for curriculum-aligned responses.
Voice-first AI companion with sub-second end-to-end latency. Custom WebRTC implementation with SDP negotiation, VAD, and a persistent memory system. Deployed on Firebase with real-time audio streaming.
Real-time tutoring with Live2D avatars and OpenAI Realtime API. Custom WebRTC audio session management, dynamic tool injection, and emotion-mapped avatar expressions. TypeScript throughout with Terraform IaC.
Before
DevOps. On-prem to AWS EKS migration, Crossplane.
Site Reliability. Kafka, Vault, observability stack.
Research
Demonstrated practical privacy-preserving ML using AMD SEV-SNP encrypted VMs with full memory encryption and attestation. Achieved <20% performance overhead on a 110M-parameter BERT model, showing a viable path for confidential AI in production environments.