Description
Location: San Francisco, Bay Area, Menlo Park
ASAP Talent Services is a leading I.T. executive search firm. Our recruiting team has been retained by a late-stage pre-IPO SaaS business to help scale their team near Menlo Park. If you'd like to grow with a company that is approaching $50M ARR and has raised over $500M in PE funding (from BlackRock and others), we need to talk!
Job Overview
As a Principal Architect, you will lead the design and evolution of intelligent, scalable systems for Multi-Agent Path Finding (MAPF). You’ll define architectural direction, integrate machine learning and reinforcement learning, and build the core capabilities required to achieve safe, efficient, and autonomous decision-making at scale.
Key Responsibilities
- Architect distributed systems that support real-time, machine-learning-driven path planning for multi-agent systems.
- Lead design and implementation of ML model pipelines, including data ingestion, training, validation, deployment, and monitoring.
- Own production deployment of ML/RL models using MLOps tools such as Vertex AI or similar platforms.
- Integrate ML pipelines with robotic orchestration engines to support continuous learning and adaptation.
- Collaborate closely with software, robotics, product, and operations teams to align system goals with real-world fulfillment challenges.
Required Qualifications
- Education: B.E./M.S. in Computer Science, AI/ML, or a related field.
- Experience: 12+ years of total experience, with 7+ years in system architecture and AI/ML, including 2+ years in technical leadership roles.
Technical Expertise
- Strong knowledge of path planning, graph search algorithms, and optimization techniques for multi-agent systems.
- Deep understanding of machine learning, deep learning, and reinforcement learning, with experience using TensorFlow or PyTorch.
- Proven experience in building, deploying, and maintaining ML models in production environments.
- Hands-on experience with MLOps, including CI/CD for models, pipeline orchestration, and model monitoring.
- Proficiency in Python. Familiarity with Erlang, Elixir, or other concurrency-first functional programming languages is a plus.
- Solid understanding of concurrency, parallelism, and real-time systems.
- Strong CS fundamentals including algorithms, operating systems, networking, memory management, and performance tuning.
- Experience with distributed systems, microservices, and containerized deployments using Docker and Kubernetes.
- Knowledge of event-driven architectures.
- Familiarity with cloud platforms such as Google Cloud Platform (GCP), particularly Vertex AI.
- (Nice to have) Contributions to open-source projects, research publications, or patents in MAPF, RL, or distributed AI systems.