Antonio Osorio
Verified Expert in Engineering
Engineering Management Developer
As a tech leader at Netflix, Amplitude, and Schrodinger, Antonio has driven key projects enhancing data processing, developer tools, and cloud infrastructure. His work includes modernizing Netflix's event telemetry handling, introducing Kubernetes at Amplitude, and scaling scientific simulations at Schrodinger. Focused on innovation, efficiency, and scalability, Antonio empowers teams to achieve technological excellence.
Preferred Environment
Python, Java, Apache Kafka, Big Data, Continuous Integration (CI), Spring Boot, Go
The most amazing...
...system I've contributed to is Netflix's event telemetry ingestion pipeline. This system handles 12 million events and 100GB of event data per second.
Work Experience
Senior Software Engineer
Netflix
- Handled Netflix's event telemetry ingestion pipeline, a system that processes 12 million events and 100GB of event data per second.
- Took charge of migrating the playback-related event telemetry from a legacy service based on synchronous gRPC calls to a new service based on asynchronous Kafka messages.
- Created the GraphQL back end that exposed developer tooling to our federated graph edge. The new back end powers the "console" as a single pane of glass and entry point for all developer tools at Netflix.
- Revamped the internal tool used to create new software projects. The tool generates projects that follow the "paved path" best practices, including continuous integration and deployment, observability, logging, and production readiness.
Head of Cloud Engineering
Amplitude
- Led the cloud engineering team. The team's key responsibilities included increasing developer efficiency and looking after the scalability, stability, and security of our production infrastructure.
- Scaled the team from one to three full-time engineers and two contractors.
- Refactored the Terraform (infrastructure as code) codebase to make it easier for developer teams to use, improve its modularity, and reduce the blast radius of unintended changes.
- Introduced a log aggregation tool (Datadog) and drove developer education and usability, resulting in log aggregation becoming an integral part of our monitoring strategy and incident response procedure.
- Introduced self-serve developer staging environments, where developers could deploy development versions of our application to test and share before merging into master.
- Introduced Kubernetes as a container orchestrator; our first production service was launched in March 2019. Drove the operationalization, monitoring, and stability strategy for Kubernetes deployments.
- Contributed to the ingestion system, which handles around 200,000 events per second, and to our query engine, which queries 2PB of data in under 12 seconds (P95).
- Introduced the concept of SLOs and drove the adoption of SLAs, SLOs, and error budgets as a common language to balance development velocity with stability.
- Negotiated contracts with Threat Stack and Datadog, which resulted in a 40% and 30% reduction in estimated costs, respectively, while keeping full functionality.
- Led the technical aspects of obtaining SOC 2 certification and becoming an AWS Competency Partner.
Senior Software Engineer
Amplitude
- Utilized Spinnaker to improve our delivery strategy and replace custom scripts orchestrating Salt deployments.
- Introduced Vault for secret management, developed Python and Java clients, and migrated key services.
- Implemented automated testing for pull requests, so test breakages could be detected before merging code into master.
- Created a framework for running tests in isolated environments using Docker Compose. Previously, tests were run using external resources, resulting in flaky tests and tests that had to be run serially.
Senior Software Engineer
Schrodinger, Inc.
- Led the development of a scalable remote execution server, TaskEngine, used by our flagship LiveDesign and FEP products to run scientific simulations asynchronously (Django and Celery-based).
- Developed Spinner, a cloud-agnostic deployment tool that enabled the configuration and deployment of full LiveDesign stacks in less than 10 minutes.
- Led the development of LD Admin, a data analysis and configuration tool designed to perform advanced configuration and gather usage statistics from LiveDesign servers.
- Maintained and supported continuous integration, testing, and deployment infrastructure.
- Tested, packaged, published, and deployed enterprise Python products.
- Developed cookbooks to automate deployment tasks using Chef.
Office of Technology Transfer - Fellow
University of Michigan
- Reviewed technologies presented for evaluation and determined potential applications and markets.
- Identified technological and legal challenges for the commercialization of these new technologies.
- Held this work-study position while pursuing a doctorate in the University of Michigan Materials Science and Engineering Department.
Graduate Research Assistant
University of Michigan
- Implemented statistical mechanics-based computational methods to simulate systems far from equilibrium.
- Designed and developed simulation software for high-performance computers (MPI and OpenMP) and general-purpose graphics processing units (GPUs, using CUDA).
- Designed the user interface for our simulation software, HOOMD, using C++ and Python.
Storage Support Senior Analyst
Dell, Inc.
- Troubleshot, deployed, and validated advanced EMC CX-series solutions on switched fabrics and AX-series solutions on Fibre Channel and iSCSI configurations.
- Corrected storage configurations for EMC and PowerVault solutions in SAN and NAS environments.
- Troubleshot and deployed Windows and Linux-based PowerEdge servers.
- Resolved network-related issues in PowerConnect switches.
Experience
Amplitude's DevOps Journey
https://www.youtube.com/watch?v=ID5Qtk6TTSw
This is a talk about the lessons we learned building a healthy DevOps culture, the role of the DevOps team, the tools we lean on, and the challenges ahead. It offers a snapshot of the evolution of software development at Amplitude during my tenure as head of cloud engineering.
LiveDesign
https://www.schrodinger.com/livedesign
A project team can generate new ideas far more quickly than it can record and analyze each idea; also, great ideas can happen anytime, not just during regularly scheduled meetings. Every dropped idea is a missed opportunity to find a path forward through the complex maze of drug discovery.
LiveDesign allows every idea to be captured, shared, analyzed, and prioritized—leading to a fuller exploration of the chemical space while facilitating better communication across the different functional groups. Any project team member can enter an idea and instantly get feedback using computational models to help further refine the design—resulting in real-time collaboration across time and location barriers.
My primary responsibility was designing, developing, and maintaining the infrastructure that runs the scientific simulations to inform design decisions.
FEP+ on the Cloud
https://www.schrodinger.com/science-articles/free-energy-methods-fep
Seeing this unmet need, we embarked on a multiyear research project to develop a new free energy calculation technology (FEP+). Our objective was to provide a rigorous approach for computing binding free energies that offers significant value to industrial drug discovery efforts. We are pleased to report that, after utilizing the FEP+ technology on seven different active drug discovery collaborations over the past year, we now have firm evidence that the free energy approach developed in FEP+ can facilitate better synthesis decisions during lead optimization.
My responsibility was to build the initial cloud proof of concept, automatically scaling FEP+ simulations in AWS while minimizing infrastructure costs.
Skills
Languages
Python, Java, Go, GraphQL, C++, SQL, JavaScript
Frameworks
Spring Boot, Django, Django REST Framework
Paradigms
DevOps, Test-driven Development (TDD), Agile Software Development, Continuous Integration (CI), Continuous Deployment
Platforms
Amazon EC2, Spinnaker, Linux, Apache Kafka, Docker, macOS, Amazon Web Services (AWS), NVIDIA CUDA, AWS IoT, Kubernetes
Other
CI/CD Pipelines, Stream Processing, Technology Transfer, Networking, Technical Support, Productivity, Engineering Management, Infrastructure as Code (IaC), Contract Negotiation, Big Data, Site Reliability Engineering (SRE), Materials Science, GPU Computing, Statistical Analysis, Physics, Simulations, Markov Chain Monte Carlo (MCMC) Algorithms, Medical Devices, Electronics, Microprocessors, Physics Simulations
Tools
Chef, Jira, PyCharm, Dell EMC, Amazon Virtual Private Cloud (VPC), Celery, Jenkins, RabbitMQ, Boto, GitHub, Git, Terraform
Storage
Data Pipelines, Storage Area Networks (SAN), MongoDB, PostgreSQL, Amazon S3 (AWS S3), Datadog
Libraries/APIs
MPI, OpenMP, Tastypie
Education
Ph.D. in Materials Science and Engineering, Computational Emphasis
University of Michigan - Ann Arbor, Michigan, USA
Bachelor of Science in Electrical Engineering
University of Texas - Austin, Texas, USA
Certifications
AWS Certified Solutions Architect
Amazon Web Services