Staff Software Engineer (Full-time) | 2015 - 2018 | Lyft, Inc.
Technologies: Hadoop, Apache Flink, Apache Kafka, AWS Cloud Architecture, AWS DynamoDB, AWS Kinesis, AWS CloudWatch, Redshift, AWS S3, Amazon SQS, AWS Lambda, AWS EC2, Python
- Worked as the tech lead and architect on the streaming platform team; also drove its vision and strategy.
- Built Lyft’s real-time event ingestion and pub/sub infrastructure, which moves more than 200 billion events every day.
- Developed Lyft’s highly scalable and reliable message bus, which hundreds of internal microservices use to communicate with each other asynchronously.
- Maintained multiple tier-0 services with five-nines (99.999%) reliability SLAs.
- Trained and mentored dozens of other engineers.
Principal Member of Technical Staff (Full-time) | 2014 - 2015 | Salesforce.com
Technologies: Apache Lucene, Apache Solr, Java
- Developed several relevancy features by customizing Apache Lucene’s scoring framework for Salesforce’s needs.
- Implemented infrastructure for runtime feature extraction to train an ML-based ranker and integrate it into Apache Solr’s query processing pipeline.
- Designed the search infrastructure to scale out Salesforce search’s static rank feature to 100% of documents (currently only partially enabled due to infrastructure limitations).
Senior Software Engineer (Full-time) | 2005 - 2014 | Microsoft (Bing Search)
Technologies: Machine Learning, Apache Hive, Hadoop, Microsoft SQL Server, C#, .NET
- Led a team of engineers to develop scalable infrastructure for a distributed web crawler and content extraction platform, enabling it to crawl hundreds of millions of web documents every day from hundreds of websites (like Amazon.com, Imdb.com, Walmart.com) and parse them to extract structured content for enriching Bing’s search index.
- Received a Microsoft Gold Star Award for the above project.
- Developed a log mining platform to enrich a local search index; it algorithmically discovers URLs and search keywords associated with local businesses (restaurants, hotels, banks, etc.) by mining petabytes of search result click logs. The platform is being used in more than 20 Bing markets to enrich the local search index and cut down the URL coverage gap with Google.
- Worked as both technical lead and individual contributor to evolve a machine learning-based text classification framework (originally conceived by Microsoft Research) into a classification platform and integrate it with the local data pipeline.
- For the above project, developed a process to train, evaluate, and consume statistical models that classify hundreds of millions of local businesses around the world into a taxonomy of more than 1,000 categories.
- Managed (from a tech-lead standpoint) the day-to-day maintenance and operations of a local data ingestion/processing pipeline that feeds the index of the Bing local search engine.
- Worked on the back-end data acquisition/processing pipeline for Bing Entertainment search (music, movies, TV shows, and more).
Professional Services Consultant (Full-time) | 2005 - 2006 | Teradata Corporation
Technologies: Teradata, Java, SQL
- Developed an automated ETL framework for DHL (a Teradata customer) to ingest data from multiple heterogeneous sources and integrate it into an enterprise data warehouse.
- Led a team of four developers on the Eircom Metadata-driven ETL Tool project, which aimed to develop generic parsing and transformation engines for extracting data from more than 50 different semi-structured CDR formats (Eircom is Ireland’s leading telecommunications operator).
- Conducted Teradata training sessions and data warehouse workshops for new hires.