Verified Expert in Engineering
Web Scraping Developer
Hafiz is a seasoned software architect who has led complex software projects for the last 12 years in full-time roles at organizations like Bing (Microsoft), Lyft, and Salesforce.com. He is now pursuing a freelancing career. His areas of expertise are back-end/server development, databases, big data, cloud computing, DevOps, web crawling, and search engines.
Git, Linux, MacOS
The most amazing...
...thing I've built was a real-time streaming data pipeline at Lyft. I built a web crawler to scrape 1 billion pages every day at Bing.com.
Staff Software Engineer (Full-time)
- Worked as the tech lead and architect on the streaming platform team; also drove the vision and strategy.
- Built the real-time events ingestion and pub/sub infrastructure for Lyft that ingests/moves more than 200 billion events every day.
- Developed the highly scalable and reliable message bus at Lyft, which is used by hundreds of internal microservices to communicate with each other asynchronously.
- Maintained multiple tier-0 services with five nines of reliability guarantees/SLA.
- Trained and mentored dozens of other engineers.
Principal Member of Technical Staff (Full-time)
- Developed several relevancy features which involved customizing Apache Lucene’s scoring framework for Salesforce’s needs.
- Implemented infrastructure work to enable runtime feature extraction for the training of an ML-based ranker and its integration into Apache Solr’s query processing pipeline.
- Designed the search infrastructure to scale out Salesforce search’s static rank feature to 100% of documents (currently only partially enabled due to infrastructure limitations).
Senior Software Engineer (Full-time)
Microsoft (Bing Search)
- Led a team of engineers to develop scalable infrastructure for a distributed web crawler and content extraction platform, enabling it to crawl hundreds of millions of web documents every day from hundreds of websites (like Amazon.com, Imdb.com, Walmart.com) and parse them to extract structured content for enriching Bing’s search index.
- Received a Microsoft Gold Star Award for the above project.
- Developed a log mining platform to enrich a local search index; it algorithmically discovers URLs and search keywords associated with local businesses (restaurants, hotels, banks, etc.) by mining search result click logs (petabytes of data). The platform is being used in more than 20 Bing markets to enrich the local search index and cut down the URL coverage gap with Google.
- Worked both as the technical lead and as an individual contributor to enhance and evolve a machine learning-based text classification framework (originally conceived by Microsoft Research) into a classification platform and integrate it with the local data pipeline.
- Developed a process, as part of the above project, to train, evaluate, and consume statistical models that classify hundreds of millions of local businesses around the world into a taxonomy of more than 1,000 categories.
- Managed (from a tech-lead standpoint) the day-to-day maintenance and operations of a local data ingestion/processing pipeline that feeds into the index of the Bing local search engine.
- Worked on the back-end data acquisition/processing pipeline for Bing Entertainment search (music, movies, TV shows, and more).
Professional Services Consultant (Full-time)
- Developed an automated ETL framework for DHL (a Teradata customer) to ingest data from multiple heterogeneous sources and integrate it into an enterprise data warehouse.
- Led a team of four developers on the Eircom Metadata-driven ETL Tool project, which aimed to develop generic parsing and transformation engines for data extraction from more than 50 different semi-structured CDR formats. (Eircom is Ireland’s leading telecommunication operator.)
- Conducted Teradata trainings and data warehouse workshops for new hires.
Microsoft (Bing Search)
Technologies: C#/.NET, Microsoft SQL Server, Hadoop/Hive, Machine Learning
Hadoop, Scrapy, Flask, .NET, Django
Amazon Simple Queue Service (SQS), Amazon CloudWatch, Zapier, Apache Solr, Git, Flink
DevOps, ETL, Agile Software Development
AWS Lambda, Amazon EC2, Amazon Web Services (AWS), Apache Kafka, Apache Flink, MacOS, Linux
Amazon DynamoDB, PostgreSQL, Redshift, Amazon S3 (AWS S3), Databases, Teradata, SQL Server 2010, Apache Hive, Elasticsearch, Microsoft SQL Server
Data Warehouse Design, Web Scraping, Data Warehousing, Amazon Kinesis Data Firehose, Big Data, Amazon Kinesis, Big Data Architecture, Stream Processing, Large Scale Distributed Systems, Pub/Sub, Machine Learning, Search Engine Development, Information Retrieval, Data Modeling, Text Classification, AWS Cloud Architecture
Master's Degree in Computer Science and Engineering
University of Washington - Seattle, WA, USA
Bachelor's Degree in Computer Science
FAST | National University of Computer and Emerging Sciences - Islamabad, Pakistan
Teradata Certified Master