Data Science Consultant | 2019 - Present | Brookings Institution India Center
Technologies: Python, Pandas, AWS Redshift, AWS Lambda, scikit-learn, AWS S3, AWS EC2, AWS RDS, DigitalOcean Droplets, AWS CloudFront, AWS Route 53, BeautifulSoup, Selenium
- Created a data warehouse in Redshift with 1-minute-resolution demand data for a large Indian state using Python, Pandas, and EC2. The source data was roughly 6 TB of column-formatted .xls files compressed as .rar archives.
- Built a real-time carbon emissions tracker for India at carbontracker.in using Vue.js and Plotly, hosted on AWS S3 with Route 53 and CloudFront. Featured in the Wall Street Journal (https://www.wsj.com/articles/solar-power-is-beginning-to-eclipse-fossil-fuels-11581964338?mod=hp_lead_pos5).
- Created an API for the carbon emissions tracker using AWS Lambda, AWS API Gateway, Python, and an AWS RDS MySQL instance to serve real-time generation data along with summary statistics.
- Scraped data for the carbon tracker using Python, BeautifulSoup, and a DigitalOcean Droplet, storing it in the RDS instance backing the Lambda API.
- Created a machine learning model using scikit-learn, Python, and Pandas to predict daily electricity demand for a large Indian state, trained on data from the Redshift warehouse.
- Created Python scripts to scrape housing data from various Indian state government websites using Selenium and Pandas.
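The scrape-and-parse step behind the carbon tracker can be sketched as below; the HTML structure, source names, and column names are illustrative placeholders, not the real upstream feed:

```python
# Hypothetical sketch: parse a per-source generation table into a DataFrame.
# The table layout below is a placeholder, not the actual data source.
import pandas as pd
from bs4 import BeautifulSoup

def parse_generation_table(html: str) -> pd.DataFrame:
    """Extract (source, generation_mw) rows from an HTML generation table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("table tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 2:  # expect exactly (source, generation in MW)
            rows.append({"source": cells[0], "generation_mw": float(cells[1])})
    return pd.DataFrame(rows)

sample_html = """
<table>
  <tr><td>Thermal</td><td>1200.5</td></tr>
  <tr><td>Solar</td><td>300.0</td></tr>
</table>
"""
df = parse_generation_table(sample_html)
```

In production a loop like this would run on a schedule (e.g. cron on the Droplet) and write each parsed frame into the RDS MySQL instance.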
Scraping Engineer | 2019 - 2020 | Tether Energy
- Wrote Bash and SQL scripts that ran on a cron job to download data from the New York ISO website and upload it to Tether's data warehouse using Presto and Hive.
- Developed Python scripts using Tabula and Pandas to scrape data from multiple formats of PDF electricity bills and upload it to an internal service.
- Implemented a robust regression testing framework using Pytest to ensure PDF bills were scraped correctly.
- Augmented an internal API by adding new endpoints and models using Ruby on Rails.
- Improved an internal cron service by adding JSON-defined schedules on which jobs could run.
- Added documentation on how to set up and test various internal services locally.
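A common shape for the PDF regression testing mentioned above is a golden-file check: parse each fixture bill and compare it to a stored known-good result. This is a minimal sketch under that assumption; `parse_bill` and its field names are stand-ins for the real Tabula-based extraction:

```python
# Hypothetical golden-file regression check for PDF bill scraping.
import json
from pathlib import Path

def parse_bill(pdf_path: str) -> dict:
    # Placeholder for the real extraction, which pulled tables out of the
    # PDF with tabula-py and normalized them with pandas.
    return {"account": "A-1001", "kwh": 910, "total_due": 142.5}

def matches_golden(pdf_path: str, golden_path: str) -> bool:
    """Return True if freshly parsed output equals the stored known-good JSON."""
    return parse_bill(pdf_path) == json.loads(Path(golden_path).read_text())
```

Under Pytest, each fixture PDF would get a parametrized test asserting `matches_golden(pdf, golden)`, so any change to the parser that alters output fails immediately.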
Senior Software Engineer | 2014 - 2018 | AutoGrid Systems, Inc.
Technologies: Ruby on Rails, Python, Spark, Hive, HBase, Redis, Resque, Celery, RabbitMQ, Kafka, CDH, Yarn, Kubernetes, Docker
- Led an engineering team both on- and off-shore and drove on-time development and deployment of product features using Agile.
- Implemented several features across AutoGrid's suite of applications using Ruby on Rails, MySQL, RSpec, Cucumber, Python, and Nose.
- Created PySpark jobs to aggregate daily and monthly electricity usage reports for viewing through AutoGrid's customer portal using HBase, Redis, and RabbitMQ.
- Designed and developed a customer-facing data warehouse using Hive, HBase, and Oozie, which replaced all custom in-house visualizations built by AutoGrid.
- Built an API endpoint to allow end-users to opt out of Demand Response events via SMS using Ruby on Rails and Twilio.
- Optimized SQL queries to run 40% faster, noticeably reducing load times in the UI.
- Designed a messaging microservice to send and track emails, SMS, and phone calls via Twilio and SendGrid.
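The daily usage aggregation above ran as a PySpark job against HBase in production; the same roll-up logic can be sketched in pandas (meter IDs, interval length, and column names are illustrative):

```python
import pandas as pd

# Hypothetical interval readings: one row per meter per 15-minute interval.
readings = pd.DataFrame({
    "meter_id": ["m1", "m1", "m2", "m2"],
    "ts": pd.to_datetime([
        "2017-06-01 00:00", "2017-06-01 00:15",
        "2017-06-01 00:00", "2017-06-02 00:00",
    ]),
    "kwh": [1.2, 0.8, 2.0, 3.0],
})

# Roll interval readings up to total daily usage per meter, as the
# report job did before writing results back for the customer portal.
daily = (
    readings.assign(day=readings["ts"].dt.date)
    .groupby(["meter_id", "day"], as_index=False)["kwh"]
    .sum()
)
```

In Spark the same shape is a `groupBy("meter_id", "day").sum("kwh")` over the full interval dataset, with the monthly report applying the identical pattern at a coarser grain.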