Utkarsh Dalal

Big Data Developer in Mumbai, Maharashtra, India

Member since February 3, 2019
Utkarsh currently works as a data science consultant at the Brookings Institution's India Center, in the Energy and Sustainability vertical, where he uses data to make policy recommendations. Before that, he worked at AutoGrid, a Silicon Valley cleantech startup. He holds a degree in computer science and political economy from UC Berkeley, has extensive experience with Python, Rails, and big data, and loves fun projects with social impact!

Location

Mumbai, Maharashtra, India

Availability

Full-time

Preferred Environment

Linux, macOS, GitHub, PyCharm, RubyMine

The most amazing...

...project I've worked on is a live carbon emissions tracker for India, now used by the Indian government for planning. It's at carbontracker.in.

Employment

  • Data Science Consultant

    2019 - PRESENT
    Brookings Institution India Center
    • Created a data warehouse in Redshift with 1-minute-resolution demand data for a large Indian state using Python, Pandas, and EC2. The source data was about 6 TB of column-formatted .xls files compressed as .rar archives.
    • Built a real-time carbon emissions tracker for India at carbontracker.in using Vue.js and Plotly, with AWS S3, Route 53, and CloudFront for hosting. Featured in the Wall Street Journal (https://www.wsj.com/articles/solar-power-is-beginning-to-eclipse-fossil-fuels-11581964338?mod=hp_lead_pos5).
    • Created an API for the carbon emissions tracker using AWS Lambda, AWS API Gateway, Python, and an AWS RDS MySQL instance to serve real-time generation data and various statistics (a minimal handler sketch follows this role's technology list).
    • Scraped data for the carbon tracker using Python, BeautifulSoup, and a DigitalOcean Droplet, storing it in the RDS instance used by the Lambda API.
    • Created a machine learning model using Scikit-learn, Python, and Pandas to predict daily electricity demand for a large Indian state, trained on data from the Redshift warehouse.
    • Created Python scripts to scrape housing data from various Indian state government websites using Selenium and Pandas.
    Technologies: Python, Pandas, AWS Redshift, AWS Lambda, Scikit-learn, AWS S3, AWS EC2, AWS RDS, DigitalOcean Droplets, AWS CloudFront, AWS Route 53, BeautifulSoup, Selenium
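
    A minimal sketch of how such a Lambda handler might look, assuming PyMySQL is bundled with the deployment package; the generation table schema, environment variables, and column names below are illustrative, not the production code:

        import json
        import os

        import pymysql  # assumed to be packaged with the Lambda deployment artifact


        def lambda_handler(event, context):
            """Return recent generation readings as JSON via API Gateway."""
            conn = pymysql.connect(
                host=os.environ["DB_HOST"],
                user=os.environ["DB_USER"],
                password=os.environ["DB_PASSWORD"],
                database=os.environ["DB_NAME"],
                cursorclass=pymysql.cursors.DictCursor,
            )
            try:
                with conn.cursor() as cur:
                    # Hypothetical table holding scraped per-fuel generation data.
                    cur.execute(
                        "SELECT fuel_type, generation_mw, recorded_at "
                        "FROM generation ORDER BY recorded_at DESC LIMIT 100"
                    )
                    rows = cur.fetchall()
            finally:
                conn.close()
            return {
                "statusCode": 200,
                "headers": {"Content-Type": "application/json"},
                "body": json.dumps(rows, default=str),
            }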
  • Scraping Engineer

    2019 - 2020
    Tether Energy
    • Wrote Bash and SQL scripts that ran on a cron job to download data from the New York ISO website and upload it to Tether's data warehouse using Presto and Hive.
    • Created scripts to automatically fetch electricity bills for Brazilian consumers and upload them to an S3 bucket using JavaScript, Puppeteer, and AWS.
    • Automated the solving of reCAPTCHAs using JavaScript and 2Captcha.
    • Developed Python scripts to scrape data from various formats of PDF electricity bills and then upload them to an internal service using Tabula and Pandas.
    • Implemented a robust regression testing framework using Pytest to ensure that PDFs were scraped correctly (a sketch follows this role's technology list).
    • Augmented an internal API by adding new endpoints and models using Ruby on Rails.
    • Improved an internal cron service by adding a JSON schedule that methods could run on.
    • Added documentation on how to set up and test various internal services locally.
    Technologies: Python, SQL, Tabula, JavaScript, Node.js, Puppeteer, Ruby on Rails, Pandas, Presto, Hive
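
    A sketch of what such a regression suite could look like: each sample PDF is paired with a reviewed "golden" CSV, and pytest parametrizes over the fixtures. The fixture layout and the parse_bill() helper are hypothetical:

        from pathlib import Path

        import pandas as pd
        import pytest

        from billparser import parse_bill  # hypothetical internal parsing module

        FIXTURE_DIR = Path(__file__).parent / "fixtures"
        CASES = sorted(FIXTURE_DIR.glob("*.pdf"))  # one golden CSV per sample PDF


        @pytest.mark.parametrize("pdf_path", CASES, ids=lambda p: p.name)
        def test_bill_matches_golden_csv(pdf_path):
            expected = pd.read_csv(pdf_path.with_suffix(".csv"))
            actual = parse_bill(pdf_path)  # assumed to return a DataFrame of line items
            pd.testing.assert_frame_equal(
                actual.reset_index(drop=True),
                expected.reset_index(drop=True),
                check_dtype=False,  # tolerate int/float drift between runs
            )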
  • Senior Software Engineer

    2014 - 2018
    AutoGrid Systems, Inc.
    • Led an engineering team both on- and off-shore and drove on-time development and deployment of product features using Agile.
    • Implemented several features across AutoGrid's suite of applications using Ruby on Rails, MySQL, RSpec, Cucumber, Python, and Nose.
    • Created PySpark jobs to aggregate daily and monthly electricity usage reports for viewing through AutoGrid's customer portal, using HBase, Redis, and RabbitMQ (an aggregation sketch follows this role's technology list).
    • Designed and developed a customer-facing data warehouse using Hive, HBase, and Oozie, which replaced all of AutoGrid's custom in-house visualizations.
    • Built an API endpoint to allow end-users to opt out of Demand Response events via SMS using Ruby on Rails and Twilio.
    • Optimized SQL queries to run in 40% less time, making page loads in the UI noticeably faster.
    • Designed a messaging microservice to send and track emails, SMS, and phone calls via Twilio and SendGrid.
    Technologies: Ruby on Rails, Python, Spark, Hive, HBase, Redis, Resque, Celery, RabbitMQ, Kafka, CDH, Yarn, Kubernetes, Docker
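
    An illustrative PySpark job of this shape, aggregating interval meter reads into daily usage; the input path, column names, and Parquet layout are assumptions, not AutoGrid's actual schema:

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("daily-usage-report").getOrCreate()

        # Hypothetical source: one row per meter read with a timestamp and kWh value.
        reads = spark.read.parquet("s3://example-bucket/meter_reads/")

        daily = (
            reads
            .withColumn("day", F.to_date("read_at"))
            .groupBy("meter_id", "day")
            .agg(F.sum("kwh").alias("kwh_total"))
        )

        # The customer portal could then read this partitioned output for reports.
        daily.write.mode("overwrite").partitionBy("day").parquet(
            "s3://example-bucket/reports/daily_usage/"
        )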

Experience

  • Brookings India Electricity and Carbon Tracker (Development)
    http://carbontracker.in

    Created a near real-time electricity and carbon tracker for India, used for policy analysis. Data is continuously scraped from meritindia.in (as sketched below), stored in an AWS RDS instance, and served to the website via an API built on AWS Lambda. The front end uses Vue.js and plotly.js. The project was featured in the Wall Street Journal (https://www.wsj.com/articles/solar-power-is-beginning-to-eclipse-fossil-fuels-11581964338?mod=hp_lead_pos5).
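
    A hedged sketch of the scraping loop; the markup of meritindia.in (the CSS selectors and labels below) is assumed for illustration and would need verifying against the live page:

        import os

        import pymysql
        import requests
        from bs4 import BeautifulSoup

        URL = "http://meritindia.in"


        def scrape_generation():
            """Fetch the page and return (fuel_type, generation_mw) pairs."""
            resp = requests.get(URL, timeout=30)
            resp.raise_for_status()
            soup = BeautifulSoup(resp.text, "html.parser")
            readings = []
            # Hypothetical markup: one labelled value block per fuel type.
            for block in soup.select("div.gen-block"):
                fuel = block.select_one(".fuel-name").get_text(strip=True)
                mw = float(block.select_one(".value").get_text(strip=True).replace(",", ""))
                readings.append((fuel, mw))
            return readings


        def store(readings):
            """Insert the readings into the RDS MySQL instance behind the API."""
            conn = pymysql.connect(
                host=os.environ["DB_HOST"],
                user=os.environ["DB_USER"],
                password=os.environ["DB_PASSWORD"],
                database=os.environ["DB_NAME"],
            )
            try:
                with conn.cursor() as cur:
                    cur.executemany(
                        "INSERT INTO generation (fuel_type, generation_mw) VALUES (%s, %s)",
                        readings,
                    )
                conn.commit()
            finally:
                conn.close()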

  • Democrafy (Development)

    Created and launched an Android app to make governance more accountable by allowing users to post about issues they face, view issues posted by other users, and hold elected officials accountable for their actions.

    This uses a Ruby on Rails back end hosted on Heroku with MongoDB as the database.

  • Bombay Food Blog (Development)
    https://www.instagram.com/bombay_food_blog/

    Created a fully automated Instagram page that scrapes and reposts photos of food from Mumbai, crediting the source. Trained a neural network using transfer learning to classify photos of food (sketched below), and trained a random forest to predict which users are likely to follow the page. Everything is hosted on an EC2 instance.
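
    A sketch of the transfer-learning step, assuming a frozen ImageNet-pretrained MobileNetV2 base with a small binary head; the base model choice, image size, and directory layout are assumptions:

        from tensorflow import keras

        IMG_SIZE = (224, 224)

        # Expects data/train/food and data/train/not_food subfolders of images.
        train_ds = keras.utils.image_dataset_from_directory(
            "data/train", image_size=IMG_SIZE, batch_size=32
        )

        base = keras.applications.MobileNetV2(
            input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet"
        )
        base.trainable = False  # freeze the pretrained feature extractor

        model = keras.Sequential([
            keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
            base,
            keras.layers.GlobalAveragePooling2D(),
            keras.layers.Dense(1, activation="sigmoid"),  # food vs. not-food
        ])

        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        model.fit(train_ds, epochs=5)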

  • Haryana Medical Scraper (Development)

    Built a scraper using Python, Pandas, and Tabula to extract data about doctors in the Indian state of Haryana from PDFs and graph various statistics about them (see the sketch below).
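
    A minimal sketch using tabula-py; the input file name and the District column are illustrative:

        import pandas as pd
        import tabula

        # read_pdf returns a list of DataFrames, one per table detected in the PDF.
        tables = tabula.read_pdf("haryana_doctors.pdf", pages="all", lattice=True)
        doctors = pd.concat(tables, ignore_index=True)

        # Example statistic: registered doctors per district (column name assumed).
        print(doctors.groupby("District").size().sort_values(ascending=False))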

  • CustomJob (Development)

    Created an Android app to connect buyers with local sellers of items. The buyers can post for items they would like to have made, and sellers can bid on the price they would charge to make the item. Used Parse as the database and for authentication.

  • WhatsApp Fact Checker (Development)

    Created a WhatsApp bot that crowdsources reports of fake news and works with text, photos, videos, and audio, built with Python, AWS Lambda, DynamoDB, API Gateway, Docker, ECR, ECS, and S3.
    Users can forward suspected fake news to the bot to report it, or see how many other users have reported it and why (the report-counting path is sketched below).
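
    A hedged sketch of the report-counting path: a Lambda behind API Gateway increments a DynamoDB counter keyed on a hash of the forwarded message. The table name, key schema, and webhook payload shape are assumptions:

        import hashlib
        import json

        import boto3

        dynamodb = boto3.resource("dynamodb")
        table = dynamodb.Table("fact_check_reports")  # hypothetical table


        def lambda_handler(event, context):
            body = json.loads(event["body"])
            text = body.get("message", "")
            key = hashlib.sha256(text.encode("utf-8")).hexdigest()

            # Atomically increment the report count for this message fingerprint.
            result = table.update_item(
                Key={"message_hash": key},
                UpdateExpression="ADD report_count :one",
                ExpressionAttributeValues={":one": 1},
                ReturnValues="UPDATED_NEW",
            )
            count = int(result["Attributes"]["report_count"])
            return {"statusCode": 200, "body": json.dumps({"reports": count})}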

Skills

  • Languages

    Python, SQL, Ruby, Java, Bash, XML, HTML, CSS, JavaScript
  • Frameworks

    Ruby on Rails (RoR), Apache Spark, Selenium, Flask, Hadoop, Presto
  • Libraries/APIs

    Pandas, PySpark, Scikit-learn, Matplotlib, Instagram API, Selenium WebDriver, Beautiful Soup, Puppeteer, Node.js, SQLAlchemy, Vue.js, Keras
  • Tools

    IPython, Jupyter, Stitch Data, Seaborn, MATLAB, RabbitMQ, SendGrid, Cloudera, BigQuery, Tableau, Looker, AWS ECS
  • Paradigms

    ETL, REST, Testing, Automated Testing, Agile, Data Science, Business Intelligence (BI), Microservices
  • Platforms

    AWS Lambda, Amazon Web Services (AWS), Jupyter Notebook, Google Cloud Platform (GCP), AWS EC2, Apache Kafka, Twilio, DigitalOcean, Linux, Android, Kubernetes, Docker
  • Storage

    MySQL, PostgreSQL, Redis, Relational Databases, AWS S3, Redshift, Apache Hive, HBase, NoSQL, AWS RDS, JSON, AWS DynamoDB, MongoDB
  • Other

    Scraping, Web Scraping, PDF Scraping, Data Scraping, Data Engineering, Big Data, Data Analytics, Data Analysis, Data Warehousing, Data Warehouse Design, APIs, Web Crawlers, Forecasting, Machine Learning, AWS, Serverless, Data Visualization, Cloud, Time Series, Google BigQuery, Instagram Growth, Natural Language Processing (NLP), Statistics, ECS, AWS API Gateway

Education

  • Bachelor of Arts in Computer Science and Political Economy
    2010 - 2014
    UC Berkeley - Berkeley, CA
