Utkarsh Dalal

Big Data Developer in Mumbai, Maharashtra, India

Member since February 3, 2019
Utkarsh currently works as a data science consultant at the Brookings Institution's India Center in the energy and sustainability vertical, where he uses data to make policy recommendations. Prior to this, he worked at AutoGrid, a Silicon Valley cleantech startup, and he holds a degree in computer science and political economy from UC Berkeley. He has extensive experience with Python, Rails, and big data, and loves fun projects with social impact!

Location

Mumbai, Maharashtra, India

Availability

Full-time

Preferred Environment

RubyMine, PyCharm, GitHub, macOS, Linux

The most amazing...

...project I've worked on is creating a live carbon emissions tracker for India, which is now used by the Indian government for planning. It's at carbontracker.in.

Employment

  • Founder

    2020 - PRESENT
    Firmation
• Founded a legaltech startup to help Indian lawyers automate their timekeeping and billing, saving them time and increasing their revenue. Led product development, sales, and marketing.
    • Built an Azure app that integrates with law firms' timesheets in OneDrive and automatically generates invoices from them, using OAuth 2, the Microsoft Graph API, and Pandas.
    • Used OAuth 2 and the Microsoft Graph API to build an Azure app that automatically summarizes the billable work lawyers have done by reading their Outlook emails and calendars (see the sketch below).
    • Extended this functionality to Google accounts using OAuth 2 and the Gmail and Google Calendar APIs.
    • Created AWS Lambda functions with API Gateway endpoints that end-users could access to generate invoices and summaries of their billable work whenever they wanted.
    • Used AWS SES to deliver invoices and billable work summaries to lawyers whenever they were generated.
• Called and emailed potential clients, demoed the product to several potential users, and successfully onboarded customers onto the product.
    • Designed the website (www.firmation.in) and handled marketing using Google Ads.
    • Wrote a Python script to automatically contact potential leads on LinkedIn.
    Technologies: Google Analytics, Google Ads, OneDrive, Google APIs, Microsoft Graph API, Pandas, AWS SES, AWS S3, AWS Lambda, OAuth 2, Python
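
    A minimal sketch of the Graph-based email summarization described above, assuming an OAuth 2 access token has already been obtained; the endpoint and query parameters are from the public Microsoft Graph v1.0 API, while the grouping logic is a hypothetical stand-in for the real billable-work heuristics:

        import requests

        GRAPH = "https://graph.microsoft.com/v1.0"

        def fetch_recent_messages(token, top=25):
            # List the user's most recent Outlook messages via Microsoft Graph.
            resp = requests.get(
                f"{GRAPH}/me/messages",
                headers={"Authorization": f"Bearer {token}"},
                params={"$top": top, "$select": "subject,receivedDateTime"},
            )
            resp.raise_for_status()
            return resp.json()["value"]

        def summarize_by_day(token):
            # Toy summary: bucket message subjects by day (hypothetical heuristic).
            by_day = {}
            for msg in fetch_recent_messages(token):
                day = msg["receivedDateTime"][:10]
                by_day.setdefault(day, []).append(msg["subject"])
            return by_day
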
  • Data Science Consultant

    2019 - PRESENT
    Brookings Institution India Center
• Created a data warehouse in Redshift with one-minute-resolution demand data for a large Indian state using Python, Pandas, and EC2. The source data was about 6 TB of column-formatted .xls files compressed as .rar archives.
    • Built a real-time carbon emissions tracker for India at carbontracker.in using Vue.js and Plotly, as well as AWS S3, Route 53, and Cloudfront for hosting.
    • Featured in the Wall Street Journal (https://www.wsj.com/articles/solar-power-is-beginning-to-eclipse-fossil-fuels-11581964338?mod=hp_lead_pos5).
• Scraped data for the carbon tracker using Python, Beautiful Soup, and a DigitalOcean Droplet, storing it in the RDS instance used by the Lambda API.
    • Created a machine learning model using Scikit-learn, Python, and Pandas to predict daily electricity demand for a large Indian state, trained on data from the Redshift warehouse (see the sketch below).
    • Created Python scripts to scrape housing data from various Indian state government websites using Selenium and Pandas.
    • Created an API for the carbon emissions tracker using AWS Lambda, AWS API Gateway, Python, and an AWS RDS MySQL instance to serve real-time generation data, as well as various statistics.
    Technologies: Selenium, Beautiful Soup, AWS Route 53, AWS CloudFront, Droplets, DigitalOcean, AWS RDS, AWS EC2, AWS S3, Scikit-learn, AWS Lambda, Redshift, AWS, Pandas, Python
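
    A minimal sketch of the kind of daily demand model described above; the model family (a random forest) and the DataFrame schema (date, demand_mw) are assumptions, with the data presumed already loaded from the Redshift warehouse:

        import pandas as pd
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import train_test_split

        def train_demand_model(df: pd.DataFrame):
            # Derive simple calendar features from the date column (hypothetical schema).
            df = df.copy()
            df["date"] = pd.to_datetime(df["date"])
            df["dayofweek"] = df["date"].dt.dayofweek
            df["month"] = df["date"].dt.month
            X, y = df[["dayofweek", "month"]], df["demand_mw"]
            # Keep the split chronological, as is usual for time series.
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.2, shuffle=False)
            model = RandomForestRegressor(n_estimators=200, random_state=0)
            model.fit(X_train, y_train)
            print("holdout R^2:", model.score(X_test, y_test))
            return model
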
  • Data Warehouse Developer

    2020 - 2020
    Confidential NDA (Toptal Client)
    • Designed and developed a production data warehouse with denormalized tables in BigQuery using a MongoDB database on Heroku as the data source and Stitch Data as an ETL tool.
• Scheduled extractions from MongoDB every six hours using Stitch Data, ensuring that only recently updated data was included.
    • Created a scheduled query to join and load data into a denormalized BigQuery table after each extraction from MongoDB completes (see the sketch below).
    • Created graphs and geospatial plots from BigQuery data for a customer demo using Plotly and Jupyter notebooks.
    • Thoroughly documented instructions for setting up and querying BigQuery for future developers working on the project.
    • Researched integrating Google Analytics into BigQuery to track the customer lifecycle from acquisition onwards.
    • Created views on the denormalized BigQuery table to allow users to easily see the most recent state of the database.
• Worked closely with the QA lead to test the data warehouse end to end, from automated extractions to loads and views.
    Technologies: Plotly, Heroku, Stitch Data, MongoDB, BigQuery
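
    A minimal sketch of the denormalize-then-expose pattern described above, using the google-cloud-bigquery client; the dataset, table, and column names are placeholders:

        from google.cloud import bigquery

        client = bigquery.Client()

        # Join freshly extracted MongoDB collections into one denormalized
        # table (names are hypothetical).
        client.query(
            """
            CREATE OR REPLACE TABLE warehouse.orders_denormalized AS
            SELECT o.*, c.name AS customer_name
            FROM raw.orders AS o
            JOIN raw.customers AS c ON o.customer_id = c._id
            """
        ).result()

        # A view exposing only the most recent state of each record.
        client.query(
            """
            CREATE OR REPLACE VIEW warehouse.orders_latest AS
            SELECT * EXCEPT (rn) FROM (
              SELECT *, ROW_NUMBER() OVER (
                PARTITION BY order_id ORDER BY updated_at DESC) AS rn
              FROM warehouse.orders_denormalized)
            WHERE rn = 1
            """
        ).result()
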
  • Scraping Engineer

    2019 - 2020
    Tether Energy
    • Wrote Bash and SQL scripts that ran on a cron job to download data from the New York ISO website and upload it to Tether's data warehouse using Presto and Hive.
• Created scripts to automatically fetch electricity bills for Brazilian consumers and upload them to an S3 bucket using JavaScript, Puppeteer, and AWS.
    • Automated reCAPTCHA solving using JavaScript and 2Captcha.
    • Developed Python scripts to scrape data from various formats of PDF electricity bills and upload it to an internal service using Tabula and Pandas (see the sketch below).
    • Implemented a robust regression-testing framework using Pytest to ensure that PDFs were scraped correctly.
    • Augmented an internal API by adding new endpoints and models using Ruby on Rails.
• Improved an internal cron service by adding a JSON-defined schedule on which methods could run.
    • Added documentation on how to set up and test various internal services locally.
Technologies: Apache Hive, Presto DB, Pandas, Ruby on Rails (RoR), Puppeteer, Node.js, JavaScript, Tabula, SQL, Python
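
    A minimal sketch of the PDF scraping and Pytest regression check described above, using tabula-py; the fixture path and expected column are hypothetical:

        import pandas as pd
        import tabula

        def scrape_bill(pdf_path: str) -> pd.DataFrame:
            # tabula-py returns one DataFrame per table it detects in the PDF.
            tables = tabula.read_pdf(pdf_path, pages="all", lattice=True)
            return pd.concat(tables, ignore_index=True)

        def test_known_bill_scrapes_correctly():
            # Regression test against a fixture PDF with known contents.
            df = scrape_bill("fixtures/sample_bill.pdf")
            assert not df.empty
            assert "kWh" in df.columns
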
  • Senior Software Engineer

    2014 - 2018
    AutoGrid Systems, Inc.
• Led an engineering team both onshore and offshore and drove on-time development and deployment of product features using Agile.
    • Implemented several features across AutoGrid's suite of applications using Ruby on Rails, MySQL, RSpec, Cucumber, Python, and Nose tests.
    • Created PySpark jobs to aggregate daily and monthly electricity usage reports for viewing through AutoGrid's customer portal, using HBase, Redis, and RabbitMQ (see the sketch below).
    • Designed and developed a customer-facing data warehouse using Hive, HBase, and Oozie; it replaced all custom in-house visualizations done by AutoGrid.
• Built an API endpoint allowing end users to opt out of demand response events via SMS, using Ruby on Rails and Twilio.
    • Optimized SQL queries to run in 40% less time, making loading in the UI much quicker.
    • Designed a messaging microservice to send and track emails, SMS, and phone calls via Twilio and SendGrid.
    Technologies: Docker, Kubernetes, YARN, Apache Kafka, RabbitMQ, Celery, Resque, Redis, HBase, Apache Hive, Spark, Python, Ruby on Rails (RoR), Ruby
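
    A minimal sketch of the daily usage aggregation described above, as a PySpark job; the input location and schema (meter_id, timestamp, kwh) are assumptions:

        from pyspark.sql import SparkSession, functions as F

        spark = SparkSession.builder.appName("daily-usage").getOrCreate()

        # Interval readings with hypothetical columns: meter_id, timestamp, kwh.
        readings = spark.read.parquet("s3://bucket/readings/")

        daily = (
            readings
            .withColumn("day", F.to_date("timestamp"))
            .groupBy("meter_id", "day")
            .agg(F.sum("kwh").alias("daily_kwh"))
        )
        daily.write.mode("overwrite").parquet("s3://bucket/reports/daily/")
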

Experience

  • Brookings India Electricity and Carbon Tracker (Development)
    http://carbontracker.in

Created a near real-time electricity and carbon tracker for India for use in policy analysis. The data for the tracker is continuously scraped from meritindia.in, stored in an AWS RDS instance, and served to the website via an API built on AWS Lambda (see the sketch below). The website itself uses Vue.js and Plotly.js. It was featured in the Wall Street Journal: https://www.wsj.com/articles/solar-power-is-beginning-to-eclipse-fossil-fuels-11581964338?mod=hp_lead_pos5
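
    A minimal sketch of the Lambda-backed API pattern described above; the connection details, table schema, and helper function are placeholders rather than the production code:

        import json

        import pymysql

        def fetch_latest_generation():
            # Query the RDS MySQL instance for the most recent generation
            # snapshot (host, credentials, and schema are placeholders).
            conn = pymysql.connect(
                host="rds-host", user="user", password="...", database="carbon")
            with conn.cursor(pymysql.cursors.DictCursor) as cur:
                cur.execute(
                    "SELECT source, mw, ts FROM generation "
                    "ORDER BY ts DESC LIMIT 10")
                return cur.fetchall()

        def handler(event, context):
            # AWS Lambda entry point behind an API Gateway proxy integration.
            return {
                "statusCode": 200,
                "headers": {"Content-Type": "application/json"},
                "body": json.dumps(fetch_latest_generation(), default=str),
            }
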

  • Firmation Time Tracker (Development)
    http://firmation.in/time-tracker

Created a time-tracking tool that helps Indian lawyers fill out their timesheets more quickly and ensures they don't forget any billable work they've done. The tool integrates with Outlook and Google email and calendars, as well as OneDrive, using OAuth, and gives users a summary of their work by parsing emails, calendar events, and files worked on.

Successfully piloted the tool with two medium-sized law firms and iteratively improved it to meet customer needs.

    Led design, development, sales, and marketing for the product.

Used AWS Lambda with API Gateway as the back end and DynamoDB as the database. Also used Python, Pandas, AWS S3, and SES to generate and send billable-work summaries to users via email (see the sketch below), with AWS CloudWatch triggering the weekly sends. Used the Microsoft Graph API and the Gmail and Google Calendar APIs to read user data.

Used Google Ads and Google Analytics for customer acquisition, Leadpages to host our landing page, and a combination of personal referrals and cold emails for sales.
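    A minimal sketch of the SES delivery step described above, via boto3; the sender address, region, and message body are placeholders:

        import boto3

        ses = boto3.client("ses", region_name="ap-south-1")

        def send_summary(to_addr: str, summary_html: str):
            # Email a weekly billable-work summary (in production this is
            # triggered on a CloudWatch schedule).
            ses.send_email(
                Source="reports@firmation.in",
                Destination={"ToAddresses": [to_addr]},
                Message={
                    "Subject": {"Data": "Your weekly billable work summary"},
                    "Body": {"Html": {"Data": summary_html}},
                },
            )
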

  • Democrafy (Development)

Created and launched an Android app to make governance more accountable by allowing users to post about issues they face, view issues posted by other users, and hold elected officials accountable for their actions.

    This uses a Ruby on Rails back end hosted on Heroku with MongoDB as the database.

  • Bombay Food Blog (Development)
    https://www.instagram.com/bombay_food_blog/

Created a fully automated Instagram page that scrapes and reposts photos of food from Mumbai, crediting the source. Trained a neural network using transfer learning to classify photos of food, and trained a random forest to predict which users are likely to follow the page. All of this is hosted on an EC2 instance.
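
    A minimal sketch of the transfer-learning classifier described above, in Keras; the base model, dataset layout, and hyperparameters are assumptions:

        import tensorflow as tf

        # Frozen ImageNet base with a small binary head (food / not food).
        base = tf.keras.applications.MobileNetV2(
            input_shape=(224, 224, 3), include_top=False, weights="imagenet")
        base.trainable = False

        model = tf.keras.Sequential([
            base,
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])

        # Expects images sorted into class subfolders (hypothetical path).
        train_ds = tf.keras.utils.image_dataset_from_directory(
            "data/train", image_size=(224, 224), batch_size=32)
        model.fit(train_ds, epochs=5)
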

  • Haryana Medical Scraper (Development)

    Built a scraper using Python, Pandas, and Tabula to extract data about doctors in the Indian state of Haryana from PDFs and graph various statistics about them.
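
    A minimal sketch of the extract-then-graph flow described above, using tabula-py and Matplotlib; the PDF path and column name are hypothetical:

        import matplotlib.pyplot as plt
        import pandas as pd
        import tabula

        # Pull every table out of the registry PDF and stack them.
        tables = tabula.read_pdf("haryana_doctors.pdf", pages="all")
        df = pd.concat(tables, ignore_index=True)

        # Plot registered doctors per district (column name is a placeholder).
        df["District"].value_counts().plot(kind="bar")
        plt.ylabel("Registered doctors")
        plt.tight_layout()
        plt.show()
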

  • CustomJob (Development)

Created an Android app to connect buyers with local sellers of items. Buyers can post items they would like to have made, and sellers can bid on the price they would charge to make them. Used Parse as the database and for authentication.

  • WhatsApp Fact Checker (Development)

Created a WhatsApp bot that crowdsources reports of fake news and works with text, photos, videos, and audio, built with Python, AWS Lambda, DynamoDB, API Gateway, Docker, ECR, ECS, and S3.
    Users can forward suspected fake news to the bot to report it, or see how many other users have reported it and the reasons they gave (see the sketch below).
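
    A minimal sketch of the report-counting core described above, using boto3 with content hashes as keys; the table name and schema are placeholders:

        import hashlib

        import boto3

        table = boto3.resource("dynamodb").Table("fake_news_reports")

        def report(content: bytes, reason: str):
            # Key each piece of content by its SHA-256 so repeated forwards
            # collapse onto one item; ADD atomically bumps the counter and
            # extends the string set of reasons.
            key = hashlib.sha256(content).hexdigest()
            table.update_item(
                Key={"content_hash": key},
                UpdateExpression="ADD report_count :one, reasons :r",
                ExpressionAttributeValues={":one": 1, ":r": {reason}},
            )
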

Skills

  • Languages

    Python, SQL, Ruby, Java, Bash, XML, HTML, CSS, JavaScript
  • Frameworks

Ruby on Rails (RoR), Apache Spark, OAuth 2, Selenium, YARN, Flask, Hadoop, Presto DB
  • Libraries/APIs

    Pandas, PySpark, Microsoft Graph API, Gmail API, Google Calendar API, Google Apps, Scikit-learn, Matplotlib, Instagram API, Selenium WebDriver, Beautiful Soup, Puppeteer, Node.js, SQLAlchemy, Google APIs, OneDrive, Resque, Vue.js, Keras
  • Tools

IPython, IPython Notebook, Jupyter, Stitch Data, Microsoft Outlook, Azure App Service, Seaborn, MATLAB, RabbitMQ, SendGrid, Cloudera, BigQuery, GitHub, PyCharm, RubyMine, AWS SES, Google Analytics, Celery, Plotly, AWS CloudWatch, Leadpages, Tableau, Looker, AWS ECS
  • Paradigms

    ETL, REST, Testing, Automated Testing, Agile, Data Science, Business Intelligence (BI), Microservices
  • Platforms

AWS Lambda, Amazon Web Services (AWS), Jupyter Notebook, Google Cloud Platform (GCP), AWS EC2, Apache Kafka, Twilio, DigitalOcean, Linux, macOS, Droplets, Heroku, Android, Kubernetes, Docker
  • Storage

MySQL, PostgreSQL, Redis, Relational Databases, AWS S3, Redshift, Apache Hive, HBase, NoSQL, AWS RDS, JSON, AWS DynamoDB, Tabula, MongoDB
  • Other

    Scraping, Web Scraping, PDF Scraping, Data Scraping, Data Engineering, Big Data, Data Analytics, Data Analysis, Data Warehousing, Data Warehouse Design, APIs, Web Crawlers, Forecasting, Machine Learning, AWS, Serverless, Data Visualization, Cloud, Google Ads, AWS CloudFront, AWS Route 53, Excel, OAuth, Google Tag Manager, Time Series, Google BigQuery, Instagram Growth, Natural Language Processing (NLP), Statistics, ECS, AWS API Gateway

Education

  • Bachelor of Arts degree in Computer Science, Political Economy
    2010 - 2014
    UC Berkeley - Berkeley, CA
