Utkarsh Dalal
Verified Expert in Engineering
Big Data Developer
Mumbai, Maharashtra, India
Toptal member since June 3, 2019
Utkarsh currently works as a full-time freelance developer, specializing in data engineering, scraping, and back-end development. Before this, he worked as a data scientist and researcher at the Brookings Institution's India Center and at AutoGrid, a Silicon Valley cleantech startup. Utkarsh has a degree in computer science and political economy from UC Berkeley. He has extensive experience with Python, AWS, and big data, and loves fun projects with a social impact.
Preferred Environment
RubyMine, PyCharm, GitHub, MacOS, Linux
The most amazing...
...project I've worked on is creating a live carbon emissions tracker for India, which the Indian government now uses for planning.
Work Experience
Tech Lead and Full-stack Developer
ThoughtLeaders (via Toptal)
- Worked as a full-stack developer and technical lead, implementing several new features in the platform using Django, AngularJS, Heroku, and AWS, ramping up data-gathering efforts, and mentoring other team members.
- Added new formats to the platform, including Twitch, and designed processes to enhance automated scraping of YouTube and podcast data while reducing API usage.
- Designed several new tables in the Postgres database and worked extensively with Elasticsearch to query and store data.
- Developed a cost-effective solution to generate transcripts for podcasts using Zapier, Celery, and AWS.
- Reduced monthly AWS costs by around 70% by optimizing Lambda usage, S3 storage, and removing redundant processes.
- Optimized Heroku dyno usage, preventing a potential 100% cost increase. Also upgraded several libraries and migrated infrastructure after a security incident between Heroku and GitHub.
- Designed and developed an authorization back-end and integrated the application with a payment gateway via BlueSnap.
- Created several predictive and automation features, reducing the burden on employees in other verticals of the company.
- Designed end-to-end data pipelines using AWS Kinesis Firehose, SNS, SQS, Lambda, and S3.
- Integrated AWS Athena with an existing JSON data lake in S3, allowing the querying of unstructured data (see the sketch after this list).
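As a rough illustration of the Athena integration mentioned above, here is a minimal boto3 sketch of querying JSON data in S3; the database, table, bucket names, and columns are hypothetical placeholders, not the client's actual schema.

```python
import boto3

# Minimal sketch: run an Athena query over an external table that was
# defined on top of JSON files in S3. All names are hypothetical.
athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="""
        SELECT channel_id, COUNT(*) AS video_count
        FROM raw_events          -- external table over JSON files in S3
        GROUP BY channel_id
        ORDER BY video_count DESC
        LIMIT 10
    """,
    QueryExecutionContext={"Database": "data_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
# Athena is asynchronous: poll get_query_execution with this ID for completion.
print(response["QueryExecutionId"])
```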
Back-end Developer
Hoomi (Toptal client)
- Designed and created infrastructure, databases, and APIs for a baked-goods delivery app using DynamoDB, Lambda, API Gateway, and Python with AWS Cognito for authentication.
- Created an order management front end for bakeries with React, using AWS Cognito for authentication and APIs I wrote that connect to the DynamoDB database.
- Used geo libraries in DynamoDB to allow indexing and sorting by location, allowing APIs to return bakeries by distance.
- Utilized local secondary indexes in DynamoDB to index and sort by attributes such as rating, price, and distance (see the sketch after this list).
- Created various APIs for both the customer and bakery apps, correctly handling authentication, order histories, order statuses, etc.
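As an illustration of the local-secondary-index pattern mentioned above, here is a minimal boto3 sketch; the table, index, key, and attribute names are hypothetical.

```python
import boto3
from boto3.dynamodb.conditions import Key

# Minimal sketch: return bakeries in a city sorted by rating using a
# DynamoDB local secondary index. All names are hypothetical.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("bakeries")

# An LSI shares the table's partition key ("city") but uses a different
# sort key ("rating"), so a single query returns results already sorted.
response = table.query(
    IndexName="rating-index",
    KeyConditionExpression=Key("city").eq("san-francisco"),
    ScanIndexForward=False,  # descending: highest-rated bakeries first
    Limit=20,
)
for item in response["Items"]:
    print(item["name"], item["rating"])
```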
Data Warehouse Developer
Confidential NDA (Toptal client)
- Designed and developed a production data warehouse with denormalized tables in BigQuery using a MongoDB database on Heroku as the data source and Stitch Data as an ETL tool.
- Scheduled extractions from MongoDB every six hours using Stitch Data, ensuring that only recently-updated data was included.
- Created a scheduled query that joins and loads data into a denormalized table in BigQuery after each extraction from MongoDB completes.
- Created graphs and geospatial plots from BigQuery data for a customer demo using Plotly and Jupyter notebooks.
- Thoroughly documented instructions for setting up and querying BigQuery for future developers working on the project.
- Researched integrating Google Analytics into BigQuery to track the customer lifecycle from acquisition onwards.
- Created views on the denormalized BigQuery table to let users easily see the most recent state of the database (see the sketch after this list).
- Worked closely with the QA lead to test the data warehouse end to end, from automated extractions to loads and views.
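As an illustration of the "most recent state" views mentioned above, here is a minimal sketch using the google-cloud-bigquery client; the project, dataset, table, and column names are hypothetical, and the pattern simply keeps the latest row per document ID.

```python
from google.cloud import bigquery

# Minimal sketch: define a view that exposes only the most recent row
# per MongoDB document id. All names are hypothetical placeholders.
client = bigquery.Client()

client.query(
    """
    CREATE OR REPLACE VIEW `my_project.warehouse.orders_latest` AS
    SELECT * EXCEPT (row_num)
    FROM (
      SELECT
        *,
        ROW_NUMBER() OVER (
          PARTITION BY _id ORDER BY updated_at DESC
        ) AS row_num
      FROM `my_project.warehouse.orders_denormalized`
    )
    WHERE row_num = 1
    """
).result()  # wait for the DDL statement to finish
```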
Founder
Firmation
- Founded a legal tech startup to help Indian lawyers automate their timekeeping and billing, saving them time and increasing their revenue. Led product development, sales, and marketing.
- Built an Azure app to integrate with law firms' timesheets in OneDrive and automatically generate invoices from them, using OAuth 2, the Microsoft Graph API, and Pandas.
- Used OAuth 2 and the Microsoft Graph API to build an Azure app that automatically generates summaries of lawyers' billable work by reading their Outlook emails and calendars (see the sketch after this list).
- Extended this functionality to Google accounts using OAuth 2, the Gmail API, and the Google Calendar API.
- Created AWS Lambda functions with API Gateway endpoints that end-users could access to generate invoices and summaries of their billable work.
- Used AWS SES to deliver invoices and billable work summaries to lawyers whenever they were generated.
- Called and emailed potential clients, demoed our product to several potential users, and successfully set up customers with the product for usage.
- Designed a website and handled marketing using Google Ads.
- Wrote a Python script to automatically contact potential leads on LinkedIn.
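As a rough sketch of the Outlook-reading step mentioned above, here is a minimal call to the Microsoft Graph calendarView endpoint, assuming an OAuth 2 access token has already been obtained (e.g., via the authorization-code flow); the token and field selection are placeholders, and pagination is omitted.

```python
import datetime
import requests

# Placeholder token; in practice this comes from an OAuth 2 flow and
# should never be hard-coded.
ACCESS_TOKEN = "eyJ..."

start = datetime.datetime.utcnow()
end = start + datetime.timedelta(days=7)

# Fetch the next week of calendar events for the signed-in user.
resp = requests.get(
    "https://graph.microsoft.com/v1.0/me/calendarView",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={
        "startDateTime": start.isoformat() + "Z",
        "endDateTime": end.isoformat() + "Z",
        "$select": "subject,start,end",
    },
)
resp.raise_for_status()
for event in resp.json()["value"]:
    print(event["subject"], event["start"]["dateTime"])
```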
Scraping Engineer
Tether Energy
- Wrote Bash and SQL scripts that ran on a cron job to download data from the New York ISO website and upload it to Tether's data warehouse using Presto and Hive.
- Created scripts to automatically fetch electricity bill data for Brazilian consumers and then upload them to an S3 bucket using JavaScript, Puppeteer, and AWS.
- Automated solving of reCAPTCHAs using JavaScript and 2Captcha.
- Developed Python scripts to scrape data from various formats of PDF electricity bills and upload it to an internal service using Tabula and Pandas (see the sketch after this list).
- Implemented a robust regression testing framework using Pytest to ensure that PDFs are correctly scraped.
- Augmented an internal API by adding new endpoints and models using Ruby on Rails.
- Improved an internal cron service by adding a JSON-defined schedule on which methods could run.
- Added documentation on how to set up and test various internal services locally.
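As an illustration of the PDF-bill scraping mentioned above, here is a minimal tabula-py and Pandas sketch; the file path and column handling are hypothetical, since real bills vary by utility and need per-format handling.

```python
import pandas as pd
import tabula  # tabula-py wraps the Tabula Java library

def parse_bill(path: str) -> pd.DataFrame:
    """Extract tabular data from a PDF bill and normalize column names."""
    # read_pdf returns a list of DataFrames, one per detected table.
    tables = tabula.read_pdf(path, pages="all", lattice=True)
    df = pd.concat(tables, ignore_index=True)
    # Normalize headers so downstream code can rely on consistent names.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df

if __name__ == "__main__":
    # Hypothetical sample file; real bills would come from the scraper.
    print(parse_bill("sample_bill.pdf").head())
```

A regression suite like the Pytest framework mentioned above would then assert that known sample PDFs parse into expected DataFrames.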
Data Science Consultant
Brookings Institution India Center
- Created a data warehouse in Redshift with one-minute-resolution demand data for a large Indian state using Python, Pandas, and EC2; the source data was roughly 6 TB of column-formatted .xls files compressed as .rar archives.
- Built a real-time carbon emissions tracker for India at carbontracker.in using Vue.js and Plotly, with AWS S3, Route 53, and CloudFront for hosting.
- This work was featured in the Wall Street Journal.
- Scraped data for the carbon tracker using Python, BeautifulSoup, and a Digital Ocean Droplet, storing it in the RDS instance used by the Lambda API.
- Created a machine learning model using Scikit-learn, Python, and Pandas to predict daily electricity demand for a large Indian state trained on data from a Redshift warehouse.
- Developed Python scripts to scrape housing data from various Indian state government websites using Selenium and Pandas.
- Built an API for the carbon emissions tracker using AWS Lambda, API Gateway, Python, and an AWS RDS MySQL instance to serve real-time generation data and various statistics (see the sketch below).
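As a rough sketch of that API, here is a minimal Lambda handler serving generation data from a MySQL RDS instance via PyMySQL; the environment variables, table, and column names are hypothetical.

```python
import json
import os
import pymysql

def handler(event, context):
    """API Gateway proxy handler returning recent generation readings."""
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
        cursorclass=pymysql.cursors.DictCursor,  # rows as dicts for JSON
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT source, mw, recorded_at FROM generation "
                "ORDER BY recorded_at DESC LIMIT 100"
            )
            rows = cur.fetchall()
    finally:
        conn.close()
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(rows, default=str),  # serialize datetimes
    }
```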
Senior Software Engineer
AutoGrid Systems, Inc.
- Led an engineering team with both onshore and offshore members and drove on-time development and deployment of product features using Agile.
- Implemented several features across AutoGrid's suite of applications using Ruby on Rails, MySQL, RSpec, Cucumber, Python, and Nose Tests.
- Created PySpark jobs to aggregate daily and monthly electricity usage reports for viewing through AutoGrid's customer portal, using HBase, Redis, and RabbitMQ (see the sketch after this list).
- Designed and developed a customer-facing data warehouse using Hive, HBase, and Oozie; it replaced all of AutoGrid's custom in-house visualizations.
- Built an API endpoint to allow end-users to opt-out of demand response events via SMS using Ruby on Rails and Twilio.
- Optimized SQL queries to take 40% less time, making load times in the UI much quicker.
- Designed a messaging microservice to send and track emails, SMS, and phone calls via Twilio and SendGrid.
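As an illustration of the usage-report aggregation mentioned above, here is a minimal PySpark sketch; the input path, schema, and output location are hypothetical (the production jobs read from HBase and published results via Redis and RabbitMQ).

```python
from pyspark.sql import SparkSession, functions as F

# Minimal sketch: roll up raw meter readings into daily usage per
# customer. Paths and column names are hypothetical placeholders.
spark = SparkSession.builder.appName("daily-usage-report").getOrCreate()

readings = spark.read.parquet("s3://example-bucket/meter-readings/")

daily = (
    readings
    .withColumn("day", F.to_date("reading_ts"))   # truncate timestamp to date
    .groupBy("customer_id", "day")
    .agg(F.sum("kwh").alias("total_kwh"))
)

daily.write.mode("overwrite").parquet("s3://example-bucket/daily-usage/")
```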
Portfolio
Brookings India Electricity and Carbon Tracker
http://carbontracker.in
Firmation Time Tracker
Successfully piloted the tool with two medium-sized law firms, and iteratively improved it to meet customer needs.
Led design, development, sales, and marketing for the product.
Used AWS Lambda with API Gateway as the back end and DynamoDB as the database. Used Python, Pandas, AWS S3, and SES to generate billable-work summaries and email them to users, with AWS CloudWatch triggering the weekly runs (see the sketch below). Used the Microsoft Graph API and the Gmail and Google Calendar APIs to read user data.
Used Google Ads and Google Analytics for marketing, Leadpages to host our landing page, and a combination of personal referrals and cold emails for sales.
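As a rough sketch of the summary-delivery step, here is a minimal boto3 SES call of the kind a weekly CloudWatch-scheduled Lambda might run; the addresses and message content are placeholders.

```python
import boto3

# Minimal sketch: email a billable-work summary via AWS SES.
# Sender and recipient addresses are hypothetical and must be
# verified in SES before sending.
ses = boto3.client("ses", region_name="us-east-1")

ses.send_email(
    Source="summaries@example.com",
    Destination={"ToAddresses": ["lawyer@example.com"]},
    Message={
        "Subject": {"Data": "Your weekly billable-work summary"},
        "Body": {"Text": {"Data": "Total billable hours this week: 32.5"}},
    },
)
```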
Democrafy
This uses a Ruby on Rails back end hosted on Heroku with MongoDB as the database.
Bombay Food Blog
Haryana Medical Scraper
CustomJob
WhatsApp Fact Checker
Users can forward suspected fake news to the bot to report it, or see how many other users have reported it and the reasons they reported it.
Restaurant Reservation Manager
The manager allows restaurants to log in, create tables, create/edit reservations, and see a summary of reservations on a given day.
Toptal React Academy
Education
Bachelor of Arts Degree in Computer Science, Political Economy
UC Berkeley - Berkeley, CA
Certifications
Toptal React Academy Graduate
Toptal, LLC
Skills
Libraries/APIs
REST APIs, Pandas, PySpark, Gmail API, Google Calendar API, Google Apps, Natural Language Toolkit (NLTK), Pillow, React, Scikit-learn, Matplotlib, Instagram API, Selenium WebDriver, Beautiful Soup, Puppeteer, Node.js, SQLAlchemy, Google APIs, OneDrive, Resque, React Redux, Vue, Keras, AWS Amplify, YouTube API
Tools
IPython, IPython Notebook, Jupyter, Stitch Data, Microsoft Outlook, Azure App Service, Seaborn, MATLAB, RabbitMQ, SendGrid, Cloudera, BigQuery, GitHub, PyCharm, RubyMine, Amazon Simple Email Service (SES), Google Analytics, Amazon CloudFront CDN, Celery, Plotly, Amazon CloudWatch, Microsoft Excel, LeadPages, Tableau, Looker, Amazon Elastic Container Service (ECS), Amazon Cognito, Cron, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS)
Languages
Python, SQL, Ruby, Java, Bash, XML, C++, HTML, CSS, JavaScript, Python 3
Frameworks
Ruby on Rails (RoR), Spark, Apache Spark, OAuth 2, Selenium, Yarn, Redux, Flask, Hadoop, Presto, Django, AngularJS
Paradigms
Automation, ETL, REST, Testing, Automated Testing, Agile, Business Intelligence (BI), Microservices
Platforms
AWS Lambda, Amazon Web Services (AWS), Jupyter Notebook, Google Cloud Platform (GCP), Amazon EC2, Apache Kafka, Twilio, DigitalOcean, Linux, MacOS, Google Ads, Droplets, Heroku, Firebase, Android, Kubernetes, Docker
Storage
MySQL, PostgreSQL, Redis, Relational Databases, Amazon S3 (AWS S3), Redshift, Apache Hive, HBase, NoSQL, JSON, Amazon DynamoDB, MongoDB, Databases, Elasticsearch
Other
Scraping, Web Scraping, PDF Scraping, Data Scraping, Data Engineering, Big Data, Data Analytics, Data Analysis, Data Warehouse, Data Warehouse Design, APIs, Web Crawlers, Microsoft Graph API, Forecasting, Data Science, Machine Learning, Serverless, Data Visualization, Cloud, Amazon Route 53, OAuth, Google Tag Manager, Image Processing, Time Series, Google BigQuery, Instagram Growth, Natural Language Processing (NLP), Statistics, ECS, Amazon API Gateway, Geolocation, User Authentication, Amazon Kinesis, Architecture, Cost Reduction & Optimization (Cost-down), BlueSnap, Generative Pre-trained Transformers (GPT)