Utkarsh Dalal
Verified Expert in Engineering
Big Data Developer
Mumbai, Maharashtra, India
Toptal member since June 3, 2019
Utkarsh currently works as a full-time freelance developer, specializing in data engineering, scraping, and back-end development. Before this, he worked as a data scientist and researcher at the Brookings Institution's India Center and at AutoGrid, a Silicon Valley cleantech startup. Utkarsh has a degree in computer science and political economy from UC Berkeley. He has extensive experience with Python, AWS, and big data, and loves fun projects with a social impact.
Preferred Environment
RubyMine, PyCharm, GitHub, MacOS, Linux
The most amazing...
...project I've worked on is creating a live carbon emissions tracker for India, which the Indian government now uses for planning.
Work Experience
Tech Lead and Full-stack Developer
ThoughtLeaders (via Toptal)
- Worked as a full-stack developer and technical lead, implementing several new features in the platform using Django, AngularJS, Heroku, and AWS, ramping up data-gathering efforts, and mentoring other team members.
- Added new formats to the platform, including Twitch, and designed processes to enhance automated scraping of YouTube and podcast data while reducing API usage.
- Designed several new tables in the Postgres database and worked extensively with Elasticsearch to query and store data.
- Developed a cost-effective solution to generate transcripts for podcasts using Zapier, Celery, and AWS.
- Reduced monthly AWS costs by around 70% by optimizing Lambda usage, S3 storage, and removing redundant processes.
- Optimized Heroku dyno usage, preventing a potential 100% cost increase. Also upgraded several libraries and migrated infrastructure after a security incident between Heroku and GitHub.
- Designed and developed an authorization back-end and integrated the application with a payment gateway via BlueSnap.
- Created several predictive and automation features, reducing the burden on employees in other verticals of the company.
- Designed end-to-end data pipelines using AWS Kinesis Firehose, SNS, SQS, Lambda, and S3.
- Integrated AWS Athena with an existing JSON data lake in S3, allowing the querying of unstructured data (see the sketch after this list).
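As a rough illustration of the Athena integration mentioned above, here is a minimal boto3 sketch of querying JSON data in S3; the database, table, bucket names, and columns are hypothetical placeholders, not the client's actual schema.

```python
import boto3

# Minimal sketch: run an Athena query over an external table that was
# defined on top of JSON files in S3. All names are hypothetical.
athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="""
        SELECT channel_id, COUNT(*) AS video_count
        FROM raw_events          -- external table over JSON files in S3
        GROUP BY channel_id
        ORDER BY video_count DESC
        LIMIT 10
    """,
    QueryExecutionContext={"Database": "data_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
# Athena is asynchronous: poll get_query_execution with this ID for completion.
print(response["QueryExecutionId"])
```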
Back-end Developer
Hoomi (Toptal client)
- Designed and created infrastructure, databases, and APIs for a baked-goods delivery app using DynamoDB, Lambda, API Gateway, and Python with AWS Cognito for authentication.
- Created an order management front end for bakeries with React, using AWS Cognito for authentication and APIs I wrote that connect to the DynamoDB database.
- Used geo libraries in DynamoDB to allow indexing and sorting by location, allowing APIs to return bakeries by distance.
- Utilized local secondary indexes in DynamoDB to index and sort by attributes such as rating, price, and distance (see the sketch after this list).
- Created various APIs for both the customer and bakery apps, correctly handling authentication, order histories, order statuses, etc.
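As an illustration of the local-secondary-index pattern mentioned above, here is a minimal boto3 sketch; the table, index, key, and attribute names are hypothetical.

```python
import boto3
from boto3.dynamodb.conditions import Key

# Minimal sketch: return bakeries in a city sorted by rating using a
# DynamoDB local secondary index. All names are hypothetical.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("bakeries")

# An LSI shares the table's partition key ("city") but uses a different
# sort key ("rating"), so a single query returns results already sorted.
response = table.query(
    IndexName="rating-index",
    KeyConditionExpression=Key("city").eq("san-francisco"),
    ScanIndexForward=False,  # descending: highest-rated bakeries first
    Limit=20,
)
for item in response["Items"]:
    print(item["name"], item["rating"])
```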
Data Warehouse Developer
Confidential NDA (Toptal client)
- Designed and developed a production data warehouse with denormalized tables in BigQuery using a MongoDB database on Heroku as the data source and Stitch Data as an ETL tool.
- Scheduled extractions from MongoDB every six hours using Stitch Data, ensuring that only recently-updated data was included.
- Created a scheduled query that joins and loads data into a denormalized table in BigQuery after each extraction from MongoDB completes.
- Created graphs and geospatial plots from BigQuery data for a customer demo using Plotly and Jupyter notebooks.
- Thoroughly documented instructions for setting up and querying BigQuery for future developers working on the project.
- Researched integrating Google Analytics into BigQuery to track the customer lifecycle from acquisition onwards.
- Created views on the denormalized BigQuery table to let users easily see the most recent state of the database (see the sketch after this list).
- Worked closely with the QA lead to test the data warehouse end to end, from automated extractions to loads and views.
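As an illustration of the "most recent state" views mentioned above, here is a minimal sketch using the google-cloud-bigquery client; the project, dataset, table, and column names are hypothetical, and the pattern simply keeps the latest row per document ID.

```python
from google.cloud import bigquery

# Minimal sketch: define a view that exposes only the most recent row
# per MongoDB document id. All names are hypothetical placeholders.
client = bigquery.Client()

client.query(
    """
    CREATE OR REPLACE VIEW `my_project.warehouse.orders_latest` AS
    SELECT * EXCEPT (row_num)
    FROM (
      SELECT
        *,
        ROW_NUMBER() OVER (
          PARTITION BY _id ORDER BY updated_at DESC
        ) AS row_num
      FROM `my_project.warehouse.orders_denormalized`
    )
    WHERE row_num = 1
    """
).result()  # wait for the DDL statement to finish
```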
Founder
Firmation
- Founded a legal tech startup to help Indian lawyers automate their timekeeping and billing, saving them time and increasing their revenue. Led product development, sales, and marketing.
- Built an Azure app to integrate with law firms' timesheets in OneDrive and automatically generate invoices from them, using OAuth 2, the Microsoft Graph API, and Pandas.
- Used OAuth 2 and the Microsoft Graph API to build an Azure app that automatically generates summaries of lawyers' billable work by reading their Outlook emails and calendars (see the sketch after this list).
- Extended this functionality to Google accounts using OAuth 2, the Gmail API, and the Google Calendar API.
- Created AWS Lambda functions with API Gateway endpoints that end-users could access to generate invoices and summaries of their billable work.
- Used AWS SES to deliver invoices and billable work summaries to lawyers whenever they were generated.
- Called and emailed potential clients, demoed our product to several potential users, and successfully set up customers with the product for usage.
- Designed a website and handled marketing using Google Ads.
- Wrote a Python script to automatically contact potential leads on LinkedIn.
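As a rough sketch of the Outlook-reading step mentioned above, here is a minimal call to the Microsoft Graph calendarView endpoint, assuming an OAuth 2 access token has already been obtained (e.g., via the authorization-code flow); the token and field selection are placeholders, and pagination is omitted.

```python
import datetime
import requests

# Placeholder token; in practice this comes from an OAuth 2 flow and
# should never be hard-coded.
ACCESS_TOKEN = "eyJ..."

start = datetime.datetime.utcnow()
end = start + datetime.timedelta(days=7)

# Fetch the next week of calendar events for the signed-in user.
resp = requests.get(
    "https://graph.microsoft.com/v1.0/me/calendarView",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={
        "startDateTime": start.isoformat() + "Z",
        "endDateTime": end.isoformat() + "Z",
        "$select": "subject,start,end",
    },
)
resp.raise_for_status()
for event in resp.json()["value"]:
    print(event["subject"], event["start"]["dateTime"])
```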
Scraping Engineer
Tether Energy
- Wrote Bash and SQL scripts that ran on a cron job to download data from the New York ISO website and upload it to Tether's data warehouse using Presto and Hive.
- Created scripts to automatically fetch electricity bill data for Brazilian consumers and then upload them to an S3 bucket using JavaScript, Puppeteer, and AWS.
- Automated solving of reCAPTCHAs using JavaScript and 2Captcha.
- Developed Python scripts to scrape data from various formats of PDF electricity bills and upload it to an internal service using Tabula and Pandas (see the sketch after this list).
- Implemented a robust regression testing framework using Pytest to ensure that PDFs are correctly scraped.
- Augmented an internal API by adding new endpoints and models using Ruby on Rails.
- Improved an internal cron service by adding a JSON-defined schedule on which methods could run.
- Added documentation on how to set up and test various internal services locally.
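As an illustration of the PDF-bill scraping mentioned above, here is a minimal tabula-py and Pandas sketch; the file path and column handling are hypothetical, since real bills vary by utility and need per-format handling.

```python
import pandas as pd
import tabula  # tabula-py wraps the Tabula Java library

def parse_bill(path: str) -> pd.DataFrame:
    """Extract tabular data from a PDF bill and normalize column names."""
    # read_pdf returns a list of DataFrames, one per detected table.
    tables = tabula.read_pdf(path, pages="all", lattice=True)
    df = pd.concat(tables, ignore_index=True)
    # Normalize headers so downstream code can rely on consistent names.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df

if __name__ == "__main__":
    # Hypothetical sample file; real bills would come from the scraper.
    print(parse_bill("sample_bill.pdf").head())
```

A regression suite like the Pytest framework mentioned above would then assert that known sample PDFs parse into expected DataFrames.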
Data Science Consultant
Brookings Institution India Center
- Created a data warehouse in Redshift with one-minute-resolution demand data for a large Indian state using Python, Pandas, and EC2; the source data was roughly 6 TB of column-formatted .xls files compressed as .rar archives.
- Built a real-time carbon emissions tracker for India at carbontracker.in using Vue.js and Plotly, with AWS S3, Route 53, and CloudFront for hosting.
- This work was featured in the Wall Street Journal.
- Scraped data for the carbon tracker using Python, BeautifulSoup, and a Digital Ocean Droplet, storing it in the RDS instance used by the Lambda API.
- Created a machine learning model using Scikit-learn, Python, and Pandas to predict daily electricity demand for a large Indian state trained on data from a Redshift warehouse.
- Developed Python scripts to scrape housing data from various Indian state government websites using Selenium and Pandas.
- Built an API for the carbon emissions tracker using AWS Lambda, API Gateway, Python, and an AWS RDS MySQL instance to serve real-time generation data and various statistics (see the sketch below).
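As a rough sketch of that API, here is a minimal Lambda handler serving generation data from a MySQL RDS instance via PyMySQL; the environment variables, table, and column names are hypothetical.

```python
import json
import os
import pymysql

def handler(event, context):
    """API Gateway proxy handler returning recent generation readings."""
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
        cursorclass=pymysql.cursors.DictCursor,  # rows as dicts for JSON
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT source, mw, recorded_at FROM generation "
                "ORDER BY recorded_at DESC LIMIT 100"
            )
            rows = cur.fetchall()
    finally:
        conn.close()
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(rows, default=str),  # serialize datetimes
    }
```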
Senior Software Engineer
AutoGrid Systems, Inc.
- Led an engineering team with both onshore and offshore members and drove on-time development and deployment of product features using Agile.
- Implemented several features across AutoGrid's suite of applications using Ruby on Rails, MySQL, RSpec, Cucumber, Python, and Nose Tests.
- Created PySpark jobs to aggregate daily and monthly electricity usage reports for viewing through AutoGrid's customer portal, using HBase, Redis, and RabbitMQ (see the sketch after this list).
- Designed and developed a customer-facing data warehouse using Hive, HBase, and Oozie; it replaced all of AutoGrid's custom in-house visualizations.
- Built an API endpoint to allow end-users to opt-out of demand response events via SMS using Ruby on Rails and Twilio.
- Optimized SQL queries to take 40% less time, making load times in the UI much quicker.
- Designed a messaging microservice to send and track emails, SMS, and phone calls via Twilio and SendGrid.
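As an illustration of the usage-report aggregation mentioned above, here is a minimal PySpark sketch; the input path, schema, and output location are hypothetical (the production jobs read from HBase and published results via Redis and RabbitMQ).

```python
from pyspark.sql import SparkSession, functions as F

# Minimal sketch: roll up raw meter readings into daily usage per
# customer. Paths and column names are hypothetical placeholders.
spark = SparkSession.builder.appName("daily-usage-report").getOrCreate()

readings = spark.read.parquet("s3://example-bucket/meter-readings/")

daily = (
    readings
    .withColumn("day", F.to_date("reading_ts"))   # truncate timestamp to date
    .groupBy("customer_id", "day")
    .agg(F.sum("kwh").alias("total_kwh"))
)

daily.write.mode("overwrite").parquet("s3://example-bucket/daily-usage/")
```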
Portfolio
Brookings India Electricity and Carbon Tracker
http://carbontracker.in
Firmation Time Tracker
Successfully piloted the tool with two medium-sized law firms, and iteratively improved it to meet customer needs.
Led design, development, sales, and marketing for the product.
Used AWS Lambda with API Gateway as the back end and DynamoDB as the database. Used Python, Pandas, AWS S3, and SES to generate billable-work summaries and email them to users, with AWS CloudWatch triggering the weekly runs (see the sketch below). Used the Microsoft Graph API and the Gmail and Google Calendar APIs to read user data.
Used Google Ads and Google Analytics for marketing, Leadpages to host our landing page, and a combination of personal referrals and cold emails for sales.
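As a rough sketch of the summary-delivery step, here is a minimal boto3 SES call of the kind a weekly CloudWatch-scheduled Lambda might run; the addresses and message content are placeholders.

```python
import boto3

# Minimal sketch: email a billable-work summary via AWS SES.
# Sender and recipient addresses are hypothetical and must be
# verified in SES before sending.
ses = boto3.client("ses", region_name="us-east-1")

ses.send_email(
    Source="summaries@example.com",
    Destination={"ToAddresses": ["lawyer@example.com"]},
    Message={
        "Subject": {"Data": "Your weekly billable-work summary"},
        "Body": {"Text": {"Data": "Total billable hours this week: 32.5"}},
    },
)
```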
Democrafy
This uses a Ruby on Rails back end hosted on Heroku with MongoDB as the database.
Bombay Food Blog
Haryana Medical Scraper
CustomJob
WhatsApp Fact Checker
Users can forward suspected fake news to the bot to report it, or see how many other users have reported it and the reasons they reported it.
Restaurant Reservation Manager
The manager allows restaurants to log in, create tables, create/edit reservations, and see a summary of reservations on a given day.
Toptal React Academy
Education
Bachelor of Arts Degree in Computer Science, Political Economy
UC Berkeley - Berkeley, CA
Certifications
Toptal React Academy Graduate
Toptal, LLC
Skills
Libraries/APIs
REST APIs, Pandas, PySpark, Gmail API, Google Calendar API, Google Apps, Natural Language Toolkit (NLTK), Pillow, React, Scikit-learn, Matplotlib, Instagram API, Selenium WebDriver, Beautiful Soup, Puppeteer, Node.js, SQLAlchemy, Google APIs, OneDrive, Resque, React Redux, Vue, Keras, AWS Amplify, YouTube API
Tools
IPython, IPython Notebook, Jupyter, Stitch Data, Microsoft Outlook, Azure App Service, Seaborn, MATLAB, RabbitMQ, SendGrid, Cloudera, BigQuery, GitHub, PyCharm, RubyMine, Amazon Simple Email Service (SES), Google Analytics, Amazon CloudFront CDN, Celery, Plotly, Amazon CloudWatch, Microsoft Excel, LeadPages, Tableau, Looker, Amazon Elastic Container Service (ECS), Amazon Cognito, Cron, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS)
Languages
Python, SQL, Ruby, Java, Bash, XML, C++, HTML, CSS, JavaScript, Python 3
Frameworks
Ruby on Rails (RoR), Spark, Apache Spark, OAuth 2, Selenium, Yarn, Redux, Flask, Hadoop, Presto, Django, AngularJS
Paradigms
Automation, ETL, REST, Testing, Automated Testing, Agile, Business Intelligence (BI), Microservices
Platforms
AWS Lambda, Amazon Web Services (AWS), Jupyter Notebook, Google Cloud Platform (GCP), Amazon EC2, Apache Kafka, Twilio, DigitalOcean, Linux, MacOS, Google Ads, Droplets, Heroku, Firebase, Android, Kubernetes, Docker
Storage
MySQL, PostgreSQL, Redis, Relational Databases, Amazon S3 (AWS S3), Redshift, Apache Hive, HBase, NoSQL, JSON, Amazon DynamoDB, MongoDB, Databases, Elasticsearch
Other
Scraping, Web Scraping, PDF Scraping, Data Scraping, Data Engineering, Big Data, Data Analytics, Data Analysis, Data Warehouse, Data Warehouse Design, APIs, Web Crawlers, Microsoft Graph API, Forecasting, Data Science, Machine Learning, Serverless, Data Visualization, Cloud, Amazon Route 53, OAuth, Google Tag Manager, Image Processing, Time Series, Google BigQuery, Instagram Growth, Natural Language Processing (NLP), Statistics, ECS, Amazon API Gateway, Geolocation, User Authentication, Amazon Kinesis, Architecture, Cost Reduction & Optimization (Cost-down), BlueSnap, Generative Pre-trained Transformers (GPT)