
Anish Chakraborty

Verified Expert in Engineering

Software Developer

Location

Stockholm, Sweden

Toptal Member Since

August 4, 2020

Anish is an experienced software engineer with deep knowledge of back-end systems, databases, data warehousing, data engineering, and building data-driven products and services. Although he is self-taught in SQL, Scala, and Python, Anish has won international SQL coding contests.

Experience

SQL - 8 years
Java - 4 years
Apache Spark - 4 years
Scala - 4 years
Docker - 4 years
Python - 3 years
Data Analysis - 2 years
Snowpark - 1 year

Availability

Part-time

Preferred Environment

Visual Studio Code (VS Code), IntelliJ IDEA, macOS, Slack

The Most Amazing Thing I've Developed

A smart mirror, built as part of a hackathon at Philips.

Work Experience

Software, DevOps, and Back-end Engineer

2022 - 2023

Ask Iggy

Developed the complete back-end infrastructure on the Google Kubernetes Engine (GKE) for a routing engine, a core component for running geospatial algorithms for machine learning (ML) feature generation.
Scaled the routing engines' back-end services to handle throughput of roughly 800,000 to 1 million requests per second.
Conducted iterative research and development of geospatial algorithms for real estate investments and spatial search engines.
Designed and created a geospatial data warehouse in BigQuery for data delivery, running custom spatial data processing algorithms with container orchestration handled by Argo, a Kubernetes-native container orchestration system.

Technologies: Python, Google BigQuery, Google Cloud Platform (GCP), Terraform, SQL, DevOps, PostgreSQL, Kubernetes, Google Kubernetes Engine (GKE), Redis, Geospatial Data, Geospatial Analytics, CI/CD Pipelines, Prometheus, Big Data, Scaling, BigQuery, ETL, ELT, Data Pipelines, Data Architecture, Big Data Architecture, Architecture, Cloud, Shell Scripting, Postman, Data Integration, REST APIs, API Integration, Data Build Tool (dbt), Database Optimization, pgAdmin, APIs

Lead Software Development Engineer | Software Architect

2021 - 2023

Freelance

Developed the core back-end infrastructure to manage 80+ microservices, including CI/CD, authentication, service discovery, and an edge-layer API gateway.
Designed and developed a cloud configuration management system with versioning managed in GitLab, backed by Google Cloud Storage (GCS) and Firebase, and deployed across microservices in GKE, Cloud Run, and App Engine.
Guided developers across data engineering, DevOps, and back-end disciplines to build microservices on Google Cloud.

Technologies: Python 3, Go, Google Cloud, Google Cloud Platform (GCP), Google App Engine, Firebase, Firebase Realtime Database, Google Cloud Functions, Kubernetes, Google Kubernetes Engine (GKE), Mobile Analytics, Shell Scripting, Django ORM, Django, Postman, Data Integration, REST APIs, API Integration, Database Administration (DBA), Database Optimization, pgAdmin, APIs, Distributed Databases, Real-time Data

Senior Software Engineer

2018 - 2022

Spotify

Served as a data and back-end engineer for Spotify's core payments and subscription engine, which handled over 100 million monthly subscribers and $6 billion in revenue (as of early 2019).
Contributed to open-source software such as Scio (the Scala API for Apache Beam). I am also a co-maintainer of DBeam (RDBMS I/O for Apache Beam).
Created courses and taught engineering practices to fellow engineers through structured classes, enabling them to work effectively in cross-functional teams.
Designed frameworks and created tools for building and managing data pipelines with strict service-level objectives (SLOs), including automated monitoring and fault tolerance. Oversaw the adoption of the tooling, which reduced SLO breaches.
Architected and developed infrastructure for high-throughput, low-latency, real-time feature lookup services for recommendation systems.

Technologies: SQL, Java, Python, Scala, Google Cloud Platform (GCP), NoSQL, Big Data, Scaling, BigQuery, Data Warehousing, Hadoop, Shell Scripting, Postman, Data Integration, REST APIs, API Integration, Machine Learning Operations (MLOps), Revenue Management, PostgreSQL, Envoy Proxy, API Design, Data Pipelines, Kubernetes, Google Kubernetes Engine (GKE), Apache Beam, Cloud Dataflow, Data Build Tool (dbt), SBT, Database Administration (DBA), Database Optimization, pgAdmin, APIs, Distributed Databases, Real-time Data

Data Engineer

2016 - 2017

Philips

Created a microservice-based big data platform, allowing data analysts and scientists to access anonymized data collected by Philips for remarketing.
Designed a framework to process large volumes of mobile clickstream data, collected from mobile apps via Adobe SiteCatalyst, for use in machine learning models for churn prediction.
Productionized data mining algorithms, developed with Philips R&D, that detect babies' sleep patterns from camera-based monitors. Implemented this data product using Apache Spark.
Implemented interfaces, built on a microservice-based architecture, to collect and store sensor data from connected devices, supporting a range of data-driven products for Internet of Things (IoT) use cases.
Designed and developed a rule-based engine to detect data quality issues and improve data quality in a distributed, scalable environment, using Apache Spark as the processing engine.

Technologies: Scala, Java, Python 3, PySpark, Apache Spark, Akka HTTP, PostgreSQL, Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (Amazon SNS), Big Data, Mobile Analytics, Data Pipelines, Database Architecture, Data Architecture, Big Data Architecture, Hadoop, MySQL, Shell Scripting, Postman, Data Integration, REST APIs, Database Administration (DBA), APIs, HIPAA Compliance, Electronic Medical Records (EMR), HIPAA Electronic Data Interchange (EDI), Real-time Data

Project Experience

Sleep Pattern Detection for Baby Monitors

A data product for Philips Healthcare. I was a data and back-end engineer for the Philips IoT division, working on products used as at-home baby monitors. My role involved implementing the data flow that processed sensor data collected from the devices to detect user patterns and trigger custom actions based on the results.

Real-time Analytics

A Google Cloud Dataflow (Apache Beam) application for real-time dashboards.

The application read event logs from Google Cloud Pub/Sub, processed them with Apache Beam, and stored the results in Google Cloud Datastore, where they fed the real-time dashboards.

IoT Data Platform for Real-time Data Processing

I was the back-end and data engineer for a microservice-based system that ingested events from IoT devices (such as Philips Hue) and stored them for analytics and alerting. The system was originally built on RabbitMQ, and I migrated it to Amazon SQS.

Scala DSL for an Alerting Framework

I wrote a developer-facing SDK in Scala for defining and managing alerts on data pipelines. It provides a custom DSL for managing metrics and defining alerts on those metrics on Google Cloud Platform.

Migrated PyTorch Modeling to Cloud

I led the migration of PyTorch-based DNN training models to run natively on AWS Fargate, orchestrated via Apache Airflow. The project involved moving multiple stakeholders, who used an internal machine learning library to train models on on-premises infrastructure, to a fully managed, Dockerized environment in the cloud.

Optimizing Payment Retries

I was the lead data engineer for a project that used machine learning with TensorFlow to predict the outcome of payment retries and plan retry patterns, reducing payment failures. The project involved understanding the payment processing domain, designing warehouses in BigQuery to analyze logs efficiently at scale, and training models to serve the back ends.

Financial Accounting for Payouts

I led the user-level data sourcing for financial payout processing for one of my clients. The project included migrating the existing sourcing infrastructure from an API-based workflow to a data-export-based workflow, using Google Cloud Dataflow and Apache Beam for data processing.

Scio | Open-source Scala API for Apache Beam

https://github.com/spotify/scio

As a contributor to Scio, the Scala API for Apache Beam, I've written sparse join modules and co-implemented the use of Bloom filters to improve the performance of certain join operations.
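The general idea behind a Bloom-filter sparse join can be illustrated with a small, in-memory Python sketch. This is a generic illustration under stated assumptions, not Scio's implementation: the names `BloomFilter` and `sparse_join`, the default bit-array size, and the SHA-256-based hashing are all hypothetical choices. The smaller side of the join is loaded into a Bloom filter, large-side records whose keys the filter rules out are dropped, and an exact join on the survivors removes any false positives.

```python
import hashlib
from typing import Dict, Hashable, Iterator, List, Tuple

# Hypothetical illustration of a Bloom-filter sparse join; not Scio's code.


class BloomFilter:
    """Minimal Bloom filter: num_hashes positions over a num_bits array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: Hashable) -> Iterator[int]:
        # Derive each position from a salted SHA-256 digest of repr(item).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: Hashable) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: Hashable) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


def sparse_join(
    large: List[Tuple[Hashable, object]],
    small: List[Tuple[Hashable, object]],
    num_bits: int = 1 << 16,
    num_hashes: int = 3,
) -> List[Tuple[Hashable, Tuple[object, object]]]:
    """Inner-join two keyed datasets, pre-filtering the large side."""
    bloom = BloomFilter(num_bits, num_hashes)
    lookup: Dict[Hashable, List[object]] = {}
    for key, value in small:
        bloom.add(key)
        lookup.setdefault(key, []).append(value)

    # Drop large-side records whose key is definitely not on the small side.
    candidates = [(k, v) for k, v in large if bloom.might_contain(k)]

    # Exact join on the survivors; Bloom-filter false positives fall out here.
    return [(k, (v, w)) for k, v in candidates for w in lookup.get(k, [])]


clicks = [("u1", "click-a"), ("u2", "click-b"), ("u1", "click-d")]
vip_users = [("u1", "gold")]
print(sparse_join(clicks, vip_users))
# [('u1', ('click-a', 'gold')), ('u1', ('click-d', 'gold'))]
```

In a distributed pipeline the payoff comes from the pre-filter: the filter is small enough to send to every worker, so most non-matching records from the large side never enter the shuffle. In this single-process sketch the final dictionary lookup alone would give the same result; the filter is there only to show the technique.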

Feature Engineering for Transaction Risk in Bitcoin

I scaled up and optimized an Apache Spark pipeline that ran feature engineering on billions of Bitcoin transaction records (25+ TB). These features were used to train machine learning models that produce a risk score for transactions on the Bitcoin blockchain.

Data Warehousing | Casino Gaming Business

A data warehouse for a casino gaming business, supporting real-time analytics and feature engineering on data sourced from Kafka.

This was built on AWS using technologies such as EMR, Apache Spark (Python), AWS Glue, and Amazon Athena. The warehouse supported real-time queries on the data from Kafka, with the infrastructure configured using Terraform.

SQL-based Data Platform on Fivetran, dbt, and Snowflake

A cloud-based data platform built with dbt, running on Snowflake, created for an IoT company with 30+ data scientists organized across multiple domains and teams. The project involved creating a data warehouse for the supply chain and IoT domain following the ELT method, with data processed using Snowflake and Snowpark.

The platform was built from scratch, and pipelines were then migrated from a custom Scala framework onto Kafka, Fivetran, Snowpark, and dbt on Snowflake. The tech stack included Snowflake SQL orchestrated through dbt Cloud and Amazon Managed Workflows for Apache Airflow (MWAA), with Terraform to manage the infrastructure. We used Snowpark for the heavily customized pipelines and AWS as the cloud platform.

Edge Layer Infrastructure for Web App

Designed and rolled out the complete edge-layer infrastructure and API gateways, using Google Cloud Endpoints, Envoy, and Google's Extensible Service Proxy (ESP), for a complex web app consisting of 80+ microservices and serverless applications.

The work consisted of analyzing requirements around security, user authentication, and CDN setup; evaluating several options for implementing authentication and API gateway design; monitoring traffic; understanding how microservices were used in the domain; and finally planning and rolling out the architecture across all services over eight months.

Geospatial Processing Infra

A system designed to process geospatially distributed data to create features for ML models. The system is designed to run massively in parallel on thousands of servers and uses road network data from OpenStreetMap to calculate routes and route-related features at scale.

Skills

Languages

Scala, Python, SQL, Java, Python 3, Go, Bash Script, Snowflake

Frameworks

Apache Spark, Spark, Django, Hadoop, Spark Structured Streaming, Akka, Play SDK, Google Cloud Endpoints

Libraries/APIs

PySpark, Django ORM, REST APIs, Redis Queue, PyTorch, TensorFlow, spray, Terragrunt, Luigi

Tools

GIS, IntelliJ IDEA, Spark SQL, Apache Beam, Apache Airflow, SBT, BigQuery, Composer, Google Cloud Composer, Terraform, Google Kubernetes Engine (GKE), Postman, pgAdmin, Grafana, Slack, Amazon Simple Queue Service (SQS), RabbitMQ, AWS Fargate, Cloud Dataflow, AWS Glue, Amazon Athena, Amazon Elastic MapReduce (EMR), Amazon Simple Notification Service (Amazon SNS), Flink, Google Cloud Dataproc, Docker Compose, Envoy Proxy

Paradigms

ETL, Database Design, Testing, DevOps, Data Science, HIPAA Compliance, Anomaly Detection, Microservices, REST

Platforms

Google Cloud Platform (GCP), AWS Lambda, Amazon Web Services (AWS), Firebase, Kubernetes, Visual Studio Code (VS Code), Docker, Apache Kafka, Azure, macOS, Apache Flink, VMware Tanzu Application Service (TAS) (Pivotal Cloud Foundry (PCF)), Google App Engine

Storage

PostgreSQL, MySQL, Database Management, Google Cloud, Database Migration, Firebase Realtime Database, Redis, Databases, NoSQL, Database Administration (DBA), Data Pipelines, Database Architecture, Data Integration, Distributed Databases, Redshift, Google Cloud Spanner, Google Cloud Datastore, Apache Hive, MongoDB, Amazon S3 (AWS S3)

Other

Data Analysis, Google BigQuery, Pub/Sub, Data Engineering, Data, Data Modeling, Data Migration, Data Build Tool (dbt), Database Schema Design, Google Cloud Functions, Geospatial Data, CI/CD Pipelines, Data Profiling, Data Cleaning, Data Cleansing, Big Data, Scaling, Big Data Architecture, Data Warehousing, Data Architecture, Architecture, ELT, Shell Scripting, API Integration, Database Optimization, APIs, Real-time Data, Geospatial Analytics, Mobile Analytics, Snowpark, Machine Learning Operations (MLOps), Revenue Management, Electronic Medical Records (EMR), HIPAA Electronic Data Interchange (EDI), Machine Learning, Akka HTTP, SDKs, Data Processing, Streaming, Amazon Managed Workflows for Apache Airflow (MWAA), dbt Cloud, Content Delivery Networks (CDN), Prometheus, Cloud, Fivetran, Data Warehouse Design, API Design
