
Anish Chakraborty
Verified Expert in Engineering
Software Developer
Stockholm, Sweden
Toptal member since August 4, 2020
Anish is an experienced software engineer with deep knowledge of back-end systems, databases, data warehousing, data engineering, and building data-driven products and services. Although self-taught in SQL, Scala, and Python, Anish has won international SQL coding contests.
Portfolio
Experience
- SQL - 8 years
- Java - 4 years
- Apache Spark - 4 years
- Scala - 4 years
- Docker - 4 years
- Python - 3 years
- Data Analysis - 2 years
- Snowpark - 1 year
Preferred Environment
Visual Studio Code (VS Code), IntelliJ IDEA, MacOS, Slack
The most amazing thing I've developed is a smart mirror, built during a hackathon at Philips.
Work Experience
Software, DevOps, and Back-end Engineer
Ask Iggy
- Created the complete back-end infrastructure from scratch on the Google Kubernetes Engine (GKE) for a road network routing engine, a core component for running geospatial algorithms for machine learning (ML) feature generation.
- Scaled back-end services for routing engines to a throughput of 800,000 to 1 million requests per second, with 80 ms p90 latency and 100 TB of data storage and transfer.
- Conducted iterative research and development of geospatial algorithms for real estate investments and spatial search engines.
- Designed and created a geospatial data warehouse for data delivery in BigQuery, managing 100 TB of data and running custom spatial data processing algorithms. Containers are orchestrated with Argo, a Kubernetes-native container orchestration system.
- Created in-memory geospatial queues using Redis to dynamically route requests to pods that process spatial data at large scale (billions of data points, 50 TB of data).
Lead Software Development Engineer | Software Architect
Freelance
- Developed the core back-end infrastructure to manage 80+ microservices, including CI/CD, authentication, service discovery, and edge layer API gateway.
- Designed and developed a cloud configuration management system using version management from GitLab backed by GCS and Firebase and deployed across microservices in GKE, Cloud Run, and App Engine.
- Guided developers across data engineering, DevOps, and back-end disciplines to build microservices on Google Cloud.
Senior Software Engineer
Spotify
- Served as a data and back-end engineer for Spotify's core payments and subscription engine that handles over 100 million monthly subscribers and $6 billion in revenue (early 2019).
- Contributed to open-source software such as Scio (a Scala API for Apache Beam); co-maintainer of DBeam (RDBMS IO for Apache Beam).
- Created courses and taught engineering practices to fellow engineers through structured classes, enabling cross-functional teams to build data products running at 200 TB+ scale and processing 2+ million events per second.
- Designed frameworks and created tools for building and managing high-SLO data pipelines with automated monitoring and fault tolerance. Oversaw adoption of the tooling, which led to fewer SLO breaches.
- Architected and developed infrastructure for high-throughput (2 million requests per second), low-latency (p99 of 12 ms) real-time feature lookup services for recommendation systems.
Data Engineer
Philips
- Created a microservice-based big data platform, allowing data analysts and scientists to access anonymized data collected by Philips for remarketing.
- Designed a framework to process large volumes of clickstream data, collected from mobile apps using Adobe SiteCatalyst, for use in machine learning models for churn prediction.
- Productionized data mining algorithms developed in association with Philips R&D to detect sleep patterns for babies using camera monitors. Implemented this data product using Apache Spark.
- Implemented interfaces, built on a microservice-based architecture, to collect and store sensor data from connected devices in support of data-driven products for Internet of Things use cases.
- Designed and developed a rule-based engine to detect and enhance data quality in a distributed and scalable environment using Apache Spark as a processing engine.
Experience
Sleep Pattern Detection for Baby Monitors
Real-time Analytics
The application sourced event logs from Google Pub/Sub and processed them in Apache Beam, storing the results in Google Cloud Datastore for use in real-time dashboards.
IoT Data Platform for Real-time Data Processing
Scala DSL for an Alerting Framework
Migrated PyTorch Modeling to Cloud
Optimizing Payment Retries
Financial Accounting for Payouts
Scio | Open-source Scala API for Apache Beam
https://github.com/spotify/scio
Feature Engineering for Transaction Risk in Bitcoin
Data Warehousing | Casino Gaming Business
This was built on AWS using technologies like EMR, Apache Spark (Python), AWS Glue, and Amazon Athena. The warehouse supported real-time queries on data from Kafka, with the infrastructure configured using Terraform.
SQL-based Data Platform on Fivetran, dbt, and Snowflake
The platform was built from scratch, then pipelines were migrated from a custom Scala framework onto Kafka, Fivetran, Snowpark, and dbt on Snowflake. The stack included orchestration of Snowflake SQL via dbt Cloud with Amazon Managed Workflows for Apache Airflow, and Terraform to manage the infrastructure. We used Snowpark for the heavily customized pipelines and AWS as the cloud platform.
Edge Layer Infrastructure for Web App
This consisted of analyzing requirements around security, user authentication, and CDN setups; evaluating several options for implementing authentication and API gateway design; monitoring traffic; understanding the use of microservices in the domain; and finally planning and rolling out this architecture over eight months across all services.
Geospatial Processing Infra
Skills
Libraries/APIs
PySpark, Django ORM, REST APIs, Complex SQL Queries, Redis Queue, Snowpark, PyTorch, TensorFlow, spray, Terragrunt, Luigi
Tools
GIS, IntelliJ IDEA, Spark SQL, Apache Beam, Apache Airflow, SBT, BigQuery, Composer, Google Cloud Composer, Terraform, Google Kubernetes Engine (GKE), Postman, pgAdmin, Grafana, Slack, Amazon Simple Queue Service (SQS), RabbitMQ, AWS Fargate, Cloud Dataflow, AWS Glue, Amazon Athena, Amazon Elastic MapReduce (EMR), Amazon Simple Notification Service (SNS), Flink, Google Cloud Dataproc, dbt Cloud, Docker Compose, Envoy Proxy
Languages
Scala, Python, SQL, Java, Python 3, Snowflake, Go, Bash Script
Frameworks
Apache Spark, Spark, Django, Hadoop, Spark Structured Streaming, Akka, Play SDK, Google Cloud Endpoints
Paradigms
ETL, Database Design, Testing, DevOps, HIPAA Compliance, Anomaly Detection, Microservices, REST
Platforms
Google Cloud Platform (GCP), AWS Lambda, Amazon Web Services (AWS), Firebase, Kubernetes, Visual Studio Code (VS Code), Docker, Apache Kafka, Azure, MacOS, Apache Flink, VMware Tanzu Application Service (TAS) (Pivotal Cloud Foundry (PCF)), Google App Engine
Storage
PostgreSQL, MySQL, Database Management, Google Cloud, Database Migration, Firebase Realtime Database, Redis, Databases, NoSQL, Database Administration (DBA), Data Pipelines, Database Architecture, Data Integration, Distributed Databases, Redshift, Google Cloud Spanner, Google Cloud Datastore, Apache Hive, MongoDB, Amazon S3 (AWS S3)
Other
Data Analysis, Google BigQuery, Pub/Sub, Data Engineering, Data, Data Modeling, Data Migration, Data Build Tool (dbt), Database Schema Design, Google Cloud Functions, Geospatial Data, CI/CD Pipelines, Data Profiling, Data Cleaning, Data Cleansing, Big Data, Scaling, Big Data Architecture, Data Warehousing, Data Architecture, Architecture, ELT, Shell Scripting, API Integration, Database Optimization, APIs, Real-time Data, Amazon RDS, Geospatial Analytics, Mobile Analytics, Data Science, Fivetran, Machine Learning Operations (MLOps), Revenue Management, Electronic Medical Records (EMR), HIPAA Electronic Data Interchange (EDI), Machine Learning, Akka HTTP, SDKs, Data Processing, Streaming, Amazon Managed Workflows for Apache Airflow (MWAA), Content Delivery Networks (CDN), Prometheus, Cloud, Data Warehouse Design, API Design