
Ayan Chakraborty
Verified Expert in Engineering
Data Engineer and Developer
Kolkata, West Bengal, India
Toptal member since February 19, 2021
Ayan is a developer and seasoned technical leader with a passion for data, specializing in leading data analytics projects and designing data architectures. Over the past decade, he has worked hands-on with every stage of the data lifecycle in data engineering and analytics, primarily in the education, manufacturing, and retail sectors. This experience helps Ayan understand business priorities and shape each project to meet its goals efficiently.
Experience
- Business Intelligence (BI) Platforms - 10 years
- Stored Procedure - 9 years
- Data Warehousing - 9 years
- ETL Development - 9 years
- SQL - 9 years
- Tableau - 5 years
- SQL Server 2010 - 5 years
- Snowflake - 4 years
Preferred Environment
macOS, Windows, PyCharm, DataGrip
The most amazing thing I've made for an edtech startup is an end-to-end reporting and warehouse solution that scaled from ten students to 150,000 students without any issues.
Work Experience
Data Analysis Expertise
PepsiCo Global - Main
- Migrated legacy Snowflake SQL views to dbt models for Instacart and DoorDash RSV and unmapped products, keeping business logic consistent.
- Repointed ThoughtSpot Digital Shelf DQ Monitoring liveboards from views to new dbt tables, matching filters and KPIs, and improving performance.
- Tuned SQL using EXPLAIN and warehouse diagnostics to reduce cost and runtime while maintaining result parity.
- Created a safe rollout process using development-clone liveboards, TML backups, and “do not use” flags on deprecated models.
Data Engineer
Brightly - Main
- Optimized SQL Server query performance by analyzing execution plans, refactoring complex T-SQL queries, and implementing appropriate indexing strategies, resulting in reduced query execution time.
- Monitored and troubleshot SQL Server performance issues using DMVs and query statistics, identifying bottlenecks related to CPU, memory, and I/O, and applying fixes to ensure stable and scalable data pipelines.
- Performed performance tuning on large transactional and analytical tables by optimizing joins, partitions, and stored procedures, and resolving data type mismatches to improve overall database efficiency.
Data Architect Consultant
Shega LLC
- Built and optimized end-to-end ETL pipelines on Azure using Azure Databricks (PySpark and SQL) for scalable data processing.
- Implemented Delta Lake best practices (Z-Order, Liquid Clustering, Auto-Compaction, CDF, and Time Travel) to improve performance and reliability.
- Designed curated analytics layers in ADLS Gen2 (Bronze, Silver, and Gold) and served reporting datasets via Azure Synapse Analytics.
- Managed CI/CD and delivery using Azure DevOps, including version control, automated deployments, and production support.
Senior Data Architect
LotLinx, Inc.
- Processed 10+ TB of data daily within four hours through parallel processing with BigQuery, BigQuery remote functions, and Cloud Functions.
- Designed the data mesh from scratch and trained the team on best practices for implementing and scaling cloud data warehouses.
- Architected the data governance and security implementation process covering data on more than 16 million cars, 500+ dealers, and over 1 million customers.
- Designed and led the Looker implementation as the company's visualization solution, releasing 35+ dashboards with row-level security and a scalable semantic layer and delivering real-time insights across 16+ million vehicles and 500+ dealers through BigQuery integration.
Senior Data Warehouse Architect
solarisBank
- Architected a data warehouse (data mesh) from scratch, using S3 as the data lake and Snowflake as the data warehouse.
- Handled data from multiple platforms, such as Samsung Pay, and designed a data warehouse that could scale beyond 15 TB.
- Mentored a team of 7+ members and helped introduce Scrum to the team for the first time, including the design of its agile processes.
- Deployed Airflow with dbt on Snowflake and applied Data Vault 2.0, data mesh, and dimensional modeling. Implemented data governance, metadata management, and a data catalog with Collibra.
- Built processes supporting data transformation, data structures, metadata, dependency, and workload management using Snowflake streams, tasks, multi-table inserts, and Snowpipe.
- Handled workload management, including frequency, concurrency, scan size, copy operations, and SLAs.
Senior Data Architect
Yara International
- Completed a comparative POC of Snowflake and Redshift, defined the 15 main use cases for Yara's business, and implemented Snowflake as the final data-as-a-service platform. Processed 500 MB geo files per batch from S3.
- Introduced Scrum to the team and established its ceremonies; as one of the founders, guided three people and hired two more for the company.
- Deployed Airflow in Docker with PostgreSQL to load data in parallel into the existing Hive data store from four different sources, including JSON nested three levels deep.
- Implemented dbt (data build tool) to process around 1.8 million rows for each small-market state in India, Thailand, and other APAC countries.
Business Intelligence Development Lead | Data Architect
Alef Education
- Designed and architected a Snowflake data warehouse from scratch as a data vault, accommodating existing features and changes for new ones at up to 10 TB of data.
- Created an automated reporting platform that cut licensing costs by $12,000 per year, reduced manual effort for custom reports by 60%, and managed multiple data sources.
- Configured Snowpipe to deliver real-time reports from xAPI data captured on each student click. Managed data governance and administration with Snowflake and Collibra.
- Built Tableau dashboards for seven different stakeholder teams, including CXOs, reaching over 450 school leaders and impacting more than 150,000 students.
Senior Business Intelligence Analyst
Mediabrands
- Developed new data pipelines and workflows using Python, Apache Airflow, and Redshift, which cut custom-scheduler costs by 15% and the cost of managing multiple platforms by 51%, and eliminated the SQL Server maintenance cost altogether.
- Designed and developed the whole warehouse in Redshift, handling 2.5 GB of daily incremental data on an 8-node cluster.
- Co-led marketing data mining projects and loaded data into the Redshift data warehouse from multiple sources, such as Facebook, Google DoubleClick, Google AdWords, and Datapoint.
Senior Software Engineer (Data and Analytics)
Nous Infosystems
- Led four data analytics projects and developed data models and visualizations in Qlik and Tableau.
- Architected a data warehouse for Liaison International in Boston (a company that analyzed educational data on applicants to US universities from all over the world) and grew the team from two people to six.
- Won the Star Performer award for client satisfaction and for increasing new revenue by 12% on two projects.
- Performance-tuned SQL procedures for multiple clients such as Deloitte and Everest Reinsurance.
Senior Database Designer and Programmer
NetZoom
- Designed the database schema, wrote T-SQL procedures, and performed performance tuning, which improved response time by 15% to 21%.
- Created dashboards with Tableau for the CXO audience.
- Implemented an Agile process to coordinate workflows among team members in different geographic locations.
Development Team Lead
Acronym Solutions
- Led teams on data warehouse projects for Khadim India that analyzed sales and marketing data to restructure costs and give management a solid dashboard for better decisions, reducing the cost of opening a new showroom by 26%.
- Developed two data mining projects ingesting three million rows per day, using SQL Server and Google Cloud SQL.
- Created new business opportunities with clients, such as Khadim India and Electrosteel.
CRM Consultant
Cognizant
- Designed and implemented critical business requirements for the interface design using Informatica and Oracle's Siebel CRM. Processed both batch and on-demand data with the Informatica auto-scheduler.
- Designed the Informatica data flow for AstraZeneca Australia and AstraZeneca Germany to pull CRM data from the European and Australian markets and store it in an Oracle database for further report generation.
- Created complex data transformations with Informatica and worked on a migration project from DataStage to Informatica.
- Won the "Best Trainee of the Year" award in 2011 for Oracle and Siebel training.
Experience
Student Admission Reporting | Enterprise Data Warehouse (EDW) and Analysis
I built and led a team of four to develop an SSIS data-flow pipeline. We were also responsible for more than 60 stored procedures that handled data processing. I also built the EDW architecture for reporting and designed the Tableau dashboards, which we integrated with the platform so that end users could obtain insights.
Liaison ETL and Data Warehouse Design and Data Analytics
More than 7,000 programs rely on Liaison to help identify, engage, and enroll prospective students. Liaison needed a data warehouse and dashboards that would let university management understand applicant details more meaningfully, making the future admission process better and smoother.
My Responsibilities:
• Analyzed the users' requirements and the existing system.
• Designed and deployed the SSIS package for the data warehouse.
• Created critical T-SQL procedures to populate the required facts and dimensions.
• Designed cycle-wise application status and the application number heat map in Tableau Desktop.
• Implemented security-based user access per organization.
• Scheduled tasks for incremental data loads.
Transportation Analysis and Reports
The app analyzes the mean miles between failures (MMBF) of all trips across all routes, service types, peak types, and other parameters, and assesses fleet performance in terms of reliability and availability. Users can analyze the data in detail at various levels through visualizations.
My Responsibilities:
• Analyzed the users' requirements and the existing system in depth.
• Worked in Tableau, Python, and SSRS.
• Created a complex Python script to build a data model supporting rolling ranks and the expected visualizations.
• Designed fleet performance, rank, and locomotive geographical visualizations.
• Implemented security-based user access per location.
• Scheduled tasks for incremental data loads.
ERO Data Migration
Demographic, objective, activity, and product data was migrated from the Siebel database to the respective systems. Once the data was validated and irrelevant records were filtered out, the required data was migrated to the Oracle CRM On Demand (CRMOD) system.
My Responsibilities:
• Helped with an in-depth analysis of the business requirements.
• Applied in-depth knowledge of the Siebel data model (party model, activity, meeting, product, and samples).
• Extracted data from the Siebel database into a model that could be loaded into the OCOD database.
• Designed and implemented critical business requirements for the interface design using Informatica.
• Developed Informatica mappings, workflows, and worklets (a group of tasks) for the implementation of critical business requirements of integrating multiple systems.
• Unit-tested the app in various modes and with various types of users.
• Involved in the system-integration testing of the app with the owner teams of the aligned systems.
• Supported market engagement from offshore.
Education
Post-graduate Work in Data Science and Business Analytics
McCombs School of Business | University of Texas at Austin - Austin, TX, United States
Bachelor of Technology Degree in Electronics and Communication Engineering
West Bengal State University - Kolkata, India
Certifications
AWS Certified Solutions Architect – Associate
Amazon Web Services
SnowPro Core Certification
Snowflake.com
Modelling Data Warehouse with Data Vault 2.0
Udemy
Apache Airflow Certification
Udemy
Skills
Libraries/APIs
PySpark
Tools
Microsoft Power BI, Tableau, Apache Airflow, Siebel CRM, Looker, BigQuery, AWS Glue, AWS CodeBuild, AWS Step Functions
Languages
SQL, Transact-SQL (T-SQL), Stored Procedure, Snowflake, Python, Python 3, Bash
Paradigms
ETL, HIPAA Compliance
Platforms
Azure, Amazon Web Services (AWS), Amazon EC2, Microsoft Fabric, Google Cloud Platform (GCP), Linux, AWS Lambda
Storage
Data Pipelines, Redshift, SQL Server Integration Services (SSIS), SQL Server 2010, Database Modeling, PostgreSQL, Data Lakes
Other
Data Warehousing, ELT, ETL Development, Data Architecture, Business Intelligence (BI) Platforms, Google BigQuery, Data Mining, Data Analytics, Data Build Tool (dbt), Data Engineering, DAX, Data Modeling, Data Vaults, Data Analysis, Data Mesh, APIs, Cloud Architecture, Cloud Services, AWS Cloud Architecture, ETL Tools, Large Language Models (LLMs), Mobile Apps, Artificial Intelligence (AI), General Data Protection Regulation (GDPR), Big Data, Data Privacy, Data Quality Governance, Data Security, Data Auditing, Data Visualization, Datasets, Ad-hoc Analysis, eCommerce