
Faisal Malik Widya Prasetya

Verified Expert in Engineering

Data Engineer and Developer

Sleman Sub-District, Sleman Regency, Special Region of Yogyakarta, Indonesia

Toptal member since April 25, 2022

Bio

Faisal is a data engineer specializing in cloud data technologies such as Google Cloud and AWS and in end-to-end data engineering processes. From designing the architecture and building the infrastructure to developing pipeline operations, he adapts quickly to new cloud-based, open source, or SaaS technologies. Faisal has solid experience contributing to early-stage startups, both by building end-to-end data pipelines directly and by providing consulting services in his fields of expertise.

Portfolio

Pathbox AI Inc.
AWS Lambda, Amazon Web Services (AWS), Node.js, Amazon RDS, Amazon API Gateway...
Burak Karakaya
Web Scraping, Data Scraping, Scraping, Amazon Web Services (AWS), JavaScript...
XpressLane, Inc.
Data Engineering, Python, Google Data Studio, PostgreSQL, Google BigQuery...

Experience

  • Python - 5 years
  • Google Cloud Platform (GCP) - 4 years
  • BigQuery - 4 years
  • Apache Airflow - 4 years
  • Amazon Web Services (AWS) - 3 years
  • PySpark - 3 years
  • Data Warehousing - 3 years
  • AWS Lambda - 3 years

Availability

Part-time

Preferred Environment

Visual Studio Code (VS Code), Conda, Linux, Docker, Docker Compose, Google Cloud Platform (GCP), Amazon Web Services (AWS), Jira, OpenAI

The most amazing...

...project I've ever done was implementing a cost-optimization strategy on a client's data warehouse, reducing BI usage costs by up to 100 times.

Work Experience

Senior Software Engineer

2023 - 2024
Pathbox AI Inc.
  • Developed a serverless REST API on AWS Lambda, API Gateway, Cognito, Aurora Serverless, and DynamoDB using the Express.js web framework.
  • Built an optimized machine learning inference system using ECS tasks on Fargate, enabling parallelization with SQS and concurrent ECS tasks.
  • Developed machine learning inference with GPU enabled using AWS Batch.
  • Standardized the machine learning workflow from dataset collection, data preprocessing, model setup, training, and validation to inference deployment.
  • Implemented analytics on workflow operations to monitor and optimize the process further.
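The fan-out pattern behind that inference system can be sketched in Python. This is a minimal illustration, not the production code: an in-memory queue and a thread pool stand in for SQS and concurrent ECS tasks, and the job schema and inference logic are invented for the example.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_inference(job):
    """Stand-in for the real model inference; the logic here is illustrative."""
    return {"job_id": job["job_id"], "score": len(job["payload"])}


def process_queue(jobs, max_workers=4):
    """Drain a work queue with concurrent workers.

    In production, jobs would arrive via SQS and each worker would be a
    separate ECS task on Fargate; this sketch shows the same fan-out
    pattern with an in-memory queue and threads.
    """
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    results = []

    def worker():
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker exits
            results.append(run_inference(job))
            q.task_done()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(max_workers):
            pool.submit(worker)
    return results
```

Because each message is independent, throughput scales roughly linearly with the number of workers until the queue is the bottleneck, which is the point of pairing SQS with concurrent ECS tasks.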
Technologies: AWS Lambda, Amazon Web Services (AWS), Node.js, Amazon RDS, Amazon API Gateway, Amazon S3 (AWS S3), Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Python, AWS Batch, Amazon Elastic Container Service (ECS), Terraform, Metabase, AWS DevOps, OpenTelemetry, Prometheus, API Observability, Amazon SageMaker, Observability Tools, Data Lakehouse, Data Stewardship, Data Lake Design, SQLAlchemy, OAuth, NestJS, TypeScript, TypeORM, Domain-driven Development, Google Cloud Storage, Data Encoding, Nonlinear Optimization, Linear Optimization

Web Scraping Expert

2023 - 2023
Burak Karakaya
  • Developed a real-time web scraper that collects data from sources such as Twitter and the Binance Futures Leaderboard to feed the client's trading bot. The scraper can ingest a tweet within 200 ms of its publication.
  • Provisioned AWS infrastructure with a high-performance network so the scraper could operate in real time. I set up IP rotation so the scraper bypassed the sources' IP rate limits and didn't get blocked.
  • Provided an interface for non-technical clients to administer and operate the scraper conveniently, built with Streamlit and FastAPI.
  • Utilized Redis and high-performance Python extensions written in C to improve the scraper's storage and runtime performance.
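One way Redis typically helps a real-time scraper is deduplication: dropping items already seen within a TTL window. The sketch below shows that logic with a plain dict standing in for Redis; the class name and TTL are illustrative assumptions, and in production the same check maps to an atomic Redis `SET key value NX EX ttl`.

```python
import time


class TweetDeduplicator:
    """Drop tweets already seen within a TTL window.

    A plain dict stands in for Redis here; in a multi-process scraper the
    same logic would use Redis `SET ... NX EX`, which is atomic across
    workers, with automatic key expiry replacing the manual check below.
    """

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._seen = {}  # tweet_id -> expiry timestamp

    def is_new(self, tweet_id, now=None):
        """Return True the first time an ID is seen within the TTL window."""
        now = time.time() if now is None else now
        expiry = self._seen.get(tweet_id)
        if expiry is not None and expiry > now:
            return False  # duplicate within the window
        self._seen[tweet_id] = now + self.ttl
        return True
```

Keeping this check in memory (or in Redis) rather than in a database is what keeps the ingest path within a sub-second budget.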
Technologies: Web Scraping, Data Scraping, Scraping, Amazon Web Services (AWS), JavaScript, Python, Streaming Data, Data Integration, Orchestration, Generative Pre-trained Transformers (GPT), LangChain, Solution Architecture, Technical Architecture, Monitoring, Data Auditing, Agile, T-SQL (Transact-SQL), Business Architecture, Enterprise Architecture, Interactive Brokers API, Multithreading, Entity Relationships, Stored Procedure, Software Design, Workflow, Microservices Architecture, API Design, AWS Cloud Architecture, Celery, RabbitMQ, Performance Tuning, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS), SSH, Load Testing, AWS DevOps, Data Stewardship, Data Lake Design, SQLAlchemy, OAuth, Domain-driven Development, Google Cloud Storage, Data Encoding, Nonlinear Optimization, Linear Optimization

Data Engineer

2023 - 2023
XpressLane, Inc.
  • Developed scraping tools to collect data from various websites and push it to BigQuery.
  • Created development and operations documentation so the client could maintain the solution and develop more features on it in the future.
  • Delivered reports and dashboards built from the scraped data to help the client make better decisions for M&A use cases.
Technologies: Data Engineering, Python, Google Data Studio, PostgreSQL, Google BigQuery, Dataproc, Google Cloud Dataproc, Looker, Apache Airflow, Redis, Spark, PySpark, Web Scraping, Scraping, Data Wrangling, Data Modeling, Excel 365, Dashboards, Amazon Elastic MapReduce (EMR), Amazon EKS, Data Manipulation, Shell Scripting, MapReduce, Business Intelligence (BI), Business Analysis, Benchmarking, Databases, Performance, Performance Testing, Caching, Stress Testing, Data Reporting, Pandas, Asyncio, Software Architecture, Swagger, DevOps, Artificial Intelligence (AI), Python API, Data Scraping, REST, HTML, CSS, OpenAI GPT-3 API, REST APIs, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, Excel Macros, Database Modeling, Data-driven Design, SaaS, NumPy, API Integration, Natural Language Processing (NLP), Serverless, SharePoint, Amazon ElastiCache, Amazon Simple Notification Service (SNS), Python 3, Git, Lint, OpenAPI, Jupyter, Jupyter Notebook, Design Patterns, Kubernetes, Pytest, FastAPI, eCommerce APIs, Extensions, Scrapy, Data, Apache Spark, Streaming Data, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Generative Pre-trained Transformers (GPT), LangChain, Solution Architecture, SharePoint Online, Technical Architecture, Monitoring, Data Auditing, Agile, T-SQL (Transact-SQL), Business Architecture, Enterprise Architecture, Interactive Brokers API, Multithreading, Entity Relationships, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, Go, API Design, AWS Cloud Architecture, MongoDB Atlas, Performance Tuning, Dynamic SQL, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS), SSH, Load Testing, Prometheus, API Observability, Observability Tools, Data Lakehouse, Data Stewardship, Data Lake Design, SQLAlchemy, OAuth, TypeScript, Domain-driven Development, Google Cloud Storage, Data Encoding

Senior Data Engineer

2022 - 2023
Toptal
  • Designed and implemented a robust data pipeline that extracted data from multiple marketing tools and APIs like Google Ads, Facebook Ads, and Twitter Ads, and transferred it to BigQuery using in-house data pipeline tools based on Luigi.
  • Created a data pipeline solution that efficiently extracted data from various learning platforms such as Polly, Udemy, and Lessonly and consolidated it with BigQuery utilizing Composer, a managed Apache Airflow service provided by GCP.
  • Participated in the brainstorming session on splitting the data engineering team and proposed dividing it into a data platform team and an analytics engineering team. The analytics engineering team focuses on ETL logic, while the data platform team maintains the infrastructure.
Technologies: Python, SQL, Pandas, Data Engineering, Object-oriented Design (OOD), Object-oriented Programming (OOP), Data Modeling, Scala, Luigi, Apache Airflow, BigQuery, Distributed Computing, Dimensional Modeling, ETL, Google Cloud, Google Cloud Storage, ETL Tools, Scripting Languages, Data Analytics, Data Architecture, Data Management, Data Pipelines, ELT, Big Data Architecture, Snowpark, Architecture, Big Data, Kanban, Project Planning, Agile Project Management, Technical Project Management, Azure Data Lake, Data Wrangling, APIs, Dashboards, Data Manipulation, Shell Scripting, MapReduce, Google Analytics, Web Scraping, Benchmarking, Databases, Performance, Performance Testing, Caching, Stress Testing, Asyncio, Software Architecture, Back-end, GraphQL, DevOps, Artificial Intelligence (AI), Python API, Scraping, Data Scraping, REST, REST APIs, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, Database Modeling, Data-driven Design, SaaS, NumPy, API Integration, Serverless, Amazon ElastiCache, Amazon Simple Notification Service (SNS), Python 3, Git, Lint, Hadoop, Jupyter, Jupyter Notebook, Design Patterns, Kubernetes, Pytest, FastAPI, eCommerce APIs, Amazon API, Scrapy, Data, Apache Spark, Kibana, Streaming Data, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Data Auditing, Agile, T-SQL (Transact-SQL), Business Architecture, Enterprise Architecture, Multithreading, Entity Relationships, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, Go, API Design, AWS Cloud Architecture, MongoDB Atlas, Performance Tuning, Database Design, Amazon Simple Queue Service (SQS), SSH, Load Testing, Amazon Elastic Container Service (ECS), OpenTelemetry, Prometheus, API Observability, Amazon SageMaker, Observability Tools, SDKs, Data Lakehouse, Data Stewardship, Data Lake Design, SQLAlchemy, OAuth, TypeScript, Domain-driven Development, Data Encoding

Data Engineer

2021 - 2023
QuantumBlack
  • Developed internal data analytics tools that simplify deployment on the client site. The feature I built ingests data from various sources and stores it incrementally in Snowflake.
  • Handled a client request to build a data analytics pipeline and APIs.
  • Worked closely with clients' analytics teams and leadership to gather analytics requirements and plan carefully from architecture design through implementation and delivery.
Technologies: Python, Kedro, Apache Airflow, Amazon Web Services (AWS), Google Cloud Platform (GCP), Alibaba Cloud, Spark, PySpark, GitHub, Terraform, ETL Tools, Scripting Languages, SQL, Data Analytics, Amazon Athena, Amazon Redshift Spectrum, AWS Glue, Data Engineering, Microsoft Power BI, Amazon Neptune, Microsoft SQL Server, Oracle Database, Database Administration (DBA), Redshift, NoSQL, Data Architecture, Data Management, Data Lakes, Azure, Database Migration, Amazon RDS, CDC, Amazon Aurora, Data Build Tool (dbt), Snowflake, Data Pipelines, Neo4j, Apache Kafka, ETL, Cloud Migration, IIS SQL Server, Domo, ELT, Big Data Architecture, Snowpark, Oracle, Architecture, Big Data, Azure Data Factory, Kanban, Project Planning, Agile Project Management, Technical Project Management, Azure Data Lake, Data Wrangling, Azure Databricks, Data Modeling, APIs, Databricks, Django, Excel 365, Dashboards, Amazon Elastic MapReduce (EMR), Amazon EKS, Data Manipulation, Spark ML, Amazon QuickSight, Elasticsearch, AWS Step Functions, Shell Scripting, MapReduce, Business Intelligence (BI), Business Analysis, Web Scraping, Benchmarking, Databases, Performance, Performance Testing, Caching, Data Reporting, Pandas, Asyncio, Software Architecture, Back-end, GraphQL, Amazon Cognito, Swagger, DevOps, Artificial Intelligence (AI), Python API, Scraping, Data Scraping, PDF Scraping, REST, AWS Lambda, Flask, OpenCV, Tesseract, QGIS, GIS, GRASS GIS, Flutter, OpenAI GPT-3 API, REST APIs, AWS Elastic Beanstalk, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, eCommerce, Amazon DynamoDB, Database Modeling, Data-driven Design, Neural Networks, SaaS, NumPy, GeoPandas, Shapely, Scikit-learn, API Integration, X (formerly Twitter) API, Node.js, Natural Language Processing (NLP), Serverless, SharePoint, Amazon ElastiCache, Amazon Simple Notification Service (SNS), Python 3, Git, Lint, Hadoop, OpenAPI, Jupyter, Jupyter Notebook, Credit Modeling, Consumer Packaged Goods (CPG), Azure Synapse, Back-end Development, Design Patterns, Kubernetes, Pytest, FastAPI, eCommerce APIs, Amazon API, Extensions, Scrapy, Data, Apache Spark, Kibana, Streaming Data, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Solution Architecture, SharePoint Online, Technical Architecture, Monitoring, Data Auditing, Agile, Dedicated SQL Pool (formerly SQL DW), Azure SQL Data Warehouse, T-SQL (Transact-SQL), Business Architecture, Enterprise Architecture, Interactive Brokers API, Multithreading, Entity Relationships, PL/SQL, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, Go, API Design, R, AWS Cloud Architecture, MongoDB Atlas, Celery, RabbitMQ, Performance Tuning, Dynamic SQL, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS), SSH, Load Testing, Web Scalability, Amazon Elastic Container Service (ECS), AWS DevOps, OpenTelemetry, Prometheus, API Observability, Amazon SageMaker, Observability Tools, SDKs, Data Lakehouse, Data Stewardship, Data Lake Design, SQLAlchemy, OAuth, TypeScript, Domain-driven Development, Data Encoding, Nonlinear Optimization, Linear Optimization

Senior Data Engineer

2021 - 2021
Flip
  • Built a data analytics ecosystem using native Google Cloud Platform technologies, such as Datastream, Google Cloud Storage, Pub/Sub, Dataflow, and BigQuery.
  • Improved the analytics waiting time from a 3-hour worst-case scenario to 30 seconds for one big report.
  • Maintained the legacy data analytics stack on MySQL and on-server cron jobs by creating scheduled jobs for heavy but frequently used queries, making their results accessible in under 30 minutes with daily data freshness.
  • Built the data engineering team and onboarded members on the legacy, current, and planned implementations.
Technologies: Python, Google Cloud Platform (GCP), MySQL, BigQuery, Google BigQuery, Metabase, Data Warehousing, CI/CD Pipelines, GitHub, Data Migration, ETL Tools, Scripting Languages, SQL, Data Analytics, AWS Glue, Data Engineering, Data Analysis, NoSQL, Data Architecture, Data Management, Data Lakes, Database Migration, Amazon RDS, CDC, Amazon Aurora, Data Build Tool (dbt), Data Pipelines, Apache Kafka, ETL, Cloud Migration, ELT, Big Data Architecture, Architecture, Big Data, Kanban, Agile Project Management, Technical Project Management, Microsoft Power BI, Data Wrangling, Data Modeling, APIs, Excel 365, Dashboards, Amazon Elastic MapReduce (EMR), Data Manipulation, Amazon QuickSight, AWS Step Functions, Shell Scripting, Google Analytics, MySQL Performance Tuning, Benchmarking, Databases, Performance, Performance Testing, Data Reporting, Pandas, Asyncio, Software Architecture, Back-end, GraphQL, Swagger, Python API, PDF Scraping, REST, AWS Lambda, Flask, HTML, REST APIs, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, Database Modeling, SaaS, NumPy, API Integration, Serverless, Python 3, Git, Lint, OpenAPI, Jupyter, Jupyter Notebook, Back-end Development, Design Patterns, Elasticsearch, Kubernetes, Pytest, Amazon API, Extensions, Data, Apache Spark, Kibana, Streaming Data, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Data Auditing, Agile, Dedicated SQL Pool (formerly SQL DW), Azure SQL Data Warehouse, T-SQL (Transact-SQL), Business Architecture, Enterprise Architecture, Multithreading, Entity Relationships, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, AWS Cloud Architecture, Celery, RabbitMQ, Performance Tuning, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS), SSH, Data Stewardship, Data Lake Design, Data Encoding

Data Engineer

2020 - 2021
Pintu
  • Developed an ELT data pipeline on Amazon EC2, started and stopped by AWS Lambda functions triggered on a CloudWatch schedule, moving data from various sources (MySQL, PostgreSQL, MongoDB, Google Sheets, and crypto exchange APIs) into the BigQuery data warehouse.
  • Implemented partitioning, clustering, and materialized views on BigQuery, reducing the cost of analytics by up to 100 times.
  • Collaborated with a financial expert to generate an optimal market-making strategy. Implemented and improved a model from a published paper, increasing the liquidity and market activity of the owned asset by 67%.
  • Developed a fraud detection system to flag fraudulent activity in case of a security breach. Its alert notified the executive team, who caught the fraudster within four hours, securing $2 million worth of assets.
  • Trained the business users to develop their own BI reporting using Metabase and Google Data Studio. It led to 70% of Metabase reports being created by the business team, while the other 30% required complex queries.
  • Led the data analytics team and implemented an agile culture by running sprint planning, standup, and sprint retrospective meetings. It allowed tracking business user requests, data pipeline issues, and improvements.
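The three BigQuery cost levers mentioned above might look like the following hypothetical DDL: partitioning prunes scanned bytes by date, clustering co-locates rows by a frequently filtered column, and a materialized view precomputes a hot aggregate. Table and column names are invented for illustration, not taken from the actual warehouse.

```python
# Hypothetical BigQuery DDL held as Python strings, as a deployment
# script might define it. All identifiers are illustrative.

PARTITIONED_TABLE_DDL = """
CREATE TABLE analytics.transactions (
  tx_id STRING,
  user_id STRING,
  amount NUMERIC,
  created_at TIMESTAMP
)
PARTITION BY DATE(created_at)  -- queries filtered by date scan one partition
CLUSTER BY user_id             -- co-locate rows commonly filtered together
"""

MATERIALIZED_VIEW_DDL = """
CREATE MATERIALIZED VIEW analytics.daily_volume AS
SELECT DATE(created_at) AS day, SUM(amount) AS volume
FROM analytics.transactions
GROUP BY day
"""
```

Since BigQuery bills on-demand queries by bytes scanned, pruning partitions and reading a small precomputed view instead of the raw table is how order-of-magnitude cost reductions like the one above become possible.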
Technologies: Python, Google Cloud Platform (GCP), Amazon Web Services (AWS), Amazon EC2, AWS Lambda, BigQuery, Google BigQuery, Amazon S3 (AWS S3), Metabase, Redash, Google Data Studio, Business Intelligence (BI), Data Visualization, Data Warehousing, Amazon CloudWatch, PostgreSQL, MongoDB, GitHub, ETL Tools, Scripting Languages, SQL, Data Migration, Data Analytics, Data Engineering, Tableau, NoSQL, Data Architecture, Data Management, Data Lakes, Amazon RDS, Amazon Aurora, Data Pipelines, Neo4j, Apache Kafka, ETL, Cloud Migration, Looker, Architecture, Big Data, Kanban, Agile Project Management, Technical Project Management, Snowflake, Data Wrangling, APIs, Excel 365, Dashboards, Data Manipulation, Data Science, Amazon QuickSight, AWS Step Functions, Shell Scripting, MapReduce, Google Analytics, JavaScript, MySQL Performance Tuning, Benchmarking, Databases, Performance, Data Reporting, Pandas, Amazon Cognito, PDF Scraping, REST, Flask, HTML, CSS, REST APIs, AWS Elastic Beanstalk, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, Excel Macros, Amazon DynamoDB, Database Modeling, Automated Trading Software, Neural Networks, SaaS, NumPy, Scikit-learn, API Integration, X (formerly Twitter) API, Natural Language Processing (NLP), Firebase, Serverless, SharePoint, Python 3, Git, Hadoop, SciPy, Jupyter, Jupyter Notebook, TensorFlow, Back-end Development, Design Patterns, Elasticsearch, Kubernetes, Pytest, Amazon API, Extensions, Data, Apache Spark, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Data Auditing, Agile, Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW), T-SQL (Transact-SQL), Business Architecture, Enterprise Architecture, Multithreading, Entity Relationships, PL/SQL, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, API Design, AWS Cloud Architecture, MongoDB Atlas, Performance Tuning, Dynamic SQL, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS), SSH, AWS DevOps, Data Encoding

Data Engineer

2019 - 2020
Kulina
  • Developed ELT processes from application databases, third-party marketing tools, and Google Sheets to BigQuery using Stitch Data, which reduced query conflicts on the production database and indirectly improved application performance.
  • Developed a snowflake schema on the data warehouse, increasing data visibility for the business team.
  • Deployed, maintained, and administered several BI tools, such as Redash, Data Studio, and Metabase, to gain data governance at the business unit level and answer data-related questions with proper tools.
Technologies: Python, Google Cloud Platform (GCP), Business Intelligence (BI), Data Warehousing, Cryptography, Data Visualization, BigQuery, Google BigQuery, Stitch Data, ETL Tools, Scripting Languages, SQL, Data Analytics, Data Engineering, Data Analysis, Tableau, Data Architecture, Data Management, Amazon RDS, Data-driven Dashboards, Data Pipelines, ETL, Looker, Snowflake, Data Wrangling, Dashboards, Data Manipulation, Data Science, Amazon QuickSight, Shell Scripting, JavaScript, MySQL Performance Tuning, Benchmarking, Databases, Performance, Data Reporting, Pandas, PDF Scraping, REST, HTML, CSS, REST APIs, Algorithms, Data Structures, Software Development, Optimization, Cloud, eCommerce, Excel Macros, Database Modeling, Neural Networks, SaaS, NumPy, Scikit-learn, API Integration, Natural Language Processing (NLP), Firebase, Serverless, Python 3, Git, Hadoop, SciPy, Jupyter, Jupyter Notebook, TensorFlow, Node.js, Amazon API, Data, Apache Spark, Data Integration, Orchestration, Monitoring, Data Auditing, Agile, Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW), T-SQL (Transact-SQL), Multithreading, Entity Relationships, PL/SQL, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, R, AWS Cloud Architecture, Performance Tuning, Database Design, SSH, Data Encoding

Projects

NASA API Python Wrapper

https://pypi.org/project/python-nasa/
An unofficial Python wrapper for the NASA API, based on the official documentation at https://api.nasa.gov/. This is an open source project I built to improve my portfolio and deepen my knowledge of developing API wrappers.
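A minimal sketch of how such a wrapper can be structured. This is not the python-nasa package itself: the class and method names are illustrative, though the APOD endpoint and `api_key` query parameter follow the official NASA API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


class NasaClient:
    """Illustrative sketch of a NASA API wrapper (not the python-nasa package)."""

    BASE_URL = "https://api.nasa.gov"

    def __init__(self, api_key="DEMO_KEY"):
        # NASA issues free keys; DEMO_KEY works with low rate limits.
        self.api_key = api_key

    def _build_url(self, path, **params):
        """Append query parameters and the API key to an endpoint path."""
        params["api_key"] = self.api_key
        return f"{self.BASE_URL}{path}?{urlencode(params)}"

    def apod(self, date=None):
        """Fetch the Astronomy Picture of the Day (network call)."""
        params = {} if date is None else {"date": date}
        url = self._build_url("/planetary/apod", **params)
        with urlopen(url) as resp:  # requires internet access
            return json.load(resp)
```

Separating URL construction from the HTTP call keeps the request-building logic testable without network access, which is a common pattern in API wrappers.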

Scalable Web Scraper

We developed and deployed a scalable web scraper on GCP, using Airflow as the workflow orchestrator with the CeleryExecutor backed by a Redis broker. I set up this infrastructure so the scraping process could run concurrently.

For the transformation, we use PySpark deployed on Dataproc, leveraging Dataproc Serverless for Spark to keep the transformation pipeline cost-effective. GCS serves as the data lake, holding both the raw data ingested from websites and the transformation output. The clean data is then loaded into BigQuery with a BigQuery load job, also orchestrated in Airflow; when the data arrives in BigQuery, the stakeholder dashboard automatically updates with the latest data. We also set up a rotating proxy to avoid being detected as a bot.
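The rotating-proxy piece can be sketched as a simple round-robin cycle over a pool of proxy addresses; the addresses below are placeholders, and in the scraper each HTTP request would be issued through the next proxy so no single IP hits a site's rate limit.

```python
import itertools


class ProxyRotator:
    """Round-robin rotation over a pool of proxies.

    Proxy addresses are placeholders for illustration; real scrapers would
    plug the returned proxy into their HTTP client's proxy setting.
    """

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self):
        """Return the next proxy in the cycle, wrapping around forever."""
        return next(self._cycle)


rotator = ProxyRotator([
    "http://10.0.0.1:8080",  # placeholder addresses
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
])
```

More elaborate schemes weight proxies by health or latency, but plain round-robin is often enough to stay under per-IP rate limits.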

Data Pipeline on GCP

Developed a data pipeline from third-party APIs to BigQuery using Airflow and an in-house framework. I implemented incremental loads so the system retrieves only new data, avoiding unnecessary full loads.
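The incremental-load idea can be sketched as a watermark filter. The record schema and field names below are illustrative; in a real pipeline the watermark would persist in Airflow state or a metadata table and be pushed into the API query as a filter rather than applied client-side.

```python
def incremental_extract(records, last_watermark):
    """Keep only records newer than the stored watermark.

    Returns the new records plus the watermark to persist for the next
    run. `records` are dicts with an `updated_at` field (an assumed,
    illustrative schema); ISO-8601 date strings compare correctly as text.
    """
    new_records = [r for r in records if r["updated_at"] > last_watermark]
    new_watermark = max(
        (r["updated_at"] for r in new_records),
        default=last_watermark,  # nothing new: keep the old watermark
    )
    return new_records, new_watermark
```

Each run then processes only the delta since the previous run, which is what makes daily syncs against large third-party APIs affordable.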

Serverless Chi Boilerplate

https://github.com/serverless-boilerplate/serverless-chi
A serverless boilerplate for the Go Chi web framework on AWS Lambda, with Cognito auth management and a DynamoDB database. A streaming-to-data-lake implementation is also included. The main reason for this implementation is that Go has the best AWS Lambda benchmark results among runtimes.
Education

2015 - 2019

Bachelor's Degree in Computer Science

Gadjah Mada University - Yogyakarta, Indonesia

Certifications

FEBRUARY 2022 - PRESENT

Infrastructure Automation with Terraform Cloud

Udemy

JANUARY 2022 - PRESENT

Google Cloud Professional Data Engineer

Udemy

Libraries/APIs

Pandas, Asyncio, Python API, REST APIs, NumPy, Shapely, Scikit-learn, Node.js, OpenAPI, Amazon API, SQLAlchemy, PySpark, Spark ML, OpenCV, X (formerly Twitter) API, SciPy, TensorFlow, Interactive Brokers API, Luigi, Snowpark

Tools

BigQuery, Apache Airflow, GitHub, AWS Glue, Microsoft Power BI, Tableau, Amazon Elastic MapReduce (EMR), Amazon QuickSight, AWS Step Functions, MySQL Performance Tuning, Amazon ElastiCache, Amazon Simple Notification Service (SNS), Git, Jupyter, Pytest, Kibana, Cloud Dataflow, Apache Beam, Celery, RabbitMQ, Amazon Simple Queue Service (SQS), Amazon Elastic Container Service (ECS), Docker Compose, Redash, Amazon CloudWatch, Terraform, Amazon Athena, Amazon Redshift Spectrum, Looker, Amazon EKS, Google Analytics, Amazon Cognito, GIS, GRASS GIS, PhpStorm, Navicat, MongoDB Atlas, Amazon SageMaker, Observability Tools, Stitch Data, Jira, Domo, Google Cloud Dataproc, AWS Batch

Languages

Python, SQL, Snowflake, JavaScript, HTML, Python 3, T-SQL (Transact-SQL), Stored Procedure, TypeScript, GraphQL, CSS, PHP, Go, R, Scala

Frameworks

Django, Swagger, Flask, Hadoop, Scrapy, Apache Spark, Data Lakehouse, Spark, Flutter, CodeIgniter, NestJS, Kedro

Paradigms

Business Intelligence (BI), ETL, MapReduce, Stress Testing, REST, Data-driven Design, Design Patterns, Microservices, Microservices Architecture, Database Design, Domain-driven Development, Kanban, Agile Project Management, DevOps, Agile, Load Testing, API Observability, Object-oriented Design (OOD), Object-oriented Programming (OOP), Distributed Computing, Dimensional Modeling

Platforms

Visual Studio Code (VS Code), Linux, Google Cloud Platform (GCP), Amazon Web Services (AWS), AWS Lambda, AWS Elastic Beanstalk, SharePoint, Jupyter Notebook, Docker, Amazon EC2, Oracle Database, Azure, Apache Kafka, Oracle, Databricks, Firebase, Azure Synapse, Kubernetes, Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW)

Storage

MySQL, PostgreSQL, Google Cloud Storage, Microsoft SQL Server, NoSQL, Data Lakes, Database Migration, Amazon Aurora, Data Pipelines, Elasticsearch, Databases, Amazon DynamoDB, Database Modeling, Data Integration, PL/SQL, Data Lake Design, Amazon S3 (AWS S3), MongoDB, Database Administration (DBA), Redshift, Neo4j, Dynamic SQL, Alibaba Cloud, Google Cloud, IIS SQL Server, Redis

Other

Conda, Machine Learning, Google BigQuery, Data Engineering, Data Modeling, Data Migration, ETL Tools, Data Analytics, Data Analysis, Data Architecture, Data Management, Amazon RDS, CDC, Data Build Tool (dbt), Cloud Migration, ELT, Big Data Architecture, Architecture, Big Data, Project Planning, Web Scraping, Scraping, Data Wrangling, APIs, Excel 365, Dashboards, Data Manipulation, Shell Scripting, Benchmarking, Performance, Performance Testing, Caching, Data Reporting, Software Architecture, Back-end, Artificial Intelligence (AI), Data Scraping, PDF Scraping, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, eCommerce, Excel Macros, Automated Trading Software, SaaS, GeoPandas, API Integration, Natural Language Processing (NLP), Serverless, Lint, Consumer Packaged Goods (CPG), Back-end Development, FastAPI, Extensions, Data, Streaming Data, Data Governance, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Multithreading, Entity Relationships, Software Design, Workflow, API Design, AWS Cloud Architecture, Performance Tuning, Amazon API Gateway, SSH, AWS DevOps, OpenTelemetry, OAuth, Data Encoding, Cryptography, Research, Data Warehousing, Data Visualization, Metabase, Google Data Studio, CI/CD Pipelines, GitHub Actions, Scripting Languages, Data-driven Dashboards, Azure Data Factory, Technical Project Management, Azure Data Lake, Azure Databricks, Data Science, Business Analysis, Tesseract, QGIS, OpenAI GPT-3 API, Neural Networks, eCommerce APIs, Generative Pre-trained Transformers (GPT), LangChain, SharePoint Online, Data Auditing, Business Architecture, Enterprise Architecture, Mathematics, Web Scalability, Prometheus, SDKs, Data Stewardship, TypeORM, Nonlinear Optimization, Linear Optimization, Amazon Neptune, Dataproc, Credit Modeling, OpenAI
