Faisal is available for hire

Faisal Malik Widya Prasetya

Verified Expert in Engineering

Data Engineer and Developer

Sleman Sub-District, Sleman Regency, Special Region of Yogyakarta, Indonesia

Toptal member since April 25, 2022

Expertise

Data Engineering Data Science Integration Machine Learning Back-end Developers API Algorithms Software Development SaaS Serverless Multithreading Big Data Architecture ETL Excel Macros BigQuery SharePoint Development Web Scraping

Bio

Faisal is a data engineer specializing in cloud data technologies like Google and AWS and end-to-end data engineering processes. From designing the architecture and building the infrastructure to developing pipeline operations, he is highly adaptable to new cloud-based, open source, or SaaS technologies. Faisal has solid experience contributing to early-stage startups by directly building end-to-end data pipelines or providing consulting services in his fields of expertise.

Portfolio

Aurion Holdings Ltd

Python, WinAPI, UI Automation, Robotic Process Automation (RPA), Cloudflare...

Walleye Capital

Python, Google Cloud Platform (GCP), Vertex AI, Kubeflow, Pulumi, Cloud Run...

Hivello Operations B.V.

Data Engineering, ETL, Google Cloud Platform (GCP), SQL, Google Cloud Storage...

Experience

Python - 7 years
Amazon Web Services (AWS) - 6 years
Google Cloud Platform (GCP) - 6 years
PySpark - 5 years
Apache Airflow - 4 years
BigQuery - 4 years
AWS Lambda - 3 years
Data Warehousing - 3 years

Preferred Environment

Visual Studio Code (VS Code), Conda, Linux, Docker, Docker Compose, Google Cloud Platform (GCP), Amazon Web Services (AWS), Jira, OpenAI, Mentorship

The most amazing...

...project I've ever done was implementing a cost optimization strategy on the client data warehouse, reducing BI usage costs up to 100 times.

Work Experience

Senior AI Automation Engineer

2025 - PRESENT

Aurion Holdings Ltd

Designed and built a production-grade algorithmic trading platform with over 50 specialized agents, 28 subsystems, and 20 microservices. The system trades live on a £140,000 FTMO prop account with full autonomy.
Built a fail-closed governance framework with 10 canonical laws by implementing 10 enforceable laws, 12 sequential validation gates, cryptographic envelope signing (HMAC-SHA256), and separation of duties.
Designed an immutable forensic audit pipeline by building a hash-chain verified, append-only audit trail across 500+ sealed files with tamper detection, 6-hourly automated audits, and a bypass detector.
Achieved enterprise-grade security hardening by implementing AES-256-GCM at-rest encryption with Windows DPAPI key binding, loopback-only network binding, file integrity monitoring, and automated secret scanning in CI/CD.

Technologies: Python, WinAPI, UI Automation, Robotic Process Automation (RPA), Cloudflare, Automated Trading Software, AI Automation, Claude, Integration, Security, AI Tools, Stock Trading, Stock Price Analysis, Trading, API Architecture, Agentic AI, Quantitative Finance, Telemetry, Logging, AI-assisted Development, Generative Artificial Intelligence (GenAI), Agentic Coding, Claude Code, Bots, Quantitative Modeling, Financial Data, Observability, Third-party APIs, Finance, Relational Databases, Finance APIs, Financial APIs

Senior MLOps Engineer

2024 - 2025

Walleye Capital

Migrated all async calls implementation to the OpenAI API to the OpenAI Batch API in the Kubeflow Pipeline. This migration reduced OpenAI token consumption costs by 50% and eliminated all parallel invocation instances from Kubeflow components.
Standardize the pipeline using multiple reusable Kubeflow components, allowing all team members to productionize their Pipelines seamlessly.
Set up CI/CD Pipeline to test and deploy from GitHub to the GCP ecosystem.
Set up GitHub Actions Workflow Dispatch to submit and schedule Vertex AI Pipelines.
Provisioned and managed GCP infrastructure using IaC tools like Pulumi.
Refactored experimental codes from data scientists to meet production standards and deployment.
Integrated different services and components built by other team members so the system can run smoothly and efficiently.

Technologies: Python, Google Cloud Platform (GCP), Vertex AI, Kubeflow, Pulumi, Cloud Run, Cloud Firestore, GitHub, OpenAI API, Vector Search, Word Embedding, Pydantic, Generative Artificial Intelligence (GenAI), Scripting, Python Script, System Design, Integration, Security, AI Tools, Compliance, Stock Trading, Stock Price Analysis, Trading, Serverless Framework, API Architecture, Agentic AI, Quantitative Finance, Telemetry, Logging, Latency & Throughput Analysis, Streamlit, AI-assisted Development, Agentic Coding, Bots, Quantitative Modeling, DataOps, Financial Data, Observability, Third-party APIs, Finance, Relational Databases, Finance APIs, Financial APIs

Back-end Data Engineer

2024 - 2025

Hivello Operations B.V.

Designed and implemented log centralization from end-user applications to Cloud Logging, which was routed for analytics use case using Kubeflow and Spark to handle the orchestration and the transformation.
Developed a blockchain indexing framework using a subgraph to index multiple DePINs' on-chain earnings data. This allows the data to be accessed using GraphQL and analytics purposes.
Enabled self-serve analytics throughout the company by provisioning Metabase and connecting BigQuery as a data source with business-friendly data models. This makes the company data-driven.
Designed and implemented a Go-based event-driven microservice back-end system deployed on Cloud Run to handle streaming events from Pub/Sub containing real-time device logs.

Technologies: Data Engineering, ETL, Google Cloud Platform (GCP), SQL, Google Cloud Storage, Data Pipelines, Google BigQuery, Data Encoding, Data Modeling, Blockchain, Back-end, APIs, Kubeflow, Production Support, Jinja, Star Schema, Crypto, WebSockets, Cloud Infrastructure, Serverless Architecture, Infrastructure as Code (IaC), Machine Learning Operations (MLOps), Unit Testing, AI Chatbots, PyTorch, RDBMS, Technical Leadership, API Development, RESTFul APIs, Directed Acrylic Graphs (DAG), Terraform, Event-driven Architecture, Distributed Architecture, Reverse Engineering, OAuth 2, Message Queues, Large Language Models (LLMs), AI Agents, Data Cleaning, ETL Pipelines, Data Cleansing, Retrieval-augmented Generation (RAG), Pinecone, Vector Search, Pydantic, Scripting, Python Script, System Design, Medallion Architecture, Integration, Security, Serverless Framework, API Architecture, Go, Cloud Run, Proxy Servers, Telemetry, Logging, Latency & Throughput Analysis, Real-time Systems, Retool, DataOps, Financial Data, Observability, Third-party APIs, Finance, Leadership, Relational Databases, Finance APIs, Financial APIs

Lead Data Engineer

2023 - 2024

Quadrant

Led a cross-functional data engineering team of five to design, build, and maintain robust, high-availability data pipelines.
Integrated and scaled the Databricks platform to orchestrate data preparation and delivery operations, seamlessly processing hundreds of terabytes of data daily.
Optimized overarching data pipelines by meticulously reconfiguring Apache Spark jobs and cluster environments, achieving significant cost reductions while meeting stringent performance requirements.
Instituted comprehensive data quality validation checks using Great Expectations, deep-integrated with Spark DataFrames, enabling automated anomaly detection on massive datasets post-processing.
Architected and operated real-time event ingestion pipelines using Amazon Kinesis and Amazon MSK (Kafka) to process consented mobile location events directly from the company's mobile SDK.

Technologies: Python, Spark, PySpark, Databricks, Amazon Elastic MapReduce (EMR), AWS Glue, Amazon MSK, Apache Kafka, Amazon Kinesis, Amazon S3 (AWS S3), Medallion Architecture, Amazon Athena, Parquet, Geohash, PostGIS, GeoJSON, Geospatial Data, Spatial Data Infrastructure, Amazon Redshift, Logging, Latency & Throughput Analysis, Real-time Systems, DataOps, Observability, Third-party APIs, Leadership, Relational Databases

Senior Software Engineer

2023 - 2024

Pathbox AI Inc.

Developed a serverless REST API on AWS Lambda, API Gateway, Cognito Aurora Serverless, and DynamoDB using the Express.js web framework.
Built an optimized machine learning inference system using ECS tasks on Fargate. Allowed parallelization by utilizing SQS and concurrent ECS tasks.
Developed GPU-enabled machine learning inference using AWS Batch.
Standardized the machine learning workflow from dataset collection, data preprocessing, model setup, training, and validation to inference deployment.
Implemented analytics on workflow operations to monitor and optimize the process further.

Technologies: AWS Lambda, Amazon Web Services (AWS), Node.js, Amazon RDS, Amazon API Gateway, Amazon S3 (AWS S3), Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Python, AWS Batch, Amazon Elastic Container Service (ECS), Terraform, Metabase, AWS DevOps, OpenTelemetry, Prometheus, API Observability, Amazon SageMaker, Observability Tools, Data Lakehouse, Data Stewardship, Data Lake Design, SQLAlchemy, OAuth, NestJS, TypeScript, TypeORM, Domain-driven Development, Google Cloud Storage, Data Encoding, Nonlinear Optimization, Linear Optimization, Grafana, Redis, Database Architecture, Distributed Systems, Artificial Intelligence (AI), Machine Learning, Time Series Databases, TimescaleDB, Query Optimization, Express.js, Containerization, DB, Back-end Architecture, Production Support, Jinja, WebSockets, Cloud Infrastructure, Serverless Architecture, Infrastructure as Code (IaC), AWS CloudFormation, Machine Learning Operations (MLOps), Unit Testing, Computer Vision, AI Chatbots, RDBMS, Technical Leadership, API Development, RESTFul APIs, Directed Acrylic Graphs (DAG), Event-driven Architecture, Distributed Architecture, Reverse Engineering, Fastify, OAuth 2, Amazon Glacier, Message Queues, Large Language Models (LLMs), AI Agents, Data Cleaning, ETL Pipelines, Data Cleansing, Stripe API, Retrieval-augmented Generation (RAG), Pinecone, Vector Search, Pydantic, Scripting, Python Script, System Design, Bioinformatics, AWS Fargate, HIPAA, Integration, Security, Compliance, Stripe, Serverless Framework, API Architecture, Amazon EventBridge, Amazon Kinesis Data Firehose, GeoJSON, API Gateways, Load Balancers, Proxy Servers, Telemetry, Logging, Latency & Throughput Analysis, HAProxy, AWS IAM, Generative Artificial Intelligence (GenAI), Agentic Coding, Real-time Systems, Rust, HIPAA Compliance, DataOps, Observability, Third-party APIs, Relational Databases, Finance APIs, Financial APIs, Healthcare Services

Web Scraping Expert

2023 - 2023

Burak Karakaya

Developed a real-time web scraper to scrape data from various sources, such as Twitter, Binance Futures Leaderboard, etc., to feed data to the client's trading bot. The scraper can ingest tweets within 200 ms after it is published.
Provided the infrastructure on AWS to enable a high-performance network to enable the scraper to work in real time. I set up the IP rotation so that the scraper didn't get blocked by bypassing the IP rate limit from the news sources.
Provided an interface for non-technical clients to administer and operate the scraper conveniently. I use Streamlit and FastAPI to develop these interfaces.
Utilized Redis and high-performance Python extensions like C to improve the storage and runtime performance of the scraper.

Technologies: Web Scraping, Data Scraping, Scraping, Amazon Web Services (AWS), JavaScript, Python, Streaming Data, Data Integration, Orchestration, Generative Pre-trained Transformers (GPT), LangChain, Solution Architecture, Technical Architecture, Monitoring, Data Auditing, Agile, Transact-SQL (T-SQL), Business Architecture, Enterprise Architecture, Interactive Brokers API, Multithreading, Entity Relationships, Stored Procedure, Software Design, Workflow, Microservices Architecture, API Design, AWS Cloud Architecture, Celery, RabbitMQ, Performance Tuning, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS), SSH, Load Testing, AWS DevOps, Data Stewardship, Data Lake Design, SQLAlchemy, OAuth, Domain-driven Development, Google Cloud Storage, Data Encoding, Nonlinear Optimization, Linear Optimization, Redis, Database Architecture, Distributed Systems, Fintech, High-performance Computing (HPC), Time Series Databases, TimescaleDB, Query Optimization, Containerization, DB, Back-end Architecture, Production Support, Jinja, Crypto, WebSockets, Telegram Bot API, Cloud Infrastructure, Serverless Architecture, Unit Testing, RDBMS, Directed Acrylic Graphs (DAG), Event-driven Architecture, Distributed Architecture, CAPTCHA, Reverse Engineering, Message Queues, Data Cleaning, ETL Pipelines, Data Cleansing, Scripting, Python Script, Playwright, Selenium, System Design, AWS Fargate, Integration, Security, Stock Trading, Stock Price Analysis, Trading, Amazon Kinesis, Quantitative Finance, Logging, Latency & Throughput Analysis, Real-time Systems, Rust, Bots, Quantitative Modeling, Financial Data, Observability, Third-party APIs, Finance, Relational Databases, Finance APIs, Financial APIs

Data Engineer

2023 - 2023

XpressLane, Inc.

Developed scraping tools to scrape data from various websites and push it to BigQuery.
Created development and operations documentation so that the client could maintain the solution and can develop more features on it in the future.
Delivered reports and dashboards to clients from the scraped data to help clients better make decisions for M&A use cases.

Technologies: Data Engineering, Python, Google Data Studio, PostgreSQL, Google BigQuery, Dataproc, Google Cloud Dataproc, Looker, Apache Airflow, Redis, Spark, PySpark, Web Scraping, Scraping, Data Wrangling, Data Modeling, Excel 365, Dashboards, Amazon Elastic MapReduce (EMR), Amazon EKS, Data Manipulation, Shell Scripting, MapReduce, Business Intelligence (BI), Business Analysis, Benchmarking, Databases, Performance, Performance Testing, Caching, Stress Testing, Data Reporting, Pandas, Asyncio, Software Architecture, Swagger, DevOps, Artificial Intelligence (AI), Python API, Data Scraping, REST, HTML, CSS, OpenAI GPT-3 API, REST APIs, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, Excel Macros, Database Modeling, Data-driven Design, SaaS, NumPy, API Integration, Natural Language Processing (NLP), Serverless, SharePoint, Amazon ElastiCache, Amazon Simple Notification Service (SNS), Python 3, Git, Lint, OpenAPI, Jupyter, Jupyter Notebook, Design Patterns, Kubernetes, Pytest, FastAPI, eCommerce APIs, Extensions, Scrapy, Data, Apache Spark, Streaming Data, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Generative Pre-trained Transformers (GPT), LangChain, Solution Architecture, SharePoint Online, Technical Architecture, Monitoring, Data Auditing, Agile, Transact-SQL (T-SQL), Business Architecture, Enterprise Architecture, Interactive Brokers API, Multithreading, Entity Relationships, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, Go, API Design, AWS Cloud Architecture, MongoDB Atlas, Performance Tuning, Dynamic SQL, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS), SSH, Load Testing, Prometheus, API Observability, Observability Tools, Data Lakehouse, Data Stewardship, Data Lake Design, SQLAlchemy, OAuth, TypeScript, Domain-driven Development, Google Cloud Storage, Data Encoding, Database Architecture, Distributed Systems, High-performance Computing (HPC), Machine Learning, Time Series Databases, Query Optimization, Containerization, DB, Back-end Architecture, Production Support, Jinja, Star Schema, Cloud Infrastructure, Serverless Architecture, Unit Testing, PyTorch, RDBMS, Directed Acrylic Graphs (DAG), Event-driven Architecture, MongoDB, Distributed Architecture, CAPTCHA, Reverse Engineering, Data Cleaning, ETL Pipelines, Data Cleansing, Pydantic, Scripting, Python Script, Playwright, Selenium, System Design, Medallion Architecture, Integration, Security, Serverless Framework, Logging, DataOps, Observability, Third-party APIs, Relational Databases

Senior Data Engineer

2022 - 2023

Toptal

Designed and implemented a robust data pipeline that extracted data from multiple marketing tools and APIs like Google Ads, Facebook Ads, and Twitter Ads, and transferred it to BigQuery using in-house data pipeline tools based on Luigi.
Created a data pipeline solution that efficiently extracted data from various learning platforms such as Polly, Udemy, and Lessonly and consolidated it with BigQuery utilizing Composer, a managed Apache Airflow service provided by GCP.
Participated in the data engineering team split brainstorming session and came up with the idea of breaking the team into the data platform and analytics engineering teams. The analytics engineering team focuses on ETL logic, while the data platform team maintains the infrastructure.

Technologies: Python, SQL, Pandas, Data Engineering, Object-oriented Design (OOD), Object-oriented Programming (OOP), Data Modeling, Scala, Luigi, Apache Airflow, BigQuery, Distributed Computing, Dimensional Modeling, ETL, Google Cloud, Google Cloud Storage, ETL Tools, Scripting Languages, Data Analytics, Data Architecture, Data Management, Data Pipelines, ELT, Big Data Architecture, Snowpark, Architecture, Big Data, Kanban, Project Planning, Agile Project Management, Technical Project Management, Azure Data Lake, Data Wrangling, APIs, Dashboards, Data Manipulation, Shell Scripting, MapReduce, Google Analytics, Web Scraping, Benchmarking, Databases, Performance, Performance Testing, Caching, Stress Testing, Asyncio, Software Architecture, Back-end, GraphQL, DevOps, Artificial Intelligence (AI), Python API, Scraping, Data Scraping, REST, REST APIs, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, Database Modeling, Data-driven Design, SaaS, NumPy, API Integration, Serverless, Amazon ElastiCache, Amazon Simple Notification Service (SNS), Python 3, Git, Lint, Hadoop, Jupyter, Jupyter Notebook, Design Patterns, Kubernetes, Pytest, FastAPI, eCommerce APIs, Amazon API, Scrapy, Data, Apache Spark, Kibana, Streaming Data, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Data Auditing, Agile, Transact-SQL (T-SQL), Business Architecture, Enterprise Architecture, Multithreading, Entity Relationships, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, Go, API Design, AWS Cloud Architecture, MongoDB Atlas, Performance Tuning, Database Design, Amazon Simple Queue Service (SQS), SSH, Load Testing, Amazon Elastic Container Service (ECS), OpenTelemetry, Prometheus, API Observability, Amazon SageMaker, Observability Tools, SDKs, Data Lakehouse, Data Stewardship, Data Lake Design, SQLAlchemy, OAuth, TypeScript, Domain-driven Development, Data Encoding, Grafana, Database Architecture, Distributed Systems, Machine Learning, Time Series Databases, Query Optimization, Containerization, ClickHouse, DB, Back-end Architecture, Production Support, Jinja, Star Schema, Cloud Infrastructure, Serverless Architecture, Infrastructure as Code (IaC), Machine Learning Operations (MLOps), Unit Testing, PyTorch, RDBMS, Directed Acrylic Graphs (DAG), Distributed Architecture, Data Cleaning, ETL Pipelines, Data Cleansing, Pydantic, Scripting, Python Script, Playwright, Selenium, System Design, Medallion Architecture, Integration, Security, Serverless Framework, API Architecture, Logging, DataOps, Third-party APIs, Relational Databases

Data Engineer

2021 - 2023

QuantumBlack

Developed internal data analytics tools that can simplify deployment on the client site. The feature I built is to ingest data from various sources and store them incrementally on Snowflake.
Handled a client request to build a data analytics pipeline and APIs.
Worked closely with clients' analytics teams and leadership to gather analytics requirements and carefully plan from the architecture design, to implementation and delivery.

Technologies: Python, Kedro, Apache Airflow, Amazon Web Services (AWS), Google Cloud Platform (GCP), Alibaba Cloud, Spark, PySpark, GitHub, Terraform, ETL Tools, Scripting Languages, SQL, Data Analytics, Amazon Athena, Amazon Redshift Spectrum, AWS Glue, Data Engineering, Microsoft Power BI, Amazon Neptune, Microsoft SQL Server, Oracle Database, Database Administration (DBA), Redshift, NoSQL, Data Architecture, Data Management, Data Lakes, Azure, Database Migration, Amazon RDS, CDC, Amazon Aurora, Data Build Tool (dbt), Snowflake, Data Pipelines, Neo4j, Apache Kafka, ETL, Cloud Migration, IIS SQL Server, Domo, ELT, Big Data Architecture, Snowpark, Oracle, Architecture, Big Data, Azure Data Factory (ADF), Kanban, Project Planning, Agile Project Management, Technical Project Management, Azure Data Lake, Data Wrangling, Azure Databricks, Data Modeling, APIs, Databricks, Django, Excel 365, Dashboards, Amazon Elastic MapReduce (EMR), Amazon EKS, Data Manipulation, Spark ML, Amazon QuickSight, Elasticsearch, AWS Step Functions, Shell Scripting, MapReduce, Business Intelligence (BI), Business Analysis, Web Scraping, Benchmarking, Databases, Performance, Performance Testing, Caching, Data Reporting, Pandas, Asyncio, Software Architecture, Back-end, GraphQL, Amazon Cognito, Swagger, DevOps, Artificial Intelligence (AI), Python API, Scraping, Data Scraping, PDF Scraping, REST, AWS Lambda, Flask, OpenCV, Tesseract, QGIS, GIS, GRASS GIS, Flutter, OpenAI GPT-3 API, REST APIs, AWS Elastic Beanstalk, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, eCommerce, Amazon DynamoDB, Database Modeling, Data-driven Design, Neural Networks, SaaS, NumPy, GeoPandas, Shapely, Scikit-learn, API Integration, X (formerly Twitter) API, Node.js, Natural Language Processing (NLP), Serverless, SharePoint, Amazon ElastiCache, Amazon Simple Notification Service (SNS), Python 3, Git, Lint, Hadoop, OpenAPI, Jupyter, Jupyter Notebook, Credit Modeling, Consumer Packaged Goods (CPG), Azure Synapse, Back-end Development, Design Patterns, Kubernetes, Pytest, FastAPI, eCommerce APIs, Amazon API, Extensions, Scrapy, Data, Apache Spark, Kibana, Streaming Data, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Solution Architecture, SharePoint Online, Technical Architecture, Monitoring, Data Auditing, Agile, Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW), Transact-SQL (T-SQL), Business Architecture, Enterprise Architecture, Interactive Brokers API, Multithreading, Entity Relationships, PL/SQL, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, Go, API Design, R, AWS Cloud Architecture, MongoDB Atlas, Celery, RabbitMQ, Performance Tuning, Dynamic SQL, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS), SSH, Load Testing, Web Scalability, Amazon Elastic Container Service (ECS), AWS DevOps, OpenTelemetry, Prometheus, API Observability, Amazon SageMaker, Observability Tools, SDKs, Data Lakehouse, Data Stewardship, Data Lake Design, SQLAlchemy, OAuth, TypeScript, Domain-driven Development, Data Encoding, Nonlinear Optimization, Linear Optimization, Database Architecture, Distributed Systems, Fintech, Machine Learning, Time Series Databases, Query Optimization, Containerization, ClickHouse, DB, Back-end Architecture, Production Support, Jinja, Star Schema, Crypto, WebSockets, Telegram Bot API, Cloud Infrastructure, Serverless Architecture, Infrastructure as Code (IaC), AWS CloudFormation, Machine Learning Operations (MLOps), Computer Vision, PyTorch, RDBMS, Technical Leadership, API Development, RESTFul APIs, Directed Acrylic Graphs (DAG), Event-driven Architecture, MongoDB, Distributed Architecture, CAPTCHA, Reverse Engineering, Fastify, OAuth 2, Amazon S3 (AWS S3), Message Queues, GraphDB, Data Cleaning, ETL Pipelines, Data Cleansing, Recommendation Systems, Pydantic, Scripting, Python Script, Playwright, Selenium, System Design, AWS Fargate, Delta Lake, Medallion Architecture, Integration, Security, Compliance, Serverless Framework, API Architecture, Amazon EventBridge, Amazon Kinesis, Amazon Kinesis Data Firehose, PostGIS, GeoJSON, Geospatial Data, Spatial Data Infrastructure, Amazon Redshift, API Gateways, Load Balancers, Proxy Servers, Logging, HAProxy, Real-time Systems, Rust, DataOps, Third-party APIs, Finance, Finance APIs, Financial APIs

Data Engineering Course Mentor

2021 - 2022

MentorCruise

Motivate mentees to engage with the Data Engineering field by showing the job market supply and demand conditions, including the prospects.
Sharing industrial knowledge and insights about the field.
Guide the mentee to focus on which subject to prioritize and focus on.

Technologies: Mentorship, Mentorship & Coaching, Team Mentoring, Data Engineering, Serverless Framework, API Architecture

Senior Data Engineer

2021 - 2021

Flip

Built a data analytics ecosystem using native Google Cloud Platform technologies, such as Datastream, Google Cloud Storage, Pub/Sub, Dataflow, and BigQuery.
Improved the analytics waiting time from a 3-hour worst-case scenario to 30 seconds for one big report.
Maintained the legacy technologies for data analytics on MySQL and on-server cron jobs by creating scheduled jobs on a heavy but frequently used query. The heavy query was accessible in less than 30 minutes with daily data freshness.
Built the data engineering team and onboarded team members on the legacy, current, and future implementation.

Technologies: Python, Google Cloud Platform (GCP), MySQL, BigQuery, Google BigQuery, Metabase, Data Warehousing, CI/CD Pipelines, GitHub, Data Migration, ETL Tools, Scripting Languages, SQL, Data Analytics, AWS Glue, Data Engineering, Data Analysis, NoSQL, Data Architecture, Data Management, Data Lakes, Database Migration, Amazon RDS, CDC, Amazon Aurora, Data Build Tool (dbt), Data Pipelines, Apache Kafka, ETL, Cloud Migration, ELT, Big Data Architecture, Architecture, Big Data, Kanban, Agile Project Management, Technical Project Management, Microsoft Power BI, Data Wrangling, Data Modeling, APIs, Excel 365, Dashboards, Amazon Elastic MapReduce (EMR), Data Manipulation, Amazon QuickSight, AWS Step Functions, Shell Scripting, Google Analytics, MySQL Performance Tuning, Benchmarking, Databases, Performance, Performance Testing, Data Reporting, Pandas, Asyncio, Software Architecture, Back-end, GraphQL, Swagger, Python API, PDF Scraping, REST, AWS Lambda, Flask, HTML, REST APIs, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, Database Modeling, SaaS, NumPy, API Integration, Serverless, Python 3, Git, Lint, OpenAPI, Jupyter, Jupyter Notebook, Back-end Development, Design Patterns, Elasticsearch, Kubernetes, Pytest, Amazon API, Extensions, Data, Apache Spark, Kibana, Streaming Data, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Data Auditing, Agile, Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW), Transact-SQL (T-SQL), Business Architecture, Enterprise Architecture, Multithreading, Entity Relationships, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, AWS Cloud Architecture, Celery, RabbitMQ, Performance Tuning, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS), SSH, Data Stewardship, Data Lake Design, Data Encoding, Database Architecture, Distributed Systems, Fintech, Machine Learning, Artificial Intelligence (AI), Query Optimization, Containerization, DB, Back-end Architecture, Production Support, Jinja, Star Schema, Cloud Infrastructure, Serverless Architecture, Unit Testing, PyTorch, RDBMS, Directed Acrylic Graphs (DAG), Distributed Architecture, Data Cleaning, ETL Pipelines, Data Cleansing, Recommendation Systems, Scripting, Python Script, System Design, Medallion Architecture, Integration, Serverless Framework, API Architecture, DataOps, Financial Data, Third-party APIs, Finance, Relational Databases, Finance APIs, Financial APIs

Data Engineer

2020 - 2021

Pintu

Developed an ELT data pipeline on Amazon EC2. It is turned on and off by AWS Lambda, triggered by using CloudWatch scheduler from various data sources (MySQL, PostgreSQL, MongoDB, Google Sheets, crypto exchange APIs) to the BigQuery data warehouse.
Implemented partition, clustering, and materialized views on BigQuery and reduced the cost of analytics by up to 100 times.
Collaborated with the financial expert to generate the optimum market-making strategy. Implemented and improved the model on the published paper, increasing the liquidity and market activity of the owned asset by 67%.
Developed a fraud detection system to alert fraudulent activity in case of a security breach on the system. This alert notifies the executive team and captures the fraudster within four hours. It secured $2 million worth of assets.
Trained the business users to develop their own BI reporting using Metabase and Google Data Studio. It led to 70% of Metabase reports being created by the business team, while the other 30% required complex queries.
Led the data analytics team and implemented an agile culture by running sprint planning, standup, and sprint retrospective meetings. It allowed tracking business user requests, data pipeline issues, and improvements.

Technologies: Python, Google Cloud Platform (GCP), Amazon Web Services (AWS), Amazon EC2, AWS Lambda, BigQuery, Google BigQuery, Amazon S3 (AWS S3), Metabase, Redash, Google Data Studio, Business Intelligence (BI), Data Visualization, Data Warehousing, Amazon CloudWatch, PostgreSQL, MongoDB, GitHub, ETL Tools, Scripting Languages, SQL, Data Migration, Data Analytics, Data Engineering, Tableau, NoSQL, Data Architecture, Data Management, Data Lakes, Amazon RDS, Amazon Aurora, Data Pipelines, Neo4j, Apache Kafka, ETL, Cloud Migration, Looker, Architecture, Big Data, Kanban, Agile Project Management, Technical Project Management, Snowflake, Data Wrangling, APIs, Excel 365, Dashboards, Data Manipulation, Data Science, Amazon QuickSight, AWS Step Functions, Shell Scripting, MapReduce, Google Analytics, JavaScript, MySQL Performance Tuning, Benchmarking, Databases, Performance, Data Reporting, Pandas, Amazon Cognito, PDF Scraping, REST, Flask, HTML, CSS, REST APIs, AWS Elastic Beanstalk, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, Excel Macros, Amazon DynamoDB, Database Modeling, Automated Trading Software, Neural Networks, SaaS, NumPy, Scikit-learn, API Integration, X (formerly Twitter) API, Natural Language Processing (NLP), Firebase, Serverless, SharePoint, Python 3, Git, Hadoop, SciPy, Jupyter, Jupyter Notebook, TensorFlow, Back-end Development, Design Patterns, Elasticsearch, Kubernetes, Pytest, Amazon API, Extensions, Data, Apache Spark, Data Governance, Data Integration, Cloud Dataflow, Apache Beam, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Data Auditing, Agile, Dedicated SQL Pool (formerly SQL DW), Azure SQL Data Warehouse, Transact-SQL (T-SQL), Business Architecture, Enterprise Architecture, Multithreading, Entity Relationships, PL/SQL, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, API Design, AWS Cloud Architecture, MongoDB Atlas, Performance Tuning, Dynamic SQL, Database Design, Amazon API Gateway, Amazon Simple Queue Service (SQS), SSH, AWS DevOps, Data Encoding, Database Architecture, Fintech, Machine Learning, Artificial Intelligence (AI), Query Optimization, Containerization, DB, Back-end Architecture, Production Support, Jinja, Star Schema, Crypto, Cloud Infrastructure, Serverless Architecture, Unit Testing, PyTorch, RDBMS, Technical Leadership, API Development, RESTFul APIs, Directed Acrylic Graphs (DAG), Distributed Architecture, GraphDB, Data Cleaning, ETL Pipelines, Data Cleansing, Scripting, Python Script, System Design, Medallion Architecture, Integration, Stock Trading, Stock Price Analysis, Trading, Serverless Framework, API Architecture, Amazon Redshift, Bots, Quantitative Modeling, DataOps, Financial Data, Third-party APIs, Finance, Relational Databases, Finance APIs, Financial APIs

Data Engineer

2019 - 2020

Kulina

Developed ELT processes from application databases, third-party marketing tools, and Google Sheets to BigQuery using Stitch data, which reduced the number of query conflicts on the production database, indirectly improving application performance.
Developed the Snowflake schema on the data warehouse, increasing data visibility among the business team.
Deployed, maintained, and administered several BI tools, such as Redash, Data Studio, and Metabase, to gain data governance at the business unit level and answer data-related questions with proper tools.

Technologies: Python, Google Cloud Platform (GCP), Business Intelligence (BI), Data Warehousing, Cryptography, Data Visualization, BigQuery, Google BigQuery, Stitch Data, ETL Tools, Scripting Languages, SQL, Data Analytics, Data Engineering, Data Analysis, Tableau, Data Architecture, Data Management, Amazon RDS, Data-driven Dashboards, Data Pipelines, ETL, Looker, Snowflake, Data Wrangling, Dashboards, Data Manipulation, Data Science, Amazon QuickSight, Shell Scripting, JavaScript, MySQL Performance Tuning, Benchmarking, Databases, Performance, Data Reporting, Pandas, PDF Scraping, REST, HTML, CSS, REST APIs, Algorithms, Data Structures, Software Development, Optimization, Cloud, eCommerce, Excel Macros, Database Modeling, Neural Networks, SaaS, NumPy, Scikit-learn, API Integration, Natural Language Processing (NLP), Firebase, Serverless, Python 3, Git, Hadoop, SciPy, Jupyter, Jupyter Notebook, TensorFlow, Node.js, Amazon API, Data, Apache Spark, Data Integration, Orchestration, Monitoring, Data Auditing, Agile, Dedicated SQL Pool (formerly SQL DW), Azure SQL Data Warehouse, Transact-SQL (T-SQL), Multithreading, Entity Relationships, PL/SQL, Stored Procedure, Software Design, Workflow, Microservices, Microservices Architecture, R, AWS Cloud Architecture, Performance Tuning, Database Design, SSH, Data Encoding, Database Architecture, Query Optimization, DB, Back-end Architecture, Production Support, Star Schema, Cloud Infrastructure, Serverless Architecture, POS, RDBMS, Directed Acrylic Graphs (DAG), Data Cleaning, ETL Pipelines, Data Cleansing, Recommendation Systems, Scripting, Python Script, System Design, Medallion Architecture, Integration, Serverless Framework, PostGIS, GeoJSON, Geospatial Data, Spatial Data Infrastructure, DataOps, Third-party APIs, Relational Databases

Experience

NASA API Python Wrapper

https://pypi.org/project/python-nasa/

An unofficial Python wrapper for the NASA API based on the official NASA API documentation, https://api.nasa.gov/. This project is an open source project that I did to improve my portfolio and enhance my knowledge of developing an API wrapper.

Scalable Web Scraper

We developed and deployed a scalable web scraper on GCP. We use Airflow as the workflow orchestrator with CeleryExecutor under Redis Broker. I set up those infrastructures so the scraping process could be done concurrently.

Then for the transformation, we use PySpark deployed on Dataproc. We manifest Serverless Spark Dataproc to make our transformation pipeline cost-effective. We use GCS as the data lake, so all data ingested from the website will reside in GCS and the transformation output. The clean data will then be stored in BigQuery using the BigQuery load job, also orchestrated on Airflow. When the data arrives on BigQuery, the stakeholder dashboard will automatically be updated with the recent data. We also set up a rotating proxy to avoid getting caught as a bot.

Data Pipeline on GCP

Developed a data pipeline from third-party APIs to BigQuery using Airflow and an in-house framework. I implemented incremental load to the system to retrieve only new data, avoiding unnecessary full load.

Serverless Chi Boilerplate

https://github.com/serverless-boilerplate/serverless-chi

A serverless boilerplate of Go Chi Web Framework in AWS Lambda with Cognito auth management and DynamoDB database. Streaming to data lake implementation is also included. The main reason of this implementation is because Go has the best AWS Lambda benchmark compared to other runtimes.

Education

2015 - 2019

Bachelor's Degree in Computer Science

Gadjah Mada University - Yogyakarta, Indonesia

Certifications

FEBRUARY 2022 - PRESENT

Infrastructure Automation with Terraform Cloud

Udemy

JANUARY 2022 - PRESENT

Google Cloud Professional Data Engineer

Udemy

Skills

Libraries/APIs

PySpark, Pandas, Asyncio, Python API, REST APIs, NumPy, Shapely, Scikit-learn, Node.js, OpenAPI, Amazon API, SQLAlchemy, Telegram Bot API, PyTorch, API Development, Spark ML, OpenCV, X (formerly Twitter) API, SciPy, TensorFlow, Interactive Brokers API, Stripe API, Pydantic, Playwright, Stripe, Luigi, Snowpark, OpenAI API, WinAPI

Tools

BigQuery, Apache Airflow, GitHub, Terraform, AWS Glue, Microsoft Power BI, Tableau, Amazon Elastic MapReduce (EMR), Amazon QuickSight, AWS Step Functions, MySQL Performance Tuning, Amazon ElastiCache, Amazon Simple Notification Service (SNS), Git, Jupyter, Pytest, Kibana, Cloud Dataflow, Apache Beam, Celery, RabbitMQ, Amazon Simple Queue Service (SQS), Amazon Elastic Container Service (ECS), AWS CloudFormation, Logging, AWS IAM, Docker Compose, Redash, Amazon CloudWatch, Amazon Athena, Amazon Redshift Spectrum, Looker, Amazon EKS, Google Analytics, Amazon Cognito, GIS, GRASS GIS, PhpStorm, Navicat, MongoDB Atlas, Amazon SageMaker, Observability Tools, Grafana, AWS Fargate, Claude, Amazon Kinesis Data Firehose, Claude Code, Stitch Data, Jira, Domo, Google Cloud Dataproc, AWS Batch, Retool

Languages

Python, SQL, Snowflake, JavaScript, HTML, Python 3, Transact-SQL (T-SQL), Stored Procedure, Go, TypeScript, Python Script, Rust, GraphQL, CSS, PHP, R, Scala

Frameworks

Django, Swagger, Flask, Hadoop, Scrapy, Apache Spark, Data Lakehouse, Jinja, Fastify, Serverless Framework, Spark, Flutter, CodeIgniter, NestJS, Express.js, OAuth 2, Selenium, Streamlit, Kedro

Paradigms

Business Intelligence (BI), ETL, MapReduce, Stress Testing, REST, Data-driven Design, Design Patterns, Microservices, Microservices Architecture, Database Design, Domain-driven Development, Back-end Architecture, Serverless Architecture, Unit Testing, Event-driven Architecture, API Architecture, Real-time Systems, Kanban, Agile Project Management, DevOps, Agile, Load Testing, API Observability, High-performance Computing (HPC), Object-oriented Design (OOD), Object-oriented Programming (OOP), Distributed Computing, Dimensional Modeling, HIPAA Compliance

Platforms

Visual Studio Code (VS Code), Linux, Docker, Google Cloud Platform (GCP), Amazon Web Services (AWS), AWS Lambda, AWS Elastic Beanstalk, SharePoint, Jupyter Notebook, Kubernetes, Amazon EC2, Oracle Database, Azure, Apache Kafka, Oracle, Databricks, Firebase, Azure Synapse, Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW), MetaTrader, MetaTrader 5, Blockchain, Kubeflow, Vertex AI, Cloud Run

Storage

Amazon S3 (AWS S3), MySQL, PostgreSQL, MongoDB, Google Cloud Storage, Microsoft SQL Server, NoSQL, Data Lakes, Database Migration, Amazon Aurora, Data Pipelines, Redis, Elasticsearch, Databases, Amazon DynamoDB, Database Modeling, Data Integration, PL/SQL, Data Lake Design, Database Architecture, ClickHouse, DB, RDBMS, PostGIS, Relational Databases, Database Administration (DBA), Redshift, Neo4j, Dynamic SQL, Alibaba Cloud, Google Cloud, IIS SQL Server, Cloud Firestore

Industry Expertise

Bioinformatics

Other

Conda, Machine Learning, Google BigQuery, Data Engineering, Data Modeling, Data Migration, ETL Tools, Data Analytics, Data Analysis, Data Architecture, Data Management, Amazon RDS, CDC, Data Build Tool (dbt), Cloud Migration, ELT, Big Data Architecture, Architecture, Big Data, Project Planning, Web Scraping, Scraping, Data Wrangling, Azure Databricks, APIs, Excel 365, Dashboards, Data Manipulation, Shell Scripting, Benchmarking, Performance, Performance Testing, Caching, Data Reporting, Software Architecture, Back-end, Artificial Intelligence (AI), Data Scraping, PDF Scraping, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, eCommerce, Excel Macros, Automated Trading Software, SaaS, GeoPandas, API Integration, Natural Language Processing (NLP), Serverless, Lint, Consumer Packaged Goods (CPG), Back-end Development, FastAPI, Extensions, Data, Streaming Data, Data Governance, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Multithreading, Entity Relationships, Software Design, Workflow, API Design, AWS Cloud Architecture, Performance Tuning, Amazon API Gateway, SSH, AWS DevOps, OpenTelemetry, OAuth, Data Encoding, Distributed Systems, Fintech, Time Series Databases, Query Optimization, Containerization, Production Support, Star Schema, Crypto, WebSockets, Cloud Infrastructure, Infrastructure as Code (IaC), Machine Learning Operations (MLOps), Computer Vision, Technical Leadership, RESTFul APIs, Directed Acrylic Graphs (DAG), Distributed Architecture, Data Cleaning, ETL Pipelines, Data Cleansing, Generative Artificial Intelligence (GenAI), Scripting, System Design, Delta Lake, Medallion Architecture, Integration, Quantitative Finance, GeoJSON, Geospatial Data, Spatial Data Infrastructure, Amazon Redshift, Latency & Throughput Analysis, Agentic Coding, DataOps, Financial Data, Observability, Third-party APIs, Finance, Leadership, Financial APIs, Cryptography, Research, Data Warehousing, Data Visualization, Metabase, Google Data Studio, CI/CD Pipelines, GitHub Actions, Scripting Languages, Data-driven Dashboards, Azure Data Factory (ADF), Technical Project Management, Azure Data Lake, Data Science, Business Analysis, Tesseract, QGIS, OpenAI GPT-3 API, Neural Networks, eCommerce APIs, Generative Pre-trained Transformers (GPT), LangChain, SharePoint Online, Data Auditing, Business Architecture, Enterprise Architecture, Mathematics, Web Scalability, Prometheus, SDKs, Data Stewardship, TypeORM, Nonlinear Optimization, Linear Optimization, TimescaleDB, AI Chatbots, POS, CAPTCHA, Reverse Engineering, Amazon Glacier, Message Queues, Large Language Models (LLMs), AI Agents, GraphDB, Recommendation Systems, Retrieval-augmented Generation (RAG), Pinecone, Vector Search, HIPAA, Security, AI Tools, Compliance, Mentorship, Stock Trading, Stock Price Analysis, Trading, Amazon EventBridge, Agentic AI, Amazon Kinesis, Real-time Processing, API Gateways, Load Balancers, Proxy Servers, Telemetry, HAProxy, AI-assisted Development, Bots, Quantitative Modeling, Finance APIs, Healthcare Services, Amazon Neptune, Dataproc, Credit Modeling, OpenAI, Pulumi, Word Embedding, UI Automation, Robotic Process Automation (RPA), Cloudflare, AI Automation, Mentorship & Coaching, Team Mentoring, Amazon MSK, Parquet, Geohash

Collaboration That Works

How to Work with Toptal

Toptal matches you directly with global industry experts from our network in hours—not weeks or months.

Share your needs

Discuss your requirements and refine your scope in a call with a Toptal domain expert.

Choose your talent

Get a short list of expertly matched talent within 24 hours to review, interview, and choose from.

Start your risk-free talent trial

Work with your chosen talent on a trial basis for up to two weeks. Pay only if you decide to hire them.

Top talent is in high demand.

Start hiring