Faisal Malik Widya Prasetya
Verified Expert in Engineering
Data Engineer and Developer
Sleman Sub-District, Sleman Regency, Special Region of Yogyakarta, Indonesia
Toptal member since April 25, 2022
Faisal is a data engineer specializing in cloud data technologies like Google Cloud and AWS and in end-to-end data engineering processes. From designing the architecture and building the infrastructure to developing pipeline operations, he adapts quickly to new cloud-based, open source, or SaaS technologies. Faisal has solid experience contributing to early-stage startups, both by building end-to-end data pipelines directly and by providing consulting services in his fields of expertise.
Portfolio
Experience
- Python - 5 years
- Google Cloud Platform (GCP) - 4 years
- BigQuery - 4 years
- Apache Airflow - 4 years
- Amazon Web Services (AWS) - 3 years
- PySpark - 3 years
- Data Warehousing - 3 years
- AWS Lambda - 3 years
Preferred Environment
Visual Studio Code (VS Code), Conda, Linux, Docker, Docker Compose, Google Cloud Platform (GCP), Amazon Web Services (AWS), Jira, OpenAI
The most amazing...
...project I've ever done was implementing a cost optimization strategy on a client's data warehouse, cutting BI usage costs by up to 100x.
Work Experience
Senior Software Engineer
Pathbox AI Inc.
- Developed a serverless REST API on AWS Lambda, API Gateway, Cognito, Aurora Serverless, and DynamoDB using the Express.js web framework.
- Built an optimized machine learning inference system using ECS tasks on Fargate, parallelizing work with SQS and concurrent ECS tasks (see the sketch after this list).
- Developed machine learning inference with GPU enabled using AWS Batch.
- Standardized the machine learning workflow from dataset collection, data preprocessing, model setup, training, and validation to inference deployment.
- Implemented analytics on workflow operations to monitor and optimize the process further.
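A minimal sketch of the SQS fan-out pattern behind the inference system above. The queue URL and the `run_inference` function are hypothetical placeholders; each concurrent ECS task would run this same worker loop:

```python
# Sketch of an SQS-driven inference worker, as run inside one ECS task.
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/inference-jobs"  # placeholder


def run_inference(payload: dict) -> dict:
    """Placeholder for the actual model inference call."""
    return {"input": payload, "result": "..."}


def worker_loop() -> None:
    while True:
        # Long-poll SQS; each concurrent ECS task pulls its own batch of
        # messages, which is what makes the fan-out parallel.
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            run_inference(json.loads(msg["Body"]))
            # Delete only after successful processing so failed messages
            # reappear on the queue and are retried.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])


if __name__ == "__main__":
    worker_loop()
```

Scaling throughput then becomes a matter of raising the ECS service's desired task count, with SQS acting as the buffer between producers and workers.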
Web Scraping Expert
Burak Karakaya
- Developed a real-time web scraper that collected data from various sources, such as Twitter and the Binance Futures Leaderboard, to feed the client's trading bot. The scraper could ingest a tweet within 200 ms of its publication.
- Provisioned AWS infrastructure with a high-performance network so the scraper could operate in real time. I set up IP rotation to bypass per-IP rate limits and keep the scraper from being blocked by the news sources (see the sketch after this list).
- Provided an interface for non-technical clients to administer and operate the scraper conveniently. I used Streamlit and FastAPI to develop these interfaces.
- Utilized Redis and high-performance C extensions for Python to improve the storage and runtime performance of the scraper.
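A minimal sketch of the IP-rotation idea mentioned above, assuming a plain `requests`-based fetcher; the proxy endpoints and target URL are illustrative placeholders:

```python
# Sketch: rotate outbound requests across a pool of proxies so no single
# IP exceeds the source's rate limit.
import itertools

import requests

PROXIES = [
    "http://user:pass@proxy1.example.com:8000",  # placeholder proxy endpoints
    "http://user:pass@proxy2.example.com:8000",
]
proxy_cycle = itertools.cycle(PROXIES)


def fetch(url: str) -> requests.Response:
    # Each request goes out through the next proxy in the rotation,
    # spreading traffic across IPs to stay under per-IP rate limits.
    proxy = next(proxy_cycle)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
```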
Data Engineer
XpressLane, Inc.
- Developed scraping tools to collect data from various websites and push it to BigQuery (see the sketch after this list).
- Created development and operations documentation so that the client could maintain the solution and extend it with new features in the future.
- Delivered reports and dashboards built from the scraped data to help the client make better decisions in M&A use cases.
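A sketch of the load step, using the official `google-cloud-bigquery` client; the table ID and row shape are placeholders standing in for the scraper's output:

```python
# Sketch: push scraped rows into BigQuery via a streaming insert.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.scraping.companies"  # placeholder table

rows = [
    {"name": "Acme Corp", "url": "https://acme.example.com", "employees": 120},
]

# Streaming insert is convenient for small batches; for large volumes a
# load job from GCS is the cheaper option.
errors = client.insert_rows_json(table_id, rows)
if errors:
    raise RuntimeError(f"BigQuery insert failed: {errors}")
```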
Senior Data Engineer
Toptal
- Designed and implemented a robust data pipeline that extracted data from multiple marketing tools and APIs, such as Google Ads, Facebook Ads, and Twitter Ads, and transferred it to BigQuery using in-house data pipeline tooling based on Luigi (see the sketch after this list).
- Created a data pipeline solution that efficiently extracted data from various learning platforms such as Polly, Udemy, and Lessonly and consolidated it with BigQuery utilizing Composer, a managed Apache Airflow service provided by GCP.
- Participated in brainstorming sessions on splitting the data engineering team and proposed dividing it into a data platform team and an analytics engineering team. The analytics engineering team focuses on ETL logic, while the data platform team maintains the infrastructure.
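A minimal sketch of the Luigi extract-then-load pattern the first bullet refers to. The task names, file paths, and stubbed API payload are hypothetical; the point is the dependency chain Luigi enforces:

```python
# Sketch: two Luigi tasks where the load step depends on the extract step.
import json

import luigi


class ExtractAdsData(luigi.Task):
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget(f"data/ads_{self.date}.json")

    def run(self):
        # Placeholder standing in for the real marketing API call.
        records = [{"campaign": "demo", "clicks": 42}]
        with self.output().open("w") as f:
            json.dump(records, f)


class LoadToBigQuery(luigi.Task):
    date = luigi.DateParameter()

    def requires(self):
        # Luigi runs ExtractAdsData first and skips tasks whose
        # output already exists, making reruns idempotent.
        return ExtractAdsData(date=self.date)

    def output(self):
        return luigi.LocalTarget(f"data/loaded_{self.date}.flag")

    def run(self):
        with self.input().open() as f:
            records = json.load(f)
        # A real task would call the BigQuery client here.
        with self.output().open("w") as f:
            f.write(f"loaded {len(records)} rows")
```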
Data Engineer
QuantumBlack
- Developed internal data analytics tools that simplify deployment on the client site. The feature I built ingests data from various sources and stores it incrementally in Snowflake (see the sketch after this list).
- Handled a client request to build a data analytics pipeline and APIs.
- Worked closely with clients' analytics teams and leadership to gather analytics requirements and carefully plan everything from architecture design to implementation and delivery.
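One common way to make Snowflake ingestion incremental, sketched here with the official `snowflake-connector-python`; the connection parameters, table names, and MERGE columns are placeholders:

```python
# Sketch: incremental upsert into Snowflake via MERGE from a staging table.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="...",  # placeholders
    warehouse="ETL_WH", database="ANALYTICS", schema="RAW",
)

MERGE_SQL = """
MERGE INTO events AS target
USING staging_events AS source
ON target.event_id = source.event_id
WHEN MATCHED THEN UPDATE SET target.payload = source.payload
WHEN NOT MATCHED THEN INSERT (event_id, payload)
    VALUES (source.event_id, source.payload)
"""

with conn.cursor() as cur:
    # Only new or changed rows land in the target table, so repeated
    # pipeline runs stay incremental and idempotent.
    cur.execute(MERGE_SQL)
conn.close()
```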
Senior Data Engineer
Flip
- Built a data analytics ecosystem using native Google Cloud Platform technologies, such as Datastream, Google Cloud Storage, Pub/Sub, Dataflow, and BigQuery (see the sketch after this list).
- Reduced the analytics waiting time for one large report from a three-hour worst case to 30 seconds.
- Maintained the legacy MySQL and on-server cron job analytics stack by scheduling jobs for a heavy but frequently used query, making its results available in under 30 minutes with daily data freshness.
- Built the data engineering team and onboarded team members on the legacy, current, and future implementation.
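A minimal sketch of the streaming leg of such an ecosystem: an Apache Beam pipeline, runnable on Dataflow, reading from Pub/Sub and appending to BigQuery. The topic, project, and table names are placeholders, and the target table is assumed to exist:

```python
# Sketch: streaming Beam pipeline from Pub/Sub to BigQuery.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming mode so the pipeline runs continuously on Dataflow.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/cdc-events"  # placeholder topic
        )
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",  # placeholder, table assumed to exist
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```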
Data Engineer
Pintu
- Developed an ELT data pipeline on Amazon EC2, started and stopped by AWS Lambda functions on a CloudWatch schedule, moving data from various sources (MySQL, PostgreSQL, MongoDB, Google Sheets, and crypto exchange APIs) to the BigQuery data warehouse.
- Implemented partitioning, clustering, and materialized views on BigQuery, reducing the cost of analytics by up to 100 times (see the sketch after this list).
- Collaborated with a financial expert to devise an optimal market-making strategy. Implemented and improved on a model from a published paper, increasing the liquidity and market activity of the owned asset by 67%.
- Developed a fraud detection system to alert on fraudulent activity in case of a security breach. The alert notified the executive team, and the fraudster was caught within four hours, securing $2 million worth of assets.
- Trained business users to build their own BI reporting with Metabase and Google Data Studio. As a result, the business team created 70% of Metabase reports themselves; the remaining 30% required complex queries.
- Led the data analytics team and implemented an agile culture by running sprint planning, standup, and sprint retrospective meetings. It allowed tracking business user requests, data pipeline issues, and improvements.
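A sketch of the three BigQuery cost levers named above, issued as DDL from the Python client. The dataset, table, and column names are placeholders; the mechanics (partition pruning, clustering, and precomputed aggregates) are standard BigQuery features:

```python
# Sketch: partitioned + clustered table and a materialized view in BigQuery.
from google.cloud import bigquery

client = bigquery.Client()

client.query("""
CREATE TABLE IF NOT EXISTS analytics.transactions (
  tx_id STRING, user_id STRING, amount NUMERIC, created_at TIMESTAMP
)
PARTITION BY DATE(created_at)  -- date-filtered queries scan one partition
CLUSTER BY user_id             -- co-locates rows for selective user filters
""").result()

client.query("""
CREATE MATERIALIZED VIEW IF NOT EXISTS analytics.daily_volume AS
SELECT DATE(created_at) AS day, SUM(amount) AS volume
FROM analytics.transactions
GROUP BY day
""").result()
```

Because partitioned queries only scan the partitions they filter on, and the materialized view answers the daily aggregate without touching raw rows, BI dashboards pay for a tiny fraction of the bytes they would otherwise scan.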
Data Engineer
Kulina
- Developed ELT processes from application databases, third-party marketing tools, and Google Sheets to BigQuery using Stitch Data, which reduced query contention on the production database and indirectly improved application performance.
- Developed the snowflake schema for the data warehouse, increasing data visibility among the business team.
- Deployed, maintained, and administered several BI tools, such as Redash, Data Studio, and Metabase, to gain data governance at the business unit level and answer data-related questions with proper tools.
Experience
NASA API Python Wrapper
https://pypi.org/project/python-nasa/

Scalable Web Scraper
For the transformation step, we use PySpark deployed on Dataproc, leveraging Dataproc Serverless for Spark to keep the transformation pipeline cost-effective. GCS serves as the data lake, so all data ingested from the websites, as well as the transformation output, resides in GCS. The clean data is then stored in BigQuery using a BigQuery load job, also orchestrated in Airflow. When the data arrives in BigQuery, the stakeholder dashboard automatically updates with the recent data. We also set up a rotating proxy so the scraper is not detected and blocked as a bot.
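A minimal sketch of the transformation job described above, of the kind submitted to Dataproc Serverless; the bucket paths and column names are placeholders:

```python
# Sketch: PySpark job reading raw scraped JSON from GCS and writing
# clean, partitioned Parquet back to GCS for a BigQuery load job.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("scraper-transform").getOrCreate()

raw = spark.read.json("gs://my-lake/raw/listings/*.json")  # placeholder path

clean = (
    raw.dropDuplicates(["listing_id"])               # placeholder key column
    .withColumn("scraped_date", F.to_date("scraped_at"))
    .filter(F.col("price").isNotNull())
)

# Airflow then loads this Parquet output into BigQuery via a load job.
clean.write.mode("overwrite").partitionBy("scraped_date").parquet(
    "gs://my-lake/clean/listings/"
)
```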
Data Pipeline on GCP
Serverless Chi Boilerplate
https://github.com/serverless-boilerplate/serverless-chi

Education
Bachelor's Degree in Computer Science
Gadjah Mada University - Yogyakarta, Indonesia
Certifications
Infrastructure Automation with Terraform Cloud
Udemy
Google Cloud Professional Data Engineer
Udemy
Skills
Libraries/APIs
Pandas, Asyncio, Python API, REST APIs, NumPy, Shapely, Scikit-learn, Node.js, OpenAPI, Amazon API, SQLAlchemy, PySpark, Spark ML, OpenCV, X (formerly Twitter) API, SciPy, TensorFlow, Interactive Brokers API, Luigi, Snowpark
Tools
BigQuery, Apache Airflow, GitHub, AWS Glue, Microsoft Power BI, Tableau, Amazon Elastic MapReduce (EMR), Amazon QuickSight, AWS Step Functions, MySQL Performance Tuning, Amazon ElastiCache, Amazon Simple Notification Service (SNS), Git, Jupyter, Pytest, Kibana, Cloud Dataflow, Apache Beam, Celery, RabbitMQ, Amazon Simple Queue Service (SQS), Amazon Elastic Container Service (ECS), Docker Compose, Redash, Amazon CloudWatch, Terraform, Amazon Athena, Amazon Redshift Spectrum, Looker, Amazon EKS, Google Analytics, Amazon Cognito, GIS, GRASS GIS, PhpStorm, Navicat, MongoDB Atlas, Amazon SageMaker, Observability Tools, Stitch Data, Jira, Domo, Google Cloud Dataproc, AWS Batch
Languages
Python, SQL, Snowflake, JavaScript, HTML, Python 3, T-SQL (Transact-SQL), Stored Procedure, TypeScript, GraphQL, CSS, PHP, Go, R, Scala
Frameworks
Django, Swagger, Flask, Hadoop, Scrapy, Apache Spark, Data Lakehouse, Spark, Flutter, CodeIgniter, NestJS, Kedro
Paradigms
Business Intelligence (BI), ETL, MapReduce, Stress Testing, REST, Data-driven Design, Design Patterns, Microservices, Microservices Architecture, Database Design, Domain-driven Development, Kanban, Agile Project Management, DevOps, Agile, Load Testing, API Observability, Object-oriented Design (OOD), Object-oriented Programming (OOP), Distributed Computing, Dimensional Modeling
Platforms
Visual Studio Code (VS Code), Linux, Google Cloud Platform (GCP), Amazon Web Services (AWS), AWS Lambda, AWS Elastic Beanstalk, SharePoint, Jupyter Notebook, Docker, Amazon EC2, Oracle Database, Azure, Apache Kafka, Oracle, Databricks, Firebase, Azure Synapse, Kubernetes, Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW)
Storage
MySQL, PostgreSQL, Google Cloud Storage, Microsoft SQL Server, NoSQL, Data Lakes, Database Migration, Amazon Aurora, Data Pipelines, Elasticsearch, Databases, Amazon DynamoDB, Database Modeling, Data Integration, PL/SQL, Data Lake Design, Amazon S3 (AWS S3), MongoDB, Database Administration (DBA), Redshift, Neo4j, Dynamic SQL, Alibaba Cloud, Google Cloud, IIS SQL Server, Redis
Other
Conda, Machine Learning, Google BigQuery, Data Engineering, Data Modeling, Data Migration, ETL Tools, Data Analytics, Data Analysis, Data Architecture, Data Management, Amazon RDS, CDC, Data Build Tool (dbt), Cloud Migration, ELT, Big Data Architecture, Architecture, Big Data, Project Planning, Web Scraping, Scraping, Data Wrangling, APIs, Excel 365, Dashboards, Data Manipulation, Shell Scripting, Benchmarking, Performance, Performance Testing, Caching, Data Reporting, Software Architecture, Back-end, Artificial Intelligence (AI), Data Scraping, PDF Scraping, Scalability, Algorithms, Data Structures, Software Development, Optimization, Cloud, eCommerce, Excel Macros, Automated Trading Software, SaaS, GeoPandas, API Integration, Natural Language Processing (NLP), Serverless, Lint, Consumer Packaged Goods (CPG), Back-end Development, FastAPI, Extensions, Data, Streaming Data, Data Governance, Orchestration, Solution Architecture, Technical Architecture, Monitoring, Multithreading, Entity Relationships, Software Design, Workflow, API Design, AWS Cloud Architecture, Performance Tuning, Amazon API Gateway, SSH, AWS DevOps, OpenTelemetry, OAuth, Data Encoding, Cryptography, Research, Data Warehousing, Data Visualization, Metabase, Google Data Studio, CI/CD Pipelines, GitHub Actions, Scripting Languages, Data-driven Dashboards, Azure Data Factory, Technical Project Management, Azure Data Lake, Azure Databricks, Data Science, Business Analysis, Tesseract, QGIS, OpenAI GPT-3 API, Neural Networks, eCommerce APIs, Generative Pre-trained Transformers (GPT), LangChain, SharePoint Online, Data Auditing, Business Architecture, Enterprise Architecture, Mathematics, Web Scalability, Prometheus, SDKs, Data Stewardship, TypeORM, Nonlinear Optimization, Linear Optimization, Amazon Neptune, Dataproc, Credit Modeling, OpenAI