Sung Jun (Andrew) Kim

Verified Expert in Engineering

Big Data Developer

Location

Sydney, New South Wales, Australia

Toptal Member Since

June 18, 2020

As a highly effective technical leader with over 20 years of experience, Andrew specializes in data integration, conversion, engineering, analytics, visualization, and science, as well as ETL, big data architecture, analytics platforms, and cloud architecture. He has an array of skills in building data platforms, analytic consulting, trend monitoring, data modeling, data governance, and machine learning.

Portfolio

Med Tech Solutions

Microsoft Power BI, Azure SQL, Azure Data Factory, Azure Synapse, APIs...

Shippo

Amazon Web Services (AWS), Amazon Athena, AWS Glue, Apache Airflow, Spark...

Verizon

Spark, Scala, Phoenix, HBase, Hadoop

Experience

Big Data - 19 years
Microsoft Power BI - 15 years
Data Visualization - 15 years
ETL - 15 years
Databricks - 5 years
Spark - 5 years
Data Warehouse Design - 5 years
Azure Data Factory - 4 years

Availability

Full-time

Preferred Environment

Informatica, SQL, PySpark, Spark, Hadoop, Data Visualization, Data Warehouse Design

The most amazing...

...thing I've coded is a data ingestion and transformation algorithm that explodes and normalizes very complex multi-hierarchical data structures.

Work Experience

Data Engineer

2021 - PRESENT

Med Tech Solutions

Managed data warehouse modeling of system resources from multiple data sources, including MySQL, Azure SQL, and APIs.
Worked on data ingestion and transformation using Azure data stacks, including Data Factory, Synapse, Databricks, SQL, and Elasticsearch.
Created multiple Power BI paginated reports, standard reports, and dashboards.

Technologies: Microsoft Power BI, Azure SQL, Azure Data Factory, Azure Synapse, APIs, Snowflake, Data Build Tool (dbt), Azure Databricks

Senior Data Engineer

2022 - 2022

Shippo

Designed and developed data pipelines to ingest, transform, and aggregate shipping data using S3, AWS Glue, Spark (PySpark), Athena, and Postgres.
Developed an end-to-end data pipeline workflow using Airflow.
Created the end-to-end architecture design for the data platform, pipeline, and storage.

Technologies: Amazon Web Services (AWS), Amazon Athena, AWS Glue, Apache Airflow, Spark, Spark SQL, Docker, PostgreSQL 9, Data Build Tool (dbt)

Big Data Engineer

2020 - 2021

Verizon

Troubleshot Hadoop and HBase clusters, Phoenix, and Spark crashes.
Reviewed Spark (Scala) code to improve performance.
Reviewed Spark, HBase, and Phoenix server configurations.
Recommended configuration and Scala code changes to prevent server crashes and achieve optimal performance.

Technologies: Spark, Scala, Phoenix, HBase, Hadoop

Lead Data Architect | Data Engineer

2020 - 2021

ImportGenius

Designed the entire ETL/ELT process for a massive global trade data pipeline using Informatica PowerCenter and BDM.
Ingested, parsed, and transformed more than 10 TB of global trade data spanning ten years using AWS S3, Glue, Spark (PySpark), and Athena.
Ingested and transformed S3 data into AWS Elasticsearch.
Optimized the performance of Glue, Spark, and AWS Elasticsearch.

Technologies: Elasticsearch, PySpark, Spark, Amazon Athena, AWS Glue, Amazon S3 (AWS S3), ELK (Elastic Stack), Informatica ETL

Data Engineer

2020 - 2020

Recko

Designed and implemented a CDC pipeline from PostgreSQL and MySQL to S3 using Kafka, Kafka Connect, Debezium, NiFi, and Python.
Designed and developed a Lambda function that ingests the payload from an S3 event, transforms it into readable audit data, and writes it to S3 in Parquet format. Then created a Glue Data Catalog for external schema and external table creation.
Created AWS Glue PySpark jobs, workflows, and triggers that ingest and transform data from S3 to Redshift. Designed a Redshift table and schema.
Set up and configured Kafka, Debezium, and Kafka Connect servers.
Wrote SQL queries to retrieve the latest data from ELT procedures using SQL window functions.
Set up the ELK stack, including Logstash, Filebeat, and Kibana. Created an ELK index and ingested transaction data into it using Logstash and an API.

Technologies: Python 3, Debezium, Apache Kafka, PySpark, Apache NiFi, PostgreSQL, AWS Lambda, Redshift, AWS Glue, Amazon S3 (AWS S3)

Data Engineer

2020 - 2020

Dermalogica Unilever

Created ETL from JDE ERP system to Data Warehouse using SQL and SSIS.
Designed the DW data model.
Created batch SQL from various systems to DW.
Created Power BI Data Model.
Wrote complex DAX functions in Power BI and Power Pivot.
Transformed data using M-Query.
Created sales/revenue and field service dashboard for consultants.
Implemented YTD, QTD, and other visuals.

Technologies: Azure SQL, Microsoft Power BI

Data Engineer

2019 - 2020

10th Man Media

Designed and implemented a data ingestion and transformation framework from various social media platforms to the Azure platform. Social media data is extracted using APIs and then ingested into Data Lake using ADF.
Created complex data transformation logic using PySpark, which involves time series trends, aggregation, and time window comparisons. Data is moved to the downstream Azure Data Warehouse for Power BI data visualization.
Designed the entire pipeline from upstream to downstream using Azure Data products.

Technologies: Azure Blob Storage API, Azure Data Lake, Data Warehouse Design, Data Warehousing, SQL, Databricks, Azure Data Factory, Azure

Big Data Architect | Lead Data Engineer

2019 - 2019

TechMahindra/Optus

Spearheaded big data architecture and engineering for Optus: developed a proof of concept (POC) and architecture design, drove analytics, and managed technical project delivery in line with expectations.
Led legacy DW migration project from Teradata to Cloudera using Informatica PowerCenter, Informatica BDM, Scala, Hive, HDFS, Impala, Elastic Stack, Splunk, DevOps, and CI/CD.
Migrated Cloudera Data Platform data and code to AWS and Azure Platform.

Technologies: Teradata, Informatica, HBase, Apache Hive, Spark, Hadoop, Big Data

Lead Big Data Architect | Lead Data Engineer

2018 - 2019

Cognizant/Westpac

Led a big data team of data engineers/developers and delivered real-time and batch data processing projects using Agile Scrum.
Designed and delivered a metadata-driven data ingestion framework that ingests data from various Westpac data sources to Westpac Data Hub (HDFS).
Integrated, transformed, and published data through the metadata-driven ingestion framework to targets including Kafka, RDBMS (Teradata, Oracle, and SQL Server), SFTP, and more.
Used Python, Spark, Spark SQL, Hadoop, HDFS, Hive, HBase, Kafka, NiFi, and Atlas.
Led the CCR (comprehensive credit reporting) project, which ingests data from customer rating bureaus including Equifax, Illion, and Experian.
Designed the entire XML explosion pattern, which involves multi-level XML explosion and normalized table creation on the HDFS platform using PySpark, Hive, Spark SQL, and HBase.
Created the entire set of conceptual, logical, and physical data models for downstream users, including credit risk analysts and data scientists.

Technologies: Hadoop, Spark SQL, PySpark, Python, Microsoft SQL Server, Oracle, Teradata, RDBMS, Apache Kafka, Foundry

Analytic Lead | Data Architect

2009 - 2018

OneGov, Department of Finance & Services and Innovation, NSW Government, Sydney, Australia

Managed a BI team of eight BI/ETL developers and was responsible for all of OneGov’s analytics, data science, and big data projects and BAU activities for a number of large NSW government agencies, including DAC (Data Analytics Centre), Service NSW, RMS, Fair Trading, and NSW Health. Worked closely with product owners, scrum masters, developers, BAs, architects, support teams, external agency users, and other stakeholders to deliver a number of critical analytics projects successfully.
Delivered the entire analytics platform, applications, data visualization, prediction model, and ETL process from scratch and continuously enhanced the system by adopting new technologies and processes. Developed the ETL process using SSIS (2016), which integrated data from sources including SQL Server 2016, Siebel CRM, websites via APIs, and flat files (CSV/XLSX/XLS/XML/JSON). Was responsible for the daily ETL refresh and ongoing maintenance, as well as SQL Server database tuning, upgrades, query optimization, and index maintenance. Built an SSAS cube for KPI and management reporting. Also built a number of dashboards and reports for executives, managers, and operations staff using Power BI, DOMO, Tableau, and OBIEE.
Created a prediction model for a license renewal reminder campaign using logistic regression and a petrol station grouping model using an unsupervised learning technique (k-means clustering). Was involved in several other machine learning projects, including CTP and fuel pricing in NSW, using various ML libraries on the Hortonworks Hadoop platform.
Created a dashboard for ministers to monitor fuel price updates, compliance, and price trends using Power BI and DOMO. Analyzed sophisticated real-time and historical fuel price data using Python, Spark (PySpark), and Hive. Analyzed customer feedback using NLP and data mining techniques in R.
Built HDP (Hadoop) and HDF (NiFi) clusters for data scientists and academics to support their large-scale data analytics and prediction model builds. Ingested public and confidential data from AWS EMR/S3/Redshift to on-premises Hadoop using a Spark ETL framework program, Glue, and NiFi. Provided consulting on data ingestion and other big data technologies to data scientists and engineers.
Developed near-real-time data ingestion flows from various data sources to Hive using Spark, NiFi, HDFS, and Sqoop for Service NSW OTC. Managed the entire Hadoop cluster, including day-to-day server maintenance and daily delta data ingestion. Used Power BI for data visualization.

Technologies: Amazon Web Services (AWS), PySpark, R, Python, Superset, System Advisor Model (SAM), Apache Hive, NiFi, HDF, Cluster, Hadoop, Informatica, Oracle Business Intelligence Enterprise Edition 11g (OBIEE), Oracle, Azure, Microsoft Power BI, SQL Server Reporting Services (SSRS), SQL Server Integration Services (SSIS), SSAS, SQL

CRM Lead | BI Lead

2007 - 2008

IBM Global Business Service

Delivered core case management system, integration services to internal/external systems, and upgraded detention portal system.
Led a team of six consultants and was responsible for the implementation of the main case management modules.
Handled resource management, task distribution, and schedule management.
Wrote technical and integration specifications.
Configured various Siebel Public Sector Case Management modules.
Created an SOA integration interface for the department.
Implemented Oracle Business Intelligence Enterprise Edition.
Delivered unified systems for border security, case management, and detention for national security. The system involved a complicated process that ran from border entry to the granting of a visa.
Contributed to the team that received the “Australia Day 2010 Secretary’s Citation” from the Secretary of the Department of Immigration and Citizenship on January 26, 2010, for the delivery of the Service Provider Portal within the Systems for People 9 Compliance, Case Management and Detention Release.

Technologies: Siebel CRM, Oracle Business Intelligence Applications (OBIA)

Program Manager (BI/CRM)

2004 - 2008

Samsung

Designed and implemented CRM and analytics.
Created application standards, an interface and configuration framework, and development guidelines.
Integrated Siebel.
Converted data using SQL and other ETL tools.
Performed technical requirement analysis, configuration, and report creation.
Installed and configured OBIEE.
Installed and configured the data warehouse, including environment setup, DAC, Informatica ETL modification, data model changes, performance tuning, and optimization.
Designed system architecture and sized hardware.
Provided in-house Siebel technical and business consulting as a BI and CRM subject matter expert.
Managed the team and mentored junior team members.

Technologies: SQL Server Integration Services (SSIS), Informatica, Microsoft SQL Server, Siebel CRM, Oracle Business Intelligence Applications (OBIA)

Senior Principal Consultant

2000 - 2004

Oracle (Siebel)

Engaged in multiple Siebel CRM/Analytics projects across Asia Pacific with leading multinational customers and partners. Provided technical, system design, business requirement analysis, and project management services to partners and customers, including technical system architecture design and implementation, enterprise application integration (EAI), project management, and application configuration. Responded to RFPs and RFIs, wrote consulting proposals, supported pre-sales and resource planning, mentored junior consultants, led teams, and contributed to practice development and to management and operational procedures for consulting assignments.

Technologies: Oracle Business Intelligence Applications (OBIA), Siebel CRM

Lead DBA

1998 - 2000

SIEMENS

Handled database management, administration, data conversion and migration, SQL and database engine tuning and optimization, and the release of a new database.

Technologies: Sybase, Microsoft SQL Server

Senior Development DBA

1997 - 1999

Bankers Trust Fund Management

Successfully delivered a number of projects, including the Unit Trust System Database Conversion from SQLBase to Microsoft SQL Server, the Investment Product Marketing Data Mart/Warehouse ETL, and the Web Data Warehouse Reporting projects.
Developed an Informatica ETL workflow to extract and load a number of DWs.
Created a star schema-based fund management DW and data mart.

Technologies: Informatica, Oracle, Sybase, Microsoft SQL Server, ETL

Senior Systems Developer

1995 - 1997

Colonial Insurance

Headed various system analysis, design, data modeling, programming, and testing activities, as well as internal technical and external consultation and support. The role also included the analysis, design, implementation, and support of two mission-critical systems: UPMS (Unit Price Management System) and the New Business 400 system.

Technologies: Sybase, Microsoft SQL Server, C++

Senior Systems Analyst/Programmer

1995 - 1995

Reserve Bank of Australia

Served as a systems analyst/programmer, designing, developing, and implementing various banking applications and automated fund transfer systems for the central bank of Australia.
Oversaw the development process and managed the integration of various internal and external systems, reporting processes and applications to streamline and simplify the external as well as internal reporting activities.

Technologies: C++

Experience

Optus Big Data Project

Served as lead big data architect for the Optus legacy data warehouse migration project, which migrated data from legacy Teradata to the Cloudera big data platform, and designed the data ingestion/transformation framework using Informatica BDM, Scala, and DevOps.

Westpac Big Data Platform

I served as lead solution architect and lead engineer on Westpac’s big data platform and comprehensive credit reporting projects. I also led a team of data engineers and solution engineers. My contributions also included creating a data platform solution that was delivered via AWS data stacks (S3, Glue, Lambda) and Palantir Foundry.

NSW Government's Analytic Platform Build

As the lead data architect, I successfully delivered the NSW State Government’s award-winning in-house and cloud big data, data science, and business intelligence projects.

Skillset

Languages

Scala, Python 2, Python, R, JavaScript, Visual Basic for Applications (VBA), SQL, C++, Python 3, Snowflake

Frameworks

Angular, Hadoop, Spark, YARN, Flutter, React Native, Redux, Phoenix, TOGAF

Libraries/APIs

Node.js, Flask-RESTful, PySpark, MLlib, TensorFlow, Stanford NLP, Ggplot2, React, Azure Blob Storage API

Tools

ELK (Elastic Stack), Kibana, Logstash, cURL Command Line Tool, Dplyr, Superset, Solr, Apache Sqoop, Impala, Cloudera, SSAS, Domo, Oracle Business Intelligence Enterprise Edition 11g (OBIEE), Microsoft Power BI, Tableau, Amazon Athena, AWS Glue, Azure HDInsight, Spark SQL, Oracle Business Intelligence Applications (OBIA), Siebel CRM, Cluster, Apache NiFi, Synapse, Apache Airflow, Informatica ETL

Paradigms

ETL, Data Science, OLAP, System Advisor Model (SAM)

Platforms

Firebase, Amazon Web Services (AWS), Azure, RStudio, Apache Kafka, Hortonworks Data Platform (HDP), Oracle, Databricks, Android, iOS, AWS Lambda, Azure Synapse, Docker

Storage

Oracle RDBMS, Elasticsearch, HDFS, Apache Hive, Essbase, PostgreSQL, MySQL, Teradata, Microsoft SQL Server, Redshift, Amazon DynamoDB, Amazon S3 (AWS S3), Azure Blobs, HBase, RDBMS, SQL Server Integration Services (SSIS), Sybase, SQL Server Reporting Services (SSRS), Azure SQL

Other

APIs, Big Data, Data Visualization, Filebeat, Microsoft Data Transformation Services (now SSIS), Informatica, Engineering, Schemas, Ranger, NiFi, DAX, Data Warehouse Design, Software Development, Freelancing, Palantir, React Native Bridge, Foundry, HDF, Debezium, Data Warehousing, Azure Data Lake, Computer Science, Information Systems, Azure Data Factory, Microsoft Azure, Enterprise Architecture, Solution Architecture, PostgreSQL 9, Analytics, Data Build Tool (dbt), Azure Databricks

Education

1993 - 1996

Master of Science Degree in Computer Science

University of Technology, Sydney (UTS) - Sydney, Australia

1990 - 1993

Bachelor's Degree in Information and Communication Systems

Macquarie University - Sydney, Australia

Certifications

OCTOBER 2021 - PRESENT

Microsoft Certified Azure Data Engineer Associate

Microsoft

JUNE 2021 - PRESENT

TOGAF Certified Enterprise Architect

The Open Group

MARCH 2020 - MARCH 2023

AWS Certified Data Analytics - Specialty

AWS

MARCH 2004 - PRESENT

PMP

PMI

MARCH 1999 - PRESENT

Microsoft Certified DBA

Microsoft

JANUARY 1998 - PRESENT

Oracle Certified DBA

Oracle
