José Molano, Developer in Bogotá, Colombia

José Molano

Verified Expert in Engineering

Data Engineer and Developer

Location
Bogotá, Colombia
Toptal Member Since
November 1, 2022

José is a data engineer with more than six years of experience in extract, transform, and load (ETL) pipeline development, data warehouse and data lake design, query performance tuning, and cloud database infrastructure management. With a background spanning multiple domains, José has designed and built scalable data platforms for ticket exchange and resale, urban traffic, customer service, and tax evasion analysis.

Portfolio

Globant
Apache Airflow, AWS Lambda, Redshift, Snowflake, Amazon RDS...
SKG Tecnologia
Google Cloud Platform (GCP), Python, Apache Kafka, Pandas, PostgreSQL, Flask...
Alianza CAOBA
Apache Spark, Python, AWS Glue, AWS Lambda, Cloudera, MySQL, Docker...

Experience

Availability

Part-time

Preferred Environment

Apache Airflow, Apache Spark, Snowflake, BigQuery, Amazon RDS, MySQL, Python, Redshift, Terraform, AWS Glue

The most amazing...

...product I've built is an automation pipeline for migrating terabytes of data from a MongoDB database to a data warehouse using Apache Airflow and AWS.
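A migration like this typically moves large collections to the warehouse in staged batches so a failed load can be retried without restarting from scratch. The batching step can be sketched in plain Python; the names and sizes below are illustrative, not taken from the actual project:

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

def batched_documents(
    cursor: Iterable[Dict[str, Any]], batch_size: int = 1000
) -> Iterator[List[Dict[str, Any]]]:
    """Yield fixed-size batches from a (MongoDB-style) document cursor.

    Staging the load in batches lets a failed run resume from the last
    committed batch instead of re-reading the whole collection.
    """
    iterator = iter(cursor)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# In-memory stand-in for a MongoDB cursor:
docs = ({"_id": i, "amount": i * 10} for i in range(2500))
sizes = [len(b) for b in batched_documents(docs, batch_size=1000)]
# sizes == [1000, 1000, 500]
```

In the real pipeline, each batch would be written to a staging area (such as Amazon S3) and loaded into the warehouse by an Airflow task.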

Work Experience

Senior Data Engineer

2020 - PRESENT
Globant
  • Executed job automation and scheduling with Airflow to support monthly accounting reviews.
  • Developed ETL pipelines for ingesting external data sources, such as MongoDB, into Snowflake using Apache Airflow and AWS.
  • Planned and executed transactional database migrations using AWS Database Migration Service (DMS).
  • Implemented Amazon CloudWatch and Datadog monitors with OpsGenie integration for critical database metrics such as CPU and memory consumption and replication lag.
  • Managed the Amazon Relational Database Service infrastructure and related resources, such as Amazon Virtual Private Cloud security groups and parameter groups in source control with Terraform.
  • Improved SQL query performance in Amazon Aurora MySQL, reducing execution times and cloud infrastructure costs by introducing indexes and partitions on key tables.
Technologies: Apache Airflow, AWS Lambda, Redshift, Snowflake, Amazon RDS, Amazon Elastic Container Service (Amazon ECS), Terraform, MySQL, Docker, Amazon CloudWatch, Amazon S3 (AWS S3), Data Engineering, Amazon Web Services (AWS), Boto, MariaDB, High Availability Disaster Recovery (HADR), Database Optimization, AWS HA, AWS Database Migration Service (DMS), Databases, Database Migration, Data Pipelines, Relational Databases, GitHub, Data, Reporting, Data Transformation, CRM APIs, CSV, Python Boolean, Boolean Search, ETL Tools, Data Migration, Data Management
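The alerting logic behind monitors like these boils down to comparing sampled metrics against per-metric limits. A minimal sketch of that evaluation, with hypothetical metric names and thresholds (real values live in the CloudWatch and Datadog monitor definitions):

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Threshold:
    metric: str
    limit: float  # alert when the observed value exceeds this limit

# Illustrative thresholds, not the actual production values.
THRESHOLDS = [
    Threshold("cpu_percent", 80.0),
    Threshold("memory_percent", 90.0),
    Threshold("replica_lag_seconds", 60.0),
]

def breached(sample: Dict[str, float]) -> List[str]:
    """Return the names of metrics whose sampled value exceeds its limit."""
    return [t.metric for t in THRESHOLDS if sample.get(t.metric, 0.0) > t.limit]

sample = {"cpu_percent": 85.0, "memory_percent": 40.0, "replica_lag_seconds": 120.0}
alerts = breached(sample)
# alerts == ["cpu_percent", "replica_lag_seconds"]
```

A managed monitor adds evaluation periods and deduplication on top of this comparison, so a single noisy sample does not page anyone.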

Data Engineer

2020 - 2020
SKG Tecnologia
  • Implemented streaming ingestion pipelines for urban traffic mobility data using Apache Kafka and Python.
  • Designed traffic analytics applications based on BigQuery.
  • Provisioned the SQL database cloud infrastructure with PostgreSQL and managed it using Google Cloud Platform.
  • Designed dashboards and processed data related to urban mobility traffic speed analysis.
Technologies: Google Cloud Platform (GCP), Python, Apache Kafka, Pandas, PostgreSQL, Flask, Docker, Data Engineering, JSON, Databases, Relational Databases, GitHub, Data, Data Transformation, Data Visualization, Microsoft Power BI, Data Analytics, Excel 365, Reports, CSV, Python Boolean, Boolean Search, Visualization, ETL Tools, Data Management, Spark
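A streaming speed-analysis pipeline like the one above usually maintains a rolling aggregate per road segment as events arrive. The aggregation can be sketched without Kafka itself; the field names and window size here are hypothetical:

```python
from collections import deque

class RollingSpeed:
    """Rolling mean over the last `window` speed readings for one road segment.

    In the real pipeline, a Kafka consumer would feed decoded messages into
    `add`; this class shows only the aggregation step.
    """

    def __init__(self, window: int = 5):
        self.readings = deque(maxlen=window)

    def add(self, speed_kmh: float) -> float:
        """Record a reading and return the current rolling average."""
        self.readings.append(speed_kmh)
        return sum(self.readings) / len(self.readings)

segment = RollingSpeed(window=3)
averages = [segment.add(s) for s in [30.0, 60.0, 30.0, 90.0]]
# averages == [30.0, 45.0, 40.0, 60.0]
```

Keeping one such aggregate per segment key is what lets a dashboard show near-real-time speeds instead of raw, noisy readings.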

Data Technical Lead

2018 - 2020
Alianza CAOBA
  • Designed and developed a big data lab environment using VirtualBox, Apache Hadoop, Apache Ambari, and Cloudera, reducing feature development and deployment time.
  • Developed a tool for anonymizing sensitive customer information on big data sets using Apache Spark and Apache Hive.
  • Devised and developed health analytics applications using Amazon Elastic Compute Cloud (Amazon EC2), Amazon S3, and Amazon Athena.
  • Provisioned the SQL database virtual infrastructure with PostgreSQL and managed the database administration. Designed an entity-relationship model oriented to customer service and retail use cases and provisioned database access and user grants.
  • Designed and developed Microsoft Power BI dashboards for analyzing data produced by natural language processing (NLP) machine learning models.
Technologies: Apache Spark, Python, AWS Glue, AWS Lambda, Cloudera, MySQL, Docker, Apache Airflow, Data Engineering, JSON, ETL, Amazon Web Services (AWS), Databases, Data Visualization, GitHub, Data, NumPy, Reporting, Data Transformation, Microsoft Power BI, Data Analytics, Excel 365, Office 365, CSV File Processing, Data Analysis, Elasticsearch, JavaScript, Node.js, Reports, CSV, Python Boolean, Boolean Search, Visualization, ETL Tools, Tableau, Amazon Athena, Amazon Neptune, Data Management, Azure, Spark
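One common approach to the anonymization task described above is salted hashing: the same input always maps to the same token, so joins across tables still work, but the original value cannot be read back. A minimal sketch in plain Python (in a Spark job this function would typically be registered as a UDF; the field names and salt are illustrative):

```python
import hashlib

def anonymize(value: str, salt: str) -> str:
    """Replace a sensitive value with a salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

rows = [{"customer_id": "CC-1001", "city": "Bogotá"}]
masked = [
    {**row, "customer_id": anonymize(row["customer_id"], salt="demo-salt")}
    for row in rows
]
# masked[0]["customer_id"] is a 64-character hex digest; "city" is untouched.
```

In production, the salt would be stored in a secrets manager rather than in code, since anyone holding it could re-derive tokens from known inputs.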

Big Data Developer

2016 - 2017
Alianza CAOBA
  • Created an automation pipeline to calculate the expected tax amount for construction projects in Bogotá, Colombia, using pandas.
  • Developed an automation pipeline in Apache Spark for cleaning and processing urban traffic mobility data so it could be served to an interactive dashboard.
  • Managed the big data infrastructure, providing new services such as MongoDB, Apache Spark, Apache Hive, and Hadoop Distributed File System (HDFS).
  • Designed and developed Microsoft Power BI dashboards for analyzing urban transportation and mobility data.
Technologies: Apache Spark, Apache Hive, HDFS, Python, MongoDB, APIs, REST, JSON, ETL, OCR, Databases, Data Pipelines, Relational Databases, GitHub, Data, NumPy, Reporting, Data Transformation, Excel 365, Office 365, CSV File Processing, Data Analysis, Data Visualization, Microsoft Power BI, Data Analytics, Reports, CSV, Python Boolean, Boolean Search, Visualization, ETL Tools, Tableau, Spark
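The shape of an expected-tax calculation like the one above is a taxable base times a unit rate. The sketch below is a toy version in plain Python; the formula, rate, and exemption are placeholders, not the actual tax rules used in the project:

```python
def expected_tax(
    built_area_m2: float, rate_per_m2: float, exempt_area_m2: float = 0.0
) -> float:
    """Toy estimate of a construction tax: taxable area times a unit rate.

    The real pipeline computed per-project amounts with pandas from
    official rate tables; these inputs are illustrative only.
    """
    taxable = max(built_area_m2 - exempt_area_m2, 0.0)
    return round(taxable * rate_per_m2, 2)

# 1,000 m2 taxable at 3.5 per m2:
amount = expected_tax(1200.0, 3.5, exempt_area_m2=200.0)
# amount == 3500.0
```

Applied row-wise over a pandas DataFrame of projects, a function like this yields the expected amount per project, which can then be compared against declared amounts.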

Adaptable Daily Living Activity Identification from Sensor Data Streams

https://www.sciencedirect.com/science/article/pii/S1877050918304551
This project proposes an adaptable daily living (ADL) activity discovery system that accounts for factors such as changes in personal behavior and respect for user privacy.

The proposed system is tested and validated on a dataset from a real user. The results show that it operates adequately in a real scenario under the respective constraints.

The main contribution of this project is an ADL detection system that adapts to changes in user behavior without retraining the model, tolerates sensor failures, and preserves user privacy.

ADACOP: A Big Data Platform for Open Government Data

ADACOP is an open government big data tool for monitoring and exploiting the potential of data from open government data portals. The solution automatically generates descriptive statistics about open government data and verifies open government data quality by contrasting different versions of the data over time.

Low-cost and low-precision 2D tracking system for virtual reality and augmented reality applications

https://ceur-ws.org/Vol-1957/CoSeCiVi17_paper_9.pdf
This work presents a low-cost 2D position tracker, where the low-cost property applies to both the hardware and software components. The work also includes user testing with a prototype game application. The validation process shows that the proposed tracker achieves acceptable performance in contexts with low-precision requirements.

Languages

Snowflake, Python, SQL, Java, JavaScript

Libraries/APIs

Pandas, NumPy, OpenCV, Node.js

Tools

Apache Airflow, GitHub, Microsoft Power BI, Tableau, BigQuery, Terraform, AWS Glue, Cloudera, Amazon Elastic Container Service (Amazon ECS), Amazon CloudWatch, Weka, Boto, Amazon Athena

Paradigms

ETL, Business Intelligence (BI), REST

Platforms

Amazon Web Services (AWS), AWS Lambda, Google Cloud Platform (GCP), Apache Kafka, Docker, Azure

Storage

MySQL, Databases, Data Pipelines, Relational Databases, MongoDB, JSON, MariaDB, Database Migration, Redshift, Apache Hive, HDFS, PostgreSQL, Amazon S3 (AWS S3), Data Lakes, Elasticsearch

Other

Amazon RDS, Data Engineering, Data, CSV File Processing, Data Analysis, CSV, Python Boolean, Boolean Search, ETL Tools, Data Migration, Data Management, Database Optimization, AWS Database Migration Service (DMS), Data Visualization, Reporting, Data Transformation, Data Analytics, Office 365, Reports, Visualization, Streaming Data, APIs, Big Data, OCR, High Availability Disaster Recovery (HADR), CRM APIs, Excel 365, Amazon Neptune, AWS Certified Solution Architect

Frameworks

Apache Spark, Spark, Hadoop, Flask, AWS HA

2016 - 2018

Master's Degree in Computer Engineering

University of the Andes - Bogotá, Colombia

2012 - 2016

Bachelor's Degree in Computer Engineering

University of the Andes - Bogotá, Colombia

DECEMBER 2022 - DECEMBER 2025

AWS Certified Cloud Practitioner

Amazon Web Services
