Alejandro Olivares Benítez, Developer in Zaragoza, Spain

Alejandro Olivares Benítez

Verified Expert in Engineering

Data Engineer and Developer

Zaragoza, Spain

Toptal member since July 12, 2021

Bio

Alex is a data engineer and developer with over eight years of experience coding ETL/ELT pipelines, designing and implementing data warehouses on various SQL and NoSQL databases, and developing big data processes (batch and streaming) using Spark. Alex combines strong technical skills with a rigorous understanding of each client's business domain and goals. He has recently focused on cloud computing, particularly AWS and GCP services.

Portfolio

SumerSports, LLC - Main
Python, SQL, Data Engineering, PySpark, Databricks, Kubernetes, Apache Airflow...
Chartboost
Google BigQuery, Google Cloud Storage, Google Bigtable, Spark, Apache Beam...
Adidas Netherlands
Spark, Spark Streaming, Amazon Web Services (AWS)...

Experience

  • Data Warehousing - 8 years
  • Python - 8 years
  • ETL - 8 years
  • Apache Airflow - 8 years
  • Spark - 6 years
  • Amazon Web Services (AWS) - 6 years
  • Scrum - 5 years
  • Google Cloud Platform (GCP) - 1 year

Availability

Part-time

Preferred Environment

macOS, PyCharm, Visual Studio Code (VS Code), DataGrip, Git

The most amazing...

...thing I've developed was a near real-time ETL project in the cloud that provides current, reliable data to an airline.

Work Experience

Data Engineer

2024 - 2025
SumerSports, LLC - Main
  • Developed and maintained data pipelines using Databricks and Airflow, ensuring efficient data processing and integration across multiple sources (see the orchestration sketch after this role).
  • Implemented and optimized ETL workflows in Python, handling large-scale football analytics data while maintaining data quality and consistency.
  • Integrated data from various sources, including internal NFL data and third-party providers, ensuring accurate and timely data availability for stakeholders.
  • Contributed to cloud-based architectures, leveraging AWS services for data storage, processing, and orchestration.
  • Enhanced developer experience by improving tooling, automating workflows, and reducing operational overhead for data pipeline management (https://sumersports.com).
  • Developed and deployed web scraping solutions to extract and process NCAA draft data from various sources, enhancing data completeness and integration for football analytics.
Technologies: Python, SQL, Data Engineering, PySpark, Databricks, Kubernetes, Apache Airflow, AWS IoT, Medallion Architecture, Web Scraping
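
For illustration, a minimal sketch of how such a pipeline could be orchestrated: an Airflow DAG that submits a Databricks notebook run. The DAG id, schedule, cluster spec, and notebook path are hypothetical, and the example assumes Airflow 2.4+ with the Databricks provider installed; it is not the actual SumerSports configuration.

```python
# Minimal, illustrative Airflow DAG that submits a Databricks notebook run.
# All names below (DAG id, schedule, cluster spec, notebook path) are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id="nfl_analytics_ingest",        # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                    # "schedule_interval" on Airflow < 2.4
    catchup=False,
) as dag:
    ingest = DatabricksSubmitRunOperator(
        task_id="run_ingest_notebook",
        databricks_conn_id="databricks_default",
        new_cluster={
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "m5.xlarge",
            "num_workers": 2,
        },
        notebook_task={"notebook_path": "/pipelines/bronze_ingest"},  # hypothetical notebook
    )
```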

Data Engineer

2023 - 2024
Chartboost
  • Orchestrated the design and implementation of a scalable data warehouse solution on BigQuery, GCS, and Bigtable, optimizing data storage and retrieval for enhanced analytics performance.
  • Developed and optimized ETL pipelines using Apache Spark (PySpark), ensuring efficient data extraction, transformation, and loading and improving data accuracy and timeliness (see the sketch after this role).
  • Engineered ETL workflows using Apache Beam (Java) with Protobuf, facilitating seamless data processing and integration across diverse data sources and enhancing data consistency and reliability.
  • Implemented real-time ETL processes leveraging Apache Beam (Java) to ingest and process data from Kafka streams, enabling immediate insights into business operations and trends.
  • Leveraged a suite of Google Cloud Platform (GCP) technologies, including Dataflow, Google Cloud Storage, BigQuery, Bigtable, Dataproc, and Composer, to build robust and scalable data solutions, ensuring high availability and reliability.
  • Led the creation and maintenance of infrastructure as code using Terraform, streamlining environment provisioning and ensuring consistency across deployments, enhancing operational efficiency.
  • Established continuous integration and deployment (CI/CD) pipelines using GitHub Actions and Jenkins and orchestrated data workflows with Airflow, automating deployment processes and ensuring pipeline reliability and efficiency.
  • Worked with analytics teams in adtech to gather requirements, model data in Looker, and build dashboards tracking installs, impressions, and CTRs, ensuring accurate insights and seamless collaboration between data engineering and business teams.
  • Collaborated closely with the data science team, gathering requirements, providing data support, and assisting in the execution and optimization of their pipelines, fostering a seamless integration of data engineering and data science efforts.
Technologies: Google BigQuery, Google Cloud Storage, Google Bigtable, Spark, Apache Beam, Protobuf, Apache Kafka, Google Cloud Platform (GCP), Terraform, Apache Airflow, Jenkins, GitHub Actions, Jenkins Pipeline, ETL, ELT, Streaming, Java, Apache Maven, Advertising Technology (Adtech), Looker, Looker Studio
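
As an illustration of the batch ETL pattern referenced above, a minimal PySpark job that reads raw events from Google Cloud Storage, aggregates daily installs, and writes the result to BigQuery through the spark-bigquery connector. Bucket, dataset, and column names are assumptions, and the connector jar must be available on the cluster.

```python
# Minimal, illustrative PySpark batch ETL: read raw events from GCS, aggregate
# daily installs, and write to BigQuery via the spark-bigquery connector.
# Paths, table names, and columns are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-installs").getOrCreate()

events = spark.read.parquet("gs://example-bucket/raw/events/dt=2024-01-01/")

daily = (
    events.filter(F.col("event_type") == "install")
    .groupBy("app_id", F.to_date("event_ts").alias("dt"))
    .agg(F.count("*").alias("installs"))
)

(
    daily.write.format("bigquery")
    .option("table", "my-project.analytics.daily_installs")
    .option("temporaryGcsBucket", "example-tmp-bucket")  # required for indirect writes
    .mode("overwrite")
    .save()
)
```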

Data Engineer | Tech Lead

2022 - 2023
Adidas Netherlands
  • Designed and implemented a scalable data warehouse solution leveraging Aurora, S3, and Redshift, improving query performance and reducing storage costs.
  • Migrated the data warehouse infrastructure to a Databricks lakehouse architecture, enhancing data accessibility and reducing data latency.
  • Developed ETL pipelines using PySpark, enabling efficient data transformation and processing. Utilized AWS Glue for seamless integration, leading to decreased time-to-insight for business analytics.
  • Implemented real-time ETL processes using Spark Streaming (via AWS Glue Streaming) with Kafka integration, enhancing data freshness and enabling near-instantaneous decision-making for critical business operations (see the streaming sketch after this role).
  • Leveraged a suite of AWS technologies, including EMR, S3, Aurora, EC2, CodeCommit, and Athena, to build robust data solutions while maintaining high availability and scalability.
  • Created and maintained infrastructure as code using Terraform, streamlining environment provisioning and ensuring consistency across deployments.
  • Established continuous integration and deployment (CI/CD) pipelines with Jenkins and orchestrated data workflows with Airflow, reducing manual intervention and enhancing overall system reliability and efficiency.
Technologies: Spark, Spark Streaming, Amazon Web Services (AWS), Amazon Elastic MapReduce (EMR), Amazon S3 (AWS S3), Amazon Aurora, Apache Kafka, Amazon EC2, Terraform, Amazon Athena, Amazon DynamoDB, Jenkins, Databricks, Exasol, SQL, AWS Glue, Redshift
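
A minimal PySpark Structured Streaming sketch in the spirit of the Kafka ingestion described above. The broker address, topic, schema, and S3 paths are illustrative assumptions rather than the actual Adidas setup.

```python
# Minimal, illustrative Structured Streaming job: consume JSON order events
# from Kafka and land them as Parquet on S3. All names are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("sku", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
)

orders = raw.select(from_json(col("value").cast("string"), schema).alias("o")).select("o.*")

query = (
    orders.writeStream.format("parquet")
    .option("path", "s3://example-bucket/bronze/orders/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```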

Data Engineer

2020 - 2022
Vueling Airlines
  • Implemented and optimized AWS Glue to develop a robust near real-time ETL pipeline powered by Spark Streaming, facilitating seamless data integration and analysis.
  • Utilized Jira for streamlined task management and Confluence for comprehensive documentation, facilitating efficient collaboration and knowledge sharing within the team.
  • Architected and deployed a robust data warehouse solution, with Aurora serving live data and Redshift serving historical data, ensuring data accessibility and performance.
  • Spearheaded the development of a data upload process from Aurora to S3 in PKL (Python pickle) format, addressing the need for real-time data availability in the warehouse and enhancing data processing efficiency.
  • Acted as a primary liaison between stakeholders and development teams, gathering requirements directly from clients and providing regular progress updates, ensuring alignment with project objectives.
  • Implemented PySpark ETL workflows on EMR infrastructure for a large-scale big data project, optimizing data processing efficiency and scalability.
  • Integrated Elasticsearch and CloudWatch for comprehensive project logging and error management, ensuring effective monitoring and control over project activities and system health.
  • Developed and deployed a RESTful API using Flask to serve data to a web tool, enabling real-time data visualization and interaction while ensuring efficient data retrieval, security, and scalability (a minimal sketch follows this role).
Technologies: Python, Apache Airflow, Cloud, ETL, Amazon Web Services (AWS), SQL Server 2014, Amazon S3 (AWS S3), Amazon Aurora, Oracle, Data Warehousing, Redshift, Amazon Elastic Container Service (ECS), Docker, Git, Confluence, Jira, Flask, PySpark, Data Engineering, NoSQL, ELT, Elasticsearch, Amazon CloudWatch, APIs, Flask API
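
A minimal Flask sketch in the spirit of the API mentioned in the last bullet above. The route and the stubbed query helper are hypothetical; the real service queried the Aurora/Redshift warehouse.

```python
# Minimal, illustrative Flask API serving data to a web tool.
# The endpoint and the stubbed data helper are assumptions.
from flask import Flask, jsonify

app = Flask(__name__)

# In the real service this would query the Aurora/Redshift warehouse; a stub stands in here.
def fetch_latest_flights(limit: int = 10) -> list[dict]:
    return [{"flight": f"VY{1000 + i}", "status": "on_time"} for i in range(limit)]

@app.route("/flights/latest")
def latest_flights():
    return jsonify(fetch_latest_flights())

if __name__ == "__main__":
    app.run(port=8080)
```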

Data Engineer

2018 - 2020
CaixaBank
  • Designed and developed 360-degree data warehouses (for various users and companies) built on data sources such as Salesforce, SQL Server, and Oracle.
  • Implemented and designed the ETL and data quality processes with Python for populating and updating a data warehouse.
  • Interacted directly with the client, pitched new projects, and improved existing ones.
  • Managed tasks with Jira and documentation with Confluence.
Technologies: Python, ETL, Salesforce, SQL Server 2014, Amazon Web Services (AWS), Oracle, Confluence, Jira, Amazon Elastic Container Service (ECS), Data Engineering

Business Consultant

2016 - 2018
Lobe Constructions
  • Designed and developed the ETL process (extract, transform, load) with Python and Pandas.
  • Implemented and built a data warehouse on SQL Server.
  • Gathered requirements directly from the client, managed tasks with Jira, and documented with Confluence.
  • Built and designed the ETL process with Informatica PowerCenter.
Technologies: Python, Analysis, Data Warehousing, ETL, Git, SQL Server 2014, Jira, Confluence, Informatica PowerCenter

Experience

Dynamic Allocation Tool

A near real-time ETL providing current, reliable data to an application that helps operate a flight network. It relies on an Aurora data warehouse for current data and a Redshift data warehouse for historical data. The ETL is built in Python and runs through Airflow, which executes ECS tasks; a minimal orchestration sketch follows.
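
A minimal sketch of that orchestration: an Airflow DAG that launches the Python ETL as an ECS (Fargate) task. It assumes a recent Amazon provider package, and the DAG id, schedule, cluster, task definition, and network settings are all illustrative.

```python
# Minimal, illustrative Airflow DAG running the ETL as an ECS Fargate task.
# Cluster, task definition, subnets, and schedule are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

with DAG(
    dag_id="dynamic_allocation_etl",   # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule="*/15 * * * *",           # assumed near real-time cadence
    catchup=False,
) as dag:
    run_etl = EcsRunTaskOperator(
        task_id="run_etl_container",
        cluster="etl-cluster",
        task_definition="dynamic-allocation-etl",
        launch_type="FARGATE",
        overrides={"containerOverrides": [{"name": "etl", "command": ["python", "etl.py"]}]},
        network_configuration={
            "awsvpcConfiguration": {"subnets": ["subnet-xxxxxxxx"], "assignPublicIp": "ENABLED"}
        },
    )
```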

Scalable Data Pipeline with AWS Lambda & Step Functions

Designed and implemented a scalable data pipeline using AWS Lambda and Step Functions to process and load medium-sized files efficiently into a PostgreSQL database; a minimal sketch of the splitter step follows the list below.

The solution included:
• A Lambda function to split large files into smaller chunks for parallel processing.
• Multiple Lambda functions, each handling specific processing logic based on file content.
• Step Functions orchestrating the execution flow, ensuring fault tolerance and efficient retries.
• Seamless integration with S3 for file storage and RDS (PostgreSQL) for structured data persistence.
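
A minimal sketch of the splitter step, assuming the input file is line-oriented and small enough to read into memory; the bucket layout, chunk size, and event shape are hypothetical.

```python
# Minimal, illustrative "splitter" Lambda: read a file from S3 and write
# fixed-size line chunks back to S3 for downstream per-chunk workers.
# Bucket, prefix, event shape, and chunk size are assumptions.
import boto3

s3 = boto3.client("s3")
CHUNK_LINES = 10_000  # assumed chunk size

def handler(event, context):
    bucket = event["bucket"]
    key = event["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    lines = body.splitlines()

    chunk_keys = []
    for i in range(0, len(lines), CHUNK_LINES):
        chunk_key = f"chunks/{key}.part{i // CHUNK_LINES:05d}"
        s3.put_object(
            Bucket=bucket,
            Key=chunk_key,
            Body="\n".join(lines[i:i + CHUNK_LINES]).encode("utf-8"),
        )
        chunk_keys.append(chunk_key)

    # A Step Functions Map state can fan these chunks out to the processing Lambdas.
    return {"bucket": bucket, "chunks": chunk_keys}
```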

High-throughput Streaming Data Pipeline with Apache Beam

Developed a real-time streaming data pipeline using Apache Beam (Java) to process terabytes of adtech events per hour efficiently; a simplified Python sketch follows the list below.

The solution involved:
• Apache Beam (Dataflow) for distributed stream processing, ensuring low-latency event handling.
• Protobuf for efficient message parsing, reducing serialization overhead.
• Google Cloud Storage (GCS) as a staging area for reprocessing when needed.
• BigQuery as the final destination for analytics and reporting.
• Terraform for infrastructure as code (IaC), automating deployment and scaling.
• Performance optimizations to handle high-throughput event ingestion while ensuring cost efficiency and fault tolerance.
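
A simplified Python sketch of the pipeline's shape. The production pipeline used the Beam Java SDK; here the Python SDK illustrates the same Kafka → Protobuf parse → BigQuery flow. The generated module event_pb2, the topic, and the table are assumptions.

```python
# Simplified, illustrative Beam streaming pipeline: Kafka -> Protobuf parse -> BigQuery.
# event_pb2 is assumed to be generated from the Protobuf schema with protoc;
# broker, topic, and table names are assumptions.
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

import event_pb2  # assumed protoc-generated module

def to_row(kv):
    event = event_pb2.AdEvent.FromString(kv[1])  # Kafka value bytes -> proto message
    return {"event_id": event.event_id, "campaign": event.campaign, "ts": event.ts}

options = PipelineOptions(streaming=True)  # add Dataflow runner flags as needed
with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadKafka" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "kafka:9092"},
            topics=["ad-events"],
        )
        | "ParseProto" >> beam.Map(to_row)
        | "WriteBQ" >> beam.io.WriteToBigQuery(
            "my-project:analytics.ad_events",
            schema="event_id:STRING,campaign:STRING,ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```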

Education

2018 - 2019

Master's Degree in Big Data and Visual Analytics

International University of La Rioja - Logroño, Spain

2012 - 2016

Engineer's Degree in Telecommunications

University of Zaragoza - Zaragoza, Spain

Certifications

NOVEMBER 2022 - PRESENT

Databricks Lakehouse Fundamentals

Databricks

MARCH 2021 - MARCH 2024

AWS Certified Solutions Architect Associate

AWS

Skills

Libraries/APIs

PySpark, Pandas, Spark Streaming, Protobuf, Jenkins Pipeline, Flask API

Tools

DataGrip, Jira, Confluence, Apache Airflow, Git, PyCharm, Jenkins, Amazon Elastic Container Service (ECS), Google Cloud Dataproc, Cloud Dataflow, Informatica PowerCenter, Amazon Elastic MapReduce (EMR), Terraform, Amazon Athena, Apache Beam, Amazon CloudWatch, AWS Glue, Apache Maven, AWS Step Functions, BigQuery, Looker

Languages

Python, SQL, Java

Frameworks

Spark, Flask, Data Lakehouse

Paradigms

Scrum, ETL

Platforms

Amazon Web Services (AWS), macOS, Oracle, Docker, Google Cloud Platform (GCP), AWS Lambda, Visual Studio Code (VS Code), Salesforce, Apache Kafka, Amazon EC2, Databricks, Kubernetes, AWS IoT

Storage

SQL Server 2014, Redshift, Amazon Aurora, Amazon S3 (AWS S3), NoSQL, Amazon DynamoDB, Exasol, Elasticsearch, Google Cloud Storage, Google Bigtable, API Databases

Other

AWS Cloud Architecture, Data, Data Warehousing, Analysis, Complex Problem Solving, Data Engineering, ELT, Programming, Cloud, Google BigQuery, Streaming, Big Data, Big Data Architecture, GitHub Actions, Security, Medallion Architecture, Data Science, Data Reporting, Data Analytics, Google Cloud Dataflow, Web Scraping, APIs, Advertising Technology (Adtech), Looker Studio
