
Alejandro Olivares Benítez
Verified Expert in Engineering
Data Engineer and Developer
Zaragoza, Spain
Toptal member since July 12, 2021
Alex is a data engineer and developer with over eight years of experience building ETL/ELT pipelines, designing and implementing data warehouses on various SQL and NoSQL databases, and developing big data processes (batch and streaming) with Spark. Alex combines strong technical skills with a sharp ability to understand each client's business domain and goals. Alex has recently focused on cloud computing and AWS and GCP services.
Experience
- Data Warehousing - 8 years
- Python - 8 years
- ETL - 8 years
- Apache Airflow - 8 years
- Spark - 6 years
- Amazon Web Services (AWS) - 6 years
- Scrum - 5 years
- Google Cloud Platform (GCP) - 1 year
Preferred Environment
macOS, PyCharm, Visual Studio Code (VS Code), DataGrip, Git
The most amazing...
...thing I've developed was a near real-time ETL project in the cloud that provides current, reliable data to an airline.
Work Experience
Data Engineer
SumerSports, LLC
- Developed and maintained data pipelines using Databricks and Airflow, ensuring efficient data processing and integration across multiple sources (a minimal orchestration sketch follows this role's highlights).
- Implemented and optimized ETL workflows in Python, handling large-scale football analytics data while maintaining data quality and consistency.
- Integrated data from various sources, including internal NFL data and third-party providers, ensuring accurate and timely data availability for stakeholders.
- Contributed to cloud-based architectures, leveraging AWS services for data storage, processing, and orchestration.
- Enhanced developer experience by improving tooling, automating workflows, and reducing operational overhead for data pipeline management (https://sumersports.com).
- Developed and deployed web scraping solutions to extract and process NCAA draft data from various sources, enhancing data completeness and integration for football analytics.
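A minimal sketch of the Airflow-plus-Databricks orchestration pattern mentioned in the first highlight; the DAG name, job ID, and connection ID are hypothetical placeholders, not the actual pipelines.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator


def ingest_raw_files():
    # Placeholder for pulling third-party feeds into cloud storage.
    print("ingest step would run here")


with DAG(
    dag_id="football_analytics_pipeline",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
):
    ingest = PythonOperator(task_id="ingest_raw_files", python_callable=ingest_raw_files)

    transform = DatabricksRunNowOperator(
        task_id="run_databricks_transform",
        databricks_conn_id="databricks_default",  # assumed Airflow connection
        job_id=12345,  # hypothetical Databricks job ID
    )

    ingest >> transform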
Data Engineer
Chartboost
- Orchestrated the design and implementation of a scalable data warehouse solution on BigQuery, GCS, and Bigtable, optimizing data storage and retrieval for enhanced analytics performance.
- Developed and optimized ETL pipelines using Apache Spark (PySpark), ensuring efficient data extraction, transformation, and loading and improving data accuracy and timeliness (a PySpark sketch follows this role's highlights).
- Engineered ETL workflows using Apache Beam (Java) with Protobuf, facilitating seamless data processing and integration across diverse data sources and enhancing data consistency and reliability.
- Implemented real-time ETL processes leveraging Apache Beam (Java) to ingest and process data from Kafka streams, enabling immediate insights into business operations and trends.
- Leveraged a suite of Google Cloud Platform (GCP) technologies, including Dataflow, Google Cloud Storage, BigQuery, Bigtable, Dataproc, and Composer, to build robust and scalable data solutions, ensuring high availability and reliability.
- Led the creation and maintenance of infrastructure as code using Terraform, streamlining environment provisioning and ensuring consistency across deployments, enhancing operational efficiency.
- Established continuous integration/continuous deployment (CI/CD) pipelines using GitHub Actions and Jenkins and orchestrated data workflows with Airflow, automating deployment processes and ensuring pipeline reliability and efficiency.
- Worked with analytics teams in adtech to gather requirements, model data in Looker, and build dashboards tracking installs, impressions, and CTRs—ensuring accurate insights and seamless collaboration between data engineering and business teams.
- Collaborated closely with the data science team, gathering requirements, providing data support, and assisting in the execution and optimization of their pipelines, fostering a seamless integration of data engineering and data science efforts.
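An illustrative PySpark batch ETL in the spirit of the pipelines described above; the bucket paths and column names are assumptions, and reading gs:// paths presumes the GCS connector available on Dataproc.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events_daily_rollup").getOrCreate()

# Read raw events from GCS (hypothetical bucket and schema).
raw = spark.read.parquet("gs://example-raw-bucket/events/")

# Aggregate to a daily rollup per campaign.
daily = (
    raw.withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date", "campaign_id")
    .agg(
        F.count("*").alias("events"),
        F.countDistinct("device_id").alias("devices"),
    )
)

# Write curated output back to GCS, partitioned by day (hypothetical path).
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "gs://example-curated-bucket/daily_rollup/"
)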
Data Engineer | Tech Lead
Adidas Netherlands
- Designed and implemented a scalable data warehouse solution leveraging Aurora, S3, and Redshift, improving query performance and reducing storage costs.
- Migrated the data warehouse infrastructure to a Databricks lakehouse architecture, enhancing data accessibility and reducing data latency.
- Developed ETL pipelines using PySpark, enabling efficient data transformation and processing. Utilized AWS Glue for seamless integration, leading to decreased time-to-insight for business analytics.
- Implemented real-time ETL processes with Spark Streaming and Kafka integration, running on AWS Glue Streaming, enhancing data freshness and enabling near-instantaneous decision-making for critical business operations (a streaming sketch follows this role's highlights).
- Leveraged a suite of AWS technologies, including EMR, S3, Aurora, EC2, CodeCommit, and Athena, to build robust data solutions while maintaining high availability and scalability.
- Created and maintained infrastructure as code using Terraform, streamlining environment provisioning and ensuring consistency across deployments.
- Established continuous integration/continuous deployment (CI/CD) pipelines with Jenkins and orchestrated data workflows with Airflow, reducing manual intervention and enhancing overall system reliability and efficiency.
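A compact Spark Structured Streaming sketch of the Kafka-fed pattern referenced above; the brokers, topic, and S3 paths are placeholders, and the Kafka source assumes the spark-sql-kafka package on the cluster.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_stream").getOrCreate()

# Consume a Kafka topic (hypothetical brokers and topic name).
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "orders")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast to strings for downstream parsing.
parsed = stream.select(
    F.col("key").cast("string"),
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp"),
)

# Land micro-batches on S3 as Parquet with checkpointing (hypothetical paths).
query = (
    parsed.writeStream.format("parquet")
    .option("path", "s3://example-bucket/orders/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()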
Data Engineer
Vueling Airlines
- Implemented and optimized a robust near real-time ETL pipeline on AWS Glue, powered by Spark Streaming, facilitating seamless data integration and analysis.
- Utilized Jira for streamlined task management and Confluence for comprehensive documentation, facilitating efficient collaboration and knowledge sharing within the team.
- Architected and deployed a robust data warehouse solution, with live data on Aurora and historical data on Redshift, ensuring data accessibility and performance.
- Spearheaded the development of a data upload process from Aurora to S3 in PKL format, addressing the need for real-time data availability in the warehouse and enhancing data processing efficiency.
- Acted as a primary liaison between stakeholders and development teams, gathering requirements directly from clients and providing regular progress updates, ensuring alignment with project objectives.
- Implemented PySpark ETL workflows on EMR infrastructure for a large-scale big data project, optimizing data processing efficiency and scalability.
- Integrated Elasticsearch and CloudWatch for comprehensive project logging and error management, ensuring effective monitoring and control over project activities and system health.
- Developed and deployed a RESTful API using Flask to serve data to a web tool, enabling real-time data visualization and interaction while ensuring efficient data retrieval, security, and scalability (a minimal sketch follows).
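A minimal Flask sketch of a read-only endpoint like the one described; the route and payload are placeholders standing in for the warehouse-backed queries.

from flask import Flask, jsonify

app = Flask(__name__)

# Stub data; the real service would query the warehouse instead.
FLIGHTS = [
    {"flight": "VY1234", "status": "on_time"},
    {"flight": "VY5678", "status": "delayed"},
]


@app.get("/api/flights")
def list_flights():
    # Return the current flight snapshot as JSON.
    return jsonify(FLIGHTS)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)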
Data Engineer
CaixaBank
- Designed and developed 360-degree data warehouses for various users and companies, integrating sources such as Salesforce, SQL Server, and Oracle.
- Designed and implemented the ETL and data quality processes in Python for populating and updating a data warehouse (a data-quality sketch follows this role's highlights).
- Interacted directly with the client, pitched new projects, and improved existing ones.
- Managed tasks with Jira and documentation with Confluence.
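A simple sketch of the kind of Python data-quality gate mentioned in this role; the column names, rules, and extract file are hypothetical.

import pandas as pd


def quality_checks(df: pd.DataFrame) -> list:
    # Return a list of human-readable rule violations (illustrative rules only).
    issues = []
    if df["customer_id"].isna().any():
        issues.append("customer_id contains nulls")
    if df["customer_id"].duplicated().any():
        issues.append("customer_id is not unique")
    if (df["balance"] < 0).any():
        issues.append("negative balances found")
    return issues


extract = pd.read_csv("extract.csv")  # hypothetical extract file
problems = quality_checks(extract)
if problems:
    # Block the warehouse load if any rule fails.
    raise ValueError("; ".join(problems))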
Business Consultant
Lobe Constructions
- Designed and developed the ETL (extract, transform, load) process with Python and Pandas (a load sketch follows this role's highlights).
- Implemented and built a data warehouse on SQL Server.
- Gathered requirements directly from the client, managed tasks with Jira, and documented with Confluence.
- Built and designed the ETL process with Informatica PowerCenter.
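An illustrative Pandas-to-SQL-Server load in the spirit of this ETL; the file, table, and connection string are placeholders, and the ODBC driver name is an assumption.

import pandas as pd
from sqlalchemy import create_engine

# Extract (hypothetical source file).
orders = pd.read_csv("orders_export.csv", parse_dates=["order_date"])

# Transform: normalize column names and stamp the load time.
orders.columns = [c.strip().lower() for c in orders.columns]
orders["loaded_at"] = pd.Timestamp.utcnow()

# Load into SQL Server (placeholder credentials and driver).
engine = create_engine(
    "mssql+pyodbc://user:password@dwh-server/warehouse"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)
orders.to_sql("stg_orders", engine, if_exists="append", index=False)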
Experience
Dynamic Allocation Tool
Scalable Data Pipeline with AWS Lambda & Step Functions
The solution included:
• A Lambda function to split large files into smaller chunks for parallel processing (a handler sketch follows this list).
• Multiple Lambda functions, each handling specific processing logic based on file content.
• Step Functions orchestrating the execution flow, ensuring fault tolerance and efficient retries.
• Seamless integration with S3 for file storage and RDS (PostgreSQL) for structured data persistence.
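A sketch of the splitting Lambda described in the first bullet; the bucket layout, chunk size, and event shape are assumptions, and Step Functions would fan the returned chunk keys out to the worker Lambdas with a Map state.

import boto3

s3 = boto3.client("s3")
CHUNK_LINES = 10_000  # assumed chunk size


def handler(event, context):
    # Event shape is an assumption: {"bucket": ..., "key": ...}
    bucket = event["bucket"]
    key = event["key"]

    # Read the large source file and split it into fixed-size line chunks.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    lines = body.splitlines()

    chunk_keys = []
    for start in range(0, len(lines), CHUNK_LINES):
        chunk_key = "chunks/{}/{:05d}.txt".format(key, start // CHUNK_LINES)
        s3.put_object(
            Bucket=bucket,
            Key=chunk_key,
            Body="\n".join(lines[start:start + CHUNK_LINES]).encode("utf-8"),
        )
        chunk_keys.append(chunk_key)

    # Step Functions consumes this output to drive the parallel processing step.
    return {"bucket": bucket, "chunks": chunk_keys}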
High-throughput Streaming Data Pipeline with Apache Beam
The solution involved:
• Apache Beam (Dataflow) for distributed stream processing, ensuring low-latency event handling (a Beam sketch follows this list).
• Protobuf for efficient message parsing, reducing serialization overhead.
• Google Cloud Storage (GCS) as a staging area for reprocessing when needed.
• BigQuery as the final destination for analytics and reporting.
• Terraform for infrastructure as code (IaC), automating deployment and scaling.
• Performance optimizations to handle high-throughput event ingestion while ensuring cost efficiency and fault tolerance.
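The production pipeline was built with the Java SDK; the Python sketch below shows the same shape under assumptions: a Pub/Sub source, a hypothetical generated Protobuf module, and a placeholder BigQuery table.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# from generated import event_pb2  # hypothetical generated Protobuf module


def decode_event(raw):
    # Parse a Protobuf payload into a BigQuery-ready dict (fields are illustrative).
    # event = event_pb2.Event.FromString(raw)
    # return {"event_id": event.id, "ts": event.timestamp, "type": event.type}
    return {"event_id": "stub", "ts": 0, "type": "stub"}


options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromPubSub(subscription="projects/x/subscriptions/events")
        | "Decode" >> beam.Map(decode_event)
        | "Write" >> beam.io.WriteToBigQuery(
            "project:dataset.events",  # placeholder table, assumed to already exist
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )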
Education
Master's Degree in Big Data and Visual Analytics
International University of La Rioja - Logroño, Spain
Engineer's Degree in Telecommunications
University of Zaragoza - Zaragoza, Spain
Certifications
Databricks Lakehouse Fundamentals
Databricks
AWS Certified Solutions Architect Associate
AWS
Skills
Libraries/APIs
PySpark, Pandas, Spark Streaming, Protobuf, Jenkins Pipeline, Flask API
Tools
DataGrip, Jira, Confluence, Apache Airflow, Git, PyCharm, Jenkins, Amazon Elastic Container Service (ECS), Google Cloud Dataproc, Cloud Dataflow, Informatica PowerCenter, Amazon Elastic MapReduce (EMR), Terraform, Amazon Athena, Apache Beam, Amazon CloudWatch, AWS Glue, Apache Maven, AWS Step Functions, BigQuery, Looker
Languages
Python, SQL, Java
Frameworks
Spark, Flask, Data Lakehouse
Paradigms
Scrum, ETL
Platforms
Amazon Web Services (AWS), macOS, Oracle, Docker, Google Cloud Platform (GCP), AWS Lambda, Visual Studio Code (VS Code), Salesforce, Apache Kafka, Amazon EC2, Databricks, Kubernetes, AWS IoT
Storage
SQL Server 2014, Redshift, Amazon Aurora, Amazon S3 (AWS S3), NoSQL, Amazon DynamoDB, Exasol, Elasticsearch, Google Cloud Storage, Google Bigtable, API Databases
Other
AWS Cloud Architecture, Data, Data Warehousing, Analysis, Complex Problem Solving, Data Engineering, ELT, Programming, Cloud, Google BigQuery, Streaming, Big Data, Big Data Architecture, GitHub Actions, Security, Medallion Architecture, Data Science, Data Reporting, Data Analytics, Google Cloud Dataflow, Web Scraping, APIs, Advertising Technology (Adtech), Looker Studio