Joaquim Ventura, Developer in Rotterdam, Netherlands

Joaquim Ventura

Verified Expert in Engineering

Software Developer

Location
Rotterdam, Netherlands
Toptal Member Since
July 27, 2020

Joaquim has over six years of experience as a data engineer for industry leaders such as Nielsen and Royal Haskoning, working on the Azure and AWS clouds. He takes a pragmatic approach to software development and considers testing to be of the utmost importance, preferring integration tests that verify overall behavior over individual unit tests. He does not believe in testing for the sake of testing.

Portfolio

RHDHV
Amazon Web Services (AWS), Azure, Building Information Modeling (BIM), Docker...
Pointlogic (Nielsen)
Amazon Web Services (AWS), PySpark, Python
Coolblue
Amazon Web Services (AWS), SQL Server Analysis Services (SSAS)...

Experience

Availability

Part-time

Preferred Environment

Bitbucket, Slack, PyCharm, Linux

The most amazing...

...thing I've ever built was an API hosting architectural parametric models, where architects and engineers could define and download complete structures.

Work Experience

Data Engineer

2018 - PRESENT
RHDHV
  • Developed and deployed a machine learning pipeline for wastewater treatment plants in Azure, with a fully integrated CI/CD pipeline, using data from external storage.
  • Created a live database of AIS ship tracking data, collecting over 3 GB of data daily from both online community networks and private receivers, with both real-time and large aggregations for reporting.
  • Developed the software that allows inexpensive computing platforms (Raspberry Pi) to collect AIS data from remote locations, with support for over-the-air updates.
  • Developed a pure Python implementation of the OpenBIM IFC2x3 schema, allowing architects and civil engineers to read, write, and author IFC STEP files with building models.
  • Developed and deployed a machine learning pipeline for shipping container modality prediction (road, train, and barge) for container terminals, with a fully integrated CI/CD pipeline. Used data from a messaging system.
  • Designed and partially implemented the information platform for a large port community system with real-time messaging, machine learning, analytics, and cost-based billing capability.
  • Delivered a self-updating database of environmental and landscape protection areas in the UK, collecting geographical data from multiple government sources, converting it from shapefile to GeoJSON, and exposing it via a unified API to the impact assessment tool ENSIS.
Technologies: Amazon Web Services (AWS), Azure, Building Information Modeling (BIM), Docker, Python

Data Engineer

2015 - 2017
Pointlogic (Nielsen)
  • Migrated the legacy behavioral data processing from SQL Server to PySpark (AWS EMR), re-implemented the stored procedure business logic in Python, reduced costs, and sped up development and data processing activities.
  • Introduced company-wide job scheduling and orchestration using Azkaban, created visibility on data processing tasks and their status, and made the delivery of data a more repeatable and reliable process.
  • Expanded and coached the data engineering team, leading to greater transparency in the development process and much shorter lead times.
Technologies: Amazon Web Services (AWS), PySpark, Python

Data Engineer

2013 - 2014
Coolblue
  • Implemented and maintained the automated processing of SSAS cubes, including dynamic partitioning. Ensured cubes were up to date every morning.
  • Designed and managed the implementation of a CI/CD pipeline for the company-wide data warehouse, introducing testing in SSAS. Ensured the new code was properly tested before deployment and developer access to core systems remained minimal.
  • Migrated data processing and orchestration from SSIS to Python and Azkaban, creating greater visibility and enabling proper versioning and source control.
  • Delivered a full-stack local BI development environment using Vagrant, allowing individual developers to test code locally and removing the need for each developer to configure a workstation from scratch.
  • Optimized large SQL Server stored procedures, reaching up to 10x faster performance and lower server loads, especially on overnight processes.
Technologies: Amazon Web Services (AWS), SQL Server Analysis Services (SSAS), SQL Server Integration Services (SSIS), Python

Business Analyst

2012 - 2013
CA Seguros
  • Developed the data mart supporting compliance with Solvency II requirements.
  • Developed the data mart supporting the future policy and client portfolio predictive modeling by the actuarial team.
  • Automated monthly and quarterly regulatory reporting.
  • Delivered a RoamBI mobile reporting PoC, enabling mobile users to access daily KPI reports.
Technologies: Microsoft SQL Server

Nielsen Media Impact

https://www.nielsen.com/us/en/solutions/capabilities/media-impact/
A Spark-based application that generates insights into audience behavior in multiple regions and across multiple media. I was the data engineer responsible for preparing the databases served in the tool. This involved processing various text file sources from up to 100,000 panelists, with TV behavior down to minute-level resolution and rich demographic segmentation. The final artifacts were Parquet files that could be consumed by the back-end application.

Aquasuite ML

https://aquasuite.ai/en/
Aquasuite ML provides the machine learning insights for the greater Aquasuite application. As a data engineer, I worked with the data scientists to make the machine learning code production-grade and implemented the CI/CD pipelines responsible for model training and deployment. I also collaborated with the Aquasuite back-end developers on the data pipeline that received fresh data from plant systems and pushed new predictions back.

Education

2010 - 2011

Postgraduate Degree in Business Intelligence and Information Management

ISEG - Lisbon School of Economics & Management - Lisbon, Portugal

1998 - 2004

Master's Degree in Pharmaceutical Sciences

University of Lisbon - Lisbon, Portugal

Certifications

MAY 2014 - PRESENT

Implementing a Data Warehouse with Microsoft SQL Server 2012/2014

Microsoft

Libraries/APIs

PySpark

Tools

Amazon Elastic MapReduce (EMR), PyCharm, Slack, Bitbucket

Storage

Databases, SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), Microsoft SQL Server

Platforms

Docker, Amazon Web Services (AWS), Azure, Linux

Languages

Python 3, Go, Python

Paradigms

Business Intelligence (BI), Building Information Modeling (BIM)

Other

Programming, Statistics, Machine Learning, Knowledge Management
