Carlos Guerreiro, Developer in Tuusula, Finland

Carlos Guerreiro

Verified Expert in Engineering

Machine Learning Developer

Tuusula, Finland

Toptal member since April 23, 2013

Bio

Carlos is a data generalist with extensive experience in the design, implementation, and validation of data-intensive systems, along with deep expertise in machine learning and real-time stream processing. He has worked in the eCommerce and media industries, for both large corporations and startups. Carlos is a versatile engineer and looks forward to his next challenge.

Experience

  • C++ - 20 years
  • Python - 13 years
  • Machine Learning - 10 years
  • NumPy - 8 years
  • Pandas - 7 years
  • Scala - 5 years
  • Apache Spark - 5 years
  • TensorFlow - 3 years

Availability

Full-time

Preferred Environment

Apache Spark, Amazon Web Services (AWS), Python, Scala, Machine Learning, TensorFlow, PyMC, Apache Kafka, Torch, Natural Language Processing (NLP)

The most amazing...

...thing I've built is a low-latency custom recommendation system for an eCommerce startup running flash sales.

Work Experience

Data Scientist | Engineer

2010 - PRESENT
Freelance Clients
  • Built NLP and MV models and algorithms for a recruitment startup that extracted and normalized structured information from resumes and deployed them in FastAPI microservices. Designed and implemented custom annotation UIs for structured data.
  • Developed an end-to-end Python NLP pipeline for a cybersecurity firm to scrape, transform, and extract information from discussion forums, fine-tuning BERT with Hugging Face and LangChain on AWS. Deployed as FastAPI microservices.
  • Built an end-to-end Python NLP/RAG pipeline for a fintech startup to aid compliance officers, covering text extraction, user interaction, and Llama 2 fine-tuning. Integrated Hugging Face and LangChain on GCP. Deployed as FastAPI microservices.
  • Designed MLOps practices and built supporting infrastructure for a wearables startup, in collaboration with the data infrastructure and ML teams, to make optimal use of AWS resources. Streamlined the operation of the ML pipeline to match.
  • Developed and built various machine learning models for a retail bank for risk assessment, customer churn, and rate optimization using Python, Pandas, NumPy, SciPy, PyMC, TensorFlow, and scikit-learn. Deployed as FastAPI microservices.
  • Designed and built a custom system for a retail bank that recommends product offerings based on transaction and demographic data, using Python, Pandas, NumPy, and C++. Deployed as a FastAPI microservice.
  • Conceived and built a continuous analytics backbone and a data warehouse in a hybrid onsite/AWS environment with Kafka, Scala, Python, and Redshift for a retail bank.
  • Designed and built a real-time data fusion pipeline for a retail bank to create a complete picture of customer transactions from different systems.
  • Designed and implemented a low-latency custom recommender system for an eCommerce startup for flash sales using Python and C++. Deployed as a custom C++/Boost.Asio back end.
  • Ported Spark UDFs to Scala and optimized them for an analytics startup. The UDFs handle text and URL matching and information extraction. Wrote Python bindings. Refactored and optimized a complex Airflow delivery workflow.
Technologies: Redis, C++, Node.js, JavaScript, R, Python, Machine Learning, Amazon Kinesis, Amazon Elastic MapReduce (EMR), AWS IAM, AWS ELB, AWS CLI, Redshift, Amazon Redshift Spectrum, Amazon Athena, Spark SQL, Spark ML, Apache Spark, FastAPI, Flask, Apache Airflow, TensorFlow, Keras, PyTorch, Pandas, SciPy, NumPy, PyMC, GitHub, GitHub API, Docker, MLflow, Prometheus, Grafana, Apache Kafka, Confluence, Scala, Deep Learning, Elasticsearch, SQL, Delta Lake, PySpark, Bayesian Statistics, Statistics, PostgreSQL, C, D3.js, Optimization, Mixed-integer Linear Programming, PuLP, RocksDB, Recommendation Systems, Distributed Computing, Natural Language Processing (NLP), Jupyter Notebook, Hadoop, Git, Eigen, Scikit-learn, StatsModels, Data Science, Data Engineering, Theano, Seaborn, Matplotlib, ETL, Linux, Scripting, Data Extraction, Beautiful Soup, Command-line Interface (CLI), DevOps, Kubernetes, Data Architecture, Database Architecture, Architecture, Back-end, SaaS, GeoPandas, Shapely, Algorithms, Microservices, RESTful Microservices, REST APIs, Pytest, Amazon S3 (AWS S3), Boost.Asio, Google Cloud Platform (GCP), Hugging Face, Llama 2, LangChain, Beautiful Soup 4, Selenium, FAISS, Abstract Syntax Trees (AST), pylint, Unit Testing, Databricks, SQLAlchemy, Pydantic, ChatGPT, Go, Retool, Metaflow, GraphQL, RunPod, APIs, Data Scraping, Web Scraping, Web Development, TypeScript, Machine Learning Operations (MLOps), Cybersecurity, Large Language Models (LLMs), Docker Compose, Artificial Intelligence (AI), Python 3, AWS CloudFormation, Multithreading, Multiprocessing, React, Asyncio, Constraint Programming, Local Search, Rust, Google OR-Tools

Director of Data Science

2015 - 2016
MarkaVIP
  • Implemented real-time analytics on operations, modeled interventions on customer experience to address returns and cancellations, built a policy optimizer through retrospective simulation with historical data, and enabled it as a microservice.
  • Expanded the policy optimizer to improve order profitability by optimizing basket constraints and incentives.
  • Implemented various improvements to the product recommender, including the use of fine-grained recorded impressions as a negative signal and more flexibility in handling catalog metadata.
  • Built and deployed a foundational analytical backbone for the company in AWS with Kinesis, Redshift, and Spark.
  • Integrated continuous data ingestion from key systems into the analytical backbone, using low-latency interfaces such as database replication whenever practical.
  • Migrated some interaction tracking systems to the backbone and the recommender.
  • Conducted retrospective sourcing performance and pricing analysis by replaying row mutations continuously captured from database replication logs and stored in Redshift (Python/C++).
Technologies: Oracle, MySQL, Redshift, Amazon Kinesis, C++, Java, R, Python, Optimization, D3.js, Machine Learning, Statistics, Bayesian Statistics, Recommendation Systems, Apache Spark, Git, Data Science, Data Engineering, ETL, Linux, Microservices, RESTful Microservices, REST APIs, Amazon S3 (AWS S3), Pytest, pylint, Unit Testing, Python 3, Multithreading

Software and Data Architect

2011 - 2015
Codento
  • Built an image upload/pre-processing pipeline for a media startup using Node.js and MongoDB on AWS. Included single sign-on with a Ruby on Rails app on the back end.
  • Created custom, interactive data displays for a bespoke structured messaging application using D3.js. Implemented real-time updates.
  • Implemented a structured messaging application. Contributed to the Python/Django back end and the CoffeeScript front end.
  • Built a custom C# distributed data analysis pipeline to perform MATLAB jobs on AWS.
  • Designed and implemented a custom interactive analysis and visualization tool for economic data, with a Python back end and D3.js visualization.
  • Assembled a custom nurse schedule and route optimization system for a healthcare software startup. Worked on pre-processing and mixed integer model formulation for Gurobi with Python/Pandas/NumPy/PuLP, D3.js visualization of solutions, and Flask API.
  • Modernized the system design and implementation of a Java/Spring back end for real-time transport logistics. Improved scalability and performance.
  • Designed and implemented a reference application for a high-security network architecture for a banking customer with Scala/Play, Slick, and 2-factor authentication.
  • Contributed to a large-scale online storage system implementation using Python and PostgreSQL. Contributed to embedded security appliances in C.
  • Developed a custom MATLAB system to tune a legacy application from data using derivative-free (black-box) optimization.
Technologies: Ruby on Rails (RoR), Java, CoffeeScript, Node.js, JavaScript, Python, Scala, Slick, MongoDB, PostgreSQL, D3.js, AWS CLI, C, Gurobi, PuLP, Optimization, Mixed-integer Linear Programming, Front-end, CSS, HTML, Flask, Bottle.py, CVXOPT, Git, MATLAB, Data Science, Data Engineering, Tornado, Linux, C#, .NET, Data Architecture, Architecture, SaaS, Microservices, RESTful Microservices, REST APIs, Spring, Amazon S3 (AWS S3), Pytest, pylint, Abstract Syntax Trees (AST), Unit Testing, Web Development, Cybersecurity, Docker Compose, Python 3, AWS CloudFormation, Multiprocessing, Multithreading, Asyncio

Chief Software Architect

2009 - 2010
Nokia
  • Prototyped a voice- and gesture-based user interface for in-car mobile phone usage at various levels of fidelity ranging from Wizard of Oz to software proof-of-concept (Python, Java, Sphinx).
  • Defined software architecture for a family of in-car products, with input to hardware platform selection.
  • Planned costs, schedule, and execution of multiple new product development scenarios.
  • Organized and moderated usability studies for prototype validation and iteration.
  • Conducted rigorous feasibility studies and software architecture reviews at Gear.
Technologies: Java, Python, Software Architecture, Bluetooth, Planning, Usability, Usability Testing, Speech Recognition, Architecture, Technical Leadership, Leadership, Unit Testing

Senior R&D Manager

2003 - 2009
Nokia
  • Recruited and ramped up the Maemo application framework team from scratch.
  • Defined the application framework architecture and development strategy.
  • Led the implementation of three major software generations along with updates.
  • Influenced Nokia's entry into open-source development.
  • Developed a considerable subcontracting and partnering network for Linux development.
  • Contributed to the initial product concept definition.
Technologies: IT Project Management, Agile Project Management, Software Architecture, Open Source, Due Diligence, Recruitment, Leadership

Senior Software Engineer

2001 - 2003
Nokia
  • Prototyped a small-footprint relational database for small Linux devices in C++ for the Nokia Research Center.
  • Prototyped a personal information manager for handheld devices based on semantic web technology in Python.
  • Studied and evaluated architectural options for an application framework aimed at Linux-based handheld devices. The framework was adopted by the nascent Maemo project.
Technologies: Python, C, C++, Databases, Embedded Linux, Semantic Web, RDF, Software Architecture, Graphical User Interface (GUI), GNOME, Qt, GTK+, ANTLR

GIS/Computer Graphics Freelancer

1998 - 2001
Freelance Clients
  • Built a geographic information system (GIS) to edit the land cadaster for the Portuguese Ministry of Agriculture using C++, Windows, and Oracle technologies.
  • Constructed a custom C++ framework for real-time manipulation of topologically integrated geographic vector data.
  • Assembled a geographical decision support system in C++ for a specialized consultancy, enabling semi-automated execution and optimization of land-consolidation projects.
  • Developed, licensed, and finally sold a ray-tracing rendering module for use with interior design software written in C++.
  • Built a GIS to edit an olive tree cadaster for the Portuguese Ministry of Agriculture, with integrated olive tree recognition from aerial photography, using C++ on Windows.
  • Designed and implemented flooring tiling algorithms for 3D interior design software.
Technologies: Oracle, Python, C++, Computational Geometry, Computer Graphics, GIS, Optimization, Unit Testing

Projects

RawHash

https://github.com/pconstr/rawhash
An experimental, binary-friendly alternative to using a hash as a key-value cache, written in C++ for Node.js.

Keys are binary buffer objects rather than strings. Values are arbitrary objects.

RawHash is built on Google SparseHash and MurmurHash3.

Rdb-parser

https://github.com/pconstr/rdb-parser
An asynchronous streaming parser for Redis RDB database dumps, written entirely in JavaScript, intended for use in Node.js, and released as open source. It's useful for diagnostics, data conversion, or even as part of a data processing pipeline.

Incremental Random Forest

https://github.com/pconstr/irf
An implementation in C++, with Node.js and Python bindings, of a variant of Leo Breiman's random forests.

To save resources, the forest is maintained incrementally as samples are added or removed, rather than being fully rebuilt from scratch every time.

It is not a streaming implementation, as all the samples are stored and will be re-seen when required to recursively rebuild invalidated subtrees. The effort to update each tree can vary substantially, but the overall effort to update the forest is averaged across the trees and tends not to vary significantly.
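
The incremental idea described above can be illustrated with a minimal Python sketch. This is not the irf library's code or API: the names `Node`, `best_split`, and `gini` are hypothetical, the sketch covers a single deterministic tree rather than a randomized forest, and the invalidation rule (rebuild a subtree when its best Gini split changes) is an assumption chosen for simplicity. Each node stores its samples, so an update only rebuilds the subtree whose split is no longer the best one and otherwise passes the change down to one child.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(samples):
    """Return the (feature, threshold) with the lowest weighted Gini, or None for a leaf."""
    labels = [y for _, y in samples]
    if len(set(labels)) <= 1:
        return None
    best, best_score = None, gini(labels)
    for f in range(len(samples[0][0])):
        values = sorted({x[f] for x, _ in samples})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in samples if x[f] <= t]
            right = [y for x, y in samples if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


class Node:
    """Tree node that keeps its samples so that subtrees can be rebuilt on demand."""

    def __init__(self, samples):
        self.samples = list(samples)
        self._build()

    def _build(self):
        # Full (re)build of this subtree from the stored samples.
        self.split = best_split(self.samples)
        self.left = self.right = None
        if self.split is not None:
            f, t = self.split
            self.left = Node([s for s in self.samples if s[0][f] <= t])
            self.right = Node([s for s in self.samples if s[0][f] > t])

    def _child_for(self, x):
        f, t = self.split
        return self.left if x[f] <= t else self.right

    def add(self, x, y):
        self.samples.append((x, y))
        self._refresh(x, lambda child: child.add(x, y))

    def remove(self, x, y):
        self.samples.remove((x, y))  # raises ValueError if the sample is unknown
        self._refresh(x, lambda child: child.remove(x, y))

    def _refresh(self, x, descend):
        if best_split(self.samples) != self.split:
            self._build()  # split invalidated: rebuild only this subtree
        elif self.split is not None:
            descend(self._child_for(x))  # split still valid: update one child

    def predict(self, x):
        if self.split is None:
            counts = Counter(y for _, y in self.samples)
            return counts.most_common(1)[0][0] if counts else None
        return self._child_for(x).predict(x)
```

A tree is created with `Node(samples)`, where each sample is a `(features, label)` pair with `features` given as a tuple, and then updated with `add` and `remove`. Because every node keeps its samples, this sketch is not a streaming algorithm either, which matches the trade-off described above.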

Data-Graft.js

https://github.com/pconstr/data-graft.js
An animation-friendly, differential document object model (DOM) template engine that is self-contained and framework-agnostic. Built to experiment with dynamic data/DOM binding, focusing on flexibility for animating data-change transitions.

Education

1991 - 1996

Master's Degree in Computer Science

Universidade Nova de Lisboa - Lisbon, Portugal

Certifications

OCTOBER 2024 - PRESENT

Combinatorial Optimization

University of Melbourne

FEBRUARY 2024 - PRESENT

Discrete Optimization

University of Melbourne

Skills

Libraries/APIs

Pandas, Node.js, NumPy, Matplotlib, Bottle.py, Eigen, Scikit-learn, SciPy, Theano, D3.js, TensorFlow, Keras, PyMC, PySpark, REST APIs, SQLAlchemy, Pydantic, Asyncio, Spark ML, PyTorch, GitHub API, Slick, Beautiful Soup, Shapely, Beautiful Soup 4, React

Tools

Git, Amazon Redshift Spectrum, Spark SQL, Apache Airflow, GitHub, StatsModels, Seaborn, Pytest, pylint, ChatGPT, Retool, Docker Compose, AWS CloudFormation, Terraform, MATLAB, Amazon Elastic MapReduce (EMR), AWS IAM, AWS ELB, AWS CLI, Amazon Athena, Grafana, Confluence, Gurobi, GNOME, GTK+, GIS, ANTLR, Google OR-Tools

Languages

Python, C++, C, Python 3, Java, SQL, R, Scala, HTML, Go, GraphQL, TypeScript, CoffeeScript, JavaScript, CSS, RDF, Prolog, C#, Rust

Paradigms

Unit Testing, Distributed Computing, Parallel Computing, Distributed Programming, ETL, DevOps, Microservices, Constraint Programming, Linear Programming, Functional Programming, Usability Testing, Agile Project Management

Platforms

Linux, Amazon Web Services (AWS), Docker, Apache Kafka, Jupyter Notebook, Kubernetes, Databricks, RunPod, Oracle, Embedded Linux, Google Cloud Platform (GCP)

Storage

Redis, Redshift, MongoDB, PostgreSQL, Databases, Database Architecture, Amazon S3 (AWS S3), MySQL, RocksDB, Elasticsearch

Frameworks

Apache Spark, Hadoop, Flask, Metaflow, Ruby on Rails (RoR), Django, Qt, .NET, Spring, Selenium

Industry Expertise

Cybersecurity

Other

Machine Learning, Data Science, FastAPI, Algorithms, Large Language Models (LLMs), Artificial Intelligence (AI), Amazon Kinesis, Software Engineering, Mathematics, Deep Learning, Bayesian Statistics, PuLP, Optimization, Recommendation Systems, Software Architecture, IT Project Management, Open Source, Random Forests, Data Engineering, Scripting, Data Extraction, Command-line Interface (CLI), Data Architecture, Architecture, Technical Leadership, Back-end, Leadership, SaaS, Abstract Syntax Trees (AST), Torch, APIs, Data Scraping, Web Scraping, Web Development, Multithreading, Multiprocessing, Local Search, Natural Language Processing (NLP), Tornado, MLflow, Prometheus, Delta Lake, Statistics, Mixed-integer Linear Programming, Front-end, CVXOPT, Bluetooth, Planning, Usability, Speech Recognition, Due Diligence, Recruitment, Semantic Web, Graphical User Interface (GUI), Computational Geometry, Computer Graphics, DOM, GeoPandas, RESTful Microservices, Boost.Asio, Hugging Face, Llama 2, LangChain, FAISS, Machine Learning Operations (MLOps)
