Carlos Guerreiro, Natural Language Processing (NLP) Developer in Espoo, Finland

Member since April 23, 2013
Carlos is an exceptional data generalist who brings a vast amount of experience in the design, implementation, and validation of data-intensive systems to all of his projects, along with deep expertise in machine learning and real-time stream processing.


  • Perceptive Constructs
    Redis, C++, Node.js, JavaScript, R, Python, Machine Learning
  • MarkaVIP
    Oracle, MySQL, Spark, Redshift, AWS Kinesis, C++, Java, R, Python
  • Codento
    Ruby on Rails (RoR), Java, CoffeeScript, Node.js, JavaScript, Python






Preferred Environment

IPython, Command-line Interface (CLI), Emacs, Git, Linux, MacOS

The most amazing...

...thing I've built is an activity stream relevance filter - a low-latency, supervised learning loop over a deep neural net trained from 1/2 TB of unlabeled data.


  • Founder

    2010 - PRESENT
    Perceptive Constructs
    • Built a custom activity stream data processing pipeline with Node.js and Redis.
    • Built an unsupervised training pipeline for deep neural network architectures aimed at feature extraction from free text, using Python and Node.js.
    • Built low-latency activity stream relevance filters using Node.js and C++.
    • Built optimized random forest and Naive Bayes classifiers in C++ with bindings to Node.js and Python.
    • Built a real-time web UI for activity stream relevance filtering, using Node.js, Socket.IO, and a custom data/DOM binding framework.
    • Built a low-latency framework for training classifiers in an active learning setting, using Node.js, Redis, Socket.IO, and jQuery.
    • Built a hybrid native/HTML custom activity stream client for Android, integrated with filtering.
    • Built a real-time, distributed, multicore custom recommender system for eCommerce in Python and C++, combining collaborative filtering with content-based signals (text and metadata).
    • Built a bespoke transaction risk analysis system for eCommerce. Python + R.
    • Built a custom marketing message timing optimizer for eCommerce. Python + R.
    Technologies: Redis, C++, Node.js, JavaScript, R, Python, Machine Learning
  • Director of Data Science

    2015 - 2016
    MarkaVIP
    • Built and deployed a foundational analytical backbone for the company in AWS, around Kinesis, Redshift, and Spark. The design balances key goals of scalability, accessibility to analysts, low admin overhead, and support for both batch and streaming analysis.
    • Integrated continuous data ingestion from key systems into the analytical backbone, whenever practical, through low latency interfaces such as database replication.
    • Migrated some interaction tracking systems to sink directly to the backbone.
    • Migrated key analytical systems to the backbone, including the recommender.
    • Made various improvements to the recommender, including the use of fine-grained recorded impressions as a negative signal and more flexible handling of catalog metadata.
    • Built the analysis and real-time operation of customer experience interventions that reduce the impact of returns and cancellations. Optimization is by policy search through retrospective simulation on historical data. Operationalized as an HTTP microservice (Python, Kinesis, Redshift).
    • Expanded the above system to improve order profitability by optimizing basket constraints and incentives.
    • Conducted retrospective sourcing performance and pricing analysis. Our systems don't maintain the full history of changes to all the relevant data, so this analysis was done by replaying row mutations continuously captured from database replication logs and stored in Redshift (Python/C++).
    Technologies: Oracle, MySQL, Spark, Redshift, AWS Kinesis, C++, Java, R, Python
  • Software/Data Architect

    2011 - 2015
    Codento
    • Built an image upload/pre-processing pipeline for a media startup, using Node.js and MongoDB on AWS. Included single sign-on with a Ruby on Rails app in the back-end.
    • Implemented real-time path updates on a bespoke structured messaging app, using Node.js (integrated with a Python back-end) and Batman.js.
    • Built custom, interactive data displays for a bespoke structured messaging application using d3.js.
    • Implemented a complex data entry UI for a structured messaging application using Batman.js in CoffeeScript.
    • Built a custom distributed data analysis pipeline in C# to run MATLAB jobs on AWS.
    • Contributed to embedded security appliances in C.
    • Contributed to the back-end for structured messaging applications in Python, with Django.
    • Designed and implemented custom interactive data analysis and visualization for economic data. Python back-end + d3.js visualization.
    • Built a custom nurse schedule and route optimization system for a healthcare startup. Pre-processing and mixed integer model formulation for Gurobi in Python and d3.js visualization of solutions.
    • Modernized the system design of a pre-existing real-time transport logistics system for scalability and higher performance. Enterprise Java.
    • Designed and implemented a reference application for a high-security network architecture for a banking customer. Scala/Play, Slick, two-factor authentication.
    • Contributed to large scale online storage system implementation. Python + PostgreSQL.
    • Built a custom MATLAB system to tune a legacy application from data using black-box (derivative-free) optimization.
    Technologies: Ruby on Rails (RoR), Java, CoffeeScript, Node.js, JavaScript, Python
  • Chief Software Architect

    2009 - 2010
    Nokia | Gear
    • Prototyped a voice- and gesture-based user interface for in-car mobile phone usage at various levels of fidelity ranging from Wizard of Oz to software proof-of-concept (Python, Java, Sphinx).
    • Defined software architecture for a family of in-car products, with input to hardware platform selection.
    • Planned costs, schedule, and execution of multiple new product development scenarios.
    • Planned and moderated usability studies for prototype validation and iteration.
    • Conducted rigorous feasibility studies and software architecture reviews at Gear.
    Technologies: Sphinx Search Engine, Java, Python
  • Team Lead, Senior R&D Manager

    2003 - 2009
    Nokia | Maemo
    • Recruited and ramped up the Maemo Application Framework team from scratch.
    • Defined application framework architecture and development strategy.
    • Led the implementation of three major software generations along with updates.
    • Impacted Nokia's entry into open-source development.
    • Developed a considerable subcontracting and partnering network for Linux development.
    • Contributed to initial product concept definition.
    Technologies: Maemo
  • Senior Software Engineer

    2001 - 2003
    Nokia | Research Center
    • Prototyped a small-footprint relational database for small Linux devices in C++.
    • Prototyped a personal information manager for handheld devices based on semantic web technology in Python.
    • Studied and evaluated architectural options for an application framework aimed at Linux-based handheld devices, adopted by the nascent Maemo project.
    Technologies: Python, C, C++
  • GIS/Computer Graphics Freelancer

    1998 - 2001
    • Built a GIS to edit land cadaster for the Portuguese Ministry of Agriculture using C++, Windows, and Oracle technologies.
    • Built a custom C++ framework to offer real-time manipulation of topologically integrated geographic vector data.
    • Built a geographical decision support system for semi-automated execution (optimization) of land-consolidation projects for specialized consultancy, using C++, in Windows.
    • Developed, licensed, and finally sold a ray-tracing rendering module for use with interior design software, written in C++.
    • Built a GIS to edit an olive tree cadaster for the Portuguese Ministry of Agriculture, with integrated olive tree recognition from aerial photography, built with C++ in Windows.
    Technologies: Oracle, Python, C++


  • rawhash

    An experimental, binary-friendly alternative to using a hash as a key:value cache, for Node.js.

    Keys are binary buffer objects rather than strings. Values are arbitrary objects.

    rawhash is built on google-sparsehash and murmurhash3 (included).

  • rdb-parser

    An asynchronous streaming parser for Redis RDB database dumps, written in 100% JavaScript, intended for use in Node.js.

  • Incremental Random Forest

    An implementation in C++ (with Node.js and Python bindings) of a variant of Leo Breiman's Random Forests.

    The forest is maintained incrementally as samples are added or removed - rather than fully rebuilt from scratch every time - to save resources.

    It is not a streaming implementation, as all the samples are stored and will be re-seen when required to recursively rebuild invalidated subtrees. The effort to update each individual tree can vary substantially but the overall effort to update the forest is averaged across the trees and tends not to vary significantly.

  • catsagram

    Rolling Instagram photos of cats, built to experiment with custom data/DOM bindings (data-graft.js), responsive layout (try resizing the window), and

  • data-graft.js

    An animation-friendly, differential DOM template engine, self-contained and framework-agnostic. Built to experiment with dynamic data/DOM binding, with a particular focus on flexibility for animating data-change transitions.
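
The incremental maintenance described under Incremental Random Forest above can be illustrated with a small Python sketch. This is not the project's C++ code or its API. It is a toy version under stated assumptions: `Node`, `Forest`, `best_split`, and the `keep` sampling rate are hypothetical names, splits are chosen by plain Gini impurity, each tree simply receives a random subset of the incoming samples, and only addition is shown. The point it tries to capture is that every node stores its samples, and a subtree is rebuilt only when a new sample changes the split that node would choose.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (order-independent)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in sorted(Counter(labels).values()))


def best_split(samples):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    labels = [y for _, y in samples]
    if len(set(labels)) < 2:
        return None  # empty or pure node: keep it as a leaf
    best, best_score = None, float("inf")
    for f in range(len(samples[0][0])):
        values = sorted({x[f] for x, _ in samples})
        for t in values[:-1]:  # dropping the max keeps both sides non-empty
            left = [y for x, y in samples if x[f] <= t]
            right = [y for x, y in samples if x[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if score < best_score:
                best, best_score = (f, t), score
    return best


class Node:
    """A tree node that stores every sample routed through it."""

    def __init__(self, samples=()):
        self.samples = list(samples)
        self._build()

    def _build(self):
        """(Re)build this subtree from the stored samples."""
        self.split = best_split(self.samples)
        self.left = self.right = None
        if self.split is not None:
            f, t = self.split
            self.left = Node([s for s in self.samples if s[0][f] <= t])
            self.right = Node([s for s in self.samples if s[0][f] > t])

    def add(self, sample):
        """Insert one sample; return how many subtrees had to be rebuilt."""
        self.samples.append(sample)
        if best_split(self.samples) != self.split:
            self._build()  # split invalidated: re-see the stored samples
            return 1
        if self.split is None:
            return 0  # still a leaf, nothing to rebuild
        f, t = self.split
        child = self.left if sample[0][f] <= t else self.right
        return child.add(sample)

    def predict(self, x):
        """Majority label of the leaf that x falls into (None if empty)."""
        if self.split is None:
            votes = Counter(y for _, y in self.samples)
            return votes.most_common(1)[0][0] if votes else None
        f, t = self.split
        return (self.left if x[f] <= t else self.right).predict(x)

    def shape(self):
        """Nested tuple of splits and leaf label counts, for comparing trees."""
        if self.split is None:
            return tuple(sorted(Counter(y for _, y in self.samples).items()))
        return (self.split, self.left.shape(), self.right.shape())


class Forest:
    """Toy forest: each tree sees a random subset of the added samples."""

    def __init__(self, n_trees=5, keep=0.7, seed=0):
        self.rng = random.Random(seed)
        self.keep = keep
        self.trees = [Node() for _ in range(n_trees)]

    def add(self, sample):
        """Offer the sample to each tree; return the total rebuild count."""
        return sum(t.add(sample) for t in self.trees
                   if self.rng.random() < self.keep)

    def predict(self, x):
        """Majority vote over the trees that have seen any data."""
        votes = Counter(t.predict(x) for t in self.trees)
        votes.pop(None, None)
        return votes.most_common(1)[0][0] if votes else None
```

A sample is a `(features, label)` pair, for example `forest.add(((0.2, 0.9), 1))`. Because split selection here depends only on the set of stored samples, a tree maintained through `add` ends up with the same splits as one built from scratch on the same samples, while most insertions touch only a single root-to-leaf path.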


  • Languages

    Python, C++, C, Go, JavaScript, SQL, R, CoffeeScript, Java, Scala
  • Libraries/APIs

    Node.js, Spark Streaming, Eigen, Scikit-learn, SciPy, Twitter API, NumPy, D3.js, jQuery, Matplotlib, Pandas, Theano, Facebook API
  • Paradigms

    Data Science, Distributed Computing, Parallel Computing, Distributed Programming, Functional Programming
  • Platforms

    Linux, AWS Kinesis, Amazon Web Services (AWS), Google App Engine, Maemo, Oracle, MacOS, Android
  • Storage

    Redis, Redshift, LevelDB, MongoDB, Sphinx Search Engine, MySQL, RocksDB, Cassandra
  • Frameworks

    Apache Spark, Django, Hadoop, Ruby on Rails (RoR)
  • Tools

    MATLAB, Git, Emacs, IPython
  • Other

    Machine Learning, Scientific Computing, Command-line Interface (CLI), Natural Language Processing (NLP), Tornado


  • Master's Degree in Computer Science
    1991 - 1996
    Universidade Nova de Lisboa - Lisbon, Portugal
