Felipe Costa Farias, Developer in Recife, Brazil

Felipe Costa Farias

Verified Expert in Engineering

Machine Learning and Software Developer

Recife, Brazil
Toptal Member Since
February 8, 2021

Felipe has a Ph.D. in machine learning and a professional software development background. He has worked with machine learning since 2011 and has the experience and skills necessary to fulfill AI/DS roles, from development to team management. Felipe has applied AI to several areas, such as computer vision, natural language processing, time series, bioinformatics, and pharmacy. He is a quick learner with excellent knowledge of Python and artificial intelligence.






Preferred Environment

Slack, Visual Studio Code (VS Code), Vim Text Editor, Linux, Docker, Terminal

The most amazing...

...thing that I've developed was a machine learning model to predict molecular properties to improve drug discovery.

Work Experience

Machine Learning Researcher

2021 - PRESENT
  • Developed machine learning models to predict molecular properties.
  • Performed distributed machine learning training on 288 GPUs.
  • Implemented different graph neural network algorithms.
  • Performed statistical analysis on different models.
  • Developed a machine learning pipeline with Apache Airflow.
  • Participated as the team leader of ML-related projects.
Technologies: PyTorch, Distributed Systems, Azure, Amazon Web Services (AWS), Git, Amazon, Plotly, Data Visualization, Statistical Learning, Pandas, Jupyter Notebook, Slack, Scientific Computing, Linux, Supervised Learning, Unsupervised Learning, Python, Graph Neural Networks, Drug Development, Pharmaceuticals, Apache Airflow, Large Language Models (LLMs), Generative Pre-trained Transformers (GPT), Generative Pre-trained Transformer 3 (GPT-3), OpenAI GPT-3 API, OpenAI, OpenAI GPT-4 API, Applied Research, Fine-tuning, Training, ChatGPT, Chatbots, Leadership, Technical Leadership

AI Consultant

2020 - PRESENT
Elife Brasil
  • Created models in Python to perform sentiment analysis on social media texts about companies using scikit-learn and spaCy/Gensim.
  • Used state-of-the-art AI models such as transformers and classical word embeddings to perform text understanding with Python and NLTK.
  • Developed a model that uses unsupervised and supervised learning to automatically suggest responses to clients' emails based on previous human-written responses.
  • Created PyTorch deep learning models with convolutional neural networks to identify a person's age based on their photo.
  • Created PyTorch deep learning models with convolutional neural networks to identify a person's gender based on their photo.
  • Deployed models as APIs to a cloud computing platform (Hetzner).
Technologies: Machine Learning, GPT, Natural Language Processing (NLP), Sentiment Analysis, Text Mining, Information Retrieval, Social Media, Text Analytics, PyTorch, Data Preprocessing, Text Processing

Data Scientist

2021 - 2022
Confidential Name
  • Developed different ML models for a battery lifecycle management startup to detect anomalies in batteries.
  • Developed an API to serve different ML battery models.
  • Performed exploratory data analysis and cleaned the data to create the ML models.
Technologies: Data Science, Machine Learning, Big Data, Algorithms, Neural Networks, MATLAB


2014 - 2021
IFPE Instituto Federal de Pernambuco
  • Presented lectures on several computer science topics, specifically in data science and artificial intelligence courses.
  • Participated in industrial R&D to develop methods to solve complex problems using AI.
  • Led a research group (BRAINS - Brazilian Research in Artificial Intelligence and Systems).
Technologies: Machine Learning, Deep Learning, Optimization, Metaheuristics, Search Algorithm Design, Data Science, Scientific Data Analysis, Exploratory Data Analysis, Artificial Intelligence (AI), Artificial Neural Networks (ANN), Neural Networks, IP Networks, Computer Science

AI Consultant

2020 - 2020
Elife Brasil
  • Developed a machine learning model with biomarkers, social data, and laboratory test results to predict sepsis six hours in advance.
  • Preprocessed a very noisy dataset to allow the model to learn with this specific data.
  • Benchmarked several machine learning algorithms, such as gradient boosting and convolutional and recurrent neural networks, with several architectures.
Technologies: Python, Scikit-learn, Pandas, Time Series, Recurrent Neural Networks (RNNs), Gradient Boosting

AI Consultant

2018 - 2019
Elife Brasil
  • Developed a deep learning model using long short-term memory (LSTM) and convolutional neural networks (CNN) to classify a patient's electrocardiogram (ECG) signals as normal or a potential anomaly.
  • Preprocessed the ECG data using linear and non-linear filters to ease the training phase of the deep learning models.
  • Deployed the model as an API to a private cloud computing platform (Hetzner).
  • Monitored the model accuracy and time performance.
Technologies: Deep Learning, Data Preprocessing, Web Services, Cloud Computing, Healthcare, Convolutional Neural Networks (CNN), Long Short-term Memory (LSTM), Signal Processing

Software Developer

2012 - 2014
Stefanini Group
  • Maintained and optimized a custom matrix/linear algebra library written in C and C++ to support the development of neural networks.
  • Developed a pipeline with document segmentation/categorization, handwriting recognition, and OCR to retrieve textual information from documents and handwritten forms used to open bank accounts for one of Brazil's most important banks.
  • Developed a distributed system of nodes with specific responsibilities using C# and web services endpoints.
Technologies: Artificial Neural Networks (ANN), C, Matrix Algebra, C++, C#, Java, MySQL, PostgreSQL, Computer Vision, Image Processing, Python, Text Mining, Information Retrieval, SQL


1. A natural language processing (NLP) system written in Python using state-of-the-art machine learning algorithms with scikit-learn. The model helps customer service teams by suggesting the top three most plausible responses to questions/emails sent by clients. The model trains on previously answered emails written by each team. Analysts were able to respond 12x faster. An ideal response is selected in 30% of cases, and this rate increases as more data is gathered and used during retraining. This project uses a lightweight Flask web service and works for English, Portuguese, and Spanish.

2. A system that performs sentiment analysis using state-of-the-art machine learning algorithms on several social media texts with SpaCy/Gensim. It currently handles thousands of requests per second using a lightweight Flask web service.

3. A module to identify a person's (i) age and (ii) gender from social media profile photos by applying deep convolutional neural networks with PyTorch.

4. A module for automatic tagging of social media posts based on previously tagged ones, written in Python with scikit-learn.
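
The top-three suggestion step in item 1 can be sketched as a similarity lookup over previously answered emails. This is only an illustrative sketch, not the deployed model: it replaces scikit-learn with a standard-library bag-of-words cosine similarity, and the names `suggest_responses`, `history`, and the sample emails are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_responses(new_email, history, top_k=3):
    """Return the responses attached to the top_k most similar past emails.

    history is a list of (past_email, human_written_response) pairs.
    """
    query = Counter(tokenize(new_email))
    scored = [
        (cosine(query, Counter(tokenize(past_email))), response)
        for past_email, response in history
    ]
    # Highest similarity first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [response for _, response in scored[:top_k]]


# Hypothetical sample data, for illustration only.
history = [
    ("I forgot my password and cannot log in",
     "You can reset your password from the login page."),
    ("Where is my invoice for last month",
     "Invoices are available under Billing."),
    ("How do I cancel my subscription",
     "You can cancel from Account Settings."),
    ("The app crashes when I open it",
     "Please update to the latest version."),
]
suggestions = suggest_responses("I need to reset my password", history)
```

Retraining in this simplified setting amounts to appending newly answered emails to `history`, which is one plausible reason the hit rate would grow as more data is gathered.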


A SaaS to classify electrocardiogram (ECG) data. The ECG data was acquired by custom hardware attached to a smartphone. The smartphone communicates with a web service endpoint to send the data for further AI analysis. I was the AI consultant leading an AI team to create and deploy the models.

NASA Robotic and Engineering Bootcamp

In this project, I was a software consultant for robotics, mentoring many engineering interns from all over the USA and several other countries. We had to develop the hardware and software for robots that collaboratively create 3D maps. Each robot has a LIDAR camera that takes 360° point cloud pictures of its location. Each robot also has two spheres mounted on top at a unique distance, serving as a visual identifier that we had to detect in the point cloud. As the robots took several images from different overlapping areas, we performed image registration using SLAM, taking the spheres' positions as landmarks and fusing data from these landmarks with each robot's motor encoders, as we had no access to GPS data. After this, we had to choose the next best location to send each robot to take more pictures, maximizing the area covered over time. For path planning, we used a classical AI algorithm (A*). We implemented the software using the Robot Operating System (ROS) and OpenCV in C/C++ and Python. We also used the Point Cloud Library. To filter the point cloud data, we used signal processing techniques. To visually analyze our data, we used the rviz package and custom OpenGL software.
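
The A* path planning step mentioned above can be illustrated with a minimal sketch. It assumes a 2D occupancy grid with unit-cost, 4-connected moves and a Manhattan-distance heuristic; the grid representation and the function name `a_star` are hypothetical and not taken from the project's ROS code.

```python
import heapq


def a_star(grid, start, goal):
    """Find a shortest 4-connected path on a grid where 0 is free and 1 is blocked.

    Returns the path as a list of (row, col) cells from start to goal,
    or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates with unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {}
    best_cost = {start: 0}

    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk the parent links back to the start.
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if cost > best_cost.get(current, float("inf")):
            continue  # stale heap entry; a cheaper route was already found
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = current
                    heapq.heappush(
                        open_heap,
                        (new_cost + heuristic((nr, nc)), new_cost, (nr, nc)),
                    )
    return None


# Hypothetical occupancy grid, for illustration only.
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
path = a_star(grid, (0, 0), (3, 3))
```

In a multi-robot mapping setting, a planner like this would run once per robot toward its chosen next scan location, with the grid derived from the map built so far.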

Electroencephalography Classification

A system to classify imagined and executed movements using deep learning algorithms such as convolutional neural networks, long short-term memory, and gated recurrent units from subjects' electroencephalography (EEG) input signals. It was built in Python with the Keras framework.

Django System for Law Offices

A time-tracking system developed for lawyers' activities and a system to support a law office's operations, controlling all the documents, with several reports built to support work audits.

We used Django and PostgreSQL to create both systems.

Molecular Property Prediction System

A flexible ML framework for molecular property prediction that enables people who are not ML experts to train models on their own data and makes it easier for the ML team to implement new training tasks and model architecture definitions. We also coupled this system with automated training so that the models are always up to date.


Python, C, Java, C#, SQL, C++, XML, C#.NET


PyTorch, NumPy, Pandas, SciPy, OpenCV, TensorFlow, Keras, Scikit-learn, Matplotlib, SpaCy, Natural Language Toolkit (NLTK), Django ORM


Data Science, Scrum, REST, Wavelets, ETL


Optimization, Signal Analysis, Artificial Intelligence (AI), Artificial Neural Networks (ANN), Neural Networks, Data Mining, Computer Vision, Pattern Recognition, Deep Learning, Machine Learning, Data Preprocessing, Predictive Analytics, Scientific Computing, Document Processing, Text Analytics, Text Processing, Data Visualization, Data, Code Review, Source Code Review, Generative Pre-trained Transformers (GPT), Large Language Models (LLMs), Applied Research, Statistics, Natural Language Processing (NLP), OCR, Containers, Graph Neural Networks, DGL, Scripting, GPT, Generative Pre-trained Transformer 3 (GPT-3), OpenAI GPT-3 API, OpenAI, OpenAI GPT-4 API, Fine-tuning, Training, ChatGPT, Chatbots, Leadership, Technical Leadership, Probability Theory, Fuzzy Logic, Geometry, Calculus, Physics, Linear Algebra, Discrete Mathematics, Data Structures, Signals, Software Engineering, Circuit Board Design, Electronics, IP Networks, Digital Communication, Genetic Algorithms, Decision Trees, Decision Support Systems, Random Forests, Random Forest Regression, Naive Bayes, Bayesian Inference & Modeling, Evolutionary Algorithms, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNNs), Deep Neural Networks, Certified ScrumMaster (CSM), Gradient Boosting, Regression Modeling, Clustering, Classification, Text Classification, K-means Clustering, K-nearest Neighbors (KNN), Ensemble Methods, Support Vector Machines (SVM), Support Vector Regression, Normalization, Standardization, Gradient Boosted Trees, Tree Structures, Binary Search Trees, Graphs, Processing & Threading, AdaBoost, Feature Analysis, Metaheuristics, Principal Component Analysis (PCA), Concurrent Computing, Concurrency, Synchronization, Resource Allocation, Information Systems, Image Processing, Information Theory, Microcontrollers, Computer Vision Algorithms, Stochastic Modeling, Markov Model, Hypothesis Testing, Exploratory Data Analysis, Experimental Research, Web Services, Cloud Computing, Long Short-term Memory (LSTM), Sentiment Analysis, Text Mining, Information Retrieval, Search Algorithm Design, Scientific Data Analysis, Computer Science, APIs, Digital Signal Processing, Web Development, EEG, CSV, PMI, Supervised Learning, Unsupervised Learning, Matrix Algebra, Statistical Learning, Signal Processing, Time Series, Simultaneous Localization & Mapping (SLAM), Robot Operating System (ROS), Analytical Geometry, Robotics, DC Motor Drive, Gated Recurrent Unit (GRU), Distributed Systems, Pharmaceuticals, Drug Development, Big Data, Algorithms, Interviewing, Technical Hiring, Team Management


Git, IPython, Docker Compose, Logging, Hidden Markov Model, Scikit-image, StatsModels, MATLAB, Apache Airflow, Slack, Vim Text Editor, Terminal, Gensim, Plotly


Docker, Linux, Jupyter Notebook, Azure, Visual Studio Code (VS Code), Amazon, Amazon Web Services (AWS)


Databases, MySQL, Data Pipelines, JSON, PostgreSQL, SQL Server 2012


Flask, Django

Industry Expertise

Project Management, Healthcare, Social Media

2016 - 2022

Ph.D. in Computer Science

Federal University of Pernambuco - Recife, PE, Brazil

2014 - 2016

Master's Degree in Computer Engineering

University of Pernambuco - Recife, PE, Brazil

2009 - 2014

Bachelor's Degree in Computer Engineering

University of Pernambuco - Recife, PE, Brazil


Microsoft Certified Professional



Certified Scrum Master

Scrum Alliance


Oracle Certified Professional



Certified Associate in Project Management (CAPM)

Project Management Institute (PMI)

NOVEMBER 2011 - JULY 2014

Microsoft Certified Professional

