
João Rafael
Verified Expert in Engineering
Data Scientist and Machine Learning Developer
Porto, Portugal
Toptal member since October 29, 2020
João is an applied data science specialist who bridges the gaps between business requirements, engineering constraints, and machine learning research. He leads the development of data science projects and has deployed products in multiple industries, including telco and fintech. João has developed and implemented novel machine learning algorithms for research institutions and custom solutions for commercial clients.
Portfolio
Experience
- Python - 16 years
- Data Science - 12 years
- Artificial Intelligence (AI) - 12 years
- Machine Learning - 10 years
- Distributed Computing - 9 years
- Software Project Management - 8 years
- Deep Learning - 4 years
- PyTorch - 4 years
Preferred Environment
Python, Ubuntu, Jupyter Notebook, XGBoost, PySpark, PyTorch, Scikit-learn, Python 3, Crypto, APIs
The most amazing...
...project I've worked on is a credit card fraud detection service for one of the largest payment processors in the US, handling over $1 billion in payments daily.
Work Experience
Founder
Upper Delta
- Founded Upper Delta, a specialized data science and machine learning consultancy.
- Developed large projects in the telco industry, including product recommendation systems, churn prediction models, call center optimization products, and quality-of-service degradation prediction models.
- Oversaw the work conducted by employee and client teams, ensuring on-time delivery and visibility of project status.
- Supervised the research conducted for several master's theses in collaboration with universities in Portugal.
- Mentored 20+ data scientists and software engineers, providing them with professional growth opportunities through one-on-one sessions, reading groups, and workshops.
- Demystified the role of data science and machine learning for C-level executives in our clients' organizations.
Co-founder
Powercall
- Led the technical implementation of the entire product, including infrastructure, ETL pipelines, machine learning models, and dashboards.
- Co-founded Powercall, a company that delivers a call center optimization product. By using AI to identify the best hour to contact each customer, we improve call center operations with respect to answer rates, sales per client, and client reach.
- Engaged with clients to showcase the product, set up pilot programs, discuss integration solutions, and finalize pricing options.
Senior Software Engineer
Feedzai
- Implemented a suite of data science tools that became part of the core product. Researched and implemented improvements to state-of-the-art machine learning algorithms, improving results for all clients.
- Played a key role in the delivery team that implemented fraud detection solutions for large banks, payment processors, and merchants, including First Data and JIO Wallet.
- Served as the tech lead for a multimillion-dollar project, defining the solution's architecture. Coordinated and communicated requirements, progress, and deadlines across the client's technical staff and internal product and research teams.
- Supervised the work of my team from a technical perspective, ensuring high-quality code and documentation.
- Conducted regular one-on-one meetings with every team member to assess performance, future goals, culture match, and potential actions to improve their satisfaction within the team and the company.
Researcher
University of Coimbra
- Designed and implemented a novel programming language for parallel, event-driven programming with deadlock-free semantics.
- Implemented a framework for automatic parallelization of existing Java applications by detecting data dependencies at a granular level and scheduling execution with a work-stealing algorithm.
- Co-authored two scientific papers for the International Journal of Parallel Programming and the Euro-Par Conference on parallel and distributed computing.
Experience
Fraud Detection System
In addition to defining the solution's architecture, I coordinated and communicated requirements, progress, and deadlines across the client's technical staff, the company's project managers, and internal product engineering and research teams. I also ensured high-quality code and documentation and coached each team member through one-on-one meetings.
Product Recommendation System
I developed this system from scratch. A content-based approach was used to incorporate information from multiple domains, including product usage, billing information, previous customer interactions, and demographics, as well as product characteristics and price points. The recommendations were measured against the previous strategy in A/B tests, and a statistically significant increase in average revenue per user (ARPU) was achieved.
Call Center Optimization
The system communicates with multiple partner companies and accesses several back-end systems and databases to collect the necessary information. A dashboard was created to monitor both the system and the final business metrics.
I led the development of the process from proof-of-concept to production, enforcing sound development practices and code quality through unit and integration tests, code linters, CI/CD, and code reviews.
This project drove a 15% increase in client reach for the selected marketing campaigns. It was showcased as a case study for the data science community and presented to an audience of 80+ data scientists, industry players, and C-level executives.
Quality of Service Degradation Prediction
The model combines anomaly detection and predictive algorithms to identify which clients are facing or will face network issues. Due to the large amount of data collected, PySpark was used to process the information in a cluster. Specific code was developed in Java for extra optimizations.
Throughout this project, several distinct patterns were discovered in the data and relayed to the company's engineering team to fix. Additionally, a survey of the clients most likely to be facing problems was conducted, and 98% confirmed our findings.
Rooftop Obstacle Detection for Solar Panel Company
Sales Forecast for a Beverage Company
I implemented the project, working with the client's data and product teams to understand and resolve data issues. Datasets were enriched with external data sources for weather, demographics, and events. The final deliverable included a dashboard where the client could visualize the data geographically and uncover patterns across locations and time spans.
Data Lake Implementation in AWS
The data lake supported both bulk and streaming data ingestion from various sources, including data brokers, product usage, SaaS services, operational logs, and APIs. I ensured data cleanup and transformation processes were in place to meet both operational and analytical requirements effectively.
Education
Essential Molecular Biology - 'Hands On' Laboratory Course in Molecular Biology
University of Porto - Porto, Portugal
Master's Degree in Computer Science
University of Coimbra - Coimbra, Portugal
Bachelor's Degree in Computer Science
University of Coimbra - Coimbra, Portugal
Certifications
Certified DataStax Architect
DataStax
Skills
Libraries/APIs
Scikit-learn, XGBoost, Pandas, OpenAI API, PyTorch, PySpark
Tools
Amazon SageMaker, Amazon Athena, Apache Iceberg, RabbitMQ, Syslog, Apache Airflow
Languages
Python, Java, Scala, JavaScript, R, Rust, Python 3, SQL, C++, XML
Paradigms
Distributed Computing, High-performance Computing (HPC), Parallel Programming, Continuous Integration (CI), Anomaly Detection, Management
Platforms
Amazon Web Services (AWS), Linux, Ubuntu, Docker, Databricks, Jupyter Notebook, Oracle, Google Cloud Platform (GCP), AWS Lambda
Frameworks
Apache Spark
Storage
Amazon S3 (AWS S3), PostgreSQL, Distributed Databases
Industry Expertise
Project Management
Other
Machine Learning, Data Science, Fraud Prevention, Recommendation Systems, Artificial Intelligence (AI), Data Engineering, Algorithms, APIs, Document Parsing, OpenAI, Generative Artificial Intelligence (GenAI), Machine Learning Operations (MLOps), Neural Networks, Deep Learning, Software Project Management, Apache Cassandra, Distributed Systems, Mathematics, Computer Vision, Large Language Models (LLMs), Options, Reinforcement Learning, Crypto, Prompt Engineering, Customer Lifetime Value (CLV), Compilers, Deep Reinforcement Learning, Predictive Analytics, Optimization, Predictive Modeling, Network Topology, FTTH, Software Development, Computer Graphics, Digital Electronics, Digital Signal Processing, Call Centers, Customer Experience, IT Consulting, Forecasting, Web Dashboards, Bayesian Statistics, Bayesian Inference & Modeling, Amazon RDS, Amazon Managed Workflows for Apache Airflow (MWAA), Equity Market Data, Financial Markets, Equity, Quantitative Finance, Trading, Molecular Biology, DNA Sequencing, Plasmid Engineering, Transfection, PDF