Jose Luis Moreira Arruda, Developer in São José dos Campos - State of São Paulo, Brazil

Jose Luis Moreira Arruda

Verified Expert in Engineering

Data Scientist and Python Developer

São José dos Campos - State of São Paulo, Brazil

Toptal member since April 19, 2023

Bio

Jose is a data scientist/ML engineer who's experienced across multiple sectors, including eCommerce, healthcare, and fintech. He is an expert in developing strategic projects and building AI/data products. He easily translates pain points and business goals into tailored AI products and designs. He deploys machine learning and deep learning models for time series, CV, and NLP and integrates them into company systems on the cloud. Jose has led data science projects while mentoring colleagues.

Portfolio

Flinks
Deep Learning, Generative Pre-trained Transformers (GPT)...
Farfetch
Computer Vision, Deep Learning, PySpark, Machine Learning...
J!Quant
Computer Vision, Deep Learning, Machine Learning, SQL, Microsoft Power BI...

Experience

  • Scikit-learn - 7 years
  • Python - 7 years
  • Statistics - 7 years
  • Machine Learning - 7 years
  • PyTorch - 5 years
  • Pandas - 5 years
  • Deep Learning - 5 years
  • Recommendation Systems - 3 years

Availability

Part-time

Preferred Environment

Python, Pandas, Linux, PyTorch, Scikit-learn, NumPy, SQL, Artificial Intelligence (AI), Google Cloud Platform (GCP), Azure

The most amazing...

...algorithm I've worked on is a near real-time ML recommendation system to rank fashion eCommerce products.

Work Experience

Senior Data Scientist and ML Engineer

2022 - PRESENT
Flinks
  • Designed and developed new data products based on stakeholders' requirements and strategic business goals in the payments and open banking field. Designed and built ML systems to mitigate fraud risks in the payment sector.
  • Developed systems using OpenAI LLM capabilities to improve transactional data enrichment, ensuring robustness. Designed and deployed scalable ML systems using cloud resources on GCP and AWS.
  • Architected a system to improve web scraping automation using ML and computer vision models.
Technologies: Deep Learning, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Machine Learning, Google Cloud Platform (GCP), Python, PyTorch, Linux, Pandas, NumPy, SQL, Programming, Microsoft Power BI, Artificial Intelligence (AI), Data Scientist, Explainable Artificial Intelligence (XAI), Recurrent Neural Networks (RNNs), Time Series, Data Science, Transformer Models, Data Visualization, Data Engineering, Data Analytics, Data Interpretation, Data Analysis, Forecasting, Data Build Tool (dbt), Jupyter, Large Language Models (LLMs), Supervised Learning, Generative Artificial Intelligence (GenAI), Statistical Analysis, Regression Modeling, PostgreSQL, Data Modeling, Algorithms, Hierarchical Clustering, Clustering Algorithms, Clustering, K-means Clustering, Unstructured Data Analysis, Ollama, Amazon Web Services (AWS), Open-source LLMs, LangChain, OpenAI GPT-4 API, Keras, Rapid Prototyping, OpenAI API, GitHub, Architecture, Gemini, Prompt Engineering, Retrieval-augmented Generation (RAG), Llama, Machine Learning Operations (MLOps), Large Language Model Operations (LLMOps), Computer Vision, Document Databases, NoSQL, Leadership, Cloud, Azure OpenAI Service, Vector Databases, Vertex AI, Anthropic, OpenAI, Web Scraping, Google Cloud, GitOps, Kubernetes, APIs, Big Data, Docker

Senior Data Scientist

2021 - 2022
Farfetch
  • Led the data science development of data-driven initiatives toward company strategic goals, such as increasing profitability and user engagement, while collaborating closely with product and engineering teams.
  • Delivered multiple data analyses to better understand customers, products, and their relations. Supplied multiple POCs to provide better clarity on commercial and fashion requirements.
  • Implemented deep learning models for information retrieval by extracting good representations from product images, product descriptions, user interactions, and other parameters.
  • Monitored live systems, fixed bugs, and implemented production-level features that were tested and released to production.
Technologies: Computer Vision, Deep Learning, PySpark, Machine Learning, Recommendation Systems, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), QA Testing, Azure, Python, PyTorch, Linux, Pandas, NumPy, SQL, Programming, Statistics, Artificial Intelligence (AI), Data Scientist, Explainable Artificial Intelligence (XAI), Recurrent Neural Networks (RNNs), eCommerce, Time Series, Data Science, Transformer Models, Time Series Analysis, Data Visualization, Data Engineering, Data Analytics, Data Interpretation, Data Analysis, Forecasting, Jupyter, Large Language Models (LLMs), Supervised Learning, Generative Artificial Intelligence (GenAI), Statistical Analysis, Regression Modeling, Data Modeling, Algorithms, DBSCAN, Hierarchical Clustering, Clustering Algorithms, Clustering, K-means Clustering, Unstructured Data Analysis, Keras, A/B Testing, Rapid Prototyping, GitHub, Architecture, Machine Learning Operations (MLOps), Optimization Algorithms, AI-enabled Search, Cloud, Vector Databases, Spark, APIs, Big Data, Docker

Head of AI | Senior Data Scientist

2018 - 2021
J!Quant
  • Delivered more than five strategic data science projects as a principal contributor, from conception to all phases of development and delivery.
  • Drove research roadmaps to deliver state-of-the-art (SOTA) solutions and build AI products. All AI products that I created are based on computer vision, NLP, time-series forecasting, and multi-modal representation learning.
  • Built a data science team, which included recruiting, teaching, and mentoring new data scientists. Taught data science to teams in different companies and individual professionals.
  • Assessed different companies' data-driven opportunities, making commercial proposals and selling data science projects to big companies.
Technologies: Computer Vision, Deep Learning, Machine Learning, SQL, Microsoft Power BI, Python, PyTorch, Linux, Pandas, NumPy, Programming, Statistics, TensorFlow, Optical Character Recognition (OCR), Data Scientist, Explainable Artificial Intelligence (XAI), Recurrent Neural Networks (RNNs), Time Series, Data Science, Transformer Models, Time Series Analysis, Data Visualization, Data Analytics, Supply Chain Management (SCM), Inventory Management, Data Interpretation, Data Analysis, Forecasting, Jupyter, Supervised Learning, Reinforcement Learning, Statistical Analysis, Regression Modeling, PostgreSQL, Data Modeling, Artificial Intelligence (AI), Algorithms, DBSCAN, Hierarchical Clustering, Clustering Algorithms, Clustering, K-means Clustering, Natural Language Processing (NLP), Unstructured Data Analysis, Amazon Web Services (AWS), Graph Databases, Neo4j, Keras, Rapid Prototyping, GitHub, Architecture, Demand Planning, AI Consulting, Optimization Algorithms, Logistics, Azure, Document Databases, AI-enabled Search, Leadership, Cloud, Vector Databases, icr, Geospatial Data, APIs, Docker, Supply Chain

Research Intern

2017 - 2018
Werkzeugmaschinenlabor WZL der RWTH Aachen
  • Created software to analyze manufacturing data, process signals, derive insights, and develop systems to improve production quality.
  • Developed two industrial research consulting projects to improve the quality and efficiency of hobbing and milling processes.
  • Built analytical, geometric, and statistical models to simulate industrial processes.
Technologies: MATLAB, Machine Learning, Simulations, Fourier Analysis, Python, Programming, Statistics, Supervised Learning, Statistical Analysis, Regression Modeling

Experience

SenseAI | Predicting and Monitoring Beer Quality for the World's Largest Brewery

A machine learning system to monitor beer quality during and after production. We perform the ETL process of thousands of industrial sensor signals using SQL. Also, we deploy machine learning models to predict the quality of the beer in real time and identify correlated root causes for production problems or improvements.

Our system also performs simulations, giving brewers real-time visibility and a way to act during the process to improve product quality. It was deployed in the Azure/Databricks environment. The project was rolled out in Brazilian breweries, where it improved beer quality and reduced waste, saving over $1 million per year.

Self-supervised Computer Vision Model for Fashion

Developed a new approach to extract image information from products for a big fashion marketplace. Leveraging the large volume of data and self-supervised techniques, we reduced human bias and captured rich features such as texture, which improved other downstream systems.

Seventh Place in the International Forecast Competition

https://www.kaggle.com/c/m5-forecasting-uncertainty
My colleague and I competed in one of the largest international competitions for forecasting, the M5 competition, in 2020. We won a gold medal for 7th place among 900+ teams worldwide.

That year, the competition task was to forecast a daily demand distribution in quantiles for each Walmart product across stores in three US states. This was a combination of over 40,000 time series.
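
Quantile forecasts like these are commonly scored with the pinball (quantile) loss; the M5 uncertainty track used a scaled and weighted variant of it. The sketch below shows only the plain pinball loss, not the competition metric and not our model. The function name `pinball_loss` and the sales figures are hypothetical, chosen purely for illustration.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for a quantile level q in (0, 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    # Under-forecasts (diff > 0) cost q per unit; over-forecasts cost (1 - q) per unit.
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))


# Hypothetical daily unit sales for one product and two constant forecasts.
sales = [0, 2, 1, 5, 3]
median_forecast = [1, 1, 1, 1, 1]  # candidate for the 0.5 quantile
upper_forecast = [4, 4, 4, 4, 4]   # candidate for the 0.95 quantile

print(pinball_loss(sales, median_forecast, q=0.5))
print(pinball_loss(sales, upper_forecast, q=0.95))
```

At a high quantile level such as 0.95, the loss penalizes under-forecasting far more than over-forecasting, which pushes the predicted quantile toward the upper tail of demand.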

Since we wanted to make our solution more general and practical and turn it into a product, we developed a unique end-to-end deep learning model based on recent advances in transformers for time-series forecasting, instead of relying on multiple ensembles or other approaches unsuitable for production.

Personalized Ranking System for Two-sided Marketplace

A ranking system for one of the world's largest luxury marketplaces. While developing this ranking system, I considered multi-modal data, multiple business requirements, and relevance to each user to improve conversion, profitability, and other target metrics.

AI-powered Gift Recommendations

A smart giftee recommendation app based on occasion and questionnaire. As an AI engineer, I leveraged ChatGPT capabilities, Amazon web scraping, data science, and system design patterns to recommend a set of diverse and relevant products to users.

Education

2014 - 2019

Bachelor's Degree in Mechanical Engineering

Aeronautics Institute of Technology - São José dos Campos, São Paulo, Brazil

2018 - 2018

Progress Toward Master's Degree in Deep Learning and Machine Learning

RWTH Aachen University - Aachen, Germany

Certifications

MARCH 2019 - PRESENT

Deep Learning Specialization

DeepLearning.AI | via Coursera

Skills

Libraries/APIs

Pandas, PyTorch, Scikit-learn, Keras, OpenAI API, NumPy, TensorFlow, PySpark

Tools

Jupyter, GitHub, Microsoft Power BI, Azure OpenAI Service, MATLAB

Languages

Python, SQL, C

Paradigms

Rapid Prototyping

Platforms

Azure, Amazon Web Services (AWS), Vertex AI, Docker, Linux, Google Cloud Platform (GCP), Ollama, Kubernetes, Databricks

Storage

PostgreSQL, Google Cloud, Graph Databases, Document Databases, NoSQL, Neo4j

Frameworks

Spark, Django

Other

Deep Learning, Computer Vision, Natural Language Processing (NLP), Machine Learning, Recommendation Systems, Convolutional Neural Networks (CNNs), Explainable Artificial Intelligence (XAI), Artificial Intelligence (AI), Data Scientist, Recurrent Neural Networks (RNNs), eCommerce, Time Series, Data Science, Transformer Models, Time Series Analysis, Data Visualization, Data Analytics, Data Interpretation, Data Analysis, Forecasting, Large Language Models (LLMs), Supervised Learning, Generative Artificial Intelligence (GenAI), Statistical Analysis, Regression Modeling, Data Modeling, Algorithms, DBSCAN, Hierarchical Clustering, Clustering Algorithms, Clustering, K-means Clustering, Unstructured Data Analysis, OpenAI GPT-4 API, A/B Testing, Architecture, Demand Planning, AI Consulting, Machine Learning Operations (MLOps), Optimization Algorithms, AI-enabled Search, Leadership, Cloud, Vector Databases, OpenAI, icr, APIs, Statistics, Sequence Models, Neural Networks, Self-supervised Learning, Generative Pre-trained Transformers (GPT), Optical Character Recognition (OCR), Data Engineering, Supply Chain Management (SCM), Inventory Management, Data Build Tool (dbt), Reinforcement Learning, Open-source LLMs, LangChain, Gemini, Prompt Engineering, Retrieval-augmented Generation (RAG), Llama, Large Language Model Operations (LLMOps), Logistics, Anthropic, Web Scraping, Geospatial Data, GitOps, Big Data, Supply Chain, Programming, Simulations, QA Testing, Fourier Analysis, path optimization
