Fabio Fujii, Developer in São Paulo - State of São Paulo, Brazil

Fabio Fujii

Verified Expert in Engineering

Data Scientist and Developer

Location
São Paulo - State of São Paulo, Brazil
Toptal Member Since
September 14, 2022

Fabio is a data scientist with over five years of experience in the finance industry. He's delivered multiple successful ML models, from deploying critical credit risk models for asset-backed loans at one of the biggest banks in Brazil to data-driven solutions against complex and nonlinear fraud incidents. Fabio is independent, learns quickly, and values collaboration and constant feedback. He is eager to widen his business knowledge and motivated to solve your company's challenges.

Portfolio

Signifyd
Amazon S3 (AWS S3), Redshift, Python 3, Databricks, Dashboards, Looker...
Kiavi
Python 3, Tableau, Snowflake, Data Visualization, Pandas, SQL, Apache Airflow...
Banco Itaú
Python 3, Pandas, Keras, Statistics, Hadoop, Teradata, Machine Learning, SAS...

Availability

Part-time

Preferred Environment

Tableau, SAS, Snowflake, Apache Hive, Databricks, Redshift, Amazon S3 (AWS S3), Looker, GitHub, Fraud Prevention

The most amazing...

...project I've built was a loan loss provision model covering every product at the bank; business squads used it widely to support their forecasts.

Work Experience

Data Scientist

2022 - PRESENT
Signifyd
  • Developed a tool using graphs to visualize fraud rings better, capture more fraud, and reduce false positives in declines. This tool was used for early fraud alerting.
  • Managed merchants in their entire lifecycle, from onboarding the client through fraud discovery, creating rules against the fraud modus operandi, adjusting models to increase approval rate, and creating dashboards to monitor traffic.
  • Developed region-specific models, handling the entire model development cycle: acquiring and cleaning the data, selecting the best features, testing against the current champion models through A/B tests, and deploying.
Technologies: Amazon S3 (AWS S3), Redshift, Python 3, Databricks, Dashboards, Looker, Fraud Prevention, Fraud Investigation, Machine Learning Operations (MLOps)

Freelance Data Scientist

2022 - 2022
Kiavi
  • Worked as a data scientist for the operation success team, providing data-driven solutions to improve operational efficiency and reduce costs in close collaboration with product managers.
  • Built a new prediction model to estimate the likelihood of Kiavi closing a given loan within the due date requested by the borrower.
  • Developed multiple dashboards to analyze our loan pipeline and identify improvement points within our processes.
  • Followed CRISP-DM as guidance for my projects, using Jira to organize and document the work and keep project development agile.
Technologies: Python 3, Tableau, Snowflake, Data Visualization, Pandas, SQL, Apache Airflow, Jira, CRISP-DM, Python, Data Analysis, Data Mining, Data Modeling, Agile, NumPy, Statistical Modeling, Business Logic, Data Analytics, Jupyter, Predictive Learning

Data Scientist II

2021 - 2022
Banco Itaú
  • Helped build statistical models that predict the bank's loan loss provision for each retail product. Business squads used these projections for portfolio analysis.
  • Developed a model for vehicle seizures. The purpose of the model was to seize cars more selectively, reducing overall costs for the bank; this model is currently deployed.
  • Constructed credit risk models for car loans offered to natural persons and legal entities. I updated the previous model with more and newer data, significantly improving its performance.
  • Served as a subject matter expert in the car collateral loans squad, helping the product owner devise data-driven solutions to improve and achieve key results.
  • Used Confluence to document models and the squad knowledge repository.
  • Often used the CRISP-DM methodology to help incorporate Agile practices into my projects.
Technologies: Python 3, Pandas, Keras, Statistics, Hadoop, Teradata, Machine Learning, SAS, Data Visualization, Artificial Intelligence (AI), SQL, Confluence, CRISP-DM, Docker, Python, Predictive Modeling, Data Analysis, Machine Learning Operations (MLOps), Amazon Web Services (AWS), Deep Learning, Data Mining, Git, Data Modeling, Agile, NumPy, Dask, Statistical Modeling, Dashboards, Teradata SQL Assistant, Business Logic, Risk Models, Credit Risk, Credit Collection, Predictive Analytics, Data Analytics, Clustering, Jupyter, Redshift, Predictive Learning

Data Scientist I

2019 - 2021
Banco Itaú
  • Built a loan loss provision forecasting model end to end, from data acquisition to deployment. The business squad used my model's forecast to support theirs. I used SAS, Hive, and Hadoop.
  • Developed a machine learning model to optimize discounts on debts for credit card clients. The model aimed to provide better discounts for clients with high risk.
  • Worked closely with business managers to support decision-making using data from diverse sources and identify opportunities.
Technologies: Agile, Apache Hive, Artificial Intelligence (AI), CI/CD Pipelines, Confluence, CRISP-DM, Dask, Data Analysis, Data Mining, Data Modeling, Data Science, Python, Data Structures, Data Visualization, Deep Learning, Docker, Git, SQL, Teradata SQL Assistant, Business Logic, Risk Models, Credit Risk, Credit Collection, Predictive Analytics, Data Analytics, Forecasting, Clustering, Jupyter, Redshift, Predictive Learning, Machine Learning Operations (MLOps)

Data Analyst

2018 - 2018
Banco Itaú
  • Attended ITA's data science specialization program, where I took classes ranging from linear algebra, statistics, and programming to more advanced topics such as image and text processing with deep learning.
  • Worked on a fraud detection project for banking transactions. The goal was to enhance the model built by the senior data scientists while learning more about model development; I improved the model's performance on the validation set.
  • Collaborated closely with senior data science and business squads to learn about model development, validation, and monitoring at Itaú.
Technologies: Python 3, PySpark, Hadoop, SAS, Teradata, SQL, Pandas, Machine Learning, Optimization, Statistics, Python, Predictive Modeling, Data Analysis, Keras, Deep Learning, Data Mining, Git, Data Modeling, Agile, NumPy, Dask, Statistical Modeling, Business Logic, Risk Models, Credit Risk, Predictive Analytics, Apache Spark, Data Analytics, Clustering, Jupyter, Predictive Learning

Loan Loss Provision Forecast

I built a model end to end, from data acquisition to deployment, to forecast loan loss provision. The main challenge of this project was dealing with massive datasets of over 50 million rows each month. The business squad used my model's forecast to support theirs. I used SAS, Hive, and Hadoop to preprocess the data and Python to implement a machine learning algorithm.

Vehicle Seizure Model

I developed a vehicle seizure model to predict whether the bank can seize the client's car as collateral. The purpose of this project was to improve efficiency and reduce costs, given the expenses involved in seizing vehicles (judicial actions, auctions, and fees). I successfully segmented the portfolio, identifying a group of riskier clients. The model is currently deployed and used by the squad to rank clients.

Credit Recovery Model

I developed a collection model focused on car loan products for natural persons and legal entities. This model was crucial for the bank's strategy to improve KPIs regarding credit performance. The main challenge was dealing with legal entities because of data scarcity. The model used far more features than its predecessor, significantly enhancing performance. It was deployed as a Docker container, consuming data from the data lake and generating predictions in production.

Loan-signing Date

I analyzed bridge loan signing dates in depth to make the signing process more efficient. I identified the root cause, which lay in our application process and in how analysts prioritized loans. My analysis led the PM to focus on the correct problems and devise solutions that directly impact our OKRs.

Pricing for Default Card Holders

I built a machine learning model to improve the discounts given to defaulted cardholders. The main challenge was the highly biased data, because high discounts had already been given to clients who were less likely to pay their debts. My approach was to devise new variables from a limited number of features to partially remove the bias. As a result, the model successfully identified the risky group of clients, which reduced the discounts given to clients.

Fraud Ring Identification

I created a tool that lets analysts quickly visualize fraud rings by modeling our traffic data as a graph. This project helped analysts spot fraud trends early by watching the graphs as they formed.
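
The graph modeling idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual tool: the field names (`order_id`, `email`, `device_id`), the sample data, and the `find_rings` helper are all hypothetical, and it assumes a "ring" is simply a group of orders linked through shared attribute values, found as connected components with a union-find structure.

```python
from collections import defaultdict


def find_rings(orders, keys=("email", "device_id"), min_size=2):
    """Group orders that share any attribute value (hypothetical helper).

    Orders and attribute values are nodes of a bipartite graph; each
    connected component containing at least `min_size` orders is returned.
    """
    parent = {}

    def find(node):
        # Find the component root, shortening the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for order in orders:
        order_node = ("order", order["order_id"])
        parent.setdefault(order_node, order_node)
        for key in keys:
            value = order.get(key)
            if value is None:
                continue  # a missing attribute creates no link
            attr_node = (key, value)
            parent.setdefault(attr_node, attr_node)
            union(order_node, attr_node)

    # Collect order IDs per component, ignoring attribute nodes.
    groups = defaultdict(set)
    for node in parent:
        if node[0] == "order":
            groups[find(node)].add(node[1])

    rings = [sorted(ids) for ids in groups.values() if len(ids) >= min_size]
    return sorted(rings)


# Made-up sample traffic: A and B share a device, B and C share an email.
sample_orders = [
    {"order_id": "A", "email": "x@example.com", "device_id": "d1"},
    {"order_id": "B", "email": "y@example.com", "device_id": "d1"},
    {"order_id": "C", "email": "y@example.com", "device_id": "d2"},
    {"order_id": "D", "email": "z@example.com", "device_id": "d3"},
]
print(find_rings(sample_orders))  # [['A', 'B', 'C']]
```

In a real setting, the linking attributes and the minimum ring size would need tuning, since very common values (a shared corporate IP, for example) can merge unrelated orders into one component.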

Education

2018 - 2018

Specialization in Data Science

Instituto Tecnologico de Aeronautica - Sao Paulo, SP, Brazil

2013 - 2017

Bachelor's Degree in Computer Science

Universidade Federal de Mato Grosso do Sul - Campo Grande, MS, Brazil

Certifications

JULY 2018 - PRESENT

Artificial Intelligence Engineer

Udacity

MAY 2018 - PRESENT

Inteligencia Artificial

Gama Academy

Libraries/APIs

Pandas, NumPy, Scikit-learn, PyTorch, Dask, Keras, PySpark

Tools

Jupyter, Tableau, Jira, Confluence, Git, Teradata SQL Assistant, Apache Airflow, CircleCI, Looker, GitHub

Languages

Python 3, SAS, SQL, Python, Snowflake

Paradigms

Data Science, CRISP-DM, Test-driven Development (TDD), Agile, REST

Storage

Teradata, Apache Hive, Redshift, Amazon S3 (AWS S3)

Frameworks

Hadoop, Apache Spark

Platforms

Databricks, Heroku, MacOS, Windows, Docker, Amazon Web Services (AWS)

Other

Machine Learning, Deep Learning, Data Visualization, Predictive Modeling, Data Analysis, Data Mining, Data Modeling, Risk Models, Credit Risk, Credit Collection, Predictive Analytics, Predictive Learning, Data Structures, Artificial Intelligence (AI), Statistics, Presentations, Machine Learning Operations (MLOps), Statistical Modeling, Business Logic, Data Analytics, Forecasting, Clustering, Computer Vision, Graphs, Optimization, Statistical Methods, FastAPI, CI/CD Pipelines, Dashboards, Logistic Regression, Fraud Prevention, Fraud Investigation
