Caio Taniguchi, Developer in Zaragoza, Spain

Verified Expert in Engineering

Bio

Caio is a data scientist and back-end developer with experience in the whole data science pipeline, from data collection to model deployment. His main goal is to use data science and machine learning techniques to help businesses extract the most value out of their data.

Experience

  • Machine Learning - 4 years
  • Python - 4 years
  • Data Science - 4 years
  • Docker - 3 years
  • Agile - 3 years
  • MongoDB - 3 years
  • AWS Cloud Computing Services - 3 years
  • Node.js - 3 years

Availability

Part-time

Preferred Environment

Jupyter, Atom, Git, MacOS

The most amazing learning model I've trained was a classifier to determine the political inclination of users, using engineered features and an ensemble of models.

Work Experience

Senior Data Engineer

2022 - PRESENT
Stone
  • Designed and developed a general-purpose feature processing system for use in anti-fraud processes company-wide, using Kotlin, Flink, Kafka, S3, and MongoDB.
  • Developed an event-driven anti-fraud system for card transactions, processing operations through batch and streaming pipelines with Apache Flink.
  • Experimentally developed real-time ELT pipelines for PostgreSQL based on change data capture (CDC) tools using Kafka Connect and Debezium.
Technologies: Big Data, ETL, Streaming Data, Kotlin, Apache Kafka, Apache Flink, Debezium

Machine Learning Engineer

2019 - 2022
Stone
  • Developed a data-driven real-time fraud detection process for banking transactions, making use of client behavior and known fraud patterns to reach a decision. Applied both heuristics-based and machine learning approaches.
  • Designed and developed a real-time system based on facial recognition, used for onboarding clients and as an additional security measure for banking transactions.
  • Implemented the data processing pipelines and initial analysis for the batch anti-money laundering (AML) system. The solution used heuristics and graph analysis.
Technologies: Apache Airflow, Spark, BigQuery, PostgreSQL, Graph Databases, Apache Kafka, XGBoost, Python, Pandas, Scikit-learn

Software Engineer

2017 - 2018
Accenture
  • Created a recommender system framework and trained a model for deployment using Docker, PySpark, and AWS.
  • Coordinated a team and architected a serverless web app for fraud detection and credit approval with Java and AWS.
  • Architected and developed an image fraud detection system in Python and AWS.
Technologies: Amazon Web Services (AWS), Docker, Azure, Java, PySpark, Pandas, Python, Cloud, Machine Learning, Recommendation Systems, Back-end, Software Design, Software Development

Junior DevOps Engineer

2016 - 2017
Concrete Solutions
  • Developed plugins and maintained Jenkins CD pipelines.
  • Created an on-premise testing framework for mobile apps using physical devices with Node.js, React, MongoDB, and Redis.
  • Supported a video streaming platform with thousands of views per day built with Python and Django.
Technologies: Python, Jenkins, Redis, MongoDB, React, Node.js, JavaScript, Linux, MacOS

Experience

HackerRank's Machine Learning CodeSprint 2016

https://github.com/caiotaniguchi/hackerrank-ml-sprint
A machine learning competition organized by HackerRank, with one week to solve two problems:

- A classification problem to predict whether a HackerRank user would open an email, which involved data cleaning, data exploration, feature engineering, and model training and validation with Python, Pandas, Matplotlib, and XGBoost.

- A ranking problem to select a number of competitions to recommend to HackerRank users, solved by coding an item-based recommender system from scratch in Python.

Besides the model predictions, the competition also required the source code used and documentation about the methods applied. Earned a silver medal by finishing in the top 7% of the leaderboard.

Competition: https://www.hackerrank.com/machine-learning-codesprint
Post about the classifier: https://medium.com/@caiotaniguchi/one-week-of-machine-learning-madness-with-hackerrank-part-1-bde90dd30d2f
Post about the recommender: https://medium.com/@caiotaniguchi/one-week-of-machine-learning-madness-with-hackerrank-part-2-783328191f7e

HomeBroker Automator

https://github.com/caiotaniguchi/hb-automator
A framework that automates sending buy and sell orders for an asset through a homebroker. Designed to support any number of homebrokers, with an initial implementation for Modalmais. Implemented with Python and Selenium.

Education

2009 - 2016

Bachelor's Degree in Electronics and Computer Engineering

Universidade Federal do Rio de Janeiro (UFRJ) - Rio de Janeiro, Brazil

Certifications

APRIL 2019 - APRIL 2022

AWS Certified Cloud Practitioner

Amazon Web Services

MAY 2018 - PRESENT

Deep Learning

Coursera

OCTOBER 2017 - OCTOBER 2020

AWS Certified Solutions Architect – Associate

Amazon Web Services

MAY 2017 - NOVEMBER 2020

AWS Certified Developer – Associate

Amazon Web Services

Skills

Libraries/APIs

Scikit-learn, Pandas, Node.js, React, PySpark, Keras, XGBoost

Tools

Plotly, Git, Atom, Jupyter, Jenkins, Apache Airflow, BigQuery

Platforms

AWS Cloud Computing Services, Docker, MacOS, Linux, Azure, Amazon Web Services (AWS), Apache Kafka, Apache Flink, Debezium

Languages

JavaScript, Python, Java, SQL, C++, C, Kotlin

Frameworks

Express.js, Spring, Bootstrap, AngularJS, Scrapy, Selenium, Spark

Paradigms

Scrum, Object-oriented Programming (OOP), Agile, Test-driven Development (TDD), DevOps, ETL

Storage

MongoDB, Redis, NoSQL, MySQL, Databases, PostgreSQL, Graph Databases

Other

AWS Cloud Architecture, Data Science, Machine Learning, Data Analytics, Data Analysis, Statistics, Data Scraping, Web Scraping, Data Visualization, Cloud, Deep Learning, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Sequence Models, Electronics, Programming, Software Engineering, Recommendation Systems, Back-end, Software Design, Software Development, Big Data, Streaming Data
