Andrew Chauzov, Machine Learning Engineer and Developer in North Kuta, Badung Regency, Bali, Indonesia

Member since September 6, 2016
Andrew has over eight years of experience in machine learning, deep learning, and data science. He has worked with fast-growing start-ups (>10,000 active users), government companies (energy sector), and AI consulting companies (banking, retail). Andrew's passion, creativity, attention to detail, and use of practical tools allow him to help businesses achieve their goals and grow. His areas of expertise are machine learning, deep learning, NLP, computer vision, and Python.

Portfolio

  • Self-employed
    Word2vec, Unsupervised Learning, Unstructured Data Analysis, Test Automation...
  • Self-employed
    Word2vec, Web Scraping, Unstructured Data Analysis, Trading, Test Automation...
  • Power Industry
    Time Series Analysis, Time Series, Test Automation, Statistics...

Location

North Kuta, Badung Regency, Bali, Indonesia

Availability

Part-time

Preferred Environment

Bash, Linux, GitHub, Jupyter, PyCharm, Python

The most amazing...

...project I've developed was a cryptocurrency trading model based on simple mathematics and combinatorics that showed terrific efficiency.

Employment

  • Machine Learning and Deep Learning Consultant

    2018 - PRESENT
    Self-employed
    • Developed a recommendation system for an activity startup. The model was integrated into a personalized email generation system. Results showed a 30 percent increase in the matching rate.
    • Created a solar panel crack detection model. Developed several additional scripts for image alignment, grid-to-cell cropping, an automatic relabeling tool for supervise.ly, and a memory-efficient forecasting pipeline with 96 percent accuracy.
    • Built a dental x-ray (bite-wing) disease detection model. It used semantic segmentation with extended augmentation techniques and reached an accuracy of 88 percent for the cavity class.
    • Developed an automatic model for detecting speech defects in children. Carried out extensive feature engineering and data cleaning. The F1 score averaged 0.90 across 60 tasks.
    Technologies: Word2vec, Unsupervised Learning, Unstructured Data Analysis, Test Automation, TensorBoard, Supervised Learning, Statistics, Statistical Data Analysis, Statistical Analysis, SpaCy, Software Development, Scrum, Scikit-learn, Scikit-image, SciPy, SQL, Research, Requirements Analysis, Recurrent Neural Networks, Random Forests, REST APIs, PyCharm, Project Management, Probability Theory, Plotly, Pandas, Optimization, OpenCV, Object Recognition, Object Detection, Natural Language Processing (NLP), NLTK, MySQL, Modeling, Matplotlib, Machine Learning Automation, Linux, Jupyter Notebook, Jupyter, JSON, Gradient Boosting, Gradient Boosted Trees, Google Cloud Platform (GCP), Google APIs, GitHub, Git, Geometry, Feature Analysis, Exploratory Data Analysis, EDA, Docker, Deep Neural Networks, Decision Trees, Decision Tree Classification, Data Science, Automated Data Processing, Data Processing Automation, Data Processing, Data Preprocessing, Data Preparation, Database Modeling, Data Mining, Data Collection, Data Cleaning, Convolutional Neural Networks, Computer Vision Algorithms, Complex Data Analysis, Communication, Classification Algorithms, Classification, Bash Scripting, Bash Script, Bash, Artificial Intelligence (AI), Analytics, Amazon Web Services (AWS), Amazon API, Algorithms, Agile, API Development, Deep Learning, Machine Learning, Signal Filtering, Signal Analysis, Audio Processing, Time Series Analysis, Time Series, Digital Signal Processing, Scripting, Image Recognition, Image Processing, XGBoost, LightGBM, NumPy, REST API, REST, Flask, Back-end, Recommendation Systems, Keras, TensorFlow, Computer Vision, Python 3, Python 2, Python
  • Data Science Consultant

    2017 - PRESENT
    Self-employed
    • Developed an approach for detecting common patterns in time series. The algorithm is flexible enough to be used across data domains (FMCG, energy consumption data, and others).
    • Created an efficient employee adaptation quality estimation model for sparse and short data. Real-time predictions show that the model detects signs of poor adaptation one to three months earlier than line managers do.
    • Integrated an optimal wake-up and sleep-time detection model into a habit tracker app. The model combined several manually defined rules with ML approaches.
    • Built a recurrent transaction detection approach based on a 2D token representation with projection onto a Poincaré disk model. As a result, the algorithm efficiently detected transaction subgroups at different depth levels.
    • Developed an optimal selling price detection model that incorporated the following features: sellers' data, supply data, ask-demand curves, and manually defined rules.
    Technologies: Word2vec, Web Scraping, Unstructured Data Analysis, Trading, Test Automation, Statistical Data Analysis, Statistical Analysis, SpaCy, Software Development, Numerical Simulations, Simulations, Signal Filtering, Signal Analysis, Selenium, Scrum, Scripting, Scikit-learn, SQL, Research, Requirements Analysis, Reports, Reporting, Regression Models, Redis, Recommendation Systems, Random Forests, REST APIs, REST API, REST, Python 3, Python 2, Python, PyCharm, Plotly, Pandas, PDF Scraping, NoSQL, NLTK, MySQL, MongoDB, Modeling, Matplotlib, Discrete Mathematics, Mathematics, Mathematical Models, Machine Learning Automation, Machine Learning, Linux, Jupyter Notebook, Jupyter, JSON, High-frequency Trading (HFT), HTML, H2O AutoML, Graph Theory, Google Cloud Platform (GCP), Google APIs, GitHub, Git, Gensim, Flask, Financial Markets, Feature Analysis, Exploratory Data Analysis, EDA, Docker, Digital Signal Processing, Decision Trees, Relational Database Design, Database Schema Design, Database Design, Data Visualization, Data Scraping, Data Science, Data Reporting, Automated Data Processing, Data Processing Automation, Data Processing, Data Preprocessing, Data Preparation, Database Modeling, Data Modeling, Data Mining, Data Collection, Data Cleaning, Data Analysis, Web Dashboards, Dask, Dashboards, Computer Science, Complex Data Analysis, Communication, Clustering Algorithms, CSS, Classification Algorithms, Decision Tree Classification, Text Classification, Classification, Beautiful Soup, Bayesian Statistics, Bash Scripting, Bash Script, Bash, Artificial Intelligence (AI), Anomaly Detection, Data Analytics, Analytics, Amazon Web Services (AWS), Amazon API, Agile, API Development, Recurrent Neural Networks, Unsupervised Learning, Supervised Learning, Predictive Analytics, Predictive Modeling, Optimization, Geometry, Applied Mathematics, Mobile App Development, Back-end, Statistics, Probability Theory, CatBoost, Gradient Boosted Trees, XGBoost, LightGBM, Gradient Boosting, PyMC, SciPy, NumPy, Algorithms, Cluster Analysis, Clustering, Time Series Analysis, Time Series, Natural Language Processing (NLP), TensorFlow
  • Senior Data Scientist

    2014 - 2016
    Power Industry
    • Developed a clustering tool for power entities (more than 10,000 items; various types of data). The results made it possible to simplify and speed up the graph model used for simulations.
    • Created an NLP tool for classifying power-industry news (including a scraping pipeline). This helped the department build a database of current news for each entity type and region. This data was later used successfully in various models.
    • Handled technical reports, presentations, and their defense in front of semi- and non-technical audiences.
    • Provided mentorship and supervision for junior analysts and commercial projects.
    Technologies: Time Series Analysis, Time Series, Test Automation, Statistics, Statistical Analysis, Software Development, Signal Filtering, Signal Analysis, Scripting, Scikit-learn, SciPy, Research, Regression Models, Random Forests, RStudio, R, PyCharm, Probability Theory, Predictive Modeling, Predictive Analytics, PowerPoint Design, Pandas, Optimization, NumPy, NLTK, MySQL, Modeling, Microsoft PowerPoint, Matplotlib, Discrete Mathematics, Mathematics, Mathematical Models, Machine Learning Automation, Linux, Linear Regression, Jupyter Notebook, Jupyter, JSON, Feature Analysis, EDA, Digital Signal Processing, Relational Database Design, Database Schema Design, Database Design, Data Visualization, Data Reporting, Data Processing Automation, Data Processing, Data Preprocessing, Data Preparation, Data Modeling, Data Mining, Data Collection, Data Cleaning, Scientific Data Analysis, Unstructured Data Analysis, Exploratory Data Analysis, Statistical Data Analysis, Data Analysis, Analytical Dashboards, Web Dashboards, Dashboards, Complex Data Analysis, Bayesian Statistics, Bash Scripting, Bash Script, Bash, Back-end, Applied Mathematics, Anomaly Detection, Data Analytics, Analytics, Clustering Algorithms, Algorithms, API Development, Unsupervised Learning, Supervised Learning, Requirements Analysis, Project Management, Oracle, SQL, Python 3, Python 2, Simulations, Graph Theory, Natural Language Processing (NLP), Cluster Analysis, Clustering, Numerical Simulations, Classification, Classification Algorithms, Decision Tree Classification, Decision Trees, Text Classification, Naive Bayes, Presentations, Reporting, Reports, Communication, Machine Learning, Data Scraping, Data Science, Python
  • Data Scientist

    2012 - 2014
    Power Industry
    • Developed power price/volume prediction models (long- and short-term) that are robust to outliers. As a result, prediction accuracy improved significantly (MAE was reduced by a factor of 2.5).
    • Created a daily/weekly to hourly/daily conversion model. This approach allowed the department to get more accurate high-granularity predictions and use them in various reports and as inputs to downstream models.
    • Developed scripts to automate several business processes (C#, C++, Python, Delphi, VBA, R). As a result, some processes ran four times faster.
    • Built a model, with a visualization tool on top, for anomaly detection in price/volume data. The results were integrated into the dashboard for real-time early detection of emergencies.
    Technologies: Unstructured Data Analysis, Statistics, Statistical Data Analysis, Statistical Analysis, Software Development, Signal Filtering, Signal Analysis, Scikit-learn, SciPy, Research, Requirements Analysis, Reports, Reporting, Random Forests, Probability Theory, Presentations, Predictive Analytics, PowerPoint Design, Pandas, Optimization, Numerical Simulations, NumPy, Naive Bayes, Modeling, Microsoft PowerPoint, Microsoft Excel, Matplotlib, Mathematics, Mathematical Models, Geometry, Machine Learning Automation, Linux, Linear Regression, Jupyter Notebook, Jupyter, Gradient Boosting, Gradient Boosted Trees, Feature Analysis, Digital Signal Processing, Decision Trees, Database Design, Data Reporting, Data Preprocessing, Data Modeling, Data Cleaning, Complex Data Analysis, Communication, Clustering Algorithms, Clustering, Cluster Analysis, C, Bayesian Statistics, Bash Scripting, Bash Script, Bash, Back-end, Applied Mathematics, Analytics, Unsupervised Learning, Supervised Learning, Data Mining, EDA, Data Analytics, Data Analysis, Exploratory Data Analysis, Data Collection, Data Preparation, Data Processing, Visual Basic for Applications (VBA), Excel VBA, Oracle, MySQL, Scripting, Decision Tree Classification, Classification Algorithms, Classification, Time Series, Dashboards, Data Visualization, Algorithms, Delphi, R, C++, C#, Graph Theory, Simulations, Regression Models, Python 2, Python 3, SQL, Predictive Modeling, Anomaly Detection, Time Series Analysis, Machine Learning, Data Science, Python
  • Junior Data Scientist (ROI Modeling, Media/Marketing Mix)

    2011 - 2012
    BBDO Group
    • Developed an anomaly detection function for an ROI predictive model. MAE was halved, and the client's marketing budget became much more efficient.
    • Created an approach for estimating the optimal marketing campaign budget. It has been used successfully for both prior and posterior budget estimation/correction.
    • Implemented a cluster-based approach for early detection of poor campaign performance, a useful tool that helped our clients correct their marketing strategies in advance.
    • Built a CATI to CAWI (computer-assisted telephone interviewing to computer-assisted web interviewing) conversion model. The database was extended and aligned; using this dataset significantly improved the models' validation quality.
    • Developed a new approach to EDA that produced additional analytical outputs. The results were added to our weekly client reports (clients included MTS, one of the biggest mobile operators in Russia).
    • Created an effective script for automatic media data collection and processing. The department's processes ran four times faster.
    Technologies: Statistical Analysis, Research, Requirements Analysis, Reports, Regression Models, RStudio, Probability Theory, Oracle, Optimization, Naive Bayes, Modeling, Mathematics, Mathematical Models, Machine Learning, Linux, Feature Analysis, Database Design, Data Reporting, Data Cleaning, Classification Algorithms, Bayesian Statistics, Supervised Learning, Time Series Analysis, Unsupervised Learning, Statistical Data Analysis, Complex Data Analysis, Unstructured Data Analysis, Dashboards, Communication, Classification, Applied Mathematics, Analytics, Clustering Algorithms, Scripting, Data Processing, Data Preparation, Data Collection, Exploratory Data Analysis, PowerPoint Design, Presentations, Reporting, Data Analytics, Visual Basic for Applications (VBA), MySQL, Microsoft Access, Budget Modeling, Media Marketing, Marketing Mix, Data Preprocessing, Microsoft PowerPoint, Econometrics, EDA, Clustering, Marketing Strategy, Marketing Mix Modeling, Excel VBA, Microsoft Excel, Anomaly Detection, Linear Regression, Time Series, Cluster Analysis, Algorithms, SQL, ROI, Predictive Analytics, Predictive Modeling, Statistics, Data Modeling, Data Visualization, Data Mining, Data Analysis, Data Science, R

Experience

  • Core | A Web Scraping Tool for Amazon Product Pages (Development)

    Client: Startup (Lithuania)
    Role: Software Developer
    A tool for Amazon product page data scraping. It contains the following steps:
    - Reading tool (Amazon API and BeautifulSoup)
    - Blocking problem-solving (proxy)
    - JavaScript blocks reading (Selenium)
    - Back-end work for data storing.
    On average across all bestseller categories, around 97% of the data was gathered correctly for a shortlist of blocks and 92% for an extended list of blocks.

  • Machine Learning: Time Series Hierarchical Clustering using Dynamic Time Warping in Python (Other amazing things)
    https://bit.ly/2WJBicd

    A Python implementation of a time series clustering algorithm. As a base, I used the DTW (dynamic time warping) approach, which can efficiently calculate the distance between time series of different lengths. On top, I created a custom hierarchical clustering model that groups time series step by step. I recently improved the algorithm's speed and accuracy. It was successfully tested on real FMCG data (>3,000 columns): https://github.com/avchauzov/tsClusteringFMCG
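
    The idea described above can be illustrated with a minimal sketch. This is not the code from the linked repository: the function names `dtw_distance` and `cluster_series`, the absolute-difference local cost, and the average-linkage merge rule are all assumptions chosen for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance with |x - y| as the local cost.

    Works for sequences of different lengths; runs in O(len(a) * len(b)).
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] matched again (b stays)
                cost[i][j - 1],      # b[j-1] matched again (a stays)
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def cluster_series(series, n_clusters):
    """Naive agglomerative clustering (average linkage, assumed) over DTW distances.

    Returns a list of clusters, each a list of indices into `series`.
    """
    clusters = [[i] for i in range(len(series))]
    # Precompute pairwise DTW distances once; keys are (smaller, larger) index.
    dist = {
        (i, j): dtw_distance(series[i], series[j])
        for i in range(len(series))
        for j in range(i + 1, len(series))
    }

    def linkage(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    # Merge the two closest clusters step by step until n_clusters remain.
    while len(clusters) > n_clusters:
        _, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy data (made up): two rising and two falling series of different lengths.
toy = [[1, 2, 3, 4], [1, 1, 2, 3, 4], [10, 9, 8], [10, 10, 9, 8, 8]]
groups = cluster_series(toy, n_clusters=2)
```

    On the toy data, series with the same shape but different lengths end up in the same group, which is the property that makes DTW attractive as a base distance here.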

  • Machine Learning: Defects Detection in Children's Speech (Development)

    Client: Pharmaceutical company (Russia Top-20).
    Role: Researcher and developer.
    The project involved several steps: sample cleaning (Librosa, SciPy, and NumPy libraries), feature engineering (including techniques such as clustering and NMF decomposition), and model development (Scikit-learn). The cross-validation score was 0.92 (roc_auc), which is considered very good given the small amount of training data (3.5 thousand).

  • Computer Vision: Dental Diseases Detection Tool (Development)

    This project contains two tasks: cavity detection and teeth segmentation. The cavity detection pipeline includes normalization and augmentation of bite-wing x-rays, followed by a UNet model. For teeth segmentation, an FPN with three classes (tooth, tooth border, and gum) was used. The tooth border class helped to split otherwise inseparable tooth masks in the output.
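
    A small sketch of why a separate tooth border class helps: once border pixels are excluded, touching teeth fall apart into separate connected components. The class ids, the function name `split_teeth`, and the toy class map are hypothetical; the real project's post-processing is not described in this profile.

```python
from collections import deque

# Hypothetical class ids for the three-class segmentation output.
GUM, TOOTH, BORDER = 0, 1, 2


def split_teeth(class_map):
    """Label 4-connected components of TOOTH pixels.

    Border pixels are simply not part of any component, so two teeth that
    touch only through a border stripe receive different instance ids.
    Returns (instance_map, number_of_teeth); id 0 means "not a tooth".
    """
    rows, cols = len(class_map), len(class_map[0])
    instance = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if class_map[r][c] != TOOTH or instance[r][c]:
                continue
            # Start a new tooth instance and flood-fill it breadth-first.
            next_id += 1
            instance[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and class_map[ny][nx] == TOOTH
                        and not instance[ny][nx]
                    ):
                        instance[ny][nx] = next_id
                        queue.append((ny, nx))
    return instance, next_id


# Toy prediction (made up): two teeth separated by a one-pixel border column.
toy_map = [
    [TOOTH, TOOTH, BORDER, TOOTH, TOOTH],
    [TOOTH, TOOTH, BORDER, TOOTH, TOOTH],
    [GUM, GUM, GUM, GUM, GUM],
]
instances, tooth_count = split_teeth(toy_map)

# Without the border class (border relabeled as tooth) the two teeth merge.
merged_map = [[TOOTH if v == BORDER else v for v in row] for row in toy_map]
_, merged_count = split_teeth(merged_map)
```

    In the toy example, the border column yields two tooth instances, while relabeling the border as tooth yields a single merged blob.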

  • Computer Vision: Solar Panels Defects Detection Tool (Development)

    I was a full-stack Python developer on this project. The total number of classes was 13. Tools that I used: OpenCV, TensorFlow/Keras to build models (UNet), an active learning approach, and Streamlit for the front end. Accuracy scores range from 0.90 to 0.98 across all classes.

Skills

  • Languages

    Python, SQL, Python 3, Bash, Bash Script, R, C++, C#, HTML, CSS, Excel VBA, Python 2, Delphi, Visual Basic for Applications (VBA), C, Java
  • Frameworks

    Flask, LightGBM, Selenium
  • Libraries/APIs

    Google APIs, TensorFlow, Keras, Scikit-learn, OpenCV, NumPy, XGBoost, REST APIs, API Development, Matplotlib, NLTK, SpaCy, CatBoost, SciPy, Dask, Pandas, PyTorch, Amazon API, PyMC, REST API, Beautiful Soup
  • Tools

    TensorBoard, Git, GitHub, Scikit-image, Plotly, Gensim, Jupyter, H2O AutoML, PyCharm, Microsoft Excel, Microsoft PowerPoint, Microsoft Access
  • Paradigms

    Data Science, Anomaly Detection, Agile, Scrum, REST, Requirements Analysis, Test Automation, Database Design
  • Platforms

    Google Cloud Platform (GCP), RStudio, Amazon Web Services (AWS), Docker, Linux, Oracle, Jupyter Notebook
  • Storage

    MySQL, NoSQL, Redis, JSON, MongoDB, Database Modeling
  • Industry Expertise

    Project Management, High-frequency Trading (HFT), Trading
  • Other

    Deep Neural Networks, Object Detection, Image Processing, Data Scraping, Image Recognition, Classification Algorithms, Mathematics, Modeling, Clustering, Machine Learning, Deep Learning, Natural Language Processing (NLP), Computer Vision, Cluster Analysis, Time Series Analysis, Data Analysis, Data Visualization, Probability Theory, Supervised Learning, Unsupervised Learning, Predictive Modeling, Gradient Boosting, Object Recognition, Image Segmentation, Recurrent Neural Networks, Convolutional Neural Networks, Artificial Intelligence (AI), Predictive Analytics, Optimization, Algorithms, Analytics, Word2vec, Machine Learning Automation, Statistical Analysis, Computer Science, Communication, Applied Mathematics, Software Development, Graph Theory, Research, Random Forests, Web Scraping, PDF Scraping, Decision Trees, Data Mining, Statistics, Time Series, Data Cleaning, Data Reporting, Financial Markets, Recommendation Systems, Bayesian Statistics, Variational Autoencoders, Econometrics, Mathematical Models, Data Modeling, ROI, Linear Regression, Marketing Mix Modeling, Marketing Strategy, EDA, Data Preprocessing, Regression Models, Simulations, Dashboards, Audio Processing, Feature Analysis, Computer Vision Algorithms, Marketing Mix, Media Marketing, Budget Modeling, Data Analytics, Reporting, Presentations, PowerPoint Design, Exploratory Data Analysis, Data Collection, Data Preparation, Data Processing, Scripting, Numerical Simulations, Classification, Decision Tree Classification, Text Classification, Naive Bayes, Reports, Gradient Boosted Trees, Back-end, Mobile App Development, Geometry, Digital Signal Processing, Signal Analysis, Signal Filtering, Clustering Algorithms, Unstructured Data Analysis, Complex Data Analysis, Statistical Data Analysis, Bash Scripting, Web Dashboards, Analytical Dashboards, Scientific Data Analysis, Data Processing Automation, Database Schema Design, Relational Database Design, Discrete Mathematics, Automated Data Processing

Education

  • Master's degree in Applied Mathematics and Computer Science
    2004 - 2011
    Peoples' Friendship University of Russia - Moscow, Russia
