
Gijs Joost Brouwer

Verified Expert in Engineering

Data Scientist and Python Developer

Location

New York, NY, United States

Toptal Member Since

November 28, 2022

Gijs is a data scientist, machine learning engineer, and innovation expert. In various roles, he has built solutions at the intersection of data science, emerging technologies, AI, computer vision, and NLP. His background is in computational neuroscience and psychology. In addition, Gijs is an experienced software developer who has collaborated closely with software engineers and data engineers and has worked on big data production systems using tools like Hadoop, Spark, and AWS cloud computing services.

Portfolio

Memorial Sloan Kettering Cancer Center

Python, TensorFlow, Keras, Unity, C#, Virtual Reality App Design...

Girl Scouts of the USA

Python, Looker, SQL, Jupyter Notebook, Pandas, Scikit-learn, GPT...

SparkNeuro

Python, Computer Vision, Medical Imaging, Scikit-learn, Time Series Analysis...

Experience

Science - 18 years
Neuroscience - 18 years
Machine Learning - 16 years
Modeling - 16 years
Data Science - 12 years
Python 3 - 12 years
Scikit-learn - 6 years
TensorFlow - 4 years

Availability

Part-time

Preferred Environment

Python 3, Amazon Web Services (AWS), MacOS, Diffusion Models, PyCharm

The most amazing...

...thing I've done is build a virtual AI pet companion for pediatric patients using virtual reality, augmented reality, and IoT.

Work Experience

Data Science and Tech Research Lead

2010 - 2022

Memorial Sloan Kettering Cancer Center

Designed MSKCATS, a virtual companion for MSK's pediatric patients: an ambient AI brought to life by IoT devices, screens, and augmented and virtual reality that supports children by providing security, a bonding experience, motivation, and distraction.
Used virtual reality to relieve anxiety around medical procedures by simulating the experience before the actual procedure.
Used augmented reality to visualize important medication information through medication barcodes.
Designed a touch-free gesture detection TEM system using machine vision to manipulate on-screen medical images, removing the need for surgeons to de-glove and re-sterilize after viewing images mid-surgery.
Used agent-based models and design simulations, combining the game engine Unity, NetLogo, and genetic algorithms, to simulate the effect of layout and architecture.
Designed graph databases and NLP tools for researchers to find relevant scientific publications.
Developed an extensive data science and machine learning course with accompanying Python code and data.

Technologies: Python, TensorFlow, Keras, Unity, C#, Virtual Reality App Design, Augmented Reality (AR), Virtual Reality (VR), Jupyter, Deep Learning, Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Computer Vision, Amazon Web Services (AWS), Azure, NVIDIA CUDA, Raspberry Pi, Medical Imaging, Blender, Apple HealthKit, Apple, Scikit-learn, NetLogo, Agent-based Modeling, Mixed Reality (MR), PyCharm, MacOS, Python 3, Oculus, Linear Regression, Logistic Regression, Data Mining, Technology, Predictive Modeling, Statistics, Research, iOS, Jupyter Notebook, JavaScript, Neural Networks, Clustering, Graphs, Artificial Intelligence (AI), Data Modeling, Autoencoders, Convolutional Neural Networks (CNN), Diffusion Models, PyTorch

Lead Data Scientist

2019 - 2020

Girl Scouts of the USA

Managed a team of data analysts and data visualization experts to generate reports to be consumed by the Girl Scouts of the USA (GSUSA) councils.
Created machine learning models to predict churn among Girl Scout members based on member demographics, overall experience, and troop diversity.
Developed an NLP software suite matching PII records across different data sources.

Technologies: Python, Looker, SQL, Jupyter Notebook, Pandas, Scikit-learn, GPT, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Snowflake, Machine Learning, MySQL, MacOS, Python 3, Slack, Linear Regression, Data Mining, Technology, Predictive Modeling, Statistics, Data Modeling

Staff Data Scientist and Neuroscientist

2018 - 2019

SparkNeuro

Developed several proprietary machine learning algorithms that could decode brain activity into levels of emotional response and attention over time.
Headed a team of data scientists and neuroscientists to develop new machine learning algorithms to predict cognitive, attentional, and emotional states from EEG data.
Designed and analyzed novel neuroimaging studies, such as EEG, GSR, and fNIRS, to benchmark algorithms.

Technologies: Python, Computer Vision, Medical Imaging, Scikit-learn, Time Series Analysis, Machine Learning, Data Science, Hardware, Experimental Design, Experimental Research, PyCharm, MacOS, Python 3, Slack, Linear Regression, Logistic Regression, Data Mining, Technology, Predictive Modeling, Statistics, Jupyter Notebook, Data Modeling, Autoencoders, Fourier Analysis

Senior Machine Learning Engineer

2017 - 2019

Foursquare

Increased the precision and accuracy of our location intelligence systems by adding novel signals to our existing machine learning models and combining them with existing ones.
Developed new methods to incorporate third-party data into Foursquare's infrastructure.
Implemented new Scala, Scalding, and Luigi pipelines to put models into production.

Technologies: Luigi, Scala, Jenkins, XGBoost, Android, iOS, MacOS, Python 3, Technology, Statistics

Lead Data Scientist

2016 - 2017

United Nations Global Pulse

Built deep belief convolutional nets to detect settlements from satellite imagery, predict land cover type, and predict malaria prevalence in collaboration with UNHCR and UNOSAT.
Built natural language models to identify emergent topics in UN survey responses and the sentiment toward them in collaboration with the UN World Food Programme (WFP).
Developed models based on cell phone data to predict the outbreak of infectious diseases in collaboration with UNICEF.

Technologies: Torch, TensorFlow, NVIDIA DIGITS, Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), Amazon Web Services (AWS), Python, HTML5, JavaScript Libraries, Scikit-learn, MacOS, Python 3, Slack, MySQL, NVIDIA CUDA, Computer Vision, Linear Regression, Logistic Regression, Data Mining, Technology, Predictive Modeling, Statistics, Jupyter Notebook, Neural Networks, Clustering, Data Modeling, Autoencoders, Convolutional Neural Networks (CNN)

Senior Data Scientist

2013 - 2016

Integral Ad Science

Built neural networks to detect questionable content on the web, e.g., pornography.
Developed algorithms to predict the viewability of digital advertisements. I received an award for the resulting patent.
Co-developed tools to measure the causal impact of advertising on product revenue.
Developed models to predict daily user activity and monitor consumer sentiment in the US.
Created models of user purchase intent from internet usage and activity patterns.
Introduced new big data technologies, such as Scala, H2O, Spark, and Impala.

Technologies: Apache Spark, Apache Hive, Apache Pig, Apache, Hadoop, MapReduce, Impala, Deep Learning, Computer Vision, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), GPT, Causal Inference, Python, H2O Deep Learning Platform, Python 3, Scikit-learn, MacOS, Slack, Amazon Web Services (AWS), Linear Regression, Logistic Regression, Data Mining, Technology, Predictive Modeling, Statistics, Jupyter Notebook, Neural Networks, Clustering, Data Modeling, Convolutional Neural Networks (CNN)

Research Scientist

2007 - 2013

New York University

Studied the neural representation of visual information in the human cortex using fMRI. The results were published in high-impact peer-reviewed journals.
Developed machine learning algorithms to reconstruct visual stimuli from brain activity. The results were published in high-impact peer-reviewed journals.
Designed and performed neuroimaging, eye tracking, EEG, and psychophysical experiments. The results were published in high-impact peer-reviewed journals.
Built models of visual processing in the human brain to explain experimental data. The results were published in high-impact peer-reviewed journals.

Technologies: Science, Research, Experimental Design, Experimental Research, Neuroscience, Computer Vision, Machine Learning, Data Science, Modeling, Technical Writing, Scikit-learn, MacOS, Python 3, Linear Regression, Logistic Regression, Data Mining, Technology, Predictive Modeling, iOS, Neural Networks, Data Modeling, Fourier Analysis

Experience

Mycelium - Data Science Course

My Mycelium project is an extensive online course about machine learning, data science, and artificial intelligence. It focuses primarily on machine learning but is written for a broad audience: those of us who find ourselves on highly interdisciplinary technology teams, are not data scientists or machine learning engineers, and would like to learn.

Please note that this is currently a work in progress.

Virtual Companion Samson

Samson is a concept project I introduced to my innovation team while working at Memorial Sloan Kettering, a large oncology center in New York City.

Samson the Cat is a virtual companion that can accompany a child during their journey at the hospital and even after they leave. But unlike an app or a toy, my team and I thought of Samson as an embedded AI, a ghost in the shell brought to life by any available technology. This makes Samson largely device- and technology-independent. Samson can be a phone, a screen, a toy, or a creature living in a metaverse, and can also take on a physical form through any available IoT technology. This creates a sense of presence and continuity.

Questionable Content Detection in Web Images

This project aimed to detect questionable, offensive, or otherwise undesired content on web pages through the images they hosted. To do so, I augmented a pre-trained convolutional deep belief neural network with categories automatically discovered through a clustering approach that used my company's existing NLP metrics.

Human Mobility Pattern Detection

At the United Nations, I worked on the D4D-Senegal challenge, an open innovation data challenge on anonymous call patterns of Orange's mobile phone users in Senegal. The challenge's goal was to help address societal development questions in novel ways, contributing to the socio-economic development and well-being of the Senegalese population.

I worked with a large mobile phone dataset based on call detail records (CDR) of phone calls and text exchanges between more than 9 million customers of a large telecommunication company. I created an algorithm that captures both the regularities and anomalies in patterns of human mobility. I extracted the cell phone tower each user's phone connected to most often; from the input data, the algorithm then learned to output a 'surprise feature' as well as the probability of a transition between two separate towers.

By tracking these metrics over time, the algorithm could detect regular and anomalous patterns and alert humans. I designed it with two uses in mind. First, the regularities it detects can guide infrastructure planning, such as where higher-capacity roads or public transport are needed. Second, it can serve as an early warning system for disruptive events involving many people.
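
The two quantities mentioned above, a transition probability between towers and a 'surprise' score, can be illustrated with a rough sketch. This is not the original implementation: it assumes a first-order model estimated from consecutive tower observations, takes surprise to be the negative log probability of a transition, and uses hypothetical function names, a hypothetical `floor` parameter, and made-up tower labels.

```python
from collections import Counter, defaultdict
from math import log2


def most_common_tower(events):
    """Return the most frequently observed tower for each user.

    `events` is an iterable of (user, tower) pairs (hypothetical format).
    """
    counts = defaultdict(Counter)
    for user, tower in events:
        counts[user][tower] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


def transition_probabilities(sequence):
    """Estimate P(next tower | current tower) from consecutive observations."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


def surprise(probs, current, nxt, floor=1e-6):
    """Surprisal in bits of a transition; unseen transitions use `floor`."""
    p = probs.get(current, {}).get(nxt, floor)
    return -log2(p)


# Toy example with made-up tower labels (assumption, not real CDR data).
history = ["A", "B", "A", "B", "A", "C"]
probs = transition_probabilities(history)
routine = surprise(probs, "B", "A")   # always-seen transition, low surprise
unusual = surprise(probs, "B", "C")   # never-seen transition, high surprise
```

Tracking such scores per user or per tower over time is one way an alerting threshold could be defined; how the original algorithm did this is not stated here.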

Sound Source Separation for ICU and ER settings

Intensive care units and emergency rooms are extremely noisy places in hospitals. They are filled with medical equipment that all produce various sounds. Some of these sounds are essential indicators, for example, heart rate monitors. Other sounds are the inevitable result of care given and the communication between staff and patients.

Amid all of this, clinical staff need to be able to hear specific alarms from a distance, as they cannot have eyes on every patient. However, many of the sounds are not relevant to any one clinician. Similarly, not all conversations at any given time are relevant to any one clinician.

Finally, most sounds are irrelevant from the patient's point of view. Patients only really need to understand what is said to them, not necessarily what is said to anyone else. Perhaps they would prefer to focus on the sound from a TV for distraction.

The solution seems obvious: both patient and clinician should have some ability to tune in to specific sounds or voices while others are muted. Therefore, I experimented, through simulation, with a known technique called source separation, which in theory allows for this, applied to a hospital setting.

Education

2015 - 2016

Postbaccalaureate Program in Geospatial Information Systems

Penn State University - University Park, PA, United States

2001 - 2007

PhD in Computational Neuroscience

Utrecht University - Utrecht, Netherlands

1996 - 2000

Master's Degree in Cognitive Psychology

University of Amsterdam - Amsterdam, Netherlands

Certifications

AUGUST 2021 - PRESENT

US Patent 11100537

United States Patent and Trademark Office

AUGUST 2021 - PRESENT

US Patent 11100529

United States Patent and Trademark Office

Skills

Libraries/APIs

Scikit-learn, TensorFlow, OpenGL, Keras, Pandas, PyTorch, Luigi, XGBoost

Tools

Slack, PyCharm, Esri, Xcode, Jupyter, Apache, Blender, Apple HealthKit, Looker, Jenkins, Impala

Paradigms

Data Science, MapReduce, Agent-based Modeling

Platforms

MacOS, Apple, Jupyter Notebook, Raspberry Pi, Apache Pig, Amazon Web Services (AWS), Azure, NVIDIA CUDA, Android, iOS, H2O Deep Learning Platform, Oculus

Storage

MySQL, Apache Hive

Languages

Objective-C, C#, SQL, Python 3, Java, Python, NetLogo, Snowflake, Scala, HTML5, JavaScript

Frameworks

Unity, Apache Spark, Spark, Hadoop

Other

Science, Experimental Design, Medical Imaging, Programming, Modeling, Machine Learning, Neuroscience, Cognitive Science, Cognitive Psychology, Technical Writing, Computer Vision, Time Series Analysis, Experimental Research, Research, Statistics, Technology, Predictive Modeling, Data Mining, Linear Regression, Logistic Regression, Dimensionality Reduction, Clustering, Data Modeling, Geospatial Data, Geospatial Analytics, QGIS, 3D Rendering, Deep Learning, Natural Language Processing (NLP), Hardware, Torch, Causal Inference, Decision Trees, Neural Networks, Deep Neural Networks, Variational Autoencoders, Artificial Intelligence (AI), Graphs, Fourier Analysis, Convolutional Neural Networks (CNN), Autoencoders, Diffusion Models, GPT, Generative Pre-trained Transformers (GPT), Virtual Reality App Design, Augmented Reality (AR), Virtual Reality (VR), Mixed Reality (MR), NVIDIA DIGITS, JavaScript Libraries, Time Series, Telecom Equipment & Solutions, Audio, Mathematics, Independent Component Analysis (ICA)
