Camila Andrea Gonzalez Williamson

Verified Expert in Engineering

Data Scientist and Developer

Location

Ecublens, Switzerland

Toptal Member Since

October 22, 2020

Camila is a data scientist and software developer with more than four years of in-depth experience discovering statistical patterns in data, creating data visualizations, building machine learning models, and developing data-processing pipelines. She has worked on projects in various industries and with a diverse set of data science technologies. Camila is intellectually curious and creative, and she enjoys helping businesses extract value from their data.

Portfolio

Chemos Sàrl

GitHub Actions, TypeScript, Plotly.js, HTML, CSS, Angular, Pytest, OpenAPI...

Philip Morris International

Spark, Scikit-learn, Presto, Jenkins, Dask, Pandas, Apache Spark, Docker...

Pictet Asset Management

StatsModels, Scikit-learn, Pandas, TensorFlow, Matplotlib, Seaborn...

Experience

Data Analytics - 5 years
Python - 5 years
Data Visualization - 4 years
Data Science - 4 years
Machine Learning - 4 years
Statistical Inference - 3 years
Data Engineering - 2 years
Apache Spark - 2 years

Availability

Part-time

Preferred Environment

Unix, Jupyter Notebook, Visual Studio Code (VS Code), PyCharm, Slack, Git

The most amazing...

...project was incorporating multiple levels of seasonalities and temperature effects in a NARX model to make the short-term forecast of the Swiss electric load.

Work Experience

Data Scientist

2020 - PRESENT

Chemos Sàrl

Designed and mocked up a web application to create and modify interactive data visualizations.
Designed and developed a web application to create groups of users and exchange data files among users in the same group.
Added features to a web application that uses Bayesian optimization to accelerate the discovery of new materials.

Technologies: GitHub Actions, TypeScript, Plotly.js, HTML, CSS, Angular, Pytest, OpenAPI, Flask, SQLAlchemy, Alembic, PostgreSQL, Data Engineering, Data Science, Data Analytics, Python, Data Visualization, Data

Enterprise Data Scientist

2017 - 2020

Philip Morris International

Developed statistical analyses, propensity models, and scoring models to predict consumers' conversion to reduced-risk products.
Implemented a data-processing pipeline that uses distributed computing to cluster adoption patterns of reduced-risk products. This pipeline was deployed in 13 markets and brought tangible improvements to key performance indicators.
Industrialized a data pipeline to analyze specific global trends, using techniques such as hierarchical clustering, regression, and statistical inference, with an estimated business value in the order of tens of millions of dollars.
Designed, optimized, and implemented a methodology to evaluate similarities in a series of text documents to detect clusters of duplicates. Developed an API to serve the algorithm.
Trained, supported, and mentored interns and new data scientists joining the team, and advocated for data science best practices (reproducible research, code versioning, use of Docker containers, and TDD).

Technologies: Spark, Scikit-learn, Presto, Jenkins, Dask, Pandas, Apache Spark, Docker, NetworkX, Microsoft Power BI, Plotly, XGBoost, CatBoost, StatsModels, Tree-Based Pipeline Optimization Tool (TPOT), Flask, HDFS, Apache Hive, Data Engineering, Data Science, Data Analytics, Python, Data Visualization, PySpark, Data, Data Pipelines

Data Science Intern

2017 - 2017

Pictet Asset Management

Performed an exploratory data analysis of internal and external fund flows, macroeconomic variables, and market indices to detect leading and lagging variables.
Implemented multiple models to predict the performance of market indices across diverse asset classes and geographical regions, using a range of machine learning techniques: random forests, naive Bayes, Markov chains, SVM, and LSTM.
Evaluated the performance of the implemented models with a rigorous statistical inference analysis, using the Benjamini-Hochberg procedure to control the false discovery rate.

Technologies: StatsModels, Scikit-learn, Pandas, TensorFlow, Matplotlib, Seaborn, Jupyter Notebook, Data Science, Data Analytics, Python, Data Visualization, Data

Temporary Support for Data Science

2016 - 2016

Swissgrid

Researched state-of-the-art methodologies for short-term electric load forecasting.
Analyzed yearly, weekly, and daily patterns for the Swiss electric load as well as non-linear dependencies with the temperature.
Implemented a short-term forecast for the Swiss electric load using a state-of-the-art modification of least-squares support-vector machines.

Technologies: Mathematics, PostgreSQL, Pandas, SQL, Tableau, Matplotlib, SciPy, NumPy, Data Science, Data Analytics, Python, Data Visualization, Data

Analyst - Future Actuaries Program

2014 - 2015

Seguros Bolívar

Priced insurance products based on mortality tables and clients' data distribution.
Implemented forecasts based on Monte Carlo simulation for sales strategies.
Developed a prototype to automate the monthly data risk profiling of one of the main insurance products.

Technologies: Mathematics, Python, PostgreSQL, SQL, Data Analytics, Data

Experience

Data Pipeline for Global Trends

Industrialization and automation of a data pipeline to analyze specific global trends.

This was a multidisciplinary team effort that involved collecting external data sources, extensive data wrangling and text manipulation, applying data science techniques such as hierarchical clustering, regression, and statistical inference, and exposing the results via a dashboard accessible as a web application.

The estimated business value for this data product was in the order of tens of millions of dollars.

Consumer Segmentation

A data-processing pipeline that uses distributed computing to cluster adoption patterns of a specific line of products.

This was a team effort that involved the analysis of behavioral patterns in multi-channel customer data to identify actionable opportunities for improvement in the consumer journey. During the development, we integrated data from different sources, verified the data integrity, processed the data with Python and Spark (outlier treatment, filtering, aggregation, feature generation), generated insights from clustering and conversion models, and exposed the final results in a dashboard.

This project was deployed in 13 markets and brought tangible improvements to key performance indicators (KPIs) with estimated business value in the order of millions of dollars.

Short-term Forecast of the Electric Load

A short-term forecast for the Swiss electric load for the day-ahead or intraday market.

I was the main person in charge of implementing and evaluating a novel machine learning technique for short-term load prediction used by the Swiss electric grid operator. The resulting model successfully incorporated seasonal patterns at the yearly, weekly, and daily levels and non-linear dependencies with the temperature.

Skills

Languages

Python, CSS, HTML, SQL, TypeScript, Scala

Frameworks

Apache Spark, Spark, Alembic, Flask, Angular, Presto

Libraries/APIs

PySpark, Pandas, SQLAlchemy, OpenAPI, Plotly.js, CatBoost, XGBoost, NetworkX, Matplotlib, TensorFlow, NumPy, SciPy, Dask, Scikit-learn, D3.js, REST APIs

Paradigms

Data Science, Scrum, Functional Programming, Agile Software Development, Continuous Integration (CI), Test-driven Development (TDD), RESTful Development

Other

Data Visualization, Data Analytics, Data, Statistical Inference, Data Engineering, Classification Algorithms, Econometrics, Time Series Analysis, Machine Learning, Big Data, Algorithms, Feature Engineering, Ensemble Methods, GitHub Actions, Text Classification, Regression Modeling, Full-stack, Mathematics, Energy, Markets

Tools

Git, Slack, PyCharm, Pytest, Tree-Based Pipeline Optimization Tool (TPOT), StatsModels, Plotly, Microsoft Power BI, Seaborn, Tableau, Jenkins, MATLAB, Apache Airflow

Platforms

Jupyter Notebook, Docker, Unix, Amazon Web Services (AWS), Visual Studio Code (VS Code)

Storage

PostgreSQL, Apache Hive, HDFS, Data Pipelines

Education

2015 - 2017

Master's Degree in Financial Engineering

École Polytechnique Fédérale de Lausanne (EPFL) - Lausanne, Switzerland

2007 - 2012

Engineer's Degree in Electrical Engineering

Universidad de Los Andes - Bogotá, Colombia

Certifications

NOVEMBER 2019 - PRESENT

How to Win a Data Science Competition by NRU HSE

Coursera

JUNE 2019 - PRESENT

Functional Programming Principles in Scala by EPFL

Coursera

NOVEMBER 2018 - PRESENT

Big Data Analysis with Scala and Spark by EPFL

Coursera

APRIL 2018 - PRESENT

Algorithmic Toolbox by UC San Diego and NRU HSE

Coursera

FEBRUARY 2018 - PRESENT

Big Data Analysis: Hive, Spark SQL, DataFrames, and GraphFrames

Coursera

NOVEMBER 2017 - PRESENT

Professional Scrum Developer I

Scrum.org
