Camila Andrea Gonzalez Williamson
Verified Expert in Engineering
Data Scientist and Developer
Ecublens, Switzerland
Toptal member since October 22, 2020
Camila is a data scientist and software developer with more than four years of in-depth experience discovering statistical patterns in data, creating data visualizations, building machine learning models, and developing data-processing pipelines. She has worked on projects in various industries and with a diverse set of data science technologies. Camila is intellectually curious and creative, and she enjoys helping businesses derive value from their data.
Experience
- Data Analytics - 5 years
- Python - 5 years
- Data Visualization - 4 years
- Data Science - 4 years
- Machine Learning - 4 years
- Statistical Inference - 3 years
- Data Engineering - 2 years
- Apache Spark - 2 years
Preferred Environment
Unix, Jupyter Notebook, Visual Studio Code (VS Code), PyCharm, Slack, Git
The most amazing...
...project was incorporating multiple levels of seasonality and temperature effects in a NARX model to produce a short-term forecast of the Swiss electric load.
Work Experience
Data Scientist
Chemos Sàrl
- Designed and mocked up a web application to create and modify interactive data visualizations.
- Designed and developed a web application to create groups of users and exchange data files among users in the same group.
- Added features to a web application that uses Bayesian optimization to accelerate the discovery of new materials.
Enterprise Data Scientist
Philip Morris International
- Developed statistical analyses, propensity models, and scoring models to predict consumers' conversion to reduced-risk products.
- Implemented a data-processing pipeline that uses distributed computing to cluster adoption patterns for reduced-risk products. This pipeline was deployed in 13 markets and brought tangible improvements to key performance indicators.
- Industrialized a data pipeline that analyzes specific global trends using techniques such as hierarchical clustering, regression, and statistical inference. Its estimated business value was on the order of tens of millions of dollars.
- Designed, optimized, and implemented a methodology to evaluate similarities in a series of text documents to detect clusters of duplicates. Developed an API to serve the algorithm.
- Trained, supported, and mentored interns and new data scientists joining the team and advocated for data science best practices (reproducible research, code versioning, use of Docker containers, and TDD).
Data Science Intern
Pictet Asset Management
- Performed an exploratory data analysis of internal and external fund flows, macroeconomic variables, and market indices to detect leading and lagging variables.
- Implemented multiple models to predict the performance of market indices across asset classes and geographical regions, using a range of machine learning techniques: random forests, naive Bayes, Markov chains, SVM, and LSTM.
- Conducted a rigorous statistical analysis of the implemented models' performance, using the Benjamini-Hochberg procedure to control the false discovery rate.
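For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, the following is a minimal, generic sketch of the standard step-up rule. It is an illustration only, not the code used in the project; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank

    # Reject every hypothesis whose p-value is among the k smallest.
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values, e.g. one per model being compared to a baseline.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_hochberg(example, alpha=0.05))
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ranking meets its threshold.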
Temporary Support for Data Science
Swissgrid
- Researched state-of-the-art methodologies for short-term electric load forecasting.
- Analyzed yearly, weekly, and daily patterns in the Swiss electric load as well as its nonlinear dependence on temperature.
- Implemented a short-term forecast for the Swiss electric load using a state-of-the-art modification of least-squares support-vector machines.
Analyst - Future Actuaries Program
Seguros Bolívar
- Priced insurance products based on mortality tables and clients' data distribution.
- Implemented forecasts based on Monte Carlo simulation for sales strategies.
- Developed a prototype to automate the monthly data risk profiling of one of the main insurance products.
Experience
Data Pipeline for Global Trends
This was a multidisciplinary team effort that involved collecting external data sources, extensive data wrangling and text manipulation, applying data science techniques such as hierarchical clustering, regression, and statistical inference, and exposing the results via a dashboard accessible as a web application.
The estimated business value for this data product was on the order of tens of millions of dollars.
Consumer Segmentation
This was a team effort that involved the analysis of behavioral patterns in multi-channel customer data to identify actionable opportunities for improvement in the consumer journey. During the development, we integrated data from different sources, verified the data integrity, processed the data with Python and Spark (outlier treatment, filtering, aggregation, feature generation), generated insights from clustering and conversion models, and exposed the final results in a dashboard.
This project was deployed in 13 markets and brought tangible improvements to key performance indicators (KPIs), with an estimated business value on the order of millions of dollars.
Short-term Forecast of the Electric Load
I led the implementation and evaluation of a novel machine learning technique for short-term load prediction used by the Swiss electric grid operator. The resulting model successfully incorporated seasonal patterns at the yearly, weekly, and daily levels and the nonlinear dependence of the load on temperature.
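As a rough illustration of how such inputs can be assembled, the sketch below builds one regressor row for a NARX-style model: lagged load values, the current temperature as an exogenous input, and sine/cosine terms for the daily, weekly, and yearly cycles. This is an assumption-laden example, not the project's actual feature set; the names `seasonal_features` and `narx_row`, the number of lags, and the sample values are all hypothetical.

```python
import math
from datetime import datetime, timedelta


def seasonal_features(ts):
    """Encode daily, weekly, and yearly position of a timestamp as sin/cos pairs."""
    hour_of_day = ts.hour + ts.minute / 60
    day_of_week = ts.weekday() + hour_of_day / 24
    day_of_year = ts.timetuple().tm_yday - 1 + hour_of_day / 24

    features = []
    for value, period in ((hour_of_day, 24.0), (day_of_week, 7.0), (day_of_year, 365.25)):
        angle = 2 * math.pi * value / period
        features += [math.sin(angle), math.cos(angle)]
    return features


def narx_row(load, temperature, timestamps, t, n_lags=2):
    """Regressors for predicting load[t]: past loads (autoregressive part),
    temperature at t (exogenous part), and seasonal terms."""
    if t < n_lags:
        raise ValueError("not enough history for the requested number of lags")
    lags = [load[t - k] for k in range(1, n_lags + 1)]
    return lags + [temperature[t]] + seasonal_features(timestamps[t])


if __name__ == "__main__":
    # Hypothetical hourly sample data.
    start = datetime(2021, 1, 4)
    timestamps = [start + timedelta(hours=h) for h in range(4)]
    load = [1.0, 2.0, 3.0, 4.0]
    temperature = [10.0, 11.0, 12.0, 13.0]
    print(narx_row(load, temperature, timestamps, t=3))
```

The nonlinearity in such a setup comes from the model that maps these rows to the next load value (for example a kernel method); the row construction itself is linear bookkeeping.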
Education
Master's Degree in Financial Engineering
École Polytechnique Fédérale de Lausanne (EPFL) - Lausanne, Switzerland
Engineer's Degree in Electrical Engineering
Universidad de Los Andes - Bogotá, Colombia
Certifications
How to Win a Data Science Competition by NRU HSE
Coursera
Functional Programming Principles in Scala by EPFL
Coursera
Big Data Analysis with Scala and Spark by EPFL
Coursera
Algorithmic Toolbox by UC San Diego and NRU HSE
Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames, and GraphFrames
Coursera
Professional Scrum Developer I
Scrum.org
Skills
Libraries/APIs
PySpark, Pandas, SQLAlchemy, OpenAPI, Plotly.js, CatBoost, XGBoost, NetworkX, Matplotlib, TensorFlow, NumPy, SciPy, Dask, Scikit-learn, D3.js, REST APIs
Tools
Git, Slack, PyCharm, Pytest, Tree-Based Pipeline Optimization Tool (TPOT), StatsModels, Plotly, Microsoft Power BI, Seaborn, Tableau, Jenkins, MATLAB, Apache Airflow
Languages
Python, CSS, HTML, SQL, TypeScript, Scala
Frameworks
Apache Spark, Alembic, Flask, Angular, Presto
Paradigms
Scrum, Functional Programming, Agile Software Development, Continuous Integration (CI), Test-driven Development (TDD), RESTful Development
Platforms
Jupyter Notebook, Docker, Unix, Amazon Web Services (AWS), Visual Studio Code (VS Code)
Storage
PostgreSQL, Apache Hive, HDFS, Data Pipelines
Other
Data Visualization, Data Analytics, Data Science, Data, Statistical Inference, Data Engineering, Classification Algorithms, Econometrics, Time Series Analysis, Machine Learning, Big Data, Algorithms, Feature Engineering, Ensemble Methods, GitHub Actions, Text Classification, Regression Modeling, Full-stack, Mathematics, Energy, Markets