Camila Andrea Gonzalez Williamson
Verified Expert in Engineering
Data Scientist and Developer
Camila is a data scientist and software developer with more than four years of in-depth experience discovering statistical patterns in data, creating data visualizations, building machine learning models, and developing data-processing pipelines. She has worked on projects in various industries and has been exposed to a diverse set of data science technologies. Camila is highly curious and creative, and she enjoys helping businesses derive value from their data.
Unix, Jupyter Notebook, Visual Studio Code (VS Code), PyCharm, Slack, Git
The most amazing...
...project was incorporating multiple levels of seasonality and temperature effects in a NARX model to produce a short-term forecast of the Swiss electric load.
- Designed and mocked up a web application to create and modify interactive data visualizations.
- Designed and developed a web application to create groups of users and exchange data files among users in the same group.
- Added features to a web application that uses Bayesian optimization to accelerate the discovery of new materials.
Enterprise Data Scientist
Philip Morris International
- Developed a statistical analysis, propensity models, and scoring models to predict consumers' conversion to reduced-risk products.
- Implemented a data-processing pipeline to cluster adoption patterns of reduced-risk products using distributed computing. This pipeline was deployed in 13 markets and brought tangible improvements to key performance indicators.
- Industrialized a data pipeline to analyze specific global trends, using techniques such as hierarchical clustering, regression, and statistical inference, with an estimated value in the order of tens of millions of dollars.
- Designed, optimized, and implemented a methodology to evaluate similarities in a series of text documents to detect clusters of duplicates. Developed an API to serve the algorithm.
- Trained, supported, and mentored interns and new data scientists joining the team and advocated for data science best practices (reproducible research, code versioning, use of Docker containers, and TDD).
Data Science Intern
Pictet Asset Management
- Performed an exploratory data analysis of internal and external fund flows, macroeconomic variables, and market indices to detect leading and lagging variables.
- Implemented multiple models to predict market indices' performance across diverse asset classes and geographical regions, using a range of machine learning techniques: Random Forests, Naive Bayes, Markov Chains, SVM, LSTM.
- Conducted a rigorous statistical inference analysis to evaluate the performance of the implemented models, using the Benjamini-Hochberg procedure to control the false discovery rate.
Temporary Support for Data Science
- Researched state-of-the-art methodologies for short-term electric load forecasting.
- Analyzed yearly, weekly, and daily patterns for the Swiss electric load as well as non-linear dependencies with the temperature.
- Implemented a short-term forecast for the Swiss electric load using a state-of-the-art modification of least-squares support-vector machines.
Analyst - Future Actuaries Program
- Priced insurance products based on mortality tables and clients' data distribution.
- Implemented forecasts based on Monte Carlo simulation for sales strategies.
- Developed a prototype to automate the monthly data risk profiling of one of the main insurance products.
Data Pipeline for Global Trends
This was a multidisciplinary team effort that involved collecting external data sources, extensive data wrangling and text manipulation, applying data science techniques such as hierarchical clustering, regression, and statistical inference, and exposing the results via a dashboard accessible as a web application.
The estimated business value for this data product was in the order of tens of millions of dollars.
This was a team effort that involved the analysis of behavioral patterns in multi-channel customer data to identify actionable opportunities for improvement in the consumer journey. During the development, we integrated data from different sources, verified the data integrity, processed the data with Python and Spark (outlier treatment, filtering, aggregation, feature generation), generated insights from clustering and conversion models, and exposed the final results in a dashboard.
This project was deployed in 13 markets and brought tangible improvements to key performance indicators (KPIs) with estimated business value in the order of millions of dollars.
Short-term Forecast of the Electric Load
I was the main person in charge of implementing and evaluating a novel machine learning technique for short-term load prediction used by the Swiss electric grid operator. The resulting model successfully incorporated seasonal patterns at the yearly, weekly, and daily levels and non-linear dependencies with the temperature.
Python, CSS, HTML, SQL, TypeScript, Scala
Apache Spark, Alembic, Flask, Angular, Presto DB
PySpark, Pandas, SQLAlchemy, OpenAPI, Plotly.js, CatBoost, XGBoost, NetworkX, Matplotlib, TensorFlow, NumPy, SciPy, Dask, Scikit-learn, D3.js, REST APIs
Data Science, Scrum, Functional Programming, Agile Software Development, Continuous Integration (CI), Test-driven Development (TDD), RESTful Development
Data Visualization, Data Analytics, Data, Statistical Inference, Data Engineering, Classification Algorithms, Econometrics, Time Series Analysis, Machine Learning, Big Data, Algorithms, Feature Engineering, Ensemble Methods, GitHub Actions, Text Classification, Regression Modeling, Full-stack, Mathematics, Energy, Markets
Git, Slack, PyCharm, Pytest, Tree-Based Pipeline Optimization Tool (TPOT), StatsModels, Plotly, Microsoft Power BI, Seaborn, Tableau, Jenkins, MATLAB, Apache Airflow
Jupyter Notebook, Docker, Unix, Amazon Web Services (AWS), Visual Studio Code (VS Code)
PostgreSQL, Apache Hive, HDFS, Data Pipelines
Master's Degree in Financial Engineering
École Polytechnique Fédérale de Lausanne (EPFL) - Lausanne, Switzerland
Engineer's Degree in Electrical Engineering
Universidad de Los Andes - Bogotá, Colombia
How to Win a Data Science Competition by NRU HSE
Functional Programming Principles in Scala by EPFL
Big Data Analysis with Scala and Spark by EPFL
Algorithmic Toolbox by UC San Diego and NRU HSE
Big Data Analysis: Hive, Spark SQL, DataFrames, and GraphFrames
Professional Scrum Developer I