Camila Andrea Gonzalez Williamson
Verified Expert in Engineering
Data Scientist and Developer
Ecublens, Switzerland
Toptal member since October 22, 2020
Camila is a senior software engineer with nine years of experience in data and machine learning (ML). She has deep experience analyzing and visualizing data, implementing and maintaining scalable data pipelines, prototyping and productionising ML models, and communicating effectively with stakeholders. Camila has worked in domains such as manufacturing, finance, and telecommunications, where she has collaborated on end-to-end solutions to transform raw data into actionable insights.
Experience
- Python - 9 years
- Data Analytics - 9 years
- Data Visualization - 8 years
- Apache Spark - 7 years
- Data Engineering - 7 years
- Machine Learning - 5 years
- Scala - 3 years
- Amazon Web Services (AWS) - 2 years
Preferred Environment
IntelliJ IDEA, JupyterLab, Amazon Web Services (AWS), Apache Airflow, Python, Scala, Hadoop, Apache Spark, Plotly, P5.js
The most amazing project
Building a new anomaly detection service that detects anomalies in multiple types of time series and sends alerts in near real time.
Work Experience
Data Scientist | Full-stack Software Engineer
Atinary Technologies
- Designed, implemented, and deployed front- and back-end web applications to transfer, analyze, and visualize data.
- Added features to existing web applications implementing a machine learning (ML) platform for orchestrating self-driving labs.
- Designed a web application to create and modify interactive data visualizations.
Enterprise Data Scientist
Philip Morris International
- Developed statistical analyses, propensity models, and scoring models to predict consumers' conversion to reduced-risk products.
- Implemented a distributed data-processing pipeline to cluster adoption patterns for reduced-risk products. This pipeline was deployed in 13 markets and brought tangible improvements to key performance indicators.
- Industrialized a data pipeline to analyze specific global trends using techniques such as hierarchical clustering, regression, and statistical inference. The resulting data product had an estimated business value in the tens of millions of dollars.
- Designed, optimized, and implemented a methodology to evaluate similarities in a series of text documents to detect clusters of duplicates. Developed an API to serve the algorithm.
- Trained, supported, and mentored interns and new data scientists joining the team, and advocated for data science best practices such as reproducible research, code versioning, use of Docker containers, and test-driven development (TDD).
Data Science Intern
Pictet Asset Management
- Performed an exploratory data analysis of internal and external fund flows, macroeconomic variables, and market indices to detect leading and lagging variables.
- Implemented multiple models to predict the performance of market indices across diverse asset classes and geographical regions, using a range of machine learning techniques: random forests, naive Bayes, Markov chains, SVMs, and LSTMs.
- Conducted a rigorous statistical inference analysis to evaluate the performance of the implemented models, using the Benjamini-Hochberg procedure to control the false discovery rate.
Temporary Support for Data Science
Swissgrid
- Researched state-of-the-art methodologies for short-term electric load forecasting.
- Analyzed yearly, weekly, and daily patterns in the Swiss electric load, as well as its non-linear dependence on temperature.
- Implemented a short-term forecast for the Swiss electric load using a state-of-the-art modification of least-squares support-vector machines.
Analyst, Future Actuaries Program
Seguros Bolívar
- Priced insurance products based on mortality tables and clients' data distribution.
- Implemented forecasts for sales strategies based on Monte Carlo simulation.
- Developed a prototype to automate the monthly data risk profiling of one of the main insurance products.
Experience
Data Pipeline for Global Trends
This was a multidisciplinary team effort that involved collecting external data sources, extensive data wrangling and text manipulation, applying data science techniques such as hierarchical clustering, regression, and statistical inference, and exposing the results via a dashboard accessible as a web application.
The estimated business value of this data product was on the order of tens of millions of dollars.
Consumer Segmentation
This was a team effort that involved the analysis of behavioral patterns in multi-channel customer data to identify actionable opportunities for improvement in the consumer journey. During the development, we integrated data from different sources, verified the data integrity, processed the data with Python and Spark (outlier treatment, filtering, aggregation, feature generation), generated insights from clustering and conversion models, and exposed the final results in a dashboard.
This project was deployed in 13 markets and brought tangible improvements to key performance indicators (KPIs), with an estimated business value on the order of millions of dollars.
Short-term Forecast of the Electric Load
I was the main person in charge of implementing and evaluating a novel machine learning technique for short-term load prediction used by the Swiss electric grid operator. The resulting model successfully incorporated seasonal patterns at the yearly, weekly, and daily levels, as well as the non-linear dependence of the load on temperature.
Education
Master's Degree in Financial Engineering
École Polytechnique Fédérale de Lausanne (EPFL) - Lausanne, Switzerland
Engineer's Degree in Electrical Engineering
University of the Andes - Bogota, Colombia
Engineer's Degree in Electronics Engineering
University of the Andes - Bogota, Colombia
Certifications
How to Win a Data Science Competition by NRU HSE
Coursera
Functional Programming Principles in Scala by EPFL
Coursera
Big Data Analysis with Scala and Spark by EPFL
Coursera
Algorithmic Toolbox by UC San Diego and NRU HSE
Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames, and GraphFrames
Coursera
Professional Scrum Developer I
Scrum.org
Skills
Libraries/APIs
PySpark, Matplotlib, Pandas, SQLAlchemy, OpenAPI, Plotly.js, CatBoost, XGBoost, NetworkX, TensorFlow, NumPy, SciPy, Dask, Scikit-learn, D3.js, REST APIs, P5.js
Tools
Plotly, Git, Slack, PyCharm, Pytest, Tree-Based Pipeline Optimization Tool (TPOT), StatsModels, Microsoft Power BI, Seaborn, Tableau, Jenkins, MATLAB, Apache Airflow, IntelliJ IDEA
Languages
Python, CSS, HTML, SQL, TypeScript, Scala
Frameworks
Apache Spark, Alembic, Flask, Angular, Presto, Hadoop
Platforms
Jupyter Notebook, Docker, Unix, Amazon Web Services (AWS), Visual Studio Code (VS Code)
Paradigms
Scrum, Functional Programming, Continuous Integration (CI), Test-driven Development (TDD), RESTful Development, Maintainability
Storage
PostgreSQL, Apache Hive, HDFS, Data Pipelines
Other
Data Visualization, Data Analytics, Data Engineering, Data Science, Data, Machine Learning, Statistical Inference, Classification Algorithms, Econometrics, Time Series Analysis, Big Data, Algorithms, Feature Engineering, Ensemble Methods, GitHub Actions, Text Classification, Regression Modeling, Full-stack, Mathematics, Energy, Markets, JupyterLab, Digital Electronics, Hardware, Computer Architecture, Probability Theory, Power Electronics, Reliability