Data Engineer
2019 - 2019
Greenchef (Toptal client)
- Built a data warehouse on AWS Redshift.
- Used AWS DMS to synchronize the production database (MongoDB) with AWS Redshift within seconds.
- Developed AWS Lambda functions to validate data quality daily and raise alarms if necessary.
- Built a CI/CD framework to develop and automatically run data analysis queries on AWS Redshift.
- Developed a set of microservices with AWS Lambda to automatically restart data pipelines in case of a failure.
Technologies: Git, CircleCI, AWS Lambda, SQL, Python, PostgreSQL, AWS Database Migration Service

Data Engineer
2019 - 2019
Xapo
- Built a data warehouse on AWS (Airflow, Glue, Lambda, Redshift) to generate operational dashboards at every level in the business (customer support, compliance, debit card, etc.).
- Created ETL data pipelines with NiFi to sync data with databases in production.
- Created datamarts in BigQuery easily accessible using Excel, Tableau, or Google Data Studio.
- Collaborated with all areas of the organization to ensure data quality and integrity.
- Ensured compliance with the organization’s data governance policies.
- Created a model to predict the number of open tickets by customer, using data points such as the number of transactions, the number of new customers, and the current Bitcoin price. The model runs in production; its output is exposed as a service and shown in a web app built with Shiny (R).
- Designed and analyzed debit card campaign KPIs such as card penetration, customer activity (e.g., time to first buy), retention (churn), transactions (amount, average amount, rejected), reordered cards, and adoption. Results were reported with Google Data Studio (cohorts, line charts, tables, etc.).
- Created a dynamic Excel sheet to track cash reserves, balances, safeguarding, money in, money out, open transactions, exchange balance, etc. The sheet refreshes its data daily from BigQuery.
Technologies: Microsoft Excel, Google Data Studio, Tableau, Redshift, NiFi, SQL, Python

Data Scientist (Remote)
2017 - 2019
Vodafone
- Designed and developed large-scale machine learning algorithms with Impala, Spark, R (Shiny), and Python (Pandas/NumPy/Plotly/TF/Keras) to improve customer retention and product recommendation, analyze customer social networks, and optimize marketing campaigns. Deployed to production using AWS SageMaker; A/B testing was used to validate offline results.
- Analyzed WhatsApp usage patterns with Spark to understand customers' social networks; the findings were intended for marketing use.
- Analyzed network performance and net promoter score to improve the mobile network based on customer satisfaction.
- Designed a machine learning pricing model to offer dynamic pricing on Internet data tariffs, focused on customers who occasionally used mobile Internet data. Current customers' data usage, customer segment, customer location, and current price elasticity were used to improve price estimation.
- Designed a machine learning pricing model that optimised counter-offer prices to increase revenue and reduce the churn rate. Customer segmentation was used to optimise the price. The model was deployed in production.
Technologies: Plotly, NumPy, Pandas, Keras, TensorFlow, Git, Scala, Python, PySpark, Cloudera, Impala, HDFS, Hadoop

Data Scientist
2015 - 2017
Jaguar Land Rover
- Managed stakeholders, planned projects, and designed a strategic roadmap for the Research DataLab team.
- Directly involved in deploying a scalable automotive data logging system on a fleet of 150 engineering vehicles, and developing large-scale data pipelines on AWS.
- Analyzed driving patterns to enhance advanced driver-assistance systems, applied anomaly detection to improve vehicle reliability and enable failure prediction, and analyzed vehicle component usage to optimize reliability and cost.
- Created a data quality testing framework to ensure data integrity.
- Designed and developed a library that made it easy to run queries on vehicle data.
Technologies: Docker, Logstash, Elasticsearch, BigQuery, Apache Kafka, Cassandra, HBase, Tableau, RStudio Shiny, R, Python, Scala, Spark, Hadoop

Data Scientist
2015 - 2015
Jaguar Land Rover
- Contributed to the design and development of an intelligent car and cloud-native application on AWS to offer a fully personalized driving experience.
- Designed a performance metric to measure the quality of service for each component of the application.
- Developed machine learning models to predict user driving routines. Predictions were used for car preconditioning, fuel consumption estimation, destination prediction, and time-of-arrival estimation.
- Created a model to predict user destination based on calendar and email using natural language processing.
Technologies: Amazon Web Services (AWS), Docker, Apache Kafka, Scala, Java, Python, HBase, Cassandra

Machine Learning Engineer
2012 - 2015
Biomedical Engineering Group
- Improved state-of-the-art motor imagery brain-computer interface performance by 10% using an online adaptive machine learning model. Spectral, temporal, and spatial EEG characteristics were analysed to decode motor tasks from brain activity.
- Developed a machine learning algorithm for automated diagnosis of obstructive sleep apnea–hypopnea syndrome (SAHS). Desaturations in blood oxygen saturation (SaO2) recordings, respiratory rate variability (RRV) or ECG were measured to extract a set of statistical, spectral, and nonlinear features that helped diagnosis.
- Assessed the effectiveness of a motor imagery brain-computer interface application to rehabilitate cognitive functions through neurofeedback training (NFT). Electroencephalogram (EEG) changes measured by relative power (RP) showed evidence that visuospatial, oral language, memory, intellectual, and attention functions improved after NFT sessions.
Technologies: Apache Hive, MATLAB

Research Scientist
2014 - 2014
Brain Computer Interface Group, University of Essex, UK
- Worked on advanced brain signal processing with multitask learning, transfer learning, domain adaptation, deep learning, auto-encoders, and deep belief neural networks.
Technologies: Python, MATLAB

Software Engineer
2010 - 2012
Agroguia
- Developed a machine learning application that allows steering a tractor by means of an EMG-based human-machine interface.
Technologies: GPS, Digital Signal Processing, Java, C++