Data Engineer | 2020 - 2021 | BAZZE & COMPANY
Technologies: Amazon Web Services (AWS), Spark, Python 3, Python API, Swagger, AWS Athena, AWS Lambda, AWS API Gateway, AWS S3, Amazon SQS
- Developed data pipelines to ingest more than 20 GB of location data daily. Developed and deployed the processes to clean data and monitor data quality.
- Contributed to the design and developed the back end of the data marketplace, https://bazze.io/. Optimized SQL queries in AWS Athena to reduce response time and cost.
- Acted as a subject matter expert for logging (AWS CloudWatch and Sentry), testing (PyTest), CI/CD (GitLab), and monitoring (AWS CloudWatch).
- Developed data analysis models to extract location patterns (e.g., daily commuting).
- Developed a PoC with Kafka and Python for streaming data analysis.
Data Engineer | 2019 - 2019 | Greenchef (Toptal client)
Technologies: Git, CircleCI, AWS Lambda, SQL, Python, PostgreSQL, AWS Database Migration Service
- Built a data warehouse on AWS Redshift.
- Used AWS DMS to synchronize the production database (MongoDB) with AWS Redshift within seconds.
- Developed AWS Lambda functions to validate data quality daily and raise alarms if necessary.
- Built a CI/CD framework to develop and automatically run data analysis queries on AWS Redshift.
- Developed a set of microservices with AWS Lambda to automatically restart data pipelines in case of a failure.
Data Engineer | 2019 - 2019 | Xapo
Technologies: Microsoft Excel, Google Data Studio, Tableau, Redshift, NiFi, SQL, Python
- Built a data warehouse on AWS (Airflow, Glue, Lambda, Redshift) to generate operational dashboards at every level in the business (customer support, compliance, debit card, and more).
- Created ETL data pipelines with Kinesis and Spark to sync data with databases in production.
- Created data marts in BigQuery that were easily accessible from Excel, Tableau, or Google Data Studio.
- Collaborated with all areas of the organization to ensure data quality and integrity.
- Ensured compliance with the organization’s data governance policies.
- Created a model to predict the number of open tickets by customer. Data points such as the number of transactions, the number of new customers, and the current Bitcoin price were used as inputs. The output was exposed as a service and shown in a web app built with Shiny (R).
- Designed and analyzed debit card campaign KPIs such as card penetration, customer activity (e.g., time to first buy), retention (churn), transactions (amount, rejected), and more. Results were reported with Google Data Studio (cohorts, line charts).
- Created a dynamic Excel sheet to track cash reserves, balances, safeguarding, money in, money out, open transactions, exchange balance, and more. The sheet refreshed its data daily from BigQuery.
Data Scientist (Remote) | 2017 - 2019 | Vodafone
Technologies: Plotly, NumPy, Pandas, Keras, TensorFlow, Git, Scala, Python, PySpark, Cloudera, Impala, HDFS, Hadoop
- Designed and developed large-scale machine learning algorithms with Impala, Spark, R (Shiny), and Python (Pandas/NumPy/Plotly/TF/Keras) to improve customer retention and product recommendation, analyze customers' social networks, and optimize marketing campaigns. The model was deployed in production using AWS SageMaker. A/B testing was used to validate offline results.
- Analyzed WhatsApp usage patterns with Spark to understand customers' social networks for marketing purposes.
- Analyzed network performance and net promoter score to improve the mobile network based on customer satisfaction.
- Designed a machine learning pricing model to offer dynamic pricing on Internet data tariffs. This project focused on customers who occasionally used mobile Internet data. Current customers' data usage, customer segment, customer location, and current price elasticity were used to improve price estimation.
- Designed a machine learning pricing model that optimized the counteroffer price to increase revenue and reduce churn rate. Customer segmentation was used to optimize the price. The model was deployed in production.
Data Scientist | 2015 - 2017 | Jaguar Land Rover
Technologies: Docker, Logstash, Elasticsearch, BigQuery, Apache Kafka, Cassandra, HBase, Tableau, RStudio Shiny, R, Python, Scala, Spark, Hadoop
- Managed stakeholders, planned projects, and designed a strategic roadmap for the research data lab team.
- Directly involved in deploying a scalable automotive data logging system on a fleet of 150 engineering vehicles, and developing large-scale data pipelines on AWS. Technologies used included Spark, Kafka, Parquet, S3, Akka, and Python.
- Analyzed driving patterns to enhance advanced driver-assistance systems, applied anomaly detection to improve vehicle reliability and enable failure prediction, and analyzed vehicle component usage to optimize reliability and cost.
- Created a data quality testing framework to ensure data integrity.
- Designed and developed a library that made it easy to run queries on vehicle data.
Data Scientist | 2015 - 2015 | Jaguar Land Rover
Technologies: Amazon Web Services (AWS), Docker, Apache Kafka, Scala, Java, Python, HBase, Cassandra
- Contributed to the design and development of an intelligent-car, cloud-native application on AWS to offer a fully personalized driving experience.
- Designed performance metrics to measure the quality of service for each component of the application.
- Developed streaming machine learning services to predict user driving routines with Python (scikit-learn and Pandas) and Kafka. Predictions were used for car preconditioning, fuel consumption estimation, destination prediction, and time-of-arrival estimation.
- Created a model to predict user destination based on calendar and email using natural language processing.
Machine Learning Engineer | 2012 - 2015 | Biomedical Engineering Group
Technologies: Apache Hive, MATLAB
- Improved state-of-the-art motor imagery brain-computer interface performance by 10% using an online adaptive machine learning model. Spectral, temporal, and spatial EEG characteristics were analyzed to decode motor tasks from brain activity.
- Developed a machine learning algorithm for automated diagnosis of obstructive sleep apnea–hypopnea syndrome (SAHS). Desaturations in blood oxygen saturation (SaO2) recordings, respiratory rate variability (RRV), and ECG were measured to extract a set of statistical, spectral, and nonlinear features that aided diagnosis.
- Assessed the effectiveness of a motor imagery brain-computer interface application to rehabilitate cognitive functions through neurofeedback training (NFT). Electroencephalogram (EEG) changes measured by relative power (RP) showed evidence that visuospatial, oral language, memory, intellectual, and attention functions improved after NFT sessions.
Research Scientist | 2014 - 2014 | Brain Computer Interface Group, University of Essex, UK
Technologies: Python, MATLAB
- Worked on advanced brain signal processing with multitask learning, transfer learning, domain adaptation, deep learning, auto-encoders, and deep belief neural networks.
Software Engineer | 2010 - 2012 | Agroguia
Technologies: GPS, Digital Signal Processing, Java, C++
- Developed a machine learning application that allows steering a tractor by means of an EMG-based human-machine interface.