Luis Nicolas-Alonso
Verified Expert in Engineering
Machine Learning Developer
Luis is a seasoned data scientist with a strong background in mathematics, software engineering, and machine learning. He has a proven track record of success developing scalable data analytics applications in the cloud and collaborating with technical and non-technical stakeholders. Luis is passionate about running data science projects with agile management, test-driven development, and continuous delivery.
Preferred Environment
Jira, Git, OS X
The most amazing project I've developed was an intelligent car that offered a fully personalized driving experience using deep learning and natural language processing.
Work Experience
Data Engineer
BAZZE & COMPANY
- Developed data pipelines to ingest more than 20GB of location data daily. Was responsible for developing and deploying the processes to clean data and monitor data quality.
- Contributed to the design and developed the back end of the data marketplace (https://bazze.io/). Was responsible for optimizing SQL queries in AWS Athena to reduce response time and cost.
- Acted as a subject matter expert for logging (AWS CloudWatch and Sentry), testing (PyTest), CI/CD (GitLab), and monitoring (AWS CloudWatch).
- Developed data analysis models to extract location patterns (e.g., daily commuting).
- Developed a PoC with Kafka and Python for streaming data analysis.
Data Engineer
Greenchef (Toptal client)
- Built a data warehouse on AWS Redshift.
- Used AWS DMS to synchronize the production database (MongoDB) with AWS Redshift within seconds.
- Developed AWS Lambda functions to validate data quality daily and raise alarms if necessary.
- Built a CI/CD framework to develop and automatically run data analysis queries on AWS Redshift.
- Developed a set of microservices with AWS Lambda to automatically restart data pipelines in case of a failure.
Data Engineer
Xapo
- Built a data warehouse on AWS (Airflow, Glue, Lambda, Redshift) to generate operational dashboards at every level in the business (customer support, compliance, debit card, and more).
- Created ETL data pipelines with Kinesis and Spark to sync data with databases in production.
- Created datamarts in BigQuery easily accessible using Excel, Tableau, or Google Data Studio.
- Collaborated with all areas of the organization to ensure data quality and integrity.
- Ensured compliance with the organization’s data governance policies.
- Created a model to predict the number of open tickets by customer, using data points such as the number of transactions, the number of new customers, and the current Bitcoin price. The output is exposed as a service and shown in a web app built with Shiny (R).
- Designed and analyzed debit card campaign KPIs such as card penetration, customer activity (e.g., time to first buy), retention (churn), transactions (amount, rejected), and more. Results were reported with Google Data Studio (cohorts, line charts).
- Created a dynamic Excel sheet to track cash reserves, balances, safeguarding, money in, money out, open transactions, exchange balance, and more. The sheet refreshes its data daily from BigQuery.
Data Scientist (Remote)
Vodafone
- Designed and developed large-scale machine learning algorithms with Impala, Spark, R (Shiny), and Python (Pandas/NumPy/Plotly/TF/Keras) to improve customer retention and product recommendation, analyze customers' social networks, and optimize marketing campaigns. The model was deployed in production using AWS SageMaker. A/B testing was used to validate offline results.
- Analyzed WhatsApp usage patterns with Spark to understand customers' social networks. This information was intended for marketing use.
- Analyzed network performance and net promoter score to improve the mobile network based on customer satisfaction.
- Designed a machine learning pricing model to offer dynamic pricing on Internet data tariffs. This project focused on customers who occasionally used mobile Internet data. Inputs such as customers' current data usage, customer segment, customer location, and current price elasticity were used to improve price estimation.
- Designed a machine learning pricing model that optimized the counteroffer price to increase revenue and reduce churn rate. Customer segmentation was used to optimize the price. The model was deployed in production.
Data Scientist
Jaguar Land Rover
- Managed stakeholders, planned projects, and designed a strategic roadmap for the research data lab team.
- Directly involved in deploying a scalable automotive data logging system on a fleet of 150 engineering vehicles, and developing large-scale data pipelines on AWS. Technologies used included Spark, Kafka, Parquet, S3, Akka, and Python.
- Analyzed driving patterns to enhance advanced driver-assistance systems, applied anomaly detection to improve vehicle reliability and enable failure prediction, and analyzed vehicle component usage to optimize reliability and cost.
- Created a data quality testing framework to ensure data integrity.
- Designed and developed a library that made it easy to run queries on vehicle data.
Data Scientist
Jaguar Land Rover
- Contributed to the design and development of an intelligent car and a cloud-native application on AWS to offer a fully personalized driving experience.
- Designed performance metrics to measure the quality of service for each component of the application.
- Developed streaming machine learning services to predict user driving routines with Python (scikit-learn and Pandas) and Kafka. Predictions were used for car preconditioning, fuel consumption estimation, destination prediction, and arrival time estimation.
- Created a model to predict user destination based on calendar and email using natural language processing.
Machine Learning Engineer
Biomedical Engineering Group
- Improved state-of-the-art motor imagery brain-computer interface performance by 10% using an online adaptive machine learning model. Spectral, temporal, and spatial EEG characteristics were analyzed to decode motor tasks from brain activity.
- Developed a machine learning algorithm for automated diagnosis of obstructive sleep apnea–hypopnea syndrome (SAHS). Desaturations in blood oxygen saturation (SaO2) recordings, respiratory rate variability (RRV), and ECG were measured to extract a set of statistical, spectral, and nonlinear features that aided diagnosis.
- Assessed the effectiveness of a motor imagery brain-computer interface application to rehabilitate cognitive functions by neurofeedback training (NFT). Electroencephalogram (EEG) changes measured by relative power (RP) showed evidence that visuospatial, oral language, memory, intellectual, and attention functions improved after performing NFT sessions.
Research Scientist
Brain Computer Interface Group, University of Essex, UK
- Worked on advanced brain signal processing with multitask learning, transfer learning, domain adaptation, deep learning, auto-encoders, and deep belief neural networks.
Software Engineer
Agroguia
- Developed a machine learning application for steering a tractor through an EMG-based human-machine interface.
Experience
Go I-PACE App
https://media.jaguar.com/news/2018/07/go-i-pace-app-puts-electric-jaguar-your-pocket
Go I-PACE helps would-be buyers understand the potential cost savings of going electric compared to their existing vehicle. The app estimates how I-PACE would fit into your life based on personal journey data.
The Go I-PACE app captures journey data to calculate potential cost savings, show how much battery would be used per trip, and tell users how many charges they would need in a week if they were driving the I-PACE.
It calculates the range expected from a full charge based on your vehicle use, the number of charges required in a typical week, and how frequently you would need to top up mid-journey.
It can also distinguish between different modes of transport to make sure it collects accurate data, even prompting users to confirm that individual trips were made by car for unusual routes – for instance on journeys made by cycling rather than behind the wheel.
Self-learning Car
https://www.youtube.com/watch?v=F923EuB06CI
Main responsibilities and goals involved:
● Define and implement a scalable real-time workflow for data loading, quality management, and distribution across various systems using Big Data technologies on Amazon Web Services.
● Contribute to the software development lifecycle including the analysis, architecture, design, implementation, and QA.
● Hands-on work directly implementing complex machine learning solutions using Natural Language Processing, recommender systems, neural networks and/or deep learning.
● Write technical documentation and present results to technical and non-technical stakeholders.
Sensors
https://github.com/lnicalo/Sensors
Processing time series collected from different sensors poses several challenges because the data may not be aligned or share the same sampling rate. Writing data queries can be quite hard for data scientists because the data cannot be expressed in tabular form.
This library makes it easy to write queries against this kind of dataset.
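As a rough illustration of the alignment problem described above (this is not the Sensors library's API, which is not shown here), the sketch below uses pandas `merge_asof` to bring two sensor streams with different sampling rates into one table. The sensor names, readings, and the 3-second tolerance are all made up for the example.

```python
import pandas as pd

# Hypothetical readings: speed sampled every 1 s, temperature every 2.5 s.
speed = pd.DataFrame({
    "time": pd.to_datetime([0, 1000, 2000, 3000, 4000, 5000], unit="ms"),
    "speed_kmh": [0.0, 12.0, 25.0, 31.0, 30.0, 28.0],
})
temperature = pd.DataFrame({
    "time": pd.to_datetime([0, 2500, 5000], unit="ms"),
    "temp_c": [20.0, 20.5, 21.0],
})


def align(left, right, tolerance="3s"):
    """Attach to each left row the most recent right reading within `tolerance`."""
    return pd.merge_asof(
        left.sort_values("time"),
        right.sort_values("time"),
        on="time",
        direction="backward",
        tolerance=pd.Timedelta(tolerance),
    )


aligned = align(speed, temperature)

# Once the streams share one table, an ordinary tabular query works.
fast = aligned[aligned["speed_kmh"] > 20]
print(fast)
```

A backward as-of join treats each sensor value as valid until the next reading arrives; interpolation or resampling to a common grid would be equally reasonable choices depending on the sensor.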
Driver Profile Analysis
- Fuel consumption
- Daily in-car time
- Commute schedule
- Regular routes
- Total distance
- Journey duration
- Driving style
- Refuelling events
- Phone call patterns
- Heated and cooled seat usage
- Radio stations
Data Science Competition - CONNECTOMICS - (16th / 143)
https://www.kaggle.com/c/connectomics
The goal of the data science competition was to predict the directed connections among 1,000 neurons based on time series of their activity.
My solution involved a mixture of several features such as correlation, mutual information, partial correlation, spectrogram, and frequency analysis.
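To make two of the feature types named above concrete, here is a minimal sketch (not the competition solution itself) that computes pairwise correlation and partial correlation with NumPy. The three-neuron chain and its noise levels are synthetic, and `correlation_features` is a hypothetical helper name.

```python
import numpy as np


def correlation_features(activity):
    """Compute pairwise features from activity of shape (n_samples, n_neurons).

    Returns (correlation, partial_correlation), each (n_neurons, n_neurons).
    """
    corr = np.corrcoef(activity, rowvar=False)
    # Partial correlation is read off the precision (inverse covariance) matrix.
    precision = np.linalg.pinv(np.cov(activity, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return corr, pcorr


# Synthetic chain: neuron 0 drives neuron 1, which drives neuron 2.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = x0 + 0.5 * rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)

corr, pcorr = correlation_features(np.column_stack([x0, x1, x2]))
print("corr(0, 2): ", round(corr[0, 2], 3))
print("pcorr(0, 2):", round(pcorr[0, 2], 3))
```

In this toy chain, plain correlation links neurons 0 and 2 strongly even though they are connected only through neuron 1, while partial correlation largely removes that indirect link. Both measures are symmetric, so on their own they score a connection but not its direction.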
Data Science Competition - Grasp-and-Lift EEG Detection - (15th / 379)
https://www.kaggle.com/c/grasp-and-lift-eeg-detection
My solution involved a deep neural net developed with Python using Theano and Lasagne.
Skills
Languages
Python 2, Python 3, SQL, Python, R, Scala, Java, C++
Frameworks
Spark, Hadoop, RStudio Shiny, Swagger
Libraries/APIs
Pandas, Spark ML, Keras, NumPy, Scikit-learn, OpenCV, Python API, TensorFlow, Spark Streaming, PySpark, Google Cloud API
Tools
Spark SQL, Amazon Athena, Amazon Elastic MapReduce (EMR), CircleCI, Tableau, Impala, Cloudera, BigQuery, PyCharm, Git, Amazon SageMaker, MATLAB, Apache Airflow, AWS Glue, Jira, Plotly, Microsoft Excel, Superset, Logstash, Amazon CloudWatch, Amazon ElastiCache, Amazon Simple Queue Service (SQS)
Paradigms
Lambda Architecture, Siamese Neural Networks, Microservices Architecture, Agile, Scrum
Platforms
Spark Core, Amazon Web Services (AWS), Docker, Apache Kafka, Azure, OS X, AWS Lambda, Kubernetes
Storage
Apache Hive, HBase, HDFS, Cassandra, Amazon S3 (AWS S3), Redshift, MySQL, Elasticsearch, Google Cloud, PostgreSQL, NoSQL
Other
Convolutional Neural Networks (CNN), Deep Neural Networks, Neural Networks, Deep Learning, Predictive Modeling, Big Data, Machine Learning, Statistics, Recurrent Neural Networks (RNNs), Artificial Neural Networks (ANN), Statistical Analysis, A/B Testing, Customer Analysis, Cohort Analysis, Digital Signal Processing, Signal Processing, EEG, Google BigQuery, Data Visualization, Lambda Functions, Computer Vision, Natural Language Processing (NLP), Agile Data Science, Internet of Things (IoT), Google Cloud ML, OCR, Google Data Studio, Bayesian Statistics, Churn Analysis, Biomedical Skills, ECG, CI/CD Pipelines, GPT, Generative Pre-trained Transformers (GPT), GPS, NiFi, AWS Database Migration Service (DMS), Kappa Architecture, Amplitude, Amazon API Gateway
Education
Ph.D. in Biomedical Engineering
University of Valladolid - Valladolid, Spain
Master's Degree in Information Technology (Data analysis)
University of Valladolid - Valladolid, Spain
Master of Science Degree in Electronic and Telecommunication Engineering
University of Valladolid - Valladolid, Spain