
Toni Cebrián
Verified Expert in Engineering
Machine Learning Developer
Barcelona, Spain
Toptal member since February 4, 2019
A rare mixture of data scientist and data engineer, Toni is able to lead projects from conception and prototyping through to deployment at scale in the cloud.
Portfolio
Experience
- Machine Learning - 10 years
- SQL - 10 years
- Functional Programming - 10 years
- Haskell - 10 years
- Data Science - 10 years
- Scala - 8 years
- Python 3 - 6 years
- Akka - 4 years
Preferred Environment
Linux
The most amazing...
...experience has been giving a talk on type classes in Scala at a local Scala meetup group.
Work Experience
Consultant
Self-employed
- Ingested a Bitcoin transaction graph into a Neo4j database, using Airflow to periodically crawl BigQuery tables containing Bitcoin transactions.
- Created asyncio web crawlers in Python to scrape websites with newsworthy content.
- Maintained and evolved an SDK in Scala and Haskell for accessing web APIs from customers using those languages.
- Created a tool for translating package addresses to different routing zones in a serverless architecture.
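The asyncio crawlers mentioned above follow a common concurrency pattern; this is a minimal sketch of it using only the standard library, where the `fetch`/`crawl` names and the semaphore limit are illustrative assumptions rather than the project's actual code:

```python
import asyncio
from urllib.request import urlopen

async def fetch(url: str) -> str:
    # Run the blocking urllib call in a worker thread so the event loop
    # stays free to schedule other downloads concurrently.
    return await asyncio.to_thread(
        lambda: urlopen(url).read().decode("utf-8", "replace")
    )

async def crawl(urls, fetcher=fetch, max_concurrency=5):
    # Bound concurrency so the crawler does not hammer the target site.
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with sem:
            return url, await fetcher(url)

    # gather runs all bounded downloads concurrently and preserves order.
    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)
```

Making the fetcher injectable keeps the crawl logic testable without network access; a real crawler would typically swap in an async HTTP client such as aiohttp.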
Data Science Developer
Christian Gosset
- Transformed an 88 GB rule file into thousands of smaller C++ files that were compiled into an executable.
- Ran the binary executable in parallel to take advantage of multi-core architectures.
- Read input data from Apache Parquet files and fed it into the processing pipelines.
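The fan-out across cores described above can be sketched with the standard library; `run_rules`, the chunking scheme, and the worker count are illustrative assumptions, with a pure function standing in for the compiled rules binary:

```python
from concurrent.futures import ThreadPoolExecutor

def run_rules(chunk):
    # Stand-in for invoking the compiled rules binary (e.g. via
    # subprocess.run) on one chunk of input rows.
    return [row * 2 for row in chunk]

def process_in_parallel(rows, workers=4, chunk_size=1000):
    # Split the input into chunks and dispatch them concurrently.
    # Threads suffice here because in the real setup the heavy work
    # happens in an external process, which keeps all CPU cores busy.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run_rules, chunks)
    # map preserves chunk order, so the output lines up with the input.
    return [row for chunk in results for row in chunk]
```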
Machine Learning Expert
Christoph Sommer
- Defined the mathematical model of an optimization problem involving electric vehicles, batteries, and PV arrays, and translated the requirements into a mixed-integer linear programming (MILP) formulation.
- Used Python libraries such as Pyomo to encode the MILP problem, then translated the encoding to MATLAB.
- Interfaced the MILP problem with Simulink by creating a MATLAB script that ran the simulation step by step, taking the output of the MILP optimization into account.
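A MILP of the kind described above can be sketched as follows; the symbols (grid price $c_t$, state of charge $s_t$, charge/discharge powers $p^{ch}_t$, $p^{dis}_t$, PV generation $g_t$, demand $d_t$) are illustrative assumptions, not the project's actual model:

```latex
\min_{p^{ch},\, p^{dis}} \sum_{t=1}^{T} c_t \, p^{grid}_t
\qquad \text{subject to}
\begin{aligned}
p^{grid}_t + g_t + p^{dis}_t &= d_t + p^{ch}_t && \text{(power balance)} \\
s_{t+1} &= s_t + \eta\, p^{ch}_t - p^{dis}_t / \eta && \text{(battery dynamics)} \\
0 \le s_t \le S^{\max}, \quad 0 \le p^{ch}_t &\le z_t P^{\max}, \quad 0 \le p^{dis}_t \le (1 - z_t) P^{\max} \\
z_t &\in \{0, 1\} && \text{(no simultaneous charge/discharge)}
\end{aligned}
```

The binary variable $z_t$ is what makes the program mixed-integer: it forbids charging and discharging in the same time step.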
Data Engineering Consultant
WalletConnect
- Defined the data pipeline that ingests raw WebSocket data into an S3 data lake.
- Created the data warehouse, reading data from the data lake into a star schema in Athena; data movement was handled through dbt models.
- Created all dashboards and data definitions for analyzing the data in the warehouse.
Full-stack Data Engineer
Greeneffort
- Defined, researched, and selected the provider for performing OCR on invoices. Created the data pipeline that moved invoices from our systems through the OCR service and stored the resulting metadata in the database.
- Created the whole server architecture for the frontend, using Akka HTTP for the REST API and Slick for database access. The services ran in a GKE cluster.
- Created an ontology mapping the Life Cycle Impact Assessment (LCIA) of different products to our internal data definitions, enabling richer queries on each product's CO2 footprint.
Semantic Web Consultant
Dow Jones and Company
- Developed the ontologies for data modeling in the area of US bankruptcies, extending the Common Core Ontology to accommodate the remaining concepts.
- Created a compiler that read an OWL file with a schema definition and generated fully typed Scala code for managing the ontology's concepts programmatically.
- Implemented the Cloud Dataflow pipelines that read Dow Jones's firehose of articles, processed them, and ingested the resulting semantic data into the semantic data store.
Lead Data Engineer
Nansen
- Implemented the dbt models that populated Nansen's data warehouse.
- Worked with the Blockchain ETL library to analyze how to ingest data from different blockchains into the raw data lake.
- Performed data analyses with the graph database TigerGraph to trace where ETH went in a well-known 2018 scam.
Lead Data Engineer
Coinfi
- Created the ETL orchestration systems using Airflow on Cloud Composer in Google Cloud.
- Created scraping services for getting crypto data (prices, events, and news) to ingest into the platform.
- Set up dbt models to report on blockchain data publicly available in BigQuery datasets.
Head of Data Science
Stuart
- Designed the company's data warehouse using Redshift.
- Created a forecasting model for predicting driver logins to the platform and the deliveries to be served.
- Architected an event sourcing system for complex event processing.
- Deployed a route optimization algorithm for picking drivers based on route and package size.
- Created the data science team from scratch, led the hiring process, created role definitions, and established OKRs.
Chief Data Officer
Enerbyte
- Architected the infrastructure for ingesting data from IoT devices.
- Researched algorithms for energy disaggregation from a single point of measure.
- Created the data science team from scratch, leading the hiring process, role definitions, and quarterly OKRs.
Head of Data Science
Softonic
- Created a recommender system based on textual content from app reviews.
- Developed an improved search engine using machine learning and Solr.
- Created the data science team from scratch. Hired all relevant profiles and set up the OKRs and managerial tasks.
Experience
Type Classes Talk
https://github.com/tonicebrian/typeclasses-talk
Education
Master's Degree in Artificial Intelligence
Universitat Politecnica de Catalunya - Barcelona, Spain
Postgraduate Degree in Quantitative Techniques for Financial Products
Universitat Politecnica de Catalunya - Barcelona, Spain
Certifications
Cloudera Certified Hadoop Professional
Cloudera
Skills
Libraries/APIs
Spark Streaming, Pandas, NumPy, PubSubJS, Python Asyncio, TensorFlow, XGBoost, Stanford NLP, OpenAPI, Slick
Tools
Apache Airflow, Cloud Dataflow, Apache Beam, Amazon Athena, Solr, Apache Avro, Protégé, Google Kubernetes Engine (GKE), AWS Glue, BigQuery, MATLAB, MATLAB Statistics & Machine Learning Toolbox
Languages
Python, Python 3, Scala, SQL, RDF, Haskell, C++, OWL, PHP, Simulink, Regex
Frameworks
Spark, Akka, Hadoop
Paradigms
Functional Programming, REST, Reactive Programming, Linear Programming
Platforms
Google Cloud Platform (GCP), Apache Kafka, Linux, TigerGraph, Stardog, Kubernetes, Amazon Web Services (AWS), Blockchain
Storage
Redshift, Cassandra, PostgreSQL, Google Cloud, Redis, Neo4j, Amazon S3 (AWS S3), Apache Parquet
Other
Machine Learning, Akka HTTP, Data Mining, Data Science, Data Engineering, Artificial Intelligence (AI), Technical Leadership, Leadership, Consulting, Mentorship & Coaching, Google BigQuery, Big Data, APIs, Back-end Development, Crypto, NEO, Data Flows, Recommendation Systems, Word2Vec, Semantic Web, Web Scraping, Natural Language Processing (NLP), Deep Learning, Financial Modeling, Monte Carlo Simulations, Time Series, Data Build Tool (dbt), RDFox, Ontologies, Optical Character Recognition (OCR), Invoice Processing, Document Parsing, Decision Trees