Ioannis Melas, Developer in Cambridge, United Kingdom

Ioannis Melas

Verified Expert in Engineering

Data Scientist and Machine Learning Developer

Location
Cambridge, United Kingdom
Toptal Member Since
March 30, 2021

Ioannis is a data scientist with expertise in prototyping, developing, and deploying data science and ML workflows that best leverage business data. This includes exploratory analysis (dimensionality reduction, clustering, feature extraction), model fitting and parameter estimation, and supervised analysis (classification and regression). Ioannis is an expert in both structured and unstructured data (NLP). His notable clients include the U.S. Food & Drug Administration and AstraZeneca.

Portfolio

Shell
Python 3, Scikit-learn, NumPy, Pandas, Git, Machine Learning...
Carbon Connect Enterprise Strategies Inc.
Data Science, Machine Learning, Large-Scale Computing, CTO
AstraZeneca
Python 3, Bash Script, Data Science, Machine Learning...

Availability

Full-time

Preferred Environment

Linux, Python 3, Flask-RESTful, Streamlit, Spotfire, Python, R, Bash

The most amazing...

...NLP solution I've developed was for the text summarization and classification of biomedical literature routinely used by research scientists.

Work Experience

Data Science Contractor

2022 - 2023
Shell
  • Designed solutions for carbon sequestration. Mined structured and unstructured data on the molecular processes driving the carbon cycle in the soil. Identified interventions to optimize the carbon cycle.
  • Validated findings against the published literature (20 million articles) using NLP.
  • Developed an interactive dashboard with the results and published it to end users.
Technologies: Python 3, Scikit-learn, NumPy, Pandas, Git, Machine Learning, Artificial Intelligence (AI), Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Statistics

Machine Learning CTO

2022 - 2022
Carbon Connect Enterprise Strategies Inc.
  • Developed a platform for monitoring forest growth and carbon credit budgeting.
  • Mined and segmented LiDAR and satellite images for tree identification and growth.
  • Built a dashboard using Streamlit in Python and deployed it on GCP so that users could query their data.
Technologies: Data Science, Machine Learning, Large-Scale Computing, CTO

Data Science Contractor

2020 - 2022
AstraZeneca
  • Developed a machine learning workflow to leverage and interpret genetic data. This included parsing and preprocessing patient data, normalization, dimensionality reduction, statistical tests, and supervised analysis.
  • Created a natural language solution for mining biomedical literature. The data was structured in an Elasticsearch database, cleaned, tokenized using the Natural Language Toolkit (NLTK), vectorized, and then used in a text classification framework.
  • Built dashboards and UI using Streamlit in Python and deployed them using NGINX.
Technologies: Python 3, Bash Script, Data Science, Machine Learning, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Scikit-learn, Keras, TensorFlow, Streamlit, NGINX, Python, Data Analysis, Spotfire, Flask, Git, Data Visualization

Data Science Contractor

2019 - 2020
Arm
  • Built a machine learning framework for maximizing coverage in CPU verification. Developed in Python and deployed on HPC using the Slurm Workload Manager.
  • Developed workflows leveraging adversarial learning using GANs, implemented in Python with Keras.
  • Addressed numerical optimization problems using genetic algorithms with a custom GA implementation.
Technologies: Python 3, Scikit-learn, Keras, TensorFlow, Generative Adversarial Networks (GANs), Bash, Jenkins, Git, Slurm Workload Manager, GitHub, Python, Deep Learning, Genetic Algorithms, Numerical Methods, Convex Optimization, Data Visualization

Principal Data Scientist

2016 - 2019
UCB Celltech
  • Built machine learning workflows to predict patient response to candidate drugs. Developed in R.
  • Led a team of three developers to create exploratory analytics solutions and dashboards for visualizing high-dimensional data. Results were pre-calculated in R, then imported into TIBCO Spotfire.
  • Designed machine learning solutions to predict drug activity in assays. Used LSTMs to model chemical structures as free text and applied methods from text classification.
Technologies: R, Python 3, Spotfire, Linux, H2O, Keras, LSTM, Git, Python, Data Analysis, Data Analytics, Data Science, Machine Learning, Bioinformatics, Genomics, Data Visualization

Postdoctoral Research Fellow

2014 - 2016
U.S. Food & Drug Administration
  • Developed a solution for predicting adverse drug events from the drugs' transcriptomic profiles.
  • Created a linear programming formulation to model the structure of directed graphs.
  • Applied a solution to predict the adverse effects of new compounds.
Technologies: R, Linux, C, Slurm Workload Manager, Linear Optimization, NetworkX, Bioinformatics, Genomics, Drug Development, Python, Data Science, Data Analytics

Mining Biomedical Literature Using Elasticsearch and NLP

I parsed PubMed into a local copy, indexed it using Elasticsearch, and created a UI using Streamlit. This allowed users to query the whole of PubMed, retrieve the papers matching their query, and perform basic NLP tasks using NLTK and spaCy.

Framework for CPU Verification

I developed a machine learning framework for maximizing coverage in CPU verification. I leveraged adversarial learning using GANs, implemented in Python with Keras. It was deployed with a command-line API and is now routinely used in new products.

Method for Predicting the Efficacy of New Drugs

I developed machine learning workflows to predict patient response to candidate drugs. I integrated several data sources, including free-text datasets, to build drug profiles, which I then used in a classification framework to predict their efficacy on patients. This was developed in R, and the results were ported to Spotfire for visualization.

Languages

Bash, Bash Script, Python, Python 3, R, SQL, C

Libraries/APIs

Scikit-learn, Keras, NumPy, TensorFlow, LSTM, Natural Language Toolkit (NLTK), Flask-RESTful, SpaCy, NetworkX, PySpark, Pandas

Paradigms

Data Science

Platforms

Jupyter Notebook, Linux, H2O, Amazon Web Services (AWS), Google Cloud Platform (GCP), Kubernetes

Industry Expertise

Bioinformatics

Other

Machine Learning, Mathematical Modeling, Linear Optimization, Genomics, Numerical Methods, Numerical Simulations, Numerical Modeling, Natural Language Processing (NLP), Slurm Workload Manager, Data Analysis, Mixed-integer Linear Programming, Convex Optimization, Pharmaceuticals, Generative Pre-trained Transformers (GPT), Generative Adversarial Networks (GANs), Computational Physics, Deep Learning, Genetic Algorithms, Data Analytics, Drug Development, Gunicorn, Data Visualization, Artificial Intelligence (AI), Statistics, Large-Scale Computing, CTO

Frameworks

Streamlit, Flask

Tools

Spotfire, Git, NGINX, Jenkins, MATLAB, GitHub, Apache Airflow

Storage

Elasticsearch

Education

2008 - 2013

Ph.D. in Numerical Optimization, Machine Learning, Bioinformatics

National Technical University of Athens - Athens, Greece

2003 - 2008

Master's Degree in Mechanical Engineering

National Technical University of Athens - Athens, Greece
