Ioannis Melas, Data Scientist and Machine Learning Developer in Cambridge, United Kingdom

Member since March 30, 2021
Ioannis is a data scientist with expertise in prototyping, developing, and deploying data science and ML workflows that best leverage business data. This includes exploratory analysis such as dimensionality reduction, clustering, feature extraction, and model fitting/parameter estimation, as well as supervised analysis such as classification and regression. His expertise covers both structured and unstructured data (NLP). Notable clients include the U.S. Food & Drug Administration and AstraZeneca.

Location

Cambridge, United Kingdom

Availability

Full-time

Preferred Environment

Linux, Python 3, Flask-RESTful, Streamlit, Spotfire, Python, R, Bash

The most amazing...

...NLP solution I've developed was for the text summarization and classification of biomedical literature routinely used by research scientists.

Employment

  • Data Science Contractor

    2022 - 2023
    Shell
    • Designed solutions for carbon sequestration. Mined structured and unstructured data on the molecular processes driving the carbon cycle in the soil. Identified interventions to optimize the carbon cycle.
    • Validated findings against the published literature (20M articles) using NLP.
    • Developed an interactive dashboard with the results and published it to end users.
    Technologies: Python 3, Scikit-learn, NumPy, Pandas, Git, Machine Learning, Artificial Intelligence (AI), Natural Language Processing (NLP), Statistics
  • Machine Learning CTO for carbon emission reduction project

    2022 - 2022
    Carbon Connect Enterprise Strategies Inc.
    • Developed a platform for monitoring forest growth and carbon credit budgeting.
    • Mined and segmented Lidar and satellite images for the identification of trees and tree growth.
    • Built a dashboard using Streamlit in Python, deployed on GCP, to allow users to query their data.
    Technologies: Data Science, Machine Learning, Large-Scale Computing, CTO
  • Data Science Contractor

    2020 - 2022
    AstraZeneca
    • Developed a machine learning workflow to leverage and interpret genetic data. This included parsing and preprocessing patient data, normalization, dimensionality reduction, statistical tests, and supervised analysis.
    • Created a natural language solution for mining biomedical literature. The data was structured in an Elasticsearch database, cleaned, tokenized using the Natural Language Toolkit (NLTK), vectorized, and then used in a text classification framework.
    • Built dashboards and UI using Streamlit in Python. Deployed using Nginx.
    Technologies: Python 3, Bash Script, Data Science, Machine Learning, Natural Language Processing (NLP), Scikit-learn, Keras, TensorFlow, Streamlit, NGINX, Python, Data Analysis, Spotfire, Flask, Git, Data Visualization
  • Data Science Contractor

    2019 - 2020
    Arm
    • Built a machine learning framework for maximizing coverage in CPU verification. Development was in Python; deployed on HPC using the Slurm Workload Manager.
    • Developed workflows leveraging adversarial learning with GANs, implemented in Python using Keras.
    • Addressed numerical optimization problems using genetic algorithms with a custom GA implementation.
    Technologies: Python 3, Scikit-learn, Keras, TensorFlow, Generative Adversarial Networks (GANs), Bash, Jenkins, Git, Slurm Workload Manager, GitHub, Python, Deep Learning, Genetic Algorithms, Numerical Methods, Convex Optimization, Data Visualization
  • Principal Data Scientist

    2016 - 2019
    UCB Celltech
    • Built machine learning workflows to predict patient response to candidate drugs. Developed in R.
    • Led a team of three developers to create exploratory analytics solutions/dashboards to visualize high-dimensional data. Results were pre-calculated in R, then imported into TIBCO Spotfire.
    • Designed machine learning solutions to predict drug activity in assays. Used LSTMs to model chemical structures as free text and applied methods from text classification.
    Technologies: R, Python 3, Spotfire, Linux, H2O, Keras, LSTM, Git, Python, Data Analysis, Data Analytics, Data Science, Machine Learning, Bioinformatics, Genomics, Data Visualization
  • Postdoctoral Research Fellow

    2014 - 2016
    U.S. Food & Drug Administration
    • Developed a solution for predicting adverse drug events based on the drugs' transcriptomic profiles.
    • Created a linear programming formulation to model the structure of directed graphs.
    • Applied a solution to predict the adverse effects of new compounds.
    Technologies: R, Linux, C, Slurm Workload Manager, Linear Optimization, NetworkX, Bioinformatics, Genomics, Drug Development, Python, Data Science, Data Analytics

Experience

  • Mine Biomedical Literature Using Elasticsearch and NLP

    Parsed and created a local copy of PubMed, indexed it using Elasticsearch, and created a UI using Streamlit to allow the user to query the whole of PubMed, pull the papers that match their query, and perform basic NLP tasks using NLTK and spaCy.

  • Framework for CPU Verification

    Developed a machine learning framework for maximizing coverage in CPU verification. I leveraged adversarial learning with GANs, implemented in Python using Keras. It was deployed using a command line API and is now routinely used in new products.

  • Method for Predicting Efficacy of New Drugs

    Developed machine learning workflows to predict patient response to candidate drugs. I integrated several data sources, including free text datasets, to build drug profiles, which I then used in a classification framework to predict their efficacy on patients. Developed in R. Results were ported to Spotfire for visualization.

Skills

  • Languages

    Bash, Bash Script, Python, Python 3, R, SQL, C
  • Libraries/APIs

    Scikit-learn, Keras, NumPy, TensorFlow, LSTM, NLTK, Flask-RESTful, SpaCy, NetworkX, PySpark, Pandas
  • Paradigms

    Data Science
  • Platforms

    Jupyter Notebook, Linux, H2O, Amazon Web Services (AWS), Google Cloud Platform (GCP), Kubernetes
  • Industry Expertise

    Bioinformatics
  • Other

    Machine Learning, Mathematical Modeling, Linear Optimization, Genomics, Numerical Methods, Numerical Simulations, Numerical Modeling, Streamlit, Natural Language Processing (NLP), Slurm Workload Manager, Data Analysis, Mixed-integer Linear Programming, Convex Optimization, Pharmaceuticals, Generative Adversarial Networks (GANs), Computational Physics, Deep Learning, Genetic Algorithms, Data Analytics, Drug Development, Gunicorn, Data Visualization, Artificial Intelligence (AI), Statistics, Large-Scale Computing, CTO
  • Tools

    Spotfire, Git, NGINX, Jenkins, MATLAB, GitHub, Apache Airflow
  • Frameworks

    Flask
  • Storage

    Elasticsearch

Education

  • Ph.D. in Numerical Optimization, Machine Learning, Bioinformatics
    2008 - 2013
    National Technical University of Athens - Athens, Greece
  • Master's Degree in Mechanical Engineering
    2003 - 2008
    National Technical University of Athens - Athens, Greece
