Miguel Vazquez

Verified Expert in Engineering

Data Science Developer

Location

Barcelona, Spain

Toptal Member Since

January 17, 2022

Miguel has a Ph.D. in computer science and bioinformatics. He took up data mining 20 years ago and found bioinformatics to be a rich field of application. Since then, Miguel has worked in many areas (text mining, systems biology, web development, and workflows), with a focus on cancer genomics. He has written hundreds of thousands of lines of open-source code and developed the Rbbt framework, one of the most effective tools for data analysis, downloaded over a million times.

Machine Learning, Data Mining, Programming, Web Development, Data Analytics, Data Analysis, Data Collection, Unix, Ruby, JavaScript, Web Scraping, Statistics, Python, R, Databases, Bioinformatics, Genomic Data, Computational Biology

Portfolio

Barcelona Supercomputing Center

Ruby, Python, R, HTML, JavaScript, C, Pipelines, Microsoft HPC, GPT...

Norwegian University of Science and Technology

Programming, Text Mining, Boolean Modeling, Drug Development, Data Analytics...

Spanish National Cancer Center

Programming, Statistics, Text Mining, Genomics, Web Development, Web Services...

Experience

Machine Learning - 20 years, Text Mining - 15 years, Programming - 15 years, Data Analytics - 15 years, Unix - 15 years, Ruby - 15 years, Pipelines - 10 years, Statistics - 10 years

Availability

Part-time

Preferred Environment

Linux, Vi, SSH

The most amazing...

...tool I've developed is the Rbbt framework, which has made me one of the most effective programmers in the field of bioinformatics.

Work Experience

Head of Unit

2017 - PRESENT

Barcelona Supercomputing Center

Developed a suite of bioinformatics pipelines covering a wide range of functionalities: DNA and RNA-Seq alignment, variant calling, quantification, clonality, cohort statistics (cancer drivers, survival, etc.), drug response, synergies, modelling...
Extended my own bioinformatics framework (Rbbt) with features that improve HPC integration: flexible and elastic resource allocation, and automatic deployment of heterogeneous workloads across sites and containers.
Developed interactive web portals to visualize terabytes of genomics data, with secure policies respecting data privacy. Integrated and developed multiple data visualization frameworks to explore the different data types, with support for on-demand analyses.

Technologies: Ruby, Python, R, HTML, JavaScript, C, Pipelines, Microsoft HPC, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Artificial Intelligence (AI), NoSQL, Databases, Data Science, Genomics, Data Analytics, Oncology & Cancer Treatment, Biology, Molecular Biology, Computational Biology, Data Manipulation, Large Data Sets, Data Collection

Postdoctoral Researcher

2016 - 2019

Norwegian University of Science and Technology

Developed a complex pipeline for the prioritization of targeted drug combination treatments for cancer based on integrative analysis of multiple genomics data sources, Bayesian statistics modeling, and Boolean cell signaling simulations.
Created a comprehensive resource for gene transcription regulation information based on our own text mining and integration with multiple curated database resources. Resolved cross-species integration, normalization issues, and quality assessment.
Built a tool to assess drug synergies across drug sensitivity assays, implementing the most commonly used statistics: CI, Bliss, HSA, etc. Produced interactive plots. Supported massive execution batches using HPC and the Rbbt workflow enactment engine.

Technologies: Programming, Text Mining, Boolean Modeling, Drug Development, Data Analytics, NoSQL, Databases, Data Science, Genomics, Oncology & Cancer Treatment, Biology, Molecular Biology, Computational Biology, Data Manipulation, Large Data Sets, Data Collection

Postdoctoral Researcher

2010 - 2016

Spanish National Cancer Center

Released my own bioinformatics framework, Rbbt (Ruby Bioinformatics Toolkit). Arguably, it's the most comprehensive framework for developing bioinformatics applications. The core package (rbbt-util) has been downloaded more than 1.2 million times.
Made essential contributions to the international Pan-Cancer Analysis of Whole Genomes (PCAWG) project: web data visualization portals, functional annotation of variants, driver predictions, and pathway enrichment analyses (statistics).
Developed a workflow enactment engine for Rbbt with many advanced features not present in competing solutions: cmdline + HTML + web services, multi-step forking streaming API, HTTP hijacking, and elastic concurrency.

Technologies: Programming, Statistics, Text Mining, Genomics, Web Development, Web Services, Pipelines, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), NoSQL, Databases, Data Science, Data Analytics, Oncology & Cancer Treatment, Biology, Molecular Biology, Computational Biology, Data Manipulation, Large Data Sets, Data Collection

Teaching Assistant

2005 - 2010

Universidad Complutense de Madrid

Developed Rbbt (Ruby Bioinformatics Toolkit) incrementally through the different projects I was involved with. It supports data processing, text mining, and web development, among other things.
Developed several bioinformatics web applications: text mining for the functional description of gene lists through NMF, functional enrichment analyses of genes and proteins across multiple databases, and named-entity recognition and normalization.
Produced the first SOAP and REST web services for the different projects in my group.

Technologies: Programming, Statistics, Machine Learning, Web Development, Web Services, Data Analysis, Text Mining, NoSQL, Databases, Data Manipulation, Large Data Sets, Data Extraction, Data Collection

Freelance Junior Programmer

2002 - 2005

Several Businesses

Contributed to the development of the system used by Jazztel to manage internal orders and provisions using Java Spring.
Built a tool to extract all literal strings of text in code and replace them with dictionary entries to support the localization of a large web application for car rental.
Developed a clustering model to process survey responses in a sociological study.

Technologies: Perl, Java, Programming, Frameworks, Clustering

Experience

Ruby Bioinformatics Toolkit

http://mikisvaz.github.io/rbbt/

Arguably the most comprehensive toolkit for bioinformatics development. It has been developed for more than a decade. Although it has not been actively promoted and is used internally, it has been downloaded more than a million times from RubyGems. It consists of several core packages implementing the different base utilities and several dozen different 'workflows' or domain-specific functionalities. Among other things, it features one of the most advanced workflow enactment systems.

The project was developed with bioinformatics in mind, but its functionalities are applicable to any field of data analysis. It has been used in nearly 50 different projects of all sizes, from small utilities to support larger investigations to crucial components of massive international projects.

PCAWG Scout

https://pcawgscout.bsc.es/

A comprehensive web data visualization portal for the international Pan-Cancer Analysis of Whole Genomes project, one of the most ambitious re-sequencing projects, which analyzed many terabytes of genomics data. The portal offers on-demand analyses to explore the data in multiple ways; each analysis can be followed up with further analyses. It features multiple visualization tools such as networks, 3D protein visualization, genome browsers, pathway enrichment analyses, and survival statistics.

Text-mining for Transcription Regulation Information

https://extri.org/

Text-mining system for biomedical articles to extract sentences containing transcription regulation interactions. This work is being followed up in projects on molecular system modeling, where this information has proven to improve performance significantly, and in the generation of focused curation stacks.

Education

2005 - 2010

Ph.D. in Computer Science & Bioinformatics

Universidad Complutense de Madrid - Madrid, Spain

1997 - 2002

Bachelor's Degree in Computer Science

Universidad Complutense de Madrid - Madrid, Spain

Certifications

JUNE 2016 - PRESENT

Management Fundamentals for Scientists and Researchers in Business Administration

IE Business School

Skills

Libraries/APIs

Microsoft HPC

Languages

Ruby, JavaScript, Python, R, HTML, C, Perl, Java

Paradigms

Data Science, Management

Platforms

Unix, Linux

Storage

Databases, NoSQL

Industry Expertise

Bioinformatics

Other

Programming, Algorithms, Text Mining, Web Development, Web Services, Frameworks, Pipelines, Machine Learning, Data Mining, Genomics, Data Analytics, Data Analysis, Vi, SSH, Oncology & Cancer Treatment, Data Manipulation, Data Extraction, Data Collection, Statistics, Clustering, Data Visualization, Web Scraping, Hypothesis Testing, Regression, Biology, Molecular Biology, Computational Biology, Large Data Sets, Accounts, Business, Marketing Mix, Boolean Modeling, Drug Development, Natural Language Processing (NLP), Artificial Intelligence (AI), Deep Learning, Generative Pre-trained Transformers (GPT)
