Radu Nedelcu, Data Scientist and AI Developer in London, United Kingdom

Member since June 9, 2019
Radu has been writing code since the age of 14. He has since built a career in data science and worked on projects in computer vision, text analysis, and financial data algorithms. His engineering background, paired with his algorithm expertise, enables him to work on the full pipeline, from idea generation and proof of concept through product development to production. He's excited about his next challenge and can't wait to get started.

Portfolio

  • ContractPod AI
    Transformers, Data Science, Natural Language Processing (NLP), NLTK, SpaCy
  • Sprout AI
    Data Science, Transformers, Natural Language Processing (NLP), SpaCy
  • University of London
    Data Science, Spark, Hadoop, MapReduce

Location

London, United Kingdom

Availability

Part-time

Preferred Environment

SpaCy, Transformers, Jupyter Notebook, Keras, Pandas, Bash, PyCharm, PySpark, Scikit-learn

The most amazing...

...thing I've built was a vectorization method of text and entities for a news app such that our users could get news recommendations based on multiple topics.

Employment

  • Senior Data Scientist

    2020 - PRESENT
    ContractPod AI
    • Worked on information extraction from legal documents.
    • Built a feature that determines whether a contract is signed by converting each page into a graph whose nodes are words, lines, and signatures.
    • Researched methodologies for signature detection and sourced freely available open-source data to train on.
    • Fine-tuned YOLOv3 to detect signatures with an accuracy of 80%.
    • Built a dotted-line detector to extract lines in documents using OpenCV.
    • Developed a signature requirement classifier that used an ensemble of mechanisms such as word density, dotted-line presence, and neighboring words. The classifier had 90% accuracy on the test set.
    • Built a matching algorithm that matched signature requirements to the signatures.
    • Created a clause comparison system to understand whether the clauses in contracts match approved clauses.
    Technologies: Transformers, Data Science, Natural Language Processing (NLP), NLTK, SpaCy
  • Senior Data Scientist

    2020 - PRESENT
    Sprout AI
    • Led a small team of consultants to improve information extraction from claims.
    • Performed error analysis to understand the current system's results and which subsystems needed to be improved.
    • Annotated damaged items in insurance claims to build a custom model.
    • Trained an NER model to detect damaged items in claims using Hugging Face Transformers, reaching an F1 score of 75%.
    Technologies: Data Science, Transformers, Natural Language Processing (NLP), SpaCy
  • MSc. Data Science Tutor

    2020 - PRESENT
    University of London
    • Answered student questions around Hadoop/Spark and cluster processing.
    • Organized tutorials for the students to help them with their Hadoop/Spark questions.
    • Graded coursework submissions as first and second marker.
    Technologies: Data Science, Spark, Hadoop, MapReduce
  • Senior Data Scientist

    2020 - 2020
    Foreign, Commonwealth & Development Office - UK Government
    • Defined and explained a number of experiments that could improve information extraction from news around the world.
    • Scraped articles from news websites, then cleaned and deduplicated them.
    • Built an MVP of an automated topic detection mechanism for news using LDA and extracted topic names.
    • Aggregated processed data into a Power BI visualization.
    Technologies: Gensim, SpaCy, NLTK, Microsoft Power BI, Agile Data Science
  • Senior Data Scientist

    2020 - 2020
    Fortress AI
    • Consulted on the strategic direction to implement machine learning on network devices for home environments.
    • Researched ad blocking with machine learning, scraped ads, and built an MVP of an ad-blocking mechanism that applied machine learning to JavaScript using TF-IDF and logistic regression.
    • Researched the use of machine learning for quality of service (QoS) and produced a report.
    Technologies: Web Scraping, Scikit-learn, Pandas
  • Technical Trainer

    2020 - 2020
    OpenClassrooms
    • Developed a practical introductory course on deep learning.
    • Wrote a three-part course that introduces students to deep learning with a focus on practicality and simple explanations. The course's running theme has students working for a pizza company that uses machine learning.
    • Focused the first part on the differences between traditional machine learning and deep learning; the second on neurons, how they work, and fully connected networks; and the third part on convolutional neural networks and recurrent neural networks.
    • Developed a number of practical examples that the students are encouraged to follow and develop in their own Jupyter Notebooks to gain a better understanding and have a reference tool later on.
    Technologies: Linux, Keras, Teamwork, Data Visualization, Pandas, Machine Learning, Jupyter Notebook, Python 3
  • Senior Data Scientist

    2020 - 2020
    Cabinet Office
    • Worked on the discovery and alpha phases aimed at understanding user problems and creating MVPs.
    • Defined and explained a number of experiments that could improve knowledge management, such as faceted search and classifiers for different tags.
    • Participated in a number of user interviews to better understand their ways of working.
    • Wrote a number of small-scale experiments to test ideas.
    • Built, cleaned, and labeled datasets for the tasks.
    • Created a document type classifier that distinguished between documents based on keywords and structure with an accuracy of 90%. The system used Pika and SpaCy to extract features and Scikit-learn to build the classifier.
    • Created a duplicate and near-duplicate document detector using MinHash to make it easy to avoid duplication and identify related documents.
    • Built a 100,000-node knowledge graph using SpaCy, DBpedia, Gensim, and Neo4j to better understand connections between people and important topics in the documents.
    • The project was mentioned in The Times: https://www.thetimes.co.uk/article/ai-trawls-20-000-miles-of-state-papers-j0l9k5gx9.
    Technologies: Linux, Teamwork, Data Visualization, Pandas, Machine Learning, Agile Data Science, Google Docs, Scikit-learn
  • Data Scientist and Machine Learning Engineer

    2019 - 2020
    Ernst & Young
    • Researched public and internal information on ML models for mergers and acquisitions and participated in workshops to generate ideas for potential use cases of ML in the M&A process.
    • Cleaned data to verify that entities existed at the relevant points in time and that entities from different datasets were merged correctly based on dates.
    • Created the first proof-of-concept models for applications of machine learning to M&A using Pandas and random forests in Scikit-learn.
    • Set up the ML architecture to integrate with the engineering architecture in Azure and selected Databricks because it allowed the use of Spark for cluster-based data processing, MLflow for experiment tracking, and deployment into Kubernetes.
    • Researched and experimented with a number of mechanisms for modeling imbalanced datasets: weight balancing, blagging (random forests where decision trees use undersampling), undersampling and oversampling, and transfer learning.
    • Analyzed multiple data sources and selected complementary data sources such as CapIQ for financial data, Factiva for news, and Oxford Economics for forecasts.
    • Managed the machine learning team and had duties such as planning the team's workload, providing guidance on priorities, planning the team structure and size, interviewing, and hiring.
    • Participated in user interviews to help shape both how we build the algorithms and the platform on which they would be run. A simple product and model explainability were key takeaways.
    • Participated in a number of presentations with the aim of explaining how machine learning works and how it could be used by C-level stakeholders.
    • Implemented a number of best practices in the team, such as setting random seeds, to get reliable scores for our models.
    Technologies: Linux, Keras, Teamwork, Data Engineering, Data Visualization, Pandas, Machine Learning, Agile Data Science, Imblearn, Scikit-learn, MLflow, Databricks, PySpark, Python
  • Data Scientist and Machine Learning Engineer

    2017 - 2019
    Serendipity AI
    • Helped put a news classifier into practice and created a topic- and user-based news recommendation system using NLP.
    • Used named entity detectors from SpaCy, DBpedia, and Jaccard similarity together with Levenshtein distance to detect and match named entities in news and other text data.
    • Developed a new vectorization method for the detected named entities in text and worked on a mechanism that would qualify their expertise in different topics.
    • Deployed Spark, Hadoop, and HBase on a cluster of three computers in order to speed up machine learning processing.
    • Developed an ML processing pipeline that moved information into HBase and processed it in parallel using PySpark. Every stage in the pipeline was designed as a microservice that had access to only an input and an output table.
    • Implemented a recommendation system using a neural network set up as an autoencoder and cosine similarity search with Spotify Annoy.
    • Brought an article-judging system to production. The system had a classification service and a training application. I used Celery to train every night and to restart the worker pool of the judging service when new models were available.
    • Improved the code quality and reduced repeated code across applications written in both Flask and CherryPy by creating a shared library. Added a logging system based on Python logging that had handlers for local logging and Rollbar.
    • Created a number of APIs using Flask that ran on AWS and connected to Neo4j.
    • Set up a testing framework that would allow APIs to be tested before and after deployment using Jenkins, and wrote integration tests for the APIs.
    Technologies: Linux, Teamwork, Data Engineering, Data Visualization, Pandas, Machine Learning, Agile Data Science, SpaCy, Gensim, Scikit-learn, HBase, PySpark, Python
  • Data Scientist and Machine Learning Engineer

    2017 - 2017
    Cappfinity
    • Researched and integrated an automatic machine learning algorithm picker in Python.
    • Researched auto-sklearn (Bayesian optimization for algorithm selection), TPOT (genetic algorithms for feature processing and algorithm selection), and NEAT (genetic algorithms for neural network evolution).
    • Developed the architecture for experimentation and result visualization for machine learning algorithms using services built with C# ASP.NET Core and Python Flask, which communicate via REST and RabbitMQ.
    • Built the system's presentation layer using Angular 4.
    • Wrote a service that extracts text from speech using the Google Speech-to-Text API.
    • Integrated MongoDB and connected all the services to it so that they could save processing results.
    • Integrated all the applications in Docker with their own private network and Docker Compose to allow for continuous integration and faster deployment.
    Technologies: Linux, Teamwork, Pandas, Machine Learning, Tree-Based Pipeline Optimization Tool (TPOT), Flask, TensorFlow, Scikit-learn, Python
  • Research Engineer

    2016 - 2017
    Oxehealth
    • Led the data engineering team and worked on big data microservices that connected cameras installed on-site with Oxehealth’s data warehouse.
    • Worked on Oxehealth’s TechCrunch London live demo, which linked a room in Oxford, where a person was being monitored, to the stage in London.
    • Designed and developed the microservices architecture for video data retrieval from customer sites using ZeroMQ, gRPC, and Boost Program Options and Property Tree for C++.
    • Set up a VPN to connect customer deployments to a central data repository using pfSense.
    • Built a breathing robot that could replicate different breathing patterns.
    • Designed and developed an application that allowed for multiple room monitoring using Qt.
    Technologies: Teamwork, Data Engineering, Machine Learning, RabbitMQ, ZeroMQ, Python, C++, C
  • Computer Vision and Algorithms Engineer

    2016 - 2016
    Meta Vision Systems
    • Designed the full stack, from image capture and processing to point clouds sent over the network, using multiple threads and a pipeline architecture to measure oil pipes with lasers and cameras.
    • Wrote general-purpose GPU (GPGPU) code to accelerate image processing algorithms (convolution and point extraction) via new kernels or through OpenCV, reducing processing time from 40 s to 40 ms for some code paths.
    • Implemented algorithms such as K-means and ordinary least squares through OpenCV for finding points of interest and then line fitting.
    • Designed and set up the network communication channels for transmission of data, commands, and replies using Type Length Value (TLV) messages via Boost ASIO.
    • Designed and developed a logging system using Microsoft ETW.
    • Set up Point Cloud Library (PCL) for surface reconstruction and for visualization of STL files and point clouds.
    • Used Boost Property Tree to implement a configuration file parser that uses JSON files.
    • Deployed Jenkins for automatic build verification and to run test cases.
    Technologies: Linux, Teamwork, Machine Learning, CUDA, C++, C, OpenCV
  • Software Engineer

    2013 - 2016
    Qualcomm
    • Wrote the first Windows driver for Qualcomm's NFC chip.
    • Participated in a number of integration activities where I helped set up new platforms with our NFC chip.
    • Worked on the launch of a Windows mobile phone that contained the chip I worked on.
    • Advised other teams across the globe on Windows driver development.
    • Developed a script in PowerShell for improving the team’s efficiency.
    • Debugged customer and partner issues and those arising during testing.
    • Trained new team members from different disciplines such as software engineering and testing.
    Technologies: Linux, Teamwork, C++, C

Experience

  • M&A Predictor

    I built an application that uses financial data of public companies to predict whether they will go through a merger or acquisition event. The application was built using financial reports as well as more recent market data. The predictor had an F1 score of 0.2: on average, it returned 600 companies, of which around 100 were correct (a precision of roughly 17%).

  • News Recommendation System

    I worked on a news recommendation system that allowed users to follow a range of different topics, such as those extracted by named entity recognition as well as some topics from the DBpedia Ontology.
    A vector built from the same features was extracted for each of the topic types above, and recommendations were found using locality-sensitive hashing with Spotify Annoy.

  • Document Type Classifier

    A classifier that used information about a document's structure and the keywords inside it to assign the document to one of the several document types used in the organization. The classification could then be used to make automatic retention or deletion decisions that saved the company millions of pounds.

  • Linked Documents Detector

    An application based on locality-sensitive hashing that linked documents that were exact duplicates, were used as a template, or were versions of another document. The application improved the organization's search systems by adding contextual search.
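
The near-duplicate linking described for the Linked Documents Detector can be illustrated with a minimal MinHash sketch in Python. This is not the project's code: the function names, the three-word shingles, the 128 hash functions, and the use of salted BLAKE2 as the hash family are all assumptions chosen for illustration. It only shows how two documents' signatures can estimate their Jaccard similarity.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in a lowercased text (k is an assumed choice)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed, shingle):
    """Deterministic 64-bit hash of a shingle; the seed selects the hash function."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, num_hashes=128, k=3):
    """For each seeded hash function, keep the minimum hash value over all shingles."""
    items = shingles(text, k)
    if not items:
        raise ValueError("text has no words to shingle")
    return [min(_seeded_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    # Hypothetical documents: a template and a lightly edited version of it.
    template = "This agreement is made between the supplier and the customer named below"
    version = "This agreement is made between the supplier and the client named below"
    score = estimated_jaccard(minhash_signature(template), minhash_signature(version))
    print(f"estimated similarity: {score:.2f}")
```

Identical documents always score 1.0, while edited versions or template reuses score somewhere below that, so a similarity threshold can separate exact duplicates from related documents. At scale, signatures are usually split into bands and bucketed (the locality-sensitive hashing step) so that only likely matches are compared, rather than every pair.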

Skills

  • Languages

    Python 3, Python, C++, C, RDF, Bash
  • Libraries/APIs

    Pandas, Scikit-learn, PySpark, Keras, SpaCy, OpenCV, NLTK, ZeroMQ, TensorFlow
  • Tools

    Jupyter, Git, PyCharm, RabbitMQ, Gensim, Microsoft Power BI, Tree-Based Pipeline Optimization Tool (TPOT)
  • Paradigms

    Concurrent Programming, Data Science, Agile, MapReduce
  • Platforms

    Jupyter Notebook, Linux, CUDA, Databricks
  • Other

    Agile Data Science, Machine Learning, Data Visualization, Imblearn, Data Engineering, Natural Language Processing (NLP), Teamwork, Open Minded, Data Scraping, MLflow, Web Scraping, Transformers
  • Frameworks

    Flask, Spark, Hadoop
  • Storage

    Neo4j, HBase

Education

  • Bachelor of Engineering degree with honors in Electronic and Communications Engineering
    2010 - 2013
    London Metropolitan University - London, England

Certifications

  • Natural Language Processing Specialization
    DECEMBER 2020 - PRESENT
    Coursera - deeplearning.ai
  • Deep Learning Specialization
    FEBRUARY 2020 - PRESENT
    Coursera - deeplearning.ai
  • Machine Learning
    SEPTEMBER 2016 - PRESENT
    Coursera - Stanford Online
  • Cisco Certified Network Associate - Security
    NOVEMBER 2011 - NOVEMBER 2014
    Cisco
  • Auditor/Lead Auditor (ISO 27001:2005)
    JULY 2009 - PRESENT
    IQMS
  • Cisco Certified Network Associate
    DECEMBER 2008 - NOVEMBER 2014
    Cisco
