Radu Nedelcu, Data Scientist and AI Developer in London, United Kingdom
Member since March 19, 2020
Radu has been writing code since the age of 14. He has since built a career in data science and worked on projects in computer vision, text analysis, and financial data algorithms. His engineering background, paired with his algorithm expertise, enables him to work on the full pipeline, from idea generation and proof of concept through product development to production. He's excited about his next challenge and can't wait to get started.
Portfolio

  • SONY
    OpenCV, Python 3, Spark, AWS, Deep Learning, PyTorch
  • University of London
    Data Science, Spark, Hadoop, MapReduce, Financial Data modelling, Jupyter
  • Future Anthem
    Spark, Recommendation Systems, Python 3, Delta Lake, Microsoft Power BI

Location

London, United Kingdom

Availability

Part-time

Preferred Environment

Transformers, Jupyter Notebook, Keras, Pandas, Bash, PyCharm, PySpark, Scikit-learn, PyTorch, Image Processing

The most amazing...

...thing I've built was a vectorization method of text and entities for a news app such that our users could get news recommendations based on multiple topics.

Employment

  • Senior Data Scientist

    2021 - PRESENT
    SONY
    • Experimented with various technologies to build a better user experience as part of the PlayStation team.
    • Gathered requirements from various stakeholders, then collected and aggregated data from various data sources using technologies such as Alation, Snowflake, SageMaker, AWS EMR, and Databricks.
    • Worked on player-to-player recommenders based on game activity and on computer vision-based systems such as shot boundary detectors, blur detectors, salient object detectors, and automatic cropping from game videos.
    • Also worked on changing avatar emotions based on people’s faces using GANs, Dockerizing projects that had to be shared or deployed, and setting up large compute clusters.
    Technologies: OpenCV, Python 3, Spark, AWS, Deep Learning, PyTorch
  • University of London Tutor

    2020 - PRESENT
    University of London
    • Provided online tutoring for the Bachelor's Degree in Computer Science and the Master's Degree in Data Science.
    • Answered student questions about Financial Data Modelling, Hadoop, Spark, Python, and cluster processing.
    • Organized webinars for the students that covered a range of topics and prepared them for their mid-terms and finals.
    • Graded coursework and exams for various modules such as Big Data and Software Development.
    Technologies: Data Science, Spark, Hadoop, MapReduce, Financial Data modelling, Jupyter
  • Senior Data Scientist

    2021 - 2022
    Future Anthem
    • Aggregated data and did data wrangling using PySpark in Databricks on Azure.
    • Set up a recommendation system with 3 subsystems that would recommend games to users.
    • Built a user-item recommendation subsystem based on cosine similarity to make recommendations to new users.
    • Created a sequence-based recommendation system that could be used to make recommendations to early-stage users.
    • Constructed a collaborative filtering system based on implicit feedback using LightFM. The system was trained using the number of plays a user had in a game.
    • Built dashboards and performed data analysis to understand how new Future Anthem customers are performing and to help them get better results.
    • Delivered part of the work through engineers from the Disruptive Engineering team, whom I managed.
    Technologies: Spark, Recommendation Systems, Python 3, Delta Lake, Microsoft Power BI
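
The cosine-similarity subsystem mentioned above can be illustrated with a minimal sketch. The game names, the three-dimensional feature vectors, and the `recommend` helper are hypothetical and chosen only for illustration; the actual features and implementation used at Future Anthem are not described in this profile.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user_vector, item_vectors, top_n=2):
    """Rank items by cosine similarity to the user vector and return the top names."""
    scored = [
        (cosine_similarity(user_vector, vector), name)
        for name, vector in item_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_n]]


# Hypothetical item profiles and a hypothetical new-user preference vector.
games = {
    "slots_a": [1.0, 0.0, 0.0],
    "slots_b": [0.9, 0.1, 0.0],
    "poker": [0.0, 1.0, 0.0],
}
new_user = [1.0, 0.0, 0.0]
print(recommend(new_user, games))
```

Because cosine similarity ignores vector magnitude, a new user with only a sparse preference profile can still be compared against item profiles.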
  • Senior Data Scientist

    2020 - 2022
    ContractPod AI
    • Worked on information extraction from legal documents.
    • Built an API to understand whether contracts are signed or not based on computer vision and NLP.
    • Researched methodologies for signature detection and obtained open-source, free data to train on.
    • Fine-tuned YOLO to detect signatures to an accuracy of 80%.
    • Built a dotted line detector to extract lines in documents using OpenCV.
    • Built a graph that represented the document and all the extractions.
    • Built a signature requirement classifier that used an ensemble of mechanisms such as word density, dotted line presence, and neighboring words. The classifier had 90% accuracy on the test set.
    • Built a matching algorithm that matched signature requirements to the signatures. The API was deployed on CUDA-enabled Docker containers.
    • Designed and conducted interviews to expand the team and offered support and mentorship to the team.
    • Built a contract clause comparison API to determine whether clauses in contracts match pre-approved clauses in multiple languages. Used a pre-trained BERT transformer that was fine-tuned with in-house data and deployed with Docker containers.
    Technologies: Transformers, Data Science, Natural Language Processing (NLP), NLTK, SpaCy, Flask, Hugging Face, Jupyter
  • Senior Data Scientist

    2020 - 2020
    Sprout AI
    • Led a small team of consultants to improve information extraction from claims.
    • Performed error analysis to understand current system results and what subsystems needed to be improved.
    • Annotated damaged items in insurance claims to build a custom model.
    • Trained an NER model to detect damaged items in claims using Hugging Face Transformers to an F1 score of 75%.
    Technologies: Data Science, Natural Language Processing (NLP), SpaCy, Jupyter
  • Senior Data Scientist

    2020 - 2020
    Foreign, Commonwealth & Development Office - UK Government
    • Defined and explained a number of experiments that could improve information extraction from news worldwide.
    • Scraped articles from news websites, then cleaned and deduplicated them.
    • Built an MVP of an automated topic detection mechanism in the news using LDA and extracted topic names.
    • Aggregated processed data into a Power BI visualization.
    Technologies: Gensim, SpaCy, NLTK, Microsoft Power BI, Agile Data Science, Jupyter
  • Senior Data Scientist

    2020 - 2020
    Fortress AI
    • Consulted on the strategic direction to implement machine learning on network devices for home environments.
    • Researched ad blocking with machine learning, scraped ads, and built an MVP of an ad-blocking mechanism using machine learning on JavaScript with TF-IDF and logistic regression.
    • Researched information about doing QoS (quality of service) with machine learning and produced a report.
    Technologies: Web Scraping, Scikit-learn, Pandas, Jupyter
  • Technical Trainer

    2020 - 2020
    OpenClassrooms
    • Developed a practical introductory course on deep learning.
    • Wrote a 3-part course that aimed to introduce students to deep learning, focusing on practicality and simple explanations. The course had the main theme of students working for a pizza company that uses machine learning.
    • Focused the first part on the differences between traditional machine learning and deep learning; the second on neurons, how they work, and fully connected networks; and the third part on convolutional neural networks and recurrent neural networks.
    • Developed a number of practical examples that the students are encouraged to follow and develop in their Jupyter Notebooks to better understand and have a reference tool later on.
    Technologies: Linux, Keras, Teamwork, Data Visualization, Pandas, Machine Learning, Jupyter Notebook, Python 3, Jupyter
  • Senior Data Scientist

    2020 - 2020
    Cabinet Office
    • Worked on the discovery and alpha phases aimed at understanding user problems and creating MVPs.
    • Defined and explained a number of experiments that could improve knowledge management, such as faceted search and classifiers for different tags.
    • Participated in a number of user interviews to better understand their working methods.
    • Wrote a number of small-scale experiments to test ideas.
    • Built, cleaned, and labeled datasets for the tasks.
    • Created a document type classifier that was able to distinguish between documents based on keywords and structure with an accuracy of 90%. The system used Tika and SpaCy to extract features and Scikit-learn to build the classifier.
    • Created a duplicate document and near-duplicate document detector using MinHash to make it easy to avoid duplication and understand related documents.
    • Built a 100,000-node knowledge graph using SpaCy, DBpedia, Gensim, and Neo4j to better understand connections between people and important topics in the documents.
    • Received a feature for the project in The Times: https://www.thetimes.co.uk/article/ai-trawls-20-000-miles-of-state-papers-j0l9k5gx9.
    Technologies: Linux, Teamwork, Data Visualization, Pandas, Machine Learning, Agile Data Science, Google Docs, Scikit-learn, Jupyter
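
The MinHash-based near-duplicate detection mentioned above can be sketched as follows. This is a minimal standard-library illustration, not the project's implementation: the word-level shingle size, the number of hash functions, and the use of seeded SHA-1 as the hash family are all assumptions made for the example.

```python
import hashlib


def shingles(text, k=3):
    """Split text into a set of overlapping k-word shingles."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    """One member of a seeded hash family (assumption: SHA-1 truncated to 64 bits)."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=128):
    """For each seeded hash function, keep the minimum hash value over all items."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


original = "the minister approved the new policy on data retention last week"
near_copy = "the minister approved the new policy on data retention last month"
unrelated = "quarterly budget figures were circulated to every regional office today"

sig_original = minhash_signature(shingles(original))
print(estimated_jaccard(sig_original, minhash_signature(shingles(near_copy))))
print(estimated_jaccard(sig_original, minhash_signature(shingles(unrelated))))
```

Exact duplicates produce identical signatures, while documents derived from a shared template score high without matching exactly, which is what allows related documents to be grouped.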
  • Data Scientist | Machine Learning Engineer

    2019 - 2020
    Ernst & Young
    • Researched public and internal information on ML models for mergers and acquisitions and participated in workshops to generate ideas for potential use cases of ML in the M&A process.
    • Cleaned data to ensure that entities existed at the relevant points in time and that entities from different datasets were merged correctly based on dates.
    • Created the first proof of concept models for applications of machine learning for M&A using Pandas and Random Forests in Scikit-Learn.
    • Set up the ML architecture to ensure integration with the engineering architecture in Azure and selected Databricks. It allows the use of Spark for cluster-based data processing and MLFlow for experiment tracking and deployment into Kubernetes.
    • Researched and experimented with a number of mechanisms to allow for modeling of imbalanced datasets–weight balancing, blagging (random forests where decision trees use undersampling), undersampling and oversampling, and transfer learning.
    • Analyzed multiple data sources and selected complementary data sources such as CapIQ for financial data, Factiva for news, and Oxford Economics for forecasts.
    • Managed the machine learning team and had duties such as planning the team's workload, providing guidance on priorities, planning the team structure and size, interviewing, and hiring.
    • Participated in user interviews to help shape how we built the algorithms and the platform on which they would be run. A simple product and model explainability were key takeaways.
    • Participated in a number of presentations with the aim of explaining how machine learning works and how it could be used by C-level stakeholders.
    • Implemented a number of best practices in the team, such as setting the random seed at the start of each run, in order to get consistent and comparable scores for our models.
    Technologies: Linux, Keras, Teamwork, Data Engineering, Data Visualization, Pandas, Machine Learning, Agile Data Science, Imblearn, Scikit-learn, MLflow, Databricks, PySpark, Python, Jupyter, Data Scraping
  • Data Scientist and Machine Learning Engineer

    2017 - 2019
    Serendipity AI
    • Helped put into practice a news classifier and created a topic/user-based news recommendation system using NLP.
    • Used named entity detectors from SpaCy, DBpedia, and Jaccard similarity together with Levenshtein distance to detect and match named entities in news and other text data.
    • Developed a new vectorization method for the detected named entities in text and worked on a mechanism to qualify their expertise to different topics.
    • Deployed Spark, Hadoop, and HBase on a cluster of three computers to speed up the machine learning processing.
    • Developed an ML processing pipeline that allowed information to flow into HBase and be processed in parallel using PySpark. Every stage in the pipeline was designed as a microservice with access to only an input and an output table.
    • Implemented a recommendation system using a neural network set up as an autoencoder and cosine similarity from Spotify Annoy.
    • Brought to production level an article judging system. The system had a classification service and a training application. I used Celery to train every night and restart the judging service's worker pool when new models were available.
    • Improved the code quality and reduced repeated code across applications written in Flask and CherryPy by creating a shared library. Added a logging system based on Python logging that had handlers for local logging and Rollbar.
    • Created a number of APIs using Flask that ran on AWS and connected to Neo4j.
    • Set up a testing framework that would allow APIs to be tested before and after deployment using Jenkins and wrote integration tests for the APIs.
    Technologies: Linux, Teamwork, Data Engineering, Data Visualization, Pandas, Machine Learning, Agile Data Science, SpaCy, Gensim, Scikit-learn, HBase, PySpark, Python, Jupyter, Data Scraping
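
Matching entity mentions with Jaccard similarity and Levenshtein distance, as mentioned above, can be sketched as follows. The `same_entity` helper and its two thresholds are hypothetical and chosen for illustration only; the profile does not state how the two measures were combined in production.

```python
def levenshtein(a, b):
    """Edit distance between two strings using the classic dynamic-programming rows."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def jaccard(a, b):
    """Jaccard similarity between the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def same_entity(a, b, min_jaccard=0.5, max_edit_ratio=0.2):
    """Treat two mentions as one entity if either measure passes (thresholds are assumptions)."""
    a_norm, b_norm = a.lower().strip(), b.lower().strip()
    edit_ratio = levenshtein(a_norm, b_norm) / max(len(a_norm), len(b_norm), 1)
    return jaccard(a_norm, b_norm) >= min_jaccard or edit_ratio <= max_edit_ratio


print(same_entity("Theresa May", "Teresa May"))
print(same_entity("Bank of England", "bank of england uk"))
print(same_entity("Apple", "Microsoft"))
```

The two measures complement each other: word-level Jaccard tolerates extra or reordered words, while character-level edit distance tolerates misspellings within a word.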
  • Data Scientist and Machine Learning Engineer

    2017 - 2017
    Cappfinity
    • Researched and integrated an automatic machine learning algorithm picker in Python.
    • Researched Auto-Sklearn (Bayesian optimization for algorithm selection), TPOT (genetic algorithms for feature processing and algorithm selection), and NEAT (genetic algorithms for neural network evolution).
    • Developed the architecture for experimentation and result visualization for machine learning algorithms using services built with C# ASP.NET Core and Python-Flask, which communicate via REST and RabbitMQ.
    • Built the system's presentation layer using Angular 4.
    • Wrote a text extraction service from speech using Google Speech to Text API.
    • Integrated MongoDB and connected all the services to it so that they can save processing results.
    • Integrated all the applications in Docker with their own private network and Docker Compose to allow for continuous integration and faster deployment.
    Technologies: Linux, Teamwork, Pandas, Machine Learning, Tree-Based Pipeline Optimization Tool (TPOT), Flask, TensorFlow, Scikit-learn, Python
  • Research Engineer

    2016 - 2017
    Oxehealth
    • Led the data engineering team and worked on big data microservices that would connect cameras installed on-site with Oxehealth’s data warehouse.
    • Worked on Oxehealth’s TechCrunch London live demo that connected a room in Oxford with a human being monitored to the stage in London.
    • Designed and developed the microservices architecture for video data retrieval from customer sites using ZeroMQ, gRPC, and Boost Program Options and Property Tree for C++.
    • Set up a VPN to connect customer deployments to a central data repository using pfSense.
    • Built a breathing robot that could replicate different breathing patterns.
    • Designed and developed an application that allowed for multiple room monitoring using Qt.
    Technologies: Teamwork, Data Engineering, Machine Learning, RabbitMQ, ZeroMQ, Python, C++, C
  • Computer Vision and Algorithms Engineer

    2016 - 2016
    Meta Vision Systems
    • Designed the full stack from image capture and processing to point clouds sent over the network using multiple threads and a pipeline architecture to measure oil pipes with lasers and cameras.
    • Wrote general-purpose GPU (GPGPU) code to accelerate image processing algorithms (convolution and point extraction) via new kernels or through OpenCV, reducing processing time from 40 s to 40 ms for some code paths.
    • Implemented K-means and ordinary least squares algorithms through OpenCV for finding points of interest and then line fitting.
    • Designed and set up the network communication channels to transmit data, commands, and replies using Type Length Value (TLV) messages via Boost ASIO.
    • Designed and developed a logging system using Microsoft ETW.
    • Set up the Point Cloud Library (PCL) for surface reconstruction and visualization of STL files and point clouds.
    • Used Boost Property Tree to implement a configuration file parser that uses JSON files.
    • Deployed Jenkins for automatic build verification and to run test cases.
    Technologies: Linux, Teamwork, Machine Learning, CUDA, C++, C, OpenCV
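
The Type Length Value (TLV) framing mentioned above can be illustrated with a short sketch. The original system was written in C++ with Boost ASIO; the Python version below is only an illustration, and the header layout (2-byte type, 4-byte length, network byte order) and the message type numbers are assumptions, since the profile does not specify the wire format.

```python
import struct

# Assumed header layout: unsigned 16-bit type, unsigned 32-bit length, big-endian.
HEADER = struct.Struct("!HI")


def encode_tlv(msg_type, value):
    """Prefix the payload bytes with its type and length."""
    return HEADER.pack(msg_type, len(value)) + value


def decode_tlv_stream(buffer):
    """Decode every complete message in the buffer; return (messages, leftover bytes)."""
    messages = []
    offset = 0
    while offset + HEADER.size <= len(buffer):
        msg_type, length = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(buffer):
            break  # payload not fully received yet; keep it for the next read
        messages.append((msg_type, buffer[start:end]))
        offset = end
    return messages, buffer[offset:]


# Hypothetical message types: 1 = command, 2 = empty acknowledgement, 3 = data.
stream = encode_tlv(1, b"start") + encode_tlv(2, b"") + encode_tlv(3, b"abc")[:7]
messages, leftover = decode_tlv_stream(stream)
print(messages)
print(leftover)
```

Returning the leftover bytes lets a receiver append the next network read to them, which matters on stream sockets where message boundaries are not preserved.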
  • Software Engineer

    2013 - 2016
    Qualcomm
    • Wrote the first Windows driver for Qualcomm's NFC chip.
    • Participated in a number of integration activities where I helped set up new platforms with our NFC chip.
    • Worked on the launch of a Windows mobile phone that contained the chip I worked on.
    • Advised other teams across the globe on Windows driver development.
    • Developed a script in PowerShell for improving the team’s efficiency.
    • Debugged customer and partner issues and those arising during testing.
    • Trained new team members from different disciplines such as software engineering and testing.
    Technologies: Linux, Teamwork, C++, C

Experience

  • M&A Predictor

    I built an application that uses the financial data of public companies to predict whether they will go through a merger or acquisition event. The application was built using financial reports and more recent market data. The predictor had an F1 score of 0.2: on average, it returned 600 companies, of which around 100 were correct.
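
    As a rough worked example of what these figures imply, the sketch below derives precision from the stated counts and then solves the F1 formula for recall. The inputs are the approximate numbers quoted above, so the derived recall is indicative only and is not a figure reported for the project.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from_f1(f1, precision):
    """Solve F1 = 2PR / (P + R) for R, given F1 and P."""
    return f1 * precision / (2 * precision - f1)


returned, correct = 600, 100      # approximate figures quoted in the text
precision = correct / returned    # about 0.167
recall = recall_from_f1(0.2, precision)

print(round(precision, 3))
print(round(recall, 3))
print(round(f1_score(precision, recall), 3))
```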

  • News Recommendation System

    I worked on a news recommendation system that allowed users to follow a range of different topics, such as those extracted by named entity recognition and some topics from the DBPedia Ontology.
    A vector made of the same features was extracted for all of these topic types, and recommendations were found using locality-sensitive hashing from Spotify Annoy.

  • Document Type Classifier

    A classifier that used information about the document structure and keywords inside it to classify documents into one of several types of documents available in the organization. The classification could then be used to make automatic retention or deletion decisions that saved the company millions of pounds.

  • Linked Documents Detector

    A locality-sensitive hashing-based application that allowed for documents to be linked either because of perfect duplication or because they were being used as a template or were versions of another document. The application improved the organization's search systems by adding contextual search.

Skills

  • Languages

    Python 3, Python, C++, C, RDF, Bash
  • Libraries/APIs

    Pandas, Scikit-learn, PySpark, Keras, SpaCy, OpenCV, NLTK, ZeroMQ, TensorFlow, PyTorch
  • Tools

    Jupyter, Git, PyCharm, RabbitMQ, Gensim, Microsoft Power BI, Tree-Based Pipeline Optimization Tool (TPOT), Apache Tika
  • Paradigms

    Concurrent Programming, Data Science, Agile, MapReduce
  • Platforms

    Jupyter Notebook, Linux, CUDA, Databricks
  • Other

    Agile Data Science, Machine Learning, Data Visualization, Imblearn, Data Engineering, Natural Language Processing (NLP), Teamwork, Open Minded, Data Scraping, MLflow, Web Scraping, Transformers, Financial Data modelling, Hugging Face, GAN, Deep Neural Networks, Recommendation Systems, Delta Lake, AWS, Deep Learning, Image Processing
  • Frameworks

    Flask, Spark, Hadoop
  • Storage

    Neo4j, HBase

Education

  • Bachelor of Engineering Degree with Honors in Electronic and Communications Engineering
    2010 - 2013
    London Metropolitan University - London, England

Certifications

  • Generative Adversarial Networks (GANs)
    OCTOBER 2021 - PRESENT
    deeplearning.ai
  • Natural Language Processing Specialization
    DECEMBER 2020 - PRESENT
    Coursera - deeplearning.ai
  • Deep Learning Specialization
    FEBRUARY 2020 - PRESENT
    Coursera - deeplearning.ai
  • Machine Learning
    SEPTEMBER 2016 - PRESENT
    Coursera - Stanford Online
  • Cisco Certified Network Associate - Security
    NOVEMBER 2011 - NOVEMBER 2014
    Cisco
  • Auditor/Lead Auditor (ISO 27001:2005)
    JULY 2009 - PRESENT
    IQMS
  • Cisco Certified Network Associate
    DECEMBER 2008 - NOVEMBER 2014
    Cisco
