
Daniel C Ferreira
Verified Expert in Engineering
Machine Learning & Natural Language Processing Developer
Vienna, Austria
Toptal member since July 25, 2022
Daniel is a Machine Learning expert with a background in mathematics and six years of experience in academia and industry. His specialties lie in applying ML to NLP and cyber-security problems. Daniel has substantial experience in the full lifecycle of ML, which he obtained while working for a leading cybersecurity company and the Technical University of Vienna, among others. He enjoys tackling challenging problems in environments where he can have a strong impact.
Experience
- Linux - 12 years
- Python - 8 years
- Machine Learning - 8 years
- Pandas - 8 years
- Scikit-learn - 8 years
- Docker - 7 years
- Deep Learning - 7 years
- Natural Language Processing (NLP) - 6 years
Preferred Environment
Linux, TensorFlow, Python, Bash, Pandas, Databricks, Docker, Spark, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Traffic Analysis
The most amazing...
...tool I've developed is a production-ready Machine Learning system that scrapes and categorizes websites based on their content.
Work Experience
Data Scientist
Cyan Security
- Developed a full ML pipeline that takes a URL, fetches the website, extracts its text (in any language) and images, and categorizes the site using state-of-the-art methods (Transformers, LLMs).
- Built multiple CI/CD pipelines with linting, testing, publishing, and deploying steps.
- Identified and blocked scams, phishing, and other malicious websites using state-of-the-art ML methods (Transformers, LLMs).
- Developed a serverless tool for fetching websites at a massive scale.
- Created a Python library for quickly parsing and extracting text content from HTML.
- Contributed to go-flows, an open-source network traffic flow exporter written in Go.
- Defined a unified REST API for delivering input/output to/from the in-house ML models.
- Developed a Python tool to facilitate extracting Zeek network features from PCAP files.
- Created a Python library to identify nearly identical websites to "fuzzily" deduplicate data.
- Mentored a student developing a tool for detecting DNS tunneling activity.
Researcher
Technical University of Vienna
- Launched and managed a public initiative for cataloging and categorizing research papers on network traffic, including developing multiple supporting tools.
- Researched and prototyped a way to visualize network traffic flows in 2D and aggregate them based on labels.
- Developed a random data generator in Python, explicitly made for clustering research problems.
- Developed City-GAN, a tool that uses GANs to generate building façades, which takes into account the city's style and can generate the same façade in different styles.
- Contributed to the DeepArchitect project, a framework for neural network architecture search.
Researcher
Priberam Labs
- Researched, generated, and published one of the first sets of pre-trained multilingual word embeddings.
- Developed a machine learning model for tackling the "named-entity recognition" problem in multilingual news articles and media.
- Collaborated in defining a REST API for an automated media monitoring tool, developed with multiple industry partners and universities for an H2020 project.
- Helped organize a Machine Learning summer school with predominantly international attendance and assisted its students.
Experience
MDCGenPy
https://github.com/CN-TU/mdcgenpy
Traffic Flow Mapping
https://github.com/dcferreira/network_analysis_feature_reduction
Multilingual Embeddings
https://github.com/dcferreira/multilingual-joint-embeddings/
NTARC Database
https://www.cn.tuwien.ac.at/network-traffic/ntadatabase/
City-GAN
https://github.com/muxamilian/city-gan
Personal Website
https://dcferreira.com
Toxic News
https://toxicnews.dcferreira.com/
Education
Master's Degree in Informatics and Applied Mathematics
Instituto Superior Técnico - Lisbon, Portugal
Bachelor's Degree in Informatics and Applied Mathematics
Instituto Superior Técnico - Lisbon, Portugal
Skills
Libraries/APIs
NumPy, TensorFlow, Pandas, Scikit-learn, BentoML, Keras, Theano, PyTorch, PIL, Node.js, PySpark, SciPy, SpaCy
Tools
Git, Jupyter, PyCharm, LaTeX, Wireshark, TensorBoard, GitLab, GitLab CI/CD, Mathematica, Zsh, Jira, Confluence, AutoML, GitHub
Languages
Python, Bash, C, Java, R, Lisp, Go, JavaScript, SQL, HTML
Platforms
Linux, Databricks, Docker, Amazon Web Services (AWS), Google Cloud Platform (GCP), Zeek, Azure
Frameworks
Spark, Electron, Tailwind CSS
Storage
Data Pipelines, JSON/XML Schemas
Paradigms
Scrum, Agile
Other
Machine Learning, Natural Language Processing (NLP), Deep Learning, Applied Mathematics, Artificial Intelligence (AI), Data Science, Data Analysis, Language Models, Web Scraping, Data Scraping, API Integration, Word Embedding, Generative Pre-trained Transformers (GPT), Fine-tuning, Mathematics, Statistics, Generative Adversarial Networks (GANs), Networks, BERT, DNS, Transformers, Traffic Analysis, Traffic Monitoring, MLflow, Regression Modeling, Data Engineering, APIs, Search Engines, NetFlow, Serverless, FastAPI, IoT Security, Intrusion Detection Systems (IDS), Bokeh, Image Processing, CI/CD Pipelines, Google BigQuery, GitHub Actions