Data Science and Databases

Fine-tuning LLMs for Your Industry: Optimal Data Labeling Strategies

LLMs have a vast knowledge base, but training them with domain-specific data can extend their capabilities to specialized industries and tasks. This article delves into data labeling for fine-tuning and includes a step-by-step tutorial for training GPT-4o.

18-minute read
Jedrzej Kardach

Jedrzej is a machine learning engineer who specializes in AI and data science. He has delivered several NLP-based classification algorithms and reinforcement learning solutions to clients, and has worked alongside researchers at Princeton University developing ML and data analytics tools. Jedrzej has partnered with clients in multiple industries, including service, finance, and insurance.

Architecting Effective Data Labeling Systems for Machine Learning Pipelines

Machine learning models are trained on massive datasets in which each data point is labeled to give it context and meaning. This deep dive describes how to build a data labeling architecture from scratch, with a focus on workflow, security, and data quality.

16-minute read
Reza Fazeli

Reza is a machine learning engineer specializing in natural language processing and computer vision. At IBM, he developed machine learning algorithms designed to improve text classification and automate model training, innovations that resulted in six patents. Reza has a master’s degree in engineering from the University of Toronto.

Theory, Tools, and Business Applications: An In-depth Look at Quantum Computing

Quantum computing is challenging the realities of technology, security, and industry as we know them. Here, we investigate the nuances of quantum mechanics and how to enter the world of quantum software development with tools such as Cirq and TensorFlow Quantum.

22-minute read
Joao Diogo de Oliveira

Joao is an AI developer who holds a Quantum Excellence Certificate from IBM. He specializes in machine learning and deep learning and has partnered with Fortune 100 companies like Procter & Gamble and Hearst. Joao has more than 14 years of experience and holds a master’s degree in computer engineering from the University of Porto.

Advancing AI Image Labeling and Semantic Metadata Collection

Image labeling can be a tedious, time-consuming task, compounded by the sheer volume of data needed to train deep neural networks. This article breaks down large data set processing and explains how a new SaaS product can help automate image labeling.

13-minute read
Neven Pičuljan

Neven is an artificial intelligence engineer with extensive experience in machine learning, computer vision, algorithms, and a range of AI-related technologies. Prior to founding an AI R&D consulting company, Neven helped create and train cutting-edge computer vision models used by healthcare, e-commerce, real estate, and financial services companies across the globe.

Apache Spark Optimization Techniques for High-performance Data Processing

Apache Spark is an analytics engine that can handle very large data sets. This guide reveals strategies to optimize its performance using PySpark.

11-minute read
Necati Demir, PhD

Necati is a software engineer specializing in data science, machine learning, back-end development, and DevOps. He is an AWS Certified Solutions Architect and AWS Certified Machine Learning Specialist with a doctorate in computer engineering. Necati serves as Chief AI Officer and CTO of Datagran, a machine learning automation company that he co-founded.

5 Pillars of Responsible Generative AI: A Code of Ethics for the Future

Generative AI advances raise new questions around data ownership, content integrity, algorithmic bias, and more. Here, three experts at the forefront of NLP present recommendations for developing ethical generative AI solutions.

12-minute read
Madelyn Douglas

Madelyn is the Lead Editor of Engineering at Toptal and a former software engineer at Meta. She has more than six years of experience researching, writing, and editing for engineering publications, specializing in emerging technologies and AI. She previously served as an editor at USC’s Viterbi School of Engineering, and her research on engineering ethics was published at IEEE’s NER 2021 conference.

In this ask-me-anything-style Q&A, leading Toptal AI developer Joao Diogo de Oliveira fields questions from fellow engineers about resources for pivoting to ML, approaches to large language models, and the most critical future applications of AI.

6-minute read
Joao Diogo de Oliveira

Joao is an AI developer with more than 10 years of experience at Fortune 100 companies like Procter & Gamble and startups in the healthcare, energy, and finance industries. Joao holds a master’s degree in computer science from the University of Porto and has multiple certifications in ML and deep learning.

Advantages of AI: Using GPT and Diffusion Models for Image Generation

Generative AI is taking the world by storm, with potentially profound impacts on the content we create. Learn the basics of AI image generation and produce sophisticated artistic renderings with this tutorial.

7-minute read
Juan Manuel Ortiz de Zarate

Juan is a developer, data scientist, and doctoral researcher at the University of Buenos Aires, where he studies social networks, AI, and NLP. Juan has more than a decade of data science experience and has published papers at ML conferences, including SPIRE and ICCS.

Ask an NLP Engineer: From GPT Models to the Ethics of AI

Want to expand your skills amid the current surge of revolutionary language models like GPT-4? In this ask-me-anything-style tutorial, Toptal data scientist and AI engineer Daniel Pérez Rubio fields questions from fellow programmers on a wide range of machine learning, natural language processing, and artificial intelligence topics.

10-minute read
Daniel Pérez Rubio

Daniel is a data scientist, developer, and former CTO who has specialized in NLP for more than six years, most recently focusing on large language models (LLMs). He has worked as a senior data scientist at BASF and Daimler.

An Expert Workaround for Executing Complex Entity Framework Core Stored Procedures

Microsoft’s Entity Framework Core is a popular object-relational mapper, but it doesn’t support the return of complex type results from stored procedures. A clever bit of code gets us around this limitation, returning non-database entities with ease.

5-minute read
Pankaj Kansodariya

Pankaj is a back-end developer and Microsoft Certified Professional with more than 18 years of experience within the Microsoft ecosystem, including C#, VB.NET, SQL Server, and cloud computing with Microsoft Azure. He has worked as a .NET developer at companies including Granicus, Gartner, and Jacobs.

Strategic Listening: A Guide to Python Social Media Analysis

Listening is everything—especially when it comes to effective marketing and product design. Gain key market insights from social media data using sentiment analysis and topic modeling in Python.

9-minute read
Federico Albanese

Federico is an expert Python developer and data scientist who has worked at Facebook, implementing deep learning models. He is a university lecturer, and his PhD research focuses on natural language processing and machine learning.

Mining for Data Clusters: Social Network Analysis With R and Gephi

Explore X (formerly Twitter) data clusters to uncover user behaviors (e.g., repost and reply patterns) within online communities. This guide focuses on a politically charged data set to illustrate the process of visualizing and analyzing social data.

8-minute read
Juan Manuel Ortiz de Zarate

Juan is a developer, data scientist, and doctoral researcher at the University of Buenos Aires, where he studies social networks, AI, and NLP. Juan has more than a decade of data science experience and has published papers at ML conferences, including SPIRE and ICCS.

Supply Chain Optimization Using Python and Mathematical Modeling

Improving supply chains is a top priority worldwide. Discover how mathematical optimization and Python coding can help keep a complex supply chain competitive.

14-minute read
Michael Hopf

Michael is a developer, data science consultant, and expert in supply chain optimization for heavy industry clients in rail, vessels, and mining. He has a PhD in mathematical optimization and was a consultant in the analytics practice of McKinsey & Company’s QuantumBlack for four years.

Identifying the Unknown With Clustering Metrics

Clustering in machine learning has a variety of applications, but how do you know which algorithm is best suited to your data? Here’s how to amplify your data insights with comparison metrics, including the F-measure.

12-minute read
Surbhi Gupta

Surbhi is a data scientist and developer with expertise in machine learning and robotics. A former senior data science engineer at Utopia Global, she has experience across ML fields including NLP, computer vision, and OCR. She has a master’s degree in mechatronics and has published research in the fields of robotics and optimization.

Python vs. R: Syntactic Sugar Magic

Python and R empower data scientists to solve problems using elegant syntactic sugar, simplifying coding and solution exploration. Each language brings its unique capabilities and approach to bear.

7-minute read
Leandro Roser

Leandro is a data scientist and machine learning developer who creates solutions for companies, including Profasee and Boston Consulting Group. He has expertise with TensorFlow, Spark, Python, and R.

Social Network Analysis Using Power BI and R: A Custom Visuals Guide

Microsoft’s Power BI is one of the most popular software solutions used to perform social network analysis. Here’s how to create custom Power BI visuals in R for compelling and flexible results.

14-minute read
Bharat Garg

Bharat is a data scientist and developer who specializes in designing and developing interactive reports and tools to facilitate decision-making. He has worked with small startups and large corporations, such as Comcast, MetLife, UnitedHealth Group/Optum, and Jefferson Health. One of Bharat’s projects delivered $6 million in revenue, and another delivered $10 million in savings.

Toptal Engineering Expert

Gabriel Courtemanche

Gabriel is a highly efficient and reliable professional with a broad skill set for web application development. He has worked on a range of products for a variety of clients, from solving scalability problems on production engineering teams at Shopify and Autodesk to launching new applications for startups. Most of his work consists of leading technical teams by creating an easy development environment, paying down technical debt, providing best-practice code examples, and mentoring developers.