How to Hire a Great AI Engineer
Artificial intelligence (AI) signals forward-thinking and cutting-edge technology to developers and the world. That’s why the process you use to hire AI engineers is now more important than ever.
Many engineers are eager to pivot their careers toward AI development. Creative new applications of artificial intelligence in software development are exciting, but our focus here won’t be on applying existing AI technology services. Instead, we’re looking at how to hire AI engineers capable of developing such services in the first place.
We spoke to leading AI hiring managers about how they screen for the best candidates for their teams. If you’re already comfortable with that, feel free to jump into our section on interview topics.
Finding AI Engineer Candidates
In many ways, finding AI engineers is standard fare: Local outreach, word-of-mouth networking, or exposing the more interesting problems you’re working on via social media are all good tactics.
As mentioned, you may find candidates who are experts at applying AI services, but who are certainly not skilled at pushing into new areas of research. It’s akin to hiring a Linux kernel hacker compared with someone who simply uses Linux as their desktop OS. There are different levels of experience and skill to be seen in both roles, but at their core they’re wildly different. Anyone involved in hiring an AI engineer—recruiters and HR departments included—must understand this distinction.
In fact, one hiring manager was surprised at how many self-styled AI experts would apply for advanced jobs like this but not even understand basics “like algorithms and big-O notation, much less why you would prefer to use a U-net or a resnet or straight machine learning regression versus a complex CNN.”
Sometimes you simply need to call a bluff on someone’s CV. But for applicants who are both honest and self-aware, a sufficiently detailed job posting will help filter out at least a few non-starters.
To cite a couple of examples of job postings from Matterport:
“We have a broad agenda for deep learning, including semantic labeling and segmentation, 3D object classification and pose estimation, depth from RGB, estimation of unseen 3D surfaces, texture/depth in-filling, key point matching, and dense stereo. We’re also working closely with some of the leading academic research labs in the country.”
Wording like this should function as an immediate red flag to a candidate who is used to calling a cloud AI service for results but would be in way over their head in an AI engineering position. Our second example is perhaps even more direct:
“As a member of the computer vision team, you’ll be responsible for developing robust completely-automated computer vision algorithms to handle any real-world environment…
“We work on […] point cloud alignment, generation/texturing of 3D meshes from point clouds, 3D mesh manipulation, camera calibration, SLAM, multi-view stereo, machine learning, and semantic understanding.”
There’s no room for mistake here. If an applicant tries to fudge their way into such a job, it should be very clear at any step from your background check through interviewing that they are not up to the task. Asking your most critical questions up front at each stage—ideally, your candidate will be returning the favor—will also help.
Job descriptions like the above will also shield you from applicants who perhaps have the skill level but not specific expertise. This may or may not be what you want, though.
Roles and Technical Areas for AI Engineers
There are several technical areas for which you may be aiming to hire. Machine learning—specifically computer vision—is one of the most competitive subsets to fill, as is natural language processing.
It’s important to focus on specifics. That said, one AI hiring manager advised us:
“Cast a wider net first. Generic roles like ‘data scientist’ or ‘machine learning scientist’ can mean a lot of different things. If you have a specific domain, ‘computer vision engineer’ or ‘natural language processing’ mean something specific.”
In other words, you don’t want high-quality, moldable talent to slink away due to categorical differences that could be overcome with clear direction. Some of these finer details can be sussed out during the interview problem-solving process while you decide whether your candidate has the right mix of personality and skills.
So how specific should you get? That depends on your urgency and whether you’re getting too many or too few applicants. You can always adjust your description if need be.
Academic Publications, Open-source Projects, and Work-life Balance
Most AI engineers will have a degree. Many will also have written and published post-graduate research. But is it important for AI engineers to have open-source projects available as part of their resume? Not always. If a candidate has them, they’ll be looked at, but it’s not necessarily assumed that every spare moment of a candidate’s life goes into AI work.
In fact, a mature AI engineer is expected to have other obligations. This was a refreshing realization and may even signal a change in culture. To attract the best candidates to your team, respect their commitment to personal time. Their interest in artificial intelligence may well drive their problem-solving passion, but ultimately it may be most helpful to promote a culture of balance.
Math and AI Development
AI development’s popularity means it’s crucial to assess the mathematical competence of candidates. Taking math skills into account separates those with experience in computer science and an interest in AI from those who will be truly effective AI engineers.
The managers we spoke with found that a deep knowledge of math, specifically linear algebra, was a good indicator of whether or not an AI engineer would be up to the task. Your candidate should also be able to program and to template real-world solutions: While mathematical knowledge is critical to the role, it does not necessarily imply an aptitude for putting that knowledge into practice.
As raw data can be quite messy, you’ll want to test your candidates’ knowledge and speed at cleaning and fixing data before they build models. Additionally, are they able to create rules based on those models to make predictions? Your interviewee should be aware of the biases within their data and should show initiative in thinking of how their data set operates in real life. An AI engineer should also know to consider the ethical duties of working with the data they are cleaning and fixing.
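As a rough illustration of what a small cleaning exercise could look like, here is a minimal sketch using only Python’s standard library. The records, the field names (`label`, `weight_kg`), and the median-imputation rule are all hypothetical choices made for this example, not a recommended pipeline.

```python
from statistics import median

# Hypothetical raw records: inconsistent casing, stray whitespace,
# a missing value, a duplicate, and an unparseable entry.
RAW_ROWS = [
    {"label": " Cat ", "weight_kg": "4.2"},
    {"label": "dog", "weight_kg": "31"},
    {"label": "DOG", "weight_kg": ""},
    {"label": "cat", "weight_kg": "4.2"},
    {"label": "dog", "weight_kg": "n/a"},
]


def parse_weight(text):
    """Return the weight as a float, or None when it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None


def clean(rows):
    """Normalize labels, impute missing weights, and drop exact duplicates."""
    parsed = [
        {"label": r["label"].strip().lower(), "weight_kg": parse_weight(r["weight_kg"])}
        for r in rows
    ]
    known = [r["weight_kg"] for r in parsed if r["weight_kg"] is not None]
    fill = median(known)  # assumption: a single global median is used for imputation
    seen, cleaned = set(), []
    for r in parsed:
        weight = r["weight_kg"] if r["weight_kg"] is not None else fill
        key = (r["label"], weight)
        if key not in seen:  # keep only the first copy of each record
            seen.add(key)
            cleaned.append({"label": r["label"], "weight_kg": weight})
    return cleaned


cleaned_rows = clean(RAW_ROWS)
```

Note that the global median gives the dog with a missing weight a cat-sized value. A strong candidate would likely spot that kind of bias and suggest something like a per-label median instead, which makes it a useful talking point.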
Tried and True Questions
As you gather and create artificial intelligence interview questions, be consistent. The managers we spoke with generally use several variations of a preferred set of questions. Consistency lets you measure candidates’ knowledge more accurately, as well as their fit within your team.
During an interview, it’s unlikely you’ll have time to train a network. However, you can judge an engineer’s knowledge of frameworks using a whiteboard to discuss structure and process.
One of the hiring managers we interviewed mentioned, “A lot of candidates know what Keras is and have run the demos in the examples directory and have seen OpenCV but can’t really tell you how RANSAC or graph cuts work.” As a hiring manager or team leader, acknowledge that while a candidate’s grasp of a framework is useful, fashions in math tend to change much more slowly. For example, Theano may be on its way out; but as its creators note, its concepts live on elsewhere in the ecosystem of deep learning frameworks.
So awareness of the math underlying a framework is indicative of a more versatile candidate. Encourage data application during the interview—but you’re testing math, primarily, not coding.
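To make the RANSAC reference above concrete, here is a minimal sketch of the idea as it might be discussed at a whiteboard: fit a model to a small random sample, count how many points agree with it, and keep the best one. It is a simplified line-fitting version in plain Python, not OpenCV’s implementation. The function names, the tolerance, and the sample data are assumptions made for illustration.

```python
import random


def fit_line(p, q):
    """Return (slope, intercept) of the non-vertical line through two points."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_line(points, iterations=200, tolerance=0.5, seed=0):
    """Fit y = slope * x + intercept while ignoring outliers.

    Repeatedly fits a line to two random points and keeps the line
    that the largest number of points agree with (the inliers).
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)
        if p[0] == q[0]:
            continue  # skip vertical pairs; this sketch only models y = f(x)
        slope, intercept = fit_line(p, q)
        inliers = [
            (x, y) for x, y in points
            if abs(y - (slope * x + intercept)) <= tolerance
        ]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (slope, intercept), inliers
    return best_model, best_inliers


# Hypothetical data: ten points on y = 2x + 1 plus three gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(2.0, 30.0), (5.0, -20.0), (8.0, 50.0)]
model, inliers = ransac_line(points)
```

A fuller version would usually refit the model to all inliers (for example, with least squares) as a final step. Asking a candidate what that step adds, or how they would choose the number of iterations, can reveal whether they understand the method or only its API.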
What sorts of problems would you present your candidates with? Again, AI is a wide field, so not all of these will apply to every specialization an AI engineer will face. But these will give you an idea of the depth you should get into:
- Suppose we have five-dimensional data vectors split apart. What would your candidate do with them? Ask them for more traditional techniques. What information can they gain using traditional math? Take note of their aptitude working with matrices, frame transformations, and as mentioned, linear algebra.
- Provide the candidate with an open-ended, publicly available problem and then ask them to apply NLP to the data to build a basic classifier or regression. There are many solutions, and the interviewer may be able to think of five ways of doing it themselves, even if none of them is the most valuable solution. Here you are looking for minimal performance, whether they can explain what errors it will produce, and how they would measure performance in the first place. Don’t provide any recommendations or cross-validation: This helps separate people who are merely interested in AI from those who understand the topic deeply.
- For computer vision projects, give the AI engineer an image that has been unwrapped into a single vector: You’ve taken every row and lined them up next to each other. Ask them to put it back into its original shape without being told the original dimensions. With a fixed number of pixels, there are only a certain number of possible shapes. Have them define some metrics for deciding which aspect ratios are plausible. They will need some notion of complexity: They might count edges, or apply a transform and look for large coefficients. They don’t need to be an expert to do something interesting. Pair that with practical problems, since one exercise alone isn’t enough.
- How would you use deep learning for an image classification task with relatively few examples? As deep learning uses complex models (large networks), the best way is to use a pre-trained network. This means the network parameters are already tuned and the limited data would be sufficient to train, especially since with a pre-trained network only the last layers require training. Data augmentation could also be used, which involves transforming the training data to obtain similar but different training examples. For instance, the training images can be slightly zoomed, rotated, reflected, and so on.
- What would you look for in a pre-trained network and why? Ideally one trained on a similar dataset; a simpler one in the exploration phase, and then a larger one for fine tuning. Pre-trained networks are networks that have already been tuned to get excellent results on standard data sets, such as ImageNet. Your data set doesn’t have to be identical to one of these, but the closer it is, the better the results. Since there are both smaller and larger pre-trained network architectures available, you can start with a smaller one in order to test quickly, and if the results are promising, move on to a larger pre-trained network.
- A network sometimes overfits a training set. How would you detect and avoid this problem? Overfitting happens when the network does really well (small error) on the training set, and significantly worse on the test set. Small training sets are susceptible to this issue, since the network can end up learning their peculiarities instead of learning generally useful features. As mentioned, data augmentation effectively increases the number of available training samples, so it can help with overfitting. Other ways to prevent overfitting include dropout and regularization, which your candidate should be able to explain in detail.
- Is a multi-layer neural network with linear activation more powerful than a single layer linear neural network? Why? (A rather difficult one, but this would show great understanding of the consequences of linear functions.) Our intuition might tell us that a deep network might be more powerful than a shallow network. However, since each layer is a linear function, and the combination of linear functions is also linear, then no expressive power is gained.
- Why does a multi-layer neural network with sigmoid activations tend to behave linearly if it’s regularized strongly using L2 regularization? (Also difficult, but demonstrates knowledge of regularization and the behavior of the sigmoid function, which is used a lot.) First of all, we have to understand that L2 regularization tends to reduce the magnitude of the weights of the neural network. If we have strong regularization then they will tend to be close to zero, and the logits (the layer prior to the sigmoid activation) will be very small too. Now if you plot the sigmoid function, you will notice that for inputs near zero, the function has a very linear shape/behavior. Therefore, if you regularize a sigmoid network too strongly, you will end up with something like a linear network, with very little expressive power.
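For the unwrapped-image exercise in the list above, one possible line of attack is sketched below. It enumerates every height and width whose product matches the pixel count and scores each by how smoothly consecutive rows line up. The roughness metric and the synthetic gradient image are assumptions chosen for illustration. A candidate might just as reasonably count edges or inspect transform coefficients.

```python
def candidate_shapes(n_pixels):
    """All (height, width) pairs with height * width == n_pixels, both at least 2."""
    return [
        (n_pixels // width, width)
        for width in range(2, n_pixels // 2 + 1)
        if n_pixels % width == 0
    ]


def vertical_roughness(flat, height, width):
    """Mean absolute difference between each pixel and the pixel directly below it."""
    n_pairs = (height - 1) * width
    total = sum(abs(flat[i] - flat[i + width]) for i in range(n_pairs))
    return total / n_pairs


def guess_shape(flat):
    """Pick the candidate shape whose rows line up most smoothly."""
    shapes = candidate_shapes(len(flat))
    return min(shapes, key=lambda shape: vertical_roughness(flat, *shape))


# Hypothetical test image: a 6 x 8 diagonal gradient, flattened row by row.
TRUE_HEIGHT, TRUE_WIDTH = 6, 8
flat_image = [float(r + c) for r in range(TRUE_HEIGHT) for c in range(TRUE_WIDTH)]
best_shape = guess_shape(flat_image)
```

On this clean synthetic input, the smoothest candidate is the true 6 x 8 shape. Real photographs are noisier, so a good follow-up question is how the metric would fail, for example on images with strong periodic texture.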
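The answers to the two linearity questions above can be checked numerically. The sketch below is only an illustration with made-up weights: it shows that two stacked linear layers give the same output as one layer whose weight matrix is their product, and that the sigmoid stays close to its tangent line 0.5 + z/4 for small inputs but not for large ones. Biases are omitted for brevity, and all names and values are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


def sigmoid(z):
    """Logistic sigmoid function."""
    return 1.0 / (1.0 + math.exp(-z))


# Two stacked linear layers with hypothetical weights.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # maps 2 inputs to 3 hidden units
W2 = [[2.0, 0.0, 1.0]]                      # maps 3 hidden units to 1 output
x = [0.5, -1.5]

two_layer_output = matvec(W2, matvec(W1, x))
single_layer_output = matvec(matmul(W2, W1), x)  # one equivalent weight matrix

# Near zero, sigmoid(z) is close to its tangent line 0.5 + z / 4.
small_z_error = max(
    abs(sigmoid(z) - (0.5 + z / 4)) for z in (-0.1, -0.05, 0.05, 0.1)
)
# Far from zero, the same linear approximation breaks down.
large_z_error = abs(sigmoid(4.0) - (0.5 + 4.0 / 4))
```

Here `two_layer_output` and `single_layer_output` agree, while `small_z_error` is tiny compared with `large_z_error`. That is the whole argument: small weights keep the sigmoid in its near-linear region, and stacked linear maps collapse into one.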
Furthermore, some questions specific to deep learning:
- Why are ReLUs better activation functions for deep networks than, for example, sigmoid activations?
- In general, why could it be that the outputs of an autoencoder are blurrier than those of a GAN?
- Why is it that deep networks that use batch normalization are able to train faster?
- What is the basic idea behind residual connections?
- What is the basic idea behind the inception module?
- What are the benefits of SqueezeNet and how does the architecture achieve them?
- What is the main difficulty of training RNNs and what solutions are there?
- How can convolutional neural networks be used for time series analysis?
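As one example of the depth of answer to listen for, a commonly cited part of the answer to the ReLU question is the vanishing gradient problem. The sketch below is a deliberately simplified illustration rather than a full backpropagation: it multiplies activation derivatives along a single path through 20 layers and leaves the weights out. The pre-activation values are hypothetical.

```python
import math


def sigmoid_grad(z):
    """Derivative of the sigmoid; it never exceeds 0.25 (reached at z = 0)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def chained_gradient(grad_fn, pre_activations):
    """Product of activation derivatives along one path through the layers.

    Weight matrices are left out to isolate the effect of the activation.
    """
    product = 1.0
    for z in pre_activations:
        product *= grad_fn(z)
    return product


# Hypothetical pre-activation values for a 20-layer path, all positive.
path = [0.5] * 20
sigmoid_signal = chained_gradient(sigmoid_grad, path)
relu_signal = chained_gradient(relu_grad, path)
```

With sigmoid activations the product shrinks toward zero as depth grows, while ReLU passes the gradient through unchanged for positive inputs. A thorough candidate would also mention the flip side, namely that ReLU units whose inputs stay negative receive no gradient at all.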
Remember, there’s no formula that applies to every business and project. If you can’t tell which questions and scenarios are applicable to your context, it’s wise to listen to team members who can.
Steps For Your AI Engineer Interviewees
The interview steps will vary depending on your team size.
On a small team, interviewees might expect a conversation. When leading a smaller team you need to consider both technical skill and fit within the team.
On a larger team, the hiring manager might be more concerned with whether or not the interviewee embraces the company culture. If the hiring manager is out of their depth concerning the specifics of your department or project, the CTO usually plays a major part in reaching out to candidates and having the initial conversation.
This initial conversation would last about ten minutes. If the candidate is successful, it’s the norm to have a sit-down with two to four members of the technical team. During this time, they may be asked to solve a preformed function or do some tensor boxing on a whiteboard.
It’s also important to tailor some of your questions to what the interviewee has claimed as their expertise. Due to the popularity of artificial intelligence and machine learning, it’s critical to evaluate whether your candidate has a deep, academic understanding and can solve problems theoretically.
After this step, the candidate may be handed back to the CTO who began the process. While hiring teams are encouraged to be thorough, it’s best not to lag. The popularity of the field of AI development means that truly viable candidates are being bombarded with opportunities.
Making Your Decision
Artificial intelligence and machine learning form a competitive field. There is an exceedingly high demand for engineers who are competent in all the areas mentioned throughout this article.
In making your decision, analyze your candidate’s decision-making process whether in person or virtually. Which frameworks is the applicant employing, and why? Are these methods compatible with the way your team approaches problems even if the frameworks themselves are not?
Allow interviewees the space and time to talk through the structure and process of the choices they are making. As one of our experts emphasized, “We like frameworks. Applicants should know why they make the choices they make when using them, based on lower-level knowledge.”
On that note, best of luck hiring developers as you blaze new trails in AI development.
We would like to thank Kevin Bjorke, Senior Software Engineer at Matterport; Jason Laska, Machine Learning Engineer in R&D at Clara Labs; and Steve Macenski, Senior Software Engineer, Robotics and Navigation Lead at Simbe Robotics for their insights as we put together this article. Thanks also to Toptal freelancers Radu Balaban and Cristian Garcia for their invaluable input.