
Jan Krepl
Verified Expert in Engineering
Machine Learning Engineer and Developer
Geneva, Switzerland
Toptal member since July 14, 2023
Jan is a lead machine learning (ML) engineer with expertise in cloud architecture, full-stack development, and freelance software engineering across diverse projects. He designs and deploys machine learning solutions with a strong focus on natural language processing, computer vision, and time series analysis, and seamlessly embeds them into full-stack applications. Outside work, Jan contributes to open-source projects and shares technical knowledge through educational content.
Portfolio
Experience
- Python - 9 years
- Machine Learning - 7 years
- Natural Language Processing (NLP) - 7 years
- Deep Learning - 6 years
- Machine Learning Operations (MLOps) - 5 years
- Amazon Web Services (AWS) - 5 years
- PyTorch - 5 years
- Large Language Models (LLMs) - 4 years
Preferred Environment
Python, Machine Learning, Notion, MongoDB, Amazon Web Services (AWS)
The most amazing...
...thing I've developed is a question-answering tool extracting knowledge from scientific papers.
Work Experience
Machine Learning and Back-end Engineer (via Toptal)
Software Company
- Implemented an internal document RAG pipeline using technologies like pgvector and Docling.
- Integrated the pipeline into a Django back end (DRF, Postgres, Celery) and deployed it on Azure (Container Apps).
- Implemented multiple document comparison pipelines.
Experienced AI Developer (via Toptal)
Creative Tech Firm
- Designed and built a Python MCP server, enhancing the creative process.
- Stored relevant data in a knowledge graph using Neo4j.
- Shipped the MCP server as a Python package and the whole infrastructure via Docker Compose.
RAG Full-stack Developer (via Toptal)
Construction Company
- Built a full-stack application with RAG functionalities that help fill out forms (looking for new clients, internal procedures, etc.).
- Deployed on AWS using Terraform and Terragrunt while keeping complexity low.
- Advised on best practices and documented all the work.
Machine Learning Section Manager | Blue Brain Project
The EPFL
- Designed a literature search system focused on semantic search, question answering, named entity recognition, and entity linking, built on top of recent large language models. The entire system was deployed at scale with Kubernetes and AWS.
- Managed a team of four experienced machine learning engineers.
- Acted as a lead developer enforcing best practices.
Back-end Developer, AI Engineer, Cloud Architect (via Toptal)
AI Infrastructure Company
- Built an automated knowledge graph generation back end (FastAPI). Used various vector databases and LLMs.
- Built a Python SDK for the back-end API that was focused on developers.
- Deployed the back end on AWS using Terraform and Terragrunt.
- Implemented unit and integration tests covering the essential functionalities.
Senior Data Scientist (via Toptal)
Innovative Financial Services Company
- Advised on optimal code structure and back-end development.
- Advised on AWS Lambda best practices and optimizations.
- Advised on ML-related topics (pandas and scikit-learn).
ML Developer (via Toptal)
Biotech Company
- Onboarded an open-source (BERT-like) model to the client's platform.
- Advised on best practices (Python, FastAPI, and back-end).
- Advised on best practices (machine learning and deep learning).
Senior AI Developer (via Toptal)
Technology Company at the Intersection of AI, Design, and Science
- Built the back end with FastAPI. Added many LLM wrapper endpoints that integrated with internal data.
- Performed ETL on web-scraped datasets and made them available via the back end.
- Handled deployment on AWS (EC2, S3) using Terraform (IaC).
Data Scientist (via Toptal)
Private Trading Firm
- Turned POC Jupyter notebooks into a production-grade Python package. The code backtests a trading algorithm given some parameters.
- Implemented hyperparameter search using Optuna, which allowed us to find the optimal trading parameters.
- Generated Weights & Biases dashboards that helped with feature selection and hyperparameter optimization.
Senior NLP Developer (via Toptal)
US Research Institution
- Optimized and deployed a custom sentiment analysis model (based on BERT) on AWS (SageMaker).
- Wrote a full FastAPI back end for a web application and deployed it on AWS. Collaborated with a React front-end developer to deliver the web application—a standard three-tier web application with extra ML model inference endpoints.
- Contributed to batch model inference on internal data together with LLM APIs. Made data available via the back end.
Machine Learning Engineer | Blue Brain Project
The EPFL
- Conceived and implemented a supervised algorithm for 2D brain slice image registration that became a part of internal workflows.
- Developed a knowledge extraction pipeline for scientific articles with main functionalities such as parsing, neural search, and named entity recognition.
- Engaged directly in various neuroscientific projects, including neuron-type classification with graph neural networks and morphology image synthesis with generative adversarial networks.
Data Scientist
Nectar Financial
- Enhanced internal portfolio optimization algorithms with return forecasting using supervised learning techniques. Added custom constraints and objective functions, making the tool more flexible.
- Applied text embedding algorithms, such as Doc2Vec and TF-IDF, on hedge fund fact sheets and reports. In turn, these embeddings were used for clustering, which allowed for better diversification.
- Developed a custom back-testing framework considering various hedge-fund-specific constraints like lock-ups.
Quantitative Risk Analyst
UBS
- Maintained the Lombard lending section's stress-testing codebase that used Visual Basic, SQL, and SAS.
- Generated regular risk reports used as inputs for other departments.
- Supported senior analysts in creating custom risk models.
Experience
Mildlyoverfitted | Educational Videos
https://www.youtube.com/@mildlyoverfitted/
DeepDow | Portfolio Optimization with Deep Learning
https://github.com/jankrepl/deepdow/
DeepDow is a Python package that merges two common steps of portfolio optimization:
• Forecasting the market's future evolution, for example with long short-term memory (LSTM) networks or generalized autoregressive conditional heteroskedasticity (GARCH) models.
• Designing and solving the optimization problem, for example with convex optimization.
It does so by constructing a pipeline of layers. The last layer performs the allocation, and all the previous ones serve as feature extractors. The overall network is fully differentiable, and one can optimize its parameters by gradient descent algorithms.
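The idea of a differentiable pipeline ending in an allocation layer can be illustrated with a minimal sketch. It is not DeepDow's API: it uses only the Python standard library, hand-derived gradients instead of automatic differentiation, and toy data. All names (`extract_features`, `forward`, `train`, the `scale` and `bias` parameters) are hypothetical and chosen for illustration. The feature extractor averages past returns per asset, a softmax layer turns scores into portfolio weights, and gradient descent minimizes the negative portfolio return.

```python
import math


def softmax(scores):
    """Allocation layer: maps scores to non-negative weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def extract_features(window):
    """Feature extractor layer: mean past return of each asset."""
    n_assets = len(window[0])
    return [sum(row[i] for row in window) / len(window) for i in range(n_assets)]


def forward(params, window):
    """Run the pipeline: features -> scores -> portfolio weights."""
    feats = extract_features(window)
    scores = [params["scale"] * f + b for f, b in zip(feats, params["bias"])]
    return feats, softmax(scores)


def loss_and_grads(params, window, next_returns):
    """Loss is the negative portfolio return; gradients are derived by hand."""
    feats, weights = forward(params, window)
    port_ret = sum(w * r for w, r in zip(weights, next_returns))
    # Softmax chain rule: d(-port_ret)/d(score_j) = -w_j * (r_j - port_ret)
    d_scores = [-w * (r - port_ret) for w, r in zip(weights, next_returns)]
    grads = {
        "scale": sum(d * f for d, f in zip(d_scores, feats)),
        "bias": d_scores,
    }
    return -port_ret, grads


def mean_loss(params, samples):
    return sum(loss_and_grads(params, w, r)[0] for w, r in samples) / len(samples)


def train(params, samples, lr=1.0, epochs=200):
    """Plain per-sample gradient descent on the pipeline parameters."""
    for _ in range(epochs):
        for window, next_returns in samples:
            _, grads = loss_and_grads(params, window, next_returns)
            params["scale"] -= lr * grads["scale"]
            params["bias"] = [b - lr * g for b, g in zip(params["bias"], grads["bias"])]
    return params


# Toy data (made up): each sample is (past returns window, next-period returns)
# for three assets, where the first asset tends to perform best.
samples = [
    ([[0.01, 0.00, -0.01], [0.02, 0.00, -0.01]], [0.02, 0.00, -0.01]),
    ([[0.02, 0.01, -0.02], [0.01, 0.00, 0.00]], [0.01, 0.00, -0.02]),
]

params = {"scale": 0.0, "bias": [0.0, 0.0, 0.0]}
loss_before = mean_loss(params, samples)
train(params, samples)
loss_after = mean_loss(params, samples)
_, weights = forward(params, samples[0][0])

print("mean loss before:", loss_before)
print("mean loss after: ", loss_after)
print("weights:", weights)
```

In a real setting, a deep learning framework would compute the gradients automatically, and the feature extractors would be learned layers rather than a fixed average.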
MLtype | Command Line Tool
https://github.com/jankrepl/mltype/
Atlas Alignment | Multimodal Registration and Alignment
https://github.com/BlueBrain/atlas-alignment/
PyChubby | Automated Face-warping Tool
https://github.com/jankrepl/pychubby/
Distortion Catcher
Education
Master's Degree in Quantitative Finance
ETH Zurich - Zurich, Switzerland
Bachelor's Degree in Economics
Charles University - Prague, Czechia
Certifications
Microsoft Certified: Azure Fundamentals
Microsoft
AWS Certified Solutions Architect - Professional
Amazon Web Services
HashiCorp Certified: Terraform Associate (003)
HashiCorp
AWS Certified Solutions Architect - Associate
Amazon Web Services
Google Cloud Certified Professional Machine Learning Engineer
Google Cloud
AWS Certified Machine Learning - Specialty
Amazon Web Services
Databricks Certified Associate Developer for Apache Spark 3.0
Databricks Inc.
AWS Certified Cloud Practitioner
Amazon Web Services
CKAD: Certified Kubernetes Application Developer
The Linux Foundation
Professional Scrum Master (PSM I)
Scrum.org
CFA Level I (Passed)
CFA Institute
Skills
Libraries/APIs
PyTorch, Scikit-learn, NumPy, Keras, Hugging Face Transformers, SciPy, Pandas, Matplotlib, REST APIs, TensorFlow, Asyncio, Python Asyncio, JAX, React, React Query, OpenCV, SpaCy, Terragrunt, OpenAI API, Pydantic, Ray Serve
Tools
Vim Text Editor, Git, GitLab CI/CD, Pytest, TensorBoard, GitLab, GitHub, ChatGPT, Notion, Amazon SageMaker, Cloud Dataflow, Google Compute Engine (GCE), AWS Glue, Terraform, Inkscape, Apache Airflow, Auth0, Amazon Cognito, Adobe Premiere Pro, Seaborn, Gensim, StatsModels, Scikit-image, Google Kubernetes Engine (GKE), Amazon Elastic Container Service (ECS), MongoDB Atlas, Celery, Docker Compose
Languages
Python, JavaScript, TypeScript, CSS, SQL, SAS, Excel VBA, Python 3, Cypher
Paradigms
Unit Testing, Test-driven Development (TDD), REST, Scrum, Agile Software Development, Entity Component System (ECS), Model Context Protocol (MCP)
Platforms
Kubernetes, Docker, Amazon Web Services (AWS), Jupyter Notebook, Vertex AI, Amazon EC2, Google Cloud Platform (GCP), AWS Lambda, AWS ALB, Azure, Weights & Biases, LocalStack
Storage
Elasticsearch, PostgreSQL, Google Cloud Storage, Amazon S3 (AWS S3), Redis, Redis Cache, NoSQL, Neo4j, MongoDB, MySQL
Frameworks
Next.js, Tailwind CSS, Apache Spark, Optuna, Ray, Django, Django REST Framework
Other
Probability Theory, Mathematical Analysis, Linear Algebra, Statistics, Machine Learning, Portfolio Optimization, Orchestration, Machine Learning Operations (MLOps), Shell Scripting, Generative Pre-trained Transformers (GPT), BERT, Sphinx, Natural Language Processing (NLP), FastAPI, Finance, Data Science, Computer Vision, OpenAI GPT-4 API, Artificial Intelligence (AI), Hugging Face, APIs, Regular Expressions, Natural Language Understanding (NLU), Algorithms, Back-end Development, Language Models, Pub/Sub, Full-stack Development, Large Language Models (LLMs), Technical Leadership, Leadership, Retrieval-augmented Generation (RAG), OpenAI, Back-end, Containerization, Serverless, SDKs, Pinecone, Cloud, Vite, Full-stack, FAISS, Vector Search, AI Agents, AI Chatbots, Optimization, Microeconomics, Macroeconomics, Mathematical Finance, Quantitative Risk Analysis, Numerical Methods, MLflow, LangChain, Time Series Analysis, Product Consultant, Web Scraping, Measure Theory, Econometrics, Private Company Valuation, Deep Learning, Scrum Master, CI/CD Pipelines, Online Course Design, Recurrent Neural Networks (RNNs), Open Source, Image Registration, Data Versioning, Google BigQuery, Text-to-text Transfer Transformer (T5), Trading, Automated Trading Software, Algorithmic Trading, Quantitative Analysis, Data Classification, OpenAI SDK, Transformers, Httpx, Amazon API Gateway, Semantic Search, Vector Databases, Railway, AI Research, Knowledge Graphs, Cursor AI, Agentic AI, Pgvector, Document Processing