
Janos Horvath
Verified Expert in Engineering
Research Engineer and AI Developer
Santa Clara, CA, United States
Toptal member since February 17, 2025
Janos is an experienced research engineer and academic specializing in machine learning, computer vision, and video processing. His innovative work at Dolby Laboratories and Purdue University drives advancements in Dolby Vision and satellite image forensics. As a prolific author and patent holder, Janos brings a unique blend of technical expertise and visionary leadership, fostering collaborative breakthroughs in high-impact technology.
Experience
- C - 12 years
- Linux - 10 years
- Python - 9 years
- Computer Vision - 8 years
- AI Research - 8 years
- Machine Learning - 8 years
- Video Coding - 8 years
- Image Processing - 8 years
Preferred Environment
Python, Linux, n8n, Agentic AI, AI Chatbots, Interactive Voice Response (IVR), Image Classification, DeepSeek, Bash, Bash Script, Git, SSH, Terminal, Edge Computing, Google Vision API, Drones, YOLOv8, GitHub Actions, NVIDIA CUDA, Correlational Analysis, Feature Engineering, Statistical Analysis, Statistical Modeling, Funnel Analysis, Churn Analysis, Voice Chat, TypeScript, Data Extraction, Roboflow, Reinforcement Learning from Human Feedback (RLHF), LoRa, Supervised Learning, Open-source LLMs, OpenAI GPT-4 API, Text to Image, Graph Databases
The most amazing...
...thing I've done is pioneer a DARPA-funded project on satellite image forensics that revolutionized detection accuracy, fueling my passion for tech innovation.
Work Experience
Senior Research Engineer
Dolby Laboratories
- Developed a space- and time-efficient denoising and super-resolution method that significantly reduced processing time while enhancing image clarity.
- Engineered a TPB-based compression method that lowered storage requirements while maintaining high video quality.
- Pioneered a new 360 video codec that improved streaming efficiency and decreased latency in real-time applications.
- Spearheaded floor plan construction for multiple perspective videos using object-based latent vector aggregation, enhancing reconstruction accuracy and performance.
- Implemented deep learning models for time-series forecasting (RNN, LSTM, GRU, CNN, and Transformer-based models) and evaluated their accuracy with metrics such as MAPE, RMSE, MAE, SMAPE, R², and log loss.
- Gained insights into EV charging trends through industry research, analyzing factors like time of day, weather, and location while identifying grid management challenges.
Experience
PhD Thesis
Working under the guidance of Professor Edward J. Delp in the Video and Image Processing Laboratory (VIPER), I created detection algorithms that included a fusion-based method for forensic splicing localization and a data-driven approach for copy-paste localization in panchromatic imagery. By integrating state-of-the-art techniques such as vision transformers, deep belief networks, and nested attention U-Nets, I enhanced manipulation detection capabilities and set new benchmarks in digital forensics research. My work has been featured at prominent conferences, including SI22 SPIE Defense + Commercial Sensing, CVPRW, and the International Conference on Acoustics, Speech, and Signal Processing.
Education
PhD in Electrical and Computer Engineering
Purdue University - West Lafayette, IN, USA
Skills
Libraries/APIs
PyTorch, TensorFlow, Keras, Matplotlib, NumPy, OpenAI API, Hugging Face Transformers, OpenCV, Pandas, Scikit-learn, LSTM, WebRTC, Google Speech API, Google Vision API, Dask
Tools
ChatGPT, Mathematica, You Only Look Once (YOLO), Algorithm Design, Whisper, Git, Terminal, AutoML, Amazon Transcribe, n8n, DJI SDK, DeepSeek
Languages
Python, C, Bash, Bash Script, TypeScript, C++, SQL, JavaScript
Paradigms
Business Intelligence (BI), Synthetic Data Generation, Automation, Mobile Development, Mobile Design, Agile Software Development, ETL
Platforms
Linux, Kubernetes, Jupyter Notebook, NVIDIA CUDA, Docker, Windows, LiveKit, Amazon Web Services (AWS), AWS Lambda, AWS Cloud Computing Services, iOS, Google Cloud Platform (GCP), Kubeflow
Storage
Data Integration, Amazon S3 (AWS S3), Neo4j, Graph Databases
Frameworks
Agentic Frameworks
Industry Expertise
Formulation, Healthcare
Other
AI Research, Computer Vision, Image Processing, Machine Learning, Dolby Vision, Video Coding, API Integration, Data Engineering, Natural Language Processing (NLP), Artificial Intelligence (AI), Data Classification, Data Science, Data Analytics, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), Hugging Face, Speech Recognition, OpenAI API integration, Architecture, AI Model Training, Diffusion Models, Image Segmentation, Deep Learning, Convolutional Neural Networks (CNNs), Forecasting, MAPE, RMSE, LangChain, OpenAI, OpenAI GPT-3 API, Reinforcement Learning, Transformers, Quantization, Pipedrive, Prompt Engineering, Geospatial Analytics, AI Programming, Geospatial Data, Automatic Speech Recognition (ASR), BERT, Speech to Text AI, Optical Character Recognition (OCR), Video & Audio Processing, Technical Analysis, Facial Recognition, Text to Speech (TTS), Video Analysis, Machine Learning Operations (MLOps), Audio Analysis, Video Transcoding, Data Analysis, Data Build Tool (dbt), Demand Forecasting, Data Visualization, Neural Networks, Technical Leadership, Algorithms, AI Data Classification, Data Processing, Agentic AI, Machine Learning Algorithms, AI Chatbots, Interactive Voice Response (IVR), Software Architecture, AI Model Integration, Image Generation, Conversational AI, Retrieval-augmented Generation (RAG), Speech to Text, Multimodal Models, Real-time Data, Time Series Analysis, Time Series Forecasting, Image Classification, Xarray, SSH, Drones, YOLOv8, Fine-tuning, GitHub Actions, LSTM Networks, Correlational Analysis, Feature Engineering, Data Scientist, Statistical Analysis, Statistical Modeling, Data Extraction, Reinforcement Learning from Human Feedback (RLHF), LoRa, Supervised Learning, Unsupervised Learning, Open-source LLMs, OpenAI GPT-4 API, APIs, LLM integration, Windows UI Automation, Multithreading, Recurrent Neural Networks (RNNs), Pinecone, AI Agents, Vector Databases, Object Detection, Chatbot Conversation Design, Chatbots, Video transformers, Generative Pre-trained Transformers (GPT), Video Editing, Large Language Model Operations (LLMOps), Detectron2, Exploratory Data Analysis, AI Modeling, Cloud, FastAPI, Graphics, Stable Diffusion, Fashion, Back-end, Web Development, Multi GPU training, Deep Reinforcement Learning, Text Generation Inference, NVIDIA TensorRT, Supervised Fine-tuning Trainer, LLM as a judge, LLM Evaluation BLEU - ROUGE, LM Evaluation Harness, Bayesian Inference & Modeling, AWS SSH Keys, Edge Computing, Google Cloud Functions, Health, Medical Software, Funnel Analysis, Churn Analysis, Voice Chat, Claude, Anthropic, Roboflow, Small Language Models (SLMs), AI Content Creation, Text to Image, Audio Processing, IPC (Inter-Process Communication), Memory management and optimization, Project Scoping, Materials Science, Manufacturing, Electronic Health Records (EHR), LLM inference, Tekton