Janos Horvath, Developer in Santa Clara, CA, United States

Janos Horvath

Verified Expert in Engineering

Bio

Janos is an experienced research engineer and academic specializing in machine learning, computer vision, and video processing. His work at Dolby Laboratories and Purdue University has contributed to Dolby Vision and satellite image forensics. A published author and patent holder, he combines technical expertise with leadership on collaborative, high-impact projects.

Portfolio

Dolby Laboratories
Dolby Vision, Video Coding, AI Research, Image Processing, Computer Vision...

Experience

  • C - 12 years
  • Linux - 10 years
  • Python - 9 years
  • Computer Vision - 8 years
  • AI Research - 8 years
  • Machine Learning - 8 years
  • Video Coding - 8 years
  • Image Processing - 8 years

Availability

Part-time

Preferred Environment

Python, Linux, n8n, Agentic AI, AI Chatbots, Interactive Voice Response (IVR), Image Classification, DeepSeek, Bash, Bash Script, Git, SSH, Terminal, Edge Computing, Google Vision API, Drones, YOLOv8, GitHub Actions, NVIDIA CUDA, Correlational Analysis, Feature Engineering, Statistical Analysis, Statistical Modeling, Funnel Analysis, Churn Analysis, Voice Chat, TypeScript, Data Extraction, Roboflow, Reinforcement Learning from Human Feedback (RLHF), LoRa, Supervised Learning, Open-source LLMs, OpenAI GPT-4 API, Text to Image, Graph Databases

The most amazing...

...thing I've done is pioneer a DARPA-funded project on satellite image forensics that revolutionized detection accuracy, fueling my passion for tech innovation.

Work Experience

Senior Research Engineer

2022 - 2025
Dolby Laboratories
  • Developed a space- and time-efficient denoising and super-resolution method that significantly reduced processing time while enhancing image clarity.
  • Engineered a TPB-based compression method that lowered storage requirements while maintaining high video quality.
  • Pioneered a new 360 video codec that improved streaming efficiency and decreased latency in real-time applications.
  • Spearheaded floor plan construction for multiple perspective videos using object-based latent vector aggregation, enhancing reconstruction accuracy and performance.
  • Implemented deep learning models for time-series forecasting (RNN, LSTM, GRU, CNN, and Transformer-based models), improving accuracy as measured by metrics such as MAPE, RMSE, MAE, SMAPE, R², and log loss.
  • Gained insights into EV charging trends through industry research, analyzing factors like time of day, weather, and location while identifying grid management challenges.
Technologies: Dolby Vision, Video Coding, AI Research, Image Processing, Computer Vision, API Integration, SQL, APIs, ChatGPT, Data Engineering, Data Integration, Natural Language Processing (NLP), Artificial Intelligence (AI), Data Classification, Data Science, Data Analytics, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), PyTorch, Hugging Face, TensorFlow, Speech Recognition, LLM integration, Windows UI Automation, Automation, Windows, OpenAI API integration, Multithreading, IPC (Inter-Process Communication), Memory management and optimization, Architecture, AI Model Training, Diffusion Models, Image Segmentation, Keras, Deep Learning, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Matplotlib, Forecasting, NumPy, MAPE, RMSE, LSTM, LangChain, OpenAI, LiveKit, OpenAI API, OpenAI GPT-3 API, AutoML, Amazon Web Services (AWS), WebRTC, Pinecone, AI Agents, Vector Databases, Reinforcement Learning, Transformers, Quantization, JavaScript, Object Detection, Chatbot Conversation Design, Chatbots, Pipedrive, Prompt Engineering, Geospatial Analytics, AI Programming, Geospatial Data, Automatic Speech Recognition (ASR), BERT, AWS Lambda, Amazon S3 (AWS S3), Amazon Transcribe, Speech to Text AI, Optical Character Recognition (OCR), Project Scoping, Video & Audio Processing, Technical Analysis, Formulation, Facial Recognition, Text to Speech (TTS), Video transformers, Video Analysis, Hugging Face Transformers, Machine Learning Operations (MLOps), Generative Pre-trained Transformers (GPT), Audio Analysis, Kubernetes, Video Editing, Video Transcoding, Large Language Model Operations (LLMOps), Mobile Development, AWS Cloud Computing Services, OpenCV, iOS, Mobile Design, You Only Look Once (YOLO), Detectron2, Data Analysis, Data Build Tool (dbt), Exploratory Data Analysis, Demand Forecasting, Business Intelligence (BI), Data Visualization, Neural Networks, AI Modeling, Cloud, Technical Leadership, Agile Software Development, Algorithms, Algorithm Design, AI Data Classification, Data Processing, FastAPI, Machine Learning Algorithms, AI Chatbots, Interactive Voice Response (IVR), Software Architecture, AI Model Integration, Image Generation, Graphics, Stable Diffusion, Conversational AI, Retrieval-augmented Generation (RAG), Fashion, Speech to Text, Whisper, Multimodal Models, Google Speech API, Real-time Data, Jupyter Notebook, Back-end, Web Development, Time Series Analysis, Time Series Forecasting, Multi GPU training, Deep Reinforcement Learning, LLM inference, Text Generation Inference, Synthetic Data Generation, NVIDIA TensorRT, Supervised Fine-tuning Trainer, LLM as a judge, LLM Evaluation BLEU - ROUGE, LM Evaluation Harness, ETL, Neo4j, Google Cloud Platform (GCP), Pandas, Scikit-learn, Image Classification, Bayesian Inference & Modeling, Xarray, AWS SSH Keys, Bash, Bash Script, Git, SSH, Terminal, DJI SDK, Drones, Fine-tuning, GitHub Actions, Google Cloud Functions, Dask, Tekton, LSTM Networks, NVIDIA CUDA, Correlational Analysis, Feature Engineering, Data Scientist, Statistical Analysis, Statistical Modeling, TypeScript, Data Extraction, Claude, Anthropic, Supervised Learning, Unsupervised Learning, Small Language Models (SLMs), Open-source LLMs, OpenAI GPT-4 API, AI Content Creation, Text to Image

Experience

PhD Thesis

During my PhD at Purdue University, I developed innovative methods for verifying the integrity of overhead satellite images as part of the DARPA- and AFRL-funded MediFor and SemaFor programs.

Working under the guidance of Professor Edward J. Delp in the Video and Image Processing Laboratory (VIPER), I created detection algorithms that included a fusion-based method for forensic splicing localization and a data-driven approach for copy-paste localization in panchromatic imagery. By integrating techniques such as vision transformers, deep belief networks, and nested attention U-Nets, I improved manipulation detection capabilities. My work has been presented at conferences including SI22 SPIE Defense + Commercial Sensing, CVPRW, and the International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

Education

2018 - 2022

PhD in Electrical and Computer Engineering

Purdue University - West Lafayette, IN, USA

Skills

Libraries/APIs

PyTorch, TensorFlow, Keras, Matplotlib, NumPy, OpenAI API, Hugging Face Transformers, OpenCV, Pandas, Scikit-learn, LSTM, WebRTC, Google Speech API, Google Vision API, Dask

Tools

ChatGPT, Mathematica, You Only Look Once (YOLO), Algorithm Design, Whisper, Git, Terminal, AutoML, Amazon Transcribe, n8n, DJI SDK, DeepSeek

Languages

Python, C, Bash, Bash Script, TypeScript, C++, SQL, JavaScript

Paradigms

Business Intelligence (BI), Synthetic Data Generation, Automation, Mobile Development, Mobile Design, Agile Software Development, ETL

Platforms

Linux, Kubernetes, Jupyter Notebook, NVIDIA CUDA, Docker, Windows, LiveKit, Amazon Web Services (AWS), AWS Lambda, AWS Cloud Computing Services, iOS, Google Cloud Platform (GCP), Kubeflow

Storage

Data Integration, Amazon S3 (AWS S3), Neo4j, Graph Databases

Frameworks

Agentic Frameworks

Industry Expertise

Formulation, Healthcare

Other

AI Research, Computer Vision, Image Processing, Machine Learning, Dolby Vision, Video Coding, API Integration, Data Engineering, Natural Language Processing (NLP), Artificial Intelligence (AI), Data Classification, Data Science, Data Analytics, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), Hugging Face, Speech Recognition, OpenAI API integration, Architecture, AI Model Training, Diffusion Models, Image Segmentation, Deep Learning, Convolutional Neural Networks (CNNs), Forecasting, MAPE, RMSE, LangChain, OpenAI, OpenAI GPT-3 API, Reinforcement Learning, Transformers, Quantization, Pipedrive, Prompt Engineering, Geospatial Analytics, AI Programming, Geospatial Data, Automatic Speech Recognition (ASR), BERT, Speech to Text AI, Optical Character Recognition (OCR), Video & Audio Processing, Technical Analysis, Facial Recognition, Text to Speech (TTS), Video Analysis, Machine Learning Operations (MLOps), Audio Analysis, Video Transcoding, Data Analysis, Data Build Tool (dbt), Demand Forecasting, Data Visualization, Neural Networks, Technical Leadership, Algorithms, AI Data Classification, Data Processing, Agentic AI, Machine Learning Algorithms, AI Chatbots, Interactive Voice Response (IVR), Software Architecture, AI Model Integration, Image Generation, Conversational AI, Retrieval-augmented Generation (RAG), Speech to Text, Multimodal Models, Real-time Data, Time Series Analysis, Time Series Forecasting, Image Classification, Xarray, SSH, Drones, YOLOv8, Fine-tuning, GitHub Actions, LSTM Networks, Correlational Analysis, Feature Engineering, Data Scientist, Statistical Analysis, Statistical Modeling, Data Extraction, Reinforcement Learning from Human Feedback (RLHF), LoRa, Supervised Learning, Unsupervised Learning, Open-source LLMs, OpenAI GPT-4 API, APIs, LLM integration, Windows UI Automation, Multithreading, Recurrent Neural Networks (RNNs), Pinecone, AI Agents, Vector Databases, Object Detection, Chatbot Conversation Design, Chatbots, Video transformers, Generative Pre-trained Transformers (GPT), Video Editing, Large Language Model Operations (LLMOps), Detectron2, Exploratory Data Analysis, AI Modeling, Cloud, FastAPI, Graphics, Stable Diffusion, Fashion, Back-end, Web Development, Multi GPU training, Deep Reinforcement Learning, Text Generation Inference, NVIDIA TensorRT, Supervised Fine-tuning Trainer, LLM as a judge, LLM Evaluation BLEU - ROUGE, LM Evaluation Harness, Bayesian Inference & Modeling, AWS SSH Keys, Edge Computing, Google Cloud Functions, Health, Medical Software, Funnel Analysis, Churn Analysis, Voice Chat, Claude, Anthropic, Roboflow, Small Language Models (SLMs), AI Content Creation, Text to Image, Audio Processing, IPC (Inter-Process Communication), Memory management and optimization, Project Scoping, Materials Science, Manufacturing, Electronic Health Records (EHR), LLM inference, Tekton
