Roman Semeine
Verified Expert in Engineering
Data Scientist and Software Developer
Roman is a data scientist with extensive experience managing large technical teams and complex software projects. Due to his diverse background, he can blend low-level data engineering with advanced analytics and cutting-edge artificial intelligence.
Preferred Environment
Git, Linux
The most amazing thing I've coded is a GPU-powered database for storing time-series data.
Work Experience
AI Developer
Optimize Prime A.I. LLC
- Built a working prototype for extracting knowledge from a diverse set of PDF documents and linked that knowledge with large language models (LLMs) using the RAG approach.
- Developed an AI-based system for parsing non-textual PDF documents that utilized both machine learning and human-in-the-loop approaches.
- Adapted the open-source You Only Look Once (YOLO) framework to the client's needs, reducing costs and paving the way for building in-house intellectual property.
Vice President of Data Science
Anteriad
- Managed a global team of 20+ engineers and data scientists.
- Researched and implemented NLP techniques (LLM embeddings, fastText, etc.) for web traffic classification and segmentation and for corporate entity search, clustering, and retrieval.
- Led a technical team to create a data product for producing targetable segments (on demand) in the B2B and B2B2C marketing space, contributing to new business acquisitions and a significant boost in revenue.
- Developed an NLP- and ML-powered real-time matching engine for resolving and augmenting extensive company listings against industry-standard company profiles, increasing match rates and segment sizes by 300%.
- Developed an innovative ML-powered approach for classifying companies based on their business trends using alternative data sources (such as weblogs, movement data, and more).
- Created an identity graph solution for producing targetable identifiers, improving marketing campaign coverage and accuracy for over 100 high-profit accounts.
- Developed statistical models for generating highly relevant leads based on each client's account profile, significantly expanding the clients' reach and improving the ROI of their marketing budgets.
Data Scientist
180 by Two (via Toptal)
- Built a geo-attribution system for a big location dataset.
- Developed algorithms for geo-attribution cleansing and verification.
- Provided guidelines for geographical data specification using the OpenStreetMap interface.
Data Scientist
SteppeChange
- Developed customer churn models using historical data with Hadoop, Python, and TensorFlow.
- Improved the churn model performance by 25% using mobile network social data.
- Built a user-segmentation pipeline based on mobile network historical records using the Spark infrastructure.
- Created a chatbot ecosystem designed for easy customization and straightforward integration of customer data.
- Built a 95% accurate gesture recognition pipeline for wearable electronics with TensorFlow.
Data Scientist
Radiumone
- Measured the effectiveness of mobile ad campaigns using geolocation data from hundreds of millions of mobile devices over the campaign's duration (Hadoop, Hive, and Python).
- Built competitor advertising segments for a major U.S. airline using the terminals' geolocation data.
- Reduced media expenses by 5% by developing a high-cost media filtering system using deep learning techniques.
- Designed and implemented a distributed real-time GPU-powered time series database.
- Designed and implemented a set of tools for processing and visualizing a large geographical dataset (C++, CUDA, PHP, and jQuery).
- Reduced content classification costs by 90% by developing a classification pipeline for future popular content identification.
- Developed a model for social data sharing, increasing performance by over 100% for selected audiences.
Software Architect
Doctorsoft
- Gathered the initial requirements and created the application architecture by taking into account the existing restrictions.
- Estimated the costs for running the application in Amazon Cloud and for the scaling process.
- Worked on HIPAA certification, verifying that the use of the Amazon technology stack would meet the requirements.
- Implemented an integration with an electronic prescribing service provider (eRx).
Experience
Mobile Customer Segmentation Process
Mobile Ad Campaign Effectiveness
Advertisement Targeting for the Customers of Rival Major Airlines
Customer Journey Analytics
Conversion Funnel Steps Prediction
Chatbot Development Suite
• The dialog definition module gave the end user a way to define a conversation as a flow diagram.
• The chatbot runtime extended the flow functionality through Python callbacks.
• The back-end adapters supported a choice of NLP providers, such as IBM Watson, AWS Lex, and Microsoft's Text Analytics API.
• The system could also ingest proprietary data, such as CRM records or a product catalogue, and augment the NLP accordingly.
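To make the flow-plus-callback idea concrete, here is a minimal sketch of how a dialog defined as a flow of nodes could be extended with Python callbacks. It is not the suite's actual code: `Node`, `DialogFlow`, and `route_by_intent` are hypothetical names chosen for illustration, and the keyword check stands in for a call to an NLP provider adapter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A callback receives the user's text and a mutable session context.
# It may return the name of the next node to override the default edge.
Callback = Callable[[str, dict], Optional[str]]


@dataclass
class Node:
    prompt: str                          # what the bot says at this step
    next_node: Optional[str] = None      # default edge in the flow diagram
    callback: Optional[Callback] = None  # optional Python hook (hypothetical)


@dataclass
class DialogFlow:
    nodes: Dict[str, Node]
    start: str

    def run(self, user_inputs):
        """Walk the flow, consuming one user input per node; return the prompts shown."""
        transcript, context, current = [], {}, self.start
        inputs = iter(user_inputs)
        while current is not None:
            node = self.nodes[current]
            transcript.append(node.prompt)
            text = next(inputs, "")  # empty string once inputs run out
            override = node.callback(text, context) if node.callback else None
            current = override or node.next_node
        return transcript


def route_by_intent(text: str, context: dict) -> Optional[str]:
    # Assumption: a keyword check standing in for an NLP provider adapter.
    context["last_input"] = text
    return "billing" if "invoice" in text.lower() else None


flow = DialogFlow(
    nodes={
        "greet": Node("How can I help?", next_node="fallback", callback=route_by_intent),
        "billing": Node("Let me look up your invoice."),
        "fallback": Node("Could you rephrase that?"),
    },
    start="greet",
)
```

Calling `flow.run(["I need my invoice"])` follows the callback's override to the `billing` node, while any other input falls through to the default `fallback` edge. Keeping the diagram as plain data and the custom logic in callbacks is one way to let non-developers edit the flow without touching code.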
Skills
Languages
HTML, CSS, Erlang, SQL, C++, C, Python, PHP, JavaScript, Java
Frameworks
Spark, Hadoop
Libraries/APIs
Keras, jQuery, Stanford NLP, TensorFlow, Amazon EC2 API, Pandas, NumPy, Azure Cognitive Services, SciPy, PyTorch, Node.js
Tools
Git, Amazon Elastic MapReduce (EMR), IBM Watson, Amazon Lex, You Only Look Once (YOLO), Jetson TX2, ChatGPT
Paradigms
Functional Programming, Data Science, Business Intelligence (BI)
Platforms
Jupyter Notebook, Linux, Amazon EC2, NVIDIA CUDA, AWS Lambda, Databricks, Amazon, Amazon Web Services (AWS), Azure, Kubernetes, Google Cloud Platform (GCP)
Storage
Amazon S3 (AWS S3), Apache Hive, Redis, NoSQL, Amazon DynamoDB, PostgreSQL
Other
Convolutional Neural Networks (CNN), Big Data, Recurrent Neural Networks (RNNs), Neural Networks, Analytics, Natural Language Processing (NLP), Deep Neural Networks, Data Visualization, Deep Reinforcement Learning, Reinforcement Learning, Big Data Architecture, R-trees, Geospatial Data, Data Analytics, Machine Learning, Generative Pre-trained Transformers (GPT), Data Scientist, Artificial Intelligence (AI), APIs, Technical Leadership, Generative AI, Computer Vision, LangChain, Statistical Analysis, Regression, Financial Modeling, Backtesting Trading Strategies, Statistics, Software, Mathematics, Physics
Education
Master of Science Degree in Computer Science
Peter the Great St. Petersburg Polytechnic University - Saint Petersburg, Russia