Roman Semeine, Developer in New York, NY, United States

Roman Semeine

Verified Expert in Engineering

Data Scientist and Software Developer

Location
New York, NY, United States
Toptal Member Since
January 24, 2018

Roman is a data scientist with extensive experience managing large technical teams and complex software projects. Due to his diverse background, he can blend low-level data engineering with advanced analytics and cutting-edge artificial intelligence.

Portfolio

Optimize Prime A.I. LLC
ChatGPT, Python, JavaScript, Artificial Intelligence (AI)...
Anteriad
Pandas, PyTorch, Spark, Machine Learning, Statistics...
180 by Two (via Toptal)
Azure, Python, Hadoop, Spark, Machine Learning, Data Science, PyTorch...

Availability

Full-time

Preferred Environment

Git, Linux

The most amazing...

...thing I've coded is a GPU-powered database for the storage of time-series data.

Work Experience

AI Developer

2023 - 2023
Optimize Prime A.I. LLC
  • Built a working prototype for extracting knowledge from a diverse set of PDF documents and linked that knowledge with large language models (LLMs) using the RAG approach.
  • Developed an AI-based system for parsing non-textual PDF documents that utilized both machine learning and human-in-the-loop approaches.
  • Adapted open-source You Only Look Once (YOLO) framework to the client's needs, reducing costs and paving the way for building in-house intellectual property.
Technologies: ChatGPT, Python, JavaScript, Artificial Intelligence (AI), Natural Language Processing (NLP), PostgreSQL, Machine Learning

Vice President of Data Science

2019 - 2023
Anteriad
  • Managed a global team of more than 20 engineers and data scientists.
  • Researched and implemented NLP techniques (LLM embeddings, fastText, etc.) for web traffic classification and segmentation and for corporate entity search, clustering, and retrieval.
  • Led a technical team to create a data product for producing targetable segments (on demand) in the B2B and B2B2C marketing space, contributing to new business acquisitions and a significant boost in revenue.
  • Developed an NLP- and ML-powered real-time matching engine for resolving and augmenting extensive company listings against industry-standard company profiles, increasing match rates and segment sizes by 300%.
  • Developed an innovative ML-powered approach for classifying companies based on their business trends using alternative data sources (such as weblogs, movement data, and more).
  • Created an identity graph solution for producing targetable identifiers, improving marketing campaign coverage and accuracy for over 100 high-profit accounts.
  • Developed statistical models for generating highly relevant leads based on each client's account profiles, significantly expanding clients' reach and improving the ROI of their marketing budgets.
Technologies: Pandas, PyTorch, Spark, Machine Learning, Statistics, Business Intelligence (BI), Big Data, Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Computer Vision, Data Science, LangChain, Data Scientist, Statistical Analysis, Kubernetes, Google Cloud Platform (GCP), Regression, Financial Modeling, Backtesting Trading Strategies, You Only Look Once (YOLO), Artificial Intelligence (AI), PostgreSQL

Data Scientist

2018 - 2018
180 by Two (via Toptal)
  • Built a geo-attribution system for a big location dataset.
  • Developed algorithms for geo-attribution cleansing and verification.
  • Provided guidelines for geographical data specification using the OpenStreetMap interface.
Technologies: Azure, Python, Hadoop, Spark, Machine Learning, Data Science, PyTorch, Data Scientist, Statistical Analysis, Kubernetes, Backtesting Trading Strategies, Artificial Intelligence (AI), PostgreSQL

Data Scientist

2016 - 2018
SteppeChange
  • Developed customer churn models using historical data with Hadoop, Python, and TensorFlow.
  • Improved the churn model performance by 25% using mobile network social data.
  • Built a user-segmentation pipeline based on mobile network historical records using the Spark infrastructure.
  • Created a chatbot ecosystem designed for easy customization and straightforward integration of customer data.
  • Built a 95% accurate gesture recognition pipeline for wearable electronics with TensorFlow.
Technologies: C++, Hadoop, TensorFlow, Python, Machine Learning, Data Analytics, Data Science, Analytics, PyTorch, Data Scientist, Statistical Analysis, Node.js, Kubernetes, Regression, Backtesting Trading Strategies, Jetson TX2, Artificial Intelligence (AI), PostgreSQL

Data Scientist

2013 - 2016
Radiumone
  • Measured the effectiveness of mobile ad campaigns using geolocation data collected from hundreds of millions of mobile devices over each campaign's duration (Hadoop, Hive, and Python).
  • Built competitor advertising segments for a major U.S. airline using the terminals' geolocation data.
  • Reduced media expenses by 5% by developing a high-cost media filtering system using deep learning techniques.
  • Designed and implemented a distributed real-time GPU-powered time series database.
  • Designed and implemented a set of tools for processing and visualizing a large geographical dataset (C++, CUDA, PHP, and jQuery).
  • Reduced content classification costs by 90% by developing a classification pipeline for future popular content identification.
  • Developed a model for social data sharing, increasing performance by over 100% for selected audiences.
Technologies: jQuery, NVIDIA CUDA, Python, C++, Apache Hive, Hadoop, Machine Learning, Data Analytics, Data Science, Analytics, PyTorch, Data Scientist, Statistical Analysis, Regression, Financial Modeling, Backtesting Trading Strategies, Artificial Intelligence (AI)

Software Architect

2010 - 2012
Doctorsoft
  • Gathered the initial requirements and created the application architecture by taking into account the existing restrictions.
  • Estimated the costs for running the application in Amazon Cloud and for the scaling process.
  • Worked on the HIPAA certification, ensuring that the use of the Amazon technology stack would meet the requirements.
  • Implemented an integration with an electronic prescribing service provider (eRx).
Technologies: JavaScript, Amazon Web Services (AWS), Amazon, SQL, Java, Data Science, Analytics

Mobile Customer Segmentation Process

I built a customer segmentation process on historical mobile communications data. For this, I used Spark, Hadoop, manual feature engineering, self-organizing maps, and k-means clustering.

Mobile Ad Campaign Effectiveness

I implemented a framework for measuring the effectiveness of a mobile ad campaign based on geographical data gathered from mobile devices. Here, I mainly used Hadoop, Hive, and OpenStreetMap data.

Advertisement Targeting for the Customers of Rival Major Airlines

I developed a process for identifying passengers loyal to the competitors of a major US airline and facilitated advertisement delivery to those passengers. For this project, I used a variety of technologies and data sources: Hadoop, Hive, historical advertisement data, US airport geographical locations, flight schedules, and more.

Customer Journey Analytics

I created a set of tools for customer journey analytics on behalf of an online retailer with a customer base of approximately 20 million. The goal was to provide analysts with a convenient visualization of individual customer history as well as an aggregate view of a subset of customers. Here, I mainly used Hadoop, Hive, MySQL, Python, and jQuery.

Conversion Funnel Steps Prediction

I built a process for predicting the future conversion funnel steps of an online retailer's customers. The funnel consisted of the conversion sequence starting with a product page view and ending with a product purchase. I chiefly used Python and TensorFlow.

Chatbot Development Suite

I built a chatbot infrastructure for SteppeChange, which consisted of a dialog definition module, a chatbot runtime, and a number of back-end adapters.
• The dialog definition module gave the end user the means to define a conversation as a flow diagram.
• The chatbot runtime extended the flow functionality by means of Python callbacks.
• The back-end adapters allowed a choice of NLP providers, such as IBM Watson, AWS Lex, and Microsoft's Text Analytics API.
• The system was also capable of ingesting proprietary data, such as a CRM or product catalogue, and augmenting the NLP accordingly.
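
For illustration only, the three-part structure described above could be sketched roughly as follows. This is not the original system's code: every name here (NLPAdapter, KeywordAdapter, Node, ChatbotRuntime, lookup_order) is hypothetical, and the keyword-matching adapter is a local stand-in for a real NLP provider.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Protocol


class NLPAdapter(Protocol):
    """Back-end adapter interface; a real adapter would wrap an external NLP provider."""

    def detect_intent(self, text: str) -> str: ...


class KeywordAdapter:
    """Hypothetical stand-in adapter: matches keywords locally, calls no provider."""

    def __init__(self, keywords: Dict[str, str]) -> None:
        self.keywords = keywords

    def detect_intent(self, text: str) -> str:
        lowered = text.lower()
        for word, intent in self.keywords.items():
            if word in lowered:
                return intent
        return "unknown"


@dataclass
class Node:
    """One step of the dialog flow: a prompt plus intent-labelled outgoing edges."""

    prompt: str
    edges: Dict[str, str] = field(default_factory=dict)
    callback: Optional[Callable[[dict], None]] = None


class ChatbotRuntime:
    """Walks the flow diagram and runs a node's Python callback on entry."""

    def __init__(self, flow: Dict[str, Node], adapter: NLPAdapter, start: str) -> None:
        self.flow = flow
        self.adapter = adapter
        self.current = start
        self.context: dict = {}

    def handle(self, user_text: str) -> str:
        intent = self.adapter.detect_intent(user_text)
        node = self.flow[self.current]
        # Stay on the current node when no edge matches the detected intent.
        self.current = node.edges.get(intent, self.current)
        target = self.flow[self.current]
        if target.callback:
            target.callback(self.context)
        return target.prompt.format(**self.context)


def lookup_order(context: dict) -> None:
    # Hypothetical callback; a real one might query a CRM or product catalogue.
    context["order_status"] = "shipped"


flow = {
    "start": Node("How can I help?", edges={"order": "status"}),
    "status": Node("Your order is {order_status}.", callback=lookup_order),
}
bot = ChatbotRuntime(flow, KeywordAdapter({"order": "order"}), start="start")
print(bot.handle("Where is my order?"))  # prints "Your order is shipped."
```

Keeping the provider behind a small interface means a different NLP back end can be swapped in without touching the flow definition or the callbacks.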

Languages

HTML, CSS, Erlang, SQL, C++, C, Python, PHP, JavaScript, Java

Frameworks

Spark, Hadoop

Libraries/APIs

Keras, jQuery, Stanford NLP, TensorFlow, Amazon EC2 API, Pandas, NumPy, Azure Cognitive Services, SciPy, PyTorch, Node.js

Tools

Git, Amazon Elastic MapReduce (EMR), IBM Watson, Amazon Lex, You Only Look Once (YOLO), Jetson TX2, ChatGPT

Paradigms

Functional Programming, Data Science, Business Intelligence (BI)

Platforms

Jupyter Notebook, Linux, Amazon EC2, NVIDIA CUDA, AWS Lambda, Databricks, Amazon, Amazon Web Services (AWS), Azure, Kubernetes, Google Cloud Platform (GCP)

Storage

Amazon S3 (AWS S3), Apache Hive, Redis, NoSQL, Amazon DynamoDB, PostgreSQL

Other

Convolutional Neural Networks (CNN), Big Data, Recurrent Neural Networks (RNNs), Neural Networks, Analytics, Natural Language Processing (NLP), Deep Neural Networks, Data Visualization, Deep Reinforcement Learning, Reinforcement Learning, Big Data Architecture, R-trees, Geospatial Data, Data Analytics, Machine Learning, GPT, Generative Pre-trained Transformers (GPT), Data Scientist, Artificial Intelligence (AI), APIs, Technical Leadership, Generative AI, Computer Vision, LangChain, Statistical Analysis, Regression, Financial Modeling, Backtesting Trading Strategies, Statistics, Software, Mathematics, Physics

1991 - 1996

Master of Science Degree in Computer Science

Peter the Great St. Petersburg Polytechnic University - Saint Petersburg, Russia
