Roman Semeine

New York, NY, United States
Member since January 3, 2018
Roman is a data scientist who has also worked as an engineer for a number of years. Thanks to this diverse background, he can blend low-level data engineering with advanced analytics and cutting-edge artificial intelligence. Lately, he's been heavily involved in machine learning and artificial intelligence projects.
Roman is now available for hire
  • C++, 15 years
  • SQL, 10 years
  • Big Data Architecture, 5 years
  • Hadoop, 5 years
  • Python, 5 years
  • Deep Neural Networks, 5 years
  • Spark, 3 years
  • TensorFlow, 2 years
Preferred Environment
Linux, Git
The most amazing...
...thing I've coded was a GPU-powered database for the storage of time-series data.
  • Data Scientist
    2018 - 2018
    180 by Two (via Toptal)
    • Built a geo-attribution system for a big location dataset.
    • Developed algorithms for geo-attribution cleansing and verification.
    • Provided guidelines for geographical data specification using the OpenStreetMap interface.
    Technologies: Spark, Hadoop, Python, Azure
  • Data Scientist
    2016 - 2018
    • Developed customer churn models using historical data with Hadoop, Python, and TensorFlow.
    • Improved the churn model performance by 25% using mobile network social data.
    • Built a user-segmentation pipeline based on mobile network historical records using the Spark infrastructure.
    • Created a chatbot ecosystem designed for easy customization and straightforward integration of customer data.
    • Built a 95% accurate gesture recognition pipeline for wearable electronics with TensorFlow.
    Technologies: Python, TensorFlow, Hadoop, C++
  • Data Scientist
    2013 - 2016
    • Measured the effectiveness of mobile ad campaigns using geolocation data from hundreds of millions of mobile devices over the campaign's duration (Hadoop, Hive, and Python).
    • Built competitor advertising segments for a major U.S. airline using the terminals' geolocation data.
    • Reduced media expenses by 5% by developing a high-cost media filtering system using deep learning techniques.
    • Designed and implemented a distributed, real-time, GPU-powered time-series database.
    • Designed and implemented a set of tools for processing and visualizing large geographical datasets (C++, CUDA, PHP, and jQuery).
    • Reduced content classification costs by 90% by developing a classification pipeline for identifying future popular content.
    • Developed a model for social data sharing, increasing performance by over 100% for selected audiences.
    Technologies: Hadoop, Hive, C++, Python, CUDA, jQuery
  • Software Architect
    2010 - 2012
    • Gathered the initial requirements and created the application architecture by taking into account the existing restrictions.
    • Estimated the costs for running the application in Amazon Cloud and for the scaling process.
    • Worked on the HIPAA certification, demonstrating that the use of the Amazon technology stack would meet the requirements.
    • Implemented an integration with an electronic prescribing service provider (eRx).
    Technologies: Java, SQL, Amazon AWS, JavaScript
  • Mobile Customer Segmentation Process (Development)

    I built a customer segmentation process on historical mobile communications data. For this, I used Spark, Hadoop, manual feature engineering, self-organizing maps, and k-means clustering.

  • Mobile Ad Campaign Effectiveness (Development)

    I implemented a framework for measuring the effectiveness of a mobile ad campaign based on geographical data gathered from mobile devices. Here, I mainly used Hadoop, Hive, and OpenStreetMap data.

  • Advertisement Targeting for the Customers of Rival Major Airlines (Development)

    I developed a process for the identification of passengers loyal to a major US airline's competitors and facilitated the advertisement delivery to such people. For this project, I used a variety of technologies: Hadoop, Hive, advertisement historical data, US airport geographical locations, flight schedules, and more.

  • Customer Journey Analytics (Development)

    I created a set of customer journey analytics tools for an online retailer with a customer base of approximately 20 million. The goal was to give analysts a convenient, painless visualization of an individual customer's history as well as an aggregate view of a subset of customers. Here, I mainly used Hadoop, Hive, MySQL, Python, and jQuery.

  • Conversion Funnel Steps Prediction (Development)

    I built a process for predicting the future conversion funnel steps of an online retailer's customers. The funnel was the conversion sequence that started with a product page view and ended with a product purchase. I chiefly used Python and TensorFlow.

  • Chatbot Development Suite (Development)

    I built a chatbot infrastructure for Stepechange, which consisted of a dialog definition module, a chatbot runtime, and a number of back-end adapters.
    • The dialog definition module gave the end user the means to define a conversation as a flow diagram.
    • The chatbot runtime extended the flow functionality by means of Python callbacks.
    • The back-end adapters allowed a choice among different NLP providers, such as IBM Watson, AWS Lex, and Microsoft's Text Analytics API.
    • The system was also capable of ingesting proprietary data, such as CRM records or a product catalogue, and augmenting the NLP accordingly.
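
    The three components described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the actual system: the names NLPAdapter, KeywordAdapter, Node, ChatbotRuntime, and remember_message are all hypothetical, and a simple keyword matcher stands in for a real NLP provider.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Protocol


class NLPAdapter(Protocol):
    """Hypothetical back-end adapter interface: maps raw text to an intent label."""

    def detect_intent(self, text: str) -> str: ...


class KeywordAdapter:
    """Stand-in adapter based on keyword matching (assumption: a real adapter
    would call an external NLP provider instead)."""

    def __init__(self, keywords: Dict[str, str]) -> None:
        self.keywords = keywords

    def detect_intent(self, text: str) -> str:
        lowered = text.lower()
        for keyword, intent in self.keywords.items():
            if keyword in lowered:
                return intent
        return "unknown"


@dataclass
class Node:
    """One step of the dialog flow: a prompt plus intent-labelled edges."""

    prompt: str
    edges: Dict[str, str] = field(default_factory=dict)
    callback: Optional[Callable[[dict, str], None]] = None


class ChatbotRuntime:
    """Walks the flow diagram, running each node's optional Python callback."""

    def __init__(self, flow: Dict[str, Node], adapter: NLPAdapter, start: str) -> None:
        self.flow = flow
        self.adapter = adapter
        self.current = start
        self.context: dict = {}

    def prompt(self) -> str:
        return self.flow[self.current].prompt

    def handle(self, user_text: str) -> str:
        node = self.flow[self.current]
        if node.callback is not None:
            node.callback(self.context, user_text)
        intent = self.adapter.detect_intent(user_text)
        # Stay on the current node when the intent has no outgoing edge.
        self.current = node.edges.get(intent, self.current)
        return self.prompt()


def remember_message(context: dict, user_text: str) -> None:
    """Example callback: stores the user's message in the shared context."""
    context["last_message"] = user_text


# The dialog definition: a small flow diagram expressed as a dictionary.
flow = {
    "start": Node("How can I help?", {"order": "order", "bye": "end"}, remember_message),
    "order": Node("What would you like to order?", {"bye": "end"}),
    "end": Node("Goodbye!"),
}
adapter = KeywordAdapter({"order": "order", "buy": "order", "bye": "bye"})
bot = ChatbotRuntime(flow, adapter, start="start")
```

    In this sketch, swapping NLP providers means passing a different object with a detect_intent method to ChatbotRuntime; the flow definition and callbacks stay unchanged.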

  • Languages
    Python, C++, SQL, Erlang, PHP, JavaScript, C, HTML, CSS
  • Frameworks
    Hadoop, Spark, AWS EMR
  • Libraries/APIs
    TensorFlow, jQuery, AWS EC2 API, Stanford NLP, Keras, Pandas, NumPy, SciPy, Microsoft Cognitive Services, Node.js
  • Tools
    Git, IBM Watson, Amazon Lex
  • Paradigms
    Functional Programming
  • Platforms
    Jupyter Notebook, Linux, AWS Lambda, AWS EC2, CUDA
  • Storage
    Apache Hive, AWS S3, AWS RDS, AWS DynamoDB, NoSQL, Redis
  • Other
    Analytics, Neural Networks, Deep Neural Networks, Recurrent Neural Networks, Natural Language Processing (NLP), Reinforcement Learning, Deep Reinforcement Learning, Data Visualization, Big Data, Big Data Architecture, Azure Data Lake, R-trees, Geospatial Data, Chatbots, Convolutional Neural Networks
  • Master of Science degree in Computer Science
    1991 - 1996
    Peter the Great St. Petersburg Polytechnic University - Saint Petersburg, Russia
  • Private Pilot
    FAA | Federal Aviation Administration