Roman Semeine, Big Data Architecture Developer in New York, NY, United States
Member since January 3, 2018
Roman is a data scientist who has also worked as an engineer for a number of years. Due to his diverse background, he can blend low-level data engineering with advanced analytics and cutting-edge artificial intelligence. Lately, he's been heavily involved in machine learning and artificial intelligence projects.

Preferred Environment

Git, Linux

The most amazing...

...thing I've coded was a GPU-powered database for the storage of time-series data.


  • Data Scientist

    2018 - 2018
    180 by Two (via Toptal)
    • Built a geo-attribution system for a big location dataset.
    • Developed algorithms for geo-attribution cleansing and verification.
    • Provided guidelines for geographical data specification using the OpenStreetMap interface.
    Technologies: Azure, Python, Hadoop, Spark
  • Data Scientist

    2016 - 2018
    • Developed customer churn models using historical data with Hadoop, Python, and TensorFlow.
    • Improved the churn model performance by 25% using mobile network social data.
    • Built a user-segmentation pipeline based on mobile network historical records using the Spark infrastructure.
    • Created a chatbot ecosystem designed for easy customization and easy integration of customer data.
    • Built a 95% accurate gesture recognition pipeline for wearable electronics with TensorFlow.
    Technologies: C++, Hadoop, TensorFlow, Python
  • Data Scientist

    2013 - 2016
    • Measured the effectiveness of mobile ad campaigns using geolocation data from hundreds of millions of mobile devices over the campaign's duration (Hadoop, Hive, and Python).
    • Built competitor advertising segments for a major U.S. airline using the terminals' geolocation data.
    • Reduced media expenses by 5% by developing a high-cost media filtering system using deep learning techniques.
    • Designed and implemented a distributed, real-time, GPU-powered time-series database.
    • Designed and implemented a set of tools for processing and visualizing large geographical datasets (C++, CUDA, PHP, and jQuery).
    • Reduced content classification costs by 90% by developing a classification pipeline for identifying content likely to become popular.
    • Developed a model for social data sharing, increasing performance by over 100% for selected audiences.
    Technologies: jQuery, CUDA, Python, C++, Apache Hive, Hadoop
  • Software Architect

    2010 - 2012
    • Gathered the initial requirements and created the application architecture by taking into account the existing restrictions.
    • Estimated the costs for running the application in Amazon Cloud and for the scaling process.
    • Worked on the HIPAA certification, demonstrating that the use of the Amazon technology stack would meet the requirements.
    • Implemented an integration with an electronic prescribing service provider (eRx).
    Technologies: JavaScript, Amazon Web Services (AWS), SQL, Java


  • Mobile Customer Segmentation Process

    I built a customer segmentation process on historical mobile communications data. For this, I used Spark, Hadoop, manual feature engineering, self-organizing maps, and k-means clustering.
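
    To illustrate the k-means step mentioned above, here is a minimal sketch in plain Python rather than Spark. It is not the project's actual pipeline: the two features (monthly call minutes and monthly data usage in GB), the sample values, and the function names are hypothetical, and the deterministic initialization is chosen only to keep the example reproducible.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int,
           iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and centroid update."""
    # Assumption: use the first k points as initial centroids (illustration only).
    centroids: List[Point] = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each customer goes to the nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, labels


# Hypothetical engineered features: (monthly call minutes, monthly data GB).
customers = [(10, 1), (200, 30), (12, 2), (210, 28), (8, 1.5), (190, 32)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

    In practice the features would typically be scaled before clustering, and a distributed implementation would replace the in-memory loops.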

  • Mobile Ad Campaign Effectiveness

    I implemented a framework for measuring the effectiveness of a mobile ad campaign based on geographical data gathered from mobile devices. Here, I mainly used Hadoop, Hive, and OpenStreetMap data.

  • Advertisement Targeting for the Customers of Rival Major Airlines

    I developed a process for identifying passengers loyal to a major US airline's competitors and facilitated advertisement delivery to them. For this project, I used a variety of technologies and data sources: Hadoop, Hive, advertisement historical data, US airport geographical locations, flight schedules, and more.

  • Customer Journey Analytics

    I created a set of tools for customer journey analytics on behalf of an online retailer with a customer base of approximately 20 million. The goal was to provide analysts with a convenient and painless visualization of individual customer history as well as an aggregate view of a subset of customers. Here I mainly used Hadoop, Hive, MySQL, Python, and jQuery.

  • Conversion Funnel Steps Prediction

    I built a process for predicting the future conversion funnel steps of an online retailer's customers. The funnel consisted of a conversion sequence that started with a product page view and ended with a product purchase. I chiefly used Python and TensorFlow.

  • Chatbot Development Suite

    I built a chatbot infrastructure for Stepechange, which consisted of a dialog definition module, a chatbot runtime, and a number of back-end adapters.
    • The dialog definition module gave the end user the means to define a conversation as a flow diagram.
    • The chatbot runtime extended the flow functionality by means of Python callbacks.
    • The back-end adapters allowed selection among different NLP providers, such as IBM Watson, AWS Lex, and Microsoft's Text Analytics API.
    • The system was also capable of ingesting proprietary data, such as CRM records or a product catalogue, and augmenting the NLP accordingly.
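
    The three parts described above (flow definition, runtime with Python callbacks, and pluggable NLP adapters) can be sketched roughly as follows. This is an illustrative outline only, not the actual system: every class, method, and intent name here is hypothetical, and the keyword matcher merely stands in for a real NLP provider.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Protocol


class NLPAdapter(Protocol):
    """Hypothetical adapter interface: maps user text to an intent name."""
    def detect_intent(self, text: str) -> str: ...


class KeywordAdapter:
    """Toy stand-in for a real NLP provider; matches keywords only."""
    def __init__(self, keywords: Dict[str, str]):
        self.keywords = keywords

    def detect_intent(self, text: str) -> str:
        lowered = text.lower()
        for word, intent in self.keywords.items():
            if word in lowered:
                return intent
        return "unknown"


@dataclass
class Node:
    """One step of the flow diagram: a prompt plus intent-labelled edges."""
    prompt: str
    edges: Dict[str, str] = field(default_factory=dict)
    callback: Optional[Callable[[dict, str], None]] = None


class ChatbotRuntime:
    """Walks the flow; callbacks extend a node with custom Python logic."""
    def __init__(self, flow: Dict[str, Node], adapter: NLPAdapter, start: str):
        self.flow, self.adapter, self.current = flow, adapter, start
        self.context: dict = {}

    def handle(self, text: str) -> str:
        node = self.flow[self.current]
        if node.callback is not None:
            node.callback(self.context, text)
        intent = self.adapter.detect_intent(text)
        # Stay on the same node when no edge matches the detected intent.
        self.current = node.edges.get(intent, self.current)
        return self.flow[self.current].prompt


def remember_request(context: dict, text: str) -> None:
    """Example callback: store the raw user message in the session context."""
    context["last_request"] = text


flow = {
    "start": Node("How can I help?", {"order": "order_status"}, remember_request),
    "order_status": Node("What is your order number?"),
}
bot = ChatbotRuntime(flow, KeywordAdapter({"order": "order"}), start="start")
reply = bot.handle("Where is my order?")
print(reply)
```

    Because the runtime depends only on the adapter interface, switching NLP providers in a design like this would mean supplying a different adapter object rather than changing the flow or the runtime.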


  • Languages

    HTML, CSS, Erlang, SQL, C++, C, Python, PHP, JavaScript, Java
  • Frameworks

    AWS EMR, Spark, Hadoop
  • Libraries/APIs

    Keras, jQuery, Stanford NLP, TensorFlow, AWS EC2 API, Pandas, NumPy, Azure Cognitive Services, SciPy, Node.js
  • Tools

    Git, IBM Watson, Amazon Lex
  • Paradigms

    Functional Programming
  • Platforms

    Jupyter Notebook, Linux, AWS EC2, CUDA, AWS Lambda, Amazon Web Services (AWS), Azure
  • Storage

    AWS S3, Apache Hive, Redis, NoSQL, AWS DynamoDB
  • Other

    Convolutional Neural Networks, Azure Data Lake, Big Data, Recurrent Neural Networks, Neural Networks, Analytics, Natural Language Processing (NLP), Deep Neural Networks, Data Visualization, Deep Reinforcement Learning, Reinforcement Learning, Big Data Architecture, R-trees, Geospatial Data, Chatbots, AWS


  • Master of Science degree in Computer Science
    1991 - 1996
    Peter the Great St. Petersburg Polytechnic University - Saint Petersburg, Russia


  • Private Pilot
    FAA | Federal Aviation Administration
