Roman Semeine

New York, NY, United States
Member since January 3, 2018
Roman is a data scientist who has also worked as an engineer for a number of years. His diverse background lets him combine low-level data engineering with advanced analytics and cutting-edge artificial intelligence. Lately, he has been heavily involved in machine learning and artificial intelligence projects.
Roman is now available for hire
Portfolio
Experience
  • C++, 15 years
  • SQL, 10 years
  • Big Data Architecture, 5 years
  • Hadoop, 5 years
  • Python, 5 years
  • Deep Neural Networks, 5 years
  • Spark, 3 years
  • TensorFlow, 2 years
Availability
Part-time
Preferred Environment
Linux, Git
The most amazing...
...thing I've coded was a GPU-powered database for the storage of time-series data.
Employment
  • Data Scientist
    2018 - 2018
    180 by Two (via Toptal)
    • Built a geo-attribution system for a big location dataset.
    • Developed algorithms for geo-attribution cleansing and verification.
    • Provided guidelines for geographical data specification using the OpenStreetMap interface.
    Technologies: Spark, Hadoop, Python, Azure
  • Data Scientist
    2016 - 2018
    SteppeChange
    • Developed customer churn models using historical data with Hadoop, Python, and TensorFlow.
    • Improved the churn model performance by 25% using mobile network social data.
    • Built a user-segmentation pipeline based on mobile network historical records using the Spark infrastructure.
    • Created a chatbot ecosystem designed for easy customization and straightforward integration of customer data.
    • Built a 95% accurate gesture recognition pipeline for wearable electronics with TensorFlow.
    Technologies: Python, TensorFlow, Hadoop, C++
  • Data Scientist
    2013 - 2016
    RadiumOne
    • Measured the effectiveness of mobile ad campaigns using geolocation data from hundreds of millions of mobile devices over the campaign's duration (Hadoop, Hive, and Python).
    • Built competitor advertising segments for a major U.S. airline using the terminals' geolocation data.
    • Reduced media expenses by 5% by developing a high-cost media filtering system using deep learning techniques.
    • Designed and implemented a distributed, real-time, GPU-powered time-series database.
    • Designed and implemented a set of tools for processing and visualizing large geographical datasets (C++, CUDA, PHP, and jQuery).
    • Reduced content classification costs by 90% by developing a classification pipeline that identifies content likely to become popular.
    • Developed a model for social data sharing, increasing performance by over 100% for selected audiences.
    Technologies: Hadoop, Hive, C++, Python, CUDA, jQuery
  • Software Architect
    2010 - 2012
    Doctorsoft
    • Gathered the initial requirements and created the application architecture by taking into account the existing restrictions.
    • Estimated the costs for running the application in Amazon Cloud and for the scaling process.
    • Worked on the HIPAA certification, ensuring that use of the Amazon technology stack would meet the requirements.
    • Implemented an integration with an electronic prescribing service provider (eRx).
    Technologies: Java, SQL, AWS, JavaScript
Experience
  • Mobile Customer Segmentation Process (Development)

    I built a customer segmentation process on historical mobile communications data. For this, I used Spark, Hadoop, manual feature engineering, self-organizing maps, and k-means clustering.

  • Mobile Ad Campaign Effectiveness (Development)

    I implemented a framework for measuring the effectiveness of a mobile ad campaign based on geographical data gathered from mobile devices. Here, I mainly used Hadoop, Hive, and OpenStreetMap data.

  • Advertisement Targeting for the Customers of Rival Major Airlines (Development)

    I developed a process for identifying passengers loyal to the competitors of a major US airline and facilitated ad delivery to them. For this project, I used a variety of technologies and data sources: Hadoop, Hive, historical advertising data, US airport locations, flight schedules, and more.

  • Customer Journey Analytics (Development)

    I created a set of customer journey analytics tools for an online retailer with a customer base of approximately 20 million. The goal was to provide analysts with a convenient and painless visualization of an individual customer's history as well as an aggregate view of a subset of customers. Here, I mainly used Hadoop, Hive, MySQL, Python, and jQuery.

  • Conversion Funnel Steps Prediction (Development)

    I built a process for predicting the future conversion funnel steps of an online retailer's customers. The funnel consisted of the conversion sequence starting with a product page view and ending with a product purchase. I chiefly used Python and TensorFlow.

  • Chatbot Development Suite (Development)

    I built a chatbot infrastructure for SteppeChange, which consisted of a dialog definition module, a chatbot runtime, and a number of back-end adapters.
    • The dialog definition module gave the end user the means to define a conversation as a flow diagram.
    • The chatbot runtime extended the flow functionality by means of Python callbacks.
    • The back-end adapters allowed a choice of NLP providers, such as IBM Watson, AWS Lex, and Microsoft's Text Analytics API.
    • The system was also capable of ingesting proprietary data, such as CRM data or a product catalogue, and augmenting the NLP accordingly.

Skills
  • Languages
    C++, Python, PHP, CSS, HTML, C, JavaScript, Erlang, SQL
  • Frameworks
    AWS EMR, Spark, Hadoop
  • Libraries/APIs
    Pandas, NumPy, Microsoft Cognitive Services, SciPy, Keras, Stanford NLP, AWS EC2 API, jQuery, TensorFlow, Node.js
  • Tools
    IBM Watson, Amazon Lex, Git
  • Paradigms
    Functional Programming
  • Platforms
    CUDA, AWS EC2, AWS Lambda, Linux, Jupyter Notebook
  • Storage
    Redis, NoSQL, DynamoDB, AWS RDS, AWS S3, Apache Hive
  • Other
    Convolutional Neural Networks, Natural Language Processing (NLP), R-trees, Geospatial Data, Chatbots, Azure Data Lake, Recurrent Neural Networks, Deep Neural Networks, Neural Networks, Analytics, Big Data, Big Data Architecture, Data Visualization, Deep Reinforcement Learning, Reinforcement Learning
Education
  • Master of Science degree in Computer Science
    1991 - 1996
    Peter the Great St. Petersburg Polytechnic University - Saint Petersburg, Russia
Certifications
  • Private Pilot
    AUGUST 2016 - PRESENT
    FAA | Federal Aviation Administration