
Adrian Dominiczak

Verified Expert in Engineering

Data Engineer and Developer

Location
Warsaw, Poland
Toptal Member Since
July 21, 2020

Adrian is a senior big data engineer with nearly a decade of professional experience. He started his career as a software engineer at Samsung's R&D center and has since worked on a range of projects, from machine learning and big data engineering in the banking and pharmaceutical industries to big data and cloud architecture at Santander and Lingaro. His areas of expertise lie mainly in Hadoop and Spark.

Portfolio

Roche
Bamboo, GitLab CI/CD, Docker, SQL, Conda, Pandas, Python, YARN, Hadoop, Spark
Lingaro
Spark, Kubernetes, Apache Airflow, Microsoft Power BI, SQL, Python, Redis...
Santander Consumer Technology Services GmbH
Kudu, Apache Hive, SQL, Pandas, Python, Scala, Bash, RHEL, Spark, Cloudera...

Experience

Availability

Full-time

Preferred Environment

IntelliJ IDEA, PyCharm, Linux

The most amazing...

...thing I've done was optimizing a Spark app that measures the accuracy of ML models monitoring the health statuses of the client's machines.

Work Experience

Big Data and ML Engineer

2019 - 2020
Roche
  • Designed, implemented, and productized Spark software that monitors the accuracy of statistical models predicting medical machines' health statuses.
  • Improved the project structure by refactoring existing projects before deployment in the areas of automatic medical document generation and knowledge retrieval from medical documents.
  • Designed and developed solutions for processing, auto-generating, and extracting knowledge from medical-origin documents.
Technologies: Bamboo, GitLab CI/CD, Docker, SQL, Conda, Pandas, Python, YARN, Hadoop, Spark

Big Data Architect | Technical Leader

2019 - 2019
Lingaro
  • Represented a software house and prepared an offer covering the architecture design, scope, and pricing for a project that connected several independent data platforms (with batch and NRT data) to a data mart and dashboards developed in Microsoft Azure.
  • Provided architecture and team lead support in an acquired project.
  • Analyzed the business needs of clients and translated them into technical requirements.
  • Coordinated the project’s development and delivery using the agile methodology.
  • Took part in improving and refactoring code along with mentoring junior developers.
  • Took part in sales activity.
Technologies: Spark, Kubernetes, Apache Airflow, Microsoft Power BI, SQL, Python, Redis, Microsoft Azure

Big Data Architect

2018 - 2019
Santander Consumer Technology Services GmbH
  • Monitored and improved production Hadoop clusters, ETL processes, and resource utilization.
  • Coordinated projects by serving as a single point of contact for stakeholders from the business domain and a team of developers; also monitored, planned, and reported on projects before going live.
  • Mentored and managed a small team of junior developers along with leading the development of a PySpark reporting application using the agile methodology.
  • Set up development environments and tested deployments of software from external providers; also created reports, documentation, and tutorials.
  • Analyzed the architectures, functionalities, and performance of solutions from external providers.
  • Attended meetings with external software providers including managers and architects.
Technologies: Kudu, Apache Hive, SQL, Pandas, Python, Scala, Bash, RHEL, Spark, Cloudera, YARN, HDFS, Hadoop

Big Data and ML Engineer

2017 - 2018
Roche
  • Served as a machine learning and big data expert during the procurement of external software (implemented in AWS) for extracting data from medical-origin documents; also prepared the internal knowledge transfer to a support team.
  • Improved project structures by refactoring existing projects on demand before deployments.
  • Designed and developed solutions for medical-origin document analysis, processing, and auto-generation.
Technologies: Elasticsearch, Bamboo, GitLab CI/CD, Docker, SQL, Conda, Pandas, Python, YARN, Hadoop, Spark

Big Data Engineer

2015 - 2017
mBank S.A.
  • Implemented algorithmic trading software (with an ML approach) that traded live with S&P 500 stocks.
  • Designed and implemented ML-based credit-scoring models.
  • Implemented a web service for custom visualizations of business data hosted on a Hadoop cluster.
Technologies: JavaScript, H2, Play, SQL, R, Scala, Python, Java, Apache Sqoop, YARN, Hadoop, Spark

Software Engineer

2014 - 2015
Samsung Electronics Poland, R&D Center, Artificial Intelligence Group
  • Designed, implemented, and supported a module in an NLP user utterance recognition engine.
  • Implemented a web service platform used internally by linguists as a tool for gathering, cleaning, and tagging data sets used for training machine learning models for NLP (natural language processing).
  • Implemented a knowledge database for closed domain and web scrapers used as sourcing tools.
  • Implemented connectors from Prolog to Java in order to utilize knowledge databases stored in Prolog format in internal Java libraries building statistical models in the NLP domain.
Technologies: Weka, Prolog, JavaScript, SQL, Python, Java

Programmer

2013 - 2014
Polish Academy of Sciences
  • Found a method to accurately recognize and distinguish bone internal structure based on scattered ultrasound signals using machine learning and time series analysis methods.
  • Proposed a new method to recognize skin cancer changes based on ultrasound signals using advanced time series analysis and a complex networks mathematical framework.
  • Applied a novel approach to researching medical-origin time series using mathematical frameworks for mapping between time series and complex networks.
Technologies: Mathematica, MATLAB, Python
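The mapping between time series and complex networks mentioned above can be illustrated with the natural visibility graph, one standard framework for this kind of analysis. This is a hypothetical stdlib-only sketch of the idea, not the actual research code:

```python
def visibility_graph(series):
    """Natural visibility graph: each sample is a node; samples a and b
    are linked if every sample between them lies strictly below the
    straight line connecting (a, series[a]) and (b, series[b])."""
    edges = set()
    n = len(series)
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            # Adjacent samples are always visible (empty inner range).
            if all(series[c] < yb + (ya - yb) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                edges.add((a, b))
    return edges
```

Structural properties of the resulting graph (degree distribution, clustering) then characterize the original signal, which is the gist of the time-series-to-network mapping.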

Algorithmic Trading

Project: A machine learning-based algorithmic trading app running live on S&P 500 stocks.

I implemented modules for training models and using them for daily predictions, and took part in discussions about mathematical approaches to portfolio handling and rebalancing. I also integrated data from a range of sources: the internet, data providers, and so on.
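The daily rebalancing step described above can be sketched as follows. All names here are hypothetical and fractional shares are allowed for simplicity; this is an illustration of the pattern, not the production trading code:

```python
def rebalance_orders(prices, positions, target_weights, cash):
    """Compute share deltas that move the portfolio toward the
    model-derived target weights at current prices."""
    # Total portfolio equity: cash plus market value of holdings.
    equity = cash + sum(prices[s] * positions.get(s, 0) for s in prices)
    orders = {}
    for symbol, weight in target_weights.items():
        target_shares = equity * weight / prices[symbol]
        delta = target_shares - positions.get(symbol, 0)
        if abs(delta) > 1e-9:  # skip positions already at target
            orders[symbol] = delta
    return orders
```

A daily prediction module would produce `target_weights` from the trained model, and these deltas would then be translated into orders.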

Data Mart in MS Azure with a Dashboard

Project: An MS Azure cloud solution that fits the needs of the client.

I architected, designed, and supported the development of an MS Azure cloud solution that synced independent data platforms with various frequencies of generated data, from one-day batches to NRT. I also designed the ETL pipelines, data storage, data mart, and a fast, efficient dashboarding solution.

Statistical Models Validation Software

Project: A Spark-based application that computes measures describing the accuracy of statistical models predicting medical machines' health statuses from sensor data.

I optimized and implemented the advanced PoC algorithm and prepared it for production deployment.
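The kind of per-machine accuracy measure such a validation job computes can be sketched in plain Python. In the real project the aggregation ran in Spark over sensor data; the names below are illustrative:

```python
import math
from collections import defaultdict

def rmse_per_machine(records):
    """records: (machine_id, predicted, actual) tuples, e.g. model
    predictions of health-status scores vs. observed sensor values.
    Returns machine_id -> RMSE, a per-entity accuracy measure."""
    sq_err = defaultdict(list)
    for machine, pred, actual in records:
        sq_err[machine].append((pred - actual) ** 2)
    return {m: math.sqrt(sum(errs) / len(errs))
            for m, errs in sq_err.items()}
```

In Spark the same computation would be a group-by on the machine ID followed by the squared-error aggregation, which distributes trivially.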

User Utterance Recognition

Project: A Java-based framework in natural language processing for a leading electronic manufacturer.

I designed and implemented the framework in plain Java; it was intended to be used as an internal library. The framework performed sentence recognition using a hybrid engine fed by both machine-learning-based and rule-based predictors.
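The hybrid-engine idea, with deterministic rules taking precedence over a statistical predictor, can be sketched like this. This is a hypothetical Python illustration of the pattern, not the original Java code:

```python
def hybrid_recognize(utterance, rules, ml_predict, threshold=0.5):
    """Hybrid utterance recognition: rule-based predictors fire first;
    if no rule matches, fall back to a statistical predictor and accept
    its label only above a confidence threshold."""
    for pattern, intent in rules:
        if pattern in utterance.lower():
            return intent
    label, confidence = ml_predict(utterance)
    return label if confidence >= threshold else "unknown"
```

Combining the two predictor families this way keeps high-precision rules authoritative while the ML model handles the long tail of phrasings.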
Education

2011 - 2014

Master of Science (MSc) Degree in Applied Physics

Warsaw University of Technology, Faculty of Physics - Warsaw, Poland

2008 - 2011

Bachelor of Science (BSc) Degree in Physics

Warsaw University of Technology, Faculty of Physics - Warsaw, Poland

Certifications

JUNE 2020 - PRESENT

Essential Google Cloud Infrastructure: Foundation

Coursera

JUNE 2020 - PRESENT

Google Cloud Platform Fundamentals: Core Infrastructure

Coursera

JUNE 2020 - PRESENT

Essential Google Cloud Infrastructure: Core Services

Coursera

Libraries/APIs

Pandas

Tools

PyCharm, IntelliJ IDEA, MATLAB, Mathematica, Weka, Apache Sqoop, GitLab CI/CD, Bamboo, Cloudera, Kudu, Microsoft Power BI, Apache Airflow

Frameworks

Spark, Hadoop, YARN, Play

Paradigms

ETL Implementation & Design, ETL, Data Science

Languages

Python, Java, SQL, JavaScript, Prolog, Scala, R, Bash

Platforms

Amazon Web Services (AWS), Google Cloud Platform (GCP), Linux, Docker, Kubernetes

Storage

Databases, H2, Elasticsearch, HDFS, Apache Hive, Redis

Other

Big Data, Data Analytics, Data Engineering, Big Data Architecture, Applied Mathematics, Machine Learning, Statistics, Computational Physics, Conda, RHEL, Microsoft Azure
