Marc Matt

Data Engineer and Developer in Hamburg, Germany

Member since October 27, 2020
Marc is a data engineer with a passion for data and 15+ years of experience leading teams and building data platforms, focusing on the information technology, real estate, and services industries. He created a Python-based AVRO schema generator that makes parts of a schema reusable. Marc excels at automation, integrations, analysis, model building, statistics, big data, CI/CD pipelines, and data modeling.

Portfolio

  • Food Marketing Company
    Talend, JSON, Redshift Spectrum, Redshift
  • Janus
    AWS Glue, Spark, SQL, Amazon Aurora, Python
  • Emma
    Python, AWS Kinesis, AWS, Redshift, Redshift Spectrum...

Experience

Location

Hamburg, Germany

Availability

Part-time

Preferred Environment

Apache Airflow, Tableau Server, Tableau, SQL, Pandas, Python, Apache Beam, Git, Linux

The most amazing...

...app I've developed provides pose estimation data in real time to help customers optimize their fitness goals.

Employment

  • ETL Engineer

    2021 - 2021
    Food Marketing Company
    • Parsed JSON data in Talend and loaded it into Redshift.
    • Integrated data from Web APIs with Talend into Redshift.
    • Transformed customer data using Talend and loaded it into Salesforce.
    Technologies: Talend, JSON, Redshift Spectrum, Redshift
  • Data Engineer

    2021 - 2021
    Janus
    • Translated legacy ETL pipelines into scalable AWS Glue jobs (see the Glue job sketch after this employment list).
    • Automated resource deployment using AWS CloudFormation.
    • Designed and built a PySpark framework that makes adding future pipelines easier.
    Technologies: AWS Glue, Spark, SQL, Amazon Aurora, Python
  • Senior Data Engineer

    2021 - 2021
    Emma
    • Designed a new data ingestion API for the data platform to enable streaming analytics.
    • Set up a binlog streaming process with real-time event parsing using Kinesis, Lambda, and Kinesis Firehose (see the handler sketch after this employment list).
    • Optimized data loading in Redshift by analyzing queries and tables and adding optimized sort and distribution keys.
    Technologies: Python, AWS Kinesis, AWS, Redshift, Redshift Spectrum, Matillion ETL for Redshift, AWS Lambda, Parquet, AWS Fargate, Docker, Databases
  • Data Specialist

    2020 - 2021
    Ear-Reality GmbH
    • Developed a data lake based on Kinesis and Athena, including embedded reporting in Metabase.
    • Shifted the production system to a serverless, scalable architecture.
    • Automated load testing of the application using Python and Locust.io (see the Locust sketch after this employment list).
    Technologies: Amazon Web Services (AWS), SQL, AWS Kinesis, Amazon Athena, AWS Elastic Beanstalk, Docker, Python, AWS CloudFormation, Databases, Data Reporting, Business Intelligence (BI)
  • Senior Data Engineer

    2018 - 2020
    Engel & Völkers
    • Designed and built a data platform, including tool selection and data modeling.
    • Built a TensorFlow model to predict property values in a real-time environment.
    • Implemented CI/CD pipelines to automatically deploy all features of the data platform.
    Technologies: Jenkins, SQL, Tableau, BigQuery, Apache Beam, Apache Airflow, TensorFlow, Google Kubernetes Engine (GKE), Docker, Python, Data Engineering, Data Architecture, Data Analysis, NoSQL, Google BigQuery, Data Pipelines, ETL, Data Warehousing, Data Warehouse Design, Database Modeling, Data Modeling, Google Cloud Platform (GCP), Google Cloud SQL, Data Science, Databases, Data Reporting, Business Intelligence (BI)
  • Head of Data Engineering/Machine Learning

    2014 - 2018
    Surf Media
    • Led a team of six and was responsible for their personal development.
    • Designed big data systems and data lakes, including tool selection and data modeling.
    • Designed data pipelines and selected models for recommendation engines and fraud recognition systems that work in a real-time environment.
    • Created the technology roadmap. Oversaw the advancement of all affected data systems.
    Technologies: TensorFlow, RabbitMQ, Apache Avro, Tableau, Hortonworks Data Platform (HDP), SQL, Apache NiFi, Apache HAWQ, Talend, Python, Data Engineering, PostgreSQL, AWS S3, AWS Lambda, Data Architecture, Amazon Web Services (AWS), NoSQL, Data Pipelines, ETL, Data Warehousing, Data Warehouse Design, Database Modeling, Data Modeling, Talend ETL, Data Science, Databases, Data Reporting, Business Intelligence (BI)
  • Business Intelligence Analyst

    2012 - 2014
    Surf Media
    • Designed, developed, and operated a DWH for the company group consisting of five companies.
    • Developed a statistical model for predicting orders.
    • Analyzed customers to understand how best to optimize revenue in a social network.
    Technologies: KNIME, RapidMiner, Tableau, Perl, Python, MySQL, Data Engineering, PostgreSQL, Data Architecture, Amazon Web Services (AWS), Data Pipelines, ETL, Data Warehousing, Data Warehouse Design, Database Modeling, Data Modeling, Talend ETL, Databases, Data Reporting, Business Intelligence (BI)
  • Database Consultant

    2010 - 2012
    EOS Information Services GmbH
    • Designed, developed, and operated a DWH for a Decision Engine used in risk management.
    • Designed processes for risk management.
    • Designed and developed a process for managing addresses using Perl and Uniserv.
    Technologies: Oracle, Java, Perl, Uniserv, Data Engineering, Data Architecture, Data Analysis, Data Pipelines, ETL, Data Warehousing, Data Warehouse Design, Database Modeling, Databases, Business Intelligence (BI)
  • Data Warehousing Consultant

    2009 - 2010
    Key-Work Consulting GmbH
    • Migrated the sales reporting for a mail-order company.
    • Developed a statistical model to optimize the sales planning of a mail-order company.
    • Built a statistical model for a dynamic shipping schedule.
    Technologies: RapidMiner, FastStats, Python, SQL, SQL Server 2010, Data Engineering, Data Analysis, Data Pipelines, ETL, Data Warehousing, Data Warehouse Design, Database Modeling, Data Modeling, Databases, Data Reporting, Business Intelligence (BI)
  • Database Management

    2008 - 2009
    Coxulto Marketing Solutions GmbH
    • Defined and selected target groups for marketing campaigns.
    • Performed an affinity analysis of the entire customer base.
    • Administered and operated the address database, including duplicate elimination.
    Technologies: Perl, SQL, SAS, Data Engineering, Data Analysis, ETL, Data Warehouse Design, Data Warehousing, Databases, Data Reporting, Business Intelligence (BI)
  • Lead of Business Intelligence Consumer Products

    2007 - 2008
    1&1 Internet AG
    • Coordinated and prioritized all tasks of the Business Intelligence team.
    • Designed and developed KPI reports for the board of directors.
    • Analyzed customer structures and built a model for churn prediction.
    Technologies: Sybase, Java, Perl, Data Engineering, Data Analysis, ETL, Data Warehouse Design, Data Warehousing, Database Modeling, Data Modeling, Databases, Data Reporting, Business Intelligence (BI)
  • Business Intelligence Analyst

    2003 - 2007
    1&1 Internet AG
    • Designed and developed an automated reporting system for customer and contract inventory, as well as internet usage and customer behavior.
    • Integrated the customer usage data of the company websites into the DWH.
    • Coordinated all tasks between management and development departments.
    • Analyzed all new and existing customer campaigns for effectiveness.
    Technologies: Sybase, Java, MySQL, Perl, Data Engineering, Data Analysis, ETL, Data Warehouse Design, Data Warehousing, Databases, Data Reporting, Business Intelligence (BI)
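
The roles above lean on a few recurring patterns; sketches of three of them follow. First, for the Janus work, a minimal AWS Glue job in PySpark that reads a cataloged table, applies a column mapping, and writes Parquet to S3. The database, table, and bucket names are hypothetical placeholders, not the actual project's.

    # Minimal AWS Glue job sketch: catalog read -> column mapping -> Parquet on S3.
    # Database, table, and bucket names are hypothetical placeholders.
    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the source table registered in the Glue Data Catalog.
    source = glue_context.create_dynamic_frame.from_catalog(
        database="legacy_db", table_name="orders"
    )

    # Apply a simple column mapping; a real pipeline plugs in its own transforms.
    mapped = source.apply_mapping(
        [("order_id", "string", "order_id", "string"),
         ("amount", "double", "amount", "double")]
    )

    # Write the result to S3 as Parquet.
    glue_context.write_dynamic_frame.from_options(
        frame=mapped,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/orders/"},
        format="parquet",
    )
    job.commit()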
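
Second, for the binlog streaming at Emma, a minimal sketch of a Lambda handler that decodes and parses change events arriving from a Kinesis stream. The payload fields ("table" and "row") are assumptions; a real binlog producer defines its own event format.

    # Minimal Lambda handler sketch for a Kinesis event source.
    # The payload shape ({"table": ..., "row": ...}) is an assumption.
    import base64
    import json

    def handler(event, context):
        parsed = []
        for record in event["Records"]:
            # Kinesis payloads arrive base64-encoded inside the Lambda event.
            payload = base64.b64decode(record["kinesis"]["data"])
            change = json.loads(payload)
            parsed.append({"table": change.get("table"), "row": change.get("row")})
        # Downstream, events like these were batched toward Redshift via Firehose.
        return {"processed": len(parsed)}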
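
Third, for the automated load testing at Ear-Reality, a minimal Locust test file; the endpoint paths are hypothetical placeholders. A file like this runs with "locust -f loadtest.py --host=https://example.com".

    # Minimal Locust load-test sketch; endpoint paths are placeholders.
    from locust import HttpUser, between, task

    class ApiUser(HttpUser):
        # Each simulated user pauses one to three seconds between requests.
        wait_time = between(1, 3)

        @task(3)
        def get_items(self):
            self.client.get("/api/items")

        @task(1)
        def post_event(self):
            self.client.post("/api/events", json={"type": "ping"})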

Experience

  • AVRO Schema Generator
    https://gitlab.com/datascientists.info/avro-generator

    A Python-based AVRO schema generator I developed that makes parts of a schema reusable, which is useful because AVRO does not provide this capability by itself.

    If certain data structures are used in several schemas, this tool lets them be defined once and reused across all of them (a sketch of the reuse idea follows this project list).

  • Evaluation of Property Value

    A Python/TensorFlow-based deep learning model and API I built to predict property prices based on geolocation and other attributes. Values are predicted in real time through a Flask REST API integrated into the client's website (a minimal endpoint sketch follows this project list).

  • Design and Set-up of Data Platform

    A platform consolidating all relevant data of a social media company; I designed it and helped set up the various tools involved. The platform provided real-time access to all data for operational decision support as well as for analytic workloads.
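
To illustrate the reuse idea behind the AVRO schema generator, here is a sketch in which a shared record is defined once and spliced into a schema before validation with fastavro. The "$use" placeholder convention is invented for this example; the actual generator linked above may work differently.

    # Sketch of reusable AVRO schema parts via placeholder substitution.
    # The "$use" convention is hypothetical, not the generator's actual syntax.
    import copy

    from fastavro import parse_schema

    SHARED = {
        "address": {
            "type": "record", "name": "Address",
            "fields": [{"name": "city", "type": "string"},
                       {"name": "zip", "type": "string"}],
        }
    }

    def expand(schema):
        """Recursively replace {"$use": name} placeholders with shared parts."""
        if isinstance(schema, dict):
            if "$use" in schema:
                return copy.deepcopy(SHARED[schema["$use"]])
            return {key: expand(value) for key, value in schema.items()}
        if isinstance(schema, list):
            return [expand(item) for item in schema]
        return schema

    customer = expand({
        "type": "record", "name": "Customer",
        "fields": [{"name": "id", "type": "long"},
                   {"name": "address", "type": {"$use": "address"}}],
    })
    parse_schema(customer)  # raises if the expanded schema is invalid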
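
In the same spirit, a minimal Flask endpoint serving real-time predictions from a saved TensorFlow model could look like the sketch below; the model file name and feature layout are assumptions, not the client's actual setup.

    # Minimal Flask prediction endpoint sketch; file name and features assumed.
    import numpy as np
    import tensorflow as tf
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    model = tf.keras.models.load_model("property_model.keras")  # hypothetical path

    @app.route("/predict", methods=["POST"])
    def predict():
        body = request.get_json()
        # Assumed feature layout: latitude, longitude, living area in sqm.
        features = np.array([[body["lat"], body["lon"], body["sqm"]]])
        price = float(model.predict(features)[0][0])
        return jsonify({"predicted_price": price})

    if __name__ == "__main__":
        app.run(port=5000)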

Skills

  • Languages

    Python, SQL, Perl, Java, XML
  • Tools

    BigQuery, Apache HAWQ, Apache Avro, Git, Apache Beam, Tableau, Apache Airflow, Jenkins, Apache NiFi, RabbitMQ, Microsoft Excel, Google Kubernetes Engine (GKE), Talend ETL, Amazon Athena, AWS CloudFormation, Redshift Spectrum, Matillion ETL for Redshift, AWS Fargate, AWS Glue
  • Paradigms

    ETL, Business Intelligence (BI), Data Science
  • Storage

    MySQL, Google Cloud, Database Modeling, Databases, SQL Server 2010, Data Pipelines, AWS S3, PostgreSQL, Google Cloud SQL, Apache Hive, HDFS, NoSQL, Redshift, Amazon Aurora, JSON
  • Other

    Data Visualization, Data Analysis, Data Architecture, Data Engineering, Data Warehousing, Data Modeling, Data Warehouse Design, Data Reporting, Tableau Server, Google BigQuery, AWS, Data Profiling, Parquet
  • Libraries/APIs

    Pandas, TensorFlow
  • Platforms

    Linux, Docker, Talend, Hortonworks Data Platform (HDP), Oracle, Amazon Web Services (AWS), AWS Lambda, Google Cloud Platform (GCP), AWS Kinesis, AWS Elastic Beanstalk
  • Frameworks

    Flask, Django, Spark

Certifications

  • Google Cloud Certified - Professional Data Engineer
    AUGUST 2019 - AUGUST 2021
    Google
