Lukasz Jaworowski, Web Scraping Developer in Warsaw, Poland
Member since November 21, 2016
Lukasz has been a Python developer since 2015 and is a web scraping expert who focuses on Python and cloud computing. He contributes across the full product life cycle, from idea generation to maintenance, and specializes in big data processing with PySpark and advanced data analytics on AWS. Lukasz has experience in dynamic startup environments and corporate projects for international clients.
Location

Warsaw, Poland

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), PyCharm, Linux

The most amazing...

...thing I've designed and developed was an efficient distributed data processing system built with Python, Celery, and RabbitMQ.

Employment

  • AWS/Python Developer

    2020 - 2020
    Freelance (Three-Month Project for a Startup in the Education Sector)
    • Developed AWS Lambda functions in Python to extract data from PDF files using Apache Tika, PyMuPDF, PDF2text, and other tools.
    • Developed an API in Django REST Framework for data management, with custom filters using Elasticsearch on AWS.
    • Developed the front end in Vue.js with Amplify, integrated with the DRF back end.
    • Managed app deployment and hosting on AWS using Elastic Beanstalk, Chalice, Amplify, and AWS Cognito.
    • Set up the environment on AWS: VPC networking, Elasticsearch service, Elastic Beanstalk, Route53, S3, Lambda, and AWS Cognito.
    Technologies: Amazon Web Services (AWS), Pandas, Django REST Framework, Amazon Cognito, Django, Tika, AWS Lambda, Data Extraction, Python 3, Vue.js
  • Python Developer

    2019 - 2020
    GFT Technologies
    • Developed a GraphQL API for a top European fintech company using Python, Django, and Graphene.
    • Discovered and fixed performance issues related to Celery and database operations. Built user notification modules.
    • Developed PySpark ETL jobs for one of the largest financial institutions.
    • Set up a project from scratch, from repository creation to Git hooks, CI/CD, and deployment.
    • Developed a REST API for dataset management using Python and FastAPI.
    • Supported application deployment to Pivotal Cloud Foundry.
    • Created unit and integration tests for the PySpark ETL jobs and the FastAPI app.
    • Conducted cross-team Python learning sessions for groups of 50-100 developers and analysts. Topics included core Python, clean code, and Git webhooks.
    Technologies: Amazon Web Services (AWS), Pandas, Django REST Framework, Python 3, Spark, PCF, Linux, Jenkins, PySpark, Docker, GraphQL, Django, Python
  • Python Developer

    2018 - 2019
    Appriss
    • Developed a Django app integrated with the Jira platform to support the development of the web scraping software.
    • Developed ETL scripts to process complex incarceration data.
    • Migrated ETL scripts from AWS Lambda functions to an on-premises Rancher + Docker environment.
    • Created over 150 web scraping spiders in Scrapy (Python).
    • Developed unit and integration tests. Performed code reviews.
    • Created data dashboards in Dash (Python) to present web scraping result statistics.
    • Migrated data from external PDF, Excel, and Docx files to on-premises systems.
    Technologies: Amazon Web Services (AWS), Pandas, Django REST Framework, Python 3, Web Scraping, Web Crawlers, Docker, SQL, Scrapy, Django, Python
  • Python Developer

    2015 - 2018
    Vacancysoft
    • Developed a core web scraping system to gather data about the latest job offers in Europe.
    • Replaced a third-party web scraping and data processing software with Python applications.
    • Automated various operations processes in Python, Celery, and Django. Moved existing Linux scripts to a more maintainable Python environment.
    • Introduced CI/CD to automatically deploy Celery applications.
    • Developed extract, transform, load (ETL) pipelines in Celery and RabbitMQ.
    • Provided in-house training to junior developers on topics like ETL, Python, and SQL.
    • Developed a job title classifier in Python with the NLTK library.
    Technologies: Amazon Web Services (AWS), Pandas, Django REST Framework, Python 3, Web Scraping, Web Crawlers, Celery, RabbitMQ, Selenium, Scrapy, Django, Python
  • Developer

    2017 - 2017
    DreamLab
    • Developed web applications for one of the biggest media groups in Poland.
    • Prepared data migration scripts for article data stored in Solr.
    • Developed web app components in jQT, Less, and Node.js.
    Technologies: CSS, HTML, Node.js

Experience

  • Recruitment Platform - Applicant Tracking System - Bachelor's Thesis (2018)

    I was responsible for the database design, project specs, and back-end development.
    The API was written in Python and Django REST framework, and it was hosted on the AWS stack (EC2 + RDS).
    Two other developers worked on the web/iOS clients for this API.
    The application provided the following functionality:
    Recruiters were able to:
    - Add company profiles
    - Add jobs to company profiles
    - Invite other recruiters to a company
    - Describe recruitment steps for a job offer
    - Describe recruitment step templates for various job types
    - Move a candidate to the next recruitment step or reject the candidate
    - Contact candidates via chat
    Candidates were able to:
    - See a list of jobs
    - Apply for a job
    - Create a candidate profile
    - See various dashboards
    - See current progress in pending recruitment processes

Skills

  • Languages

    Python 3, Python, Python 2, SQL, PCF, HTML, CSS, GraphQL
  • Frameworks

    Django, Scrapy, Django REST Framework, Spark, Selenium
  • Tools

    Celery, Amazon SQS, PyCharm, Amazon Cognito, RabbitMQ, Jenkins
  • Platforms

    AWS Lambda, Docker, Linux, Amazon Web Services (AWS)
  • Other

    APIs, Scraping, Data Extraction, Web Scraping, Web Crawlers, AWS
  • Libraries/APIs

    Pandas, Node.js, Vue.js, Tika, Vue.js 2, PySpark, Asyncio
  • Paradigms

    ETL, REST
  • Storage

    MySQL, PostgreSQL

Education

  • Bachelor of Science degree in Computer Science
    2014 - 2018
    Polish-Japanese Academy of Information Technology - Warsaw, Poland

Certifications

  • Amazon Web Services Solutions Architect Associate
    December 2019 - December 2022
    Amazon Web Services (AWS)
  • Amazon Web Services Cloud Practitioner
    July 2019 - July 2020
    Amazon Web Services (AWS)
