Data Engineer
2019 - 2021
Fortune 500 Retail Company (via Toptal)
- Redesigned a Spark application to make it more robust and flexible for data scientists and software engineers.
- Researched ways to improve Spark's Parquet write performance to S3, using Spark History Server and Ganglia to analyze jobs, pinpoint where time was spent, and compare configuration fixes.
- Implemented a robust and generic test framework for Spark pipelines.
- Redesigned a monolithic Spark application into a modular one that writes intermediate results and can run in parallel.
- Built a Spark application that ingests 1 TB of data from multiple sources and generates 40,000 candidate features, which the data science team uses to perform EDA and test models.
- Implemented new application APIs to improve other teams' productivity.
Technologies: Jupyter Notebook, Pandas, Amazon Web Services (AWS), Parquet, Python, PySpark
Python Developer
2018 - 2019
Reconstrukt (via Toptal)
- Implemented a concurrent orchestrator for a real-time video rendering system using Python and Tornado.
- Built the content-processing pipelines using HTTP, WebSockets, raw TCP connections, AWS S3, and NAS storage.
Technologies: Amazon Web Services (AWS), asyncio, Tornado, Python
Data Engineer
2016 - 2018
Spyhce
- Built three Spark-based applications that match mutable objects (which users can create and update) against millions of immutable objects in real time, or as close to it as possible. Additional details about the project can be found in my portfolio section.
- Built a Django task-management application on top of Celery that lets administrators manage tasks and view progress and statistics without a separate monitoring service.
- Developed a Django audit application, built on the Django ORM, that tracks all user actions.
Technologies: Jenkins, Docker, Cassandra, Redis, Elasticsearch, PostgreSQL, Apache Kafka, RabbitMQ, Celery, Django, Python, Spark
Software Engineer
2012 - 2016
Hewlett-Packard
- Developed a Python-based build system for a virtual appliance that lets HP customers deploy the product into production with little effort.
- Maintained the project.
- Introduced time-windowed software and patch installation so that server agents avoid loading or rebooting servers during critical hours.
- Led the upgrade from SSL to TLS between all server and client components.
- Redesigned how server agents select the IP address they use to communicate with the core components.
- Refactored legacy code to support custom installation paths for the Windows server agent.
Technologies: OpenSSL, Windows, Unix, Linux, PostgreSQL, Oracle Database, C++, Python, Spring, WebLogic, WildFly, Java
Software Developer
2012 - 2012
GFI Software
- Improved patch download speed by using the caches of LAN neighbors.
- Improved the build system's user experience.
- Maintained the project.
- Redesigned the product's legacy architecture so it could be easily extended with new features.
- Added a feature for discovering Android and iOS devices inside the LAN.
Technologies: Microsoft SQL Server, Delphi, .NET, C#, C++