- Associate Software Engineer | Trifacta Inc. | 2016 - 2017
Technologies: Node.js, Java, Python, C++, Docker, Google Cloud Storage, Google Dataflow, BigQuery
- Wrote and optimized algorithms for computing data transformation primitives on GCP’s Dataflow engine for parallel data processing.
- Developed a time scheduling microservice based on Java Quartz, designed for high availability and resilience.
- Integrated Google’s BigQuery large-scale data warehouse into the product, spanning multiple back-end services (Node.js, Java, Python) and the platform’s web application interface (front-end and back-end).
- Research Intern | Max Planck Institute for Informatics | 2015 - 2016
Technologies: Java, Scala, Apache ecosystem (HDFS, MapReduce, Spark SQL, Pig, Avro, Parquet)
- Built a Java tool for exporting Wikipedia’s full edit history XML dumps (10+ TB uncompressed) into Avro format.
- Extracted the full link structure of all 37M+ pages and 640M+ revisions in Wikipedia’s edit history.
- Wrote a data processing pipeline for the Apache Spark SQL engine to compute Jaccard-type semantic relatedness scores between pages, as well as various page popularity metrics.
- Software Engineering Intern | 2015 - 2015
Technologies: Blaze, Piper, Java, Guice, FlumeJava, Borg
- Wrote a FlumeJava distributed processing pipeline for detecting book series from messy or incomplete book metadata.
- Set up automated deployment of the pipeline on Borg to run the extraction daily.
- Ran the extraction on data provided by major book partners, yielding 1,500+ book series.
- Freelance Software Engineer | Data Extraction Freelance Projects | 2013 - 2014
Technologies: PHP, MySQL, Python, Scrapy Framework
- Created a stand-alone tool for continuous, high-performance web data extraction jobs. Written in PHP with multi-cURL to issue many asynchronous requests in parallel, the tool harvested millions of entries per day and stored them in a MySQL database.
- Developed multiple customized web crawlers using Python's Scrapy Framework, later deployed to the cloud for autonomous periodic execution.
- Web Developer | Artfos SA | 2012 - 2012
- Developed and maintained CRUD applications with a standardized development process.
- Launched a continuous integration server for PHP projects based on Jenkins.
- Wrote automated end-to-end tests with Selenium IDE.