Wenjie Xu, Software Developer in New York, NY, United States

Member since May 21, 2020
Wenjie is an experienced data engineer with expertise in the management consulting industry. She has developed data classifiers and mining algorithms, built data entry systems to improve data quality and eliminate noise, created KPI dashboards to improve recruiting performance, and deployed data pipelines to seamlessly manage big data. Wenjie is a strong information technology professional skilled in AWS products, PySpark, Python-Flask web development, integrations, and Python data analytics modules.

Location

New York, NY, United States

Availability

Part-time

Preferred Environment

Zeppelin, Jupyter Notebook, PyCharm

The most amazing...

...project I've developed, deployed, and managed was a robust, fault-tolerant data pipeline in AWS.

Employment

  • Data Analytics Engineer

    2017 - 2020
    PACO Technologies, Inc.
    • Built an internal data entry system with Python-Flask to improve data quality and eliminate data noise.
    • Automated the data acquisition process to reduce human errors significantly. Configured the server environment on AWS EC2 and RDS with reliable security groups.
    • Designed and developed complex SQL queries, Python scripts, and triggers for ETL jobs. Integrated and maintained data from a variety of sources, ensuring that it adhered to data quality and accessibility standards.
    • Generated biweekly ad hoc data reports using CloudWatch, Lambda, and SQL/Excel, eliminating the need for manual queries.
    • Developed a KPI dashboard in Power BI to track company recruiting performance internally and facilitate the decision-making process.
    • Developed, deployed, and managed the data pipeline (DocumentDB, Athena, Redshift, S3, Lambda) that cleans, transforms, and aggregates unorganized and messy data into databases, allowing for seamless collection, storage, and management of big data.
    • Developed data classifiers, mining algorithms, and models for sentiment analysis, topic mining, and data visualization of engineering documents.
    Technologies: Amazon Web Services (AWS), Spark, SQL, Python

Experience

  • ETL Project: Data Integration from CSV and XML to Relational Database (Development)
    https://git.toptal.com/Ivan-Ilijasic/wenjie-xu

    This was a PySpark-based ETL project developed using Spark SQL and Python scripts to transform CSV/XML data and load it into a relational database. The project integrated data from various formats and sources into the data warehouse to ensure that target data adhered to data quality and accessibility standards. On top of the transformed data, I developed a dashboard using Power BI to provide actionable insights.

  • Web Scraping Using Scrapy (Development)
    https://github.com/xwjsarah/scraping/blob/master/homedepot.py

    I used the Scrapy framework to capture construction projects' bidding info from agency websites (MTA, Port Authority) and deliver clean, reliable bidding data to the marketing department, which used it for further decision-making. The project captured more than 500 records per day from various websites, replacing manual searching and significantly improving work efficiency for the marketing team.

  • Python-Flask Data Entry System Development (Development)

    I independently built a Python-Flask-based data entry system to enable data quality checks and eliminate data noise. I automated the data acquisition process to reduce human errors significantly. A dynamic Power BI dashboard was also deployed on this system to track data trends and insights as new data was fed in. The system ran in a server environment configured on AWS EC2 and RDS with reliable security groups.

Skills

  • Languages

    Python 3, SQL, Python
  • Tools

    Microsoft Power BI, AWS Glue, Spark SQL, PyCharm
  • Frameworks

    AWS EMR, Flask, Scrapy, Spark
  • Libraries/APIs

    Pandas, PySpark, Spark ML
  • Platforms

    AWS Lambda, Jupyter Notebook, Zeppelin, Amazon Web Services (AWS)
  • Storage

    AWS S3, Redshift
  • Other

    AWS

Education

  • Master's degree in Computer Science
    2016 - 2018
    Montclair State University - Montclair, New Jersey, USA

Certifications

  • AWS Certified Developer
    JANUARY 2017 - PRESENT
    Amazon Web Services (AWS)
