Verified Expert in Engineering
Data Engineer and Developer
Salman is a GCP-certified professional data engineer specializing in building, maintaining, and optimizing data warehouses and data pipelines with cost efficiency in mind. He has 5+ years of experience in the fintech, eCommerce, and sourcing industries. Salman helps businesses store and process data efficiently so they can convert it into actionable insights and predictions.
Windows, Linux, Jupyter, Slack
The most amazing thing I've developed is a search engine for MOOCs, courses, and jobs from all around the web.
- Integrated multiple 3rd-party sources into BigQuery for dashboarding and analysis.
- Developed multiple data pipelines to clean and transform data according to business needs.
- Set up alerting and monitoring for pipelines and infrastructure so that relevant stakeholders can be notified whenever any issue arises.
- Designed workflows in Airflow to automate different processes.
- Sped up dashboards using different techniques, such as materialized views.
- Upgraded and maintained a Cloud SQL for MySQL instance for optimal performance.
Big Data Engineer | Consultant
- Wrote Spark, MapReduce, Hive, and Python scripts for distributed data processing.
- Extracted products from eCommerce store reviews and predicted their sentiments using Python, NLTK, spaCy, and Flair.
- Helped generate reports and articles related to big data, data engineering, and data science.
- Developed several BigQuery ETLs, scheduled them using Airflow, and built visualizations in Google Data Studio.
- Integrated several third-party sources with BigQuery, including CRM, Asana, and Bevy, using Stitch Data, custom scripts, and APIs/webhooks.
- Optimized already-running ETLs, reducing GCP cost and retrieval time by approximately 40-60%.
- Collaborated with a team to build pipelines that collect, clean, and analyze different eCommerce brands' data.
- Managed the Airflow server for optimal performance and set up a proper process to synchronize it with the GitHub repository.
- Optimized an already-running ETL to reduce its network and CPU usage.
- Normalized and converted NoSQL (JSON) data to relational forms.
- Created a process to automate the regex-based pattern matching for products from their warehouse.
- Wrote multiple scrapers and parsers to fetch data and transform it into a clean, usable form.
- Set up data pipelines on Databricks to scrape different public sites and clean and ingest data into the warehouse.
- Wrote transformation pipelines on Spark and Spark SQL to clean and transform data for further analysis.
Big Data Consultant
- Developed and automated fault-tolerant ETL pipelines in multiple banking streams to load data into a dimension layer that is then provided to the BI team and relevant departments for report generation and decision-making.
- Worked with clients to translate business problems into quantitative queries and collect/clean the necessary data.
- Optimized Informatica/SSIS workflows and improved the performance of ETL pipelines that process GBs of data daily.
- Developed non-skewed data marts on Teradata, where daily ETLs can be run easily.
- Built an app called Data Robot to search for accounts and entities in a data lake using fuzzy search, which helps multiple departments in their ongoing operations. The manual process that preceded this product used to take weeks.
- Automated regulatory returns through a data lake, using big data and reporting tools, making the process fast, dynamic, and scheduled rather than a manual task in Excel.
- Built an app called ScoreCard to automate Spark reports for anomalous transactions, accounts, and agents. It helped the bank score its app agents in order to reward them and improve their services.
- Developed processes to detect anomalous ETL behavior, along with alerting capabilities, to ensure the data is loaded correctly and on time.
Movie Review Sentiment Analysis (Kernels Only)
https://www.kaggle.com/code/pantherpanther/simple-1d-cnn-with-glove-twitter-embeddings?scriptVersionId=10076590
Assessments for Udemy Business Pro
Impala, BigQuery, Apache Airflow, GitHub, Informatica ETL, Stitch Data, Looker, Amazon Elastic MapReduce (EMR), Jupyter, Slack, Microsoft Power BI, Tableau, MySQL Workbench, Microsoft Excel, Confluence, Jenkins, Google Analytics, AWS Glue
Database Development, ETL, Automation, Data Science, Business Intelligence (BI)
Jupyter Notebook, Databricks, Google Cloud Platform (GCP), AWS Lambda, Oracle, Windows, Linux, Docker, Kubernetes, Azure, Amazon Web Services (AWS), Amazon EC2, Apache Kafka
Apache Hive, Data Pipelines, JSON, Databases, Relational Databases, RDBMS, MySQL, Teradata, Google Cloud Storage, PostgreSQL, MariaDB, SQL Performance, Database Administration (DBA), SQL Server Analysis Services (SSAS), Data Lakes, Redshift, SQL Server Integration Services (SSIS), Microsoft SQL Server, SQL Server Reporting Services (SSRS), Alibaba Cloud, Azure SQL Databases, Azure SQL, SSAS Tabular, Amazon S3 (AWS S3), Google Cloud SQL
Data Engineering, ETL Tools, Data Queries, ELT, Schemas, Data Transformation, CSV, CSV File Processing, Big Data, Data Warehousing, Informatica, Data Analysis, Natural Language Processing (NLP), Analytics, Data Analytics, Data Modeling, Query Optimization, Data Warehouse Design, API Integration, Integration, Performance Tuning, Automated Data Flows, Data Architecture, Change Data Capture, Data Processing, Relational Data Mapping, Google BigQuery, Amazon RDS, Consulting, Scaling, Reporting, CRM APIs, GPT, Generative Pre-trained Transformers (GPT), Visualization, Machine Learning, Scraping, APIs, Deep Learning, Data Visualization, Delta Lake, Business Intelligence (BI) Platforms, CI/CD Pipelines, Google Data Studio, Azure SQL Data Warehouse (SQL DW), Reports, Pub/Sub, Dashboards, Google Search Console, Artificial Intelligence (AI), Google Cloud Functions
Spark, Hadoop, Apache Spark, Flask, Django, Selenium
PySpark, Keras, Pandas, Matplotlib, Scikit-learn
Bachelor's Degree in Computer Science
National University of Computer and Emerging Sciences (FAST) - Pakistan
Professional Data Engineer