Ganesh Jujjuru
Verified Expert in Engineering
Data Engineer and Developer
Hyderabad, Telangana, India
Toptal member since April 24, 2024
Ganesh is a professional with over 13 years of experience in data architecture, solution architecture, data analytics, and application development. He has expertise in enterprise architecture, including TOGAF, Zachman, and custom frameworks. Ganesh is a skilled cloud engineer with data governance qualifications and data science expertise. He also has experience with prescriptive and exploratory analytics, including work with ChatGPT, and is enthusiastic about his next venture.
Portfolio
Experience
- Informatica - 10 years
- Data Engineering - 10 years
- MDM - 8 years
- Data Governance - 7 years
- Azure - 4 years
- Spark - 3 years
- Amazon Web Services (AWS) - 2 years
- Data Science - 1 year
Preferred Environment
Azure, Collibra, Informatica, Snowflake, Reltio, Data Science, TOGAF, Amazon Web Services (AWS), Odoo, Profisee MDM
The most amazing...
...thing I've done is build and lead a team of 19 associates in data governance and Unity integration for a supply chain organization with an approved SOW.
Work Experience
Enterprise Architect
Data Pride Solutions Private Limited
- Demonstrated expertise in strategy roadmaps and became a trusted advisor to customers, with knowledge spanning enterprise architecture, data architecture, data management, data governance, data science, and AI-driven functions.
- Drove end-to-end execution of the solution, including identification, interaction with the client, requirement analysis, workflow solution development, statement of work (SOW) preparation, solution customization, configuration, and implementation.
- Worked on the SOW and actively mobilized the project roadmap for 12 associates at MobileOrg. Aligned the financial organization's roadmap for data governance and data lake migration modules, securing SOW approval for 6 quarters with 6 associates.
- Led a team of 19 associates in data quality and Unity integration for a supply chain organization, with an SOW approved for 7 quarters. Led a team of 7 associates on data governance for an FMCG client in the UK, with an SOW approved for 6 quarters.
- Acted as a data science and governance AI engineer for a gas and power organization.
- Worked with a networking organization, handling data governance, data lake, and enterprise data warehouse (EDW) projects.
- Handled master data management and policy management at a networking organization.
- Collaborated with a supply chain organization on data quality and EDW projects and with a manufacturing organization on data governance and data lake projects.
- Engaged as a data architect for a financial organization, working on a data lake migration project.
- Carried out data governance and data engineering, reordering unclustered domains for a fast-moving consumer goods (FMCG) organization.
Data Engineering and Governance Architect
Advance Auto Parts India
- Aligned the roadmap of data governance and gained the approval of the Architecture Review Board (ARB) on the model and process design, identifying the data fabric and ontology of the existing integration.
- Developed conceptual and logical information models within the context of the enterprise and line of business information architecture.
- Unified the governance and centralized data engineering framework development.
- Facilitated the data governance council for critical data elements (CDEs) sourced from the enterprise data lake. Implemented Collibra Data Governance Center (DGC) workflows to enable data management capabilities, leveraging the Collibra API and Collibra Connect to integrate with adjacent platforms.
- Utilized Reltio MDM for match/merge, data stewardship, hierarchy patterns, rules finalization, enrichment, and consolidation. Integrated with Axiom for household and address validation.
- Leveraged Dun and Bradstreet (DNB) for customer enrichment to eliminate duplicate super Data Universal Numbering System (DUNS) records.
Principal Architect for Data Governance
F5
- Worked on the solution architecture for rules and the business glossary in data management, which includes data governance, data quality, metadata management, reference management, and master data management (MDM).
- Delivered a "one source of truth" solution covering data and process aspects. On the data side, applied the data vault approach to data modeling in Snowflake, with source-scoped and conformed dimensions integrated with MDM.
- Created point-to-point data flows on the process aspect, spanning the Snowflake hub, Oracle and SQL Server databases, Dell Boomi end-to-end integrations, Apigee messaging queues, Salesforce, Marketo, Workday, Cornerstone, and the OBIEE and MSBI reporting platforms.
- Built the enterprise data warehouse for order management within SQL Server and later migrated to a data lake and Snowflake, using Azure Data Factory and WhereScape as data ingestion platforms.
- Used Fivetran as an accelerator until streaming and batch processing were established, then utilized dbt and ADF for transformation within Azure ADLS, Synapse, and Snowflake to move data across the Raw, EH, and EUH layers and curate it.
- Incorporated custom transformation templates, including macros, LookML, and lift-and-shift, into dbt modeling for creating materialized incremental views, snapshot dimensions, and facts (the underlying merge pattern is sketched after this list).
- Contributed to building a customer 360-degree view and customer MDM, covering product MDM, invoices, order management, SKU lifecycle, customer success, and campaigns.
- Created APIs for external integration and integrated customer data with DNB and Axiom to identify Standard Industrial Classification (SIC) codes and build hierarchies using the Data Universal Numbering System (DUNS).
- Developed rich graphical representations and visualizations in dashboards, including live charts, bar charts, multi-cards, treemaps/hierarchical maps, and trends, within Power BI using DAX queries and canonical views.
- Constructed customer 360 views in Tableau, referencing lead generation, product enablement, lifecycle changes, and order history.
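As a hedged illustration of the incremental materialization pattern mentioned above, the following Python sketch runs an equivalent MERGE directly against Snowflake via the snowflake-connector-python library; the account, credentials, and table names (dim_customer, stg_customer) are hypothetical placeholders, not details from the F5 project.

```python
# Minimal sketch: the upsert pattern behind incremental dbt materializations,
# expressed as a direct Snowflake MERGE. All connection values are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",      # hypothetical account identifier
    user="example_user",
    password="example_password",
    warehouse="EXAMPLE_WH",
    database="EDW",
    schema="CORE",
)

MERGE_SQL = """
MERGE INTO dim_customer AS tgt
USING stg_customer AS src
  ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN UPDATE SET
  tgt.email = src.email, tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN INSERT (customer_id, email, updated_at)
  VALUES (src.customer_id, src.email, src.updated_at)
"""

cur = conn.cursor()
try:
    cur.execute(MERGE_SQL)  # upsert only changed rows, mirroring an incremental model
finally:
    cur.close()
    conn.close()
```

The merge-on-key approach is what makes incremental views cheap to refresh: only new or changed staging rows touch the dimension table.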
Senior Data Engineer
Dell
- Worked on building enterprise rules that can be utilized across different modules to track data quality dimensions, including completeness, consistency, accuracy, data decay, uniqueness, referential integrity, and logical pass-rate metrics (a minimal sketch of such rules follows this list).
- Built the gateway to store scorecard and profiling information in the reporting layer, which reports KPI data lineage to Collibra and Tableau.
- Handled the ETL design for the integration framework for data lineage of Collibra and Tableau reports.
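The sketch below illustrates the kind of reusable data quality rules described in this role, assuming pandas; the metric functions, column names, and sample customer table are hypothetical, not project artifacts.

```python
# Minimal sketch of reusable data quality dimension checks (illustrative only).
import pandas as pd

def completeness(df: pd.DataFrame, column: str) -> float:
    """Share of non-null values in a column."""
    return df[column].notna().mean()

def uniqueness(df: pd.DataFrame, column: str) -> float:
    """Share of distinct values among the non-null values."""
    non_null = df[column].dropna()
    return non_null.nunique() / len(non_null) if len(non_null) else 1.0

def referential_integrity(child: pd.DataFrame, key: str, parent_keys: set) -> float:
    """Share of child keys that resolve to a parent record."""
    return child[key].isin(parent_keys).mean()

# Hypothetical customer table used to build a scorecard
customers = pd.DataFrame(
    {"id": [1, 2, 2, None], "email": ["a@x.com", None, "b@x.com", "c@x.com"]}
)
scorecard = {
    "id_completeness": completeness(customers, "id"),
    "id_uniqueness": uniqueness(customers, "id"),
    "email_completeness": completeness(customers, "email"),
}
print(scorecard)
```

Rules written as small, parameterized functions like these can be applied across modules and their results persisted to a reporting layer.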
Data Engineer
Cognizant
- Acted as a senior ETL developer; prepared the initial business requirement documents, including estimation and technical design documents with data modeling, and obtained client sign-off for MDM, IDQ, and data integration projects.
- Worked with Informatica to utilize various transformations, including XML, HTTP, Salesforce lookup, Java, Match, and Parser within Informatica Data Quality (IDQ). Contributed to synchronization and replication mappings within the cloud environment.
- Managed different modules within the project, including Network Access Protection (NAP), Ads, Serenity, and Hub Console.
System Engineer
Tata Consultancy Services
- Conducted gap analysis of the multiple source systems and integrated them with extract, transform, load (ETL) development using Informatica and Ab Initio.
- Worked on Informatica Integration Cloud Services (IICS) and Data Integration Hub (DIH).
- Used trusted data to provide error-free reports in a timely and consistent manner.
Experience
Nokia – Data Governance, Data Lake, and Enterprise Data Warehouse (EDW) Projects
• Built a solution architecture for rules and a business glossary within data management encompassing data governance, reference management, and master data management.
• Implemented a solution for achieving "one source of truth" for data and process aspects.
• Utilized a data vault approach to data modeling within Snowflake, integrating source-scoped dimensions with conformed dimensions and incorporating master data management and reference management.
• Established data warehouse and data lake structures, building the enterprise data warehouse for order management in SQL Server and transitioning to a data lake on Snowflake, using ADF and WhereScape as the data ingestion platforms.
• Leveraged Fivetran as an accelerator for one-time data transfer until streaming and batch processing were established.
• Employed dbt and ADF for transformation within Azure ADLS, Synapse, and Snowflake to facilitate data movement across the Raw, EH, EUH, and curated layers.
• Built custom transformation templates, including macros and lift-and-shift, in dbt modeling, creating materialized incremental views, dimensions, and facts.
• Worked on a customer 360 view, including customer MDM, product MDM, invoices, and customer success.
Shell – Data Science and Governance AI
• Implemented a data strategy and governance framework with clearly defined roles and responsibilities.
• Provided thought leadership by addressing business problems and guiding on functional and technical aspects.
• Achieved compliance with the General Data Protection Regulation (GDPR) by collaborating with the information security team.
• Established a metadata catalog, including classification, dependencies, and impact using Alation for data sources, including Azure Synapse, ADF, and Postgres.
• Conducted profiling using native scanners and created customized profiles for Synapse, Cosmos DB, Parquet, and Avro files, using the profiles to identify and remediate duplicates.
• Built customized dashboards for business products and enrichment and access policy management via Business Process Model and Notation (BPMN) workflows and ServiceNow.
• Formed talented data science teams that created AI/ML and generative AI data products.
• Estimated the composition of hydrocarbons in oil using genetic algorithms, with end-to-end product development in Scala and Spark.
• Implemented predictive maintenance for machines using sensor data (a minimal modeling sketch follows this list).
• Built graph anomaly detection models utilizing tailor-made algorithms for rate engine applications.
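A minimal, self-contained sketch of sensor-based predictive maintenance, assuming scikit-learn; the telemetry, feature meanings, and failure labels below are synthetic stand-ins, not Shell data.

```python
# Minimal sketch: classify machine-failure risk from sensor features.
# All data here is synthetic; real pipelines would use historical telemetry.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
# Hypothetical features: temperature, vibration, pressure readings
X = rng.normal(size=(1000, 3))
# Toy failure label loosely correlated with high vibration
y = (X[:, 1] + rng.normal(scale=0.5, size=1000) > 1.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```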
Accolite – Data Quality and Enterprise Data Warehouse (EDW) Projects
• Implement a centralized data quality and reconciliation framework.
• Develop a data quality framework using Informatica Data Quality (IDQ) and Python scripts.
• Create a reconciliation framework for the stock-keeping unit (SKU) and customer lifecycle.
• Design conceptual and logical information models for the enterprise and business information architecture.
• Ensure compliance with DW/BI standards and guidelines for developing information models and database designs.
• Collaborate with the DBA to translate the logical information model into a preliminary logical database design or a physical information model for the target DBMS.
• Generate DDL and DML scripts to load data into BigQuery from GCS buckets.
• Formulate ETL and ELT strategies using SQL scripts to load data into BigQuery, leveraging Airflow on GCP Cloud Composer (see the loading sketch after this list).
• Establish BigQuery datasets, tables, and pipelines for storing processed results, configuring Cloud Storage and BigQuery services using Cloud Shell in GCP.
• Develop audit tables for reconciliation and metadata tracking.
• Utilize Airflow to orchestrate workflows between internal components in GCP.
• Use the Dataplex data governance tool to establish data lineage for all Composer and Data Fusion pipelines.
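The GCS-to-BigQuery loading pattern under Cloud Composer can be sketched as a small Airflow DAG; the bucket, dataset, and table names below are hypothetical, and the Airflow Google provider package is assumed to be installed.

```python
# Minimal sketch of a Composer (Airflow) DAG loading GCS files into BigQuery.
# Bucket, object paths, and the destination table are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="gcs_to_bigquery_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_orders = GCSToBigQueryOperator(
        task_id="load_orders",
        bucket="example-landing-bucket",          # hypothetical GCS bucket
        source_objects=["orders/*.csv"],
        destination_project_dataset_table="example_project.edw.orders",
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_APPEND",          # append daily increments
    )
```

Audit and reconciliation steps would typically follow as downstream tasks in the same DAG.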
Future Focus – EDW Migration to Data Lake Project
• Identify point-to-point integrations, API gateways, and downstream analytics within the existing architecture during discussions with business analysts, subject matter experts, architects, and project owners.
• Design the data lake in Snowflake and the landing zone in AWS S3.
• Create data ingestion processes with reusable frameworks.
• Develop data ingestion pipelines utilizing various AWS cloud services, including Lambda, Step Functions, CloudWatch Events, Simple Notification Service (Amazon SNS) notifications, S3, EC2, Python Boto3 SDK, Athena queries, IAM roles, policies, AWS Glue, and notebooks.
• Gain expertise in Unified Data Analytics with Databricks, managing the Databricks workspace UI, Databricks Notebooks, Delta Lake with Python, and Delta Lake with Spark SQL.
• Design and develop Spark programs/code using Scala, Java APIs, Hive, and HBase.
• Work in cloud technologies, specializing in designing data lakes in Snowflake and MongoDB.
• Create Hive queries and Pig scripts for data analysis, transfer, and table design.
• Develop data pipelines and real-time streaming using Kafka, Spark Streaming, and Flume, including topics and producers (see the streaming sketch after this list).
• Implement MapReduce jobs using Java APIs and Pig Latin.
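A minimal sketch of the Kafka-to-Spark streaming pattern referenced above, using PySpark Structured Streaming; the broker address, topic name, and S3 paths are placeholders, not project configuration.

```python
# Minimal sketch: consume a Kafka topic with Spark Structured Streaming
# and land micro-batches in a raw zone. All endpoints are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Read events from a Kafka topic as a streaming DataFrame
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "orders")                     # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the payload to a string
parsed = events.select(col("value").cast("string").alias("payload"))

# Write micro-batches to the lake's landing zone (hypothetical S3 path)
query = (
    parsed.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/raw/orders/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/orders/")
    .start()
)
query.awaitTermination()
```

The checkpoint location is what gives the stream exactly-once landing semantics across restarts.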
Mars – Data Governance and Data Engineering: Reordering Unclustered Domains
• Develop reusable data pipelines with Azure Data Factory (ADF) and Databricks for delta processing across domains, ensuring code consistency.
• Organize Azure Data Lake Storage (ADLS) directories within BDAT domains and banners.
• Reorganize ADF pipelines and Databricks code based on the ADLS layer.
• Migrate pipelines for the curation layer and configure batch process parameters for bronze layer ingestion and silver layer processing using reusable code.
• Implement data quality metrics at the Hive Metastore (HMS) during data transfer from the bronze to the silver layer using Soda DQ and Databricks (a minimal promotion sketch follows this list).
• Establish partner and customer master data management (MDM) to monitor the customer lifecycle from campaign to churn.
• Utilize Profisee MDM and configure Purview scans on ADLS, ADF, Databricks, Power BI, Soda DQ, and Profisee MDM.
• Address Purview limitations related to lineage using Atlas APIs, including duplicate asset creation, unresolved names, view lineage, DDL, and the Power BI scanner.
• Customize workflow templates for self-read access to governance interface sources for access management.
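The reusable bronze-to-silver promotion described above might look like the following Databricks-style PySpark sketch; the table names, dedupe key, and simple null-key quality gate are illustrative assumptions, and Delta Lake is assumed to be available.

```python
# Minimal sketch: a reusable bronze-to-silver promotion step with a basic
# data quality gate. Table names and keys are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, current_timestamp

spark = SparkSession.builder.getOrCreate()

def bronze_to_silver(bronze_table: str, silver_table: str, key: str) -> None:
    """Promote a domain table: drop null-key rows, dedupe, stamp load time."""
    df = spark.read.table(bronze_table)
    cleaned = (
        df.filter(col(key).isNotNull())        # DQ gate: the business key must exist
          .dropDuplicates([key])               # enforce uniqueness on the key
          .withColumn("_loaded_at", current_timestamp())
    )
    cleaned.write.format("delta").mode("overwrite").saveAsTable(silver_table)

# Hypothetical domain tables following a bronze/silver naming convention
bronze_to_silver("bronze.customers", "silver.customers", key="customer_id")
```

Parameterizing the step by table and key is what keeps the code consistent across domains.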
Pennymac – Data Lake and Governance
• Identify the integration's data fabric and ontology and close the gaps for real-time and batch integration using Informatica Intelligent Cloud Services (IICS), application integration, data integration, and mass ingestion for one-time loading.
• Align the extract, transform, and load (ETL), extract, load, and transform (ELT), and streaming roadmap through multiple statements of work (SOWs) and proofs of concept for a technology stack including Fivetran, Kafka, and Azure Data Factory (ADF).
• Identify the resource skills required and staff the roles with competent associates.
• Onboard resources at the client end, scaling from 1 to 9 associates within 6 months; manage deliverable tracking and guide the technical team.
• Identify the enterprise architecture roadmap, design the governance strategy, and create the technical and business model framework.
• Provide a platform for various applications by transforming and sharing financial messages of different formats.
• Identify the business stakeholders and approvers for the asset changes and data models.
• Capture the business glossary and hierarchical terms on the business and technical end.
• Categorize domains using BDAT terminology and identify sensitive data as PII and non-PII (a rule-based tagging sketch follows this list).
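As an illustration of PII versus non-PII categorization, the sketch below applies simple name-based heuristics in Python; the patterns and column names are hypothetical, not the client's actual classification rules.

```python
# Minimal sketch: rule-based PII tagging from column names (illustrative only).
import re

PII_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"ssn", r"email", r"phone", r"birth", r"address", r"name$")
]

def classify_columns(columns: list[str]) -> dict[str, str]:
    """Label each column PII or non-PII based on name heuristics."""
    return {
        c: "PII" if any(p.search(c) for p in PII_PATTERNS) else "non-PII"
        for c in columns
    }

print(classify_columns(
    ["customer_name", "email_address", "loan_amount", "phone_number"]
))
```

In practice, name heuristics like these would only seed the classification; content profiling and steward review confirm the tags.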
Education
Bachelor's Degree in Electronics and Computer Engineering
Nagarjuna University - Guntur, India
Certifications
The Open Group Architecture Framework (TOGAF)
Open Group
Skills
Tools
Collibra, Informatica, Subversion (SVN), Power BI Desktop, Microsoft Power BI, Amazon Athena, Odoo, Kafka Streams
Languages
Snowflake, Python, SQL, Scala
Frameworks
TOGAF, Spark
Paradigms
Requirements Analysis, Business Intelligence (BI), ETL
Platforms
Azure, Reltio, Amazon Web Services (AWS), Databricks, Azure AI Studio, Google Cloud Platform (GCP), Apache Kafka
Storage
Master Data Management (MDM), Data Pipelines, Dell Boomi
Other
Computer Engineering, Informatica Data Quality, MDM, Azure Databricks, Purview, Enterprise Architecture, Data Engineering, Data Governance, Electronics, Azure Data Lake, Azure Data Factory, Data Analysis, Agile Project Management, Technical Project Management, Data Architecture, Data Management, Analytical Thinking, Business Requirements, Data Warehousing, Data Modeling, Data Analytics, Data Science, Profisee MDM, Data Build Tool (dbt), WhereScape, Retrieval-augmented Generation (RAG), Machine Learning, Artificial Intelligence (AI), Informatica, Data Quality, Informatica Cloud, EndNote, Large Language Models (LLMs)