Praveen Raju
Verified Expert in Engineering
Software Developer
Toronto, ON, Canada
Toptal member since July 22, 2024
Praveen is a data engineer with eight years of experience designing scalable data solutions across various industries. He's proficient in big data technologies like Apache Spark, Hadoop, and AWS and specializes in data workflow development with Scala and Java. Skilled in Amazon RDS, Amazon S3, EMR, Neptune, and Microsoft Power BI, Praveen reduces processing times and costs while enhancing business outcomes through Agile methodologies and DevOps practices.
Portfolio
Experience
- Hadoop - 7 years
- Apache Spark - 7 years
- Spark - 7 years
- Python - 7 years
- Apache Pig - 6 years
- Apache Airflow - 6 years
- Java - 6 years
- Scala - 6 years
Availability
Preferred Environment
Apache Spark, Apache Airflow, Java, Scala, Hadoop, Amazon Web Services (AWS), Azure, Apache Pig, MapReduce
The most amazing...
...outcome analysis tool I've developed using Apache Spark and Hadoop boosted data processing speed significantly.
Work Experience
Data Engineer
Lasik MD Vision
- Developed custom Apache Spark applications to manage real-time data streams, reducing data processing latency by 37%.
- Built high-performance data processing applications in Scala, increasing processing speed by 40%.
- Implemented scalable data processing workflows using Hadoop, increasing data processing throughput by 44%.
- Designed and implemented data orchestration workflows using Apache Airflow, enhancing data pipeline automation and reducing manual oversight.
- Integrated EMR with other AWS services like Amazon S3 and Redshift, improving data flow and accessibility by 20%.
- Increased system reliability by implementing fault-tolerant data processing pipelines with Apache Spark.
- Optimized Apache Pig scripts for complex data transformations, resulting in a 30% improvement in processing efficiency.
- Designed and implemented robust data integration solutions in Java, improving system reliability.
- Conducted performance tuning of Hadoop and Spark jobs running on EMR, improving job execution times.
- Created robust and maintainable ETL pipelines using Scala, reducing data transformation errors.
BI Reporting Analyst
BSNL
- Migrated legacy data processing systems to Apache Spark, resulting in a 30% reduction in maintenance costs.
- Optimized existing Scala codebases, reducing execution time and resource consumption by 25%.
- Designed and implemented efficient MapReduce jobs for large-scale data processing, increasing processing speed and efficiency by 45%.
- Implemented and managed Amazon Neptune instances to support graph-based queries, enhancing data retrieval speeds and improving complex data relationship analysis.
- Architected and managed data solutions on EMR, reducing infrastructure costs.
- Configured and managed Jetty servers to host large-scale web applications, enhancing server responsiveness and uptime.
- Optimized Apache Airflow configurations to improve the scheduling and execution of complex data tasks, increasing workflow efficiency.
- Developed and maintained robust data processing applications using Python, enhancing data analysis capabilities and reducing processing time.
- Built scalable microservices in Scala for data integration.
BI Engineer
Paramount Airways
- Developed custom operators and directed acyclic graphs (DAGs) in Apache Airflow, improving pipeline customization and extending functionality by 30%.
- Optimized Java code for data-intensive applications, resulting in a 50% reduction in execution time.
- Implemented data analytics and processing workflows using EMR, reducing costs and optimizing resource allocation.
- Designed and managed distributed storage solutions with Hadoop, improving data storage efficiency.
- Introduced advanced data analytics algorithms in Scala, improving predictive model accuracy.
- Conducted performance tuning and resource optimization for Apache Spark clusters, enhancing cluster utilization.
- Integrated Apache Spark with other big data tools to increase data processing efficiency.
BI Junior Data Engineer
Accenture
- Implemented data caching strategies to improve the performance of BI reports and dashboards.
- Developed and maintained data governance frameworks to ensure compliance with industry-specific regulations such as SOX and PCI DSS.
- Automated data validation and cleansing processes using Apache Spark, reducing data errors.
- Migrated legacy ETL processes to Scala, reducing processing time by 45%.
- Created custom MapReduce jobs in Hadoop to increase data processing speed.
- Built custom data validation and cleansing tools in Java to reduce data errors.
Experience
Outcome Analysis Tool
Call Center Performance Tool
Analytics Visualization Solution
Analytics on Call Detail Records
Education
Bachelor's Degree in Electronics and Communication Engineering
Anna University - India
Skills
Libraries/APIs
PySpark
Tools
Apache Airflow, AWS Glue, AWS Step Functions, Amazon Elastic MapReduce (EMR), Oozie, AWS CodeBuild, Amazon SageMaker, Microsoft Power BI, Apache Maven, Amazon Athena, Jetty
Languages
Scala, Python, SQL, T-SQL (Transact-SQL), Java
Frameworks
Apache Spark, Hadoop, Spark
Paradigms
MapReduce
Platforms
Amazon Web Services (AWS), Databricks, Kubernetes, Linux, Apache Pig, AWS Lambda, Azure
Other
Data Engineering, Amazon Neptune, Software, Electronics, Data Communication, Amazon RDS, Electronic Medical Records (EMR), SOX, PCI DSS