Rajeshwar is available for hire

Rajeshwar Agrawal

Verified Expert in Engineering

Back-end Developer

Location

Jabalpur, Madhya Pradesh, India

Toptal Member Since

March 23, 2021

Rajeshwar is a seasoned data engineer with 9+ years of experience working on back-end and big data projects in the fintech, autonomous vehicle, and online commerce sectors. He has served as the primary system designer and developer for multiple apps, services, and platform initiatives spanning multiple teams. Rajeshwar possesses expertise in developing solutions for distributed systems, data platforms, data lakes, and ETL applications from the ground up.

Software Development Back-end Cloud Architecture Distributed Systems Software Engineering Cloud Parquet Data Engineering Multithreading Big Data Data Warehousing FastAPI Python REST APIs Git Autonomous Vehicle Software Refinitiv Data Compression AWS Athena

Portfolio

SteelEye

Data, Big Data, Big Data Architecture, Data Management Platforms...

SteelEye

Python, Elasticsearch, Big Data, REST APIs, Microservices...

Motional

Terraform, Amazon Virtual Private Cloud (VPC), Networking...

Experience

Microservices Architecture - 8 years Python - 8 years Back-end - 8 years Distributed Systems - 6 years Parquet - 5 years Java - 5 years Data Engineering - 5 years Event-driven Architecture - 5 years

Availability

Part-time

Preferred Environment

Ubuntu, JetBrains, Slack, Linux, MacOS

The most amazing...

...data platform I've created ingests millions of trade and communication surveillance data and provides complete observability of the data pipeline.

Work Experience

Senior Software Engineer

2021 - PRESENT

SteelEye

Designed the system architecture for a data platform using Netflix Conductor, enabling developers to build and deploy workflows on a managed Kubernetes platform.
Built a platform capable of handling multi-tenant workloads on serverless (lambda) Kubernetes deployments.
Enabled ready-to-use scaling, observability, continuous deployment, and streaming capabilities on the data platform.
Achieved a 60% reduction in AWS infrastructure cost on the newly designed data platform.
Built observability capabilities using Grafana alerts, Kafka events, Sentry, Prometheus, and AWS CloudWatch metrics, providing real-time monitoring and failure notification of the entire platform and services.
Optimized an ETL pipeline to gather Refinitiv market data for trading instruments by 80%.
Built complete CI on monorepository using pantsbuild and github actions, which enhanced developer experience.

Technologies: Data, Big Data, Big Data Architecture, Data Management Platforms, Event-driven Architecture, Prefect, Conductor, Netflix OSS, Orkes, Apache Airflow, Management, ETL, Refinitive API, Architecture, Cloud Architecture, Data Engineering, Parquet, Data Pipelines, Python 2, Distributed Systems, PostgreSQL, Algorithms, API Development, Kubernetes, FastAPI, GraphQL, Data Management, NumPy, Pytest, Containerization, Asyncio, Python Asyncio, Redis, Testing, GitHub Actions, Continuous Delivery (CD), Continuous Integration (CI)

Senior Software Engineer

2021 - 2021

SteelEye

Developed and re-engineered Python FastAPI REST endpoints, which improved back-end API performance.
Developed RESTful microservice APIs using FastAPI to split a monolith back-end API.
Improved the performance of Elasticsearch queries for fetching regulatory reporting and financial services data.

Technologies: Python, Elasticsearch, Big Data, REST APIs, Microservices, Event-driven Architecture, FastAPI, Architecture, Cloud Architecture, Data Engineering, Parquet, Data Pipelines, Python 2, Distributed Systems, PostgreSQL, Algorithms, API Development, Kubernetes, Data Management, NumPy, Pytest, Containerization, Asyncio, Python Asyncio, Testing

DevOps Engineer

2018 - 2020

Motional

Spearheaded the usage of infrastructure-as-code methodology by implementing Terraform for the deployment of data platform infrastructure.
Improved security and access to a data platform by engineering the design of AWS VPC network cloud infrastructure.
Improved the QoS of deployed applications and services on AWS by deploying monitoring tools like Datadog and Amazon CloudWatch.
Simplified application build and deployment process by implementing shell scripts for PyPI packages, Docker images, EMR clusters, and several AWS deployments.

Technologies: Terraform, Amazon Virtual Private Cloud (VPC), Networking, Amazon Web Services (AWS), Monitoring, Amazon CloudWatch, Datadog, Shell, Microsoft SQL Server, PyPI, Apache Maven, Docker, Deployment, AWS Deployment, Scalability, Software Architecture, Object-oriented Design (OOD), Object-oriented Programming (OOP), Databases, Software Implementation, Python 2, Kubernetes

Back-end Engineer

2018 - 2020

Motional

Enhanced access to autonomous vehicle test rides by developing serverless, auto-scaled back-end REST APIs in C# ASP.NET Core, Python Flask, and Django on AWS Lambda, which improved access to data warehouse, ETL results, test ride KPIs and metadata.
Accelerated runtime of aggregate queries on millions of rows of autonomous vehicle test ride KPIs by designing a MySQL database schema with views and triggers.
Devised SLA and quantified the performance of data ingestion and upload tools used for uploading terabytes of logs generated from autonomous vehicle rides every day. Benchmarked I/O performance of data ingestion NFS server using Fio utility.
Achieved test coverage of more than 90% by writing unit, functional, and e2e tests using frameworks like Python Pytest, Unittest, and Nose2; and C# NUnit and Fluent assertions.
Reduced access time to terabytes of autonomous vehicle ride data by 20% by designing an event-driven data pipeline consisting of AWS Kinesis to drive REST APIs, batch programs, ETL apps, AWS services, and data ingestor.
Leveraged AWS SQS messages to generate failure events from AWS services to provide actionable insights into the cause of failure.

Technologies: C#, ASP.NET, REST, REST APIs, Databases, MySQL, Relational Database Services (RDS), Serverless, Terraform, Amazon EC2, Amazon S3 (AWS S3), Python, Python 2, Python 3, Flask, Flask-Marshmallow, SQLAlchemy, PyMySQL, Docker, Spring Boot, Distributed Systems, Scaling, .NET Core, AWS Lambda, Software Engineering, Linux, APIs, Datadog, Data Engineering, Relational Databases, Back-end, PySpark, .NET, Django, Messaging, Cloud, JDBC, Agile, Amazon Elastic Container Service (Amazon ECS), ECS, Amazon Kinesis, Amazon CloudWatch, AWS Step Functions, State Machines, Autoscaling, Scrum, Maps, SSH, Infrastructure as Code (IaC), Infrastructure Monitoring, Django REST Framework, Scalability, Software Architecture, Object-oriented Design (OOD), Object-oriented Programming (OOP), API Integration, WebSockets, Event-driven Architecture, Kubernetes, Software Implementation, Lambda Functions, RESTful Microservices, Unit Testing, SOLID Principles, Serverless Architecture, Architecture, Cloud Architecture, PostgreSQL, Algorithms, API Development, Data Management, NumPy, Pytest, Containerization, Asyncio, Python Asyncio, Testing

Data Engineer

2018 - 2020

Motional

Enabled access to hundreds of KPIs from autonomous vehicle ride data, by developing 25+ ETL apps in Python PySpark, using pandas, IPython, NumPy, and SciPy, running on Spark EMR cluster.
Improved accessibility to complex autonomous vehicle ride data by developing a transformer that converted complex nested data structures to simple flat columnar format in Parquet, deployed as Docker container, running on EC2 autoscaling group.
Reduced the monthly cost of running ETL apps, by revamping the ETL pipeline consisting of Spark EMR cluster, AWS S3 data storage, EC2 computer cluster, and MySQL DB cluster.
Reduced running time of thousands of daily jobs on terabytes of data, by optimizing PySpark apps and tuning parameters of Spark, Yarn, and Hadoop on EMR cluster.
Redesigned data pipeline as a state machine on AWS Step Functions, which consisted of ETL apps, AWS services, serverless functions, and data ingestor. This reduced cost of maintaining the pipeline, and reduced downtime of data platform services.
Planned and oversaw the development of 20+ ETL apps, developed ETL design, helped with the development, performed code reviews, performance testing, and acceptance testing.
Improved accessibility to a data warehouse by developing a Jupyter notebooks SaaS using Sparkmagic running on EMR compute engine, which accessed parquet data warehouse via JDBC connection to AWS Athena.

Technologies: Python, Spark, Amazon Elastic MapReduce (EMR), MapReduce, Hadoop, C#.NET, REST, REST APIs, Microservices Architecture, Databases, SQL, Flask, Git, Amazon Web Services (AWS), Data Modeling, Big Data, Data Warehouse Design, Serverless, Terraform, Parquet, ETL, Data Pipelines, Data Engineering, Back-end, Apache Hive, Apache Avro, Data Compression, AWS Glue, Amazon Athena, C#, Distributed Systems, Scaling, Docker, AWS Lambda, Software Engineering, Linux, APIs, Datadog, Relational Databases, PySpark, .NET, Django, Data Wrangling, Messaging, Cloud, JDBC, Agile, Amazon Elastic Container Service (Amazon ECS), ECS, Amazon Kinesis, Redshift, Amazon Redshift Spectrum, Amazon CloudWatch, AWS Step Functions, State Machines, Autoscaling, Scrum, Maps, Pandas, Jupiter, Jupyter, Jupyter Notebook, IPython Notebook, SSH, Infrastructure as Code (IaC), Infrastructure Monitoring, Scalability, Software Architecture, Object-oriented Design (OOD), Object-oriented Programming (OOP), MySQL, Event-driven Architecture, Software Implementation, Lambda Functions, RESTful Microservices, Message Queues, Unit Testing, SOLID Principles, Serverless Architecture, Architecture, Python 2, Algorithms, Kubernetes, Data Management, NumPy, Pytest, Testing

Back-end Engineer

2014 - 2018

Works Applications

Migrated a large enterprise-grade Java EE monolith eCommerce web application to several small Java Spring boot RESTful API microservices.
Achieved an SLA of 1000 orders fulfilled per second, by creating an integrated order fulfillment pipeline. The pipeline integrated multiple Java Spring Boot microservices including order, cart, payments, shipments, customer, product, inventory, etc.
Integrated payment and shipment APIs from multiple third-party vendors in the eCommerce platform.
Designed, developed, tested, and maintained features and fixed bugs for Japanese eCommerce clients like Mitsubishi, Starbucks, Panasonic, etc.
Redesigned the system architecture of eCommerce's Java monolithic application into multiple Java Spring microservices consisting of domain-specific back-end APIs, including order, cart, payments, shipments, customer, product, inventory, etc.
Achieved test coverage of more than 90% for eCommerce features by performing unit, functional and e2e tests using Java JUnit test framework.
Designed data models in Cassandra and MySQL for the eCommerce platform.

Technologies: Microservices, Spring Microservice, Cassandra, RESTful Development, Jenkins, CI/CD Pipelines, Hadoop, REST APIs, Java, Microservices Architecture, Databases, SQL, Spring, NoSQL, Git, Amazon Web Services (AWS), Data Modeling, Apache Kafka, REST, Back-end, Distributed Systems, Scaling, Software Engineering, Linux, APIs, Relational Databases, eCommerce, Java 8, Java EE, Messaging, Cloud, JDBC, Autoscaling, Elasticsearch, SSH, Apache Cassandra, Scalability, Software Architecture, Hibernate, Object-oriented Design (OOD), Object-oriented Programming (OOP), Docker, MySQL, API Integration, Enterprise Resource Planning (ERP), Event-driven Architecture, Software Implementation, Lambda Functions, RESTful Microservices, Message Queues, Unit Testing, JUnit, SOLID Principles, Architecture, Cloud Architecture, Python 2, Algorithms, API Development, Testing

DevOps Engineer

2016 - 2017

Works Applications

Reduced build time by up to 75% for Java Spring web applications by tuning JVM and Maven build parameters.
Incorporated git workflow in eCommerce development by migrating its entire source code from SVN to Git. It resulted in an overall improvement in the merge process, code reviews, and team collaboration.
Improved release and continuous delivery process by migrating Java build artifact deployment from SVN to maven repository hosted on JFrog.
Developed Jenkins CI/CD pipelines for eCommerce microservices written in Java Spring boot.
Administered Jenkins cluster and Gitlab server on AWS EC2 instances.

Technologies: Jenkins, Jenkins Pipeline, Continuous Integration (CI), Continuous Delivery (CD), CI/CD Pipelines, Bash, Apache Maven, Java, Amazon Web Services (AWS), Git, Amazon EC2, Software Engineering, Linux, REST APIs, APIs, Relational Databases, Back-end, eCommerce, Java 8, Java EE, Cloud, DevOps, Autoscaling, GitLab, GitLab CI/CD, SSH, Scalability, Software Architecture, Object-oriented Design (OOD), Object-oriented Programming (OOP), Spring, Databases, Docker, Enterprise Resource Planning (ERP), Software Implementation, Python 2

Experience

Data Platform

A data platform capable of processing ETL trade and communication surveillance apps with millions of records. I was the primary designer and developer of the platform's architecture and components. The new platform resulted in a significant reduction of cloud computing costs associated with the ETL apps and improved observability and monitoring of the entire platform's data pipelines.

Refinitiv Market Data Ingestion

As the main developer, I built a data pipeline for ingesting and processing Refinitiv's market data (security quotes) using Parquet storage format and Prefect batch jobs. This resulted in an 80% reduction in Refinitiv data ingestion and storage costs.

Data Warehouse

A data warehouse solution that allowed users to easily access autonomous vehicle rides data via SQL queries.

• Enhanced accessibility to test ride data by storing it in Parquet format and enabling SQL queries via Amazon Athena.
• Converted terabytes of data for hundreds of daily test rides into Parquet format.
• Designed the transformation pipeline as a state machine on AWS Step Functions, which enhanced the debugging and troubleshooting process.

Streamable Logs

This project improved the accessibility of large non-streamable data from autonomous vehicle test ride logs.

• Enhanced data accessibility by converting it to row-based Avro format, which allowed seeking logs via timestamps.
• Ingested 20TB of logs daily using a data pipeline on AWS Step Functions.
• Enabled 3.1Gbps encoding and compression speed.
• Featured no-lag seeking of the logs using a highly concurrent Python app for streaming and decoding.

Order Fulfillment System

A back-end system for an eCommerce platform that integrated multiple APIs to fulfill orders placed on the eCommerce website.

• Developed and integrated microservices for orders, payments, shipment, and cart services using Java 8 and Spring Boot.
• Engineered an event-driven architecture system design for async communication between multiple microservices of the eCommerce platform. Used Kafka messages to connect the platform's different services.
• Achieved an SLA of 1,000 orders fulfilled per hour by designing autoscaling rules for horizontal scaling of a distributed system of microservices.

AWS S3 File Downloader and Uploader

A Python desktop app for downloading and uploading files from/to Amazon S3, allowing users to reliably download large files from S3 (approx. 500GB per file) at the highest possible speeds.

• Achieved the maximum physically possible download speed for a given storage and network IO speed by
– bypassing OS page cache through the use of the advanced flag for Python's "open" function;
– configuring an optimal size of IO chunk size through the benchmarking network and storage IO performance;
– implementing a highly concurrent network and storage IO engine using a combination of the ThreadPoolExector and ProcessPoolExecutor.
• Tracked memory usage using a combination of queues and counters to avoid OOM errors.
• Improved the reliability of the downloader by implementing retries with backoff at the IO chunk level, which retries downloading small chunks of data in case of failures.
• Used Python's "threading" library features like Events, Semaphores, Queues, and Lock to achieve high concurrency.

Achieved
• Download and upload speed of 125Mbps on 1Gbps LAN.
• 3.1 Gbps on m5dn.12xlarge EC2 instance.
• 60% better performance than the official AWS Python SDK.
• Improved user experience for downloading very large files (approx. 500GB).

SDK for ETL Apps

An SDK for developing ETL apps. The SDK allowed developers of other teams to quickly develop, deploy, and execute custom ETL apps on data platform infrastructure.

• Reduced app development time by developing a base Docker image containing an SDK for app development. Developers would then write a Docker image for their custom app.
• Enabled developers to leverage a scalable, secure, available, access-controlled, and monitored infrastructure to quickly develop and run ETL apps.
• Achieved highly scalable infrastructure for ETL execution through the use of EC2 auto-scaling groups.
• Reduced app integration time by developing a framework.
• Improved execution lifecycle visibility by implementing Slack notifications for execution results.
• Reduced time to debug and troubleshoot apps by enabling monitoring via Datadog and Amazon CloudWatch.
• Simplified the system architecture by architecting it as a state machine running on AWS Step Functions.
• Successfully launched more than 10 customized ETL apps on the SDK within three months, which would have earlier taken an estimated 2-3 months per app, and required considerable efforts from both the data platform team and external team.

Education

2010 - 2014

Bachelor's Degree in Computer Science

Indian Institute of Information Technology - Jabalpur, India

Skills

Languages

Python, Java, C#, Python 2, Python 3, Java 8, C#.NET, SQL, Bash, Batch, GraphQL

Frameworks

Hadoop, Spark, Spring Microservice, Spring Boot, .NET Core, ASP.NET, .NET, JUnit, Flask, Spring, Django, Django REST Framework, Hibernate

Libraries/APIs

REST APIs, PySpark, API Development, Slack API, JDBC, Pandas, NumPy, Asyncio, Python Asyncio, Flask-Marshmallow, SQLAlchemy, Jenkins Pipeline, Refinitive API

Tools

Git, Amazon Athena, Amazon Elastic MapReduce (EMR), Apache Maven, Pytest, Terraform, AWS Glue, AWS Step Functions, Amazon Elastic Container Service (Amazon ECS), Jenkins, Apache Avro, Amazon CloudWatch, Amazon Redshift Spectrum, GitLab, GitLab CI/CD, Jupyter, IPython Notebook, Amazon Virtual Private Cloud (VPC), Shell, PyPI, AWS Deployment, AWS SDK, Apache Airflow, Kafka Streams, Amazon Simple Queue Service (SQS)

Paradigms

MapReduce, Microservices, Microservices Architecture, ETL, Object-oriented Design (OOD), Object-oriented Programming (OOP), Unit Testing, Serverless Architecture, Testing, REST, Agile, DevOps, Event-driven Architecture, Continuous Integration (CI), Continuous Delivery (CD), RESTful Development, Scrum, Management

Platforms

Docker, Amazon Web Services (AWS), Amazon EC2, AWS Lambda, Linux, Java EE, Ubuntu, Kubernetes, Apache Kafka, Jupyter Notebook, MacOS

Storage

Amazon S3 (AWS S3), Relational Databases, Databases, Data Pipelines, MySQL, PostgreSQL, NoSQL, Cassandra, Apache Hive, Datadog, Redshift, Elasticsearch, Microsoft SQL Server, Redis

Other

Software Development, Parquet, Big Data, Data Engineering, Back-end, Distributed Systems, Software Engineering, APIs, eCommerce, Data Warehousing, Cloud, System Design, Software Architecture, Software Implementation, RESTful Microservices, Multithreading, SOLID Principles, FastAPI, Architecture, Cloud Architecture, Data Management, Containerization, Data Warehouse Design, Relational Database Services (RDS), Scaling, Algorithms, Amazon RDS, Slackbot, Data Wrangling, Containers, Autoscaling, Zstandard, ECS, SSH, Infrastructure as Code (IaC), Scalability, API Integration, Enterprise Resource Planning (ERP), Lambda Functions, Message Queues, CI/CD Pipelines, Data Compression, Serverless, Data Modeling, PyMySQL, Amazon API Gateway, Autonomous Navigation, Self-driving Cars, Messaging, SDKs, State Machines, Amazon Kinesis, Maps, Jupiter, Infrastructure Monitoring, Apache Cassandra, Networking, Monitoring, Deployment, WebSockets, Concurrency, Memory Management, Memory Mapped Files, Processing & Threading, Benchmarking, Memory Profiling, Data, Software Integration, Big Data Architecture, Data Management Platforms, Prefect, Conductor, Netflix OSS, Orkes, GitHub Actions

Collaboration That Works

How to Work with Toptal

Toptal matches you directly with global industry experts from our network in hours—not weeks or months.

Share your needs

Discuss your requirements and refine your scope in a call with a Toptal domain expert.

Choose your talent

Get a short list of expertly matched talent within 24 hours to review, interview, and choose from.

Start your risk-free talent trial

Work with your chosen talent on a trial basis for up to two weeks. Pay only if you decide to hire them.

Top talent is in high demand.

Start hiring